CES 2020: Cartica AI’s Simpler Solution for Autonomous Vision
The Tel Aviv-based startup is promoting its novel object-recognition software as a way to reduce processing power and cost for autonomous vehicles.
The wide rollout of autonomous vehicles (AVs) will require advanced artificial intelligence (AI) and significant computing horsepower. The efficiency and effectiveness of the software behind safe AV operation will play a large role in AV cost and production timelines. Tel Aviv-based startup Cartica AI is bringing novel, leaner object-recognition software to the automotive space that could significantly reduce the cost, computing power and electrical loads required for advanced driver-assist systems (ADAS), hastening adoption of safety and autonomous features.
An automotive spinoff of advanced AI developer Cortica, Cartica originally leveraged approximately 40 employees and 250 patents from its parent company to develop a successful proof-of-concept tailored to AVs. The startup displayed its technology at this year’s Consumer Electronics Show (CES) in Las Vegas, and we spoke with several Cartica principals – including new board-of-directors member Karl-Thomas Neumann – to discuss Cartica’s unique alternative to AI deep learning.
An electronics engineer, Neumann already has an impressive automotive resume, including previous CEO positions with Opel, Volkswagen’s China Group and Continental Automotive Systems. Working as both a board member and advisor, Neumann noted his goal is to serve as an industry insider to help the startup secure a Tier-1 integrator for its software. Neumann is also investing in the startup, joining other concerns with a financial stake in Cartica AI that include BMW iVentures, Continental and Toyota AI Ventures.
AVs, AI and deep learning
According to Neumann, Cartica is entering a space already dominated by Mobileye, another Israel-based supplier that has developed an AV vision system powered by deep learning. “If you want to be five star or whatever star NCAP is driving us to, this technology is pretty expensive because it uses a lot of computing power, and 70% of [these systems] are based on Mobileye technology,” Neumann explained. “Which means they need to have a special Mobileye chip, and special, very expensive Mobileye software. The world is waiting for some real alternatives to this, and this is what we want to build Cartica to be.”
Deep learning is a form of machine learning in which an algorithm is taught by being presented with large amounts of data, its accuracy improving as the data set grows. An example would be training an algorithm to recognize a stop sign by showing the system a set of stop-sign images. The larger the set of images presented, the “smarter” the system becomes and the more capable it is of recognizing a stop sign.
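In code, that supervised approach can be caricatured with a toy classifier. The feature names, numbers and nearest-centroid rule below are illustrative assumptions for this article, not Mobileye’s or Cartica’s actual method:

```python
# Toy sketch of supervised learning: a nearest-centroid classifier that
# "learns" a class by averaging labeled examples. More labeled examples
# refine the centroid, which is the sense in which more data makes the
# system "smarter."

def train(examples):
    """Average labeled feature vectors into one centroid per class."""
    sums, counts = {}, {}
    for label, vec in examples:
        s = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}

def classify(centroids, vec):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical 2-D features, e.g. (redness, octagon-ness) of an image crop.
training = [
    ("stop_sign", [0.9, 0.8]), ("stop_sign", [0.8, 0.9]),
    ("other",     [0.1, 0.2]), ("other",     [0.2, 0.1]),
]
model = train(training)
print(classify(model, [0.85, 0.75]))  # → stop_sign
```

The weakness Neumann describes falls out of this picture: an input far from every training example lands near no centroid, so the fix in a deep-learning pipeline is always more labeled data.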
“What happens with deep learning, as we know it in our NCAP devices and in our autonomous cars, is we take a neural network and we teach it with pictures,” Neumann said. “We show them traffic signs, and we say this is a traffic sign, this is a pedestrian, and this is a border line, so that they understand the environment.”
“The problem is if half the traffic sign is covered, or it’s bent or it’s not perfect, then these systems have problems,” Neumann said. “They need to be trained with thousands and thousands of pictures, for what they call the tail end or the edge, which is things which look like traffic signs, but are not exactly perfect. So the answer to that problem is, you teach even more. So you need more processing power, more power consumption and so forth.”
The software Cartica is developing takes an entirely different approach to AI-based object recognition. In lieu of a brute-force deep-learning program that feeds ever more data into a system until it can recognize a sufficient envelope of use cases, Cartica employs a learning technique built on what it calls “signatures,” which it likens to the cues mammals process in the brain’s cortex (the origin of the parent company’s name) to recognize objects.
“For every live signal that comes from sensors – such as cameras, radars, lidar, ultrasonic – a signature holds all the information the system can extract from the signal to represent the content,” Barak Matzkevich, Cartica’s COO, explained. “A signature is basically a series of numbers, taking in a huge space of 30 billion numbers just like we have 30 billion neurons in our brain. When I see something, certain neurons are firing up. A signature holds those numbers that are firing up to represent what the system sees in the input signal.”
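Matzkevich’s description suggests a sparse representation. As a rough sketch, assuming a signature can be modeled as the set of positions that “fire” in that huge space (the indices and the overlap-based matching rule here are invented for illustration, not Cartica’s implementation):

```python
# A signature modeled as a sparse vector: only the "firing" positions are
# stored, so a point in a 30-billion-dimensional space stays compact.
# Two signatures are compared by how many active positions they share.

SIGNATURE_SPACE = 30_000_000_000  # "30 billion numbers," per Matzkevich

def make_signature(active_indices):
    """A signature is just the set of positions that fired."""
    assert all(0 <= i < SIGNATURE_SPACE for i in active_indices)
    return frozenset(active_indices)

def similarity(sig_a, sig_b):
    """Jaccard overlap: shared active positions / total active positions."""
    return len(sig_a & sig_b) / len(sig_a | sig_b)

full_stop_sign    = make_signature({3, 17, 256, 9001, 77_000})
half_covered_sign = make_signature({3, 17, 256, 9001})  # some cues missing
unrelated_object  = make_signature({5, 42, 1234})

print(similarity(full_stop_sign, half_covered_sign))  # 0.8 — still a close match
print(similarity(full_stop_sign, unrelated_object))   # 0.0
```

The robustness Neumann later claims for half-covered or bent signs corresponds here to a partial signature still overlapping strongly with the full one.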
This “unsupervised” learning approach relies less on the need to train the system for all possible scenarios, and instead mimics the brain’s early development. “When we’re a newborn, before you even understand anything, you structure your world. You say, ‘OK, that’s a horizontal line, that’s a vertical line, those are circles.’ You structure your world into patterns called signatures,” Neumann explained. “The first time you show a baby a glass of water and you say, ‘this is a glass,’ they can connect signatures. If these signatures come together then they call it a glass. Even if you turn that glass upside down, if the glass looks a little bit odd, they would still say it’s a glass.”
“They call it unsupervised learning – it’s not totally unsupervised, your parents had to show you a stop sign once or twice before you got it – but you don’t need to be told by your parents at every stop sign for years. You knew it after you saw two,” Neumann said. “Even if you only see half of it, even if it’s upside down, even if the red is not the exact red, even if the word stop would be written with handwriting on it, you would say, ‘This is a stop sign.’ You do this with signatures.”
“You don't look at every pixel and try to bring them together. You look at these lines and how lines fit together, you look at color patterns, very basic structures,” Neumann said. “If you feed a picture into a neural network, you feed pixels into that network. And if you teach a system to learn what a stop sign is, if deviating a little bit from the standard symbol and it looks a little odd, you have to retrain it with the odd one, show it in different angles, different colors, different light situations. By only dealing with signatures, we dramatically reduce the complexity of the problem.”
More spoof proof, sensor agnostic
Cartica AI’s software is sensor agnostic, transforming sensor data into generic compressed signatures in an unsupervised manner to permit a broad range of perceptual tasks. This includes a host of vision-based AV recognition tasks such as identifying other vehicles, pedestrians, signage, road markings, etc. Because it does not rely on a specific database of taught examples, in theory it would be less susceptible to “spoofing” – a failure mode of deep learning in which small, sometimes deliberate changes to an object’s appearance cause it to be missed or misidentified. This could make Cartica’s system more effective in real-time, edge-case scenarios.
“There are certain stickers which you can place on a stop sign – you wouldn't even say, ‘okay, that’s a little debris on the stop sign’ – but all these new systems fail. They get so confused by this pattern that they can't see the stop sign anymore. They might think it's a giraffe or something,” Neumann said. “So that's another advantage. Even if I would cover up that stop sign with a very odd picture of anything, you would still say, ‘This is a stop sign.’ Try that with any computer.”
According to Neumann, Cartica’s software is also sensor agnostic, making it applicable for sensor fusion, a crucial trait as autonomy levels increase. “For the signatures it doesn't matter where they come from, they're all combined in the same logic. Any sensor can create a signature, and the signature feeds into your recognition. Pretty much like your eyes, ears and touch do it. It's a very powerful mechanism to do sensor fusion, which we’ll need for the next generation,” he said.
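A minimal sketch of the fusion logic Neumann describes, assuming signatures from every sensor are sets of positions in one shared space (the sensor cues, indices and overlap-count matching rule are illustrative assumptions, not Cartica’s code):

```python
# Sensor fusion on signatures: because camera, radar and lidar cues all
# land in the same signature space, fusing them is simply combining their
# active positions, and recognition runs on the fused signature.

def fuse(*signatures):
    """Merge active positions from several sensors into one signature."""
    fused = set()
    for sig in signatures:
        fused |= sig
    return frozenset(fused)

def best_match(fused, known):
    """Pick the known object sharing the most active positions."""
    return max(known, key=lambda label: len(known[label] & fused))

camera_sig = frozenset({10, 20, 30})  # e.g. shape and color cues
radar_sig = frozenset({30, 40})       # e.g. range and velocity cues
lidar_sig = frozenset({20, 50})       # e.g. 3-D contour cues

fused = fuse(camera_sig, radar_sig, lidar_sig)
known = {
    "pedestrian": frozenset({10, 20, 50, 80}),
    "vehicle": frozenset({60, 70, 90}),
}
print(best_match(fused, known))  # → pedestrian
```

The design mirrors Neumann’s analogy of eyes, ears and touch: each sensor contributes partial evidence, and no per-sensor retraining is needed to add a new input.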