I will be talking about something rather interesting today: the Autopilot technology that powers the Tesla Model S and Model X electric vehicles. The feature has gained a lot of fame in recent months thanks to its obvious novelty value and the fact that it is the first hands-off, self-driving technology on the market today. Tesla’s Autopilot enables the latest Model S and Model X to autonomously change lanes, follow vehicles and curves, along with the usual complement of accident prevention technology such as emergency steering and braking. To date, however, the ‘behind the scenes’ of this technological marvel has remained shrouded in mystery – something I hope to change in this editorial.
Tesla’s luxury electric cars have quickly become the darling of the automobile market. The hype is at an all-time high – and for good reason too – since they are nothing less than a technological marvel. The nature of these cars is boldly different from most other production vehicles out there. Tesla’s semi-autonomous driving technology is just one example of the multitude of exciting features currently available only to a Tesla owner.
Unfortunately for tech enthusiasts, there haven’t been a lot of details easily available on the workings of the technology so far. Presentations by Nvidia and interviews with Elon Musk hint at what they might be, but as you will find out in this article, some of those hints have been interpreted inaccurately by various publications, while some diligent observers were on the right path. Without further ado, here is the breakdown of today’s piece:
- Basic Autopilot Implementations and DNNs: A brief introduction to some Autopilot sensing approaches as well as Deep Neural Networks and Machine Learning.
- Tesla’s Digital Cockpit – Nvidia’s VCM: A cursory look at the hardware behind the infotainment system and instrument cluster aboard the latest Tesla models.
- Tesla’s Autopilot System – MobilEye: Looking at the world’s first DNN employed (in this capacity) on the road. We will be diving deep into the primary technology powering Tesla’s Autopilot.
- Mobileye Continued – Hardware Specifications and Overview
- What the future holds: A look at what we can expect with future hardware upgrades to Tesla EVs.
Before we begin, I would like to point out that all information presented in this article has been independently verified and comes from sources knowledgeable about the matter. The utmost care has been taken to ensure that all relevant companies were queried about any important details. That said, human error and typographic mistakes remain possible – any such issue will be corrected at the earliest opportunity.
I think a (very) basic introduction on general Autopilot technology and DNNs is in order. Those already familiar with these can skip this portion. Advanced Driver Assist Systems (ADAS) are becoming more and more prevalent in our cars – but most of them are hidden just out of sight. While some autonomous vehicles like Google’s Self Driving car have obtrusive sensors on their roofs, not every semi-autonomous car is made the same.
Looking at the various approaches to achieving autopilot
An automobile’s autopilot technology consists of three different components: the sensors, the hardware back-end and the software back-end. There are three broad categories of sensors, along with various types of processors and software back-ends – and we will be focusing primarily on Tesla’s approach.
Let’s begin by comparing the different sensing devices: RADAR/ultrasonics, LIDAR and your average camera. All approaches have their own advantages and disadvantages. Until recently, the LIDAR approach was the most popular one, albeit costly; but the trend has gradually shifted towards a camera-based approach for various reasons.
Let’s start with RADAR. This piece of equipment can easily detect cars and moving objects, but unfortunately it is unable to detect lanes or motionless objects. This means it is not very good at detecting stationary pedestrians. It is a very good sensor to have as a redundant device – but not the ideal primary sensor.
LIDAR cannot detect lanes but is able to detect humans reasonably well – at a much higher cost. The expensive piece of equipment has a large footprint and can break the bank at some price points. Models with high enough resolution to offer high reliability are usually even more expensive.
The last (and latest) approach is the camera system. This is the primary sensor (in conjunction with a front-facing radar) used in Tesla vehicles. A camera system is your average wide-angle camera mounted on the front or in a surround configuration around the car. Unlike RADAR and LIDAR, a camera setup is only as good as the software processing its inputs (the camera resolution matters, but not as much as you would expect) – a primary component of which is a DNN. You can think of a DNN as the virtual “brain” on the chip, interpreting results from the camera and identifying lane markings, obstructions, animals and so on. A DNN is not only capable of doing pretty much everything RADAR and LIDAR can do, but much more besides – like reading signs, detecting traffic lights and judging road composition. We’ll cover this in more depth in later sections.
A short (high level) introduction to DNNs
Now let’s talk about Deep Neural Networks, or DNNs for short. The inner workings of a neural network live at a very low level, so I can only provide a simplified explanation. Neural networks were first conceived as a way to simulate the human and animal nervous system, in which a neuron fires for each object ‘recognized’. The reasoning went: if we could replicate this trigger process with virtual ‘neurons’, we should be able to achieve ‘true’ machine learning and eventually even artificial intelligence. One of the first large-scale DNNs was created by Google.
The project was called Google Brain and consisted of around 1,000 servers and some 2,000 CPUs. It consumed 600,000 watts of power (a drop in the ocean that is server-level power consumption) and cost $5 million to create. The project worked and the objective was met: within the course of three days of watching YouTube videos, the AI learned to tell humans apart from cats. The project was eventually shelved due to the very high cost of scaling it. It worked, but it was too slow.
In more recent times, Nvidia managed to accomplish what Google did with just 3 servers. Each server had only 4 GPUs running – that’s 12 GPUs in total. It consumed 4,000 watts and cost only $33,000. This is a setup that an amateur with deep pockets, or a modestly funded research lab, could recreate easily. Basically, you could now get Google Brain’s power roughly 150 times cheaper and at 150 times less power consumption, with the added benefit of scalability.
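As a quick sanity check, the ratios follow directly from the quoted costs and wattages – and they actually work out closer to 150× than a round 100×:

```python
# Cost and power figures for the two setups, as quoted above.
google_brain = {"cost_usd": 5_000_000, "power_watts": 600_000}
nvidia_rig = {"cost_usd": 33_000, "power_watts": 4_000}

cost_ratio = google_brain["cost_usd"] / nvidia_rig["cost_usd"]
power_ratio = google_brain["power_watts"] / nvidia_rig["power_watts"]

print(f"~{cost_ratio:.0f}x cheaper, ~{power_ratio:.0f}x less power")  # ~152x cheaper, ~150x less power
```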
But how exactly does a DNN function? Well, the human brain recognizes objects through their edges – it doesn’t see pixels, it sees edges. A DNN tries to recreate how a human brain functions by being programmed to recognize edges. A ton of code is added, and then the unsupervised ‘machine learning’ period begins, in which the DNN is fed material – images, videos or data in any other form.
One by one, virtual neurons are created – unsupervised and unprogrammed – each recognizing a specific edge. When enough time has passed, the network can distinguish whatever the DNN was told to look out for. The ‘intelligence’ of the DNN depends on its processing power and the time spent ‘learning’. Now that we have that out of the way, let’s move on to the insides of the Tesla Model X and S.
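To make the “edges” idea concrete, here is a minimal sketch (plain Python, toy data) of the kind of edge filter a DNN’s early layers end up learning on their own – hand-written here as the classic Sobel kernel rather than learned:

```python
# Hand-crafted horizontal-gradient (Sobel) filter -- a stand-in for the
# edge detectors that a DNN's first layer typically learns by itself.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def filter2d(image, kernel):
    """Slide a 3x3 kernel over the image (no padding), summing the products."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - 2):
        row = []
        for x in range(w - 2):
            s = sum(kernel[j][i] * image[y + j][x + i]
                    for j in range(3) for i in range(3))
            row.append(s)
        out.append(row)
    return out

# A toy 6x6 "image": dark pixels on the left, bright pixels on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

edges = filter2d(image, SOBEL_X)
# The response is zero on flat regions and peaks at the vertical edge.
print(edges[0])  # [0, 4, 4, 0]
```

A trained network stacks many such filters and combines their responses, but the core operation is exactly this multiply-and-sum over a small window.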
Before we jump into the details of the Autopilot technology, there is something else worth a look as well – Nvidia’s technology in Tesla vehicles. This is the tech that powers the digital cockpit inside the Model S and Model X. Although the company has been very serious about ADAS and DNNs on the road, it currently only supplies Tesla with the hardware driving the digital cockpit – that is, the 17.5-inch infotainment screen and the instrument cluster behind the steering wheel.
Nvidia’s last financials indicated very strong growth in its automotive department. The reason, of course, is that its Tegra chips are increasingly in demand as the go-to choice for powering digital cockpit systems for various automobile vendors. Nvidia has also been aspiring to break into the ADAS business with its new Drive PX chip. In fact, Elon Musk was present at CES this year when Nvidia’s CEO demonstrated the capabilities of the Drive PX board. This caused many to speculate that the board was present inside the Tesla models. This is, however, not true – though you would be forgiven for thinking that the latest Tesla vehicles contain the module.
Tesla vehicles do not (at the time of writing) have Nvidia’s Drive PX/CX board and instead rely on the Tegra K1 (VCM) to power the digital cockpit
We were also able to find out the exact Tegra model present in the new Tesla EVs. Previously, the cars possessed a Tegra 3 processor, but all modern Model X and Model S cars have the Nvidia VCM (Visual Computing Module). Nvidia has a long-standing relationship with many automakers including Audi, Volkswagen and BMW, and these partners are now using the Tegra VCM for their infotainment systems. The VCM is a highly flexible platform, incorporating an automotive-grade Nvidia Tegra mobile processor with dedicated audio, video and image processors. Nvidia revealed the VCM module a year or so back; it houses the Tegra K1 with a grand total of 384 GFLOPS. To put that into perspective, that is more power than last-generation consoles (PS3/Xbox 360).
Now, interestingly, when the VCM was revealed, it was stated to serve two purposes: driving infotainment systems and providing visual driving aid. The infotainment capabilities include not only advanced material rendering (the speed gauges and instrument cluster behind the steering wheel) but actual gaming as well. This 3D rendering capability is being effectively leveraged in the indicators on the instrument panel that turn on and off during Autopilot. The dynamic lane markings and the car detection that is shown on screen is another example of this.
The VCM can be easily inserted into a standard 1-DIN automotive enclosure (the size of a car stereo). This is also the component that allows the Tesla’s dashboard infotainment system to be updated over the air, putting it on a different life cycle than the rest of the vehicle. The chip has 192 Kepler-based cores and a TDP of around 5W. The Tegra K1 was the first mobile chip by Nvidia to feature CUDA, and the first mobile chip to surpass the 300 GFLOPS mark in single-precision computing, thanks to its general-purpose GPGPU CUDA cores.
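Those 300+ GFLOPS follow directly from the core count: each CUDA core can retire one fused multiply-add (two floating-point operations) per clock. A quick back-of-the-envelope, assuming a nominal 1 GHz GPU clock (actual shipping clocks vary by bin):

```python
# Peak single-precision throughput of the Tegra K1's GPU.
cuda_cores = 192      # Kepler CUDA cores in the K1
flops_per_core = 2    # one fused multiply-add = 2 FLOPs per clock
clock_ghz = 1.0       # assumed nominal clock for this estimate

peak_gflops = cuda_cores * flops_per_core * clock_ghz
print(peak_gflops)    # 384.0 -- matching the figure quoted for the VCM
```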
The driving aid portion of the chip was stated to be capable of pedestrian detection, adaptive cruise control, collision detection, lane departure warning and blind spot monitoring. However, the chip ended up being used solely for driving graphics and general housekeeping in Tesla vehicles. The bulk of ADAS and semi-autonomous driving fell to a chip from a completely different vendor. This vendor is the magician behind the magical act of Tesla’s Autopilot and is very well known in the automobile industry. I am referring, of course, to Mobileye.
Those who are diligent enough will know that the company Mobileye powers the self-driving capabilities of the Tesla vehicles. I will go over the features of this system in this part whereas the second part of the Mobileye section will contain a hardware overview as well as a comparison to Nvidia offerings.
Introduction to Autonomous Driving and ADAS
Mobileye refers to driving automation in three broad milestones.
The first milestone is ADAS, or Advanced Driver Assistance Systems. An ADAS system assumes that the driver is in control of the car most of the time but provides assistance or emergency capabilities. This includes the likes of AEB (Automatic Emergency Braking), Adaptive Cruise Control, Collision Avoidance Systems and similar features. These are now part and parcel of most high-end (and mainstream) vehicles, with a select few (including Tesla) even having advanced ADAS like emergency auto steering.
The second milestone is semi-autonomous driving – something only Tesla can claim at the moment – and consists of the car driving itself (hands off the steering wheel) with the driver required to monitor it regularly. In this case, the car will hand over control to the human in various scenarios. The human driver is assumed to be an active participant in the process – albeit one who doesn’t interfere for some (if not most) of the time. Basically, if you crash the car while on Autopilot – you are responsible.
The last and final milestone is fully Autonomous Driving, in which a car can go from Point A to Point B without any human monitoring necessary and can tackle all sorts of scenarios on its own. The role of the human driver here is one that is completely passive and should remain passive for the duration of the trip. It is this future that automobile companies are striving for and companies like Mobileye and Google are racing towards.
Mobileye plans to complete the Semi-Autonomous phase of self-driving cars by 2018, during which the capability of a semi-autonomous car will be increased from Highways, to Country Roads and finally City Roads. Note that the Tesla Autopilot currently does not work on roads where the lane markings aren’t clear – even though Mobileye is perfectly capable of holistic path planning without any markings.
Tesla’s autopilot system is unique in many ways – but one of the first things worth mentioning is that:
Tesla’s Autopilot, powered by Mobileye, is the world’s first DNN deployed on the road
I think the best way to tackle the process that goes on behind the scenes is to simply break it down into parts. Please note that while Tesla uses a plethora of software, the primary bulk of ADAS and semi-autonomous driving is handled by Mobileye’s chip. The process shown above is a high-level diagram of how the Tesla Autopilot system functions. Some of it involves algorithmic functions such as motion segmentation, ego-motion and camera solving, but the really interesting part is the DNN-based functions. As mentioned above, Mobileye has deployed the first DNN on the road with Tesla EVs and is responsible for the following (major) jobs:
- Free Space Pixel Labeling
- Holistic Path Planning
- General Object Detection
- Sign Detection
A very pertinent point to make here is that there is a difference between the core Mobileye DNN and the system Tesla is using to ‘learn’ – they are not the same. To reiterate:
The system Tesla EVs use to make the autopilot ‘learn’ over time is an implementation of their own design and not related to Mobileye
One of the primary things the Tesla Model X and Model S are capable of is 3D modelling of vehicles on the road. Owners might have guessed as much, since the quaint little cars that appear on the instrument cluster are generated after this is done. Not to mention that the “follow-the-car” approach Tesla so happily utilizes depends on it. All this is done, of course, by machine learning. The DNN in question was trained on side, front and rear views of various cars until it was able to detect them with reasonable accuracy and consequently construct a 3D model of the same (just a plain box showing the area occupied in real 3D space).
Free Space Pixel Labeling is, simply put, recognizing the area on-camera which is obstruction-free. It is also the area in which the car will be allowed to go. This allowable area is shaded in green in the images below (real output generated by the Mobileye DNN). As you can see, road edges and vehicle edges are being correctly recognized with very few visual cues. This part of the system is critical, because if executed incorrectly it could result in the car veering off-road, into an object, or worse.
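A heavily simplified sketch of the idea, with a made-up obstacle grid standing in for real per-pixel DNN output:

```python
# Toy free-space labeling: mark each cell of a coarse camera grid as
# drivable (1) or blocked (0). A real system labels individual pixels
# using a DNN; the obstacle positions here are invented for illustration.
GRID_W, GRID_H = 8, 4
obstacles = {(5, 1), (5, 2), (2, 3)}  # hypothetical detections as (x, y)

free_space = [[0 if (x, y) in obstacles else 1 for x in range(GRID_W)]
              for y in range(GRID_H)]

# The planner may only route the car through cells labeled 1.
drivable_cells = sum(map(sum, free_space))
print(drivable_cells)  # 29 of the 32 cells are free
```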
Given below are representations of the Holistic Path Planning capability of the Mobileye chip, which allows it to decide the way forward with very few visual cues. This is the process that tells the car where to drive and controls the steering. Anyone with knowledge of how these things work would agree it is remarkable that the processor is able to distinguish the road without any high-contrast lane markings. Note that this feature is only partially enabled on the Tesla (probably because the manufacturer has decided to play it safe) and only works with clear lane markings.
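Conceptually, once the drivable corridor is known, steering reduces to aiming at its centre. A toy sketch with invented boundary estimates:

```python
# Toy path planning: given estimated lateral offsets (in metres) of the
# left and right road boundaries at several look-ahead distances, steer
# toward the midpoint of the corridor. All numbers are made up.
left_edge = [-2.0, -1.8, -1.5, -1.0]
right_edge = [2.0, 2.2, 2.5, 3.0]

# Target path = centre of the drivable corridor at each look-ahead point.
path = [(l + r) / 2 for l, r in zip(left_edge, right_edge)]
print([round(p, 2) for p in path])  # [0.0, 0.2, 0.5, 1.0] -- a gentle right curve
```

The real system plans over a continuous road model rather than a handful of points, but the principle – derive the path from the corridor, not from painted lines alone – is the same.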
Of course, driving isn’t just about flooring the accelerator and steering (though some might argue otherwise); situational and contextual awareness is something that is very crucial. This is where Mobileye’s Object Detection Capabilities come in: something that is a much more traditional implementation of DNNs. In this case, the chip on-board the Tesla is capable of identifying over 250 signs in more than 50 countries. These include everything from turn signs to speed limits. The system is also capable of identifying and interpreting traffic lights, road markings and general items such as traffic cones. It even has the capability to detect large animals that appear suddenly on the road – and of course, human pedestrians.
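At the output end, classification over those 250-plus sign classes boils down to an argmax over per-class scores. A toy illustration (the scores are invented; a real network computes them from pixels):

```python
# Final decision rule of a sign classifier: pick the highest-scoring class.
# These four class names and their scores are hypothetical placeholders.
scores = {
    "speed_limit_60": 0.07,
    "stop": 0.88,
    "yield": 0.03,
    "no_entry": 0.02,
}

detected = max(scores, key=scores.get)
print(detected, scores[detected])  # stop 0.88
```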
Last but not least is the capability to detect the road surface as well as any debris present. This allows the Tesla to be aware of not only what kind of road the car is traveling on (highway vs country side etc) but would also allow it to detect debris and other undesirables such as potholes on the road (and consequently avoid them). The DNN based system is even able to identify the types of tarmac and road composition and adjust steering and electronic stabilization accordingly – something that will be part and parcel of tomorrow’s smart car.
While we are on the topic of general features, it is worth mentioning that most of these capabilities were originally designed to run on a monocular setup – that is, with only one primary camera. This has since been expanded to bigger, more expansive surround configurations which provide much more visual coverage, offering unparalleled flexibility and reliability to the car owner. Thanks to the increases in processing power offered by the current-generation Mobileye chip, complete 360° surround awareness is now part of the package, although most of the ADAS and driving automation still uses the camera and radar setup on the front.
Many of these capabilities remain dormant for the time being, until Tesla deems them ready for activation
Before we delve into the nitty-gritty of the hardware involved, I would like to point out that many of these features have been adapted by Tesla and will only be activated (or restored to their original state) once it feels the time is right. The system must approach zero tolerance for mistakes – and it is rightly labeled a beta program. As the system “learns” and becomes more adept at navigating without human assistance, this should change; but until then, Tesla owners can be proud of the fact that they are driving nothing less than an absolute technological marvel. Here is a gif that surfaced a while back showcasing what a Tesla ‘sees’. Readers should be able to spot the various types of DNN-based techniques at work here:
The chip behind the magic of Autopilot on the Tesla is the Mobileye EyeQ3 processor
This is the part much of our hardware audience has been waiting for, where we explore not only the hardware side of Tesla’s Autopilot but compare it to alternatives as well. The EyeQ3 debuted in November 2014 with the Tesla Model S and has had over 9 more launches this year. Built on the 40nm process, the tiny silicon packs 4 cores clocked at 500 MHz, with 64 MACs (Multiply and Accumulate units) per core. With a TDP of just 2.5W and a utilization of 80%, the chip can deliver up to 102 billion multiply-accumulate operations per second. The processor architecture has been built from the ground up to focus on ADAS processing (naturally).
MobilEye EyeQ3 Vs. Tegra X1 Specification Comparison
| Specification | Mobileye EyeQ3 | Nvidia Tegra X1 |
|---|---|---|
| Process | 40nm | 20nm |
| Compute Cores | 4 (64 MACs each) | 256 CUDA cores (~2 MACs each) |
| Clock Speed | 0.5 GHz | 1 GHz |
| Utilization | 80% | 28% |
| Effective Throughput | ~102 billion MAC/s | ~143 billion MAC/s |
| TDP | 2.5W | 10W |
| Die Size | 42mm² | 126mm² |
To give better context, given above is a comparison with Nvidia’s Tegra X1. That recent processor, based on the much more advanced 20nm process, packs 256 CUDA cores with about 2 MACs per core. With a utilization rate of 28%, the chip can deliver 143 billion multiply-accumulate operations per second. The TDP, as we know, is 10W, which makes it far less efficient than the Mobileye offering. The EyeQ3 has a die size of just 42mm² whereas the Tegra X1 has a die size of 126mm² (utilization measured using cuDNN – Nvidia’s DNN library – on a GTX 980 with 2048 Maxwell CUDA cores).
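The throughput numbers follow directly from each chip’s published specs – cores × MACs per core × clock × utilization (note the units come out in billions of MAC/s):

```python
# Effective multiply-accumulate throughput from the published specs.
def mac_per_sec(cores, macs_per_core, clock_hz, utilization):
    return cores * macs_per_core * clock_hz * utilization

eyeq3 = mac_per_sec(cores=4, macs_per_core=64, clock_hz=500e6, utilization=0.80)
tegra_x1 = mac_per_sec(cores=256, macs_per_core=2, clock_hz=1e9, utilization=0.28)

print(f"EyeQ3:    {eyeq3 / 1e9:.1f} GMAC/s at 2.5 W")   # 102.4 GMAC/s
print(f"Tegra X1: {tegra_x1 / 1e9:.1f} GMAC/s at 10 W") # 143.4 GMAC/s
```

The X1 wins on raw throughput, but divide by TDP and the EyeQ3 comes out nearly three times more efficient per watt – which is the figure that matters in an always-on automotive system.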
The EyeQ3 chip is a 4+4 design with 4 MIPS cores and 4 VMP cores (Vector Microcode Processors). The SoC is compatible with LPDDR1/LPDDR2/DDR2/DDR3 SDRAM and capable of executing both 32 bit and 64 bit instructions. The interconnect is clocked at 530 Mhz. Both the CPU and VMP are clocked at 500 Mhz. The successor to this chip will be the EyeQ4 processor which has an estimated arrival year of 2018.
I am sure most of us will remember the Drive PX board Nvidia showed off at CES this year. Well as it turns out, the board in question was also an Audi zFAS board and had the EyeQ3 chip on board – which handles all of the actual ADAS functionality. The same board is featured on Nvidia’s website as well. The company has marketed Drive PX as an alternative to Mobileye and maintains that the board does not use the EyeQ3 chip.
Mobileye plans to introduce the EyeQ4 chip by 2018 – a truly worthy upgrade with 14 computing cores, of which 10 are accelerators. You will have your usual 4 cores as well as accelerators of three different kinds: PMAs (Programmable Macro Arrays), VMPs (Vector Microcode Processors) and MPCs (Multithreaded Processing Clusters). The actual count will be 4 cores, 6 VMPs, 2 PMAs and 2 MPCs for the EyeQ4. This will result in approximately 1.26 trillion MAC/s, or roughly 2.5 TFLOPS. And here is the best part – all of this will be done while staying within a TDP target of 3 watts.
As you can see from the diagram, the EyeQ4 is exponentially more complex than its predecessor and will be able to handle diverse loads. The EyeQ4 has a utilization percentage of 96% – about 4 times higher than any GPU in GPGPU mode. It is 10 times more powerful than the EyeQ3 and will be able to process input from a maximum of 10 cameras at 36 frames per second.
The EyeQ4 has had a single design win so far (currently unknown), which means that the honor of powering future self-driving Teslas could be up for grabs for rival manufacturers. Nvidia’s offerings were relatively underpowered this round, but no doubt its 16nm FinFET-based offerings will be far superior, not only in terms of efficiency but customized compute as well.
That said, I remain skeptical that the company will be able to score a design win with Tesla in the future, considering the enormous experience gap between Mobileye and the GPU manufacturer. Unlike Nvidia, whose architecture is usually designed for all-round purposes and runs DNNs in a GPGPU environment, Mobileye’s architecture is explicitly designed for a single purpose: executing ADAS and self-driving functionality.
What the future holds
I am sure most of our readers will have seen the video of the amazing Tesla save in which the car engages Automatic Emergency Braking, successfully averting a head-on collision that most humans would have been unable to avoid. This is probably a very good example of what Elon Musk was talking about when he stated that human drivers could very well be banned in the future – an accident was averted because a human was not in control. To be fair to humans, however, there are various scenarios in which an AI-powered car would be completely incapable of responding while a human would manage just fine. While most of us would enjoy self-driving functionality, the painted future where human driving is banned is, frankly, a little scary.
Talking on more concrete grounds, and looking at the exponential increase in processing capabilities and progress being made in “leaps and bounds” (to quote Mobileye), it looks like the first hints of fully autonomous driving might be here by 2018. Tesla, in conjunction with Mobileye, has been a pioneer in bringing self-driving capabilities to the roads – taking the step most manufacturers hesitated to take for fear of legal entanglement. Within the parameters these systems are configured for, they are, without a doubt, much safer than a human driver. However, fully autonomous driving involves catering for 100% of possible scenarios; and while modern cars have proven to be more than capable of semi-autonomous driving, full autonomy is a whole different ball game and quite a long time off.
As far as the future of the hardware goes, as I mentioned before, Mobileye’s next chip – the EyeQ4 – has yet to get a design win from Tesla, although it has a higher chance of doing so than any other alternative. Elon Musk has also been very active at Nvidia presentations, which means there is a possibility that future Teslas could very well be powered by green. A single-vendor solution would have its benefits over the hybrid system Tesla is currently using. Of course, it is equally likely that Tesla will simply use Nvidia-based solutions for redundancy and added features on top of the Mobileye setup – something that should ensure higher safety once we move towards fully autonomous vehicles. Needless to say, I am speculating here, since it is just as likely that the exact same setup remains with just added sensors and an upgrade to the EyeQ4.
I think that pretty much covers everything. We are finally seeing the first glimpses of something resembling the promised future, where cars are fully autonomous and manual driving is frowned upon. We are also slowly but steadily moving towards a point in time where human driving may be outlawed or heavily restricted – the only question yet to be answered is when. DNNs are becoming more and more precise, and the massive leaps between successive generations will quickly bring the day when it will be very hard to distinguish a DNN-based intelligence from true human intellect. That time is still very far away, but without a shadow of a doubt, it is coming. And on that (hopefully not too) ominous note, I would like to end this editorial.