ARM Machine Learning Chips Bring Mobile AI Down from the Cloud
Andrew Wheeler posted on February 15, 2018

UK-based Arm Holdings is known far and wide among microprocessor manufacturers for having created the industry-standard chip designs that made mobile computing both possible and powerful. Yesterday, Arm announced that it has created new mobile processors that can run machine learning algorithms on mobile computing devices.

Project Trillium is poised to add machine learning capabilities to almost every type of mobile device, without the need for an internet connection. (Image courtesy of Arm Holdings.)

Project Trillium

"Project Trillium", as it's known internally at Arm, consists of two dissimilar processors: a Machine Learning (ML) processor, and an Object Detection (OD) processor. A third component is a group of neural network (NN) software libraries that serve specialized code to each of the two co-processors so they work in tandem with the device's CPU and GPU. 

Currently, running machine learning algorithms from a mobile computing device means connecting to powerful cloud servers, such as Google's Cloud Machine Learning Engine or Machine Learning on Amazon Web Services (AWS). Bringing machine learning into the hardware of a mobile device eliminates the latency inherent in sending information to cloud servers and receiving results back. It also lets machine learning tasks run solely on the device's own hardware, without the need for a strong internet connection.

Arm ML Processor and Arm OD Processor Specs

The Arm ML processor has a scalable architecture (so it can expand to IoT and other "edge" computing devices) and performs nearly five trillion operations per second (TOPS) on 1-2 watts of power. This is what enables it to perform machine learning inference locally instead of relying on cloud platforms.
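As a rough illustration of what those figures imply, the throughput can be divided by the power draw to get an efficiency range. This is only a back-of-the-envelope sketch: it assumes the throughput is rounded up to 5 TOPS and holds constant across the whole 1-2 watt range, which the announcement does not state.

```python
# Back-of-the-envelope efficiency from the figures quoted above.
# Assumption: throughput is rounded up to 5 TOPS and stays constant
# across the quoted 1-2 W power range.
THROUGHPUT_TOPS = 5.0
POWER_RANGE_WATTS = (1.0, 2.0)


def tops_per_watt(tops: float, watts: float) -> float:
    """Return throughput per watt of power drawn."""
    return tops / watts


best_case = tops_per_watt(THROUGHPUT_TOPS, POWER_RANGE_WATTS[0])
worst_case = tops_per_watt(THROUGHPUT_TOPS, POWER_RANGE_WATTS[1])
print(f"Implied efficiency: {worst_case:.1f} to {best_case:.1f} TOPS per watt")
```

Under these assumptions the implied range is 2.5 to 5 TOPS per watt.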

The new hardware from Project Trillium will allow mobile computing devices to run inference tasks with no internet connection, surpassing the ability of Arm's mobile Mali GPUs to perform such tasks. Training, another machine learning task, is done mostly by high-end GPUs, like those offered by NVIDIA. (Image courtesy of Arm Holdings Inc.)

The Arm Object Detection (OD) processor is a second-generation device, evolved from a computer vision processor currently used in Hive security cameras. The OD processor can detect objects as small as 50 x 60 pixels, processing HD video in real time at 60 frames per second. It can also detect an almost unlimited number of objects per frame, so dealing with the busiest coral reef or soccer stadium is no problem.

Training versus Inference

Training

Project Trillium will enable machine learning algorithms to run locally on mobile devices using prebuilt software libraries that contain neural networks that have already been trained. In machine learning, training a neural network means establishing its parameters from categorized examples of inputs and their desired outputs.

Inference

Having these trained neural network software libraries will allow Project Trillium's hardware to run inference, another machine learning task, which uses the previously trained parameters of stored neural networks to recognize, classify, and process new and unknown input.
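The split between the two tasks can be sketched in a few lines of Python. This is not a neural network and has nothing to do with Arm's libraries: it swaps in a much simpler nearest-centroid classifier, and the function names `train` and `infer` and the sample data are made up for illustration. The point is only that training turns labeled examples into stored parameters, and inference uses those parameters on input it has never seen.

```python
from collections import defaultdict


def train(examples):
    """Training: derive parameters (one centroid per label) from labeled examples."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in examples:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {
        label: (total[0] / counts[label], total[1] / counts[label])
        for label, total in sums.items()
    }


def infer(params, point):
    """Inference: classify a new point using only the stored parameters."""
    px, py = point
    return min(
        params,
        key=lambda label: (params[label][0] - px) ** 2 + (params[label][1] - py) ** 2,
    )


# Hypothetical labeled data: 2-D feature vectors with a category each.
labeled = [
    ((1.0, 1.0), "cat"),
    ((1.0, 2.0), "cat"),
    ((8.0, 8.0), "dog"),
    ((9.0, 8.0), "dog"),
]

params = train(labeled)               # done once, typically on powerful hardware
prediction = infer(params, (2.0, 1.5))  # cheap step that can run on the device
print(params, prediction)
```

Only `params` needs to ship to the device. The labeled examples, and the heavier work of deriving parameters from them, stay wherever training was done.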

The Importance of Neural Network Software

Using neural networks (NN) to analyze data on IoT edge devices is growing in popularity for two reasons: it uses less energy for data transmission and it reduces latency.

Arm NN software was built to fill the gaps between NN frameworks like TensorFlow, Caffe, and Android NN and the entire line of Arm Cortex CPUs, Arm Mali GPUs, and ML processors. This allows developers to harness all of the hardware’s capabilities using the Arm NN SDK.

Arm NN SDK

The Arm NN SDK is a set of open-source Linux software and tools for running machine learning workloads on IoT edge devices. It connects NN frameworks to Arm Cortex CPUs, Arm Mali GPUs, or the Arm Machine Learning processor.

Through the Compute Library, developers using the Arm NN SDK can efficiently target programmable cores like Cortex-A CPUs and Mali GPUs. The Arm NN SDK also includes support for the Arm Machine Learning processor and for Cortex-M CPUs, which are accessed through CMSIS-NN.

CMSIS-NN

This collection of neural network kernels was developed to reduce the amount of memory that neural networks use on Arm Cortex-M processor cores in IoT edge devices. As a result, neural network inference built on CMSIS-NN kernels shows a more than fourfold improvement in runtime/throughput and is almost five times as energy efficient.

Caffe will be the first framework supported by the Arm NN SDK, then TensorFlow, with others to follow. The SDK takes networks from these frameworks, translates them to the internal Arm NN format, and deploys them through the Compute Library on Cortex-A CPUs and Mali GPUs (Mali-G71 and Mali-G72).

Arm NN for NNAPI

NNAPI, Google’s interface for accelerating neural networks on Android devices, runs neural network workloads on a device’s CPU cores, but has a Hardware Abstraction Layer (HAL) that is able to target other processor types like GPUs. Arm NN for Android NNAPI uses this HAL to target Mali GPUs.

TensorFlow Lite

TensorFlow Lite is a new machine learning framework for Android that was announced in May 2017 at Google I/O. It is expected to have a dramatic effect on the deployment of neural networks on Arm-based platforms that support Android. The framework is an inference engine with standardized support via NNAPI in Android, and it can target a range of accelerators from Arm, including the Mali GPUs.

Neural network inference starts with a model representing the neural network, which is provided by a machine learning framework like TensorFlow Lite. The Android NN Runtime schedules how the graph will run, whether it's on the CPU or on any device registered to support neural network computation. The selected device is given the model, breaks the workload down into a few crucial operations, and runs the inference process on the model. This produces the final result for use by the application.
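The scheduling flow described above can be pictured with a toy runtime. Everything in this sketch is hypothetical: the `Device` and `Runtime` classes, the operation names, and the selection rule are invented for illustration and are not the NNAPI or Arm NN interfaces. It assumes a model is just an ordered list of operation names and that the runtime picks the first registered device that supports all of them, falling back to the CPU otherwise.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical operation set; a real runtime has far more than three.
OPS: Dict[str, Callable[[List[float]], List[float]]] = {
    "scale": lambda v: [2.0 * x for x in v],
    "relu": lambda v: [max(0.0, x) for x in v],
    "sum": lambda v: [sum(v)],
}


@dataclass
class Device:
    name: str
    supported_ops: Set[str]

    def run(self, model: List[str], inputs: List[float]) -> List[float]:
        """Break the model into its operations and apply them in order."""
        data = list(inputs)
        for op in model:
            data = OPS[op](data)
        return data


@dataclass
class Runtime:
    cpu: Device
    registered: List[Device] = field(default_factory=list)

    def register(self, device: Device) -> None:
        self.registered.append(device)

    def select(self, model: List[str]) -> Device:
        """Pick the first registered device that supports every operation."""
        for device in self.registered:
            if set(model) <= device.supported_ops:
                return device
        return self.cpu  # the CPU fallback supports everything in this sketch

    def execute(self, model: List[str], inputs: List[float]) -> Tuple[str, List[float]]:
        device = self.select(model)
        return device.name, device.run(model, inputs)


runtime = Runtime(cpu=Device("cpu", set(OPS)))
runtime.register(Device("gpu", {"scale", "relu"}))

on_gpu = runtime.execute(["scale", "relu"], [-1.0, 3.0])
on_cpu = runtime.execute(["scale", "relu", "sum"], [-1.0, 3.0])
print(on_gpu)  # ('gpu', [0.0, 6.0])
print(on_cpu)  # ('cpu', [6.0])
```

The application only ever calls `execute`. Which processor does the work is decided by the runtime, which is why registering a new accelerator does not require changes to application code in this sketch.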

Availability

Project Trillium is launching two weeks ahead of the Mobile World Congress 2018 event, where attendees will be able to see the Arm Object Detection processor in smart cameras and IP security.

Project Trillium will herald an acceleration of interconnection, automation and new AI computing power to almost any mobile or IoT device imaginable.
