How Engineers Are Using TinyML to Build Smarter Edge Devices

A peek at the complex decisions made by TinyML researchers and designers.

Engineers are using some pretty wild tools to bring machine learning to the current generation of Internet of Things (IoT) devices. IoT devices sit at the very edge of networks, gathering data through an array of miniaturized computing devices (typically microcontrollers) with embedded sensors; in the field of edge computing, these are known as smart sensor applications. The hardware is tiny, the available electricity is ultra-limited, and computing power is constrained by that tininess. Efficient circuitry is everything.

It’s no mystery why engineers have been exploring tools like machine learning and computer vision to improve the efficiency of these small, ubiquitous IoT devices. The hardware is pretty straightforward: just microcontrollers (MCUs) and sensors. MCUs are everywhere now, already embedded in our cars, watches and appliances, and they are so small and inexpensive that they can run on the power of a single solar cell for very long stretches of time.

With 5G mobile wireless networks rolling out across the globe, the number of IoT devices is set to expand greatly, as will the number of sensors and microcontrollers. As these IoT devices become more numerous, the amount of data they generate will also vastly increase. Since machine learning models are extremely good at parsing out useful data points from an increasing sea of data streams, parties with a vested interest in data are finding new ways to incorporate machine learning into IoT devices.

The implementation of machine learning on microcontrollers is called tiny machine learning, or TinyML. TinyML refers to the approaches, tools and techniques that engineers use to implement machine learning on devices in the mW power range and below.
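
What that looks like in practice: a common workflow is to train a deliberately small model in a standard framework and then convert it into a compact form that fits in a microcontroller’s flash. The sketch below does this with TensorFlow Lite; the layer sizes and input shape are illustrative assumptions, not a reference design.

    import tensorflow as tf

    # A deliberately small keyword-spotting-style network (sizes are illustrative).
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(49, 40, 1)),        # e.g., an audio spectrogram
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(4, activation="softmax"),  # e.g., four keyword classes
    ])

    # ... training would happen here ...

    # Convert to a TensorFlow Lite flatbuffer with default size/latency optimizations.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    # The resulting byte string is what gets compiled into the MCU's flash.
    print(f"Model size: {len(tflite_model)} bytes")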

TinyML and Energy Efficiency

With IoT edge devices, energy efficiency is crucial: every bit of computing power spent sorting data with TinyML techniques has to be wrung from a tiny power budget. Some TinyML techniques use software compilers to compress neural networks, improving an IoT device’s ability to sort useful target data out of large amounts of extraneous data. That filtering only pays off if the compressed model stays accurate enough to identify the target data reliably.

This is accomplished in part by placing machine learning directly on a sensor.

Tech Giants Are Placing Machine Learning on Sensors

Companies such as ARM, Qualcomm, Google and Microsoft are attempting to bring AI inference to the very edge of networks by placing it directly on sensors. If designers and engineers succeed in moving a machine learning model from a network gateway to a sensor, the hope is that three things will dramatically improve: latency, privacy and data filtration.

Many demos in the TinyML space center on voice recognition on tiny microcontrollers or the detection of people with computer vision on a small sensor. But small groups of researchers and a few small startups are writing software that can optimize existing machine learning models (in the form of neural networks) to run on sensors—without sacrificing accuracy—by leveraging the cloud for neural network compression.
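
One widely used compression step is full-integer post-training quantization. The sketch below shows what that looks like in TensorFlow Lite, reusing the hypothetical Keras model from the earlier sketch; the random calibration data is a stand-in for the few hundred real samples you would normally supply.

    import numpy as np
    import tensorflow as tf

    def representative_dataset():
        # Stand-in calibration data; in practice, yield real input samples so
        # the converter can measure activation ranges for quantization.
        for _ in range(100):
            yield [np.random.rand(1, 49, 40, 1).astype(np.float32)]

    # 'model' is the small Keras model defined in the earlier sketch.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Force full int8 quantization: 8-bit weights, activations, inputs and outputs.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    quantized_model = converter.convert()

Quantizing to int8 cuts weight storage roughly fourfold versus float32, and with representative calibration data the accuracy loss is usually small.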

There are three key areas besides software compilers and compressing neural networks that are showing a lot of potential to researchers in the TinyML space: in-memory computing, large-scale deployment of smart-data sensors at the edge of networks, and making smart design decisions across the complete algorithm-architecture stack.

The Potential of In-Memory Computing for TinyML

Deep learning has become quite popular in recent years, and the demand for algorithms that can perform machine learning processing with greater speed and efficiency is steadily increasing. Researchers interested in TinyML are attempting to leverage in-memory computing to achieve this for IoT devices at the edge of networks.

In-memory computing generally refers to a combination of software and memory hardware (RAM) that allows data to be processed in parallel right where it is stored. In TinyML, the appeal of in-memory computing is its ability to process, in parallel, the computations called matrix-vector multiplications (MVMs) that are central to deep learning. Digital accelerators can only be optimized up to a point: because every MVM forces weights to be fetched from memory, data movement imposes a static ceiling on their efficiency. TinyML researchers are hoping to bypass that limitation with in-memory computing.

There are a few reasons for this optimism: in-memory computing TinyML prototypes have demonstrated roughly 10 times the energy efficiency and throughput of optimized digital accelerators. The engineers of these prototypes performed a clever hardware design maneuver: by structurally aligning dense 2D memory arrays with the dataflow of MVMs, they ensured the data does not move around as much, and the cost of computation is offset by organizing it into highly parallel operations.
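
To make the central operation concrete, here is a minimal sketch of the MVM that dominates a dense neural network layer. A conventional digital accelerator streams every weight out of memory to separate arithmetic units, which is what the explicit loop below does conceptually; an in-memory design instead leaves the weights sitting in the 2D array and accumulates the products in place.

    import numpy as np

    # A dense layer is dominated by one matrix-vector multiplication (MVM):
    # every output element is a sum of weight * input products.
    weights = np.random.randn(64, 128)  # 64 outputs, 128 inputs (illustrative sizes)
    x = np.random.randn(128)            # input activations from the previous layer

    # Explicit form: a conventional machine fetches all 64 * 128 weights from
    # memory for every MVM, so data movement dominates the energy cost.
    y = np.zeros(64)
    for i in range(64):
        for j in range(128):
            y[i] += weights[i, j] * x[j]

    # Equivalent vectorized form of the same computation.
    assert np.allclose(y, weights @ x)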

Large-Scale Deployment of Smart Data Sensors at the Very Edge of Networks

Deep learning-based approaches in TinyML are showing promise in resolving network bandwidth bottlenecks. If a sensor on an MCU is taking in a large amount of raw data, deep learning techniques can reduce and refine it into compact, high-quality results, which require far less network bandwidth to transmit. But that reduction and refinement comes at a price, computationally speaking. Fitting deep learning’s computational complexity into a mW-class energy budget means testing every feasible optimization configuration when designing a chip for mW-class inference, and this kind of systematic optimization must be carried through the entire design and engineering process to deliver the highest levels of computational precision on ultra-efficient circuitry.
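
A quick back-of-the-envelope sketch shows why this trade is attractive. The numbers below are illustrative assumptions (a 96x96 grayscale image sensor sampled once per second and a one-byte inference result), not measurements from any particular device.

    # Streaming raw sensor data versus transmitting on-device inference results.
    frame_bytes = 96 * 96 * 1    # one 96x96, 8-bit grayscale frame (assumed format)
    frames_per_second = 1
    raw_bandwidth = frame_bytes * frames_per_second      # 9,216 bytes/s upstream

    result_bytes = 1             # e.g., a class index such as "person present"
    result_bandwidth = result_bytes * frames_per_second  # 1 byte/s upstream

    print(f"Raw stream:     {raw_bandwidth} bytes/s")
    print(f"Inference only: {result_bandwidth} bytes/s")
    print(f"Reduction:      {raw_bandwidth // result_bandwidth}x")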

Optimizing at the System and Application Level to Balance Computational Precision and Efficient Circuitry for the Best Outcomes

How do you keep track of both the algorithms and the chip architecture in TinyML? There is the software component, where neural network model designers have many options to choose from and tweak, including shallow, wide, low-precision networks; deep, narrow networks; and binary networks.
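
Those model-shape trade-offs are easy to make concrete. The sketch below builds a shallow-wide and a deep-narrow classifier in Keras and compares parameter counts; the layer widths and input size are arbitrary assumptions chosen only to illustrate the design space.

    import tensorflow as tf

    def shallow_wide(inputs=40, classes=4):
        # One wide hidden layer: a single large MVM per inference.
        return tf.keras.Sequential([
            tf.keras.layers.Input(shape=(inputs,)),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(classes, activation="softmax"),
        ])

    def deep_narrow(inputs=40, classes=4):
        # Several narrow hidden layers: smaller MVMs, more sequential steps.
        return tf.keras.Sequential([
            tf.keras.layers.Input(shape=(inputs,)),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(classes, activation="softmax"),
        ])

    print(shallow_wide().count_params())  # ~11.5k parameters
    print(deep_narrow().count_params())   # ~3.6k parameters

Neither count settles the question on its own; each shape stresses memory, parallelism and precision differently, which is exactly why these choices have to be weighed against the hardware.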

On the hardware side, chip and circuit architects have many design options of their own, including multilevel memory hierarchies, MAC-centric streaming architectures, variable-precision digital processing, and in-memory processing (discussed above). Juggling optimizations across the full algorithm-architecture stack is only half the tightrope act involved in TinyML: every single design decision has to be calibrated with a full understanding of its impact on the final system and application. Getting the design process right can be unbelievably tense and frustrating, and this is only the beginning.
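
As one small illustration of the variable-precision option, the sketch below simulates the int8 multiply-accumulate (MAC) pattern that quantized digital datapaths commonly implement, with 8-bit operands feeding a wider 32-bit accumulator; this is a generic pattern, not the datapath of any particular chip.

    import numpy as np

    # Quantized inference typically multiplies 8-bit weights and activations
    # and sums the products into a 32-bit accumulator to avoid overflow.
    weights = np.random.randint(-128, 128, size=128, dtype=np.int8)
    activations = np.random.randint(-128, 128, size=128, dtype=np.int8)

    acc = np.int32(0)
    for w, a in zip(weights, activations):
        acc += np.int32(w) * np.int32(a)  # widen each operand before multiplying

    # Matches a dot product carried out at 32-bit precision.
    assert acc == np.dot(weights.astype(np.int32), activations.astype(np.int32))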

Bottom Line

Advances in ultra-low-power machine learning technologies and applications require a variety of complicated tools to manage chaotic datasets and mind-boggling catch-22s. But the potential applications and innovations are virtually limitless, especially as 5G, the next generation of mobile wireless technology, rolls out across the globe.