TetraMem Integrates Energy-Efficient In-Memory Computing with Andes RISC-V Vector Processor

By Wenbo Yin, Vice President of IC Design, TetraMem Inc.

Introduction
The rapid proliferation of artificial intelligence (AI) across a growing number of hardware applications has driven an unprecedented demand for specialized compute acceleration that conventional von Neumann architectures do not meet. Among the competing alternatives, one of the most promising is analog in-memory computing (IMC), and multi-level Resistive RAM (RRAM) is bringing that promise closer to reality than ever before. TetraMem, Inc., a Silicon Valley-based startup, is leading this development by addressing the fundamental challenges that have held the approach back. The company’s IMC, built on multi-level RRAM technology, provides more efficient, low-latency AI processing that meets the growing needs of modern applications in AR/VR, mobile, IoT, and beyond.

Background on the Semiconductor Industry
The semiconductor industry has seen significant advancements over the past few decades, particularly in response to the burgeoning needs of AI and machine learning (ML). Innovations in chip design have pushed the boundaries of performance and efficiency. However, several persistent intrinsic challenges remain, such as the von Neumann bottleneck and the memory wall, which limit data transfer rates between the CPU and memory, and the escalating power consumption and thermal management issues associated with advanced node technologies.

In-memory computing (IMC) represents a paradigm shift in how data processing is accomplished. Traditional computing architectures separate memory and processing units, resulting in significant data transfer overhead, especially for data-centric AI applications. IMC, by contrast, integrates memory and processing in the same physical location, enabling faster and more efficient computation. Its crossbar array architecture further eliminates the large quantity of intermediate data that matrix operations otherwise produce. This approach is particularly beneficial for AI and ML applications, where large-scale data processing and real-time analytics are critical.

Selecting a suitable memory device for IMC is crucial. Traditional memory technologies such as SRAM and DRAM are not optimized for in-memory operations because of their device and cell constraints and their volatility. RRAM, with its high density, multi-level capability, and non-volatility with superior retention, overcomes these challenges and needs no refresh. RRAM works by adjusting the resistance level of the memory cell through a controlled voltage or current, mimicking the behavior of synapses in the human brain. This capability makes RRAM particularly suited for analog in-memory computing.

TetraMem has focused its efforts on multi-level RRAM (memristor) technology, which offers several advantages over traditional single-level-cell memory technologies. RRAM’s ability to store multiple bits per cell and perform efficient matrix multiplications in situ makes it an ideal candidate for IMC. This technology addresses many of the limitations of conventional digital computing, such as bandwidth constraints and power inefficiency.

The RRAM programmable circuit element remembers its last stable resistance level. This resistance level can be adjusted by applying voltage or current: changes in the magnitude and direction of the applied voltage or current alter the element’s conductance and therefore its resistance. The mechanism is akin to how a human neuron functions, and it has diverse applications: memory, analog neurons, and, at TetraMem, in-memory computing. The operation of an RRAM cell is driven by ions. Controlling the conductive filament’s size, ion concentration, and height allows multiple distinct cell resistance levels to be set precisely.

Because data is processed in the same physical location where it is stored, with minimal intermediate data movement and storage, power consumption is low. Massively parallel computing in the crossbar array architecture, with cores at device-level granularity, yields high throughput. Computing directly through physical laws (Ohm’s law and Kirchhoff’s current law) produces low latency. TetraMem’s nonvolatile compute-in-memory cell reduces power consumption by orders of magnitude compared with a conventional digital von Neumann architecture.
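The role of the two physical laws can be written out for an idealized crossbar. The symbols below are illustrative assumptions, not TetraMem notation: V_i is the voltage applied to row i, G_ij is the programmed conductance of the cell at row i and column j, and I_j is the current collected on column j. Wire resistance, device nonlinearity, and noise are ignored in this sketch.

```latex
% Ohm's law at a single cell: the cell current is conductance times voltage
I_{ij} = G_{ij} \, V_i

% Kirchhoff's current law on column j: the cell currents of all N rows add up
I_j = \sum_{i=1}^{N} I_{ij} = \sum_{i=1}^{N} G_{ij} \, V_i

% Taken over all columns, this is a vector-matrix product
\mathbf{I} = \mathbf{V}^{\top} \mathbf{G}
```

Each column therefore delivers one dot product in a single analog step, which is why the multiply-accumulate work does not require moving the stored values out of the array.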

Notable Achievements
TetraMem has achieved significant milestones in the development of RRAM technology. Notably, the company has demonstrated an unprecedented device with 11 bits per cell, achieving over 2,000 levels in a single element. This level of precision represents a major breakthrough in memory compute technology.
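The relationship between bits per cell and the number of levels is simple arithmetic: 11 bits correspond to 2^11 = 2,048 distinct levels. The sketch below illustrates this and shows how a target value might be snapped to the nearest level. The evenly spaced levels, the function names, and the unit-less conductance range are assumptions made for illustration; they do not describe TetraMem’s actual programming scheme.

```python
from typing import Tuple


def levels_for_bits(bits: int) -> int:
    """Number of distinct levels that a cell storing `bits` bits must resolve."""
    return 2 ** bits


def quantize(value: float, g_min: float, g_max: float, bits: int) -> Tuple[int, float]:
    """Snap `value` to the nearest of 2**bits evenly spaced levels.

    Assumption: levels are evenly spaced between g_min and g_max.
    Returns the level index and the quantized value.
    """
    n_levels = levels_for_bits(bits)
    step = (g_max - g_min) / (n_levels - 1)
    clamped = min(max(value, g_min), g_max)  # keep the target inside the range
    index = round((clamped - g_min) / step)
    return index, g_min + index * step


print(levels_for_bits(11))  # 2048
```

With more levels per cell, the step between adjacent levels shrinks, so a stored weight can sit closer to its intended value.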

Recent publications in prestigious journals such as Nature [1] and Science [2] highlight TetraMem’s innovative approaches. Techniques to improve cell noise performance and to enhance multi-level IMC have been key areas of advancement. For example, TetraMem has developed proprietary algorithms to suppress random telegraph noise, resulting in superior memory retention and endurance characteristics for RRAM cells.

Operation of IMC
TetraMem’s IMC technology utilizes a crossbar architecture, where each cross-point in the array corresponds to a programmable RRAM memory cell. This configuration allows for highly parallel operations, which are essential for neural network computations. During a Vector-Matrix Multiplication (VMM) operation, input activations are applied to the crossbar array, and the resulting computations are collected on the bit lines. This method significantly reduces the need to transfer data between memory and processing units, thereby enhancing computational efficiency.
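The VMM operation described above can be mimicked in software with a minimal, idealized model. In this sketch, input activations are represented as row voltages and each cross-point as a conductance value, and the function returns the current collected on each bit line. The function name, the plain-list data layout, and the example values are hypothetical; real arrays also involve converters, noise, and wire resistance, which are not modeled here.

```python
from typing import List


def crossbar_vmm(voltages: List[float], conductances: List[List[float]]) -> List[float]:
    """Return the current collected on each bit line (column) of an ideal crossbar.

    voltages[i] is the input applied to row i.
    conductances[i][j] is the cell at row i, column j.
    """
    rows = len(conductances)
    if rows != len(voltages):
        raise ValueError("one input voltage is needed per crossbar row")
    cols = len(conductances[0]) if rows else 0
    currents = [0.0] * cols
    for i, v in enumerate(voltages):
        for j in range(cols):
            # Each cell contributes v * G; contributions add up along the column.
            currents[j] += v * conductances[i][j]
    return currents


# Hypothetical 3x2 array and input vector (arbitrary units).
G = [[1.0, 0.5],
     [2.0, 0.0],
     [0.5, 1.5]]
V = [0.25, 0.5, 1.0]
print(crossbar_vmm(V, G))  # [1.75, 1.625]
```

The nested loops here run sequentially, whereas in the physical array every cell contributes at the same time, which is the source of the parallelism.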

Real-World Applications
TetraMem’s first evaluation SoC built through a commercial fab process, the MX100 chip (see figure), exemplifies the practical application of its IMC technology. The chip has been shown in various on-chip demos that showcase its capabilities in real-world scenarios. One notable demo, the Pupil Center Net (PCN), illustrates the chip’s application in AR/VR for face tracking and authentication monitoring in autonomous vehicles.

To facilitate the adoption of its technology, TetraMem provides a comprehensive Software Development Kit (SDK). This SDK enables developers to define edge AI models seamlessly. Furthermore, the integration with Andes Technology Inc.’s NX27V RISC-V CPU with Vector extensions streamlines operations, making it easier for customers to deploy TetraMem’s solutions in their products.

The TetraMem IMC design excels at matrix multiplication but is less efficient at other functions, such as vector or scalar operations, which are used frequently in neural networks. For these functions, Andes provides the flexibility of a CPU plus a vector engine, as well as an existing SoC reference design and a mature compiler and library, which accelerate our time to market.
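This division of labor can be pictured with a single neural-network layer: the matrix multiplication goes to the IMC array, while element-wise steps run on the CPU and its vector engine. The sketch below is purely illustrative. The function names are hypothetical and are not part of the TetraMem SDK or any Andes library, and the choice of a bias add followed by ReLU as the vector step is an assumption.

```python
from typing import List


def imc_matmul(x: List[float], w: List[List[float]]) -> List[float]:
    """Hypothetical stand-in for the matrix multiplication done in the IMC array."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def cpu_bias_relu(y: List[float], bias: List[float]) -> List[float]:
    """Element-wise bias add and ReLU, the kind of work suited to a vector engine."""
    return [max(0.0, yi + bi) for yi, bi in zip(y, bias)]


def dense_layer(x: List[float], w: List[List[float]], bias: List[float]) -> List[float]:
    """One layer: matrix multiply (IMC stand-in) followed by vector operations (CPU)."""
    return cpu_bias_relu(imc_matmul(x, w), bias)


# Hypothetical 2-input, 2-output layer.
print(dense_layer([1.0, 2.0], [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]))  # [2.0, 0.0]
```

Keeping the two stages as separate functions mirrors the idea that each runs on the hardware best suited to it.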

TetraMem collaborated with Andes Technology to integrate its IMC technology with Andes’ RISC-V CPU with Vector Extensions. This partnership enhances the overall system performance, providing a robust platform for a variety of AI tasks. The combined solution leverages the strengths of both companies, offering a flexible and high-performance architecture.

Looking ahead, TetraMem is poised to introduce the MX200 chip, based on a 22nm process, which promises even greater performance and efficiency. This chip is designed for edge inference applications, offering low-power, low-latency AI processing. The MX200 is expected to open new market opportunities, particularly in battery-powered AI devices where energy efficiency is paramount.

Conclusion
TetraMem’s advancements in in-memory computing represent a significant leap forward in the field of AI hardware. By addressing the fundamental challenges of conventional computing, TetraMem is paving the way for more efficient and scalable AI solutions. As the company continues to innovate and collaborate with industry leaders like Andes Technology, the future of AI processing looks promising. TetraMem’s solution not only enhances performance but also lowers the barriers to entry for adopting cutting-edge AI technologies.

  1. “Thousands of conductance levels in memristors monolithically integrated on CMOS”, Nature, Mar 2023 https://rdcu.be/c8GWo
  2. “Programming memristor arrays with arbitrarily high precision for analog computing”, Science, Feb 2024 https://www.science.org/doi/10.1126/science.adi9405