AndesCore™ A46MP(V)

32-bit RISC-V Multicore Processor with 256-bit VLEN and AMM

AndesCore™ A46MP(V) Overview (Preliminary)

  • Two packages, with or without vector support: A46MPV and A46MP
  • In-order, dual-issue, 8-stage CPU core with up to 256-bit VLEN
  • Symmetric multiprocessing up to 16 cores
  • Private Level-2 cache
  • Shared L3 cache and coherence support
  • Dual scalar and vector load/store unit
  • Enhanced sharable High bandwidth vector local memory (HVM)
  • AndeStar™ V5 Instruction Set Architecture (ISA)
    • Compliant with RISC-V RV32 GCBPV + CMO extensions
    • Andes performance extension
    • Andes CoDense™ extension for further compaction of code size
    • Supports RVA22 mandatory features relevant to RV32 processors
  • Separately licensable Andes Custom Extension™ (ACE) for customized scalar and vector instructions
  • Branch prediction to speed up control code
  • Linux-capable Memory Management Unit (MMU)
  • Physical Memory Protection (PMP) and programmable Physical Memory Attribute (PPMA)
  • Andes-enhanced Platform-Level Interrupt Controller (PLIC) for a wide range of system events and real-time performance
  • Multiprocessing up to 16 cores with hardware-managed data coherence
  • Configurable VPU vector length (VLEN) and datapath length (DLEN)
  • BF16 full arithmetic mode for scalar and vector
  • Andes Matrix Multiply Extension for fast matrix multiply computation
  • Platform-Level Interrupt Controller (PLIC) support with easy arrangement of preemptive interrupts
  • ECC or Parity for SRAM error protection
  • StackSafe™ hardware to help measure stack size and detect runtime overflow/underflow
  • Versatile configurations to trade off core size against performance requirements
  • PowerBrake and WFI (Wait For Interrupt) for different power-saving scenarios

AndesCore™ A46MP(V) 32-bit multicore CPU IP is an 8-stage superscalar processor with a Vector Processing Unit (VPU), based on the AndeStar™ V5 architecture and the Andes Matrix Multiply (AMM) extension. It supports the RISC-V standard “G” (IMA-FD), “ZC” compression, “B” bit manipulation, DSP/SIMD “P” (draft), “V” (vector), and CMO (cache management) extensions, Andes performance enhancements, and Andes Custom Extension™ (ACE) for user-defined instructions. It supports all RVA22 profile mandatory ISAs relevant to RV32. It features an MMU for Linux-based applications, dynamic branch prediction for efficient branch execution, dual issue of common instruction pairs, private level-1 instruction/data caches, a private level-2 cache, and local memories for low-latency accesses. The A46MP(V) symmetric multiprocessor supports up to 16 cores and a shared level-3 cache controller. The Coherence Manager ensures data coherence among CPU accesses and I/O transactions from external bus managers. All caches are non-blocking with prefetch support. The A46MPV has a powerful VPU with up to 256-bit VLEN and a matrix unit, making it well suited to computations involving large arrays of data.

Applications

  • Computer Vision and Image Processing
  • Digital Signal Processing
  • Machine/Deep Learning Acceleration
  • Real-time Control
  • Networking

Block Diagram

Development Tools

  • AndeSight™ Integrated Development Environment (Eclipse-based)
  • AndeShape™ FPGA Development Boards
  • COPILOT: Automation tool for Andes Custom Extension™
  • AndesClarity™: Processor Pipeline Analyzer and Visualizer
  • AndeSoft™ NN Library: Optimized for RISC-V DSP/SIMD and Vector extension

Key Features and Performance

AndeStar™ V5 Architecture

Key Features | Benefits
RISC-V RV32 GCBPV+CMO; supports all RVA22 mandatory features relevant to RV32 | State-of-the-art ISA from the latest developments in computer architecture; industry-standard, open architecture
Andes Extended Instructions | Andes-exclusive performance and functionality enhancements
MMU and Sv32 virtual memory translation | For Linux and advanced operating systems with protection between kernel and user programs
32-bit CPU architecture | High-performance vector core with small code size and gate count
Machine (M), User (U) and Supervisor (S) privilege levels | Full privilege protections
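
As a rough illustration of what Sv32 translation works with, the sketch below splits a 32-bit virtual address into the fields defined by the RISC-V privileged specification: two 10-bit virtual page numbers and a 12-bit page offset. The function name `split_sv32` is hypothetical and not part of any Andes tool, and the code does not model the A46MP(V) TLBs or page table walker.

```python
def split_sv32(vaddr: int) -> dict:
    """Split a 32-bit Sv32 virtual address into VPN[1], VPN[0] and page offset."""
    if not 0 <= vaddr < (1 << 32):
        raise ValueError("Sv32 virtual addresses are 32 bits wide")
    return {
        "vpn1": (vaddr >> 22) & 0x3FF,  # index into the root (first-level) page table
        "vpn0": (vaddr >> 12) & 0x3FF,  # index into the second-level page table
        "offset": vaddr & 0xFFF,        # byte offset within a 4 KiB page
    }


if __name__ == "__main__":
    fields = split_sv32(0x80401234)
    print({name: hex(value) for name, value in fields.items()})
```

For a 4 MiB megapage mapped at the first level, VPN[0] would be treated as part of the offset instead of indexing a second-level table.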

CPU Core

Key Features | Benefits
>6.3 CoreMark/MHz, >3.9 DMIPS/MHz*, >4.3 SPECint2006/GHz | Superior performance per MHz
8-stage in-order superscalar pipeline | Superior performance efficiency, while allowing for high speeds

Extensive branch prediction features

  • Branch Target Buffer (BTB)
  • Branch History Table (BHT)
  • Return Address Stack (RAS)

Benefits:

  • Branch Target Buffer and Branch History Table speed up control code
  • Return Address Stack speeds up procedure returns

MMU (Memory Management Unit)

  • Sv32 virtual-memory systems
  • 4/8-entry fully associative ITLB/DTLB
  • 32-512 entry 4-way set-associative shared TLB
  • Hardware page table walker

Benefits:

  • Virtual memory support for full address space and easy code/data sharing
  • Support for full-featured OS such as Linux
  • Protection of supervisor and user privilege
  • Hardware for fast address translation
Physical Memory Protection (PMP), configurable up to 32 regions | Basic read/write/execute memory protection with minimum cost
Programmable Physical Memory Attribute (PMA), configurable up to 16 regions

Configurable memory attributes:

  • Memory, I/O, None
  • Cacheable/Non-cacheable
  • Read/write/read & write allocate, no allocate
  • Access fault for non-existent regions
Performance monitors | Program code performance tuning
StackSafe™ hardware stack protection
  • Easy identification of stack size threshold during development
  • Hardware error detection of stack overflow and underflow at runtime
PowerBrake technology | Performance throttling to digitally reduce power consumption

* DMIPS/MHz figures follow Dhrystone’s no-inline ground rules and represent best performance.
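
To make the PMP region feature above more concrete, here is a minimal sketch of how a naturally aligned power-of-two (NAPOT) region is encoded for a `pmpaddr` register and its `pmpcfg` byte, following the standard RISC-V privileged specification. The helper names are hypothetical, and nothing here is specific to the Andes implementation or its region count.

```python
PMP_R, PMP_W, PMP_X = 0x01, 0x02, 0x04  # permission bits in a pmpcfg byte
PMP_A_NAPOT = 0x3 << 3                  # address-matching field A = NAPOT
PMP_LOCK = 0x80                         # L bit: entry is locked


def pmp_napot_addr(base: int, size: int) -> int:
    """Encode a NAPOT region (base, size in bytes) as a pmpaddr value."""
    if size < 8 or size & (size - 1):
        raise ValueError("NAPOT size must be a power of two of at least 8 bytes")
    if base % size:
        raise ValueError("base must be aligned to the region size")
    # pmpaddr holds address bits [33:2]; the run of trailing ones encodes the size.
    return (base >> 2) | ((size >> 3) - 1)


def pmp_napot_cfg(read=False, write=False, execute=False, lock=False) -> int:
    """Build the pmpcfg byte for a NAPOT entry with the given permissions."""
    cfg = PMP_A_NAPOT
    cfg |= PMP_R if read else 0
    cfg |= PMP_W if write else 0
    cfg |= PMP_X if execute else 0
    cfg |= PMP_LOCK if lock else 0
    return cfg


if __name__ == "__main__":
    # Example: a 64 KiB read/execute region at an illustrative base address.
    print(hex(pmp_napot_addr(0x8000_0000, 64 * 1024)))
    print(hex(pmp_napot_cfg(read=True, execute=True)))
```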

Memory Subsystems

Key Features | Benefits

Level-1 I-Cache & D-Cache

  • Size: 16KB to 64KB
  • Cache line size: 64 bytes
  • Set associativity: 4-way

Level-2 Private Unified Cache

  • Configurable from 64KB to 512KB
  • 64-byte cache line size
  • 8-way, pseudo random replacement

Level-3 Shared Cache

  • Configurable from 256KB to 32MB
  • 64-byte cache line size
  • 16-way, pseudo random replacement

Cache benefits:

  • Accelerating accesses to slow memories
  • Flexible cache configurations
  • Accelerate performance with larger private 2nd level cache
  • Large shared L3 cache

ILM & DLM

  • Size: 4KB to 16MB
  • Scalar core access only
  • SRAM interface support
  • Accessible by bus managers through the AXI subordinate port

Benefits:

  • For deterministic and efficient program execution
  • Flexible size selection to fit diversified needs

HVM

  • Size: 32KB to 4GB
  • Accessible by both VPU and scalar core
  • Fast vector access with DLEN data width
  • Subordinate port for external DMA
  • When the MMU is configured, HVM is addressed by translated physical addresses, so it can also work with virtual memory under Linux

Benefits:

  • Local memory for vector data, especially for AI models; can work with external DMA to hide latency
  • High speed direct access
  • Sharable by up to 16 cores with multiple banks design
Optional ECC error protection with SRAM interface | Code and data integrity protection
Bus manager port: AXI with 128/256-bit data, I/D joint or separate bus | High throughput with wide data path
Bus subordinate port: AXI with 128/256-bit data, for ILM/DLM accesses | Efficient data transfer between CPU and SoC masters
Core/bus clock ratio of N:1 | Simplified SoC integration
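
The cache parameters listed above (size, 64-byte lines, associativity) determine the number of sets in a conventional set-associative cache. The sketch below works that out for the extremes quoted in this section. It assumes plain power-of-two indexing with no banking or slicing, which the source does not specify, and the function name is hypothetical.

```python
def cache_geometry(size_bytes: int, line_bytes: int, ways: int) -> dict:
    """Return the set count and address bit split for a set-associative cache."""
    sets, remainder = divmod(size_bytes, line_bytes * ways)
    if remainder or sets == 0 or sets & (sets - 1) or line_bytes & (line_bytes - 1):
        raise ValueError("expected power-of-two line size and set count")
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # selects a byte within the line
        "index_bits": sets.bit_length() - 1,         # selects the set
    }


if __name__ == "__main__":
    KB, MB = 1024, 1024 * 1024
    print("L1 64KB, 4-way :", cache_geometry(64 * KB, 64, 4))
    print("L2 512KB, 8-way:", cache_geometry(512 * KB, 64, 8))
    print("L3 32MB, 16-way:", cache_geometry(32 * MB, 64, 16))
```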

Multicore Cache Coherence

Key Features | Benefits
  • Support up to 16 cores
  • MESI cache coherence protocol
  • 128/256-bit I/O coherence port for cacheless bus managers

Benefits:

  • Symmetric multicore and shared L3 cache controller with cache coherence, and I/O coherence for bus managers without caches
  • Convenient and efficient interface for SoCs with rich I/O transactions

Vector Processing Unit (VPU)

Key Features | Benefits
  • RISC-V V-extension (RVV) 1.0 spec
  • Custom RVV instructions based on ACE-RVV
  • LMUL supporting 1, 2, 4, 8, 1/2, 1/4, 1/8
  • BF16 data type support for all floating-point computations

Benefit: Standard and custom RISC-V vector support

  • Vector dual issue
  • Available 2nd multiply-add unit

Benefits:

  • Two vector instructions can be issued in the same cycle
  • Up to 2 MAC units for maximum performance

  • Configurable VLEN/DLEN from 128/256 bits with 1:1 or 2:1 ratio
  • Multiple independent vector execution units for parallel execution
  • Dual independent memory access paths with dual RVV load/store and Andes Streaming Port (ASP) load/store

Benefits:

  • Addressing a wide range of compute requirements with different area/performance trade-offs
  • High-speed memory accesses to coprocessor memory, in addition to the standard memory hierarchy, with DLEN data width
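
To show how VLEN, LMUL and DLEN interact, the sketch below computes VLMAX as defined by the RVV 1.0 specification (VLMAX = LMUL × VLEN / SEW). It also gives a first-order estimate of how many DLEN-wide passes one register group needs; that estimate is a simplifying assumption for illustration, not a statement about A46MPV timing. The function names are hypothetical.

```python
import math
from fractions import Fraction


def vlmax(vlen_bits: int, sew_bits: int, lmul=1) -> int:
    """Maximum number of elements per vector register group (RVV 1.0)."""
    return int(Fraction(lmul) * vlen_bits / sew_bits)


def datapath_passes(vlen_bits: int, dlen_bits: int, lmul=1) -> int:
    """Assumed first-order count of DLEN-wide passes over one register group."""
    return math.ceil(Fraction(lmul) * vlen_bits / dlen_bits)


if __name__ == "__main__":
    # VLEN = 256 bits, 8-bit elements, LMUL = 8.
    print(vlmax(256, 8, 8))
    # Fractional LMUL can be passed as a string or a Fraction.
    print(vlmax(128, 16, "1/2"))
    # VLEN:DLEN ratio of 2:1 versus 1:1.
    print(datapath_passes(256, 128), datapath_passes(256, 256))
```

The RVV specification places further limits on which SEW values are legal with fractional LMUL; this sketch does not check them.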

Andes Matrix Multiply Extension (AMM)

Key Features | Benefits
  • 2D load/store
  • Up to 16×16 matrix multiply
  • INT8 data type

Benefit: Accelerates AI computation
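
As a functional reference for the kind of operation a matrix multiply unit accelerates, the sketch below multiplies two INT8 matrices in plain Python. It is not a model of the AMM instructions. It assumes that "up to 16×16" means no dimension exceeds 16 and that products are accumulated at a wider precision than INT8; the source confirms neither. The function name is hypothetical.

```python
MAX_DIM = 16  # assumed reading of "up to 16x16"


def matmul_int8(a, b):
    """Multiply INT8 matrices a (M x K) and b (K x N), accumulating in wide integers."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a) or any(len(row) != cols for row in b):
        raise ValueError("matrix shapes do not match")
    if max(rows, inner, cols) > MAX_DIM:
        raise ValueError("dimension exceeds the assumed 16-element limit")
    for matrix in (a, b):
        for row in matrix:
            if any(not -128 <= x <= 127 for x in row):
                raise ValueError("element outside the INT8 range")
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    print(matmul_int8([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Even the largest case in this sketch, a 16-term dot product of INT8 values, stays well inside a 32-bit accumulator.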

Platform-Level Interrupt Controller (PLIC)

Key Features | Benefits

Implements RISC-V PLIC specification

  • Up to 1023 interrupt sources
  • Up to 255 interrupt priority levels

Benefit: Interrupt handling for SoCs with multiple processors

Enhanced interrupt features

  • Priority-based preemption
  • Selectable edge trigger or level trigger

Benefit: Complete hardware preemption support
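
For orientation, the sketch below computes register addresses using the memory map from the standard RISC-V PLIC specification: per-source priority words, per-context enable bits, and per-context threshold and claim/complete registers. The base address is a placeholder supplied by the caller, the function names are hypothetical, and any registers added by the Andes enhancements are not covered.

```python
MAX_SOURCE = 1023  # source 0 is reserved by the PLIC specification


def _check_source(source: int) -> None:
    if not 1 <= source <= MAX_SOURCE:
        raise ValueError("interrupt source must be in 1..1023")


def plic_priority_addr(base: int, source: int) -> int:
    """Address of the 32-bit priority register for an interrupt source."""
    _check_source(source)
    return base + 4 * source


def plic_enable_bit(base: int, context: int, source: int):
    """(address, bit) of the enable bit for a source in a given context."""
    _check_source(source)
    return base + 0x2000 + 0x80 * context + 4 * (source // 32), source % 32


def plic_threshold_addr(base: int, context: int) -> int:
    """Address of the priority threshold register for a context."""
    return base + 0x20_0000 + 0x1000 * context


def plic_claim_addr(base: int, context: int) -> int:
    """Address of the claim/complete register for a context."""
    return plic_threshold_addr(base, context) + 4


if __name__ == "__main__":
    base = 0x0C00_0000  # placeholder base address, not taken from the source
    addr, bit = plic_enable_bit(base, context=1, source=33)
    print(hex(plic_priority_addr(base, 33)), hex(addr), bit)
    print(hex(plic_claim_addr(base, 1)))
```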

Debug Support

Key Features | Benefits
Implements RISC-V debug specification version 1.0 | Supported by industry debug tool suppliers
JTAG debug port | Industry-standard support
Embedded Debug Module with up to 8 triggers | Flexible configurations to trade off gate count against debugging capabilities
Exception redirection support | Entering debugger upon selected exceptions without using breakpoints
RISC-V Trace 1.0 instruction trace interface | Supported by Andes tools