AndesCore™ NX27V Processor

64-bit CPU with RISC-V Vector Extension

AndesCore™ NX27V Overview

  • AndeStar™ V5 Instruction Set Architecture (ISA), compliant with RISC-V technology
  • RISC-V vector extension
  • Vector Processing Unit (VPU) boosts the performance of AI, AR/VR, computer vision, cryptography, and multimedia processing
  • Andes extensions, architected for performance and functionality enhancements
  • Separately licensable Andes Custom Extension™ (ACE) for customized acceleration
  • 64-bit CPU architecture, enabling software to use memory spaces far beyond the 4G-byte limit imposed by 32-bit CPUs
  • 16/32-bit mixable instruction format for compact code density
  • Branch prediction to speed up control code
  • Return Address Stack (RAS) to speed up procedure returns
  • Physical Memory Protection (PMP) and Programmable Physical Memory Attributes (PMA)
  • MemBoost for heavy memory transactions
  • Flexibly configurable Platform-Level Interrupt Controller (PLIC) to support a wide range of system event scenarios
  • Enhancement of vectored interrupt handling for real-time performance
  • Advanced CoDense™ technology to further reduce code size on top of “C” extension

The 64-bit NX27V is a vector processor with a 5-stage scalar pipeline that supports the latest RISC-V specification, including the IMAFD standard instructions, “C” 16-bit compressed instructions, “P” DSP extension instructions, “V” vector extension instructions, and “N” user-level interrupts. Its memory subsystem delivers higher bandwidth and lower latency by supporting multiple outstanding data accesses. NX27V features branch prediction, instruction and data caches, local memories, ECC error protection, and Andes Custom Extension™ (ACE) for adding proprietary instructions that accelerate performance-critical or power-critical code. It also includes vectored and preemptive interrupts to serve diverse system events, an AXI data bus for wide data access, PowerBrake and WFI mode for power management, and JTAG debug and trace interfaces for software development. With its powerful Vector Processing Unit (VPU), NX27V is ideal for applications that process large arrays of data, such as machine/deep learning, AR/VR, cryptography, multimedia processing, networking, and scientific computing.

Applications

  • Machine Learning, Deep Learning, AR/VR
  • Multimedia processing
  • Cryptography
  • Networking and scientific computing

Block Diagram

Development Tools

  • AndeSight™ Integrated Development Environment
  • COPILOT: Custom-OPtimized Instruction deveLOpment Tool for ACE
  • ICE debugging hardware

Key Features and Performance

AndeStar™ V5 Architecture

Key Features | Benefits
RISC-V RV64GC-N-P-V ISA
  • State-of-the-art ISA from the latest developments in computer architecture
  • Industry standard and open architecture
64-bit CPU architecture | Enabling software to use memory spaces far beyond the 4G-byte limit of 32-bit CPUs
RISC-V V-extension instructions with versatile operations | Boost the performance of AI, AR/VR, computer vision, cryptography, and multimedia processing
Andes Extended Instructions | Andes-exclusive performance and functionality enhancements
Andes Custom Extension™ (ACE) option to create customized instructions for software acceleration
  • Add customized instruction extensions to facilitate Domain-Specific Architecture/Acceleration (DSA)
  • Boost application performance significantly while maintaining programmability
  • Powerful constructs are available to define high-level instructions
  • ACE design is based on Verilog and C, languages that are familiar to designers
  • The COPILOT tool automatically generates the extended CPU and software toolchain
  • Streaming port for intelligent data access control
16/32-bit mixable instruction format | For compact code density
32 general-purpose registers | For better code size and performance
Machine (M) and User (U) modes | Embedded systems with privilege protections

CPU Core

Key Features | Benefits
3.52 CoreMark/MHz, 2.09 DMIPS/MHz* | Superior performance-per-MHz
5-stage pipeline, with a full cycle reserved for critical SRAM accesses | Superior performance efficiency, while allowing for high speeds

Extensive branch prediction features

  • Branch Target Buffer (BTB): 32, 64, 128 or 256-entry
  • Branch History Table (BHT): 256-entry, with 8-bit branch history
  • Return Address Stack (RAS): 4-entry
  • Branch Target Buffer and Branch History Table to speed up control codes
  • Return Address Stack to speed up procedure returns

Vector Processing Unit (VPU)

  • Supports the RISC-V V-extension specification, including vector loads and stores, vector integer arithmetic, vector fixed-point, vector floating-point, vector reduction operations, vector mask, vector permutation, and vector dot-product, plus the Andes extended formats bfloat16 and int4
  • 32 vector registers
  • Data formats:
    • SEW supported: int8, int16, int32, int64, fp16, fp32, fp64
    • Extension formats: bfloat16 and int4
  • Support LMUL 1, 2, 4, 8, 1/2, 1/4, 1/8
  • Configurable VLEN, SIMD and Memory Width: combinations of 128-bit, 256-bit, 512-bit
  • Functional units chainable, most fully pipelined
  • Independent memory access paths for RVV load/store and ACE Streaming Port load/store
  • More than 300 vector instructions, covering load/store, integer, fixed-point, and floating-point operations
  • 2x and 4x data expansion arithmetic
  • Out of order execution
  • Extended format for AI applications
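How the SEW, LMUL, and VLEN options above interact can be sketched with the RISC-V V-extension relation VLMAX = LMUL × VLEN / SEW. The snippet below is a plain-Python model, not NX27V code: the helper names `vlmax` and `strip_mine` are illustrative, the configuration values are hypothetical, and the rule `vl = min(remaining, VLMAX)` is a simplification of what a real `vsetvli` may return.

```python
from fractions import Fraction


def vlmax(vlen_bits, sew_bits, lmul):
    """Elements per vector register group: VLMAX = LMUL * VLEN / SEW."""
    return int(Fraction(lmul) * vlen_bits / sew_bits)


def strip_mine(n_elements, vlen_bits, sew_bits, lmul):
    """Yield (start, vl) chunks the way a strip-mined vector loop walks an array."""
    limit = vlmax(vlen_bits, sew_bits, lmul)
    if limit < 1:
        raise ValueError("this SEW/LMUL/VLEN combination holds no whole element")
    start = 0
    while start < n_elements:
        vl = min(n_elements - start, limit)  # simplified vsetvli behaviour (assumption)
        yield start, vl
        start += vl


# Hypothetical configuration: VLEN = 512 bits, int32 elements, LMUL = 8.
print(vlmax(512, 32, 8))                  # 128
print(list(strip_mine(300, 512, 32, 8)))  # [(0, 128), (128, 128), (256, 44)]
# Fractional LMUL: VLEN = 128 bits, int8 elements, LMUL = 1/2.
print(vlmax(128, 8, Fraction(1, 2)))      # 8
```

Larger LMUL values trade register count for longer vectors per instruction, which is why the same 300-element array needs only three loop iterations in the first configuration.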
Physical Memory Protection (PMP), 16 regions | Basic read/write/execute memory protection with minimum cost
Programmable Physical Memory Attribute (PMA), 16 regions

Configurable memory attributes:

  • Memory, I/O, None
  • Cacheable/Non-cacheable
  • Write-back/Write-through
  • Read/write/read & write allocate, no allocate
  • Access fault for non-existent regions
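For the PMP regions listed above, the RISC-V privileged specification defines a NAPOT (naturally aligned power-of-two) address encoding that packs a region's base and size into a single `pmpaddr` value. The sketch below illustrates that standard encoding only. The function names are hypothetical, and the source does not state which address-matching modes or granularity NX27V supports.

```python
def napot_encode(base, size):
    """Return the pmpaddr value for a naturally aligned power-of-two region."""
    if size < 8 or size & (size - 1):
        raise ValueError("NAPOT size must be a power of two, at least 8 bytes")
    if base % size:
        raise ValueError("base must be aligned to the region size")
    # pmpaddr holds address bits [..:2]; the run of trailing ones encodes the size.
    return (base >> 2) | ((size >> 3) - 1)


def napot_decode(pmpaddr):
    """Recover (base, size) from a NAPOT-encoded pmpaddr value."""
    ones = 0
    while (pmpaddr >> ones) & 1:
        ones += 1
    size = 1 << (ones + 3)
    base = (pmpaddr >> (ones + 1)) << (ones + 3)
    return base, size


# Hypothetical region: 4 KiB starting at 0x8000_0000.
value = napot_encode(0x8000_0000, 0x1000)
print(hex(value))           # 0x200001ff
print(napot_decode(value))  # (2147483648, 4096)
```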
Performance monitors | Program code performance tuning
StackSafe™ hardware stack protection
  • Easy identification of stack size threshold during development
  • Hardware error detection of stack overflow and underflow at runtime

Multiplier options

  • Fast multiplier: pipelined, 2-cycle
  • Small multipliers: producing 1, 2, 4, or 8 bits per cycle
Option to choose between speed and area according to application's requirements
PowerBrake technology | Performance throttling to digitally reduce power consumption
QuickNap™ technology | Fast power-down/wake-up support for caches

* BSP v5.0.0. DMIPS/MHz follows Dhrystone’s no-inline ground rules; figures are best-performance results.

Memory Subsystems

Key Features | Benefits

I-Cache & D-Cache

  • Size: 8KB to 64KB
  • Set associativity: 2-way or 4-way
  • Accelerating accesses to slow memories
  • Flexible cache configurations
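The size and associativity options above determine how an address is split into tag, set index, and line offset. The sketch below works through one configuration. The 64-byte line size is an assumption (the source does not give one), and the function names are illustrative.

```python
def cache_geometry(size_bytes, ways, line_bytes=64):
    """Derive set count and address field widths for a set-associative cache.

    line_bytes=64 is a hypothetical value; all inputs must be powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,
        "index_bits": sets.bit_length() - 1,
    }


def split_address(addr, geometry):
    """Split an address into (tag, set index, line offset)."""
    offset_bits = geometry["offset_bits"]
    index_bits = geometry["index_bits"]
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 32 KB, 4-way cache with the assumed 64-byte line.
geometry = cache_geometry(32 * 1024, ways=4)
print(geometry)  # {'sets': 128, 'offset_bits': 6, 'index_bits': 7}
print([hex(field) for field in split_address(0x80001234, geometry)])
# ['0x40000', '0x48', '0x34']
```

At a fixed size, going from 2-way to 4-way halves the number of sets, so one fewer address bit is used for the index and one more for the tag.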
MemBoost – Data Cache Write-Around | Smart cache line allocation policy, for better cache utilization and fewer memory accesses
MemBoost – Instruction and Data Pre-fetch | Conditionally fills instruction and data caches in advance, for minimum memory access latency
MemBoost – Multiple Outstanding Mem. Req. | Issues multiple transactions to the data memory subsystem for higher bus utilization; also supports out-of-order completion
MemBoost – Dedicated I & D Bus Interfaces | Separate instruction and data buses, each carrying its own memory transactions
Soft-error protection: ECC or parity for I-Cache and D-Cache | Code and data integrity protection
Bus master port: AXI bus with data width matching the configurable memory width | User-selectable configuration for optimal efficiency
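Why multiple outstanding requests raise bus utilization can be shown with a first-order model: sustained throughput is bounded by (requests in flight × bytes per request) / latency, capped by the bus width. Every number below is hypothetical and none is an NX27V figure. The function name is illustrative.

```python
def sustained_bytes_per_cycle(outstanding, request_bytes, latency_cycles, bus_bytes_per_cycle):
    """First-order throughput bound for a memory port.

    The latency bound follows Little's law: data in flight divided by latency.
    The result can never exceed what the bus transfers per cycle.
    """
    latency_bound = outstanding * request_bytes / latency_cycles
    return min(latency_bound, bus_bytes_per_cycle)


# Hypothetical system: 64-byte requests, 100-cycle memory latency,
# 128-bit (16-byte) data bus.
for outstanding in (1, 2, 4, 8, 32):
    rate = sustained_bytes_per_cycle(outstanding, 64, 100, 16)
    print(f"{outstanding:2d} outstanding -> {rate:.2f} bytes/cycle")
```

In this model a single outstanding request leaves the bus mostly idle, and throughput scales with the number of requests in flight until the bus width becomes the limit.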

Platform-Level Interrupt Controller (PLIC)

Key Features | Benefits

Implements RISC-V PLIC specification

  • Up to 1023 PLIC interrupt sources
  • Up to 255 PLIC interrupt priority levels
  • Up to 16 PLIC interrupt targets
Allows individual interrupts to be serviced and prioritized without sharing

Enhanced interrupt features

  • Vectored interrupt dispatch
  • Priority-based preemption
  • Selectable edge trigger or level trigger
  • Faster interrupt handling for real-time applications
  • Complete hardware preemption support for faster response
  • Flexible interrupt source interface for simpler SoC design
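The arbitration rule in the RISC-V PLIC specification can be modelled in a few lines: a target claims the pending, enabled source with the highest priority strictly above its threshold, ties go to the lowest source ID, and priority 0 never interrupts. The sketch below is a behavioural model with hypothetical names and values. It does not model the Andes vectored-dispatch or preemption enhancements.

```python
def plic_claim(priority, pending, enabled, threshold):
    """Return the source ID a target would claim, or 0 if nothing qualifies.

    priority: dict mapping source ID -> priority level (0 means "never interrupt")
    pending, enabled: sets of source IDs
    threshold: the target's priority threshold
    """
    best_id, best_priority = 0, threshold
    for source_id in sorted(pending & enabled):
        level = priority.get(source_id, 0)
        if level > best_priority:  # strict ">" keeps the lowest ID on ties
            best_id, best_priority = source_id, level
    return best_id


# Hypothetical sources: IDs 7 and 9 share the highest priority.
priority = {3: 2, 7: 5, 9: 5}
pending = {3, 7, 9}

print(plic_claim(priority, pending, enabled={3, 7, 9}, threshold=0))  # 7
print(plic_claim(priority, pending, enabled={3, 9}, threshold=0))     # 9
print(plic_claim(priority, pending, enabled={3, 7, 9}, threshold=5))  # 0
```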

Debug Support

Key Features | Benefits
Implements RISC-V debug specifications | Supported by industry debug tool suppliers
JTAG Debug Port | Industry-standard support
Embedded Debug Module with up to 8 triggers | Flexible configurations to trade off between gate count and debugging capabilities
Exception redirection support | Entering debugger upon selected exceptions without using breakpoints

Product Package

AndesCore™ NX27V Processor with AE350 Platform

  • Pre-integrated NX27V with CPU subsystem (including PLIC, Timer and Debug Module), and AXI platform