AndesCore™ D10

DSP Extension Processor

AndesCore™ D10 Overview

  • > 130 DSP extension instructions
  • Caches for fast code and data accesses
  • Local Memories for deterministic code and data accesses
  • IEEE754-compliant FPU coprocessor
  • Memory Protection Unit (MPU) for secure RTOS
  • Memory Management Unit (MMU) for Linux

The D1088 is a 5-stage pipeline integer processor with an integrated DSP that offers over 130 DSP SIMD (single instruction, multiple data) instructions. It targets the real-time processing requirements of power-constrained multimedia applications. On a 90nm low-power process, the D1088 delivers 588 DMIPS, 134 percent higher than competing offerings. Measured with the popular Whetstone floating-point benchmark, the D1088 achieves 92 percent better performance. Running the popular and comprehensive (over 200) DSP libraries, the D1088 is 116 percent faster with half the code size. Even with these advantages, the D1088 has a slightly smaller die area and lower power per MHz. Its optimized DSP libraries and C/C++ compiler make algorithm programming easier.

Tightly integrated integer and DSP processor architectures are not new, but most were designed for applications where power was less of a constraint than it is today. The D1088 was designed with this new reality in mind. It contains functionality to enhance efficiency and reduce code size. For example, to significantly boost computational performance in matrix, filtering, Fourier transform, and statistics functions, it can execute 4-way 8-bit or 2-way 16-bit SIMD instructions with the latency of a single instruction. In addition, for multimedia applications, the D1088 supports 64-bit add, subtract, and multiply mixed computation.
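As a minimal sketch of what "4-way 8-bit" and "2-way 16-bit" mean, the following portable C models a 32-bit register split into independent lanes, where a carry out of one lane never reaches its neighbor. The helper names `simd_add16x2` and `simd_add8x4` are hypothetical and are not Andes intrinsics; on the D1088 the equivalent work would be done by a DSP extension instruction rather than by this emulation.

```c
#include <stdint.h>

/* Hypothetical helpers: they model lane-wise (SIMD) addition in portable C.
   They are NOT Andes intrinsics; names and packing order are assumptions. */

/* 2-way 16-bit add: two 16-bit lanes packed in one 32-bit word.
   Each lane wraps on overflow independently of the other. */
uint32_t simd_add16x2(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0xFFFFu;                  /* lane 0, carry discarded */
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;  /* lane 1 */
    return (hi << 16) | lo;
}

/* 4-way 8-bit add: four 8-bit lanes packed in one 32-bit word. */
uint32_t simd_add8x4(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xFFu;
        uint32_t y = (b >> (8 * lane)) & 0xFFu;
        result |= ((x + y) & 0xFFu) << (8 * lane);    /* wrap inside the lane */
    }
    return result;
}
```

For example, `simd_add16x2(0x0001FFFF, 0x00010001)` yields `0x00020000`: the low lane wraps from 0xFFFF to 0x0000 without disturbing the high lane.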

For voice applications, the D1088 offers left shift, rounding right shift, most-significant-word 32×32 multiply, and specially designed 32-bit instructions that replace lengthy 64-bit computations. To reduce code size and increase efficiency, it provides a Zero Overhead Loop instruction that offloads loop branching. To enhance parallel computational capacity, it provides SIMD left and right shift, minimum, maximum, and absolute value instructions in addition to traditional SIMD instructions such as add, subtract, and multiply.
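To illustrate the kind of arithmetic these voice-oriented operations stand for, here is a hedged sketch in portable C of a most-significant-word 32×32 multiply and a rounding right shift. The function names are hypothetical, and the rounding rule shown (add half an LSB, then shift) is an assumption; the exact instruction semantics are defined by the Andes ISA documentation, not by this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling the arithmetic in portable C.
   They are not Andes intrinsics. */

/* Most-significant-word multiply: upper 32 bits of the 64-bit product.
   Relies on arithmetic right shift of negative values, which GCC and
   Clang provide (the C standard leaves it implementation-defined). */
int32_t mul_msw32(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * b;
    return (int32_t)(product >> 32);
}

/* Rounding right shift: add half of the discarded range before shifting.
   A 64-bit intermediate keeps the rounding addend from overflowing. */
int32_t round_shift_right(int32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);           /* assumed valid shift range */
    int64_t rounded = (int64_t)x + ((int64_t)1 << (n - 1));
    return (int32_t)(rounded >> n);
}
```

In C the multiply needs a 64-bit intermediate; a dedicated 32-bit instruction that returns only the upper word is what lets hardware avoid the longer 64-bit sequence described above.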

Applications

  • Video event data recorder (VEDR)
  • Wireless device
  • Networking device
  • Storage device
  • DSC (digital still camera)
  • DVC (digital video camera)
  • Digital home
  • Embedded controller

Block Diagram

Development Tools

  • AndeSight™ Integrated Development Environment
  • AICE JTAG/SDP debugger hardware

Key Features and Performance

AndeStar™ V3 Architecture

Key Features | Benefits
21st-century RISC-like instruction set | Better performance for modern compilers
16/32-bit mixable opcode format | Smaller code size
16 or 32 general-purpose registers | Trade-off between core size and performance requirements
All-C embedded programming | Faster SW development and easier maintenance
Shadow stack pointer | Efficiency and protection with a dedicated kernel stack pointer
Hardware divider | More performance
Aligned and unaligned load/store multiple word instructions and post-increment load/store memory accesses | Better program code size and performance
Direct support of up to 32 interrupts with programmable priority levels | Quick identification of interrupt sources and fast assignment of service routines
4G address space | Full range address space
Memory-mapped I/O | Easy to program and compiler-friendly

CPU Core

Key Features | Benefits
2.41 DMIPS/MHz*, 3.90 CoreMark/MHz* | Superior performance-per-MHz
5-stage pipeline | Superior performance-efficiency, while allowing for high speeds

DSP extension instructions

  • > 130 instructions
  • Zero-overhead loop
  • Saturation and Rounding
  • Fractional Q31/Q15/Q7 data types
  • Integer U32/U16/U8 data types
  • 16-bit and 8-bit SIMD instructions
  • 64-bit signed/unsigned addition & subtraction
  • GCC intrinsic functions to use in C
  • GCC vector data type for SIMD instructions
Benefit: Higher computational performance for matrix, filtering, Fourier transform, and statistics functions
Extensive branch prediction (BTB and RAS) | Better performance for branches
Hardware stack protection | Stack size determination and runtime overflow error detection
Processor state bus | Simplified SoC design and debugging
Performance monitors | Program code performance tuning

Memory Management Unit

  • 32/64/128-entry 4-way set-associative main TLB
  • Hardware page table walker
  • Supports two groups of page sizes (4KB & 1MB, 8KB & 1MB)

Benefits:

  • Virtual memory support for full address space and easy code/data sharing
  • Support for full-featured OS such as Linux
  • Protection of superuser and user privilege
  • Hardware for fast address translation

Memory Protection Unit

  • 8 memory protection regions

Benefit: Basic read/write/execute memory protection with minimum cost

Fast multipliers (1 cycle) | More performance
Extensive clock gating and logic gating | Lower power
N:1 core/bus clock ratios | Simplified SoC integration
Low-latency vectored interrupt | Faster context switch for real-time applications
Completion of most operations in 1 cycle; single-cycle capable for Local Memory and AHB bus accesses | Better performance-efficiency
PowerBrake technology | Peak power consumption reduction
Coprocessor interface | For Andes FPU and other customer-designed coprocessor units

* BSP v4.2.0; DMIPS/MHz measured without the no-inline option, best performance
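The DSP extension features listed above include saturation and fractional Q15 data types. As a rough illustration of what those terms mean, the sketch below implements a saturating Q15 add and multiply in portable C. The type and function names (`q15_t`, `q15_add_sat`, `q15_mul_sat`) are assumptions made for this example, not the Andes DSP library API, and the multiply truncates rather than rounds.

```c
#include <stdint.h>

/* Q15: a signed 16-bit value v represents v / 32768, range [-1.0, 1.0).
   Names below are hypothetical, not taken from the Andes DSP library. */
typedef int16_t q15_t;

/* Clamp a wider intermediate result into the Q15 range (saturation). */
q15_t q15_sat(int32_t v)
{
    if (v > 32767)  return 32767;
    if (v < -32768) return -32768;
    return (q15_t)v;
}

/* Saturating add: overflow clips to the nearest limit instead of wrapping. */
q15_t q15_add_sat(q15_t a, q15_t b)
{
    return q15_sat((int32_t)a + b);
}

/* Saturating multiply: the Q30 product is shifted back to Q15 (truncating).
   Assumes arithmetic right shift of negative values, as on GCC and Clang.
   Only (-1.0) * (-1.0) can overflow, and it clips to just under +1.0. */
q15_t q15_mul_sat(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * b;
    return q15_sat(product >> 15);
}
```

With these definitions, 0.5 × 0.5 (`q15_mul_sat(16384, 16384)`) gives 8192, which is 0.25 in Q15, while adding 30000 and 10000 clips to 32767 instead of wrapping to a negative value.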

Memory Subsystems

Key Features | Benefits

I & D Cache

  • Virtually Indexed and Physically Tagged (VIPT)
  • Size: 4KB to 64KB, line size: 16B/32B
  • Set associativity: direct-mapped/2-way

Benefits: Higher performance for large program size

  • Accelerating accesses to slow memories
  • Flexible cache configurations
  • VIPT for low power on context switch

Optional External Instruction and Data Local Memory

  • Size: 1KB to 4MB
  • ILM: program code, data and IO
  • DLM: program data

Benefits: Higher efficiency for program execution

  • Flexible size selection to fit diversified needs

BIU supports 32-bit AHB/2AHB/AHB-lite/APB/AXI | User-selectable bus interface for optimal efficiency

Debug Support

Key Features | Benefits
2-wire Serial Debug Port or 5-wire JTAG Debug Port | Low-cost 2-wire support and industry-standard 5-wire support

Embedded Debug Module (EDM)

  • Up to 8 breakpoints and watchpoints
  • Secured debug access to system address space

Benefits:

  • Flexible configurations to trade off gate count and debugging capabilities
  • Code and data protection by allowing only authorized debugging

Performance

Process | 90LP | 28HPM
Frequency (MHz) | 50 | 50
Dynamic power (uW/MHz) | 32.5 | 7.98
Area (mm²) | 0.16 | 0.029

* Base configuration, RVt library; power consumption at typical process corner, Vdd (90LP: 1.2V, 40LP: 1.1V, 28HPM: 0.9V), 25°C

Process | 40LP | 28HPM
Frequency (MHz) | 492 | 804
Dynamic power (uW/MHz) | 9.3 | 9.2
Area (mm²) | 0.11 | 0.06

* Base configuration, LVt library; frequency at slow process corner (40LP: 0.99V, 28HPM: 0.81V), 125°C, without I/O constraint; power consumption at typical process corner, Vdd (40LP: 1.1V, 28HPM: 0.9V), 25°C
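The per-MHz power figures can be turned into a rough absolute estimate by multiplying by the clock frequency. The helper below is a hypothetical convenience function, not part of any Andes tool. Note that the table quotes frequency at the slow corner and power at the typical corner, so the product is only an approximate upper-range estimate for the base configuration.

```c
/* Hypothetical helper: estimated dynamic power in mW from a uW/MHz figure
   and a clock frequency in MHz. Assumes power scales linearly with clock. */
double dynamic_power_mw(double uw_per_mhz, double clock_mhz)
{
    return uw_per_mhz * clock_mhz / 1000.0;   /* uW -> mW */
}
```

Using the LVt table values, `dynamic_power_mw(9.3, 492.0)` gives about 4.58 mW for 40LP and `dynamic_power_mw(9.2, 804.0)` gives about 7.40 mW for 28HPM.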