Multi-core processing
Max. CPU clock rate to 2.34 GHz
Min. feature size 14 nm
Instruction set A64, A32, T32
Microarchitecture Hurricane and Zephyr both ARMv8-A-
compatible
Cores 2× Hurricane + 2× Zephyr
Predecessor Apple A9, Apple A9X
GPU 6-core
Application Mobile
Review of multi-core
processing based on
By M-Reza Khalifeh Mahmoodi
What are a microprocessor and a microcontroller?
• A microprocessor is an IC that contains only the central processing unit (CPU).
• It has no on-chip RAM, ROM, or other peripherals.
• The Intel Pentium series, Core i3 and i5, and the ARM Cortex-A8 are popular microprocessors used in desktop PCs, laptops, notepads, and similar devices.
• To deploy a microprocessor, external peripherals must be connected to it.
• A microcontroller contains a CPU together with RAM, ROM, and other peripherals on the same chip.
• Its power consumption is far lower than that of a full computer.
• Popular microcontroller families and platforms include the 8051, STM32, PIC32, Arduino (a board platform built around microcontrollers), and Atmel parts.
Microprocessor
Microprocessors are designed for general-purpose applications such as PCs. They usually run an operating system (OS) that coordinates the various I/O devices. They need large amounts of external resources such as RAM, ROM, and I/O ports, which increases the total cost of the system.
Microcontroller
Microcontrollers are designed to perform specific tasks, for example in washing machines, refrigerators, microwave ovens, cars, bikes, and telephones. Because the application is limited to one or two processes, they need only small amounts of RAM, ROM, and I/O ports. This allows all of these components to be integrated on a single chip, which reduces size and cost.
Attribute | Microcontrollers | Microprocessors
Application | Application-specific; designed to perform a limited set of tasks. | General-purpose; capable of executing large and complicated tasks.
Single-chip solution | Have a built-in processor, RAM, ROM, and I/O ports, like a small stand-alone computer on a single integrated chip. | Generally have no built-in RAM, ROM, or I/O ports. The pins are used to interface with external RAM, ROM, and ports.
Performance | Limited performance. | Very high performance.
Speed | Generally operate at 8 MHz to 200 MHz. | Generally operate above 1 GHz.
Power consumption | Embedded inside other devices, so designed to consume little power. | Consume relatively more power, because performance is given more weight than power.
Cost | Affordable and cheap. Can get started with a minimal external circuit. | Very expensive, and require other peripherals to work.
Altera
Analog Devices
Atmel
Cypress Semiconductor
Maxim Integrated
ELAN Microelectronics Corp.
EPSON Semiconductor
Freescale Semiconductor
Fujitsu
Holtek
Hyperstone
Infineon
Intel
Lattice Semiconductor
Microchip Technology
National Semiconductor
NEC
NXP Semiconductors
Panasonic
Rabbit Semiconductor
Renesas Electronics
Rockwell
Silicon Laboratories
Silicon Motion
Sony
Spansion
STMicroelectronics
Texas Instruments
Toshiba
Ubicom
Xemics
Xilinx
XMOS
ZiLOG
The list above gives common microcontroller manufacturers, by brand.
ARM Holdings
ARM is the industry's leading supplier of microprocessor technology, offering the widest range of microprocessor cores to address the performance, power, and cost requirements of almost all application markets. Combining a vibrant ecosystem of over 1,000 partners delivering silicon, development tools, and software with more than 86 billion processors sold, ARM truly is “The Architecture for the Digital World”.
ARM, originally Acorn RISC Machine and later Advanced RISC Machine, is a family of reduced instruction set computing (RISC) architectures for computer processors, configured for various environments. The British company ARM Holdings develops the architecture and licenses it to other companies.
In 2009, some manufacturers introduced netbooks based on ARM architecture CPUs, in direct competition with netbooks based on Intel Atom. The analyst firm IHS iSuppli forecast that by 2015 ARM integrated circuits might be in 23% of all laptops.
A RISC-based design approach means that processors require fewer transistors than the typical complex instruction set computing (CISC) x86 processors found in most personal computers. This approach reduces cost, heat, and power use. These characteristics are desirable for light, portable, battery-powered devices, including smartphones, laptops, tablet computers, and other embedded systems. For supercomputers, which consume large amounts of electricity, ARM could also be a power-efficient solution.
This is a list of microarchitectures based on the ARM family
ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
ARM1 | ARMv1 | ARM1 | First implementation | None | -
ARM2 | ARMv2 | ARM2 | ARMv2 added the MUL (multiply) instruction | None | 4 MIPS @ 8 MHz, 0.33 DMIPS/MHz
ARM2 | ARMv2a | ARM250 | Integrated MEMC (MMU), graphics and I/O processor. ARMv2a added the SWP and SWPB (swap) instructions | None, MEMC1a | 7 MIPS @ 12 MHz
ARM3 | ARMv2a | ARM3 | First integrated memory cache | 4 KB unified | 12 MIPS @ 25 MHz, 0.50 DMIPS/MHz
ARM6 | ARMv3 | ARM60 | ARMv3 first to support 32-bit memory address space (previously 26-bit). ARMv3M first added long multiply instructions (32x32=64). | None | 10 MIPS @ 12 MHz
ARM6 | ARMv3 | ARM600 | As ARM60, cache and coprocessor bus (for FPA10 floating-point unit) | 4 KB unified | 28 MIPS @ 33 MHz
ARM6 | ARMv3 | ARM610 | As ARM60, cache, no coprocessor bus | 4 KB unified | 17 MIPS @ 20 MHz, 0.65 DMIPS/MHz
ARM7 | ARMv3 | ARM700 | - | 8 KB unified | 40 MHz
ARM7 | ARMv3 | ARM710 | As ARM700, no coprocessor bus | 8 KB unified | 40 MHz
ARM7 | ARMv3 | ARM710a | As ARM710 | 8 KB unified | 40 MHz, 0.68 DMIPS/MHz
ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
ARM7T | ARMv4T | ARM7TDMI(-S) | 3-stage pipeline, Thumb, ARMv4 first to drop legacy ARM 26-bit addressing | None | 15 MIPS @ 16.8 MHz, 63 DMIPS @ 70 MHz
ARM7T | ARMv4T | ARM710T | As ARM7TDMI, cache | 8 KB unified, MMU | 36 MIPS @ 40 MHz
ARM7T | ARMv4T | ARM720T | As ARM7TDMI, cache | 8 KB unified, MMU with FCSE (Fast Context Switch Extension) | 60 MIPS @ 59.8 MHz
ARM7T | ARMv4T | ARM740T | As ARM7TDMI, cache | MPU | -
ARM7EJ | ARMv5TEJ | ARM7EJ-S | 5-stage pipeline, Thumb, Jazelle DBX, Enhanced DSP instructions | None | -
ARM8 | ARMv4 | ARM810[4][5] | 5-stage pipeline, static branch prediction, double-bandwidth memory | 8 KB unified, MMU | 84 MIPS @ 72 MHz, 1.16 DMIPS/MHz
ARM9T | ARMv4T | ARM9TDMI | 5-stage pipeline, Thumb | None | -
ARM9T | ARMv4T | ARM920T | As ARM9TDMI, cache | 16 KB / 16 KB, MMU with FCSE (Fast Context Switch Extension)[6] | 200 MIPS @ 180 MHz
ARM9T | ARMv4T | ARM922T | As ARM9TDMI, caches | 8 KB / 8 KB, MMU | -
ARM9T | ARMv4T | ARM940T | As ARM9TDMI, caches | 4 KB / 4 KB, MPU | -
ARM9E | ARMv5TE | ARM946E-S | Thumb, Enhanced DSP instructions, caches | Variable, tightly coupled memories, MPU | -
ARM9E | ARMv5TE | ARM966E-S | Thumb, Enhanced DSP instructions | No cache, TCMs | -
ARM9E | ARMv5TE | ARM968E-S | As ARM966E-S | No cache, TCMs | -
ARM9E | ARMv5TEJ | ARM926EJ-S | Thumb, Jazelle DBX, Enhanced DSP instructions | Variable, TCMs, MMU | 220 MIPS @ 200 MHz
ARM9E | ARMv5TE | ARM996HS | Clockless processor, as ARM966E-S | No caches, TCMs, MPU | -
ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
ARM10E | ARMv5TE | ARM1022E | As ARM1020E | 16 KB / 16 KB, MMU | -
ARM10E | ARMv5TEJ | ARM1026EJ-S | Thumb, Jazelle DBX, Enhanced DSP instructions, (VFP) | Variable, MMU or MPU | -
ARM11 | ARMv6 | ARM1136J(F)-S[7] | 8-stage pipeline, SIMD, Thumb, Jazelle DBX, (VFP), Enhanced DSP instructions | Variable, MMU | 740 @ 532–665 MHz (i.MX31 SoC), 400–528 MHz
ARM11 | ARMv6T2 | ARM1156T2(F)-S | 9-stage pipeline,[8] SIMD, Thumb-2, (VFP), Enhanced DSP instructions | Variable, MPU | -
ARM11 | ARMv6Z | ARM1176JZ(F)-S | As ARM1136EJ(F)-S | Variable, MMU + TrustZone | 965 DMIPS @ 772 MHz, up to 2,600 DMIPS with four processors[9]
ARM11 | ARMv6K | ARM11MPCore | As ARM1136EJ(F)-S, 1–4 core SMP | Variable, MMU | -
SecurCore | ARMv6-M | SC000 | - | - | 0.9 DMIPS/MHz
SecurCore | ARMv4T | SC100 | - | - | -
SecurCore | ARMv7-M | SC300 | - | - | 1.25 DMIPS/MHz
Cortex-M | ARMv6-M | Cortex-M0[10] | Microcontroller profile, most Thumb + some Thumb-2,[11] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, no MPU | 0.84 DMIPS/MHz
Cortex-M | ARMv6-M | Cortex-M0+[12] | Microcontroller profile, most Thumb + some Thumb-2,[11] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 0.93 DMIPS/MHz
Cortex-M | ARMv6-M | Cortex-M1[13] | Microcontroller profile, most Thumb + some Thumb-2,[11] hardware multiply instruction (optional small), OS option adds SVC / banked stack pointer, optional system timer, no bit-banding memory | Optional cache, 0-1024 KB I-TCM, 0-1024 KB D-TCM, no MPU | 136 DMIPS @ 170 MHz,[14] (0.8 DMIPS/MHz FPGA-dependent)[15]
Cortex-M | ARMv7-M | Cortex-M3[16] | Microcontroller profile, Thumb / Thumb-2, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz
Cortex-M | ARMv7E-M | Cortex-M4[17] | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv4-SP single-precision FPU, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz (1.27 w/FPU)
Cortex-M | ARMv7E-M | Cortex-M7[18] | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv5 single and double precision FPU, hardware multiply and divide instructions | 0-64 KB I-cache, 0-64 KB D-cache, 0-16 MB I-TCM, 0-16 MB D-TCM (all these w/optional ECC), optional MPU with 8 or 16 regions | 2.14 DMIPS/MHz
ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
Cortex-R | ARMv7-R | Cortex-R4[19] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lockstep with fault logic | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt MPU with 8/12 regions | -
Cortex-R | ARMv7-R | Cortex-R5[20] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lock-step with fault logic / optional as 2 independent cores, low-latency peripheral port (LLPP), accelerator coherency port (ACP)[21] | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt MPU with 12/16 regions | -
Cortex-R | ARMv7-R | Cortex-R7[22] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 11-stage pipeline dual-core running lock-step with fault logic / out-of-order execution / dynamic register renaming / optional as 2 independent cores, low-latency peripheral port (LLPP), ACP[21] | 0–64 KB / 0–64 KB, ? of 0–128 KB TCM, opt MPU with 16 regions | -
Cortex-R | ARMv7-R | Cortex-R8[23] | TBD | TBD | -
ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
Cortex-A (32-bit) | ARMv7-A | Cortex-A5[24] | Application profile, ARM / Thumb / Thumb-2 / DSP / SIMD / Optional VFPv4-D16 FPU / Optional NEON / Jazelle RCT and DBX, 1–4 cores / optional MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 4-64 KB / 4-64 KB L1, MMU + TrustZone | 1.57 DMIPS/MHz per core
Cortex-A (32-bit) | ARMv7-A | Cortex-A7[25] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4-D16 FPU / NEON / Jazelle RCT and DBX / Hardware virtualization, in-order execution, superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP, architecture and feature set are identical to A15, 8-10 stage pipeline, low-power design[26] | 8-64 KB / 8-64 KB L1, 0–1 MB L2, MMU + TrustZone | 1.9 DMIPS/MHz per core
Cortex-A (32-bit) | ARMv7-A | Cortex-A8[27] | Application profile, ARM / Thumb / Thumb-2 / VFPv3 FPU / NEON / Jazelle RCT and DAC, 13-stage superscalar pipeline | 16-32 KB / 16–32 KB L1, 0–1 MB L2 opt ECC, MMU + TrustZone | Up to 2000 (2.0 DMIPS/MHz in speed from 600 MHz to greater than 1 GHz)
Cortex-A (32-bit) | ARMv7-A | Cortex-A9[28] | Application profile, ARM / Thumb / Thumb-2 / DSP / Optional VFPv3 FPU / Optional NEON / Jazelle RCT and DBX, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 16–64 KB / 16–64 KB L1, 0–8 MB L2 opt parity, MMU + TrustZone | 2.5 DMIPS/MHz per core, 10,000 DMIPS @ 2 GHz on Performance Optimized TSMC 40G (dual-core)
Cortex-A (32-bit) | ARMv7-A | Cortex-A12[29] | Application profile, ARM / Thumb-2 / DSP / VFPv4 FPU / NEON / Hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 32-64 KB | 3.0 DMIPS/MHz per core
Cortex-A (32-bit) | ARMv7-A | Cortex-A15[30] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP, 15-24 stage pipeline[26] | 32 KB w/parity / 32 KB w/ECC L1, 0–4 MB L2, L2 has ECC, MMU + TrustZone | At least 3.5 DMIPS/MHz per core (up to 4.01 DMIPS/MHz depending on implementation)[31]
Cortex-A (32-bit) | ARMv7-A | Cortex-A17[32] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP | 32 KB L1, 256 KB-8 MB L2 w/optional ECC | 2.8 DMIPS/MHz
Cortex-A (32-bit) | ARMv8-A | Cortex-A32[33] | Application profile, AArch32, 1-4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline | 8-64 KB w/optional parity / 8-64 KB w/optional ECC L1 per core, 128 KB-1 MB L2 w/optional ECC shared | -
ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
Cortex-A (64-bit) | ARMv8-A | Cortex-A35[34] | Application profile, AArch32 and AArch64, 1-4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline | 8-64 KB w/parity / 8-64 KB w/ECC L1 per core, 128 KB-1 MB L2 shared, 40-bit physical addresses | 1.78 DMIPS/MHz
Cortex-A (64-bit) | ARMv8-A | Cortex-A53[35] | Application profile, AArch32 and AArch64, 1-4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline | 8-64 KB w/parity / 8-64 KB w/ECC L1 per core, 128 KB-2 MB L2 shared, 40-bit physical addresses | 2.3 DMIPS/MHz
Cortex-A (64-bit) | ARMv8-A | Cortex-A57[36] | Application profile, AArch32 and AArch64, 1-4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, multi-issue, deeply out-of-order pipeline | 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB-2 MB L2 shared w/ECC; 44-bit physical addresses | 4.6 DMIPS/MHz
Cortex-A (64-bit) | ARMv8-A | Cortex-A72[37] | Application profile, AArch32 and AArch64, 1-4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, multi-issue, deeply out-of-order pipeline | 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB-2 MB L2 shared w/ECC; 44-bit physical addresses | 4.8 DMIPS/MHz
Cortex-A (64-bit) | ARMv8-A | Cortex-A73[38] | Application profile, AArch32 and AArch64, 1-4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, multi-issue, deeply out-of-order pipeline | 64 KB / 32-64 KB L1 per core, 256 KB-8 MB L2 shared w/ optional ECC, 44-bit physical addresses | 4.9 DMIPS/MHz
Why should we use multi-core processing instead of single-core?
Mobile devices perform a wide variety of tasks such as web browsing, video playback, mobile gaming, SMS text messaging, and location-based services. Due to the growth in the availability of high-speed mobile and Wi-Fi networks, mobile devices will also be used for various performance-intensive tasks that were previously handled by traditional PCs. The next generation of smartphones (called “super phones”) and tablets will be used for a wide variety of tasks such as playback of high-definition 1080p videos, Adobe® Flash®-based online gaming, Flash-based streaming of high-definition videos, visually rich gaming, video editing, simultaneous HD video downloads, encoding and uploads, and real-time HD video conferencing.
• It is difficult to push single-core clock frequencies even higher
• Dynamic power grows with clock frequency: P = C × V² × F (switched capacitance × supply voltage squared × clock frequency)
• Moore's law
• Deeply pipelined circuits bring:
  • heat problems
  • speed-of-light problems
  • difficult design and verification
  • the need for large design teams
  • server farms that need expensive air-conditioning
• Many new applications are multithreaded
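The power relation on this slide can be made concrete with a little arithmetic. The sketch below assumes the usual dynamic-power reading of P = C × V² × F and uses made-up operating points (capacitance, voltages, and frequencies are illustrative assumptions, not values from any datasheet). It compares one fast core against two slower cores running at a lower supply voltage.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic switching power P = C * V^2 * F (watts for farads, volts, hertz)."""
    return capacitance * voltage ** 2 * frequency

# Hypothetical operating points, chosen only to illustrate the trade-off.
C = 1e-9                                        # switched capacitance per core
single_core = dynamic_power(C, 1.2, 2.0e9)      # one core at 2 GHz, 1.2 V
dual_core = 2 * dynamic_power(C, 0.9, 1.0e9)    # two cores at 1 GHz, 0.9 V each

ratio = dual_core / single_core
print(f"dual-core power / single-core power = {ratio:.2f}")
```

Under these assumed numbers the two slower cores offer the same aggregate clock rate for about 0.56 of the power, because the voltage term is squared.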
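Parallel computing
• Large problems can often be divided into smaller ones that are solved at the same time
• There are several different forms of parallel computing:
  • Bit-level parallelism: increasing the processor word size
  • Instruction-level parallelism, in hardware and software. For example, in e = a + b; f = c + d; m = e * f, the first two instructions are independent and can execute in parallel, while the third depends on both
  • Task parallelism
  • Data parallelism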
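The instruction-level example on this slide can be sketched as a small dependency check. The function below is a hypothetical illustration, not a model of any real processor: it only tracks read-after-write dependencies and ignores other hazards, and the names `schedule` and `program` are assumptions introduced here.

```python
def schedule(instructions):
    """Group instructions into steps; instructions in one step do not depend on each other."""
    ready_at = {}   # result name -> first step at which that result can be read
    steps = []
    for dest, operands in instructions:
        # An instruction can issue once all of its operands are available.
        step = max((ready_at.get(op, 0) for op in operands), default=0)
        while len(steps) <= step:
            steps.append([])
        steps[step].append(dest)
        ready_at[dest] = step + 1
    return steps

# e = a + b; f = c + d; m = e * f
program = [("e", ("a", "b")), ("f", ("c", "d")), ("m", ("e", "f"))]
steps = schedule(program)
print(steps)  # [['e', 'f'], ['m']]
```

The first step holds e and f, which can run side by side; m has to wait for both results.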
Amdahl's law
• Ideally, the speedup from parallelization would be linear: doubling the number of processing elements would halve the runtime
• In practice, the serial part of the task limits the speedup, as Amdahl's law states:
  $$S_{\text{latency}}(s) = \frac{1}{(1 - P) + \frac{P}{s}}$$
• $S_{\text{latency}}$ is the potential speedup in latency of the whole task
• $s$ is the speedup in latency of the execution of the parallelizable part of the task
• $P$ is the fraction of the execution time of the whole task that the parallelizable part took before parallelization
• Speedup in a serial program: for example, consider a serial program in two parts A and B for which $T_A = 3$ s and $T_B = 1$ s.
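Amdahl's law is easy to check numerically. The sketch below applies it to the two-part program from the slide (T_A = 3 s, T_B = 1 s). The speedup factors of 5 for part B and 2 for part A are assumptions chosen here for illustration, since the slide does not give them.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is sped up by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)

T_A, T_B = 3.0, 1.0                 # part runtimes in seconds, from the slide
total = T_A + T_B

# Assumed scenarios: make B five times faster, or make A twice as fast.
speedup_b = amdahl_speedup(T_B / total, 5)
speedup_a = amdahl_speedup(T_A / total, 2)

print(f"B x5: {speedup_b:.2f}, A x2: {speedup_a:.2f}")
```

Under these assumptions, doubling the speed of the larger part A helps more (1.60) than making the smaller part B five times faster (1.25).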
Flynn's taxonomy
Michael J. Flynn, professor emeritus at Stanford University, proposed Flynn's taxonomy, a method of classifying digital computers, in 1966.
• Classifications
  • Single instruction stream, single data stream (SISD)
  • Single instruction stream, multiple data streams (SIMD)
  • Multiple instruction streams, single data stream (MISD)
  • Multiple instruction streams, multiple data streams (MIMD)
• Further divisions
  • Single program, multiple data streams (SPMD)
  • Multiple programs, multiple data streams (MPMD)
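The four main classes follow directly from two counts: how many instruction streams and how many data streams a machine handles at once. A minimal sketch of that mapping follows; the function name `flynn_class` is a hypothetical helper introduced here.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class (SISD, SIMD, MISD, or MIMD) for the given stream counts."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"

print(flynn_class(1, 1))   # SISD
print(flynn_class(1, 8))   # SIMD
print(flynn_class(4, 4))   # MIMD
```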
Running AMP, SMP, or BMP mode for multicore systems
Developers must also choose the appropriate form of multiprocessing for their application requirements. This choice will determine how easily both new and existing code can achieve maximum concurrency. As Table 1 illustrates, developers have three basic forms to choose from:
• Asymmetric multiprocessing (AMP)
• Symmetric multiprocessing (SMP)
• Bound multiprocessing (BMP)
Table 1
Model | How it works | Key advantages
Asymmetric multiprocessing (AMP) | A separate OS, or a separate copy of the same OS, manages each core. Typically, each software process is locked to a single core (e.g. process A runs only on core 1, process B runs only on core 2, etc.). | Provides an execution environment similar to that of uniprocessor systems, allowing simple migration of legacy code. Also allows developers to manage each core independently.
Symmetric multiprocessing (SMP) | A single OS manages all processor cores simultaneously. The OS can dynamically schedule any process on any core, enabling full utilization of all cores. | Provides greater scalability and parallelism than AMP, along with simpler shared resource management.
Bound multiprocessing (BMP) | A single OS manages all cores simultaneously. As in SMP, the OS can dynamically schedule processes on any core. However, the developer can also lock any process (and all of its associated threads) to a specific core. | Combines the developer control of AMP with the transparent resource management of SMP. The option to lock threads to any core simplifies migration of legacy code and allows designers to dedicate cores
CPU utilization in AMP mode
• Each process and all of its threads are locked to a single processor core
Symmetric multiprocessing (SMP) mode
• SMP addresses many of these issues by running only one copy of an OS across all of the chip's cores. Because the OS has insight into all system elements at all times, it can allocate resources on multiple cores with little or no input from the application designer.
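Locking a process to one core while a single OS still manages all cores, as in the BMP model described earlier, can be sketched on Linux with the standard `os.sched_setaffinity` call. This is only an illustration under stated assumptions: the affinity calls are Linux-specific, the helper name `pin_to_core` is hypothetical, and an embedded RTOS would expose its own API for this.

```python
import os

def pin_to_core(core: int) -> bool:
    """Try to restrict the calling thread to a single core; return True on success."""
    if not hasattr(os, "sched_setaffinity"):
        return False                      # this platform has no affinity API in os
    allowed = os.sched_getaffinity(0)     # cores the scheduler may currently use
    if core not in allowed:
        return False
    # pid 0 means the caller; threads and child processes created later inherit the mask.
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0) == {core}

if hasattr(os, "sched_getaffinity"):
    print("allowed cores before pinning:", sorted(os.sched_getaffinity(0)))
```

Without such a call the OS scheduler is free to move the process between cores, which is the default SMP behaviour.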
An introduction to NXP
NXP Semiconductors N.V. is a Dutch global semiconductor manufacturer headquartered in Eindhoven, Netherlands. The company employs approximately 45,000 people in more than 35 countries, including 11,200 engineers in 23 countries. NXP reported revenue of $6.1 billion in 2015, including one month of revenue contribution from the recently merged Freescale Semiconductor.
NXP said it was the fifth-largest non-memory semiconductor supplier in 2016.
NXP is the co-inventor of near field communication (NFC) technology.
NXP manufactures automotive chips for in-vehicle networking.
NXP invented the I²C interface over 30 years ago.
An example of a single-core ARM architecture
NXP offers microcontrollers based on the ARM Cortex-M3. The best technical example of this family is the LPC178x/7x series, and more specifically the LPC1788 microcontroller.
Cortex-M3: ARMv7-M, microcontroller profile, Thumb / Thumb-2, hardware multiply and divide instructions, optional bit-banding memory.
LPC178x/7x: 32-bit ARM Cortex-M3 microcontroller; up to 512 kB flash and 96 kB SRAM; USB Device/Host/OTG; Ethernet; LCD; EMC.
The LPC178x/7x adds a specialized flash memory accelerator to achieve optimal performance when executing code from flash. It operates at a CPU frequency of up to 120 MHz.
[Figure: an example of a single-core ARM architecture]
Multiprocessor system designs come with a number of challenges
• They need low-power capability
• They require higher total memory-system and bus bandwidth
• They need careful bus architecture planning
Challenges for low power capability
Processor | Transistor count | Year introduced | Designer | Process | Die area
TMS 1000 | 8,000 | 1974 | Texas Instruments | 8,000 nm | -
Intel 4004 | 2,300 | 1971 | Intel | 10,000 nm | 12 mm²
Intel 8008 | 3,500 | 1972 | Intel | 10,000 nm | 14 mm²
Intel 8086 | 29,000 | 1978 | Intel | 3,000 nm | 33 mm²
Intel 8088 | 29,000 | 1979 | Intel | 3,000 nm | 33 mm²
Intel 80386 | 275,000 | 1985 | Intel | 1,500 nm | 104 mm²
ARM 1 | 25,000[7] | 1985 | Acorn | 3,000 nm | 50 mm²
Intel 80486 | 1,180,235 | 1989 | Intel | 1,000 nm | 173 mm²
ARM 3 | 300,000 | 1989 | Acorn | - | -
ARM 6 | 35,000 | 1991 | ARM | - | -
Pentium | 3,100,000 | 1993 | Intel | 800 nm | 294 mm²
ARM 9TDMI | 111,000 | 1999 | Acorn | 350 nm | 4.8 mm²
Pentium Pro | 5,500,000 | 1995 | Intel | 500 nm | 307 mm²
Core 2 Duo Conroe | 291,000,000 | 2006 | Intel | 65 nm | 143 mm²
Core 2 Duo Allendale | 169,000,000 | 2007 | Intel | 65 nm | 111 mm²
Itanium 2 Madison 6M | 410,000,000 | 2003 | Intel | 130 nm | 374 mm²
ARM Cortex-A9 | 26,000,000 | 2007 | ARM | 45 nm | 31 mm²
Core 2 Duo Wolfdale 3M | 230,000,000 | 2008 | Intel | 45 nm | 83 mm²
Itanium 2 with 9 MB cache | 592,000,000 | 2004 | Intel | 130 nm | 432 mm²
Core i7 (Quad) | 731,000,000 | 2008 | Intel | 45 nm | 263 mm²
Apple A7 (dual-core ARM64 "mobile SoC") | 1,000,000,000 | 2013 | Apple | 28 nm | 102 mm²
22-core Xeon Broadwell-E5 | 7,200,000,000 | 2016 | Intel | 14 nm | 456 mm²
SPARC M7 | 10,000,000,000 | 2015 | Oracle | 20 nm | -
GP100 Pascal | 15,300,000,000 | 2016 | Nvidia | 16 nm | 610 mm²
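The transistor counts above give a rough feel for the growth rate behind Moore's law. The sketch below estimates the average doubling period between two rows of the table, the Intel 4004 (2,300 transistors, 1971) and the 22-core Xeon Broadwell-E5 (7,200,000,000 transistors, 2016). Picking these two endpoints is an arbitrary choice made here, and the helper name is hypothetical.

```python
import math

def doubling_period_years(count_start, year_start, count_end, year_end):
    """Average number of years per doubling of transistor count between two chips."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings

# Endpoints taken from the table: Intel 4004 and 22-core Xeon Broadwell-E5.
period = doubling_period_years(2_300, 1971, 7_200_000_000, 2016)
print(f"average doubling period: {period:.2f} years")
```

For this pair of chips the estimate comes out at roughly two years per doubling.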
Challenges for low power capability
• Simplest approach: a core standby mode algorithm
Challenges for bus bandwidth
• A simple dual-core system with ROM sharing reaches 78% of ideal performance in a Dhrystone 2.1 Verilog simulation
• Using a 64-bit flash and a simple fetch buffer raises this to 92%
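The percentages on this slide translate into effective speedups with one multiplication. The sketch below assumes that "ideal performance" means twice the throughput of a single core for a dual-core system; that reading is an assumption, since the slide does not define the baseline.

```python
def effective_speedup(cores, fraction_of_ideal):
    """Speedup over one core when a system reaches a given fraction of linear scaling."""
    return cores * fraction_of_ideal

shared_rom = effective_speedup(2, 0.78)   # dual core sharing one ROM
wide_flash = effective_speedup(2, 0.92)   # with 64-bit flash and a fetch buffer

print(f"shared ROM: {shared_rom:.2f}x, 64-bit flash + fetch buffer: {wide_flash:.2f}x")
```

Under that assumption, the wider flash interface moves the dual-core system from about 1.56 to about 1.84 times the single-core throughput.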
LPC microcontrollers based on the ARM Cortex-M4 core (single- and multi-core): LPC43XX series
• Cortex-M4 or M4F cores at up to 204 MHz
• Highest performance, best power efficiency
• DSP options, multiple high-speed connectivity options, advanced peripherals
• Cortex-M0+ or M0 coprocessors
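The DMIPS/MHz figures from the ARM core tables can be combined with the clock rates quoted for the NXP parts to estimate peak Dhrystone throughput. The sketch below uses 1.25 DMIPS/MHz for both the Cortex-M3 and the Cortex-M4 (without FPU), 120 MHz for the LPC178x/7x, and 204 MHz for the LPC43XX, all taken from this deck. Real results depend on flash wait states and compiler settings, so treat the output as an upper-bound estimate.

```python
def peak_dmips(dmips_per_mhz, clock_mhz):
    """Estimated Dhrystone MIPS at a given clock, assuming no memory stalls."""
    return dmips_per_mhz * clock_mhz

lpc1788_dmips = peak_dmips(1.25, 120)   # Cortex-M3 at 120 MHz
lpc43xx_dmips = peak_dmips(1.25, 204)   # Cortex-M4 at 204 MHz

print(f"LPC178x/7x: {lpc1788_dmips:.0f} DMIPS, LPC43XX (M4 core): {lpc43xx_dmips:.0f} DMIPS")
```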
LPC4350:
32-bit ARM Cortex-M4/M0 MCU; up to 264 kB SRAM; Ethernet; two high-speed USBs; advanced configurable peripherals
i.MX 8 family processors based on ARM® Cortex-A53, Cortex-A72, and Cortex-M4 cores
Multi-screen (4x) support
Fast multi-OS platform deployment via advanced full-chip hardware virtualization and domain protection
Deploy rich, fully independent graphics content across 4x HD screens or 1x 4K screen
Android™*, Linux®*, FreeRTOS, QNX™*, Green Hills®, Dornerworks* XEN™*
Automotive AEC-Q100 Grade 3 (-40° to 125° C Tj), Industrial (-40° to 105° C Tj), Consumer (-20° to 105° C Tj)
Fully supported on NXP’s 10 and 15-year Longevity Program
References:
http://theembeddedguy.com/2016/05/09/microcontroller-vs-microprocessor/
https://en.wikipedia.org/wiki/List_of_common_microcontrollers
https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures
https://www.eeweb.com/blog/arm/multi-core-mcu-design-with-arm-cortex-m-processors-and-coresight-soc
http://www.nxp.com/products/microcontrollers-and-processors/arm-processors/lpc-cortex-m-mcus:LPC-ARM-CORTEX-M-MCUS
http://www.nxp.com/
http://www.intel.eu/
http://www.st.com/
http://www.ti.com/

Contenu connexe

Tendances

Microcontroller overview 1
Microcontroller overview 1Microcontroller overview 1
Microcontroller overview 1Sally Salem
 
Presentation on driverless cars by shahin hussan
Presentation on driverless cars by shahin hussan Presentation on driverless cars by shahin hussan
Presentation on driverless cars by shahin hussan Shahinhussan
 
Quality attributes(Non operational) of embedded systems
Quality attributes(Non operational) of embedded systemsQuality attributes(Non operational) of embedded systems
Quality attributes(Non operational) of embedded systemsShreyaBhoje
 
Artificial intelligence(04)
Artificial intelligence(04)Artificial intelligence(04)
Artificial intelligence(04)Nazir Ahmed
 
Expert system neural fuzzy system
Expert system neural fuzzy systemExpert system neural fuzzy system
Expert system neural fuzzy systemJamil S. Alagha
 
The Deep Learning Compiler
The Deep Learning CompilerThe Deep Learning Compiler
The Deep Learning CompilerTae Young Lee
 
Automotive electronics Systems by Ravikumar Chilmula
Automotive electronics Systems by Ravikumar ChilmulaAutomotive electronics Systems by Ravikumar Chilmula
Automotive electronics Systems by Ravikumar ChilmulaRavikumar Chilmula
 
Intelligent parking assist system
Intelligent parking assist systemIntelligent parking assist system
Intelligent parking assist systemSARATH PRAKASH
 
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchPPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchJisang Yoon
 
Introduction to embedded systems
Introduction to embedded systemsIntroduction to embedded systems
Introduction to embedded systemsApurva Zope
 
Driving behavior for ADAS and Autonomous Driving
Driving behavior for ADAS and Autonomous DrivingDriving behavior for ADAS and Autonomous Driving
Driving behavior for ADAS and Autonomous DrivingYu Huang
 

Tendances (20)

Embedded system.ppt
Embedded system.pptEmbedded system.ppt
Embedded system.ppt
 
Advanced driver assistance systems
Advanced driver assistance systemsAdvanced driver assistance systems
Advanced driver assistance systems
 
Microcontroller overview 1
Microcontroller overview 1Microcontroller overview 1
Microcontroller overview 1
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Presentation on driverless cars by shahin hussan
Presentation on driverless cars by shahin hussan Presentation on driverless cars by shahin hussan
Presentation on driverless cars by shahin hussan
 
Embedded system
Embedded systemEmbedded system
Embedded system
 
Embedded system
Embedded systemEmbedded system
Embedded system
 
Quality attributes(Non operational) of embedded systems
Quality attributes(Non operational) of embedded systemsQuality attributes(Non operational) of embedded systems
Quality attributes(Non operational) of embedded systems
 
Microcontrollers
MicrocontrollersMicrocontrollers
Microcontrollers
 
Artificial intelligence(04)
Artificial intelligence(04)Artificial intelligence(04)
Artificial intelligence(04)
 
Expert system neural fuzzy system
Expert system neural fuzzy systemExpert system neural fuzzy system
Expert system neural fuzzy system
 
The Deep Learning Compiler
The Deep Learning CompilerThe Deep Learning Compiler
The Deep Learning Compiler
 
Adaptive cruise control acc
Adaptive cruise control accAdaptive cruise control acc
Adaptive cruise control acc
 
Smart parking
Smart parkingSmart parking
Smart parking
 
Automotive electronics Systems by Ravikumar Chilmula
Automotive electronics Systems by Ravikumar ChilmulaAutomotive electronics Systems by Ravikumar Chilmula
Automotive electronics Systems by Ravikumar Chilmula
 
Intelligent parking assist system
Intelligent parking assist systemIntelligent parking assist system
Intelligent parking assist system
 
Embedded systems basics
Embedded systems basicsEmbedded systems basics
Embedded systems basics
 
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchPPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
 
Introduction to embedded systems
Introduction to embedded systemsIntroduction to embedded systems
Introduction to embedded systems
 
Driving behavior for ADAS and Autonomous Driving
Driving behavior for ADAS and Autonomous DrivingDriving behavior for ADAS and Autonomous Driving
Driving behavior for ADAS and Autonomous Driving
 

Review Multicore processing based on ARM architecture

  • 1. Multi-core processing. Max. CPU clock rate: up to 2.34 GHz; Min. feature size: 14 nm; Instruction set: A64, A32, T32; Microarchitecture: Hurricane and Zephyr, both ARMv8-A-compatible; Cores: 2× Hurricane + 2× Zephyr; Predecessor: Apple A9, Apple A9X; GPU: 6-core; Application: Mobile
  • 2. Review of multi-core processing based on ARM architecture. By M-Reza Khalifeh Mahmoodi
  • 3. What are microcontrollers and microprocessors? A microprocessor is an IC that contains only the central processing unit (CPU). It lacks RAM, ROM, and other peripherals. The Intel Pentium series, i3, i5, and Cortex-A8 are popular microprocessors used in desktop PCs, laptops, notepads, etc. To deploy a microprocessor, external peripherals must be connected to it.
  • 4. What are microcontrollers and microprocessors? Microcontrollers combine a CPU with RAM, ROM, and other peripherals. Their power is far lower than that of an actual computer. Popular microcontrollers on the market include the 8051, STM32, PIC32, Arduino, ATMEL, etc.
  • 5. Microprocessor versus microcontroller. Microprocessors are designed for general-purpose applications, such as a PC. An operating system (OS) is generally installed to coordinate the different I/O devices. They need a large amount of resources such as RAM, ROM, and I/O ports, which increases the total cost of the system. Microcontrollers are designed to perform specific tasks, for example in washing machines, refrigerators, microwaves, cars, bikes, and telephones. Because the application is limited to one or two processes, they need only small amounts of RAM, ROM, and I/O ports. This allows all of these components to be integrated on a single chip, reducing size and cost.
  • 6. Microcontrollers versus microprocessors, by attribute.
      Attribute | Microcontrollers | Microprocessors
      Application | Application-specific; designed to perform a limited set of tasks. | General-purpose; capable of executing large and complicated tasks.
      One solution | Have a built-in processor, RAM, ROM and I/O ports, like a small stand-alone computer on a single integrated chip. | Generally have no built-in RAM, ROM or I/O ports; the pins are used to interface with external RAM, ROM and ports.
      Performance | Limited performance. | Very high performance.
      Speed | Generally operate at 8 MHz to 200 MHz. | Generally operate above 1 GHz.
      Power consumption | Embedded inside other devices, so designed to consume less power. | Consume relatively more power, since performance is given more weight than power.
      Cost | Affordable and cheap; can get started with minimal external circuitry. | Very expensive, and require other peripherals to work.
  • 7. This is a list of common microcontrollers listed by brand: Altera, Analog Devices, Atmel, Cypress Semiconductor, Maxim Integrated, ELAN Microelectronics Corp., EPSON Semiconductor, Freescale Semiconductor, Fujitsu, Holtek, Hyperstone, Infineon, Intel, Lattice Semiconductor, Microchip Technology, National Semiconductor, NEC, NXP Semiconductors, Panasonic, Rabbit Semiconductor, Renesas Electronics, Rockwell, Silicon Laboratories, Silicon Motion, Sony, Spansion, STMicroelectronics, Texas Instruments, Toshiba, Ubicom, Xemics, Xilinx, XMOS, ZiLOG
  • 8. ARM Holdings. ARM is the industry's leading supplier of microprocessor technology, offering the widest range of microprocessor cores to address the performance, power and cost requirements of almost all application markets. Combining a vibrant ecosystem of over 1,000 partners delivering silicon, development tools and software with more than 86 billion processors sold, ARM truly is “The Architecture for the Digital World”. ARM, originally Acorn RISC Machine and later Advanced RISC Machine, is a family of reduced instruction set computing (RISC) architectures for computer processors, configured for various environments. The British company ARM Holdings develops the architecture and licenses it to other companies. In 2009, some manufacturers introduced netbooks based on ARM architecture CPUs, in direct competition with netbooks based on Intel Atom. According to analyst firm IHS iSuppli, by 2015 ARM integrated circuits may be in 23% of all laptops. A RISC-based design approach means processors require fewer transistors than the typical complex instruction set computing (CISC) x86 processors in most personal computers. This approach reduces cost, heat and power use, characteristics that are desirable for light, portable, battery-powered devices, including smartphones, laptops and tablet computers, and other embedded systems. For supercomputers, which consume large amounts of electricity, ARM could also be a power-efficient solution.
  • 9. This is a list of microarchitectures based on the ARM family ("-" marks a cell left empty on the slide).
      ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
      ARM1 | ARMv1 | ARM1 | First implementation | None | -
      ARM2 | ARMv2 | ARM2 | ARMv2 added the MUL (multiply) instruction | None | 4 MIPS @ 8 MHz, 0.33 DMIPS/MHz
      ARM2 | ARMv2a | ARM250 | Integrated MEMC (MMU), graphics and I/O processor. ARMv2a added the SWP and SWPB (swap) instructions | None, MEMC1a | 7 MIPS @ 12 MHz
      ARM3 | ARMv2a | ARM3 | First integrated memory cache | 4 KB unified | 12 MIPS @ 25 MHz, 0.50 DMIPS/MHz
      ARM6 | ARMv3 | ARM60 | ARMv3 first to support 32-bit memory address space (previously 26-bit). ARMv3M first added long multiply instructions (32x32=64). | None | 10 MIPS @ 12 MHz
      ARM6 | ARMv3 | ARM600 | As ARM60, cache and coprocessor bus (for FPA10 floating-point unit) | 4 KB unified | 28 MIPS @ 33 MHz
      ARM6 | ARMv3 | ARM610 | As ARM60, cache, no coprocessor bus | 4 KB unified | 17 MIPS @ 20 MHz, 0.65 DMIPS/MHz
      ARM7 | ARMv3 | ARM700 | - | 8 KB unified | 40 MHz
      ARM7 | ARMv3 | ARM710 | As ARM700, no coprocessor bus | 8 KB unified | 40 MHz
      ARM7 | ARMv3 | ARM710a | As ARM710 | 8 KB unified | 40 MHz, 0.68 DMIPS/MHz
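The move from a 26-bit to a 32-bit address space noted for ARMv3 can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes byte-addressable memory, and the helper names `address_space_bytes` and `human_readable` are hypothetical, not taken from the slides.

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width.

    Assumption: memory is byte-addressable, so each address names one byte.
    """
    return 2 ** address_bits


def human_readable(n_bytes: int) -> str:
    """Format a power-of-two byte count using binary units (KiB, MiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = n_bytes
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit available.
        if value < 1024 or unit == units[-1]:
            return f"{value} {unit}"
        value //= 1024
    return f"{value} {units[-1]}"  # not reached; kept for completeness


if __name__ == "__main__":
    for bits in (26, 32):
        print(f"{bits}-bit addressing: {human_readable(address_space_bytes(bits))}")
```

Under these assumptions, 26-bit addressing covers 64 MiB and 32-bit addressing covers 4 GiB, a 64-fold increase.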
  • 10. List of microarchitectures based on the ARM family, continued ("-" marks a cell left empty on the slide).
      ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
      ARM7T | ARMv4T | ARM7TDMI(-S) | 3-stage pipeline, Thumb, ARMv4 first to drop legacy ARM 26-bit addressing | None | 15 MIPS @ 16.8 MHz, 63 DMIPS @ 70 MHz
      ARM7T | ARMv4T | ARM710T | As ARM7TDMI, cache | 8 KB unified, MMU | 36 MIPS @ 40 MHz
      ARM7T | ARMv4T | ARM720T | As ARM7TDMI, cache | 8 KB unified, MMU with FCSE (Fast Context Switch Extension) | 60 MIPS @ 59.8 MHz
      ARM7T | ARMv4T | ARM740T | As ARM7TDMI, cache | MPU | -
      ARM7EJ | ARMv5TEJ | ARM7EJ-S | 5-stage pipeline, Thumb, Jazelle DBX, Enhanced DSP instructions | None | -
      ARM8 | ARMv4 | ARM810[4][5] | 5-stage pipeline, static branch prediction, double-bandwidth memory | 8 KB unified, MMU | 84 MIPS @ 72 MHz, 1.16 DMIPS/MHz
      ARM9T | ARMv4T | ARM9TDMI | 5-stage pipeline, Thumb | None | -
      ARM9T | ARMv4T | ARM920T | As ARM9TDMI, cache | 16 KB / 16 KB, MMU with FCSE (Fast Context Switch Extension)[6] | 200 MIPS @ 180 MHz
      ARM9T | ARMv4T | ARM922T | As ARM9TDMI, caches | 8 KB / 8 KB, MMU | -
      ARM9T | ARMv4T | ARM940T | As ARM9TDMI, caches | 4 KB / 4 KB, MPU | -
      ARM9E | ARMv5TE | ARM946E-S | Thumb, Enhanced DSP instructions, caches | Variable, tightly coupled memories, MPU | -
      ARM9E | ARMv5TE | ARM966E-S | Thumb, Enhanced DSP instructions | No cache, TCMs | -
      ARM9E | ARMv5TE | ARM968E-S | As ARM966E-S | No cache, TCMs | -
      ARM9E | ARMv5TEJ | ARM926EJ-S | Thumb, Jazelle DBX, Enhanced DSP instructions | Variable, TCMs, MMU | 220 MIPS @ 200 MHz
      ARM9E | ARMv5TE | ARM996HS | Clockless processor, as ARM966E-S | No caches, TCMs, MPU | -
  • 11. List of microarchitectures based on the ARM family, continued ("-" marks a cell left empty on the slide).
      ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
      ARM10E | ARMv5TE | ARM1022E | As ARM1020E | 16 KB / 16 KB, MMU | -
      ARM10E | ARMv5TEJ | ARM1026EJ-S | Thumb, Jazelle DBX, Enhanced DSP instructions, (VFP) | Variable, MMU or MPU | -
      ARM11 | ARMv6 | ARM1136J(F)-S[7] | 8-stage pipeline, SIMD, Thumb, Jazelle DBX, (VFP), Enhanced DSP instructions | Variable, MMU | 740 @ 532–665 MHz (i.MX31 SoC), 400–528 MHz
      ARM11 | ARMv6T2 | ARM1156T2(F)-S | 9-stage pipeline,[8] SIMD, Thumb-2, (VFP), Enhanced DSP instructions | Variable, MPU | -
      ARM11 | ARMv6Z | ARM1176JZ(F)-S | As ARM1136EJ(F)-S | Variable, MMU + TrustZone | 965 DMIPS @ 772 MHz, up to 2,600 DMIPS with four processors[9]
      ARM11 | ARMv6K | ARM11MPCore | As ARM1136EJ(F)-S, 1–4 core SMP | Variable, MMU | -
      SecurCore | ARMv6-M | SC000 | - | - | 0.9 DMIPS/MHz
      SecurCore | ARMv4T | SC100 | - | - | -
      SecurCore | ARMv7-M | SC300 | - | - | 1.25 DMIPS/MHz
      Cortex-M | ARMv6-M | Cortex-M0[10] | Microcontroller profile, most Thumb + some Thumb-2,[11] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, no MPU | 0.84 DMIPS/MHz
      Cortex-M | ARMv6-M | Cortex-M0+[12] | Microcontroller profile, most Thumb + some Thumb-2,[11] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 0.93 DMIPS/MHz
      Cortex-M | ARMv6-M | Cortex-M1[13] | Microcontroller profile, most Thumb + some Thumb-2,[11] hardware multiply instruction (optional small), OS option adds SVC / banked stack pointer, optional system timer, no bit-banding memory | Optional cache, 0-1024 KB I-TCM, 0-1024 KB D-TCM, no MPU | 136 DMIPS @ 170 MHz,[14] (0.8 DMIPS/MHz FPGA-dependent)[15]
      Cortex-M | ARMv7-M | Cortex-M3[16] | Microcontroller profile, Thumb / Thumb-2, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz
      Cortex-M | ARMv7E-M | Cortex-M4[17] | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv4-SP single-precision FPU, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz (1.27 w/FPU)
      Cortex-M | ARMv7E-M | Cortex-M7[18] | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv5 single and double precision FPU, hardware multiply and divide instructions | 0-64 KB I-cache, 0-64 KB D-cache, 0-16 MB I-TCM, 0-16 MB D-TCM (all these w/optional ECC), optional MPU with 8 or 16 regions | 2.14 DMIPS/MHz
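Several entries above quote an absolute DMIPS score at a given clock, while others quote DMIPS/MHz. The two forms are related by a simple division, which lets cores clocked at different speeds be compared. The following is a minimal sketch of that conversion; the function name `dmips_per_mhz` is a hypothetical helper, and the input figures are the ones quoted on the slide.

```python
def dmips_per_mhz(dmips: float, clock_mhz: float) -> float:
    """Clock-normalised Dhrystone score: absolute DMIPS divided by clock in MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return dmips / clock_mhz


if __name__ == "__main__":
    # (core name, DMIPS, clock in MHz) as quoted in the slide's table.
    quoted = [
        ("Cortex-M1", 136.0, 170.0),
        ("ARM1176JZ(F)-S", 965.0, 772.0),
    ]
    for core, dmips, mhz in quoted:
        print(f"{core}: {dmips_per_mhz(dmips, mhz):.2f} DMIPS/MHz")
```

For the Cortex-M1 figures this gives 0.80 DMIPS/MHz, which matches the value the table itself states in parentheses; for the ARM1176JZ(F)-S figures it gives 1.25 DMIPS/MHz.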
  • 12. List of microarchitectures based on the ARM family, continued ("-" marks a cell left empty on the slide).
      ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
      Cortex-R | ARMv7-R | Cortex-R4[19] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lockstep with fault logic | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt MPU with 8/12 regions | -
      Cortex-R | ARMv7-R | Cortex-R5[20] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lock-step with fault logic / optional as 2 independent cores, low-latency peripheral port (LLPP), accelerator coherency port (ACP)[21] | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt MPU with 12/16 regions | -
      Cortex-R | ARMv7-R | Cortex-R7[22] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 11-stage pipeline dual-core running lock-step with fault logic / out-of-order execution / dynamic register renaming / optional as 2 independent cores, low-latency peripheral port (LLPP), ACP[21] | 0–64 KB / 0–64 KB, ? of 0–128 KB TCM, opt MPU with 16 regions | -
      Cortex-R | ARMv7-R | Cortex-R8[23] | TBD | TBD | -
  • 13. List of microarchitectures based on the ARM family, continued ("-" marks a cell left empty on the slide).
      ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
      Cortex-A (32-bit) | ARMv7-A | Cortex-A5[24] | Application profile, ARM / Thumb / Thumb-2 / DSP / SIMD / Optional VFPv4-D16 FPU / Optional NEON / Jazelle RCT and DBX, 1–4 cores / optional MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 4-64 KB / 4-64 KB L1, MMU + TrustZone | 1.57 DMIPS/MHz per core
      Cortex-A (32-bit) | ARMv7-A | Cortex-A7[25] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4-D16 FPU / NEON / Jazelle RCT and DBX / Hardware virtualization, in-order execution, superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP, architecture and feature set are identical to A15, 8-10 stage pipeline, low-power design[26] | 8-64 KB / 8-64 KB L1, 0–1 MB L2, MMU + TrustZone | 1.9 DMIPS/MHz per core
      Cortex-A (32-bit) | ARMv7-A | Cortex-A8[27] | Application profile, ARM / Thumb / Thumb-2 / VFPv3 FPU / NEON / Jazelle RCT and DAC, 13-stage superscalar pipeline | 16-32 KB / 16–32 KB L1, 0–1 MB L2 opt ECC, MMU + TrustZone | Up to 2000 (2.0 DMIPS/MHz in speed from 600 MHz to greater than 1 GHz)
      Cortex-A (32-bit) | ARMv7-A | Cortex-A9[28] | Application profile, ARM / Thumb / Thumb-2 / DSP / Optional VFPv3 FPU / Optional NEON / Jazelle RCT and DBX, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 16–64 KB / 16–64 KB L1, 0–8 MB L2 opt parity, MMU + TrustZone | 2.5 DMIPS/MHz per core, 10,000 DMIPS @ 2 GHz on Performance Optimized TSMC 40G (dual-core)
      Cortex-A (32-bit) | ARMv7-A | Cortex-A12[29] | Application profile, ARM / Thumb-2 / DSP / VFPv4 FPU / NEON / Hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 32-64 KB | 3.0 DMIPS/MHz per core
      Cortex-A (32-bit) | ARMv7-A | Cortex-A15[30] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP, 15-24 stage pipeline[26] | 32 KB w/parity / 32 KB w/ECC L1, 0–4 MB L2, L2 has ECC, MMU + TrustZone | At least 3.5 DMIPS/MHz per core (up to 4.01 DMIPS/MHz depending on implementation)[31]
      Cortex-A (32-bit) | ARMv7-A | Cortex-A17[32] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP | 32 KB L1, 256 KB-8 MB L2 w/optional ECC | 2.8 DMIPS/MHz
      Cortex-A (32-bit) | ARMv8-A | Cortex-A32[33] | Application profile, AArch32, 1-4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline | 8-64 KB w/optional parity / 8-64 KB w/optional ECC L1 per core, 128 KB-1 MB L2 w/optional ECC shared | -
  • 14. Cortex-A (64-bit)
    ARMv8-A
    ▪ Cortex-A35[34]
      Features: Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline
      Cache, MMU: 8–64 KB w/parity / 8–64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses
      Performance: 1.78 DMIPS/MHz
    ▪ Cortex-A53[35]
      Features: Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline
      Cache, MMU: 8–64 KB w/parity / 8–64 KB w/ECC L1 per core, 128 KB–2 MB L2 shared, 40-bit physical addresses
      Performance: 2.3 DMIPS/MHz
    ▪ Cortex-A57[36]
      Features: Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, multi-issue, deeply out-of-order pipeline
      Cache, MMU: 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses
      Performance: 4.6 DMIPS/MHz
    ▪ Cortex-A72[37]
      Features: Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, multi-issue, deeply out-of-order pipeline
      Cache, MMU: 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses
      Performance: 4.8 DMIPS/MHz
    ▪ Cortex-A73[38]
      Features: Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, multi-issue, deeply out-of-order pipeline
      Cache, MMU: 64 KB / 32–64 KB L1 per core, 256 KB–8 MB L2 shared w/optional ECC, 44-bit physical addresses
      Performance: 4.9 DMIPS/MHz
  • 15. Why should we use multi-core processing instead of single-core?
    Mobile devices perform a wide variety of tasks such as Web browsing, video playback, mobile gaming, SMS text messaging, and location-based services. Due to the growth in the availability of high-speed mobile and Wi-Fi networks, mobile devices will also be used for various performance-intensive tasks that were previously handled by traditional PCs. The next generation of smartphones (called “Super phones”) and tablets will be used for a wide variety of tasks such as playback of high-definition 1080p videos, Adobe® Flash®-based online gaming, Flash-based streaming of high-definition videos, visually rich gaming, video editing, simultaneous HD video downloading, encoding and uploading, and real-time HD video conferencing.
  • 16. Why should we use multi-core processing instead of single-core?
    ▪ Difficult to make single-core clock frequencies even higher
    ▪ P = C × V² × F (dynamic power P, switched capacitance C, supply voltage V, clock frequency F)
    ▪ Moore's law
    ▪ Deeply pipelined circuits:
      ▪ heat problems
      ▪ speed of light problems
      ▪ difficult design and verification
      ▪ large design teams necessary
      ▪ server farms need expensive air-conditioning
    ▪ Many new applications are multithreaded
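The power relation on this slide can be made concrete with a small calculation. The sketch below is only illustrative: the capacitance, voltage, and frequency values are hypothetical, and it assumes that a core clocked at half the frequency can also run at a lower supply voltage.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Dynamic power in watts from P = C * V^2 * F."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical single core: 1 nF switched capacitance, 1.0 V, 2 GHz.
single_core_w = dynamic_power(1e-9, 1.0, 2e9)

# Hypothetical dual core: each core at 1 GHz and (assumed) 0.8 V.
dual_core_w = 2 * dynamic_power(1e-9, 0.8, 1e9)

print(f"single core: {single_core_w:.2f} W")
print(f"dual core:   {dual_core_w:.2f} W")
```

Under these assumed numbers the two slower cores offer the same aggregate clock cycles per second for less power, because the voltage term enters squared.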
  • 17. Parallel computing
    ▪ Large problems can often be divided into smaller ones
    ▪ Several different forms of parallel computing:
      ▪ Bit-level: processor word size
      ▪ Instruction-level: hardware and software. In the sequence e = a + b, f = c + d, m = e * f, the first two operations are independent and can execute at the same time, while m must wait for both results.
      ▪ Task parallelism
      ▪ Data parallelism
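The dependency pattern in the e, f, m example, and the contrast with data parallelism, can be sketched in software. This is a minimal illustration using Python's standard `concurrent.futures` module; the helper names are made up for this example, and in CPython the thread pool shows the structure of the computation rather than a real speedup for such tiny operations.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    return x + y


def square(x):
    return x * x


def run_dependency_example(a, b, c, d):
    """e and f are independent and may run concurrently; m needs both."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_e = pool.submit(add, a, b)  # e = a + b
        future_f = pool.submit(add, c, d)  # f = c + d
        # m = e * f can only be computed once both results are available.
        return future_e.result() * future_f.result()


def data_parallel_squares(values):
    """Data parallelism: the same operation applied to every element."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, values))


print(run_dependency_example(1, 2, 3, 4))
print(data_parallel_squares([1, 2, 3]))
```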
  • 18. Amdahl's law
    ▪ Ideally, the speedup from parallelization would be linear: doubling the number of processing elements should halve the runtime.
    ▪ Amdahl's law bounds the achievable speedup: $S_{\text{latency}}(s) = \dfrac{1}{(1 - P) + \dfrac{P}{s}}$
      ▪ $S_{\text{latency}}$ is the potential speedup in latency of the whole task;
      ▪ $s$ is the speedup in latency of the execution of the parallelizable part of the task;
      ▪ $P$ is the fraction of the execution time of the whole task spent in the parallelizable part of the task before parallelization.
    ▪ Speedup in a serial program: for example, with a serial program in two parts A and B for which TA = 3 s and TB = 1 s,
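Amdahl's formula on this slide is easy to evaluate numerically. The following is a minimal sketch; the function name and the sample values (a task that is 90% parallelizable) are assumptions chosen for illustration, not figures from the presentation.

```python
def amdahl_speedup(parallel_fraction, part_speedup):
    """Overall speedup 1 / ((1 - P) + P / s) for fraction P sped up by s."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if part_speedup <= 0:
        raise ValueError("part_speedup must be positive")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / part_speedup)


# Hypothetical workload: P = 0.9, parallel part sped up by 2, 4, 8, 16.
for s in (2, 4, 8, 16):
    print(f"s = {s:2d}: overall speedup = {amdahl_speedup(0.9, s):.2f}")
```

However large s becomes, the overall speedup for this assumed workload stays below 1 / (1 − 0.9) = 10, which is why the serial part dominates as cores are added.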
  • 19. Flynn's taxonomy
    Michael J. Flynn, professor emeritus at Stanford University, proposed Flynn's taxonomy, a method of classifying digital computers, in 1966.
    ▪ Classifications
      ▪ Single instruction stream, single data stream (SISD)
      ▪ Single instruction stream, multiple data streams (SIMD)
      ▪ Multiple instruction streams, single data stream (MISD)
      ▪ Multiple instruction streams, multiple data streams (MIMD)
    ▪ Further divisions
      ▪ Single program, multiple data streams (SPMD)
      ▪ Multiple programs, multiple data streams (MPMD)
  • 20. Running AMP, SMP or BMP Mode for Multicore Systems
    Developers must also choose the appropriate form of multiprocessing for their application requirements. This choice will determine how easily both new and existing code can achieve maximum concurrency. As Table 1 illustrates, developers have three basic forms to choose from:
    ▪ Asymmetric multiprocessing (AMP)
    ▪ Symmetric multiprocessing (SMP)
    ▪ Bound multiprocessing (BMP)
    Table 1
    ▪ Model: Asymmetric multiprocessing (AMP)
      How it works: A separate OS, or a separate copy of the same OS, manages each core. Typically, each software process is locked to a single core (e.g. process A runs only on core 1, process B runs only on core 2, etc.).
      Key advantages: Provides an execution environment similar to that of uniprocessor systems, allowing simple migration of legacy code. Also allows developers to manage each core independently.
    ▪ Model: Symmetric multiprocessing (SMP)
      How it works: A single OS manages all processor cores simultaneously. The OS can dynamically schedule any process on any core, enabling full utilization of all cores.
      Key advantages: Provides greater scalability and parallelism than AMP, along with simpler shared resource management.
    ▪ Model: Bound multiprocessing (BMP)
      How it works: A single OS manages all cores simultaneously. As in SMP, the OS can dynamically schedule processes on any core. However, the developer can also lock any process (and all of its associated threads) to a specific core.
      Key advantages: Combines the developer control of AMP with the transparent resource management of SMP. The option to lock threads to any core simplifies migration of legacy code and allows designers to dedicate cores
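The "lock a process to a specific core" idea in the BMP row has a rough analogue on general-purpose operating systems in the form of CPU affinity. The sketch below is an assumption-laden illustration, not the mechanism the slide's RTOS uses: it relies on Python's `os.sched_setaffinity`, which is only available on some Unix platforms such as Linux, and the helper name is hypothetical.

```python
import os


def pin_current_process(core_id):
    """Restrict the calling process to one core and return the new core set.

    Returns None on platforms that do not expose sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {core_id})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)      # cores the scheduler may use now
    print("before:", sorted(allowed))
    print("after: ", sorted(pin_current_process(min(allowed))))
    os.sched_setaffinity(0, allowed)       # restore the original core set
```

Without the pinning call the scheduler stays free to move the process between cores, which corresponds to the SMP row of the table.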
  • 21. CPU Utilization in AMP Mode
    ▪ Each process and all of its threads are locked to a single processor core
  • 22. Symmetric Multiprocessing (SMP) Mode
    ▪ SMP addresses many of the issues of AMP by running only one copy of an OS across all the chip’s cores. Because the OS has insight into all system elements at all times, it can allocate resources on multiple cores with little or no input from the application designer.
  • 23. An introduction to NXP
    NXP Semiconductors N.V. is a Dutch global semiconductor manufacturer headquartered in Eindhoven, Netherlands. The company employs approximately 45,000 people in more than 35 countries, including 11,200 engineers in 23 countries. NXP reported revenue of $6.1 billion in 2015, including one month of revenue contribution from the recently merged Freescale Semiconductor. NXP said it was the fifth-largest non-memory semiconductor supplier in 2016.
    ▪ NXP is the co-inventor of near field communication (NFC) technology
    ▪ NXP manufactures automotive chips for in-vehicle networking
    ▪ NXP invented the I²C interface over 30 years ago
  • 24. An example of ‘Single-core’ ARM Architecture
    NXP builds microcontrollers on the ARM Cortex-M3 core. A representative example is the LPC178x/7x series, and more specifically the LPC1788 microcontroller.
    ▪ Architecture: ARMv7-M, microcontroller profile, Thumb / Thumb-2, hardware multiply and divide instructions, optional bit-banding memory
    ▪ 32-bit ARM Cortex-M3 microcontroller; up to 512 kB flash and 96 kB SRAM; USB Device/Host/OTG; Ethernet; LCD; EMC
    ▪ The LPC178x/7x adds a specialized flash memory accelerator to achieve optimal performance when executing code from flash.
    ▪ The LPC178x/7x operates at up to 120 MHz CPU frequency.
  • 25. An example of ‘Single-core’ ARM Architecture
  • 26. Multiprocessor system designs come with a number of challenges
    ▪ Low-power capability is needed
    ▪ Higher total memory-system and bus bandwidth is required
    ▪ Careful bus architecture planning is needed
  • 27. Challenges for low power capability
    Processor | Transistor count | Year | Designer | Process | Area
    TMS 1000 | 8,000 | 1974 | Texas Instruments | 8,000 nm |
    Intel 4004 | 2,300 | 1971 | Intel | 10,000 nm | 12 mm²
    Intel 8008 | 3,500 | 1972 | Intel | 10,000 nm | 14 mm²
    Intel 8086 | 29,000 | 1978 | Intel | 3,000 nm | 33 mm²
    Intel 8088 | 29,000 | 1979 | Intel | 3,000 nm | 33 mm²
    Intel 80386 | 275,000 | 1985 | Intel | 1,500 nm | 104 mm²
    ARM 1 | 25,000[7] | 1985 | Acorn | 3,000 nm | 50 mm²
    Intel 80486 | 1,180,235 | 1989 | Intel | 1,000 nm | 173 mm²
    ARM 3 | 300,000 | 1989 | Acorn | |
    ARM 6 | 35,000 | 1991 | ARM | |
    Pentium | 3,100,000 | 1993 | Intel | 800 nm | 294 mm²
    ARM 9TDMI | 111,000 | 1999 | Acorn | 350 nm | 4.8 mm²
    Pentium Pro | 5,500,000 | 1995 | Intel | 500 nm | 307 mm²
    Core 2 Duo Conroe | 291,000,000 | 2006 | Intel | 65 nm | 143 mm²
    Core 2 Duo Allendale | 169,000,000 | 2007 | Intel | 65 nm | 111 mm²
    Itanium 2 Madison 6M | 410,000,000 | 2003 | Intel | 130 nm | 374 mm²
    ARM Cortex-A9 | 26,000,000 | 2007 | ARM | 45 nm | 31 mm²
    Core 2 Duo Wolfdale 3M | 230,000,000 | 2008 | Intel | 45 nm | 83 mm²
    Itanium 2 with 9 MB cache | 592,000,000 | 2004 | Intel | 130 nm | 432 mm²
    Core i7 (Quad) | 731,000,000 | 2008 | Intel | 45 nm | 263 mm²
    Apple A7 (dual-core ARM64 "mobile SoC") | 1,000,000,000 | 2013 | Apple | 28 nm | 102 mm²
    22-core Xeon Broadwell-E5 | 7,200,000,000 | 2016 | Intel | 14 nm | 456 mm²
    SPARC M7 | 10,000,000,000 | 2015 | Oracle | 20 nm |
    GP100 Pascal | 15,300,000,000 | 2016 | Nvidia | 16 nm | 610 mm²
  • 28. Challenges for low power capability
    ▪ Simplest way: core standby mode algorithm
  • 29. Challenges for bus bandwidth
    ▪ A simple dual-core system with ROM sharing gets 78% of ideal performance in a Dhrystone 2.1 Verilog simulation
    ▪ Using a 64-bit flash and a simple fetch buffer raises this to 92%
  • 30. LPC microcontrollers based on ARM Cortex-M4 core (single- and multi-core): LPC43XX series
    ▪ Cortex-M4 or M4F cores up to 204 MHz
    ▪ Highest performance, best power efficiency
    ▪ DSP options, multi-high-speed connectivity, advanced peripherals
    ▪ Cortex-M0+ or M0 coprocessors
  • 31. LPC4350: 32-bit ARM Cortex-M4/M0 MCU; up to 264 kB SRAM; Ethernet; two High-speed USBs; advanced configurable peripherals
  • 32. i.MX 8 Family processors based on ARM® Cortex-A53, Cortex-A72 + Cortex-M4 cores
    ▪ Multi-screen (4x) support
    ▪ Fast multi-OS platform deployment via advanced full-chip hardware virtualization and domain protection
    ▪ Deploy rich, fully-independent graphics content across 4x HD screens or 1x 4K screen
    ▪ Android™*, Linux®*, FreeRTOS, QNX™*, Green Hills®, Dornerworks* XEN™*
    ▪ Automotive AEC-Q100 Grade 3 (-40° to 125° C Tj), Industrial (-40° to 105° C Tj), Consumer (-20° to 105° C Tj)
    ▪ Fully supported on NXP’s 10 and 15-year Longevity Program
  • 33. References:
    ▪ http://theembeddedguy.com/2016/05/09/microcontroller-vs-microprocessor/
    ▪ https://en.wikipedia.org/wiki/List_of_common_microcontrollers
    ▪ https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures
    ▪ https://www.eeweb.com/blog/arm/multi-core-mcu-design-with-arm-cortex-m-processors-and-coresight-soc
    ▪ http://www.nxp.com/products/microcontrollers-and-processors/arm-processors/lpc-cortex-m-mcus:LPC-ARM-CORTEX-M-MCUS
    ▪ http://www.nxp.com/
    ▪ http://www.intel.eu/
    ▪ http://www.st.com/
    ▪ http://www.ti.com/