SlideShare une entreprise Scribd logo
1  sur  27
Télécharger pour lire hors ligne
Evolution of Personal Computing by
Microprocessors and SoCs
For Credit Seminar: EEC7203 (Internal Assessment)
Submitted To

Dr. T. Shanmuganantham
Associate Professor,
Department of Electronics Engineering

Azmath Moosa
Reg No: 13304006
M. Tech 1st Yr
Department of Electronics Engineering,
School of Engg & Tech,
Pondicherry University
Abstract
Throughout history, new and improved technologies have transformed the human
experience. In the 20th century, the pace of change sped up radically as we entered the
computing age. For nearly 40 years the Microprocessor driven by innovations of companies
like Intel have continuously created new possibilities in the lives of people around the world.
In this paper, I hope to capture the evolution of this amazing device that has raised computing
to a whole new level and made it relevant in all fields – Engineering, Research, Medical,
Academia, Businesses, Manufacturing, Commuting etc. I will highlight the significant strides
made in each generation of Processors and the remarkable ways in which engineers overcame
seemingly unsurmountable challenges and continued to push the evolution to where it is today.

Page | i
Table of Contents
Title

Page No.

1.

Abstract

i

2.

Table of Contents

ii

3.

List of Figures

iii

4.

Introduction

1

5.

X86 and birth of the PC

2

6.

The Pentium

3

7.

Pipelined Design

4

8.

The Pentium 4

5

9.

The Core Microarchitecture

7

10.

Tick Tock Cadence

10

11.

The Nehalem Microarchitecture

10

12.

The SandyBridge Microarchitecture

12

13.

The Haswell Microarchitecture

15

14.

Performance Comparison

16

15.

Shift in Computing Trends

18

16.

Advanced RISC Machines

18

17.

System on Chip (SoC)

19

18.

Conclusion

22

19.

References

Page | ii
List of Figures
Figure 1: 4004 Layout
Figure 2: Pentium Chip
Figure 3: Pentium CPU based PC architecture
Figure 4: Pentium 2 logo
Figure 5: Pentium 3 logo
Figure 6: Pentium 4 HT technology illustration
Figure 7: NetBurst architecture feature presentation at Intel Developer
Forum
Figure 8: The NetBurst Pipeline
Figure 9: The Core architecture feature presentation at Intel Developer
Forum
Figure 10: The Core architecture pipeline
Figure 11: Macro fusion explained at IDF
Figure 12: Power Management capabilities of Core architecture
Figure 13: Intel's new tick tock strategy revealed at IDF
Figure 14: Nehalem pipeline backend
Figure 15: Nehalem pipeline frontend
Figure 16: Improved Loop Stream Detector
Figure 17: Nehalem CPU based PC architecture
Figure 18: Sandybridge architecture overview at IDF
Figure 19: Sandybridge pipeline frontend
Figure 20: Sandybridge pipeline backend
Figure 21: Video transcoding capabilities of Nehalem
Figure 22: Typical planar transistor
Figure 23: FinFET Tri-Gate transistor
Figure 24: FinFET Delay vs Power
Figure 25: SEM photograph of fabricated FinFET trigate transistors
Figure 26: Haswell pipeline frontend
Figure 27: Haswell pipeline backend
Figure 28: Performance comparisons of 5 generations of Intel processors
Figure 29: Market share of personal computing devices.
Figure 30: A smartphone SoC; Qualcomm's OMAP
Figure 31: A SoC for tablet; Nvidia TEGRA

1
3
4
4
4
6
6
7
8
8
9
9
10
11
11
11
11
12
13
13
14
14
14
15
15
16
16
17
18
20
21

Page | iii
Introduction
In 1969, Intel was found with aim of manufacturing memory devices. Their first
product was Shottky TTL bipolar SRAM memory chip. A Japanese company – Nippon
Calculating Machine Corporation approached Intel to design 12 custom chips for its new
calculator. Intel engineers suggested a family of just four chips, including one that could be
programmed for use in a variety of products. Intel designed a set of four chips known as the
MCS-4. It included a central processing unit (CPU) chip—the 4004—as well as a supporting
read-only memory (ROM) chip for the custom applications programs, a random-access
memory (RAM) chip for processing data, and a shift-register chip for the input/output (I/O)

Figure 1: 4004 Layout

port. MCS-4 was a "building block" that engineers could purchase and then customize with
software to perform different functions in a wide variety of electronic devices.
And thus, the industry of the Microprocessor was born. 4004 had 2,300 pMOS
transistors at 10um and was clocked at 740 kHz. 4 pins were multiplexed for both address and
data (16 pin IC). In the very next year, the 8008 was introduced. It was an 8 bit processor
clocked at 500 kHz with 3,500 pMOS transistors at the same 10um. It was actually slower
with 0.05 MIPS (Millions of instructions per second) as compared to 4004 with 0.07. It was
in 1974, that the 8080 with 10 times the performance of 8008 with a different transistor
technology was launched. It used 4,500 NMOS transistors of size 6um. It was clocked at 2
MHz with a whopping 0.29 MIPS. Finally in March 1976, the 8085 clocked at 3 MHz with
yet another newer transistor technology - depletion type NMOS transistors of size 3 um was
launched. It was capable of 0.37 MIPS. The 8085 was a popular device of its time and is still
used in universities across the globe to introduce students to microprocessors.
Page | 1
x86 and birth of the PC
The 8086 16 bit processor made its debut in 1978. New techniques such as that of
memory segmentation into banks to extend capacity and Pipelining to speed up execution were
introduced. It was designed to be compatible with 8085 Assembly Mnemonics. It had 29,000
transistors of 3um channel length and was clocked at 5, 8 and 10 MHz with a full 0.75 MIPS
at maximum clock. It was the father of what is now known as the x86 Architecture which
eventually turned out to be Intel’s most successful line of processors that power many
computing devices even today. Introduced soon after was the processor that powered the first
PC – the 8088. Clocked at 5-8 MHz with 0.33-0.66 MIPS, it was 8086 with an external 8 bit
bus.
In 1981, a revolution seized the computer industry stirred by the IBM PC. By the late
'70s, personal computers were available from many vendors, such as Tandy, Commodore, TI
and Apple. Computers from different vendors were not compatible. Each vendor had their own
architecture, their own operating system, their own bus interface, and their own software.
Backed by IBM's marketing might and name recognition, the IBM PC quickly captured the
bulk of the market. Other vendors either left the PC market (TI), pursued niche markets
(Commodore, Apple) or abandoned their own architecture in favor of IBM's (Tandy). With a
market share approaching 90%, the PC became a de-facto standard. Software houses wrote
operating systems (MicroSoft DOS, Digital Research DOS), spread sheets (Lotus 123), word
processors (WordPerfect, WordStar) and compilers (MicroSoft C, Borland C) that ran on the
PC. Hardware vendors built disk drives, printers and data acquisition systems that connected
to the PC's external bus. Although IBM initially captured the PC market, it subsequently lost
it to clone vendors. Accustomed to being a monopoly supplier of mainframe computers, IBM
was unprepared for the fierce competition that arose as Compaq, Leading Edge, AT&T, Dell,
ALR, AST, Ampro, Diversified Technologies and others all vied for a share of the PC market.
Besides low prices and high performance, the clone vendors provided one other very important
thing to the PC market: an absolute hardware standard. In order to sell a PC clone, the
manufacturer had to be able to guarantee that it would run all of the customer's existing PC
software, and work with all of the customer's existing peripheral hardware. The only way to do
this was to design the clone to be identical to the original IBM PC at the register level. Thus,
the standard that the IBM PC defined became graven in stone as dozens of clone vendors
shipped millions of machines that conformed to it in every detail. This standardization has been
an important factor in the low cost and wide availability of PC systems.
Page | 2
8086 and 80186/88 were limited to addressing 1M of memory. Thus, the PC was also
limited to this range. This limitation was increased to 16 MB by 80286 released in 1982. It
had max clock of 16 MHz with more than 2 MIPS. It had 134,000 transistors at 1.5um. The
processors and the PC up to this point were all 16 bit. The 80386 range of processors, released
in 1985, were the first 32 bit processors to be used in the PC. The first of these had 275,000
transistors at 1um and was clocked at 33 MHz with 5.1 MIPS. Its addressing range could be
virtually 32 GB. Over the next few years, Intel modified the architecture and provided some
improvements in terms of memory addressing range and clock speed. The 80486 range of
processors, released in 1989, brought significant advancements in computing capability with a
whopping 41 MIPS for a processor clocked at 50 MHz with 1.2 million transistors at 0.8 um
or 800 nm. It had a new technique to speed up RAM read/writes with the Cache memory. It
was integrated onto the CPU die and was referred to as level 1 or L1 cache (as opposed to the
L2 cache available in the motherboard). As with the previous series, Intel slightly modified
the architecture and released higher clocked versions over the next few years.

The Pentium
The Intel Pentium microprocessor was introduced in
1993.

Its microarchitecture, dubbed P5, was Intel's fifth-

generation and first 32 bit superscalar microarchitecture.
Superscalar architecture is one in which multiple execution
units or functional units (such as adders, shifters and
multipliers) are provided and operate in parallel. As a direct
extension of the 80486 architecture, it included dual integer
pipelines, a faster floating-point unit, wider data bus, separate
code and data caches and features for further reduced address

Figure 2: Pentium Chip

calculation latency. In 1996, the Pentium with MMX Technology (often simply referred to as
Pentium MMX) was introduced with the same basic microarchitecture complemented with an
MMX instruction set, larger caches, and some other enhancements. The Pentium was based
on 0.8 um process technology, involved 3.1 million transistors and was clocked at 60 MHz
with 100 MIPS. The Pentium was truly capable of addressing 4 GB of RAM without any
operating system based virtualization.

Page | 3
The next microarchitecture was the P6
or the Pentium Pro released in 1995.
It had an integrated L2 cache. One
major change Intel brought to the PC architecture was
the presence of FSB (Front Side Bus) that managed the
CPU’s communications with the RAM and other IO.
RAM and Graphics card were high speed peripherals
and were interfaced through the Northbridge. Other IO
devices like keyboard and speakers were interfaced
Figure 3: Pentium CPU based PC architecture

through the Southbridge.

Pentium II followed it soon in 1997. It
had

MMX,

improved

16

bit

performance and had double the L2 cache. Pentium II had 7.5 million
transistors starting with 0.35um process technology but later revisions utilised
0.25um transistors.
Figure 4: Pentium 2
logo

The Pentium III followed in 1999 with 9.5 million 0.25um transistors and a
new instruction set SSE (Streaming SIMD Extensions) that assisted DSP and
graphics processing. Intel was able to push the clock speed higher and higher
with Pentium III with some variants clocked as high as 1 GHz.
Figure 5: Pentium 3
logo

Pipelined Design
At a high level the goal of a CPU is to grab instructions from memory and execute those
instructions. All of the tricks and improvements we see from one generation to the next just
help to accomplish that goal faster.
The assembly line analogy for a pipelined microprocessor is over used but that's because it is
quite accurate. Rather than seeing one instruction worked on at a time, modern processors

Page | 4
feature an assembly line of steps that breaks up the grab/execute process to allow for higher
throughput.
The basic pipeline is as follows: fetch, decode, execute, and commit to memory. One would
first fetch the next instruction from memory (there's a counter and pointer that tells the CPU
where to find the next instruction). One would then decode that instruction into an internally
understood format (this is key to enabling backwards compatibility). Next one would execute
the instruction (this stage, like most here, is split up into fetching data needed by the instruction
among other things). Finally one would commit the results of that instruction to memory and
start the process over again. Modern CPU pipelines feature many more stages than what've
been outlined above.
Pipelines are divided into two halves. Frontend and Backend. The front end is responsible for
fetching and decoding instructions, while the back end deals with executing them. The division
between the two halves of the CPU pipeline also separates the part of the pipeline that must
execute in order from the part that can execute out of order. Instructions have to be fetched and
completed in program order (can't click Print until you click File first), but they can be executed
in any order possible so long as the result is correct.
Many instructions are either dependent on one another (e.g. C=A+B followed by E=C+D) or
they need data that's not immediately available and has to be fetched from main memory (a
process that can take hundreds of cycles, or an eternity in the eyes of the processor). Being able
to reorder instructions before they're executed allows the processor to keep doing work rather
than just sitting around waiting.
This document aims to highlight changes to the x86 pipeline with each generation of
processors.

The Pentium 4
The NetBurst microarchitecture started with Pentium 4. This line of processors started
in 2000 clocked at 1.4 GHz, 42 million transistors at 0.18 um process size and SSE2 instruction
set. The early variants were codenamed Willamette (1.9 to 2.0 GHz) and later ones Northwood
(up to 3.0 GHz) and Prescott.

Page | 5
The diagram is from Intel feature presentation
of the NetBurst architecture. The Willamette
was an early variant with SSE2, Rapid
Execution engine (in which ALUs operate at
twice

the

core

clock

frequency)

and

Instruction Trace Cache (ITC cached
decoded instructions for faster loop execution).
HT Technology refers to the prevention of
Figure 7: NetBurst architecture feature presentation at Intel
Developer Forum

CPU wastage by assigning it to execute one
thread or application when another one waits

for data from RAM to arrive. This essentially acts like a dual processor system.

Figure 6: Pentium 4 HT technology illustration

The NetBurst pipeline was 20 stages long. As illustrated in the figure to the right, the BTB
(Branch Target Buffer) helps to define the address of the next micro-op in the trace cache (TC
Nxt IP). Then micro-ops are fetched out of the trace cache (TC Fetch) and are transferred
(Drive) into the RAT (register alias table). After that, the necessary resources are allocated
(such as loading queues, storing buffers etc. (Alloc)), and there comes logic registers rename
(Rename). Micro-ops are put in the Queue until there appears free place in the Schedulers.
There, micro-ops' dependencies are to be solved, and then micro-ops are transferred to the
register files of the corresponding Dispatch Units. There, a micro-op is executed, and Flags are
calculated. When implementing the jump instruction, the real branch address and the predicted

Page | 6
one are to be compared (Branch Check). After that the new
address

is

recorded

in

the

BTB

(Drive).

Northwood and Prescott were later variations with certain
enhancements as illustrated in the diagram above. Processor
specific details are unnecessary.
The next major advancement was the 64 bit
NetBurst released in 2005. The Prescott line up continued
with maximum clock speeds of 3.8 GHz, transistor sizes of
0.09um.

It had 2MB cache and EIST (Enhanced Intel

SpeedStep Technology – allowing dynamic processor clock
speed scaling through software).

EIST was particularly

useful for mobile processors as a lot of power was conserved
when running at low clock speeds.

NetBurst family

continued to grow with the Pentium D (dual core HT
disabled

processors)

and

Pentium

Extreme

Figure 8: The NetBurst Pipeline

Edition

processors (Dual core with HT enabled).

The Core Microarchitecture
The high power consumption and heat intensity, the resulting inability to effectively
increase clock speed, and other shortcomings such as the inefficient pipeline were the primary
reasons for which Intel abandoned the NetBurst microarchitecture and switched to completely
different architectural design, delivering high efficiency through a small pipeline rather than
high clock speeds.
Intel’s solution was the Core microarchitecture released in 2006. The first of these
were sold under the brand name of “Core 2” with duo and quad variants (dual and quad CPUs).

Page | 7
Merom was for
mobile computing,
Conroe was for
desktop

systems,

and

Woodcrest

was

for servers

and workstations.
While
architecturally
identical, the three
processor
differed

lines
in

the

socket used, bus
speed, and power

Figure 9: The Core architecture feature presentation at Intel Developer Forum

consumption. The
diagram below illustrates the Conroe architecture.

14 stage pipeline of the Core
architecture was a trade-off between
long and short pipeline designs. The
architectural

highlights

of

this

generation are given below.
Wide Dynamic Execution referred
to two things. First, the ability of the
processor to fetch, dispatch, execute
and

return

four

instructions

simultaneously. Second, a technique
Figure 10: The Core architecture pipeline

called Macro fusion in which two
x86 instructions could be combined

into a single micro-op to increase performance.

Page | 8
Figure 11: Macro fusion explained at IDF

Figure 12: Power Management capabilities of Core
architecture

In previous generations, the ALU typically broke instructions into two blocks, which resulted
in two micro ops and thus two execution clock cycles. In this generation, Intel extended the
execution width of the ALU and the load/store units to 128 bits, allowing for eight single
precision or four double precision blocks to be processed per cycle. The feature was called
Advanced Digital Media Boost, because it applied to SSE instructions which were utilised by
Multimedia transcoding applications. Intel Advanced Smart Cache referred to the unified
L2 cache that allowed for a large L2 cache to be shared by two processing cores (2 MB or 4
MB). Caching was more effective now because data was no longer stored twice into different
L2 caches any more (no replication). This freed up the system bus from being overloaded with
RAM read/write activity as each core could share data directly through the cache. The Smart
Memory Access feature referred to the inclusion of prefetchers. A prefetcher gets data into a
higher level unit using very speculative algorithms. It is designed to provide data that is very
likely to be requested soon, which can reduce memory access latency and increase efficiency.
The memory prefetchers constantly have a look at memory access patterns, trying to predict if
there is something they could move into the L2 cache from RAM - just in case that data could
be requested next. Intelligent Power Capability was a culmination of many techniques. The
65-nm process provided a good basis for efficient ICs. Clock gating and sleep transistors made
sure that all units as well as single transistors that were not needed remained shut down.
Enhanced SpeedStep still reduced the clock speed when the system was idle or under a low
load and was also capable of controlling each core separately. Some features were also
available such as Execute Disable Bit by which an operating system with support for the bit
may mark certain areas of memory as non-executable. The processor will then refuse to execute
any code residing in these areas of memory. The general technique, known as executable space
Page | 9
protection, is used to prevent certain types of malicious software from taking over computers
by inserting their code into another program's data storage area and running their own code
from within this section; this is known as a buffer overflow attack. It is also to be noted that
HyperThreading was removed.

Tick-Tock Cadence
Since 2007, Intel adopted a "Tick-Tock" model to follow every microarchitectural
change with a die shrink of the process technology. Every "tick" is a shrinking of process
technology of the previous microarchitecture and every "Tock" is a new microarchitecture.
Every year to 18 months, there is expected to be one Tick or Tock.

Figure 13: Intel's new tick tock strategy revealed at IDF

In 2007, the Core microarchitecture underwent a “Tick” to the 45 nm process. Processors were
codenamed Penryn. Process shrinking always brings down energy consumption and improves
power savings.

The Nehalem Microarchitecture
The next Tock was introduced in 2008 with the Nehalem microarchitecture. The
transistor count in this generation was nearing the Billion mark with around 700 million
transistors in the i7. The pipeline frontend and backend are illustrated below.
Page | 10
Backend
Figure 15: Nehalem pipeline frontend

The new changes to the pipeline in this were as
Figure 14: Nehalem pipeline backend

follows:


Loop Stream Detector – detected and cached
loops to prevent fetching instructions from cache
and decoding them again and again



Improved Branch Predictor – Fetched branch

Figure 16: Improved Loop Stream Detector

instructions prior to execution based on an improved prediction algorithm


SSE 4+ - New instructions helpful for operations on database and DNA sequencing
were introduced.

Other changes to the architecture were:


HyperThreading – HT was reintroduced



Turbo Boost – The processor could
intelligently control its clock speed as per
application

requirements

and

thus,

dynamically conserve power. Unlike EIST,
no OS intervention is required.
Figure 17: Nehalem CPU based PC architecture

Page | 11


QPI – QuickPath Interconnect was the new system bus replacing FSB. Intel had moved
the memory controller on to the CPU die.



L3 Cache – shared between all 4 cores

The next tick was in 2010 codenamed Westmere with process shrinking to 32nm.

The SandyBridge Microarchitecture
The next Tock was in 2011 with the SandyBridge microarchitecture also marketed as
2nd generation of i3, i5 and i7 processors. With SandyBridge, Intel surpassed the 1 Billion
transistor count mark. The architectural improvements in this generation can be summarised
in the diagram below:

Figure 18: Sandybridge architecture overview at IDF

Changes to the pipeline were as follows:



A Micro-op Cache - When SB’s fetch hardware grabs a new instruction it first checks
to see if the instruction is in the micro-op cache, if it is then the cache services the rest
of the pipeline and the front end is powered down. The decode hardware is a very
complex part of the x86 pipeline, turning it off saves a significant amount of power.

Page | 12
Backend
Figure 19: Sandybridge pipeline frontend



Redesigned Branch Prediction Unit – SB
caches twice as many branch targets as
Nehalem with much effective and longer
storage of history.



Figure 20: Sandybridge pipeline backend

Physical Register File - A physical register file stores micro-op operands in the register
file; as the micro-op travels down the OoO (Out of Order execution engine) it only
carries pointers to its operands and not the data itself. This significantly reduces the
power of the OoO hardware (moving large amounts of data around a chip eats power),
it also reduces die area further down the pipe. The die savings are translated into a larger
out of order window.



AVX Instruction Set – Advanced Vector Extensions are a group of instructions that
are suitable for floating point intensive calculations in multimedia, scientific and
financial applications. SB features 256 bit operands for this instructions set.

Other changes to the architecture were:


Ring On-Die Interconnect - With Nehalem/Westmere all cores, whether dual, quad or
six of them, had their own private path to the last level (L3) cache. That’s roughly 1000
wires per core. The problem with this approach is that it doesn’t work well for scaling
up in things that need access to the L3 cache. Sandy Bridge adds a GPU and video
transcoding engine on-die that share the L3 cache. Rather than laying out another 2000
wires to the L3 cache Intel introduced a ring bus
Page | 13


On-Die GPU and QuickSync - The Sandy Bridge GPU is on-die built out of the same
32nm transistors as the CPU cores. It gets equal access to the L3 cache. The GPU is
on its own power island and clock domain. The GPU can be powered down or clocked
up independently of the CPU. Graphics turbo is available on both desktop and mobile
parts.

QuickSync is a hardware acceleration technology for video transcoding.

Rendering videos will be faster and more efficient.


Multimedia Transcoding - Media processing in SB is composed
of two major components: video decode, and video encode. The
entire video pipeline is now decoded via fixed function units.
This is in contrast to Intel’s previous design that uses the EU array
for some video decode stages. SB processor power is cut in half
Figure 21: Video transcoding
capabilities of Nehalem

for HD video playback.


More Aggressive Turbo Boost

The next Tick was in 2012 with the IvyBridge microarchitecture. The die was shrinked to
a 22nm process. It was marketed as 3rd generation of i3, i5 and i7 processors. Intel used
FinFET tri-gate transistor structure for the first time. Comparisons of the new structure
released by Intel are provided below.

Figure 22: Typical planar transistor

Figure 23: FinFET Tri-Gate transistor

As the above diagram shows, a FinFET structure or a 3D gate (as Intel calls it) allows
for more control over the channel by maximizing the Gate area. This means high ON current
and extremely low leakage current. This directly translates into lower operating voltages, lower
TDPs and hence higher clock frequencies. Comparisons in terms of delay and operating
voltage between the two structures are shown to the right.
Page | 14
Figure 24: FinFET Delay vs Power

Figure 25: SEM photograph of fabricated FinFET trigate
transistors

A scanning electron microscope image of the actual
transistors fabricated are shown to the right. A single transistor consists of multiple Fins as
parallel conduction paths maximize current flow.

The Haswell Microarchitecture
Ivy Bridge was followed by the next Tock of 2013, the Haswell microarchitecture. It
is currently being marketed as the 4th generation of core i3, i5 and i7 processors.
Changes to the pipeline were as follows:


Wider Execution Unit - adds two more execution ports, one for integer math and
branches (port 6) and one for store address calculation (port 7). The extra ALU and
port does one of two things: either improve performance for integer heavy code, or
allow integer work to continue while FP math occupies ports 0 and 1.



AVX2 and FMA - The other major addition to the execution engine is support for
Intel's AVX2 instructions, including FMA (Fused Multiply-Add). Ports 0 & 1 now
include newly designed 256-bit FMA units. As each FMA operation is effectively two
floating point operations, these two units double the peak floating point throughput of
Haswell compared to Sandy/Ivy Bridge.

Page | 15
Backend
Figure 26: Haswell pipeline frontend

The architectural improvements in this
generation can be summarised as follows:


Improved L3 Cache – The cache
Figure 27: Haswell pipeline backend

bandwidth has been increased and is
now also capable of clocking itself separately from the Cores.


GPU and QuickSync – Notable performance improvements have been made to the ondie GPU.

QuickSync is a hardware acceleration technology for Multimedia

transcoding. Haswell improves on image quality and adds support for certain codecs
such as SVC, Motion JPEG and MPEG2.

Performance Comparisons
Before concluding this document, a comparison of the performance of these processors
has to be illustrated. The following graphs showcase performance improvements of Intel
processors over five generations starting with Conroe all the
way up to Haswell. Processor naming convention is as
illustrated to the right

Page | 16
Figure 28: Performance comparisons of 5 generations of Intel processors

Intel is about half a century old. From the 4004 to the current 4th generation of i7, i5
and i3 processors, a lot has changed in the electronics industry. But this is not the end. This
evolution will continue. Intel’s next Tick will be Broadwell scheduled for this year utilizing
14nm transistor technology.
Page | 17
Shift in Computing Trends
With its powerful x86 architecture, and excellent business strategy, Intel has managed
to dominate the PC market for almost as long as its age. Now, however, market analysts have
noticed a significant new shift in computing trends. More and more customers are losing
interest in the PC and moving towards more mobile computing platforms. The chart below
(Courtesy: Gartner) highlights this shift.

Market Share
5,00,000
4,50,000
4,00,000
3,50,000
3,00,000
2,50,000
2,00,000
1,50,000
1,00,000
50,000
0
2012

2013

PC (Desk and Notebook)

Ultramobile

Tablet

2014
Smartphone (Normalised by 4)

Figure 29: Market share of personal computing devices.

PC sales are beginning to drop as is evident. Meanwhile, the era of tablets and
smartphones is beginning. A common mistake many industry giants make is the lack of
importance they give to such shifts and end up losing it all. It happened with IBM (it lost the
PC market) and Intel will be no exception unless it is careful.

Advanced RISC Machines
The battle for the mainstream processor market has been fought between two main
protagonists, Intel and AMD, while semiconductor manufacturers like Sun and IBM
traditionally concentrated on the more specialist Unix server and workstation markets.
Unnoticed to many, another company rose to a point of dominance, with sales of chips based
on its technology far surpassing those of Intel and AMD combined. That pioneering company
is ARM Holdings, and while it's not a name that's on everyone's lips in the same way that the
Page | 18
'big two' are, indications suggest that this company will continue to go from strength to
strength.
Early 8-bit microprocessors like the Intel 8080 or the Motorola 6800 had only a few
simple instructions. They didn't even have an instruction to multiply two integer numbers, for
example, so this had to be done using long software routines involving multiple shifts and
additions. Working on the belief that hardware was fast but software was slow, subsequent
microprocessor development involved providing processors with more instructions to carry out
ever more complicated functions. Called the CISC (complicated instruction set computer)
approach, this was the philosophy that Intel adopted and that, more or less, is still followed by
today's latest Core i7 processors.
In the early 1980s a radically different philosophy called RISC (reduced instruction set
computer) was conceived. According to this model of computing, processors would have only
a few simple instructions but, as a result of this simplicity, those instructions would be superfast, most of them executing in a single clock cycle. So while much more of the work would
have to be done in the software, an overall gain in performance would be achievable. ARM
was established on this philosophy.
Semiconductor companies usually design their chips and fabricate them at their own
facility (like Intel) or lease it to a foundry such as TSMC. However, ARM designs processors
but neither manufactures silicon chips nor markets ARM-branded hardware. Instead it sells, or
more accurately licences, intellectual property (IP), which allows other semiconductor
companies to manufacture ARM-based hardware. Designs are supplied as a circuit description,
from which the manufacturer creates a physical design to meet the needs of its own
manufacturing processes. It's provided in a hardware description language that provides a
textual definition of how the building blocks connect together. The language used is RTL
(register transfer-level).

System on Chip (SoCs)
A processor is the large component that forms the heart of the PC. A core, on the other
hand, is the heart of a microprocessor that semiconductor manufacturers can build into their
own custom chip designs. That customised chip will often be much more than what most people
would think of as a processor, and could provide a significant proportion of the functionality
required in a particular device. Referred to as a system on chip (SoC) design, this type of chip
Page | 19
minimises the number of components, which, in turn, keeps down both the cost and the size of
the circuit board, both of which are essential for high volume portable products such as
smartphones.
ARM powered SoCs are included in games consoles, personal media players, set-top
boxes, internet radios, home automation systems, GPS receivers, ebook readers, TVs, DVD
and Blu-ray players, digital cameras and home media servers. Cheaper, less powerful chips
are found in home products, including toys, cordless phones and even coffee makers. They're
even used in cars to drive dashboard displays, anti-lock breaking, airbags and other safetyrelated systems, and for engine management. Also, healthcare products is a major growth area

Figure 30: A smartphone SoC; Qualcomm's OMAP

over the last five years, with products varying from remote patient monitoring systems to
medical imaging scanners. ARM devices are used extensively in hard disk and solid state
drives. They also crop up in wireless keyboards, and are used as the driving force behind
printers and networking devices like wireless router/access points.
Modern SoCs also come with advanced (DirectX-9 equivalent) graphics capabilities
that can surpass game consoles like the Nintendo Wii. Imagination Technologies, which was
once known in the PC world with its “PowerVR” graphics cards, licenses its graphics
processors designs to many SoC makers, including Samsung, Apple and many more. Others
like Qualcomm or NVIDIA design their own graphics architecture. Qualcomm markets its
Page | 20
products under the OMAP series. NVIDIA markets under Tegra brand and other companies
such as Apple market theirs as A series. HTC, LG, Nokia and other smartphone manufacturers
do not design their own SoCs but use the above mentioned.
Finally, SoCs come with a myriad of smaller co-processors that are critical to overall
system performance. The video encoding and decoding hardware powers the video
functionality of smartphones. The image processor ensures that photos are processed properly
and saved quickly and the audio processor frees the CPU(s) from having to work on audio
signals. Together, all those components -and their associated drivers/software- define the
overall performance of a system.

Figure 31: A SoC for tablet; Nvidia TEGRA

Page | 21
Conclusion
Computers have truly revolutionized our world and have changed the way we work,
communicate and entertain ourselves. Fuelled by constant innovations in chip design and
transistor technology this evolution doesn’t seem to be bothered to stop. In recent years, there
have been tremendous shifts in computing trends with mobile computers such as tablets and
smartphones becoming more and more preferable, possibly, due to lowering costs and prices.
While computing did start with the microprocessor, it is headed towards a scheme that
incorporates the microprocessor as a smaller subset of a larger system. One that incorporates
graphics, memory, modem and video transcoding co processors on a single chip. The SoC era
has begun…

Page | 22
References
[1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic
Architecture, [online] Available: http://www.intel.com/products/processor/manuals
[2] King, J. ; Quinnell, E. ; Galloway, F. ; Patton, K. ; Seidel, P. ; Dinh, J. ; Hai Bui and
Bhowmik, A., "The Floating-Point Unit of the Jaguar x86 Core," in 21st IEEE
Symposium on Computer Arithmetic (ARITH), 2013, pp. 7-16.
[3] Ibrahim, A.H. ; Abdelhalim, M.B. ; Hussein, H. ; Fahmy, A., "Analysis of x86
instruction set usage for Windows 7 applications," in 2nd International Conference on
Computer Technology and Development (ICCTD), 2010, pp. 511-516.
[4] PC Architecture, Acid Reviews, [online] 2014,
http://acidreviews.blogspot.in/2008/12/pc-architecture.html (Accessed: 2nd February
2014).
[5] Alpert, D. and Avnon, D., "Architecture of the Pentium microprocessor," IEEE
Micro, vol. 13, Issue 3, pp. 11-21, 1993.
[6] Computer Processor History, Computer Hope, [online] 2014,
http://www.computerhope.com/history/processor.htm (Accessed: 2nd February 2014).
[7] Gartner Press Release, Gartner Analyst, [online] 2014,
http://www.gartner.com/newsroom/id/2610015 (Accessed: 8th February 2014).
[8] Intel Processor Number, CPU World, [online] 2014, http://www.cpuworld.com/info/Intel/processor-number.html (Accessed: 9th February 2014).

Contenu connexe

Tendances

Brochure (2016-01-30)
Brochure (2016-01-30)Brochure (2016-01-30)
Brochure (2016-01-30)
Jonah McLeod
 
01 intel processor architecture core
01 intel processor architecture core01 intel processor architecture core
01 intel processor architecture core
sssuhas
 
Sample_HEngineering
Sample_HEngineeringSample_HEngineering
Sample_HEngineering
Zachary Job
 
The computer generations
The computer generationsThe computer generations
The computer generations
Suman Mnv
 

Tendances (20)

2nd generation of computer
2nd generation of computer2nd generation of computer
2nd generation of computer
 
Branrel Santos Powerpoint Presentation
Branrel Santos  Powerpoint PresentationBranrel Santos  Powerpoint Presentation
Branrel Santos Powerpoint Presentation
 
3rd generation computer
3rd generation computer3rd generation computer
3rd generation computer
 
Microprocesser
MicroprocesserMicroprocesser
Microprocesser
 
Intel Processors
Intel ProcessorsIntel Processors
Intel Processors
 
Brochure (2016-01-30)
Brochure (2016-01-30)Brochure (2016-01-30)
Brochure (2016-01-30)
 
Generations of computer
Generations of computerGenerations of computer
Generations of computer
 
Difference between soc and single board computer ppt1
Difference between soc and single board computer ppt1Difference between soc and single board computer ppt1
Difference between soc and single board computer ppt1
 
Five generations-of-computers
Five generations-of-computersFive generations-of-computers
Five generations-of-computers
 
01 intel processor architecture core
01 intel processor architecture core01 intel processor architecture core
01 intel processor architecture core
 
L15 micro evlutn
L15 micro evlutnL15 micro evlutn
L15 micro evlutn
 
My amazing journey from mainframes to smartphones chm lecture aug 2014 final
My amazing journey from mainframes to smartphones  chm lecture aug 2014 finalMy amazing journey from mainframes to smartphones  chm lecture aug 2014 final
My amazing journey from mainframes to smartphones chm lecture aug 2014 final
 
Sample_HEngineering
Sample_HEngineeringSample_HEngineering
Sample_HEngineering
 
The evolution of computers
The evolution of computersThe evolution of computers
The evolution of computers
 
Industrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric computeIndustrial trends in heterogeneous and esoteric compute
Industrial trends in heterogeneous and esoteric compute
 
Evolution of Computer
Evolution of Computer Evolution of Computer
Evolution of Computer
 
MYC-YA15XC-T CPU Module
MYC-YA15XC-T CPU ModuleMYC-YA15XC-T CPU Module
MYC-YA15XC-T CPU Module
 
The computer generations
The computer generationsThe computer generations
The computer generations
 
Journey of computing
Journey of computingJourney of computing
Journey of computing
 
MYD-YA15XC-T Development Board
MYD-YA15XC-T Development BoardMYD-YA15XC-T Development Board
MYD-YA15XC-T Development Board
 

Similaire à Evolution of Computing Microprocessors and SoCs

Mother board & Processor
Mother board & ProcessorMother board & Processor
Mother board & Processor
Praveen Vs
 
Ba401 Intel Corporation
Ba401 Intel CorporationBa401 Intel Corporation
Ba401 Intel Corporation
BA401NU
 
02 Computer Evolution And Performance
02  Computer  Evolution And  Performance02  Computer  Evolution And  Performance
02 Computer Evolution And Performance
Jeanie Delos Arcos
 

Similaire à Evolution of Computing Microprocessors and SoCs (20)

MICROCONTROLLRES NOTES.pdf
MICROCONTROLLRES NOTES.pdfMICROCONTROLLRES NOTES.pdf
MICROCONTROLLRES NOTES.pdf
 
Microprocessor
MicroprocessorMicroprocessor
Microprocessor
 
CPU HISTORY MUKUND
CPU HISTORY MUKUNDCPU HISTORY MUKUND
CPU HISTORY MUKUND
 
Microprocessors
MicroprocessorsMicroprocessors
Microprocessors
 
Mother board & Processor
Mother board & ProcessorMother board & Processor
Mother board & Processor
 
DileepB EDPS talk 2015
DileepB  EDPS talk 2015DileepB  EDPS talk 2015
DileepB EDPS talk 2015
 
Microprocessors and Applications
Microprocessors and ApplicationsMicroprocessors and Applications
Microprocessors and Applications
 
Computer Evolution
Computer EvolutionComputer Evolution
Computer Evolution
 
8085 micro processor- notes
8085 micro  processor- notes8085 micro  processor- notes
8085 micro processor- notes
 
Microprocessor and Positive and Negative Logic
Microprocessor and Positive and Negative LogicMicroprocessor and Positive and Negative Logic
Microprocessor and Positive and Negative Logic
 
Ba401 Intel Corporation
Ba401 Intel CorporationBa401 Intel Corporation
Ba401 Intel Corporation
 
Micro Lec 1
Micro Lec 1Micro Lec 1
Micro Lec 1
 
Computer project
Computer projectComputer project
Computer project
 
Intel i3 processor
 Intel i3 processor Intel i3 processor
Intel i3 processor
 
History of microprocessors copy
History of microprocessors   copyHistory of microprocessors   copy
History of microprocessors copy
 
Organisasi dan arsitektur komputer 2
Organisasi dan arsitektur komputer   2Organisasi dan arsitektur komputer   2
Organisasi dan arsitektur komputer 2
 
Generation of computer
Generation of computerGeneration of computer
Generation of computer
 
Generation of computer
Generation of computerGeneration of computer
Generation of computer
 
02 Computer Evolution And Performance
02  Computer  Evolution And  Performance02  Computer  Evolution And  Performance
02 Computer Evolution And Performance
 
Comp generations 09
Comp generations 09Comp generations 09
Comp generations 09
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Dernier (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

Evolution of Computing Microprocessors and SoCs

  • 1. Evolution of Personal Computing by Microprocessors and SoCs For Credit Seminar: EEC7203 (Internal Assessment) Submitted To Dr. T. Shanmuganantham Associate Professor, Department of Electronics Engineering Azmath Moosa Reg No: 13304006 M. Tech 1st Yr Department of Electronics Engineering, School of Engg & Tech, Pondicherry University
  • 2. Abstract Throughout history, new and improved technologies have transformed the human experience. In the 20th century, the pace of change sped up radically as we entered the computing age. For nearly 40 years the Microprocessor driven by innovations of companies like Intel have continuously created new possibilities in the lives of people around the world. In this paper, I hope to capture the evolution of this amazing device that has raised computing to a whole new level and made it relevant in all fields – Engineering, Research, Medical, Academia, Businesses, Manufacturing, Commuting etc. I will highlight the significant strides made in each generation of Processors and the remarkable ways in which engineers overcame seemingly unsurmountable challenges and continued to push the evolution to where it is today. Page | i
  • 3. Table of Contents Title Page No. 1. Abstract i 2. Table of Contents ii 3. List of Figures iii 4. Introduction 1 5. X86 and birth of the PC 2 6. The Pentium 3 7. Pipelined Design 4 8. The Pentium 4 5 9. The Core Microarchitecture 7 10. Tick Tock Cadence 10 11. The Nehalem Microarchitecture 10 12. The SandyBridge Microarchitecture 12 13. The Haswell Microarchitecture 15 14. Performance Comparison 16 15. Shift in Computing Trends 18 16. Advanced RISC Machines 18 17. System on Chip (SoC) 19 18. Conclusion 22 19. References Page | ii
  • 4. List of Figures Figure 1: 4004 Layout Figure 2: Pentium Chip Figure 3: Pentium CPU based PC architecture Figure 4: Pentium 2 logo Figure 5: Pentium 3 logo Figure 6: Pentium 4 HT technology illustration Figure 7: NetBurst architecture feature presentation at Intel Developer Forum Figure 8: The NetBurst Pipeline Figure 9: The Core architecture feature presentation at Intel Developer Forum Figure 10: The Core architecture pipeline Figure 11: Macro fusion explained at IDF Figure 12: Power Management capabilities of Core architecture Figure 13: Intel's new tick tock strategy revealed at IDF Figure 14: Nehalem pipeline backend Figure 15: Nehalem pipeline frontend Figure 16: Improved Loop Stream Detector Figure 17: Nehalem CPU based PC architecture Figure 18: Sandybridge architecture overview at IDF Figure 19: Sandybridge pipeline frontend Figure 20: Sandybridge pipeline backend Figure 21: Video transcoding capabilities of Nehalem Figure 22: Typical planar transistor Figure 23: FinFET Tri-Gate transistor Figure 24: FinFET Delay vs Power Figure 25: SEM photograph of fabricated FinFET trigate transistors Figure 26: Haswell pipeline frontend Figure 27: Haswell pipeline backend Figure 28: Performance comparisons of 5 generations of Intel processors Figure 29: Market share of personal computing devices. Figure 30: A smartphone SoC; Qualcomm's OMAP Figure 31: A SoC for tablet; Nvidia TEGRA 1 3 4 4 4 6 6 7 8 8 9 9 10 11 11 11 11 12 13 13 14 14 14 15 15 16 16 17 18 20 21 Page | iii
  • 5. Introduction In 1969, Intel was found with aim of manufacturing memory devices. Their first product was Shottky TTL bipolar SRAM memory chip. A Japanese company – Nippon Calculating Machine Corporation approached Intel to design 12 custom chips for its new calculator. Intel engineers suggested a family of just four chips, including one that could be programmed for use in a variety of products. Intel designed a set of four chips known as the MCS-4. It included a central processing unit (CPU) chip—the 4004—as well as a supporting read-only memory (ROM) chip for the custom applications programs, a random-access memory (RAM) chip for processing data, and a shift-register chip for the input/output (I/O) Figure 1: 4004 Layout port. MCS-4 was a "building block" that engineers could purchase and then customize with software to perform different functions in a wide variety of electronic devices. And thus, the industry of the Microprocessor was born. 4004 had 2,300 pMOS transistors at 10um and was clocked at 740 kHz. 4 pins were multiplexed for both address and data (16 pin IC). In the very next year, the 8008 was introduced. It was an 8 bit processor clocked at 500 kHz with 3,500 pMOS transistors at the same 10um. It was actually slower with 0.05 MIPS (Millions of instructions per second) as compared to 4004 with 0.07. It was in 1974, that the 8080 with 10 times the performance of 8008 with a different transistor technology was launched. It used 4,500 NMOS transistors of size 6um. It was clocked at 2 MHz with a whopping 0.29 MIPS. Finally in March 1976, the 8085 clocked at 3 MHz with yet another newer transistor technology - depletion type NMOS transistors of size 3 um was launched. It was capable of 0.37 MIPS. The 8085 was a popular device of its time and is still used in universities across the globe to introduce students to microprocessors. Page | 1
  • 6. x86 and birth of the PC The 8086 16 bit processor made its debut in 1978. New techniques such as that of memory segmentation into banks to extend capacity and Pipelining to speed up execution were introduced. It was designed to be compatible with 8085 Assembly Mnemonics. It had 29,000 transistors of 3um channel length and was clocked at 5, 8 and 10 MHz with a full 0.75 MIPS at maximum clock. It was the father of what is now known as the x86 Architecture which eventually turned out to be Intel’s most successful line of processors that power many computing devices even today. Introduced soon after was the processor that powered the first PC – the 8088. Clocked at 5-8 MHz with 0.33-0.66 MIPS, it was 8086 with an external 8 bit bus. In 1981, a revolution seized the computer industry stirred by the IBM PC. By the late '70s, personal computers were available from many vendors, such as Tandy, Commodore, TI and Apple. Computers from different vendors were not compatible. Each vendor had their own architecture, their own operating system, their own bus interface, and their own software. Backed by IBM's marketing might and name recognition, the IBM PC quickly captured the bulk of the market. Other vendors either left the PC market (TI), pursued niche markets (Commodore, Apple) or abandoned their own architecture in favor of IBM's (Tandy). With a market share approaching 90%, the PC became a de-facto standard. Software houses wrote operating systems (MicroSoft DOS, Digital Research DOS), spread sheets (Lotus 123), word processors (WordPerfect, WordStar) and compilers (MicroSoft C, Borland C) that ran on the PC. Hardware vendors built disk drives, printers and data acquisition systems that connected to the PC's external bus. Although IBM initially captured the PC market, it subsequently lost it to clone vendors. Accustomed to being a monopoly supplier of mainframe computers, IBM was unprepared for the fierce competition that arose as Compaq, Leading Edge, AT&T, Dell, ALR, AST, Ampro, Diversified Technologies and others all vied for a share of the PC market. Besides low prices and high performance, the clone vendors provided one other very important thing to the PC market: an absolute hardware standard. In order to sell a PC clone, the manufacturer had to be able to guarantee that it would run all of the customer's existing PC software, and work with all of the customer's existing peripheral hardware. The only way to do this was to design the clone to be identical to the original IBM PC at the register level. Thus, the standard that the IBM PC defined became graven in stone as dozens of clone vendors shipped millions of machines that conformed to it in every detail. This standardization has been an important factor in the low cost and wide availability of PC systems. Page | 2
  • 7. 8086 and 80186/88 were limited to addressing 1M of memory. Thus, the PC was also limited to this range. This limitation was increased to 16 MB by 80286 released in 1982. It had max clock of 16 MHz with more than 2 MIPS. It had 134,000 transistors at 1.5um. The processors and the PC up to this point were all 16 bit. The 80386 range of processors, released in 1985, were the first 32 bit processors to be used in the PC. The first of these had 275,000 transistors at 1um and was clocked at 33 MHz with 5.1 MIPS. Its addressing range could be virtually 32 GB. Over the next few years, Intel modified the architecture and provided some improvements in terms of memory addressing range and clock speed. The 80486 range of processors, released in 1989, brought significant advancements in computing capability with a whopping 41 MIPS for a processor clocked at 50 MHz with 1.2 million transistors at 0.8 um or 800 nm. It had a new technique to speed up RAM read/writes with the Cache memory. It was integrated onto the CPU die and was referred to as level 1 or L1 cache (as opposed to the L2 cache available in the motherboard). As with the previous series, Intel slightly modified the architecture and released higher clocked versions over the next few years. The Pentium The Intel Pentium microprocessor was introduced in 1993. Its microarchitecture, dubbed P5, was Intel's fifth- generation and first 32 bit superscalar microarchitecture. Superscalar architecture is one in which multiple execution units or functional units (such as adders, shifters and multipliers) are provided and operate in parallel. As a direct extension of the 80486 architecture, it included dual integer pipelines, a faster floating-point unit, wider data bus, separate code and data caches and features for further reduced address Figure 2: Pentium Chip calculation latency. In 1996, the Pentium with MMX Technology (often simply referred to as Pentium MMX) was introduced with the same basic microarchitecture complemented with an MMX instruction set, larger caches, and some other enhancements. The Pentium was based on 0.8 um process technology, involved 3.1 million transistors and was clocked at 60 MHz with 100 MIPS. The Pentium was truly capable of addressing 4 GB of RAM without any operating system based virtualization. Page | 3
  • 8. The next microarchitecture was the P6 or the Pentium Pro released in 1995. It had an integrated L2 cache. One major change Intel brought to the PC architecture was the presence of FSB (Front Side Bus) that managed the CPU’s communications with the RAM and other IO. RAM and Graphics card were high speed peripherals and were interfaced through the Northbridge. Other IO devices like keyboard and speakers were interfaced Figure 3: Pentium CPU based PC architecture through the Southbridge. Pentium II followed it soon in 1997. It had MMX, improved 16 bit performance and had double the L2 cache. Pentium II had 7.5 million transistors starting with 0.35um process technology but later revisions utilised 0.25um transistors. Figure 4: Pentium 2 logo The Pentium III followed in 1999 with 9.5 million 0.25um transistors and a new instruction set SSE (Streaming SIMD Extensions) that assisted DSP and graphics processing. Intel was able to push the clock speed higher and higher with Pentium III with some variants clocked as high as 1 GHz. Figure 5: Pentium 3 logo Pipelined Design At a high level the goal of a CPU is to grab instructions from memory and execute those instructions. All of the tricks and improvements we see from one generation to the next just help to accomplish that goal faster. The assembly line analogy for a pipelined microprocessor is over used but that's because it is quite accurate. Rather than seeing one instruction worked on at a time, modern processors Page | 4
  • 9. feature an assembly line of steps that breaks up the grab/execute process to allow for higher throughput. The basic pipeline is as follows: fetch, decode, execute, and commit to memory. One would first fetch the next instruction from memory (there's a counter and pointer that tells the CPU where to find the next instruction). One would then decode that instruction into an internally understood format (this is key to enabling backwards compatibility). Next one would execute the instruction (this stage, like most here, is split up into fetching data needed by the instruction among other things). Finally one would commit the results of that instruction to memory and start the process over again. Modern CPU pipelines feature many more stages than what've been outlined above. Pipelines are divided into two halves. Frontend and Backend. The front end is responsible for fetching and decoding instructions, while the back end deals with executing them. The division between the two halves of the CPU pipeline also separates the part of the pipeline that must execute in order from the part that can execute out of order. Instructions have to be fetched and completed in program order (can't click Print until you click File first), but they can be executed in any order possible so long as the result is correct. Many instructions are either dependent on one another (e.g. C=A+B followed by E=C+D) or they need data that's not immediately available and has to be fetched from main memory (a process that can take hundreds of cycles, or an eternity in the eyes of the processor). Being able to reorder instructions before they're executed allows the processor to keep doing work rather than just sitting around waiting. This document aims to highlight changes to the x86 pipeline with each generation of processors. The Pentium 4 The NetBurst microarchitecture started with Pentium 4. This line of processors started in 2000 clocked at 1.4 GHz, 42 million transistors at 0.18 um process size and SSE2 instruction set. The early variants were codenamed Willamette (1.9 to 2.0 GHz) and later ones Northwood (up to 3.0 GHz) and Prescott. Page | 5
  • 10. The diagram is from Intel feature presentation of the NetBurst architecture. The Willamette was an early variant with SSE2, Rapid Execution engine (in which ALUs operate at twice the core clock frequency) and Instruction Trace Cache (ITC cached decoded instructions for faster loop execution). HT Technology refers to the prevention of Figure 7: NetBurst architecture feature presentation at Intel Developer Forum CPU wastage by assigning it to execute one thread or application when another one waits for data from RAM to arrive. This essentially acts like a dual processor system. Figure 6: Pentium 4 HT technology illustration The NetBurst pipeline was 20 stages long. As illustrated in the figure to the right, the BTB (Branch Target Buffer) helps to define the address of the next micro-op in the trace cache (TC Nxt IP). Then micro-ops are fetched out of the trace cache (TC Fetch) and are transferred (Drive) into the RAT (register alias table). After that, the necessary resources are allocated (such as loading queues, storing buffers etc. (Alloc)), and there comes logic registers rename (Rename). Micro-ops are put in the Queue until there appears free place in the Schedulers. There, micro-ops' dependencies are to be solved, and then micro-ops are transferred to the register files of the corresponding Dispatch Units. There, a micro-op is executed, and Flags are calculated. When implementing the jump instruction, the real branch address and the predicted Page | 6
  • 11. one are to be compared (Branch Check). After that the new address is recorded in the BTB (Drive). Northwood and Prescott were later variations with certain enhancements as illustrated in the diagram above. Processor specific details are unnecessary. The next major advancement was the 64 bit NetBurst released in 2005. The Prescott line up continued with maximum clock speeds of 3.8 GHz, transistor sizes of 0.09um. It had 2MB cache and EIST (Enhanced Intel SpeedStep Technology – allowing dynamic processor clock speed scaling through software). EIST was particularly useful for mobile processors as a lot of power was conserved when running at low clock speeds. NetBurst family continued to grow with the Pentium D (dual core HT disabled processors) and Pentium Extreme Figure 8: The NetBurst Pipeline Edition processors (Dual core with HT enabled). The Core Microarchitecture The high power consumption and heat intensity, the resulting inability to effectively increase clock speed, and other shortcomings such as the inefficient pipeline were the primary reasons for which Intel abandoned the NetBurst microarchitecture and switched to completely different architectural design, delivering high efficiency through a small pipeline rather than high clock speeds. Intel’s solution was the Core microarchitecture released in 2006. The first of these were sold under the brand name of “Core 2” with duo and quad variants (dual and quad CPUs). Page | 7
  • 12. Merom was for mobile computing, Conroe was for desktop systems, and Woodcrest was for servers and workstations. While architecturally identical, the three processor differed lines in the socket used, bus speed, and power Figure 9: The Core architecture feature presentation at Intel Developer Forum consumption. The diagram below illustrates the Conroe architecture. 14 stage pipeline of the Core architecture was a trade-off between long and short pipeline designs. The architectural highlights of this generation are given below. Wide Dynamic Execution referred to two things. First, the ability of the processor to fetch, dispatch, execute and return four instructions simultaneously. Second, a technique Figure 10: The Core architecture pipeline called Macro fusion in which two x86 instructions could be combined into a single micro-op to increase performance. Page | 8
  • 13. Figure 11: Macro fusion explained at IDF Figure 12: Power Management capabilities of Core architecture In previous generations, the ALU typically broke instructions into two blocks, which resulted in two micro ops and thus two execution clock cycles. In this generation, Intel extended the execution width of the ALU and the load/store units to 128 bits, allowing for eight single precision or four double precision blocks to be processed per cycle. The feature was called Advanced Digital Media Boost, because it applied to SSE instructions which were utilised by Multimedia transcoding applications. Intel Advanced Smart Cache referred to the unified L2 cache that allowed for a large L2 cache to be shared by two processing cores (2 MB or 4 MB). Caching was more effective now because data was no longer stored twice into different L2 caches any more (no replication). This freed up the system bus from being overloaded with RAM read/write activity as each core could share data directly through the cache. The Smart Memory Access feature referred to the inclusion of prefetchers. A prefetcher gets data into a higher level unit using very speculative algorithms. It is designed to provide data that is very likely to be requested soon, which can reduce memory access latency and increase efficiency. The memory prefetchers constantly have a look at memory access patterns, trying to predict if there is something they could move into the L2 cache from RAM - just in case that data could be requested next. Intelligent Power Capability was a culmination of many techniques. The 65-nm process provided a good basis for efficient ICs. Clock gating and sleep transistors made sure that all units as well as single transistors that were not needed remained shut down. Enhanced SpeedStep still reduced the clock speed when the system was idle or under a low load and was also capable of controlling each core separately. Some features were also available such as Execute Disable Bit by which an operating system with support for the bit may mark certain areas of memory as non-executable. The processor will then refuse to execute any code residing in these areas of memory. The general technique, known as executable space Page | 9
  • 14. protection, is used to prevent certain types of malicious software from taking over computers by inserting their code into another program's data storage area and running their own code from within this section; this is known as a buffer overflow attack. It is also to be noted that HyperThreading was removed. Tick-Tock Cadence Since 2007, Intel adopted a "Tick-Tock" model to follow every microarchitectural change with a die shrink of the process technology. Every "tick" is a shrinking of process technology of the previous microarchitecture and every "Tock" is a new microarchitecture. Every year to 18 months, there is expected to be one Tick or Tock. Figure 13: Intel's new tick tock strategy revealed at IDF In 2007, the Core microarchitecture underwent a “Tick” to the 45 nm process. Processors were codenamed Penryn. Process shrinking always brings down energy consumption and improves power savings. The Nehalem Microarchitecture The next Tock was introduced in 2008 with the Nehalem microarchitecture. The transistor count in this generation was nearing the Billion mark with around 700 million transistors in the i7. The pipeline frontend and backend are illustrated below. Page | 10
  • 15. Backend Figure 15: Nehalem pipeline frontend The new changes to the pipeline in this were as Figure 14: Nehalem pipeline backend follows:  Loop Stream Detector – detected and cached loops to prevent fetching instructions from cache and decoding them again and again  Improved Branch Predictor – Fetched branch Figure 16: Improved Loop Stream Detector instructions prior to execution based on an improved prediction algorithm  SSE 4+ - New instructions helpful for operations on database and DNA sequencing were introduced. Other changes to the architecture were:  HyperThreading – HT was reintroduced  Turbo Boost – The processor could intelligently control its clock speed as per application requirements and thus, dynamically conserve power. Unlike EIST, no OS intervention is required. Figure 17: Nehalem CPU based PC architecture Page | 11
  • 16.  QPI – QuickPath Interconnect was the new system bus replacing FSB. Intel had moved the memory controller on to the CPU die.  L3 Cache – shared between all 4 cores The next tick was in 2010 codenamed Westmere with process shrinking to 32nm. The SandyBridge Microarchitecture The next Tock was in 2011 with the SandyBridge microarchitecture also marketed as 2nd generation of i3, i5 and i7 processors. With SandyBridge, Intel surpassed the 1 Billion transistor count mark. The architectural improvements in this generation can be summarised in the diagram below: Figure 18: Sandybridge architecture overview at IDF Changes to the pipeline were as follows:  A Micro-op Cache - When SB’s fetch hardware grabs a new instruction it first checks to see if the instruction is in the micro-op cache, if it is then the cache services the rest of the pipeline and the front end is powered down. The decode hardware is a very complex part of the x86 pipeline, turning it off saves a significant amount of power. Page | 12
  • 17. Backend Figure 19: Sandybridge pipeline frontend  Redesigned Branch Prediction Unit – SB caches twice as many branch targets as Nehalem with much effective and longer storage of history.  Figure 20: Sandybridge pipeline backend Physical Register File - A physical register file stores micro-op operands in the register file; as the micro-op travels down the OoO (Out of Order execution engine) it only carries pointers to its operands and not the data itself. This significantly reduces the power of the OoO hardware (moving large amounts of data around a chip eats power), it also reduces die area further down the pipe. The die savings are translated into a larger out of order window.  AVX Instruction Set – Advanced Vector Extensions are a group of instructions that are suitable for floating point intensive calculations in multimedia, scientific and financial applications. SB features 256 bit operands for this instructions set. Other changes to the architecture were:  Ring On-Die Interconnect - With Nehalem/Westmere all cores, whether dual, quad or six of them, had their own private path to the last level (L3) cache. That’s roughly 1000 wires per core. The problem with this approach is that it doesn’t work well for scaling up in things that need access to the L3 cache. Sandy Bridge adds a GPU and video transcoding engine on-die that share the L3 cache. Rather than laying out another 2000 wires to the L3 cache Intel introduced a ring bus Page | 13
  • 18.  On-Die GPU and QuickSync - The Sandy Bridge GPU is on-die built out of the same 32nm transistors as the CPU cores. It gets equal access to the L3 cache. The GPU is on its own power island and clock domain. The GPU can be powered down or clocked up independently of the CPU. Graphics turbo is available on both desktop and mobile parts. QuickSync is a hardware acceleration technology for video transcoding. Rendering videos will be faster and more efficient.  Multimedia Transcoding - Media processing in SB is composed of two major components: video decode, and video encode. The entire video pipeline is now decoded via fixed function units. This is in contrast to Intel’s previous design that uses the EU array for some video decode stages. SB processor power is cut in half Figure 21: Video transcoding capabilities of Nehalem for HD video playback.  More Aggressive Turbo Boost The next Tick was in 2012 with the IvyBridge microarchitecture. The die was shrinked to a 22nm process. It was marketed as 3rd generation of i3, i5 and i7 processors. Intel used FinFET tri-gate transistor structure for the first time. Comparisons of the new structure released by Intel are provided below. Figure 22: Typical planar transistor Figure 23: FinFET Tri-Gate transistor As the above diagram shows, a FinFET structure or a 3D gate (as Intel calls it) allows for more control over the channel by maximizing the Gate area. This means high ON current and extremely low leakage current. This directly translates into lower operating voltages, lower TDPs and hence higher clock frequencies. Comparisons in terms of delay and operating voltage between the two structures are shown to the right. Page | 14
  • 19. Figure 24: FinFET Delay vs Power Figure 25: SEM photograph of fabricated FinFET trigate transistors A scanning electron microscope image of the actual transistors fabricated are shown to the right. A single transistor consists of multiple Fins as parallel conduction paths maximize current flow. The Haswell Microarchitecture Ivy Bridge was followed by the next Tock of 2013, the Haswell microarchitecture. It is currently being marketed as the 4th generation of core i3, i5 and i7 processors. Changes to the pipeline were as follows:  Wider Execution Unit - adds two more execution ports, one for integer math and branches (port 6) and one for store address calculation (port 7). The extra ALU and port does one of two things: either improve performance for integer heavy code, or allow integer work to continue while FP math occupies ports 0 and 1.  AVX2 and FMA - The other major addition to the execution engine is support for Intel's AVX2 instructions, including FMA (Fused Multiply-Add). Ports 0 & 1 now include newly designed 256-bit FMA units. As each FMA operation is effectively two floating point operations, these two units double the peak floating point throughput of Haswell compared to Sandy/Ivy Bridge. Page | 15
  • 20. Backend Figure 26: Haswell pipeline frontend The architectural improvements in this generation can be summarised as follows:  Improved L3 Cache – The cache Figure 27: Haswell pipeline backend bandwidth has been increased and is now also capable of clocking itself separately from the Cores.  GPU and QuickSync – Notable performance improvements have been made to the ondie GPU. QuickSync is a hardware acceleration technology for Multimedia transcoding. Haswell improves on image quality and adds support for certain codecs such as SVC, Motion JPEG and MPEG2. Performance Comparisons Before concluding this document, a comparison of the performance of these processors has to be illustrated. The following graphs showcase performance improvements of Intel processors over five generations starting with Conroe all the way up to Haswell. Processor naming convention is as illustrated to the right Page | 16
  • 21. Figure 28: Performance comparisons of 5 generations of Intel processors Intel is about half a century old. From the 4004 to the current 4th generation of i7, i5 and i3 processors, a lot has changed in the electronics industry. But this is not the end. This evolution will continue. Intel’s next Tick will be Broadwell scheduled for this year utilizing 14nm transistor technology. Page | 17
  • 22. Shift in Computing Trends With its powerful x86 architecture, and excellent business strategy, Intel has managed to dominate the PC market for almost as long as its age. Now, however, market analysts have noticed a significant new shift in computing trends. More and more customers are losing interest in the PC and moving towards more mobile computing platforms. The chart below (Courtesy: Gartner) highlights this shift. Market Share 5,00,000 4,50,000 4,00,000 3,50,000 3,00,000 2,50,000 2,00,000 1,50,000 1,00,000 50,000 0 2012 2013 PC (Desk and Notebook) Ultramobile Tablet 2014 Smartphone (Normalised by 4) Figure 29: Market share of personal computing devices. PC sales are beginning to drop as is evident. Meanwhile, the era of tablets and smartphones is beginning. A common mistake many industry giants make is the lack of importance they give to such shifts and end up losing it all. It happened with IBM (it lost the PC market) and Intel will be no exception unless it is careful. Advanced RISC Machines The battle for the mainstream processor market has been fought between two main protagonists, Intel and AMD, while semiconductor manufacturers like Sun and IBM traditionally concentrated on the more specialist Unix server and workstation markets. Unnoticed to many, another company rose to a point of dominance, with sales of chips based on its technology far surpassing those of Intel and AMD combined. That pioneering company is ARM Holdings, and while it's not a name that's on everyone's lips in the same way that the Page | 18
  • 23. 'big two' are, indications suggest that this company will continue to go from strength to strength. Early 8-bit microprocessors like the Intel 8080 or the Motorola 6800 had only a few simple instructions. They didn't even have an instruction to multiply two integer numbers, for example, so this had to be done using long software routines involving multiple shifts and additions. Working on the belief that hardware was fast but software was slow, subsequent microprocessor development involved providing processors with more instructions to carry out ever more complicated functions. Called the CISC (complicated instruction set computer) approach, this was the philosophy that Intel adopted and that, more or less, is still followed by today's latest Core i7 processors. In the early 1980s a radically different philosophy called RISC (reduced instruction set computer) was conceived. According to this model of computing, processors would have only a few simple instructions but, as a result of this simplicity, those instructions would be superfast, most of them executing in a single clock cycle. So while much more of the work would have to be done in the software, an overall gain in performance would be achievable. ARM was established on this philosophy. Semiconductor companies usually design their chips and fabricate them at their own facility (like Intel) or lease it to a foundry such as TSMC. However, ARM designs processors but neither manufactures silicon chips nor markets ARM-branded hardware. Instead it sells, or more accurately licences, intellectual property (IP), which allows other semiconductor companies to manufacture ARM-based hardware. Designs are supplied as a circuit description, from which the manufacturer creates a physical design to meet the needs of its own manufacturing processes. It's provided in a hardware description language that provides a textual definition of how the building blocks connect together. The language used is RTL (register transfer-level). System on Chip (SoCs) A processor is the large component that forms the heart of the PC. A core, on the other hand, is the heart of a microprocessor that semiconductor manufacturers can build into their own custom chip designs. That customised chip will often be much more than what most people would think of as a processor, and could provide a significant proportion of the functionality required in a particular device. Referred to as a system on chip (SoC) design, this type of chip Page | 19
  • 24. minimises the number of components, which, in turn, keeps down both the cost and the size of the circuit board, both of which are essential for high volume portable products such as smartphones. ARM powered SoCs are included in games consoles, personal media players, set-top boxes, internet radios, home automation systems, GPS receivers, ebook readers, TVs, DVD and Blu-ray players, digital cameras and home media servers. Cheaper, less powerful chips are found in home products, including toys, cordless phones and even coffee makers. They're even used in cars to drive dashboard displays, anti-lock breaking, airbags and other safetyrelated systems, and for engine management. Also, healthcare products is a major growth area Figure 30: A smartphone SoC; Qualcomm's OMAP over the last five years, with products varying from remote patient monitoring systems to medical imaging scanners. ARM devices are used extensively in hard disk and solid state drives. They also crop up in wireless keyboards, and are used as the driving force behind printers and networking devices like wireless router/access points. Modern SoCs also come with advanced (DirectX-9 equivalent) graphics capabilities that can surpass game consoles like the Nintendo Wii. Imagination Technologies, which was once known in the PC world with its “PowerVR” graphics cards, licenses its graphics processors designs to many SoC makers, including Samsung, Apple and many more. Others like Qualcomm or NVIDIA design their own graphics architecture. Qualcomm markets its Page | 20
  • 25. products under the OMAP series. NVIDIA markets under Tegra brand and other companies such as Apple market theirs as A series. HTC, LG, Nokia and other smartphone manufacturers do not design their own SoCs but use the above mentioned. Finally, SoCs come with a myriad of smaller co-processors that are critical to overall system performance. The video encoding and decoding hardware powers the video functionality of smartphones. The image processor ensures that photos are processed properly and saved quickly and the audio processor frees the CPU(s) from having to work on audio signals. Together, all those components -and their associated drivers/software- define the overall performance of a system. Figure 31: A SoC for tablet; Nvidia TEGRA Page | 21
  • 26. Conclusion Computers have truly revolutionized our world and have changed the way we work, communicate and entertain ourselves. Fuelled by constant innovations in chip design and transistor technology this evolution doesn’t seem to be bothered to stop. In recent years, there have been tremendous shifts in computing trends with mobile computers such as tablets and smartphones becoming more and more preferable, possibly, due to lowering costs and prices. While computing did start with the microprocessor, it is headed towards a scheme that incorporates the microprocessor as a smaller subset of a larger system. One that incorporates graphics, memory, modem and video transcoding co processors on a single chip. The SoC era has begun… Page | 22
  • 27. References [1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture, [online] Available: http://www.intel.com/products/processor/manuals [2] King, J. ; Quinnell, E. ; Galloway, F. ; Patton, K. ; Seidel, P. ; Dinh, J. ; Hai Bui and Bhowmik, A., "The Floating-Point Unit of the Jaguar x86 Core," in 21st IEEE Symposium on Computer Arithmetic (ARITH), 2013, pp. 7-16. [3] Ibrahim, A.H. ; Abdelhalim, M.B. ; Hussein, H. ; Fahmy, A., "Analysis of x86 instruction set usage for Windows 7 applications," in 2nd International Conference on Computer Technology and Development (ICCTD), 2010, pp. 511-516. [4] PC Architecture, Acid Reviews, [online] 2014, http://acidreviews.blogspot.in/2008/12/pc-architecture.html (Accessed: 2nd February 2014). [5] Alpert, D. and Avnon, D., "Architecture of the Pentium microprocessor," IEEE Micro, vol. 13, Issue 3, pp. 11-21, 1993. [6] Computer Processor History, Computer Hope, [online] 2014, http://www.computerhope.com/history/processor.htm (Accessed: 2nd February 2014). [7] Gartner Press Release, Gartner Analyst, [online] 2014, http://www.gartner.com/newsroom/id/2610015 (Accessed: 8th February 2014). [8] Intel Processor Number, CPU World, [online] 2014, http://www.cpuworld.com/info/Intel/processor-number.html (Accessed: 9th February 2014).