Chapter 1
Syllabus
Catalog Description: Computer structure, machine representation of data,
addressing and indexing, computation and control instructions, assembly
language and assemblers; procedures (subroutines) and data segments,
linkages and subroutine calling conventions, loaders; practical use of an
assembly language for computer implementation of illustrative examples.
Course Goals
0 Knowledge of the basic structure of microcomputers: registers, memory,
addressing, I/O devices, etc.
1 Knowledge of most non-privileged hardware instructions for the
architecture being studied.
2 Ability to write small programs in assembly language.
3 Knowledge of computer representations of data, and how to do simple
arithmetic in binary & hexadecimal, including conversions.
4 Ability to implement a moderately complicated algorithm in
assembler, with emphasis on efficiency.
5 Knowledge of procedure calling conventions and interfacing with
high-level languages.
Optional Text: Kip Irvine, Assembly Language for the IBM PC,
Prentice
Hall, 4th or 5th edition
Additional References: Intel and DOS API documentation as presented
in Intel publications and online at www.x86.org; lecture notes (to be
supplied as we go).
Prerequisites by Topic. Working knowledge of some programming
language (102/103: C/C++); Minimal programming experience
Major Topics Covered in the Course:
1 Low-level and high-level languages; why learn assembler?
2 How does one study a new computer: the CPU, memory,
addressing
modes, operation modes.
3 History of the Intel family of microprocessors.
4-5 Registers; simple arithmetic instructions; byte order;
Arithmetic and
logical operations.
6 Implementing longer integer type support; carry and overflow.
7 Shifts, multiplication and division.
8 Memory layout.
9 Direct video memory access; discussion of the first project.
10 Assembler syntax; how to use the tools.
11-13 Conditional & unconditional jumps; loops; emulating high-level
language constructions; Stack; call and return; procedures
14-15 String instructions: efficient memory-to-memory operations.
16 Interrupts overview: interrupt table; how do interrupts work;
classification.
17 Summary of the most important interrupts.
18-20 DOS interrupt; File I/O functions; file-copy program;
discussion of
the second project
21 Interrupt handlers; keyboard drivers; timer-driven processes;
viruses
and virus-protection software.
22 Debug interrupts; how do debuggers and profilers work.
23-24 (Optional) Interfacing with high-level languages; Protected mode
fundamentals
Grading The grading is based on two projects: the midterm project is 49%
and the final is 51%. Please note that the projects are individual;
submitting projects that are similar to submissions of others and/or are
essentially downloads from the Web will result in a failing grade.
Office Hours My hours this term for CSc 210 will be 3:45 to 4:45 on
Mondays.
Zoom links:
11am https://ccny.zoom.us/j/85378437821
2pm https://ccny.zoom.us/j/87625527827
Chapter 2
Preliminary material
Why assembler?
• Why take this class?
• Why program assembler?
• Why know assembler?
NOTE: think Binary!
Why binary?
Binary numbers (WIKI)
(brief answer: because this is easy to implement)
Why hex?
Hexadecimal numbers (WIKI)
(brief answer: because it is much easier to work with shorter
strings)
What about DNA?
https://en.wikipedia.org/wiki/Hexadecimal
https://en.wikipedia.org/wiki/Binary_number
2.1 Introduction #1: looking at new hardware
• CPU, general purpose (arithmetic) registers
– How large?
– How many?
– Are they all the same?
– Modes?
• Memory Model
– Is all memory the same?
– Flat?
– Segmented?
– Paged?
• Other hardware (peripherals)
• OS
• Special features
2.2 Introduction #2: History
Intel Processors Over the Years
The History Of Intel CPUs
-1971 before Intel
1971 4004
• Intention
• Name
• Usage
• What can you do with 4 bits?
1972 8008
- Doubling – what can you do with 8 bits
1974 8080
1976 8085
1976 Z80 (Zilog)
1974 CP/M – Digital Research, Gary Kildall
1978 8086 – x86 architecture. 8 8-bit registers, 8(+6) 16-bit registers.
1 MB limit. 1 MB mystery?
1979 8088 – cost cutting
1981 iAPX 432 – an attempted 32 bit processor
1982 80186 – minor improvements/corrections
1981 IBM PC
https://en.wikipedia.org/wiki/CP/M
https://www.tomshardware.com/picturestory/710-history-of-intel-cpus.html
https://www.businessnewsdaily.com/10817-slideshow-intel-processors-over-the-years.html
2.3 Introduction #3: Fundamentals
Data types
1 bit
4 nibble
8 byte
16 word
32 dword, doubleword
64 qword, quadword
80 tenbyte
2.4 x86 CPU
Registers Overlap!
Problem: Let AH = 2,AL = 3. What is AX?
Solution:
00000010 00000011
AH=00000010b=02h
AL=00000011b=03h
AX=0000001000000011b = 0203h = 515d
Note: the suffixes b, h and d are part of the assembly language syntax;
d(ecimal) is the default. The assignment syntax, however, is different;
it is used here only for illustration.
Fast solution:
AX = 2*256+3
AX = (2<<8)+3
Problem: Let AX = 2020. What are AL and AH ?
Registers BX,CX, DX are divided similarly.
General purpose aka Arithmetic registers:
Sequence A,B,C,D is an illusion, these letters stand for
Accumulator,
Base, Count, Data.
8 8-bit registers: AH,AL,BH,BL,CH,CL,DH,DL
8 16-bit registers: AX,BX,CX,DX and SI,DI,BP(?),SP(??)
SP generally cannot be used for calculations, BP usually cannot
be used
either.
(32 bit to be described later)
IP – Instruction pointer
points to the first byte of the current instruction.
Code:
B B B B B B B B B B B B B B B B B B
Code is essentially a one dimensional array of bytes (in C/C++ –
unsigned char type).
IP initially is 0, after one instruction is executed it should be 2,
then 5,
then 6, ....
Simplified logic (one instruction)
byte code[MAXCODE];
byte opcode;
opcode=code[IP++];
switch (opcode) {
case 0x00: ...
case 0x01: ...
...
case 0xFF: ...
}
each subcase will read additional bytes if needed to complete
reading of
the instruction.
Simplified logic (full execution)
byte code[MAXCODE];
byte opcode;
while(true) {
opcode=code[IP++];
switch(opcode) {
case 0x00: ...
....
case 0xFF: ...
}
}
Why is this simplified?
• CS is also used.
• how do we terminate?
• how do we change the executed sequence?
What do we do with this?
Is switch efficient?
Question: what would IP=k do (if such instruction exists).
FLAGS register
Should be seen not as a single 16-bit register but as a collection
of 16
1-bit registers.
More important ones: ZF, SF, CF, DF
Neither FLAGS nor the names above are keywords.
Segment registers : CS, DS, SS, ES – specify where segments
(“parts”) of
the program are located.
• CS Code Segment
• DS Data Segment
• SS Stack Segment
• ES Extra Segment
2.5 8086 registers – full list
• AX Accumulator eXtended
• AL Accumulator Low
• AH Accumulator High
• BX Base eXtended
• BL Base Low
• BH Base High
• CX Count eXtended
• CL Count Low
• CH Count High
• DX Data eXtended
• DL Data Low
• DH Data High
• SI Source Index
• DI Destination Index
• BP Base Pointer
• SP Stack Pointer
• CS Code Segment
• DS Data Segment
• SS Stack Segment
• ES Extra Segment
• IP Instruction Pointer (not a keyword)
• Flags Flags (not a keyword)
2.6 General addressing scheme
Three distinct ways to address memory:
• Absolute address : mem[offset] (flat model–generally cannot
be done)
• Segmented address : mem[f(seg,offset)] (done by hardware).
Usual
notation: ssss:oooo (hex digits)
• Expressing segmented address in assembly syntax – to be
covered
later
The f(seg,offset) function is mode-dependent.
In real mode, f(seg,offset)=seg*16+offset.
This builds 20-bit addresses out of two 16-bit quantities.
Examples
0000:0000 =⇒ 00000
1234:5678 =⇒ 179B8
  12340
+ 05678
--------
  179B8
The mapping is not one-to-one! Different (seg,offset) pairs may
point to
the same address.
0000:0100 =⇒ 00100
0010:0000 =⇒ 00100
Puzzle
FFFF:FFFF =⇒ ?????
(ref: A20 address line)
Code segment is effectively mem[f(CS,i)], Data segment is
effectively
mem[f(DS,i)]
Protected memory addressing function uses Segment Descriptor
Table
lookup. Fields include Base, Limit, Access Rights.
Implication: instructions Segment<-value are very costly in
protected
mode.
2.7 Back to History: Original IBM PC (1981)
Distorted:
Timeline
IBM’s brand recognition, along with a massive marketing
campaign, ignites the fast growth of the personal computer market
with the announcement of its own personal computer (PC).
The first IBM PC, formally known as the IBM Model 5150, was
based on a 4.77 MHz Intel 8088 microprocessor and used
Microsoft’s MS-DOS operating system. The IBM PC revolutionized
business computing by becoming the first PC to gain widespread
adoption by industry. The IBM PC was widely copied (“cloned”)
and led to the creation of a vast “ecosystem” of software,
peripherals, and other commodities for use with the platform.
Better:
WIKIPEDIA article
Additional link (on reaction):
Orson Scott Card’s novel
https://en.wikipedia.org/wiki/Lost_Boys_(novel)
https://en.wikipedia.org/wiki/IBM_Personal_Computer
https://www.computerhistory.org/timeline/1981/
No OS !
Three options:
• CP/M-86 (Control program for Microcomputers), see also DR
page
• UCSD p-System
• PC DOS/MS DOS, see also 86-DOS
See also: PL/M
Introduction #2: History (cont)
1982 80186, 80188
1982-1991 80286
1985-2007 80386
80186 : almost not used in PC’s, many improvements in
instructions
(kept).
80286 : 16mb protected mode–promise not fulfilled.
Real mode −→−→ Prot mode
XENIX
https://en.wikipedia.org/wiki/Xenix
https://en.wikipedia.org/wiki/Intel_80386
https://en.wikipedia.org/wiki/Intel_80286
https://en.wikipedia.org/wiki/Intel_80186
https://en.wikipedia.org/wiki/PL/M
https://en.wikipedia.org/wiki/86-DOS
https://en.wikipedia.org/wiki/IBM_PC_DOS
https://en.wikipedia.org/wiki/UCSD_Pascal
http://www.digitalresearch.biz/CPM.HTM
https://en.wikipedia.org/wiki/CP/M
80386
• 32 bit
• 2 additional modes
• misc enhancements (debugging)
Doubling of registers again
EAX = xxxxxxxxxxxxxxxx ahahahah alalalal
Flags register becomes EFLAGS :
Additionally:
• Control Registers CR0..CR7 (CR0=MSW(Machine Status
Word) on
80286)
• Test Registers TR0..TR7
• Debug Registers DR0..DR7
64 bit mode adds RAX,...
On paging
Virtual memory makes it possible to execute programs larger than
physical memory.
Paging generally cannot be controlled by the programmer; paging
algorithms are implemented by the OS.
Page replacement algorithms
Application algorithms can be tailored for paging environment.
Example:
#define N 1024
int x[N][N],y[N][N],z[N][N];
int i,j;
for (int i=0; i<N; i++)
for (int j=0; j<N; j++)
z[i][j]=x[i][j]+y[i][j];
vs
#define N 1024
int x[N][N],y[N][N],z[N][N];
int i,j;
for (int i=0; i<N; i++)
for (int j=0; j<N; j++)
z[j][i]=x[j][i]+y[j][i];
Will the two programs run equally fast ?
https://www.geeksforgeeks.org/page-replacement-algorithms-in-operating-systems/
Assume 3 pages are available. (1 page is exactly a row of a
matrix
above.)
Two dimensional arrays are stored row by row.
First program : 1024 swaps.
Second program : 1024^2 swaps.
Technical info
https://wiki.osdev.org/Paging
2.8 Back to History: 32 bit OS?
OS/2 1987-2001
https://en.wikipedia.org/wiki/OS/2
80486 : 1989
8087 : 1980
80187 (for 80186), 80287 (for 80286), 80387(for 80386).
Other coprocessors existed.
Stack design, 8 80-bit registers ST(0), ST(1),.. ST(7).
80486 = 80386 + 80387
Datatypes:
32bit single (float in C/C++)
64bit double
80bit extended (internal format)
Pentium : 1993-
1993 Pentium (P5), why not 80586? (80486.00+100.00=???)
https://en.wikipedia.org/wiki/Pentium
https://en.wikipedia.org/wiki/Intel_8087
https://en.wikipedia.org/wiki/Intel_80486
1995 Pentium Pro (P6)
1997 Pentium MMX: MMX addition
1997 Pentium II
1999 Pentium III
2000 Pentium 4
MMX:
• MultiMedia eXtension
• Multiple Math eXtension
• Matrix Math eXtension
Intel Core (from 2006)
https://en.wikipedia.org/wiki/Intel_Core
https://en.wikipedia.org/wiki/Pentium_4
https://en.wikipedia.org/wiki/Pentium_III
https://en.wikipedia.org/wiki/Pentium_II
https://en.wikipedia.org/wiki/Pentium_Pro
Chapter 3
Instructions
3.1 Overall structure of asm program
• Header – TBD
• Sequence of instructions
• Trailer – TBD
Instructions generally are written one per line (minor exceptions
later)
Instructions generally follow the following format:
[<label>:] <opcode> [<operands>] [;comment]
[<label>:] [;comment]
where
<label> – optional label (any identifier that is not a keyword or
otherwise defined).
<opcode> – name of the instruction (keyword)
<operands> – comma-separated operands, if any; their number (0-3)
depends on the opcode
;comment – any text, ignored up to the EOL.
Trivial example:
lab: ; this line does not do anything
Each symbolic instruction corresponds to a particular sequence of
bytes, and those bytes are what is actually executed.
3.2 The NOP instruction
NOP (do nothing)
Binary representation: one byte, hex value 90h.
Execution:
Before:
bb bb bb bb bb bb bb bb bb 90 bb bb bb bb bb bb
                           ↑IP
After:
bb bb bb bb bb bb bb bb bb 90 bb bb bb bb bb bb
                              ↑IP
IP is incremented by 1; no other register is changed
WHY have it?
• delay?
• padding for sloppy compilers
• patching (code deletion)
• reserving space for patching(code addition)
3.3 The MOV instruction
MOV dst,src (copy src to dst)
Example:
MOV AL,BL
;
; before : AL=3 BL=7
; after : AL=7 BL=7
Example:
MOV DL,CH
MOV DL,DL
MOV AX,CX
MOV AX,SP
MOV SP,CX ; very dangerous
MOV EDI,EDI
MOV EDI,ESP
MOV AL,CX ; illegal
MOV EDI,CX ; illegal
MOV IP,AX ; illegal
MOV AX,CS ; ok, special case (see below)
MOV DS,AX ; ok, special case (see below)
MOV CS,DX ; special case, illegal
MOV DS,EDI ; illegal
MOV CR0,EAX ; privileged
MOV DR0,EAX ; ok, special case (see below)
RULE #1: size of src and dst must match
Most instructions support only gp registers
Argument types:
• (r)egister
• (m)emory
• (i)mmediate
• (s)pecial register
Argument size:
• (b)yte
• (w)ord
• (d)oubleword
• ...
MOV DL,CH ; brr instruction
General template for 2-arg instructions (rows: dst, columns: src;
X = allowed, × = not allowed, . = not yet discussed):
r m i
r . . .
m . . .
i . . .
Move-specific template:
r m i s
r . . . .
m . . . .
i . . . .
s . . . .
Right now:
r m i
r X . .
m . . .
i . . .
Examples:
MOV AL,[100] ; brm
MOV BX,[200] ; wrm
MOV EDI,[400] ; drm
MOV [100],AL ; bmr
MOV [200],BX ; wmr
MOV [400],EDI ; dmr
Thus
r m i
r X X .
m X . .
i . . .
What does [#] really mean?
Answer: bytes beginning with byte #.
in
MOV AX,[100]
which byte goes where?
Examples:
MOV AL,1 ; bri
MOV DX,2 ; wri
MOV EDI,4 ; dri
r m i
r X X X
m X . .
i . . .
Examples:
MOV AL,97 ;
MOV AL,61h ; all four lines are equivalent
MOV AL,01100001b
MOV AL,'a' ;
...
MOV AL,1000 ; ???
No storing into immediates, this would be like
1=x;
in C.
Thus:
r m i
r X X X
m X . .
i × × ×
Important: MOV with immediate is a fundamentally different
operation
from the rr,rm, mr forms.
RULE #2: no memory-to-memory
(2 exceptions later)
Thus:
r m i
r X X X
m X × ?
i × × ×
MOV [100],1 ; should not compile
RULE #3: size must be known
Correct syntax:
MOV byte ptr [100],1
MOV word ptr [100],1
MOV dword ptr [100],1
MOV qword ptr [100],1 ; 64 bit only
MOV tbyte ptr [100],1 ; ???
What about
MOV [100],AL
MOV byte ptr [100],AL ; unneeded
MOV word ptr [100],AL ; will not compile
Final result:
r m i
r X X X
m X × X
i × × ×
Full table (MOV only):
r m i s
r X X X X
m X × X X
i × × × ×
s X X × ×
3.3.1 Examples
Here is how C/C++ assignments may be compiled:
char c1,c2; c1=c2;
-------------------
MOV AL,c2
MOV c1,AL
short s1,s2; s1=s2;
-------------------
MOV AX,s2
MOV s1,AX
int x,y; x=y;
-------------------
MOV EAX,y;
MOV x,EAX;
int x,y,z; x=y=z;
-------------------
MOV EAX,z;
MOV x,EAX;
MOV y,EAX;
int x; x=0;
-------------------
MOV x,0;
int x,y,z; x=y=z=0;
-------------------
MOV x,0
MOV y,0
MOV z,0
perhaps, a better implementation?
MOV EAX,0 ; could be even better
MOV x,EAX
MOV y,EAX
MOV z,EAX
Exercise: Exchange bytes in [100] and [101]
MOV AL,[100]
MOV AH,[101]
MOV [100],AH
MOV [101],AL
can this be done in fewer lines of code?
MOV AX,[100]
MOV [100],AH
MOV [101],AL
Note: Byte order matters.
3.3.2 Byte order
Consider:
MOV [100],AX
Does
LE, reversed: AL go into [100] and AH into [101]
or, instead:
BE, normal: AH go into [100] and AL into [101]?
More than you want to know on Endianness
LE,reversed : Intel, Dec
BE,normal : IBM mainframe, Motorola, Sun
Practical implications:
• it is important to know the endianness of the hardware and the data.
• it is important to be able to swap.
• it is important to be able to determine the endianness. How?
Specific example of byte order importance:
short s=1;
FILE *f=fopen("try.dat","wb");
if (!f) { ... error handling ... }
fwrite(&s,1,sizeof(s),f);
fclose(f);
Should create a 2-byte file try.dat.
Now,
https://en.wikipedia.org/wiki/Endianness
short s;
FILE *f=fopen("try.dat","rb");
if (!f) { ... error handling ... }
fread(&s,1,sizeof(s),f);
fclose(f);
cout << s;
should print the value of s – indeed 1.
But: what will happen if we run the writing program on an Intel
computer, move the data file to a Sun, and run the reading program there?
Exercise: Can a high-level program be written that determines
the order of bytes?
3.4 The XCHG instruction
XCHG dst,src (exchange src with dst)
XCHG r m i
r X X ×
m X × ×
i × × ×
Segment and other non-gp registers are not supported.
The syntax and examples from MOV apply, except for non-use
of non-gp
registers and immediates.
Examples (which of the following are valid?)
XCHG AL,AH
XCHG AX,SP
XCHG EAX,EDI
XCHG AL,[400]
XCHG [400],AL ;same as above
XCHG AL,DI
XCHG DI,DS
XCHG EAX,7
XCHG [100],[101]
XCHG AX,AX ; nop?
XCHG DI,DI ; nop?
XCHG CL,CL ; nop?
Can a better version of byte swap program be now written?
Better:
MOV AX,[100]
XCHG AL,AH
MOV [100],AX
Yet better:
XCHG AX,[100]
XCHG AL,AH
XCHG [100],AX
Q: can a shorter program be written (perhaps with another
instruc-
tion)?
3.4.1 Binary encoding of XCHG
We only consider accumulator exchanges now.
Instructions
XCHG AX,reg
have an especially compact one-byte encoding in the Intel architecture.
90h XCHG AX,AX
91h XCHG AX,CX
92h XCHG AX,DX
93h XCHG AX,BX
94h XCHG AX,SP
95h XCHG AX,BP
96h XCHG AX,SI
97h XCHG AX,DI
Q: Why is the number of registers a power of 2?
A: Because this allows a register to be represented in a fixed
number of bits, with every bit pattern used.
16-bit register representation:
000b AX
001b CX
010b DX
011b BX
100b SP
101b BP
110b SI
111b DI
An emulator may use code like
unsigned short regs[8];
#define AX regs[0]
#define CX regs[1]
#define DX regs[2]
#define BX regs[3]
#define SP regs[4]
#define BP regs[5]
#define SI regs[6]
#define DI regs[7]
Notes:
• this is just an example!
• 8 bit registers have their own 3-bit keys
• 32 bit registers parallel 16 bit registers
• 64 bit registers use 4-bit keys
• The above code should define 8-bit regs properly (f.e. setting AX
should set AL,AH too!)
• The above code should be modified to support 32 bit registers
XCHG AX,AX is NOP. General encoding scheme of XCHG (with accumulator):
10010reg
This idea is used in other instructions. XCHG without accumulator uses
a lengthier encoding, with first byte 86h/87h.
XCHG encoding
https://c9x.me/x86/html/file_module_x86_id_328.html
NOTE: MOV has several different forms, including optimized
forms for
the accumulator.
Similar scheme is used for the segment registers:
00b ES
01b CS
10b SS
11b DS
3.5 The ADD instruction
ADD dst,src (dst += src)
(proper name should be increment by.)
General 2-operand instruction layout applies:
ADD r m i
r X X X
m X × X
i × × ×
Given that syntax of ADD is largely similar to MOV, the examples are
similar:
ADD AX,BX
ADD EAX,ESP
ADD DL,CL
ADD AX,[100]
ADD [150],EAX
ADD AX,DS ; illegal
ADD AX,DL ; illegal
ADD [10],5 ; syntax error
ADD word ptr [10],5 ; fine
C example:
int x,y,z; x=y+z;
-----
MOV EAX,y
ADD EAX,z
MOV x,EAX
int x,y,z; x=x+y;
-----
MOV EAX,y
ADD x,EAX
int x,y,z; x=x+25;
-----
ADD x,25
(NOTE: size specification is not required if x is declared to be a
doubleword)
Consider
ADD AL,AL
Generally, multiplication by 2 should not be done as
multiplication
(generally about 3x slower than addition). Writing
int x; x=2*x;
is wrong! One should use either addition or a shift (if
available). (What is
better depends on the situation and hardware).
Q: Should we replace multiplication by addition in :
int f(int);
int x; x=2*f(x);
More simple examples:
Consider
ADD AL,0 ; nop ?
ADD AL,1 ; increment ?
ADD AL,-1 ; decrement ?
ADD AL,AL ; double
MOV AL,1 ; AL=1
ADD AL,AL ; AL=2
ADD AL,AL ; AL=?
ADD AL,AL ; AL=?
ADD AL,AL ; AL=?
ADD AL,AL ; AL=?
ADD AL,AL ; AL=?
ADD AL,AL ; AL=?
ADD AL,AL ; AL=?
ADD AL,AL ; AL=?
MOV AL,1 ; AL=1 binary |00000001
ADD AL,AL ; AL=2 binary |00000010
ADD AL,AL ; AL=4 binary |00000100
ADD AL,AL ; AL=8 binary |00001000
ADD AL,AL ; AL=16 binary |00010000
ADD AL,AL ; AL=32 binary |00100000
ADD AL,AL ; AL=64 binary |01000000
ADD AL,AL ; AL=128 binary |10000000
ADD AL,AL ; AL=0 binary 1|00000000 << overflow
ADD AL,AL ; AL=0 binary 00000000
Is this an assembler problem ?
unsigned char c;
c=1;
printf("%d",c); c=c+c;
printf("%d",c); c=c+c;
printf("%d",c); c=c+c;
printf("%d",c); c=c+c;
printf("%d",c); c=c+c;
....
Note: if you like C++ and cout<<, make sure to cast!
Q: what would be the output if we use char rather than unsigned
char?
Is this a size problem ? Try
MOV AX,1
ADD AX,AX
...
OR
MOV EAX,1
ADD EAX,EAX
...
OR
C/C++ versions.
Unlike MOV and XCHG, ADD is an arithmetic instruction: it
sets flags.
Warning: the discussion of the flags is slightly simplified; I am not
considering the OF. Thus there are slight differences between the behavior
described and the actual behavior of the processor. This makes no
difference for most programs, but there are rare instances where it
matters. In particular, I will treat JS and JL as equivalent; in reality
they are not exactly the same.
ZF Zero Flag
SF Sign Flag
CF Carry Flag
OF Overflow Flag
; ZF SF CF
MOV AL,1 ; AL=1 binary |00000001 ? ? ?
ADD AL,AL ; AL=2 binary |00000010 0 0 0
ADD AL,AL ; AL=4 binary |00000100 0 0 0
ADD AL,AL ; AL=8 binary |00001000 0 0 0
ADD AL,AL ; AL=16 binary |00010000 0 0 0
ADD AL,AL ; AL=32 binary |00100000 0 0 0
ADD AL,AL ; AL=64 binary |01000000 0 0 0
ADD AL,AL ; AL=128 binary |10000000 0 1 0
ADD AL,AL ; AL=0 binary 1|00000000 1 0 1 << overflow
ADD AL,AL ; AL=0 binary 00000000 1 0 0
WARNING: This is slightly simplified (there is also OF)
Flags can be used to
• implement conditionals (IF, WHILE,...)
• implement “long” arithmetic
• check for overflow
3.5.1 Overflow detection
unsigned int x,y,z;
....
x=y+z; // concern about overflow
unsigned int x,y,z;
....
y=0x90000000;
z=0x90000000;
x=y+z; // overflow will occur here, result will be incorrect.
can we check for it like this?
unsigned int x,y,z;
....
if (y+z>0xFFFFFFFF)
error("overflow");
x=y+z;
Correct way:
unsigned int x,y,z;
....
if (y>0xFFFFFFFF-z)
error("overflow");
x=y+z;
Exercise: what about signed types?
A: you will need to check both for “positive” overflow (adding two large
positive numbers) and for “negative” overflow (adding two large negative
numbers).
In assembler, flags report the overflow condition; no need for extra
checking!
3.6 The SUB instruction
SUB dst,src (dst -= src)
(proper name should be decrement by.)
General 2-operand instruction layout applies:
SUB r m i
r X X X
m X × X
i × × ×
Given that syntax of SUB is identical to ADD, syntax examples
are similar
and omitted.
ADD AX,100
SUB AX,-100 ; same as above
;
ADD AX,-100
SUB AX,100 ; same as above
What do these instructions do?
ADD AX,0
SUB AX,0
What does this instruction do?
SUB EAX,EAX
Answer: most efficient way to zero out a register.
What is the difference between the two instructions below?
SUB EAX,EAX
MOV EAX,0
Answer: the former is more efficient; the latter is rarely used,
only in
the situations when flags must be preserved. (an example,
involving an if,
will be given later.)
Revising example we saw above, more efficient code:
int x,y,z; x=y=z=0;
-------------------
SUB EAX,EAX
MOV x,EAX
MOV y,EAX
MOV z,EAX
with SUB, Carry flag indicates borrowing.
3.7 The INC instruction
INC dst (dst++)
Do we write
ADD AX,1
ADD byte ptr [10],1
A: yes, we can, but usually we would use the optimized form
General 1-operand instruction layout applies:
INC r m i
X X ×
(Same format applies to three more instructions, explained
later).
Register form is optimized to one-byte encoding:
40h INC AX
41h INC CX
42h INC DX
43h INC BX
44h INC SP
45h INC BP
46h INC SI
47h INC DI
Other forms of INC are encoded in a lengthier way, beginning with 0FFh
and 0FEh.
Warning: this encoding applies to BOTH 16-bit and 32-bit registers!
What is better?
INC AX
INC AX
;or
ADD AX,2
A: former. But do not do this with memory arguments.
3.8 The DEC instruction
DEC dst (dst--)
DEC r m i
X X ×
Comments on INC above are applicable.
Optimized form:
48h DEC AX
49h DEC CX
4Ah DEC DX
4Bh DEC BX
4Ch DEC SP
4Dh DEC BP
4Eh DEC SI
4Fh DEC DI
Other forms of DEC are encoded in a lengthier way, beginning with 0FFh
and 0FEh.
3.9 The NEG instruction
NEG dst (dst=-dst)
NEG r m i
X X ×
How to negate without using NEG?
NEG EAX
;is the same as
SUB EBX,EBX
SUB EBX,EAX
MOV EAX,EBX
Solve equation x = −x ?
MOV AL,x
NEG AL
if AL did not change, does this mean x is 0?
No, it is either 0 or 128!
For short, the solutions are 0 and 2^15 = 8000h; for 32-bit they are
....
#include <stdlib.h>
#include <stdio.h>
int main() {
short s,s1;
s=0; s1=-s; if (s!=s1) printf("for s=%d, changes;\n",s);
else printf("for s=%d, does not change;\n",s);
s=1000;s1=-s; if (s!=s1) printf("for s=%d, changes;\n",s);
else printf("for s=%d, does not change;\n",s);
s=0x8000;s1=-s; if (s!=s1) printf("for s=%d, changes;\n",s);
else printf("for s=%d, does not change;\n",s);
return 0;
}
results in
for s=0, does not change;
for s=1000, changes;
for s=-32768, does not change;
3.10 The CMP instruction
CMP dst,src (dst-src)
General 2-operand instruction layout applies:
CMP r m i
r X X X
m X × X
i × × ×
Conditionals (if,while) are implemented in two stages:
• compute flags
• do (or not) the operation depending on a flag(or flags) – this is
later.
Examples:
; if (x==0) ...
;
; compute ZF from value of x
; if (x!=0) ...
;
; compute ZF from value of x
; if (x<0) ...
;
; compute SF from value of x
; if (x<=0) ...
;
; compute SF and ZF from value of x
What about
; if (x==y) ...
;
;
to set the flags, we compute the difference
; if (x==y) ...
;
MOV EAX,x
SUB EAX,y
; this sets ZF for our use
; if (x<y) ...
;
MOV EAX,x
SUB EAX,y
; this sets SF for our use
; if (x<=y) ...
;
MOV EAX,x
SUB EAX,y
; this sets SF and ZF for our use
Problem: this computation destroys value of x which is often
needed.
So, instead:
; if (x<=y) ...
;
MOV EAX,x
CMP EAX,y
; this sets SF and ZF for our use
Flags are set exactly as for SUB, but EAX retains the value of
x.
; if (x==5) ...
;
MOV EAX,x
SUB EAX,5 ; inefficient
;
SUB x,5 ; worse yet: destroys variable
;
CMP x,5 ; fine
3.11 Logical or Bitwise
What kind of AND operation should be implemented in
hardware? (same
question of course can be asked of OR etc but we will
concentrate on AND)
• logical AND (in C, &&) ?
• bitwise AND (in C, &) ?
• both ?
Logical AND works with True and False concepts:
1 && 1 = 1
1 && 0 = 0
0 && 1 = 0
0 && 0 = 0
Q: what about 5 && 6 ?
(5 is first converted to 1 (true), 6 is converted to 1 (true), we
compute
1&&1 and get result 1).
This is a multi-step operation!
Bitwise AND: apply the AND operation to every bit (every
column):
0101 == 5
& 0110 == 6
----
0100 == 4
Thus 5 & 6 = 4.
Notice that on single bits (or multibit 0 and 1), the results of &
and &&
are identical.
1 & 1 = 1
1 & 0 = 0
0 & 1 = 0
0 & 0 = 0
We summarize:
• On appropriate logical values, logical and bitwise AND are the
same.
• Logical AND is a multistep operation – hard to implement in hardware.
• Bitwise AND can be done easily and efficiently (array of AND-gates)
• on non-standard input values, logical AND offers minimal advantages
(how often do we care about 5&&6?)
• on non-standard input values, bitwise AND offers huge advantages:
– masks
– sets
– images
3.11.1 Bits and Masks
Most x86 2-operand instructions are encoded as follows:
opcode d w | mod reg r/m | .....
for example, for ADD, opcode is 000000, MOV, opcode is
100010, etc
(Encoding for forms with accumulator and immediate are
similar).
The w field states if the operation is word or byte. Register reg=000
means AX if w==1 and AL if w==0. The d field states if the direction is
from register (0) or to register (1). So decoding should retrieve these
bits.
currentbyte=code[ip++];
....
wordop=currentbyte & 1;
to_reg=(currentbyte & 2)!=0;
....
alternately one can use division and mod (yuck).
3.11.2 Checking for odd/even
Q: how do we check for a number being odd or even?
if (x is odd) { ... }
if (x % 2 == 1) { ... } // misses negative odd x in C
better:
if (x % 2) { ... }
much better:
if (x & 1) { ... }
Q: how do we check whether a number is divisible by 4?
if ((x & 3)==0) { ... }
if (!(x & 3)) { ... }
Q: how do we check whether a number is divisible by 2^k?
if ((x & ((1<<k)-1))==0) { ... }
3.11.3 Sets
How do we represent a set?
Consider {2,3,5,7,11,13}
Idea #1: Representation as a linked list:
2 ⇒ 3 ⇒ 5 ⇒ 7 ⇒ 11 ⇒ 13 ⇒ #
Idea #2: Representation as an array:
2 3 5 7 11 13
Idea #3: Representation as a bit array (set)
0 0 1 1 0 1 0 1 0 0 0 1 0 1 0 0
Total space : 2 bytes.
Search time:
• list Linear
• array Linear or log (binary search)
• set Constant!
Exercise: write the formula
If A,B are similar sets, one can use & to compute the
intersection:
unsigned char A[N];
unsigned char B[N];
unsigned char C[N];
for (int i=0;i<N;i++) C[i]=A[i]&B[i];
Set representation is not always the right one: consider
{1,10000000}!
3.11.4 Images
Consider images.
Actual representation (b/w image): one dimensional bit array.
What would this produce?
[figure omitted: two b/w images combined with &]
Conclusion: Bitwise AND is a much better choice for assembler
implementation.
3.12 The AND instruction
AND dst,src (dst &= src)
AND r m i
r X X X
m X × X
i × × ×
Syntax is the same as ADD,SUB,...
Examples:
MOV AX,5
AND AX,6 ; AX is 4
AND AX,1 ; check evenness
AND AX,7 ; what does this do?
AND AX,0 ; what does this do?
AND AX,AX ; what does this do?
AND AX,CH ; what does this do?
3.13 The TEST instruction
TEST dst,src (dst & src)
TEST is a non-destructive AND, cf. CMP/SUB.
TEST r m i
r X X X
m X × X
i × × ×
Syntax is the same as AND, ADD, SUB,...
Encoding is less efficient than for other operations – TEST is less
common.
TEST is helpful when we want to extract different bits from the
same
number.
3.14 The OR instruction
OR dst,src (dst |= src)
OR r m i
r X X X
m X × X
i × × ×
Syntax is the same as ADD,SUB, AND...
In C, bitwise or is coded as |. Hardware implements the bitwise
form.
Examples:
MOV AX,5
OR AX,6 ; AX is 7
0101 == 5
| 0110 == 6
----
0111 == 7
OR AX,0
(set flags – not as efficient as RR form)
OR AX,AX
(set flags – just as efficient as RR form of AND)
OR AX,1
(set last bit to 1; numerically round up to an odd number.)
OR AX,2
(set the next-to-last bit to 1.)
OR AX,3
(set last two bits to 1)
For sets (and graphics) OR implements union.
3.15 The XOR instruction
XOR dst,src (dst ˆ= src)
XOR r m i
r X X X
m X × X
i × × ×
Syntax is the same as ADD,SUB, OR...
Truth table for XOR:
1 XOR 1 = 0
1 XOR 0 = 1
0 XOR 1 = 1
0 XOR 0 = 0
XOR AX,AX
(clear register – just as efficient as RR form of SUB)
XOR AX,0
(only sets flags)
XOR AX,1
(toggle last bit)
XOR AX,7
(toggle last three bits)
XOR AX,0FFFFh
(toggle all bits)
Involution property of XOR:
((d XOR k) XOR k) == (d XOR (k XOR k)) == (d XOR 0) == d
Application #1: temporary lines:
(same idea can be used for the mouse pointer or inverting
selected text).
Application #2: cypher:
((data XOR key) XOR key) == data
char data[N];
for (int i=0; i<N; i++)
data[i]=data[i] ^ key;
In pure form, variation of a substitution cipher:
char data[N];
char subst[256]; // contains permutation of ASCII chars
for (int i=0; i<N; i++)
data[i]=subst[data[i]];
Not secure, but good for puzzles! Cryptogram puzzles.
With key chaining:
char data[N];
for (int i=0; i<N; i++) {
char c;
c=data[i];
data[i]=c ^ key;
key=f(key,c);
}
Nearly unbreakable; cf. the Type 1 font encryption episode.
(Cryptogram puzzles: https://api.razzlepuzzles.com/cryptogram)
The unbreakable Adobe encryption was actually implemented
with this
code:
unsigned short int r; // this is the key
unsigned short int c1 = 52845;
unsigned short int c2 = 22719;
unsigned char Encrypt(plain) unsigned char plain;
{unsigned char cipher;
cipher = (plain ^ (r>>8));
r = (cipher + r) * c1 + c2;
return cipher;
}
(Naturally, there are myriads of small changes that can be made
to the
code, beginning with change of the constants... and they would
lead to
equally strong security!)
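To see why the scheme is reversible, here is a hedged C sketch of both directions. It reuses the constants from the listing, but the function names, the explicit key pointer and the 32-bit intermediate are choices made for this illustration only. Decryption XORs with the same key byte and then advances the key using the cipher byte, so both sides stay in step.

```c
#include <stdint.h>

/* Constants from the listing above. */
static const uint16_t C1 = 52845, C2 = 22719;

/* Encrypt one byte; *r is the running key (same update as in the listing).
   The 32-bit intermediate avoids signed overflow on machines with 32-bit int. */
uint8_t encrypt_byte(uint8_t plain, uint16_t *r)
{
    uint8_t cipher = (uint8_t)(plain ^ (*r >> 8));
    *r = (uint16_t)(((uint32_t)cipher + *r) * C1 + C2);
    return cipher;
}

/* Decrypt one byte: XOR with the same key byte, then advance the key
   from the cipher byte exactly as the encryptor did. */
uint8_t decrypt_byte(uint8_t cipher, uint16_t *r)
{
    uint8_t plain = (uint8_t)(cipher ^ (*r >> 8));
    *r = (uint16_t)(((uint32_t)cipher + *r) * C1 + C2);
    return plain;
}
```

Starting both sides from the same key value, decrypting the encrypted bytes in order gives back the original bytes.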
Question: why is C not providing ^^ ?
3.16 The NOT instruction
NOT dst (dst = ~dst)
NOT r m i
X X ×
Follows the format of INC,DEC, NEG.
Example:
XOR AX,0xFFFF
NOT AX ; same result as above ?
Warning: NOT may produce drastically different results
depending on
the size of the number!
#include <stdlib.h>
#include <stdio.h>
int main() {
unsigned short s; unsigned char c; unsigned int i;
c=1; c=~c; printf("\n negated byte %u ",c);
s=1; s=~s; printf("\n negated short %u ",s);
i=1; i=~i; printf("\n negated int %u ",i);
}
results in
negated byte 254
negated short 65534
negated int 4294967294
3.17 Shift and Rotate operations
This group of instructions allows shifting the bits in a register in
different ways.
The format of all instructions in the group is the same; we will
start by considering the
3.17.1 SHL
SHL r/m,amount (SHift Left)
instruction. The first argument of the instruction must be a register or a
memory location; the second specifies the number of positions by which the
bits of the first argument will be shifted. There are three ways the shift
amount can be specified:
1. as number 1
2. as register CL (no other register is allowed)
3. as integer number greater than one. We separate this case
from the
first one since the encoding is different and this format did not
exist
in the original 8086 (added with 80186).
Here is a simple example of the SHL execution:
MOV AL,11
SHL AL,1
To see the resulting value in the target, let's write AL in binary:
0 0 0 0 1 0 1 1
Shifting the bits left by one position would result in
0 0 0 1 0 1 1 0
where the leftmost bit (0) is moved out and a zero is written to
the last
position. Numerically the value is 22.
It is not an accident that 22 = 11 × 2; the effect of appending a
zero
to a binary number is multiplication by 2 (just like appending a
zero to a
decimal number is multiplication by 10). Unlike the “generic”
multiplica-
tion described in a later section, this is an efficient and cheap
operation –
moving bits is naturally cheaper than invoking the multiplier.
If the first bit of the number were 1, pushing it out of the register would
result in “an overflow” and an incorrect result. For example, shifting 128
left by one in an 8-bit register produces 0. (Notice that the correct value
of 256 cannot be represented in an 8-bit register.)
The first bit, incidentally, is not lost; it is moved to the Carry flag.
Checking the carry flag therefore offers a way to detect an overflow and
possibly correct the calculation.
CF= 0 0 0 0 1 0 1 1 0
The other two forms of SHL arguments allow shifting the register by
more than one position. Consider
MOV AL,11
SHL AL,3
or
MOV AL,11
MOV CL,3
SHL AL,CL
Both snippets do the same thing: shift AL by 3 positions to the left. The
three leftmost bits are pushed out and three zeros are appended at the end,
changing
CF= ? 0 0 0 0 1 0 1 1
to
CF= 0 0 1 0 1 1 0 0 0
It is an efficient way to multiply by 8 = 2^3, and in general a left shift
by n positions is equivalent to multiplication by 2^n.
The bits pushed out cannot all be placed into the carry; two of them are
irreversibly lost and only the last bit to be pushed out ends up in the carry
flag. In the previous example the bit ending up in the carry is the one shown
right after CF=.
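The behaviour described above can be imitated in C. This is a sketch under stated assumptions, not the processor's definition: the function name shl8 is hypothetical and the count is assumed to be between 1 and 8.

```c
#include <stdint.h>

/* Emulates SHL on an 8-bit register for count 1..8 (assumed range).
   Returns the shifted value; *cf receives the last bit pushed out. */
uint8_t shl8(uint8_t value, unsigned count, int *cf)
{
    unsigned wide = (unsigned)value << count;   /* keep the expelled bits */
    *cf = (int)((wide >> 8) & 1u);              /* bit 8 = last bit shifted out */
    return (uint8_t)wide;                       /* low 8 bits stay in the register */
}
```

Shifting 11 by 1 gives 22 with carry 0, shifting 11 by 3 gives 88 with carry 0, and shifting 128 by 1 gives 0 with carry 1, which is the overflow case discussed above.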
Unless an overflow occurs, SHL multiplication often works correctly even
with negative numbers. For example, -1 (binary 11111111) becomes 11111110,
which is indeed -2.
Like other arithmetic instructions, SHL sets the zero and sign flags
according to the result.
NOTE: shift instructions like SHL existed on many processors even before
Intel and were incorporated into the C language ( << ). Thus, one can
write in C either x=y+y; or x=y<<1; instead of the slower x=y*2. Which of the
two options is better? In most cases, about the same. The right choice also
depends on the particular formula. For instance
y=f(x)*2; // function call
should not be changed to y=f(x)+f(x); since this would result in a
double call of the function: definitely slower and possibly a bug, if the
function has side effects. Using the shift here is certainly fine. On the other
hand, some languages do not even offer a shift operator, making addition
the only alternative to (slower) multiplication.
3.17.2 SHR
SHR r/m,amount (SHift Right)
instruction has the same syntax as SHL but shifts the bits to the
right. We
again look at an example:
MOV AL,11
SHR AL,1
In binary, 11 is
CF= ? 0 0 0 0 1 0 1 1
shifting this pattern right results in the last bit (1) expelled from
the register
into the carry flag, the remaining bits moving right by one
position and a
zero written into the leftmost position, obtaining
CF= 1 0 0 0 0 0 1 0 1
(decimal 5). It is not an accident that 5 = 11/2 (rounded down); for unsigned
(or nonnegative) numbers shifting right by 1 is equivalent to division by 2,
and shifting right by n bits is division by 2^n. (Analogy: with decimal
numbers deleting the last digit is the same as division by 10.)
When shifting by more than one position, it is the last bit shifted out of the
register that ends up in the carry.
One more example to consider :
MOV AL,255
SHR AL,1
the result of shifting
CF= ? 1 1 1 1 1 1 1 1
is
CF= 1 0 1 1 1 1 1 1 1
In terms of division we see 127 = 255/2, correct. However, in
MOV AL,-1
SHR AL,1
we see exactly the same values (255 is -1), but an obviously incorrect
result: 127 = −1/2. In fact the results will always be incorrect when
shifting a negative number: after the shift the sign bit (1) is replaced by 0!
NOTE: in C/C++ the bitshift operators << and >> use SHL and SHR for
unsigned (or signed but nonnegative) numbers. Applying >> to a negative
number is allowed by the syntax, but the result is implementation-defined:
it may or may not be the correct quotient, depending on the compiler (it is
correct under BCC). C/C++ syntax also allows negative shift amounts, but the
behavior is then undefined (under BCC, the result would always be 0).
Division of negative numbers via a shift is, however, possible in assembler
with the
3.17.3 SAR
SAR r/m,amount (Shift Arithmetic Right)
instruction. SAR is nearly identical to SHR; the only difference is that SHR
inserts zeros at the left, while SAR duplicates the first (sign) bit. Thus
number 255 (11111111)
CF= ? 1 1 1 1 1 1 1 1
would become
CF= 1 0 1 1 1 1 1 1 1
after a SHR (as seen above), but after SAR it is
CF= 1 1 1 1 1 1 1 1 1
where the first 1 is copied to the right but also left where it was.
SAR produces correct results for division of signed/negative numbers: the
calculation in the example above is division of -1 by 2, resulting in -1, the
correct result assuming rounding to −∞.
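The difference between the two right shifts can be sketched in C on raw 8-bit patterns. The function names are hypothetical, and only a shift by one position is modelled:

```c
#include <stdint.h>

/* SHR by one: a zero enters on the left. */
uint8_t shr8(uint8_t v)
{
    return (uint8_t)(v >> 1);
}

/* SAR by one: the sign bit (bit 7) stays in place and is also
   copied one position to the right. */
uint8_t sar8(uint8_t v)
{
    return (uint8_t)((v >> 1) | (v & 0x80u));
}
```

For the pattern 0xFF (-1 as a signed byte) shr8 gives 0x7F (127) while sar8 gives 0xFF (-1). For 0xFD (-3) sar8 gives 0xFE (-2), i.e. rounding toward −∞, whereas the C expression -3/2 truncates toward zero and yields -1.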
NOTE: Java implements both SAR and SHR, as >> and >>> respectively.
3.17.4 SAL
For symmetry, Intel also has
SAL r/m,amount (Shift Arithmetic Left)
While SAL is named differently, it has the same functionality as SHL
(standard assemblers emit the same encoding for both).
3.17.5 ROL
ROL r/m,amount (ROtate Left)
rotates bits leftward; the high bit(s) pushed out reappear at the low end.
MOV AL,79h ; AL is 01111001
ROL AL,1 ; AL is 11110010
ROL AL,1 ; AL is 11100101
MOV CL,2
ROL AL,CL ; AL is 10010111
ROL never loses any bits and does not rotate through the Carry flag (the
carry only receives a copy of the last bit rotated).
Notice that
ROL AL,8
does not do anything, yet another equivalent of NOP. Likewise
ROL AX,16
does not change any registers.
Exercise: What does ROL AX,8 do?
3.17.6 ROR
ROR r/m,amount (ROtate Right)
is similar, except that the bits are rotated right. Notice that the two
instructions below are equivalent.
ROL AL,3
ROR AL,5
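C has no rotate operator, so a rotation is usually written as two shifts combined with OR. The sketch below (hypothetical names rol8 and ror8) reproduces the ROL example above and the ROL/ROR equivalence:

```c
#include <stdint.h>

/* Rotate an 8-bit value left by n positions (n taken modulo 8). */
uint8_t rol8(uint8_t v, unsigned n)
{
    n &= 7u;
    /* bits leaving on the left re-enter on the right */
    return (uint8_t)((v << n) | (v >> ((8u - n) & 7u)));
}

/* Rotating right by n is the same as rotating left by 8 - n. */
uint8_t ror8(uint8_t v, unsigned n)
{
    return rol8(v, 8u - (n & 7u));
}
```

Starting from 79h, rol8 by 1 gives F2h, again by 1 gives E5h, and by 2 more gives 97h, as in the listing for ROL.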
ROL and ROR are rarely used, although this example should be of interest:
ROL word ptr [100],8
3.17.7 RCL
RCL r/m,amount (Rotate Carry Left)
rotates the bits in register and the carry flag left. High bits
being rotated
out are moved to the carry while the carry flag is entered on the
low end
of the register.
MOV AL,79h ; AL is 01111001 CY=c (undefined)
RCL AL,1 ; AL is 1111001c CY=0
RCL AL,1 ; AL is 111001c0 CY=1
RCL AL,1 ; AL is 11001c01 CY=1
rotation by more than one can be seen as repeated rotations by
1, except
done as a single instruction.
3.17.8 RCR
RCR r/m,amount (Rotate Carry Right)
is similar to RCL except for the direction of the rotation.
!!! The code below uses some assembler syntax that has not been covered
yet !!!
While RCL and RCR may appear strange, they have a number of uses. One of
them is the ability to shift (that is, multiply or divide by a power of two)
numbers that are larger than the register size. Assume that x is a 100-byte
(800-bit) number stored in memory locations [500] through [599]. We would
like to multiply it by 2. Remember that the bytes are stored in reverse order.
SHL byte ptr DS:[500],1 ;high bit is moved to the carry
RCL byte ptr DS:[501],1 ;carry to 2nd byte, high bit to carry.
RCL byte ptr DS:[502],1 ;carry to 3rd byte, high bit to carry.
...
RCL byte ptr DS:[599],1
or, as a loop:
CLC
MOV BX,500
MOV CX,100
L: RCL byte ptr DS:[BX],1
INC BX
LOOP L
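The loop can be mirrored in C, which may help when checking the assembler version by hand. This is a sketch only: the function name shl_long is hypothetical, and the carry flag is modelled by an ordinary variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiplies a multi-byte number by 2 in place. The bytes are stored
   lowest first ("reverse order"). Returns the bit shifted out of the
   highest byte, i.e. the final carry. */
int shl_long(uint8_t *num, size_t len)
{
    int carry = 0;                                  /* CLC */
    for (size_t i = 0; i < len; i++) {              /* one RCL per byte */
        int high = num[i] >> 7;                     /* bit that RCL moves to the carry */
        num[i] = (uint8_t)((num[i] << 1) | carry);  /* old carry enters the low end */
        carry = high;
    }
    return carry;
}
```

For the two-byte number 0180h (stored as 80h, 01h) the result is 0300h (stored as 00h, 03h) with a final carry of 0.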
Exercise: How would you divide this 800-bit number by 2?
Exercise: Would signed and unsigned numbers use the same
code?
To multiply by 2^k, you need to shift by k positions. This must be done as
1-position shifts repeated k times since the carry can hold only one bit;
therefore a double loop is required.
Another application for these instructions is shifting a monochrome image
by one or several pixels. The code is similar to the one used for shifting
long numbers above.
The first two forms of all eight shift/rotate instructions are encoded
together, using opcode bytes D0 through D3. The “register” field in the 2nd
byte of the encoding actually denotes the operation. The third form uses C0
and C1, in a similar fashion.
80386 and all newer processors have double-register shift operations
SHLD and SHRD that allow writing more efficient code; we do not describe
them here.
3.18 Data conversion
The arithmetic instructions studied in the previous sections are
sufficient to
implement many formulas, but not all. One particular problem is
mixing of
variables of different sizes, as seen in the following C-language
example
int x;
short y;
char z;
x = y+z;
If all three variables were declared the same way (for instance, all three
int's), we could have implemented the computation with already known
instructions:
MOV EAX,y
ADD EAX,z
MOV x,EAX
but the “size must match” rule prevents us from mixing operands of
different sizes. Thus, we need data conversion instructions: ones
that change the size of a variable without changing the value.
Before showing the available instructions, let's understand the exact
problem.
In the example above we would like to add the byte (char) quantity z to the
word (short) quantity y. To do this, we need to convert z to a word
quantity whose value equals the original value of z; this word quantity
can then be added to y. (We would later need to convert the sum, a word
quantity, to a doubleword, to be able to store it into x.)
We actually have a way to do this conversion in some cases. For example,
if we assume z to be equal to 1, the following assembler code would work:
MOV AL,z
SUB AH,AH
Bingo: by zeroing the high bits in the AX register, we extended the
value into a word, and can add it to the value of y:
MOV AL,z
SUB AH,AH
ADD AX,y
Assuming y is equal to 3, the result would be 4, stored in AX.
To save it
into x we can append two more zero bytes:
MOV AL,z
SUB AH,AH
ADD AX,y
MOV word ptr x,AX
MOV word ptr x[2],0
ZERO EXTENSION, used in the example, is in fact a correct solution, just
not for our formula; with differently chosen numbers it will fail. Consider
int x;
short y=1;
char z=-1;
x = y+z;
ZERO extending z as we did above will result in the value 255 in AX;
x therefore will be computed as 256.
Why is this happening? ZERO EXTENSION always produces non-negative
numbers: even if the original value had the sign (high) bit set, the extended
value will always have zero in the high position. Thus, it is suitable for
unsigned data types; the example
unsigned int x;
unsigned short y;
unsigned char z;
x = y+z;
will work fine with any values of y and z. For signed data types we need
a different type of extension. For simplicity, let's look only at the
conversion AL ⇒ AX. The value in AL is considered as signed. If it happens
to be nonnegative, we can ZERO-EXTEND, as we did above
SUB AH,AH
if it happens to be negative, then we should instead fill AH with 1s, not 0s:
MOV AH,0FFh
In general, the extension byte (or bytes) will be filled with the sign of the
number; the procedure itself is therefore called SIGN EXTENSION.
One practical difference between the two types of extensions:
while
the unsigned (ZERO) extension can be implemented efficiently
with the
instructions we already have, an implementation of the signed
extension
would require coding the logic like
if (AL>=0) AH=0; else AH=255;
Can this pseudocode be written in assembly? Yes, of course. Can it be
written efficiently? No.
Exercise: Implement the pseudocode above in assembler.
Therefore, the instruction set provides sign conversion
instructions which
we will now introduce
CBW (Convert Byte to Word)
CBW, just like other instructions in this group, does not have arguments;
they all work on the accumulator. CBW specifically converts AL to AX, using
sign extension.
CWD (Convert Word to Doubleword)
CWD converts the 2-byte value stored in AX into a 4-byte value stored in DX
and AX; DX has the high bits.
Why not convert to a single doubleword register? Two reasons: firstly,
CWD was added before the processor had 32-bit registers (but see CWDE below).
Secondly, in some cases this is actually more convenient; see the examples
related to division below.
Our example program, written in assembler, would use both of
these
instructions:
MOV AL,z
CBW
ADD AX,y
CWD
MOV word ptr x,AX
MOV word ptr x[2],DX
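The two kinds of extension can be compared side by side in C on raw bit patterns. The function names below are hypothetical; sign_extend8 follows the if/else pseudocode given earlier, which is the effect CBW achieves.

```c
#include <stdint.h>

/* Zero extension of a byte pattern (what SUB AH,AH achieves). */
uint16_t zero_extend8(uint8_t al)
{
    return al;
}

/* Sign extension of the same pattern (what CBW achieves):
   the high byte is filled with copies of bit 7. */
uint16_t sign_extend8(uint8_t al)
{
    return (al & 0x80u) ? (uint16_t)(0xFF00u | al) : (uint16_t)al;
}
```

With y=1 and z=-1 (pattern FFh), adding the zero-extended value gives 256, while adding the sign-extended value FFFFh gives 0 in 16 bits, the expected sum.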
The following two instructions did not exist in the original 8086
and
were added only in 80386, when 32-bit registers were first
introduced:
CWDE (Convert Word to Doubleword Extended)
sign-extends AX into EAX.
CDQ (Convert Doubleword to Quadword)
sign-extends EAX into EDX:EAX (to store a quadword in a single register
one would need a 64-bit register!)
The four forms given above are the most efficient to use and the
code
should be written in such a way as to have the data to be sign-
converted
on the accumulator; unsigned data conversion, on the other
hand, can be
done on other registers:
MOV BL,z
SUB BH,BH ; now BX has zero-extended value of z.
In addition to these instructions, Intel – beginning with 80386 –
offers
more general forms
MOVSX target,source (MOV with Sign eXtension)
MOVZX target,source (MOV with Zero eXtension)
where source is either memory or a register, target must be a register,
and, unlike the usual “size must match” rule, here the target
must be larger than the source.
For example:
MOVSX AX,AL
MOVZX EBX,DL
MOVSX CX,byte ptr DS:[10]
MOVZX EAX,byte ptr DS:[BX]
Notice that the first instruction in the example is equivalent to
CBW – in
terms of what it does! – but its encoding is 4 bytes vs only 1
and execution
is slower. Thus, the accumulator forms are still the most
efficient.
Encoding: CBW is encoded as a single byte 0x98, CWD as a
single byte
0x99.
Data conversion instructions are not considered to be
arithmetic, there-
fore no flags are altered.
WARNING: X86 emulator does not support 32-bit instructions,
so CWDE,
CDQ, MOVSX, MOVZX would be rejected by it. Maybe one
day :P
Now, what about conversion from a larger data size to a smaller
one?
Such situations surely happen, for example:
short a; int b;
a=b;
Well, in general this cannot be done: the data range of int is
larger
than that of short, so not every int value can be correctly
represented as
a short. Most compilers would issue a warning and proceed by
saving only
the low word:
MOV EAX,b
MOV a,AX
half of the EAX register is not saved at all! This may or may
not work
correctly, depending on the value of b, and assembler cannot
solve what is
impossible to solve. Assembler, however, offers a simple test
that checks if
the operation will work correctly:
MOV AX,word ptr b ; lo half of b
MOV BX,word ptr b+2 ; hi half of b
CWD
CMP DX,BX
JNZ error
MOV a,AX
In other words, if the true value of b can be recovered from the
lo half,
the number fits in the short range. For unsigned types, the check
is even
simpler
CMP word ptr b+2,0 ; hi half of b
JNZ error
MOV AX,word ptr b ; lo half of b
MOV a,AX
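The signed check can be restated in C. This is a sketch with a hypothetical function name: it splits b into halves, sign-extends the low half the way CWD would, and compares the outcome with the real high half.

```c
#include <stdint.h>

/* Returns 1 if the 32-bit value b can be stored in a 16-bit short
   without loss, mirroring the CWD / CMP DX,BX test. */
int fits_in_short(int32_t b)
{
    uint32_t bits = (uint32_t)b;
    uint16_t lo = (uint16_t)bits;                   /* MOV AX,word ptr b   */
    uint16_t hi = (uint16_t)(bits >> 16);           /* MOV BX,word ptr b+2 */
    uint16_t dx = (lo & 0x8000u) ? 0xFFFFu : 0u;    /* CWD                 */
    return dx == hi;                                /* CMP DX,BX           */
}
```

Values from -32768 to 32767 pass the test; 32768 and -32769 are the first values on each side that fail.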
3.19 Multiplication and Division
These two operations were left to the end because of their differences from
other operations.
One of the differences is the higher cost: multiplication is slower than
other operations (processor-dependent, but 3 times slower is generally a
correct estimate); division is slower yet. (Compare this with performing the
operations on paper: addition and subtraction are easy, multiplication is more
difficult, and if you forgot how unpleasant division is, review your
elementary school notes!)
Because of this one should always try to use these operations
sparingly,
or – if possible – not use them at all. For example,
multiplication by a
power of 2 should never be done as multiplication, same goes
for division
by a power of 2. This is far from the only possible savings.
Another problem, specific to multiplication, is the growth of the
length of the result. With the binary operations studied so far the length of
the result is the same as the length of the arguments (AND, OR, . . . ) or
longer by only one bit, which can be saved in the carry (ADD, SUB, . . . ).
Multiplication may double the length; this requires a different type of syntax,
and a different understanding of the “size must match” rule. (With division,
we will see yet another form of syntax and a direct violation of the “size
must match” rule.)
Further, not every register is capable of doing these operations.
The
most common/usable forms require the accumulator (or work
best on the
accumulator).
Finally, multiplication and division provide different operations for signed
and unsigned arithmetic.
With this in mind, let’s look at a sample instruction:
IMUL r/m (Integer (signed) MULtiplication)
We notice that while multiplication operation requires two
arguments,
the format specifies only one. The register or memory operand
is multiplied
by the accumulator of the same size as the operand. For
example:
IMUL CL ; multiply CL by AL (8 bit)
IMUL word ptr DS:[BX]; multiply memory word by AX (16 bit)
IMUL ESI ; multiply ESI by EAX (32 bit)
The result is saved in the extended accumulator: AX for 8-bit operands,
DX:AX for 16-bit operands and EDX:EAX for 32-bit operands. In all
cases the length of the extended accumulator is twice the length of the
arguments. This ensures that the result is computed correctly, but does not
suggest what to do with the result if it is too large to work with (64 bits in
a 32-bit program, or 128 bits after a 64-bit multiplication, which also exists).
Example:
MOV AL,7
MOV CL,13
IMUL CL ; results in AX being 7*13=91
MOV AL,255
MOV CL,255
IMUL CL ; results in AX being 1.
In the second multiplication the arguments are interpreted as signed
numbers, so we are actually squaring -1!
(Intel opcode names are not always consistent. IMUL uses the word Integer
to mean Signed! Unsigned multiplication, which we look at next, is also an
integer multiplication; all the x86 registers are integer!)
The unsigned counterpart is
MUL r/m ((unsigned) MULtiplication)
and follows the same format as IMUL. The results, however,
may be differ-
ent:
MOV AL,7
MOV CL,13
MUL CL ; results in AX being 7*13=91
MOV AL,255
MOV CL,255
MUL CL ; results in AX being 0xFE01 = 65025
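The two interpretations can be reproduced in C by choosing how the same byte patterns are read before multiplying. The function names are hypothetical, and the 16-bit return value stands for the contents of AX:

```c
#include <stdint.h>

/* MUL: both bytes are treated as unsigned; the 16-bit product goes to AX. */
uint16_t mul8(uint8_t al, uint8_t operand)
{
    return (uint16_t)(al * operand);
}

/* IMUL: the same bit patterns are treated as signed bytes. */
uint16_t imul8(uint8_t al, uint8_t operand)
{
    int a = (al & 0x80) ? al - 256 : al;               /* reinterpret as signed */
    int b = (operand & 0x80) ? operand - 256 : operand;
    return (uint16_t)(a * b);                          /* bit pattern left in AX */
}
```

For 7 and 13 both give 91; for 255 and 255 the unsigned product is FE01h while the signed one is 1.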
We now turn our attention to division. In addition to the features already
mentioned, it has one more: division is not always possible. But let us
begin with the format:
IDIV r/m (Integer (signed) DIVision)
Once again, some of the arguments are implicit; the explicit argument
is the divisor in the operation. The quotient will appear in the accumulator,
of the same size as the divisor; the dividend is taken from the extended
accumulator. For example:
MOV AX,100
MOV BL,7
IDIV BL ; result (14) in AL
...
IDIV word ptr ds:[10]
; DX:AX is divided by memory word, result in AX
...
IDIV ESI ; EDX:EAX is divided by ESI, result in EAX
Division operations also compute the remainder, which is stored in the 2nd
half of the extended accumulator (AH, DX, EDX); this makes a separate MOD
operation unnecessary.
DIV r/m ( (unsigned) DIVision)
is the unsigned counterpart of IDIV.
Exercise: find inputs for which IDIV and DIV produce different
results.
In some cases division cannot be performed and results in an exception
(often meaning a program crash or even an OS crash):
SUB CX,CX
IDIV CX
Division by zero is the primary example of it; the above example will fail
regardless of the value of the dividend.
In case of a division overflow (division by zero is one example
of it,
but not the only one), the hardware will call INT 0 (see
explanation of
interrupts and handlers which is TBW). What happens next fully
depends
on the installed interrupt handler; the outcome may be an error
message
and program termination, a crash of the operating system, or a normal exit
from the program with an error message like your program has performed an
illegal operation.
Division overflow may occur with a non-zero divisor too:
MOV AX,1000
MOV CL,2
IDIV CL
While we are dividing by 2, this division cannot be done:
1000/2 is 500,
too large for a byte register (AL). The outcome is the same as
with division
by zero, and in fact some exception handlers may report this as
a division
by zero – it is not.
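A defensive version of the byte division can be sketched in C. The function name is hypothetical, and the accepted quotient range of -128..127 is an assumption based on the size of AL. Both failure cases described above are detected before dividing.

```c
#include <stdint.h>

/* Divides a 16-bit dividend by an 8-bit divisor. Returns 1 and stores the
   quotient and remainder if the quotient fits in a signed byte; returns 0
   in the cases where IDIV would raise the division exception. */
int idiv8_checked(int16_t dividend, int8_t divisor, int8_t *quot, int8_t *rem)
{
    if (divisor == 0)
        return 0;                          /* division by zero */
    int q = dividend / divisor;            /* truncates toward zero */
    if (q < -128 || q > 127)
        return 0;                          /* quotient does not fit in AL */
    *quot = (int8_t)q;                     /* would be left in AL */
    *rem  = (int8_t)(dividend % divisor);  /* would be left in AH */
    return 1;
}
```

Dividing 100 by 7 succeeds with quotient 14 and remainder 2; dividing 1000 by 2 is rejected because 500 does not fit in a byte.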
In the next example, the division result (the same 500) is to be placed into
a 2-byte accumulator.
MOV AX,1000
MOV CX,2
IDIV CX
This appears possible, but an overflow may still occur.
Exercise: Understand why.
In general, to avoid overflows one should use longer registers; byte
division is very overflow-prone.
IMUL, unlike the other instructions in this group, has additional
formats,
including multiplication by an immediate.
TODO : discuss MULDIV
TODO : discuss DIVMOD
3.20 LEA : Load effective address
LEA s,t (Load Effective Address)
s=&t;
LEA is not an arithmetic operation sensu stricto, but it is closely related
to the address computation used in arithmetic instructions.
We begin with a closer look at what arithmetic instructions
actually do,
for example
ADD AX,[BX+SI+10]
Two parts of the computation are:
1. compute the memory address : BX+SI+10
2. retrieve the data from the computed address and add it to the
AX
register.
We notice that our sample instruction actually does three
additions, not
one!
The first part of the computation is exactly what LEA does, and
using
LEA we can break the addition into two steps:
LEA DI,[BX+SI+10]
ADD AX,[DI]
This is an equivalent, and obviously slower computation – we
bring it
for illustration purposes only, not a suggestion for coding.
LEA is more restrictive than other 2-operand instructions in terms of the
allowed operands: the second operand must be a memory operand (otherwise the
concept of an address does not apply); the first argument is therefore always
a register.
LEA, not being an arithmetic instruction, does not affect flags.
The “size must match” rule does not apply to LEA, since it does not actually
move data. LEA allows the arguments to be of different sizes, although such
forms of the instruction are rarely useful. The following are all valid
examples with different sizes of the arguments (16 and 32):
LEA DI,[BX+SI+3] ; 16 and 16
LEA EDI,[BX+SI+3] ; 32 and 16, 0-extension used
LEA DI,[EAX+EBX+3] ; 16 and 32, truncation
LEA EDI,[EAX+EBX+3] ; 32 and 32
With static addresses, LEA does not accomplish anything more
than MOV
can do :
msg DB 'Hello, World!'
LEA SI,msg
MOV SI,offset msg
(but notice the difference in the syntax). In this case, MOV is
slightly
more efficient (3-byte encoding rather than 4) and some
assemblers (TASM!)
will actually compile LEA as MOV!
With addresses that include registers, LEA can be emulated as
two or
three instructions:
nums DD 1000 dup(?)
......
LEA AX,nums[BX]
......
MOV AX,offset nums
ADD AX,BX
One important but little-known detail is that the ability of LEA to perform
“free” additions translates to its ability to do free multiplications,
leading to a new fast way to multiply by small numbers.
Recall (actually, this section has not been written yet; maybe it will be)
that in 32-bit mode the addresses are in the form
[base+index*scale+offset]
with base being any 32-bit register, index being any 32-bit register but ESP,
and scale being 1, 2, 4 or 8.
Now, consider the following example:
LEA EAX,[EAX+EAX*2]
This is a valid 32-bit addressing form; we can use the same register as the
“index” and the “base”. The effect of the instruction is multiplication by 3,
not surprisingly faster than IMUL. Using
LEA EAX,[EAX+EAX*2]
LEA EAX,[EAX+EAX*4]
to multiply by 15 is also faster than IMUL, despite being two
instructions.
LEA EAX,[EAX+EAX*2]
SHL EAX,2
is a faster way to multiply by 12, and so on. For many small
numbers there
are similar tricks.
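The arithmetic behind these sequences can be checked in C. The function names are hypothetical; each line mirrors one address computation of the form x + x*scale, or a shift.

```c
#include <stdint.h>

/* LEA EAX,[EAX+EAX*2]: x + 2x = 3x. */
uint32_t times3(uint32_t x)
{
    return x + x * 2;
}

/* 3x followed by LEA EAX,[EAX+EAX*4]: 3x + 4*3x = 15x. */
uint32_t times15(uint32_t x)
{
    uint32_t t = times3(x);
    return t + t * 4;
}

/* 3x followed by a left shift by 2 positions: 3x * 4 = 12x. */
uint32_t times12(uint32_t x)
{
    return times3(x) << 2;
}
```

An optimizing C compiler will often pick such sequences on its own when multiplying by a small constant.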
3.21 Long integers: ADC and SBB
Integer arithmetic can be performed on numbers that consist of more bits
than the register size. This was done already on 8-bit processors in the Dark
Ages of CP/M; exactly the same approach works on modern 32- and 64-bit
processors to work with numbers that are longer yet. While the technique
is the same, it is needed less often now, since for most computations even
32-bit integers are sufficient.
We will first work out the technique mathematically using 8-bit registers
only, while trying to add 16-bit numbers. (The technique is fully scalable;
the small register size was chosen to make the examples easier to understand.)
Consider this calculation:
short x,y,z;
z=x+y;
Using 16-bit registers we can write it in assembler as
; program 1
MOV AX,x
ADD AX,y
MOV z,AX
We now assume that the 16-bit registers are not available and
try to perform
it using 8-bit registers only. How about this code?
; program 2
MOV AL,byte ptr x
ADD AL,byte ptr y
MOV byte ptr z,AL
MOV AH,byte ptr x[1]
ADD AH,byte ptr y[1]
MOV byte ptr z[1],AH
for some values it will indeed work. For example, for x=1234h
and y=5678h,
the result would be 68ACh, same in both programs:
1234h 12h 34h
+ 5678h + 56h 78h
----- -------
68ACh 68h ACh
For others, the result would be different. If x=80h (128 decimal) and y=80h,
the 16-bit program will produce 100h (256 decimal) correctly, whereas the
8-bit program above will produce 0: adding 80h to itself in an 8-bit
register gives 0!
0080h 00h 80h
+ 0080h + 00h 80h
----- -------
0100h 00h 00h
The source of the problem is the carry produced by the first 8-
bit addi-
tion: we are not using it. 80h+80h indeed results in 8 zero bits,
but also
a carry, indicating an extra one to be added to the 9th bit – but
we are
ignoring it.
It is actually possible to correct the above program without
introducing
new instructions:
; program 3
MOV AL,byte ptr x
ADD AL,byte ptr y
MOV byte ptr z,AL
MOV AH,byte ptr x[1]
JNC skip
INC AH
skip:
ADD AH,byte ptr y[1]
MOV byte ptr z[1],AH
(the INC instruction adds the previously ignored carry), but the Intel
instruction set has a shortcut that results in faster code:
; program 4
MOV AL,byte ptr x
ADD AL,byte ptr y
MOV byte ptr z,AL
MOV AH,byte ptr x[1]
ADC AH,byte ptr y[1]
MOV byte ptr z[1],AH
The new ADC instruction adds the 2nd argument and the carry
to the
1st argument – this is exactly the computation needed.
ADC t,s; t=t+s+carry (ADd with Carry)
Except for the addition of the carry, ADC follows exactly the
same rules as
ADD, including the syntax and setting of the flags.
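The ADD/ADC pair of program 4 can be imitated in C with the carry kept in an ordinary variable. This is a sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Adds two 16-bit values using only 8-bit additions, as in program 4. */
uint16_t add16_via_bytes(uint16_t x, uint16_t y)
{
    unsigned lo = (x & 0xFFu) + (y & 0xFFu);        /* ADD AL,byte ptr y    */
    unsigned carry = lo >> 8;                       /* the carry flag       */
    unsigned hi = (x >> 8) + (y >> 8) + carry;      /* ADC AH,byte ptr y[1] */
    return (uint16_t)(((hi & 0xFFu) << 8) | (lo & 0xFFu));
}
```

It gives 68ACh for 1234h + 5678h and 0100h for 80h + 80h, the case that program 2 got wrong.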
We now scale the example: how about adding 128-bit numbers?
This
cannot be done with a single addition on any of the current
processors –
you would need 128-bit registers for that!
In this example we will assume that x occupies 16 bytes (128 = 16 × 8) of
memory beginning with address 100, y and z (same size both) are
located at 200 and 300. We further assume that the bytes in our 128-bit
integers are written in the Intel reversed order. The objective is, as before,
to compute z=x+y.
How about this?
; program 5
MOV EAX,dword ptr x
ADD EAX,dword ptr y
MOV dword ptr z,EAX
MOV EAX,dword ptr x[4]
ADC EAX,dword ptr y[4]
MOV dword ptr z[4],EAX
MOV EAX,dword ptr x[8]
ADC EAX,dword ptr y[8]
MOV dword ptr z[8],EAX
MOV EAX,dword ptr x[12]
ADC EAX,dword ptr y[12]
MOV dword ptr z[12],EAX
Notice that this computation could have been done using only two 64-bit
additions, or using 8 16-bit additions, or using 16 8-bit additions. The code
would look similar in all cases; naturally the code using longer registers
will have fewer instructions and would run faster.
The code shown in the previous program can be extended to handle
integers of any size. It may be preferable, however, to write this code as
a loop. We notice that all the additions are done using the ADC instruction,
except for the very first one, done by ADD; this is because there is no carry
on the very first addition. We can use ADC for all additions if we are certain
that the initial value of the carry flag is 0. The loop form of the previous
program thus becomes:
; program final??
CLC ; clear carry
MOV CX,4
SUB BX,BX
L: MOV EAX,dword ptr x[BX]
ADC EAX,dword ptr y[BX]
MOV dword ptr z[BX],EAX
ADD BX,4
LOOP L
NOTE: the program above should be seen as just an idea; to make it work
one needs to ensure that the carry flag is not corrupted by other
instructions! This is left to the reader.
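For comparison, here is the same loop sketched in C (hypothetical function name; chunks are stored lowest first). The carry lives in its own variable, so updating the loop index cannot disturb it, which is exactly the point the assembler version has to take care of.

```c
#include <stddef.h>
#include <stdint.h>

/* z = x + y for numbers stored as n 32-bit chunks, lowest chunk first.
   Returns the carry out of the highest chunk. */
unsigned add_long(uint32_t *z, const uint32_t *x, const uint32_t *y, size_t n)
{
    unsigned carry = 0;                                 /* CLC */
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = (uint64_t)x[i] + y[i] + carry;   /* ADC */
        z[i] = (uint32_t)sum;                           /* low 32 bits of the sum */
        carry = (unsigned)(sum >> 32);                  /* carry into the next chunk */
    }
    return carry;
}
```

With n = 4 this handles the 128-bit numbers of program 5; a larger n handles longer numbers without other changes.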
The new instruction
CLC (CLear Carry)
sets carry to 0, thus allowing us to use ADC for the first loop
iteration.
We may as well introduce two other instructions that also alter
the carry
flag.
STC (SeT Carry)
set carry to 1
CMC (CompleMent Carry)
toggle the carry bit
Yet longer numbers can be handled with the code above by simply increasing
the number of times the loop is executed. The execution time will be
proportional to n/r, where n is the bit length of the numbers and r
is the bit length of the register.
A similar problem for subtraction will not be examined in detail. We shall
only say that the analog of ADC for subtraction is the SBB
instruction:
SBB t,s; t=t-s-carry (SuBtract with Borrow)
and all of the above examples can be adapted to subtraction by
changing
all ADD’s to SUB’s and all ADC’s to SBB’s.
3.21.1 24-bit case
As a special example, consider 24-bit (3 byte) numbers.
Exercise: Why would one even want to look at such?
Addition of such numbers can be done as three byte size
additions or as
one byte size and one word size.
; program 6
x db 3 dup(?)
y db 3 dup(?)
....
MOV AL,byte ptr x[0]
ADD AL,byte ptr y[0]
MOV byte ptr z[0],AL
MOV AL,byte ptr x[1]
ADC AL,byte ptr y[1]
MOV byte ptr z[1],AL
MOV AL,byte ptr x[2]
ADC AL,byte ptr y[2]
MOV byte ptr z[2],AL
Notice that even in column addition of decimal numbers, chunks
do not
need to be of the same size:
923
+ 278
---
we can add 23+78, obtaining 01 and carry of 1!
In programs below we break the numbers into byte+word and
word+byte
chunks.
; program 6a
x db 3 dup(?)
y db 3 dup(?)
....
MOV AL,byte ptr x[0]
ADD AL,byte ptr y[0]
MOV byte ptr z[0],AL
MOV AX,word ptr x[1]
ADC AX,word ptr y[1]
MOV word ptr z[1],AX
; program 6b
x db 3 dup(?)
y db 3 dup(?)
....
MOV AX,word ptr x[0]
ADD AX,word ptr y[0]
MOV word ptr z[0],AX
MOV AL,byte ptr x[2]
ADC AL,byte ptr y[2]
MOV byte ptr z[2],AL
Exercise: Are 6a and 6b equivalent?
3.21.2 other operations
• Bitwise AND, OR, XOR, NOT simply repeat the same
operation on
each chunk (loops for long data).
• Shifts propagate carry.
• NEG is left as a (nice!) exercise.
• (I)MUL is more complicated
• (I)DIV is even more complicated
The idea of long MUL:
Assume x and y are twice the size of the register that is capable
of multiplication (the register is N bits long).
Then $x = x_h R + x_l$ and $y = y_h R + y_l$, where $x_l, y_l$ are
the low halves of the values, $x_h, y_h$ are the high halves, and
$R = 2^N$.
$$x y = (x_h R + x_l)(y_h R + y_l) = x_h y_h R^2 + (x_h y_l + x_l y_h) R + x_l y_l$$
The formula contains 4 multiplications (multiplications by R are
simply data movements).
Assume now that x and y are 64 bit, z is 128 bit.
x dd ?,?
y dd ?,?
z dd ?,?,?,?
SUB EAX,EAX
MOV z[8],EAX
MOV z[12],EAX ; clear the high half of z
MOV EAX,x[0]
MUL y[0] ; EDX:EAX = xl*yl
MOV z[0],EAX
MOV z[4],EDX
MOV EAX,x[0]
MUL y[4] ; EDX:EAX = xl*yh
ADD z[4],EAX
ADC z[8],EDX
; ADC z[12],0 ; not needed: z[8] was 0, no carry is possible here
MOV EAX,x[4]
MUL y[0] ; EDX:EAX = xh*yl
ADD z[4],EAX
ADC z[8],EDX
ADC z[12],0
MOV EAX,x[4]
MUL y[4] ; EDX:EAX = xh*yh
ADD z[8],EAX
ADC z[12],EDX
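The same four partial products can be written in C as a cross-check. This is a sketch, not the course's code: the function name mul64x64 is hypothetical, and a 64-bit accumulator plays the role of the EDX:EAX pair plus the carry flag.

```c
#include <stdint.h>

/* z[0..3] = x[0..1] * y[0..1]; all limbs are 32-bit, little-endian. */
static void mul64x64(uint32_t z[4], const uint32_t x[2], const uint32_t y[2])
{
    uint64_t p, q, acc;

    p = (uint64_t)x[0] * y[0];              /* xl*yl */
    z[0] = (uint32_t)p;
    acc = p >> 32;                          /* high half moves up to z[1] */

    p = (uint64_t)x[0] * y[1];              /* xl*yh */
    q = (uint64_t)x[1] * y[0];              /* xh*yl */
    acc += (uint32_t)p;
    acc += (uint32_t)q;
    z[1] = (uint32_t)acc;
    acc = (acc >> 32) + (p >> 32) + (q >> 32);  /* carries + high halves */

    p = (uint64_t)x[1] * y[1];              /* xh*yh */
    acc += (uint32_t)p;
    z[2] = (uint32_t)acc;
    z[3] = (uint32_t)((acc >> 32) + (p >> 32));
}
```

Squaring the largest 64-bit value exercises every carry path, so it makes a convenient test input.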
This seems to require 4 multiplications (and generally, numbers
of m·N bits would require $m^2$ multiplications); in reality only
3 are needed. (This is Karatsuba's algorithm, which the Toom-Cook
algorithm generalizes; search for more.)
Idea:
$$x y = (x_h R + x_l)(y_h R + y_l) = x_h y_h R^2 + (x_h y_l + x_l y_h) R + x_l y_l$$
requires 4 multiplications, but
$$x y = (x_h R + x_l)(y_h R + y_l) = x_h y_h R^2 + [(x_h + x_l)(y_h + y_l) - x_h y_h - x_l y_l] R + x_l y_l$$
requires only 3:
• $x_h y_h$
• $x_l y_l$
• $(x_h + x_l)(y_h + y_l)$
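The two formulas can be compared numerically. The sketch below assumes a smaller case than the text (32-bit values split into 16-bit halves, so R = 2^16) so that every intermediate fits in a uint64_t; mul4 and mul3 are illustrative names.

```c
#include <stdint.h>

/* Four 16x16 multiplications, straight from the first formula. */
static uint64_t mul4(uint32_t x, uint32_t y)
{
    uint64_t xh = x >> 16, xl = x & 0xFFFF;
    uint64_t yh = y >> 16, yl = y & 0xFFFF;
    return ((xh * yh) << 32) + ((xh * yl + xl * yh) << 16) + xl * yl;
}

/* Three multiplications: the middle term is recovered by subtraction. */
static uint64_t mul3(uint32_t x, uint32_t y)
{
    uint64_t xh = x >> 16, xl = x & 0xFFFF;
    uint64_t yh = y >> 16, yl = y & 0xFFFF;
    uint64_t hh = xh * yh;
    uint64_t ll = xl * yl;
    uint64_t mid = (xh + xl) * (yh + yl) - hh - ll;  /* = xh*yl + xl*yh */
    return (hh << 32) + (mid << 16) + ll;
}
```

Note that the sums xh+xl and yh+yl may need one bit more than a half, which is the price an assembler version has to pay for saving a multiplication.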
Division algorithms: see
https://en.wikipedia.org/wiki/Division_algorithm
Chapter 4
Jumps
4.1 Unconditional Jumps
JMP label (unconditionally JuMP (goto) to label)
where label is any identifier, not a keyword, unique. Some
scope rules
apply.
JMP lab
lab:
JMP lab
(Jumps can go both forward and backward, multiple JMP
instructions
may target the same label).
l: JMP l
(fine, but infinite loop)
NOTE: label is a keyword, so it cannot be used as an identifier.
In fact, the colon : is an abbreviation for label near.
Other forms of JMP syntax exist, including
JMP r/m (jump(goto) to address stored in r/m)
Only a trivial example for now:
MOV AX,offset lab
JMP AX
....
lab:
(offset is a keyword, cf. the “take address” & operator in C).
4.2 Conditional Jumps
J<cc> label (Jump based on cond code)
General idea: conditional jumps implement an equivalent of
if (<condition>) goto lab;
where <condition> is usually expressed in flags.
NOTE: of course, this is exactly the programming style not
recommended
for high-level languages!
4.2.1 Using Zero Flag
if (ZeroFlag) goto lab;
corresponds to an assembler instruction, JZ.
JZ label (Jump if Zero flag)
Example:
; int x;
;
; if (x!=0)
; { <BODY> }
;
CMP x,0
JZ skip
; <BODY>
skip:
If x is already in a register and the preceding arithmetic
instruction has set the flags, CMP may not be needed.
; int x,y,z;
;
; if (x+y+z!=0)
; { <BODY> }
;
MOV EAX,x
ADD EAX,y
ADD EAX,z
JZ skip
; <BODY>
skip:
JE label (Jump if equal)
is a synonym of JZ
; int x,y;
;
; if (x!=y)
; { <BODY> }
;
MOV EAX,x
SUB EAX,y
JE skip
; <BODY>
skip:
JNZ label (Jump if not Zero flag)
JNE label (Jump if not Equal)
; int x;
;
; if (x==0)
; { <BODY> }
;
MOV EAX,x
CMP EAX,0
JNZ skip
; <BODY>
skip:
; int x,y;
;
; if (x==y)
; { <BODY> }
;
MOV EAX,x
SUB EAX,y
JNE skip
; <BODY>
skip:
NOTE: The four (actually, two) instructions JZ, JE, JNZ, JNE
are based on the Zero Flag only.
NOTE: The N “prefix” in the instruction name is also used with
other instructions.
NOTE: Conditions in conditional jumps are always reversed: we
specify skipping, not doing!
4.2.2 Using Sign Flag
JS label (Jump if Sign)
JNS label (Jump if Not Sign)
; int x,y;
;
; if (x<0)
; { <BODY> }
;
CMP x,0
JNS skip
; <BODY>
skip:
JL label (Jump if Less)
JLE label (Jump if Less or Equal)
JG label (Jump if Greater)
JGE label (Jump if Greater or Equal)
JNL label (Jump if Not Less)
JNLE label (Jump if Not Less or Equal)
JNG label (Jump if Not Greater)
JNGE label (Jump if Not Greater or Equal)
(There are only 4 different instructions above!)
; int x,y;
;
; if (x<=y)
; { <BODY> }
;
MOV EAX,x
SUB EAX,y
JNLE skip ; reverse Less-or-Equal cond
; <BODY>
skip:
WARNING: the explanation below is simplified. JS and JL are
actually different, and the Overflow flag plays a role. The
results are the same whenever the subtraction performed by CMP
does not overflow. For more details (and the true picture) see
the link at the end of this subsection.
Why this extra mess? Because CMP may overflow!
How does this work?
Primary instructions: JL (==JNGE), JLE (==JNG), JNL (==JGE),
JNLE (==JG).
We assume JL ≈ JS. Then
JL jumps if sign flag is set
JLE jumps if sign or zero flag is set
JNL jumps if sign flag is not set
JNLE jumps if neither sign nor zero flag is set
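For the true picture, JL is taken when the sign and overflow flags differ (SF≠OF), not simply when SF is set. The C sketch below models an 8-bit CMP to show one input where the two disagree; the struct and function names are made up for illustration, and the narrowing cast assumes the usual two's-complement wraparound.

```c
#include <stdint.h>
#include <stdbool.h>

struct flags { bool zf, sf, of; };

/* Flags produced by an 8-bit CMP a,b (computes a - b, discards the result). */
static struct flags cmp8(int8_t a, int8_t b)
{
    int wide = (int)a - (int)b;         /* exact result, no wraparound */
    int8_t r = (int8_t)(uint8_t)wide;   /* what is left in 8 bits */
    struct flags f;
    f.zf = (r == 0);
    f.sf = (r < 0);
    f.of = (wide != r);                 /* exact result does not fit */
    return f;
}

static bool js_taken(struct flags f) { return f.sf; }          /* JS */
static bool jl_taken(struct flags f) { return f.sf != f.of; }  /* JL */
```

With a = -128 and b = 1 the subtraction overflows: the 8-bit result is +127, so JS is not taken, while JL (correctly, since -128 < 1) is.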
https://stackoverflow.com/questions/25031860/difference-between-js-and-jl-x86-instructions/25055804
4.2.3 A few examples
Example: implement ABS function
$$\operatorname{abs}(x) = \begin{cases} -x & \text{if } x < 0 \\ x & \text{if } x \ge 0 \end{cases}$$
Buggy!
; EAX=abs(EAX)
OR EAX,EAX
JNZ done
NEG EAX
done:
Fix the error above! The correct code is
; EAX=abs(EAX)
OR EAX,EAX
JNS done
NEG EAX
done:
Example: implement SIGN function
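Sign definition:
$$\operatorname{sign}(x) = \begin{cases} -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases}$$
Actually, $\forall x:\ x = \operatorname{abs}(x) \times \operatorname{sign}(x)$.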
; EAX=sign(EAX)
OR EAX,EAX
JZ done
MOV EAX,1
JG done
NEG EAX
done:
Example:
What does this do?
MOV CX,10
SUB AX,AX
l: ADD AX,CX
DEC CX
JNZ l
The signed jumps (JL, JG, ...) are not suitable for unsigned
arithmetic!
Example:
MOV AL,-1
CMP AL,1
JL lab
...
lab:
This works correctly: the jump is taken.
MOV AL,255
CMP AL,1
JL lab
...
lab:
Counter-intuitively, the jump is taken. This is because, as a
byte, 255 has the same bit pattern (0FFh) as −1, and JL
interprets it as −1!
The correct way to program this is
MOV AL,255
CMP AL,1
JB lab
...
lab:
4.2.4 Using Carry Flag
JC label (Jump if Carry)
JNC label (Jump if Not Carry)
MOV EAX,x
ADD EAX,y
JC error ; unsigned overflow!
...
error:
JB label (Jump if Below)
JBE label (Jump if Below or Equal)
JA label (Jump if Above)
JAE label (Jump if Above or Equal)
JNB label (Jump if Not Below)
JNBE label (Jump if Not Below or Equal)
JNA label (Jump if Not Above)
JNAE label (Jump if Not Above or Equal)
Only 4 different instructions here; further, JB==JC, JNB==JNC.
Exercise: Work out how the flags are used
4.2.5 Mixing signed and unsigned
Consider two examples:
//example S
int x,y;
if (x<y) ....
//example U
unsigned int x,y;
if (x<y) ....
In Example S, the compiler will generate a CMP instruction,
followed by an inverted signed jump, specifically here JNL.
In Example U, the compiler will generate a CMP instruction,
followed by an inverted unsigned jump, specifically here JNB.
This makes it easy for the user: there is no need to think about
which instruction to use. But there is a problem:
//example M
int x; unsigned y;
if (x<y) ....
Using either a signed or an unsigned instruction will work
incorrectly in some cases! Compilers would typically issue a
warning (e.g. “signed/unsigned mismatch”), often ignored by the
programmer.
How to do this correctly?
Let’s consider a smaller case first.
//example MB
unsigned char x; signed char y;
if (x<y) ....
An 8-bit comparison will not work. But conversion into a range
that includes both signed char and unsigned char will:
//example MB1
unsigned char x; signed char y;
short sx,sy;
sx=x; sy=y;
if (sx<sy) ....
or, simply:
//example MB2
unsigned char x; signed char y;
if ((short)x<(short)y) ....
Since short is a signed datatype, the generated instruction will
be signed.
When comparing signed and unsigned shorts, we can convert
both to int (in fact, we could have used int above too).
But what do we do when comparing signed and unsigned ints?
In assembler, the above is of course also doable, but so is the
32-bit comparison!
(The code cannot be shown at this time: we bypassed conversion
instructions.)
In C a proper comparison function can be written too.
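One possible shape of such a function is sketched below. The name less_int_uint is hypothetical; the idea is to settle the negative case first, after which an unsigned comparison is exact.

```c
#include <stdbool.h>

/* True when x < y in the mathematical sense, for any int x and unsigned y. */
static bool less_int_uint(int x, unsigned y)
{
    if (x < 0)
        return true;           /* a negative value is below every unsigned value */
    return (unsigned)x < y;    /* both non-negative: the unsigned compare is exact */
}
```

In assembler this corresponds to one sign test followed by an unsigned jump, with no widening of the operands.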
4.3 Jumps encoding
opc del
where opc is the opcode (0x74 for JE, 0x78 for JS, ...) and del is
a signed byte.
IP is first incremented to point to the next instruction; then, if
the jump is taken, IP=IP+del.
The short form of JMP is encoded similarly, using the opcode
byte 0xEB:
0xEB del
The near form of JMP uses opcode 0xE9 and a 2-byte (16-bit
segments) or 4-byte (32-bit segments) delta:
16 bit seg : 0xE9 d1 d2
32 bit seg : 0xE9 d1 d2 d3 d4
If a conditional jump cannot reach its target, one can overjump:
JZ lab
; more than 128 bytes
lab:
may not assemble, but
JNZ temp
JMP lab
temp:
; more than 128 bytes
lab:
will.
The 80386 and later have alternative longer forms of the
conditional jump instructions, with a 16-bit (or, in 32-bit
segments, 32-bit) delta; they begin with the bytes 0Fh 8?h.
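The arithmetic of the short form can be sketched in C. The function names are illustrative only; offsets are 16-bit, as in a 16-bit code segment.

```c
#include <stdint.h>

/* Target of a 2-byte short jump (opc del) located at offset `at`.
   IP already points past the instruction when del is added. */
static uint16_t short_jump_target(uint16_t at, uint8_t del)
{
    return (uint16_t)(at + 2 + (int8_t)del);
}

/* Computes the delta byte for a jump at `at` to `target`.
   Returns 1 on success, 0 if the target is out of short range
   (in which case *del is left untouched). */
static int short_jump_delta(uint16_t at, uint16_t target, uint8_t *del)
{
    int d = (int)target - ((int)at + 2);
    if (d < -128 || d > 127)
        return 0;
    *del = (uint8_t)d;
    return 1;
}
```

For instance, a jump to itself (l: JMP l) gets the delta 0FEh, i.e. −2: the jump steps back over its own two bytes.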
4.4 LOOPS
Revisiting the previously seen example:
MOV CX,10
SUB AX,AX
l: ADD AX,CX
DEC CX
JNZ l
We notice that adding numbers 10 down to 1 is more efficient
than
doing 1 to 10.
MOV CX,1
SUB AX,AX
l: ADD AX,CX
INC CX
CMP CX,11 ; EXTRA
JNZ l
Generally, running loops toward 0 is more efficient!
We next notice that we actually implemented a do while loop.
; cx=1; ax=0; do { ax=ax+cx; cx++; } while (cx<=10);
What about a while ?
; cx=1; ax=0; while (cx<=10) {ax=ax+cx; cx++; }
MOV CX,1
SUB AX,AX
l: CMP CX,10 ; EXTRA
JA done
ADD AX,CX
INC CX
JMP l ; EXTRA
done:
A while loop of n iterations executes about 2n jump instructions
vs n for do while.
Generally, do-while is more efficient than while!
We will not consider for; it is generally equivalent to while.
Returning to
MOV CX,10
SUB AX,AX
l: ADD AX,CX
DEC CX
JNZ l
This is a very common form of a loop, and Intel provides an
instruction that optimizes it further.
LOOP label (CX--; if (CX!=0) JMP label)
Thus:
MOV CX,10
SUB AX,AX
l: ADD AX,CX
LOOP l
for one instruction fewer!
We can generalize this into
MOV CX,n
SUB AX,AX
l: ADD AX,CX
LOOP l
But what will happen if supplied n is 0 ?
One solution:
MOV CX,n
SUB AX,AX
OR CX,CX
JZ done
l: ADD AX,CX
LOOP l
done:
Again, Intel provides a shortcut,
JCXZ label (if (CX==0) JMP label)
Thus:
MOV CX,n
SUB AX,AX
JCXZ done
l: ADD AX,CX
LOOP l
done:
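The n = 0 case is easy to model in C. This sketch (with made-up function names) counts body executions for a 16-bit counter, with and without the JCXZ guard.

```c
#include <stdint.h>

/* How many times the body of "l: ... LOOP l" runs for a given initial CX. */
static uint32_t loop_iterations(uint16_t cx)
{
    uint32_t runs = 0;
    do {
        runs++;           /* loop body */
        cx--;             /* LOOP: decrement CX (0 wraps to 0FFFFh) ... */
    } while (cx != 0);    /* ... and jump back while CX is not zero */
    return runs;
}

/* The same loop preceded by JCXZ: no iterations when CX starts at 0. */
static uint32_t guarded_iterations(uint16_t cx)
{
    if (cx == 0)          /* JCXZ done */
        return 0;
    return loop_iterations(cx);
}
```

Without the guard, a zero count makes the body run 65536 times, because the decrement happens before the test.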
LOOP and JCXZ are encoded similarly to conditional jumps.
LOOP's opcode is 0xE2, JCXZ's is 0xE3.
Mentioning only: there are also the less commonly needed LOOPZ
and LOOPNZ. All four instructions can work off ECX.
Nested LOOPs should save CX!
MOV CX,n
L1:
....
**SAVE** CX
MOV CX,m
L2:
....
LOOP L2
**RESTORE** CX
LOOP L1
SAVE/RESTORE can be done using another register (for example,
XCHG CX,DX), memory, or the stack (usually the best).
One more “real” example:
MOV CX,100
MOV AX,1
MOV BX,1
L: ; print AX
ADD BX,AX
XCHG BX,AX
LOOP L
4.5 Control statements templates
; goto lab;
;
JMP <lab>
lab:
; if <COND> <BODY>
;
; convert condition into a flag
JN<cond> skip
<BODY>
skip:
; if <COND> <BODY1> ;else <BODY2>
;
; convert condition into a flag
JN<cond> lelse
<BODY1>
JMP done
lelse:
<BODY2>
done:
; while (true) <BODY>
; for (;;) <BODY>
lagain:
<BODY>
JMP lagain
ldone: ; may be used for break in <BODY>
; while <COND> <BODY>
;
lagain:
; convert condition into a flag
JN<cond> ldone
<BODY>
JMP lagain
ldone:
; do <BODY> while <COND>;
;
lagain:
<BODY>
; convert condition into a flag
J<cond> lagain
; while <COND> { <BODY1> break; <BODY2>}
;
lagain:
; convert condition into a flag
JN<cond> ldone
<BODY1>
JMP ldone
<BODY2>
JMP lagain
ldone:
; while <COND> { <BODY1> continue; <BODY2>}
;
lagain:
; convert condition into a flag
JN<cond> ldone
<BODY1>
JMP lcont
<BODY2>
lcont:
JMP lagain
ldone:
NOT covered: for, switch, and how to deal with more complex
expressions (see “complete and shortcut” section).
Chapter 5
Variables
5.1 Declaring variables
Assembler language includes operators for declaring variables.
The primitive data declaration operators correspond to the
primitive types in the assembler language:
[name] DB value[s] (Declare Byte)
[name] DW value[s] (Declare Word)
[name] DD value[s] (Declare Doubleword)
[name] DQ value[s] (Declare Qword)
[name] DT value[s] (Declare Tenbyte)
The [name] field of the declaration should be a unique
identifier; this field is usually present but not required. The
[name] field is used to refer to the variable; if it is absent,
there is no way to refer to the variable by name.
The value field gives the initial value of the variable; this
field is always required.
Examples:
year dw 2020
a1 db 'a' ; as char
a2 db 97 ; as decimal
a3 db 61h ; as hex
a4 db 01100001b ; as binary
dwrd dd 12345678
Notice that a1, a2, a3 and a4 all provide the same initial value,
in different formats. 'a' is 97 since the character 'a' has code
97 in the ASCII table.
Multiple initialization values are allowed, in such cases the
declared
variable is actually an array:
primes DW 2,3,5,7,11,13
str1 DB 'H','e','l','l','o'
str2 DB 'Hello'
str3 DB "H",'e',"l","l",'o'
str4 DB "Hello"
str5 DB 'He',"llo"
All five strings above contain identical data: both single and
double quotes are allowed, but single quotes must match single
quotes and double quotes must match double. Further, strings can
be entered either as strings or as sequences of characters.
s DB 'Hello, World',0
shows a zero-terminated string (C language string format).
One would use single quotes to enter a string that includes
double quotes and vice versa:
m DB '"Hello", he said'
and will have to break the string into parts if it includes both
single and double quotes:
m DB '"Don',"'",'t", he said'
A long sequence of initialization values can be broken into
multiple declarations:
msg DB "ERROR: you must not use"
DB "this operator the way you do"
DB "please delete the program and"
DB 'read the textbook',0
In the example above we do not name the lines after the first;
the program will not refer to them.
Often, one needs to calculate the length of a string like msg
above. This can be done at run time, by counting the characters
up to the terminator, or at assembly time:
msg DB "ERROR: you must not use"
DB "this operator the way you do"
DB "please delete the program and"
DB 'read the textbook'
msgend DB 0
msglen DW offset msgend-offset msg
....
MOV CX,offset msgend-offset msg
The offset operator is similar to the address operator & in
C/C++. offset msgend-offset msg is a constant, computed at
assembly time; it is equal to the number of characters in the
message, excluding the terminating null character. With this
value known we actually do not need the null terminator at all
and can instead use
msg DB "ERROR: you must not use"
DB "this operator the way you do"
DB "please delete the program and"
DB 'read the textbook'
msgend LABEL BYTE
msglen DW offset msgend-offset msg
....
MOV CX,offset msgend-offset msg
where LABEL declares a byte type variable with NO initial
values. LABEL is used instead of DB since the latter will always
allocate at least one byte. Other types that may follow LABEL
include WORD, DWORD, QWORD, TBYTE.
To specify an uninitialized variable, use “?”:
x DD ?
y DD ?
Declared variables can be referred to in code:
x DW ?
y DW ?
z DW ?
.....
MOV AX,x
ADD AX,y
MOV z,AX
Notice that declared variables have a size and the usual size
rules apply:
MOV x,10 ; OK, size known from x
MOV AL,x ; error, size mismatch
MOV AL,byte ptr x ; OK, size cast to byte
The dup operator can be used to specify large arrays without
having to list all the initial values:
ar1 DW 1,1,1,1,1
ar2 DW 5 dup(1)
The two arrays above contain exactly the same data. A more
practically useful example would be
ar3 DW 1000 dup(?)
– a 1000-element uninitialized array.
Initialized and uninitialized entries can be used within the same
declaration:
aa DW 1,2,?,?,5,6
fib DW 1,1,998 dup(?)
The fib array will be used to compute the Fibonacci numbers
below. The first two elements are initialized statically; the
subsequent entries will be computed.
Data declarations are not executable instructions and in general
one
should separate them from code so they do not get executed.
This is usually
accomplished by placing them into a separate segment. If there
is only one
segment (.com files), then the typical approach is to structure
the program
as follows:
start: ; entry point of the program
JMP real_start
.....
data and procedure declarations
.....
real_start:
; code begins here
Because the data is jumped over, it will never get executed.
An exception to the above is the case when we know exactly
which instructions the data corresponds to. For example,
MOV AX,BX
DB 90h
ADD SI,DI
is totally safe: 90h is the NOP operation! While there is no good
reason to code NOP as data, a similar approach can be used to
enter an instruction that is not supported by the assembler, or a
form of an instruction the assembler will not produce.
There are two different ways to encode ADD BX,CX as machine
code. One of them is normally produced by the assembler; the
other can only be entered using db's.
Exercise: Do it.
5.2 Using arrays
This section contains several examples of common snippets of
code.
Example 1: initialize a 256-element byte array to contain all
possible
256 characters in ascending order.
ba DB 256 dup(?)
.....
SUB BX,BX
MOV CX,256
l: MOV ba[BX],BL
INC BX
LOOP l
notice that we use the same register as both index and the
current value to
be stored; this is possible only because of the byte size of the
data.
Example 2: initialize a 1000-element word array to contain
numbers 0
through 999 in ascending order.
nums DW 1000 dup(?)
.....
SUB AX,AX
SUB BX,BX
MOV CX,1000
l: MOV nums[BX],AX
INC AX
ADD BX,2
LOOP l
Exercise: Explain what would happen if BX is incremented only
by 1.
Generally, the index in loops like the above needs to be
incremented by the byte size of the data: the index counts bytes,
not elements. One very important implication is that in
high-level language loops of this type there is a hidden
multiplication by the element size; depending on the compiler it
may or may not be implemented as a multiplication.
Consider two ways to implement the above code in C:
short s[1000]; int i;
for (i=0; i<1000; i++)
s[i]=i;
This code has a hidden multiplication by 2: index i needs to be
translated into an offset to compute the address of s[i]. The
other way to code the example
short s[1000]; int i; short *p=s;
for (i=0; i<1000; i++)
*p++=i;
does not have a hidden multiplication and often would result in
better compiled code.
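The hidden multiplication can be made visible from C itself. A tiny sketch, with a hypothetical helper name, that measures how far s[i] lies from the start of the array in bytes:

```c
#include <stddef.h>

/* Byte offset of s[i] from the start of the array: the "hidden" i * 2. */
static size_t byte_offset(const short *s, size_t i)
{
    return (size_t)((const char *)&s[i] - (const char *)s);
}
```

The result is always i * sizeof(short), which is exactly the value the assembler loops keep in BX.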
Example 3: for completeness, let us also initialize a
1000-element doubleword (int) array to contain numbers 0 through
999 in ascending order.
nums DD 1000 dup(?)
.....
SUB EAX,EAX
SUB BX,BX
MOV CX,1000
l: MOV nums[BX],EAX
INC EAX
ADD BX,4
LOOP l
Example 4: Compute sum and average of the elements in a
1000-
element word array (in this and subsequent examples we assume
nums to
be initialized with some values first.)
nums DW 1000 dup(?)
sum DW ?
ave DW ?
.....
SUB AX,AX
SUB BX,BX
MOV CX,1000
l: ADD AX,nums[BX]
ADD BX,2
LOOP l
MOV sum,AX
CWD
MOV BX,1000
IDIV BX
MOV ave,AX
(note that the quotient is truncated to an integer, and that the
16-bit sum itself may overflow)
Example 5: Find the largest element in an array of signed
numbers
nums DW 1000 dup(?)
max DW ?
.....
MOV AX,nums[0]
MOV BX,2
MOV CX,999
l: CMP AX,nums[BX]
JGE skip ; use JAE for unsigned
MOV AX,nums[BX]
skip:ADD BX,2
LOOP l
MOV max,AX
Use JLE and JBE if searching for the smallest entry.
Example 6: Compute Fibonacci numbers
fibs DW 1,1,998 dup(?)
.....
MOV BX,4
MOV CX,998
l: MOV AX,fibs[BX-4]
ADD AX,fibs[BX-2]
MOV fibs[BX],AX
ADD BX,2
LOOP l
Exercise: Would all the values be computed correctly?
Example 7: Find value t in an array of numbers
nums DW 1000 dup(?)
t DW ? ; value to search for
......
MOV AX,t
SUB BX,BX
MOV CX,1000
l: CMP AX,nums[BX]
JE found
ADD BX,2
LOOP l
not_found:
.....
found:
.....
A more interesting program would be a binary search... it is not
all that difficult in assembler. We assume nums contains signed
numbers in ascending order.
nums DW 1000 dup(?)
......
MOV AX,t
MOV SI,0 ; left interval bound
MOV DI,999*2 ; right interval bound
l: CMP SI,DI
JG not_found ; signed compare: DI may become -2
MOV BX,SI
ADD BX,DI
SHR BX,2
SHL BX,1 ; middle point
CMP AX,nums[BX] ; AX holds t
JE found
JG right
left:
MOV DI,BX
SUB DI,2
JMP l
right:
MOV SI,BX
ADD SI,2
JMP l
....
found:
....
not_found:
....
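For comparison, here is a C rendering of the same search that keeps the bounds as byte offsets, the way SI, DI and BX are used. It is a sketch with a made-up function name; note that the bounds are compared as signed values, because the right bound can drop to −2 when the target is below the first element.

```c
#include <stdint.h>

/* Returns the byte offset of t in nums[0..n-1] (sorted ascending), or -1. */
static int bsearch_words(const int16_t *nums, int n, int16_t t)
{
    int si = 0;                          /* left bound, as a byte offset  */
    int di = (n - 1) * 2;                /* right bound, as a byte offset */
    while (si <= di) {                   /* signed test: di may become -2 */
        int bx = ((si + di) >> 2) << 1;  /* middle point, kept even */
        int16_t v = nums[bx / 2];
        if (t == v)
            return bx;
        if (t > v)
            si = bx + 2;                 /* continue in the right half */
        else
            di = bx - 2;                 /* continue in the left half  */
    }
    return -1;
}
```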
Exercise: Reverse the elements in a word array.
Exercise: Implement bubble sort.
5.3 Memory addressing syntax
16-bit addressing offers only the following nine schemes for
specifying the offset:
• const
• BX+const
• BP+const (***)
• SI+const
• DI+const
• BX+SI+const
• BX+DI+const
• BP+SI+const (***)
• BP+DI+const (***)
Notes:
How to remember:
Base, or Index, or Both, or Neither
Double register addressing allows efficient access to
two-dimensional arrays, but more often is used to access dynamic
one-dimensional arrays (base points to the beginning of an array,
index “indexes” it).
This limitation allows for a compact encoding of the addressing
scheme. Despite the nine choices offered, only three bits are
needed to select the registers: the const-only form reuses the
code of BP with no constant, so plain BP is encoded as BP+0.
Options marked with (***) imply SS: segment use (others
default to DS:).
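The three-bit field can be tabulated. The sketch below lists the eight register combinations in the order of their 3-bit codes (the r/m field of the addressing byte); a separate 2-bit field, called mod here, selects whether the constant is absent, 8-bit, or 16-bit. The array and function names are illustrative.

```c
/* The eight 16-bit r/m codes, 0 through 7. */
static const char *const rm16[8] = {
    "BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"
};

/* Default segment for a given mod and r/m: SS when BP is involved, else DS.
   mod=0 with r/m=6 is the special "const only" form, which uses DS. */
static const char *default_segment(unsigned mod, unsigned rm)
{
    if (mod == 0 && rm == 6)
        return "DS";
    return (rm == 2 || rm == 3 || rm == 6) ? "SS" : "DS";
}
```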
5.4 Video memory in text mode.
For the default text video mode, the video memory begins in the
middle of the B band, corresponding to the segment value 0B800h
(the reasons are historical).
Let us begin with an example showing the idea; we will specify
the exact rules later.
MOV AX,0B800h
MOV ES,AX ; ES=>video!
MOV byte ptr ES:[0],'A'
INT 20h ; terminate the program
If compiled (you would need to add extra lines to the file per
assembler language syntax requirements) and run, this program
will deposit the letter A into the upper left corner of the
screen!
The general layout of video memory is the following:
The screen has 25 rows and 80 columns of characters; each cell
corresponds to two bytes in the video memory. The even bytes (0,
2, 4, ...) are the ASCII characters, the odd bytes define the
attributes (colors) of the character at the preceding even
address. Let us provide an example of this:
MOV AX,0B800h
MOV ES,AX ; ES=>video!
MOV byte ptr ES:[0],'A'
MOV byte ptr ES:[1],71h
INT 20h ; terminate the program
71h defines the attributes of the letter A on screen. The first
nibble (7) is the background color, the second (1) is the
foreground color: A would appear as blue on white, following
these definitions:
• 0 – black (0x000000)
• 1 – blue (0x0000AA)
• 2 – green (0x00AA00)
• 3 – cyan (0x00AAAA)
• 4 – red (0xAA0000)
• 5 – magenta (0xAA00AA)
• 6 – brown (0xAA5500)
• 7 – white/light gray (0xAAAAAA)
• 8 – (dark) gray (0x555555)
• 9 – bright blue (0x5555FF)
• 10 – bright green (0x55FF55)
• 11 – bright cyan (0x55FFFF)
• 12 – bright red (0xFF5555)
• 13 – bright magenta (0xFF55FF)
• 14 – yellow (0xFFFF55)
• 15 – bright white (0xFFFFFF)
(In some modes the high bit of the background color indicates
blinking rather than color; other variations of interpretation
exist.)
While MOV is the most common instruction to change video
memory, any other instruction can be used. Consider:
MOV AX,0B800h
MOV ES,AX ; ES=>video!
MOV byte ptr ES:[0],'A'
MOV CX,256
L: INC byte ptr ES:[0]
LOOP L
INT 20h ; terminate the program
The program above will show A, then change it to B, ...,
eventually coming back to an A.
Will you actually be able to see the letters changing onscreen?
Well... no! This will happen too fast to notice.
Exercise: Fix it
Writing to video memory is the best way of doing full-screen
output, most suitable for tables and games; other methods are
used to produce scrollable console output (see INT 21h).
Assuming that the rows are numbered top to bottom 0 through 24,
and the columns are numbered left to right 0 through 79, to write
at position (X,Y) we should store information at offset
160 ∗ Y + 2 ∗ X in the video segment.
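The two formulas of this section (cell offset and attribute byte) can be sketched in C; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Offset of the character cell at column x, row y in the 80x25 text page.
   The attribute byte of the same cell is at this offset plus 1. */
static uint16_t cell_offset(unsigned x, unsigned y)
{
    return (uint16_t)(160 * y + 2 * x);
}

/* Attribute byte: background color in the high nibble, foreground in the low. */
static uint8_t attribute(unsigned background, unsigned foreground)
{
    return (uint8_t)(((background & 0x0F) << 4) | (foreground & 0x0F));
}
```

For example, the bottom right cell (X=79, Y=24) lands at offset 3998, and blue on white gives the attribute 71h used earlier.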
Chapter 6
Project #1
Due Date: 04/19/2021
Goal: implement Game of 1024.
This is a new simple board game (rather, a puzzle); you can play
the online version at https://1024game.org/.
Project specifics:
1. Use direct write to the video memory (seg 0xB800).
2. Output will be explained (today), input can be done using int
16h (look up).
3. Recommendation: use color and line-drawing characters (2nd
half of the ASCII OEM set).
4. Projects are individual and should not be copies of source
code found elsewhere; such submissions will not be accepted.
5. You do not need to follow the layout exactly; shortcuts that
simplify programming but essentially keep the game the same are
fine.
Basic grading guidelines:
• C something works (but perhaps not really playable)
• B playable with glitches/deficiencies.
• A enjoyable to play (well... to the degree the entire idea of
such
puzzle is)
6.0.1 Program outline
Hopefully, of some help – this is an overall structure of the
program one
can use.
0 Initialize the program, display.
1 Initialize the configuration
2 Display the board
3 Wait for a key (int 16h)
4 If the key indicates a move of a piece, update the
configuration, go to
step 2.
5 If the key indicates new game, go to step 1.
6 If the key indicates quit, quit.
7 Ignore the key, go to step 3.
6.0.2 Submission
Submit the source (.asm) and the executable (.com or .exe),
ideally by
email.
Please note that some email providers kill attachments of certain
types. Google is likely to reject .com or .exe, CCNY email does
not like .asm. To avoid problems: rename the files to give them a
“safe” extension, then zip (or rar) the files together. Please do
not name the archive “project.zip”; rather, use your name to name
the archive.
(Example: rename project.com to project.com.txt)
When submitting, CC: yourself; this way you would see that the
project does come through.
Submission can be made either to my CCNY email or to
[email protected] (the latter address does not kill any
attachments!).
Submission is your responsibility.
6.1 How to compile and run a program.
Assembler distribution (see the link below)
(Download, unpack, make sure it works. The distribution contains
DOSBOX, TASM, TLINK.)
At this time we only need to know how to create a binary from an
assembler program; a sample HELLO.ASM is provided.
.model tiny
.code
org 100h
start:
; BEGIN BODY
mov dx, offset hello
mov ah, 9
int 21h
mov ah, 4ch
int 21h
hello db 'Hello, world.',13,10,'$'
; END BODY
end start
The part between BEGIN BODY and END BODY is replaced by your own
code.
Assembler distribution link: http://tinyurl.com/cjdyruy
To compile (assuming you are in the directory where hello.asm is
unpacked; under DOSBOX or a 32-bit OS):
C>BIN\TASM hello
This will create file HELLO.OBJ. To link:
C>BIN\TLINK hello /t
This will create file HELLO.COM. To run:
C>Hello
This will say (guess what?)
This output method (console write) will not work right for
full-screen apps; write into video memory directly instead:
.model tiny
.code
org 100h
start: jmp real_start
; place your variables and/or procedures here.
real_start:
; BEGIN BODY
MOV AX,0B800h
MOV ES,AX ; ES=>video!
MOV byte ptr ES:[0],'A'
MOV CX,256
L: INC byte ptr ES:[0]
LOOP L
INT 20h ; terminate the program
; END BODY
end start
Chapter 7
Stack and procedures
The hardware stack discussed in this chapter is not a necessary
feature of hardware; many earlier designs did not provide it.
However, having it helps in many ways, most notably in the
ability to implement procedures efficiently.
7.1 Using stack
We will begin working with the stack by using it. For this
section it is not important just how it works physically; we only
state that there is, somewhere in memory, a data structure,
implemented in hardware, that parallels the stack software data
structure. Namely, we can push things on the stack, we can pop
them back, and the operations work in the LIFO fashion: Last In,
First Out.
Let us begin with the syntax:
PUSH w/d m/r (PUSH word or doubleword)
POP w/d m/r (POP word or doubleword)
Notice that byte operations are not supported. Further, notice
that pushing or popping memory actually results in a
memory-to-memory operation!
These stack operations are one of the exceptions to the general
no memory-
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu
Chapter 1SyllabusCatalog Description Computer structu

Chapter 1SyllabusCatalog Description Computer structu

  • 1. Chapter 1
    Syllabus
    Catalog Description: Computer structure, machine representation of data, addressing and indexing, computation and control instructions, assembly language and assemblers; procedures (subroutines) and data segments, linkages and subroutine calling conventions, loaders; practical use of an assembly language for computer implementation of illustrative examples.
    Course Goals
    0 Knowledge of the basic structure of microcomputers: registers, memory, addressing I/O devices, etc.
    1 Knowledge of most non-privileged hardware instructions for the architecture being studied.
    2 Ability to write small programs in assembly language.
    3 Knowledge of computer representations of data, and how to do simple arithmetic in binary and hexadecimal, including conversions.
    4 Ability to implement a moderately complicated
  • 2. algorithm in assembler, with emphasis on efficiency.
    5 Knowledge of procedure calling conventions and interfacing with high-level languages.
    Optional Text: Kip Irvine, Assembly Language for the IBM PC, Prentice Hall, 4th or 5th edition
    Additional References: Intel and DOS API documentation as presented in Intel publications and online at www.x86.org; lecture notes (to be supplied as we go).
    Prerequisites by Topic: Working knowledge of some programming language (102/103: C/C++); minimal programming experience.
    Major Topics Covered in the Course:
    1 Low-level and high-level languages; why learn assembler?
    2 How does one study a new computer: the CPU, memory, addressing modes, operation modes.
    3 History of the Intel family of microprocessors.
    4-5 Registers; simple arithmetic instructions; byte order;
  • 3. Arithmetic and logical operations. 6 Implementing longer integer type support; carry and overflow. 7 Shifts, multiplication and division. 8 Memory layout. 9 Direct video memory access; discussion of the first project. 10 Assembler syntax; how to use the tools. 11-13 Conditional & unconditional jumps; loops; emulating high-level language constructions; Stack; call and return; procedures 14-15 String instructions: efficient memory-to-memory operations. 16 Interrupts overview: interrupt table; how do interrupts work; classification. 17 Summary of the most important interrupts. 18-20 DOS interrupt; File I/O functions; file-copy program; discussion of the second project 21 Interrupt handlers; keyboard drivers; timer-driven processes; viruses and virus-protection software.
  • 4. 22 Debug interrupts; how do debuggers and profilers work. 23-24 (Optional) Interfacing with high-level languages; Protected mode fundamentals Grading The grading is based on two projects: the midterm project is 49% and the final is 51%. Please note that the projects are individual; submitting a project that is similar to the submissions of others and/or is essentially a download from the Web will result in a fail. Office Hours My hours this term for CSc 210 will be 3:45 to 4:45 on Mondays. Zoom links: 11am https://ccny.zoom.us/j/85378437821 2pm https://ccny.zoom.us/j/87625527827 Chapter 2 Preliminary material
  • 5. Why assembler? • Why take this class? • Why program assembler? • Why know assembler? NOTE: think Binary! Why binary? Binary numbers (https://en.wikipedia.org/wiki/Binary_number) (brief answer: because this is easy to implement) Why hex? Hexadecimal numbers (https://en.wikipedia.org/wiki/Hexadecimal) (brief answer: because it is much easier to work with shorter strings) What about DNA? 2.1 Introduction #1: looking at new hardware • CPU, general purpose (arithmetic) registers
  • 6. – How large? – How many? – Are they all the same? – Modes? • Memory Model – Is all memory the same? – Flat? – Segmented? – Paged? • Other hardware (peripherals) • OS • Special features 7 2.2 Introduction #2: History Intel Processors Over the Years The History Of Intel CPUs -1971 before Intel 1971 4004
  • 7. • Intention • Name • Usage • What can you do with 4 bits? 1972 8008 - Doubling – what can you do with 8 bits 1974 8080 1976 8085 1976 Z80 1974 CP/M – Digital Research, Gary Kildall 1978 - 8086 – X86 architecture. 8 8-bit registers, 8(+6) 16-bit registers. 1 MB limit. 1 MB mystery? 1979 8088 – cost cutting 1981 iAPX 432 – an attempted 32-bit processor 1982 80186 – minor improvements/corrections 1981 IBM PC
  • 9. Problem: Let AH = 2, AL = 3. What is AX? Solution: 00000010 00000011 AH=00000010b=02h AL=00000011b=03h AX=0000001000000011b = 0203h = 515d Note: the suffixes b, h and d are part of the Assembly language syntax; d(ecimal) is the default. The assignment syntax, however, is not assembly syntax; it is used here only for illustration. Fast solution: AX = 2*256+3
  • 10. AX = (2<<8)+3 Problem: Let AX = 2020. What are AL and AH ? Registers BX,CX, DX are divided similarly. General purpose aka Arithmetic registers: Sequence A,B,C,D is an illusion, these letters stand for Accumulator, Base, Count, Data. 8 8bit registers: AH,AL,BH,BL,CH,CL,DH,DL 8 16bit registers: AX,BX,CX,DX and SI,DI,BP(?),SP(??) SP generally cannot be used for calculations, BP usually cannot be used either. (32 bit to be described later) 11
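The AH/AL/AX relationship above can be tried out in C. This is a minimal sketch, not assembler: the function names `make_ax`, `high_byte`, `low_byte` and `demo_ax` are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 16-bit value from its two halves: AX = AH*256 + AL = (AH<<8) + AL. */
uint16_t make_ax(uint8_t ah, uint8_t al)
{
    return (uint16_t)(((unsigned)ah << 8) | al);
}

/* Upper eight bits of a 16-bit value (the AH part). */
uint8_t high_byte(uint16_t ax) { return (uint8_t)(ax >> 8); }

/* Lower eight bits of a 16-bit value (the AL part). */
uint8_t low_byte(uint16_t ax) { return (uint8_t)(ax & 0xFFu); }

/* Small self-check using the numbers from the problem above. */
void demo_ax(void)
{
    assert(make_ax(2, 3) == 0x0203);      /* 515 decimal */
    assert(high_byte(0x0203) == 2);
    assert(low_byte(0x0203) == 3);
}
```

Splitting works the same way for any 16-bit value, which is one way to approach the AX = 2020 problem.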
  • 11. IP – Instruction pointer points to the first byte of the current instruction. Code: B B B B B B B B B B B B B B B B B B Code is essentially a one dimensional array of bytes (in C/C++ – unsigned char type). IP initially is 0; after one instruction is executed it might become 2, then 5, then 6, ... (instructions have different lengths). Simplified logic (one instruction) byte code[MAXCODE]; byte opcode; opcode=code[IP++]; switch (opcode) { case 0x00: ...
  • 12. case 0x01: ... ... case 0xFF: ... } each subcase will read additional bytes if needed to complete reading of the instruction. 12 Simplified logic (full execution) byte code[MAXCODE]; byte opcode; while(true) { opcode=code[IP++];
  • 13. switch(opcode) { case 0x00: ... .... case 0xFF: ... } } Why is this simplified? • CS is also used. • how do we terminate? • how do we change the executed sequence? What do we do with this? Is switch efficient? Question: what would IP=k do (if such instruction exists). 13
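A runnable version of the simplified loop, as a sketch only. It is assumed here that just two one-byte opcodes are recognized: 90h (NOP) and F4h (HLT, used as the answer to "how do we terminate?"; a real HLT waits for an interrupt). The names `run` and `demo_run` are hypothetical, and CS is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Executes code[] starting at IP = 0.
   Returns the number of instructions executed when HLT is reached,
   or -1 on an unsupported opcode or when IP runs past the end.
   The final IP is stored in *ip_out. */
long run(const uint8_t *code, size_t size, size_t *ip_out)
{
    size_t ip = 0;
    long executed = 0;

    assert(code != NULL && ip_out != NULL);
    while (ip < size) {
        uint8_t opcode = code[ip++];    /* fetch, IP moves past the opcode */
        executed++;
        switch (opcode) {               /* decode and execute */
        case 0x90:                      /* NOP: only IP changes */
            break;
        case 0xF4:                      /* HLT: stop the loop */
            *ip_out = ip;
            return executed;
        default:                        /* everything else is left out of the sketch */
            *ip_out = ip;
            return -1;
        }
    }
    *ip_out = ip;
    return -1;                          /* ran off the end without HLT */
}

void demo_run(void)
{
    const uint8_t prog[] = { 0x90, 0x90, 0xF4 };   /* NOP, NOP, HLT */
    size_t ip = 0;
    assert(run(prog, sizeof prog, &ip) == 3);
    assert(ip == 3);
}
```

A compiler typically turns a dense `switch` like this into a jump table, which is one possible answer to "Is switch efficient?".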
  • 14. FLAGS register Should be seen not as a single 16-bit register but as a collection of 16 1-bit registers. More important ones: ZF, SF, CF, DF Neither FLAGS nor the names above are keywords. 14 Segment registers : CS, DS, SS, ES – specify where segments (“parts”) of the program are located. • CS Code Segment • DS Data Segment • SS Stack Segment
  • 15. • ES Extra Segment 15 2.5 8086 registers – full list • AX Accumulator eXtended • AL Accumulator Low • AH Accumulator High • BX Base eXtended • BL Base Low • BH Base High • CX Count eXtended • CL Count Low • CH Count High
  • 16. • DX Data eXtended • DL Data Low • DH Data High • SI Source Index • DI Destination Index • BP Base Pointer • SP Stack Pointer • CS Code Segment • DS Data Segment • SS Stack Segment • ES Extra Segment • IP Instruction Pointer (not a keyword) • Flags Flags (not a keyword)
  • 17. 2.6 General addressing scheme Three distinct ways to address memory: • Absolute address : mem[offset] (flat model–generally cannot be done) • Segmented address : mem[f(seg,offset)] (done by hardware). Usual notation: ssss:oooo (hex digits) • Expressing segmented address in assembly syntax – to be covered later The f(seg,offset) function is mode-dependent. In real mode, f(seg,offset)=seg*16+offset. This makes it possible to build 20-bit addresses out of 16-bit quantities. Examples 0000:0000 ⇒ 00000
  • 18. 1234:5678 ⇒ 179B8 (12340 + 05678 = 179B8) The mapping is not one-to-one! Different (seg,offset) pairs may point to the same address. 0000:0100 ⇒ 00100 0010:0000 ⇒ 00100 Puzzle FFFF:FFFF ⇒ ????? (ref: A20 address line)
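The real-mode function f(seg,offset)=seg*16+offset can be written out in C. A minimal sketch; the names `real_mode_address` and `demo_real_mode` are made up for the illustration, and the result is kept in 32 bits so that nothing is truncated.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address: the segment is shifted left by 4 bits (times 16), then the offset is added. */
uint32_t real_mode_address(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* The worked examples from the text. */
void demo_real_mode(void)
{
    assert(real_mode_address(0x0000, 0x0000) == 0x00000);
    assert(real_mode_address(0x1234, 0x5678) == 0x179B8);
    /* two different seg:offset pairs, one physical address */
    assert(real_mode_address(0x0000, 0x0100) == real_mode_address(0x0010, 0x0000));
}
```

Calling the function with FFFF:FFFF is a quick way to check an answer to the puzzle.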
  • 19. Code segment is effectively mem[f(CS,i)], Data segment is effectively mem[f(DS,i)] Protected memory addressing function uses Segment Descriptor Table lookup. Fields include Base, Limit, Access Rights. Implication: instructions Segment<-value are very costly in protected mode. 18 2.7 Back to History: Original IBM PC (1981) Distorted: Timeline IBM’s brand recognition, along with a massive marketing
  • 20. campaign, ignites the fast growth of the personal computer market with the announcement of its own personal computer (PC). The first IBM PC, formally known as the IBM Model 5150, was based on a 4.77 MHz Intel 8088 microprocessor and used Microsoft's MS-DOS operating system. The IBM PC revolutionized business computing by becoming the first PC to gain widespread adoption by industry. The IBM PC was widely copied (“cloned”) and led to the creation of a vast “ecosystem” of software, peripherals, and other commodities for use with the platform. Better: WIKIPEDIA article (https://en.wikipedia.org/wiki/IBM_Personal_Computer) Additional link (on reaction): Orson Scott Card’s novel (https://en.wikipedia.org/wiki/Lost_Boys_(novel)) Timeline quoted above: https://www.computerhistory.org/timeline/1981/ No OS !
  • 21. Three options: • CP/M-86 (Control Program for Microcomputers), see also DR page • UCSD p-System • PC DOS/MS DOS, see also 86-DOS See also: PL/M Introduction #2: History (cont) 1982 80186, 80188 1982-1991 80286 1985-2007 80386 80186 : almost not used in PCs, many improvements in instructions (kept). 80286 : 16 MB protected mode – promise not fulfilled. Real mode −→−→ Prot mode
  • 23. Doubling of registers again EAX = xxxxxxxxxxxxxxxx ahahahah alalalal 22 Flags register becomes EFLAGS : Additionally: • Control Registers CR0..CR7 (CR0=MSW(Machine Status Word) on 80286) • Test Registers TR0..TR7 • Debug Registers DR0..DR7 64 bit mode adds RAX,... 23
  • 24. On paging Virtual memory makes it possible to execute programs larger than physical memory. Paging generally cannot be controlled by the programmer; paging algorithms are implemented by the OS (see Page replacement algorithms). Application algorithms can be tailored for a paging environment. Example: #define N 1024 int x[N][N],y[N][N],z[N][N]; int i,j;
  • 25. for (int i=0; i<N; i++) for (int j=0; j<N; j++) z[i][j]=x[i][j]+y[i][j]; vs #define N 1024 int x[N][N],y[N][N],z[N][N]; int i,j; for (int i=0; i<N; i++) for (int j=0; j<N; j++) z[j][i]=x[j][i]+y[j][i]; Will the two programs run equally fast ? (Page replacement algorithms: https://www.geeksforgeeks.org/page-replacement-algorithms-in-operating-systems/)
  • 26. Assume 3 pages are available. (1 page is exactly a row of a matrix above.) Two dimensional arrays are stored row by row. First program : 1024 swaps. Second program : 1024^2 swaps. Technical info: https://wiki.osdev.org/Paging 2.8 Back to History: 32 bit OS? OS/2 1987-2001 https://en.wikipedia.org/wiki/OS/2
  • 27. 80486 : 1989 8087 : 1980 80187 (for 80186), 80287 (for 80286), 80387(for 80386). Other coprocessors existed. Stack design, 8 80-bit registers ST(0), ST(1),.. ST(7). 80486 = 80386 + 80387 Datatypes: 32bit single (float in C/C++) 64bit double 80bit extended (internal format) Pentium : 1993- 1993 Pentium (P5), why not 80586? (80486.00+100.00=???) 28 https://en.wikipedia.org/wiki/Pentium https://en.wikipedia.org/wiki/Pentium
  • 28. https://en.wikipedia.org/wiki/Intel_8087 https://en.wikipedia.org/wiki/Intel_80486 1995 Pentium Pro (P6), MMX addition 1997 Pentium II 1999 Pentium III 2000 Pentium 4 MMX: • MultiMedia eXtension • Multiple Math eXtension • Matrix Math eXtension Intel Core (from 2006) https://en.wikipedia.org/wiki/Intel_Core
  • 29. https://en.wikipedia.org/wiki/Pentium_4 https://en.wikipedia.org/wiki/Pentium_III https://en.wikipedia.org/wiki/Pentium_II https://en.wikipedia.org/wiki/Pentium_Pro Chapter 3 Instructions 3.1 Overall structure of asm program • Header – TBD • Sequence of instructions • Trailer – TBD Instructions generally are written one per line (minor exceptions later) Instructions generally follow the following format: [<label>:] <opcode> [<operands>] [;comment] [<label>:] [;comment]
  • 30. where <label> – optional label (any identifier that is not a keyword or defined otherwise). <opcode> – name of the instruction (keyword) <operands> – comma-separated operands, if any; their number (0-3) depends on the opcode ;comment – any text, ignored up to the EOL. Trivial example: lab: ; this line does not do anything The symbolic representation of an instruction corresponds to the particular
  • 31. sequence of bytes that is actually executed. 3.2 The NOP instruction NOP (do nothing) Binary representation: one byte, hex value 90h. Execution: Before: bb bb bb bb bb bb bb bb bb ↑IP 90 bb bb bb bb bb bb After: bb bb bb bb bb bb bb bb bb 90 ↑IP bb bb bb bb bb bb IP is incremented by 1; no other register is changed
  • 32. WHY have it? • delay? • padding for sloppy compilers • patching (code deletion) • reserving space for patching(code addition) 32 3.3 The MOV instruction MOV dst,src (copy src to dst) Example: MOV AL,BL ;
  • 33. ; before : AL=3 BL=7 ; after : AL=7 BL=7 Example: MOV DL,CH MOV DL,DL MOV AX,CX MOV AX,SP MOV SP,CX ; very dangerous MOV EDI,EDI MOV EDI,ESP MOV AL,CX ; illegal MOV EDI,CX ; illegal
  • 34. MOV IP,AX ; illegal MOV AX,CS ; ok, special case (see below) MOV DS,AX ; ok, special case (see below) MOV CS,DX ; special case, illegal MOV DS,EDI ; illegal MOV CR0,EAX ; privileged MOV DR0,EAX ; ok, special case (see below) RULE #1: size of src and dst must match Most instructions support only gp registers Argument types:
  • 35. • (r)egister • (m)emory • (i)mmediate • (s)pecial register Argument size: • (b)yte • (w)ord • (d)oubleword • ... MOV DL,CH ; brr instruction 34 General template for 2-arg instructions:
  • 36. r m i r . . . m . . . i . . . Move-specific template: r m i s r . . . . m . . . . i . . . . s . . . . 35 Right now: r m i r X . . m . . . i . . .
  • 37. Examples: MOV AL,[100] ; brm MOV BX,[200] ; wrm MOV EDI,[400] ; drm MOV [100],AL ; bmr MOV [200],BX ; wmr MOV [400],EDI ; dmr Thus r m i r X X . m X . . i . . . What does [#] really mean? Answer: bytes beginning with byte #. in
  • 38. MOV AX,[100] which byte goes where? 36 Examples: MOV AL,1 ; bri MOV DX,2 ; wri MOV EDI,4 ; dri r m i r X X X m X . . i . . . Examples: MOV AL,97 ;
  • 39. MOV AL,61h ; all four lines are equivalent MOV AL,01100001b MOV AL,’a’ ; ... MOV AL,1000 ; ??? 37 No storing into immediates, this would be like 1=x; in C. Thus: r m i r X X X m X . . i × × ×
  • 40. Important: MOV with immediate is a fundamentally different operation from the rr,rm, mr forms. 38 RULE #2: no memory-to-memory (2 exceptions later) Thus: r m i r X X X m X × ? i × × × MOV [100],1 ; should not compile RULE #3: size must be known Correct syntax:
  • 41. MOV byte ptr [100],1 MOV word ptr [100],1 MOV dword ptr [100],1 MOV qword ptr [100],1 ; 64 bit only MOV tbyte ptr [100],1 ; ??? What about MOV [100],AL MOV byte ptr [100],AL ; unneeded MOV word ptr [100],AL ; will not compile Final result: r m i r X X X m X × X i × × ×
  • 42. 39 Full table (MOV only): r m i s r X X X X m X × X X i × × × × s X X × × 3.3.1 Examples Here is how C/C++ assignments may be compiled: char c1,c2; c1=c2; ------------------- MOV AL,c2 MOV c1,AL short s1,s2; s1=s2;
  • 43. ------------------- MOV AX,s2 MOV s1,AX int x,y; x=y; ------------------- MOV EAX,y; MOV x,EAX; 40 int x,y,z; x=y=z; ------------------- MOV EAX,z;
  • 44. MOV x,EAX; MOV y,EAX; int x; x=0; ------------------- MOV x,0; int x,y,z; x=y=z=0; ------------------- MOV x,0 MOV y,0 MOV z,0 perhaps, a better implementation? MOV EAX,0 ; could be even better MOV x,EAX
  • 45. MOV y,EAX MOV z,EAX 41 Exercise: Exchange bytes in [100] and [101] MOV AL,[100] MOV AH,[101] MOV [100],AH MOV [101],AL can this be done in fewer lines of code? MOV AX,[100] MOV [100],AH
  • 46. MOV [101],AL Note: Byte order matters. 42 3.3.2 Byte order Consider: MOV [100],AX Does LE,reversed AL go into [100] and AH into [101] or, instead: BE,normal AH go into [100] and AL into [101] More than you want to know on Endianness LE,reversed : Intel, Dec BE,normal : IBM mainframe, Motorola, Sun
  • 47. Practical implications: • it is important to know the endianness of the hardware and the data. • it is important to be able to swap. • it is important to be able to determine the endianness. How? Specific example of byte order importance: short s=1; FILE *f=fopen("try.dat","wb"); if (!f) { ... error handling ... } fwrite(&s,1,sizeof(s),f); fclose(f); Should create a 2-byte file try.dat. Now,
  • 48. short s; FILE *f=fopen("try.dat","rb"); if (!f) { ... error handling ... } fread(&s,1,sizeof(s),f); fclose(f); cout << s; should print the value of s – indeed 1. But: what will happen if we run the writing program on an Intel computer, move the data file to a Sun, and run the reading program there? (Reference on endianness: https://en.wikipedia.org/wiki/Endianness) Exercise: Can a high-level program be written that determines the order of bytes?
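One possible approach to the exercise, sketched in C: store a known 16-bit value and look at the byte that sits at the lowest address. The names `is_little_endian`, `swap16` and `demo_endian` are made up for the illustration; `swap16` covers the "be able to swap" point from the list of practical implications.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the low byte of a 16-bit value is stored first (LE), 0 otherwise (BE). */
int is_little_endian(void)
{
    uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* byte at the lowest address of probe */
    return first == 0x02;
}

/* Exchanges the two bytes of a 16-bit value, e.g. for data written on the "other" kind of machine. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

void demo_endian(void)
{
    assert(swap16(0x0203) == 0x0302);
    assert(swap16(swap16(0xBEEF)) == 0xBEEF);   /* swapping twice restores the value */
}
```

A reader program that knows the file was written on a machine of the opposite byte order could pass each 16-bit value through `swap16` after `fread`.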
  • 49. 44 3.4 The XCHG instruction XCHG dst,src (exchange src with dst) XCHG r m i r X X × m X × × i × × × Segment and other non-gp registers are not supported. The syntax and examples from MOV apply, except for non-use of non-gp registers and immediates. Examples (which of the following are valid?) XCHG AL,AH XCHG AX,SP
  • 50. XCHG EAX,EDI XCHG AL,[400] XCHG [400],AL ;same as above XCHG AL,DI XCHG DI,DS XCHG EAX,7 XCHG [100],[101] XCHG AX,AX ; nop? XCHG DI,DI ; nop? XCHG CL,CL ; nop? 45 Can a better version of byte swap program be now written?
  • 51. Better: MOV AX,[100] XCHG AL,AH MOV [100],AX Yet better: XCHG AX,[100] XCHG AL,AH XCHG [100],AX Q: can a shorter program be written (perhaps with another instruction)?
  • 52. 3.4.1 Binary encoding of XCHG We only consider accumulator exchanges now. Instructions XCHG AX,reg are extra optimized in the intel architecture. 90h XCHG AX,AX 91h XCHG AX,CX 92h XCHG AX,DX 93h XCHG AX,BX 94h XCHG AX,SP 95h XCHG AX,BP 96h XCHG AX,SI 97h XCHG AX,DI
  • 53. Q: Why is the number of registers a power of 2? A: Because this allows a register to be identified by a fixed number of bits. 16-bit register representation: 000b AX 001b CX 010b DX 011b BX 100b SP 101b BP 110b SI 111b DI
  • 54. An emulator may use code like unsigned short regs[8]; #define AX regs[0] #define CX regs[1] #define DX regs[2] #define BX regs[3] #define SP regs[4] #define BP regs[5] #define SI regs[6] #define DI regs[7] Notes: • this is just an example!
  • 55. • 8 bit registers have their own 3-bit keys • 32 bit registers parallel 16 bit registers • 64 bit registers use 4-bit keys • The above code should define 8-bit regs properly (e.g., setting AX should set AL and AH too!) • The above code should be modified to support 32 bit registers XCHG AX,AX is NOP. General encoding scheme of XCHG (with accumulator): 1 0 0 1 0 reg (five fixed bits followed by the 3-bit register key). This idea is used in other instructions. XCHG without accumulator uses a lengthier encoding, with first byte 86h/87h.
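A sketch of how an emulator might handle two of the notes above: AL and AH are derived from AX instead of being stored separately, and the one-byte XCHG AX,reg form is decoded with masks. This is an illustration under assumptions, not the course's emulator: it uses an enum where the earlier fragment used #define, and the accessor names and `exec_xchg_acc` are hypothetical. 32-bit registers are not covered.

```c
#include <assert.h>
#include <stdint.h>

enum { AX, CX, DX, BX, SP, BP, SI, DI };    /* indices equal the 3-bit register keys */

uint16_t regs[8];

/* AL and AH are views of AX, so setting AX changes them automatically. */
uint8_t get_al(void) { return (uint8_t)(regs[AX] & 0xFF); }
uint8_t get_ah(void) { return (uint8_t)(regs[AX] >> 8); }
void set_al(uint8_t v) { regs[AX] = (uint16_t)((regs[AX] & 0xFF00u) | v); }
void set_ah(uint8_t v) { regs[AX] = (uint16_t)((regs[AX] & 0x00FFu) | ((unsigned)v << 8)); }

/* Executes XCHG AX,reg for opcodes 90h..97h. Returns 1 if handled, 0 for any other opcode. */
int exec_xchg_acc(uint8_t opcode)
{
    unsigned r;
    uint16_t tmp;

    if ((opcode & 0xF8) != 0x90)    /* upper five bits must be 10010 */
        return 0;
    r = opcode & 0x07;              /* lower three bits select the register */
    tmp = regs[AX];
    regs[AX] = regs[r];
    regs[r] = tmp;
    return 1;
}

void demo_xchg(void)
{
    regs[AX] = 0x0203;
    assert(get_ah() == 2 && get_al() == 3);
    set_al(0xFF);
    assert(regs[AX] == 0x02FF);
    regs[BX] = 0x1111;
    assert(exec_xchg_acc(0x93) == 1);           /* 93h = XCHG AX,BX */
    assert(regs[AX] == 0x1111 && regs[BX] == 0x02FF);
    assert(exec_xchg_acc(0x90) == 1);           /* 90h = XCHG AX,AX, i.e. NOP */
    assert(regs[AX] == 0x1111);
}
```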
  • 56. XCHG encoding 49 https://c9x.me/x86/html/file_module_x86_id_328.html NOTE: MOV has several different forms, including optimized forms for the accumulator. Similar scheme is used for the segment registers: 00b ES 01b CS 50 10b SS 11b DS
  • 57. 3.5 The ADD instruction ADD dst,src (dst += src) (proper name should be increment by.) General 2-operand instruction layout applies: ADD r m i r X X X m X × X i × × × Given that the syntax of ADD is largely similar to that of MOV, the examples are similar: ADD AX,BX ADD EAX,ESP ADD DL,CL ADD AX,[100] ADD [150],EAX
  • 58. ADD AX,DS ; illegal ADD AX,DL ; illegal ADD [10],5 ; syntax error ADD word ptr [10],5 ; fine C example: int x,y,z; x=y+z; ----- MOV EAX,y 51 ADD EAX,z MOV x,EAX
  • 59. int x,y,z; x=x+y; ----- MOV EAX,y ADD x,EAX int x,y,z; x=x+25; ----- ADD x,25 (NOTE: size specification is not required if x is declared to be a double word) 52 Consider ADD AL,AL
  • 60. Generally, multiplication by 2 should not be done as multiplication (generally about 3x slower than addition). Writing int x; x=2*x; is wrong! One should use either addition or a shift (if available). (What is better depends on the situation and hardware). Q: Should we replace multiplication by addition in : int f(int); int x; x=2*f(x); More simple examples: Consider ADD AL,0 ; nop ? ADD AL,1 ; increment ? ADD AL,-1 ; decrement ?
  • 61. ADD AL,AL ; double 53 MOV AL,1 ; AL=1 ADD AL,AL ; AL=2 ADD AL,AL ; AL=? ADD AL,AL ; AL=? ADD AL,AL ; AL=? ADD AL,AL ; AL=? ADD AL,AL ; AL=? ADD AL,AL ; AL=? ADD AL,AL ; AL=?
  • 62. ADD AL,AL ; AL=? MOV AL,1 ; AL=1 binary |00000001 ADD AL,AL ; AL=2 binary |00000010 ADD AL,AL ; AL=4 binary |00000100 ADD AL,AL ; AL=8 binary |00001000 ADD AL,AL ; AL=16 binary |00010000 ADD AL,AL ; AL=32 binary |00100000 ADD AL,AL ; AL=64 binary |01000000 ADD AL,AL ; AL=128 binary |10000000 ADD AL,AL ; AL=0 binary 1|00000000 << overflow ADD AL,AL ; AL=0 binary 00000000 Is this an assembler problem ? unsigned char c;
  • 63. c=1; printf("%d",c); c=c+c; printf("%d",c); c=c+c; printf("%d",c); c=c+c; printf("%d",c); c=c+c; printf("%d",c); c=c+c; .... Note: if you like C++ and cout<<, make sure to cast! 54 Q: what would be the output if we use char rather than unsigned char? Is this a size problem ? Try
  • 64. MOV AX,1 ADD AX,AX ... OR MOV EAX,1 ADD EAX,EAX ... OR C/C++ versions. 55 Unlike MOV and XCHG, ADD is an arithmetic instruction: it sets flags. Warning: the discussion of the flags is slightly simplified, I’m
  • 65. not considering the OF. Thus there are slight differences between the behavior described and the actual behavior of the processor. This makes no difference for most programs, but there are rare instances where this matters. In particular, I will consider JS and JL as equivalent, in reality they are not exactly the same. ZF Zero Flag SF Sign Flag CF Carry Flag OF Overflow Flag ; ZF SF CF MOV AL,1 ; AL=1 binary |00000001 ? ? ? ADD AL,AL ; AL=2 binary |00000010 0 0 0
  • 66. ADD AL,AL ; AL=4 binary |00000100 0 0 0 ADD AL,AL ; AL=8 binary |00001000 0 0 0 ADD AL,AL ; AL=16 binary |00010000 0 0 0 ADD AL,AL ; AL=32 binary |00100000 0 0 0 ADD AL,AL ; AL=64 binary |01000000 0 0 0 ADD AL,AL ; AL=128 binary |10000000 0 1 0 ADD AL,AL ; AL=0 binary 1|00000000 1 0 1 << overflow ADD AL,AL ; AL=0 binary 00000000 1 0 0 WARNING: This is slightly simplified (there is also OF) Flags can be used to • implement conditionals (IF, WHILE,...) • implement “long” arithmetic • check for overflow
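The flag columns of the table can be reproduced in C. A minimal sketch that, like the text, leaves OF out; `struct flags`, `add8` and `demo_add8` are names made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

struct flags { int zf, sf, cf; };

/* Adds two bytes the way ADD AL,src would and fills in ZF, SF and CF. */
uint8_t add8(uint8_t a, uint8_t b, struct flags *f)
{
    unsigned wide = (unsigned)a + b;    /* the sum may need 9 bits */
    uint8_t result = (uint8_t)wide;     /* the register keeps the low 8 bits */

    f->zf = (result == 0);
    f->sf = (result & 0x80) != 0;       /* top bit of the result */
    f->cf = (wide > 0xFF);              /* carry out of bit 7 */
    return result;
}

/* The last three rows of the table: 64+64, 128+128, 0+0. */
void demo_add8(void)
{
    struct flags f;
    assert(add8(64, 64, &f) == 128 && f.zf == 0 && f.sf == 1 && f.cf == 0);
    assert(add8(128, 128, &f) == 0 && f.zf == 1 && f.sf == 0 && f.cf == 1);
    assert(add8(0, 0, &f) == 0 && f.zf == 1 && f.sf == 0 && f.cf == 0);
}
```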
  • 67. 56 3.5.1 Overflow detection unsigned int x,y,z; .... x=y+z; // concern about overflow unsigned int x,y,z; .... y=0x90000000; z=0x90000000; x=y+z; // overflow will occur here, result will be incorrect. can we check for it like this?
  • 68. unsigned int x,y,z; .... if (y+z>0xFFFFFFFF) error("overflow"); x=y+z; Correct way: unsigned int x,y,z; .... if (y>0xFFFFFFFF-z) error("overflow"); x=y+z; 57
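Why the first check cannot work when unsigned int is 32 bits wide: y+z is itself computed modulo 2^32, so it can never exceed 0xFFFFFFFF. The "correct way" can be wrapped in a helper, sketched here with fixed-width types; the names `add_u32_checked` and `demo_checked_add` are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stores y+z in *x and returns 1, or returns 0 and leaves *x alone if the sum does not fit in 32 bits. */
int add_u32_checked(uint32_t y, uint32_t z, uint32_t *x)
{
    if (y > UINT32_MAX - z)     /* the subtraction cannot wrap, so this test is safe */
        return 0;
    *x = y + z;
    return 1;
}

void demo_checked_add(void)
{
    uint32_t x = 0;
    assert(add_u32_checked(0x90000000u, 0x90000000u, &x) == 0);    /* the example from the text */
    assert(add_u32_checked(0x70000000u, 0x10000000u, &x) == 1);
    assert(x == 0x80000000u);
}
```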
  • 69. Exercise: what about signed types? A: you will need to check both for “positive” overflow (adding two large positive numbers) and for “negative” overflow (adding two large negative numbers). In assembler, flags report the overflow condition – no need for extra checking! 3.6 The SUB instruction SUB dst,src (dst -= src) (proper name should be decrement by.) General 2-operand instruction layout applies: SUB r m i r X X X m X × X i × × ×
  • 70. Given that syntax of SUB is identical to ADD, syntax examples are similar and omitted. ADD AX,100 SUB AX,-100 ; same as above ; ADD AX,-100 SUB AX,100 ; same as above What do these instructions do? ADD AX,0 SUB AX,0 58 What does this instruction do?
  • 71. SUB EAX,EAX Answer: most efficient way to zero out a register. What is the difference between the two instructions below? SUB EAX,EAX MOV EAX,0 Answer: the former is more efficient; the latter is rarely used, only in situations when flags must be preserved. (an example, involving an if, will be given later.) Revisiting the example we saw above, more efficient code: int x,y,z; x=y=z=0; ------------------- SUB EAX,EAX
  • 72. MOV x,EAX MOV y,EAX MOV z,EAX with SUB, Carry flag indicates borrowing. 59 3.7 The INC instruction INC dst (dst++) Do we write ADD AX,1 ADD byte ptr [10],1 A: yes, we can, but usually we would use the optimized form General 1-operand instruction layout applies:
  • 73. INC r m i X X × (Same format applies to three more instructions, explained later). Register form is optimized to one-byte encoding: 40h INC AX 41h INC CX 42h INC DX 43h INC BX 44h INC SP 45h INC BP 46h INC SI 47h INC DI Other forms of INC are encoded in lengthier way beginning with 0FFh
  • 74. and 0FEh. Warning: this encoding applies to BOTH 16 and 32 registers! What is better? 60 INC AX INC AX ;or ADD AX,2 A: former. But do not do this with memory arguments. 61 3.8 The DEC instruction
  • 75. DEC dst (dst- -) DEC r m i X X × Comments on INC above are applicable. Optimized form: 48h DEC AX 49h DEC CX 4Ah DEC DX 4Bh DEC BX 4Ch DEC SP 4Dh DEC BP 4Eh DEC SI 4Fh DEC DI
  • 76. Other forms of DEC are encoded in lengthier way beginning with 0FFh and 0FEh. 62 3.9 The NEG instruction NEG dst (dst=-dst) NEG r m i X X × How to negate without using NEG? NEG EAX ;is the same as SUB EBX,EBX SUB EBX,EAX
  • 77. MOV EAX,EBX Solve equation x = −x ? MOV AL,x NEG AL if AL did not change, does this mean x is 0? No, it is either 0 or 128! For short, the solutions are 0 and 2^15 = 8000h, for 32-bit they are .... #include <stdlib.h> #include <stdio.h> int main() { short s,s1;
  • 78. s=0; s1=-s; if (s!=s1) printf("for s=%d, changes;\n",s); else printf("for s=%d, does not change;\n",s); s=1000;s1=-s; if (s!=s1) printf("for s=%d, changes;\n",s); else printf("for s=%d, does not change;\n",s); s=0x8000;s1=-s; if (s!=s1) printf("for s=%d, changes;\n",s); else printf("for s=%d, does not change;\n",s); return 0; } results in for s=0, does not change; for s=1000, changes;
  • 79. for s=-32768, does not change; 3.10 The CMP instruction CMP dst,src (dst-src) General 2-operand instruction layout applies: CMP r m i r X X X m X × X i × × × Conditionals (if,while) are implemented in two stages: • compute flags • do (or not) the operation depending on a flag(or flags) – this is later. 64 Examples:
  • 80. ; if (x==0) ... ; ; compute ZF from value of x ; if (x!=0) ... ; ; compute ZF from value of x ; if (x<0) ... ; ; compute SF from value of x ; if (x<=0) ... ; ; compute SF and ZF from value of x What about
  • 81. ; if (x==y) ... ; ; to set the flags, we compute the difference ; if (x==y) ... ; MOV EAX,x SUB EAX,y ; this sets ZF for our use 65 ; if (x<y) ...
  • 82. ; MOV EAX,x SUB EAX,y ; this sets SF for our use ; if (x<=y) ... ; MOV EAX,x SUB EAX,y ; this sets SF and ZF for our use Problem: this computation destroys the copy of x held in EAX, which is often needed later. So, instead: ; if (x<y) ... ;
  • 83. MOV EAX,x CMP EAX,y ; this sets SF and ZF for our use Flags are set exactly as for SUB, but EAX retains the value of x. ; if (x==5) ... ; MOV EAX,x SUB EAX,5 ; inefficient ; SUB x,5 ; worse yet: destroys variable ; CMP x,5 ; fine
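The "subtract, keep the flags, discard the difference" behaviour of CMP can be mimicked in C. A sketch for 8-bit operands that leaves OF out, in line with the simplification used in these notes; `struct cmp_flags`, `cmp8` and `demo_cmp8` are names made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

struct cmp_flags { int zf, sf, cf; };

/* Computes the flags of a - b without changing a or b, as CMP does. */
struct cmp_flags cmp8(uint8_t a, uint8_t b)
{
    struct cmp_flags f;
    uint8_t diff = (uint8_t)(a - b);    /* computed only for the flags, then dropped */

    f.zf = (diff == 0);                 /* a == b */
    f.sf = (diff & 0x80) != 0;          /* top bit of the difference */
    f.cf = (a < b);                     /* a borrow was needed */
    return f;
}

void demo_cmp8(void)
{
    struct cmp_flags f;
    f = cmp8(5, 5); assert(f.zf == 1 && f.sf == 0 && f.cf == 0);
    f = cmp8(3, 5); assert(f.zf == 0 && f.sf == 1 && f.cf == 1);
    f = cmp8(7, 5); assert(f.zf == 0 && f.sf == 0 && f.cf == 0);
}
```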
  • 84. 66 3.11 Logical or Bitwise What kind of AND operation should be implemented in hardware? (same question of course can be asked of OR etc but we will concentrate on AND) • logical AND (in C, &&) ? • bitwise AND (in C, &) ? • both ? Logical AND works with True and False concepts: 1 && 1 = 1 1 && 0 = 0 0 && 1 = 0
  • 85. 0 && 0 = 0 Q: what about 5 && 6 ? (5 is first converted to 1 (true), 6 is converted to 1 (true), we compute 1&&1 and get result 1). This is a multi-step operation! Bitwise AND: apply the AND operation to every bit (every column): 0101 == 5 & 0110 == 6 ---- 0100 == 4 Thus 5 & 6 = 4. Notice that on single bits (or multibit 0 and 1), the results of & and &&
  • 86. are identical. 1 & 1 = 1 1 & 0 = 0 0 & 1 = 0 0 & 0 = 0 We summarize: • On appropriate logical values, logical and bitwise AND are the same. • Logical AND is a multistep operation – hard to implement in hardware. • Bitwise AND can be done easily and efficiently (array of AND-gates)
  • 87. • on non-standard input values, logical AND offers minimal advantages (how often do we care about 5&&6?) • on non-standard input values, bitwise AND offers huge advantages: – masks – sets – images 3.11.1 Bits and Masks Most x86 2-operand instructions are encoded as follows: first byte: opcode (6 bits), d (1 bit), w (1 bit); second byte: mod (2 bits), reg (3 bits), r/m (3 bits); ..... For example, for ADD the opcode is 000000, for MOV the opcode is 100010, etc. (Encodings for the forms with accumulator and immediate are similar.) The w field states if the operation is word or byte. Register
  • 88. reg=000 means AX if w==1 and AL if w==0. The d field states if the direction is from register (0) or to register (1). So decoding should retrieve these bits. currentbyte=code[ip++]; .... wordop=currentbyte & 1; to_reg=(currentbyte & 2)!=0; .... alternately one can use division and mod (yuck). 3.11.2 Checking for odd/even
  • 89. Q: how do we check for a number being odd or even? if (x is odd) { ... } if (x % 2 == 1) { ... } better: if (x % 2) { ... } much better: if (x & 1) { ... } Q: how do we check if a number is divisible by 4? if ((x & 3)==0) { ... } if (!(x & 3)) { ... }
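The field extraction from section 3.11.1 and the low-bit tests above follow one pattern: shift if needed, then AND with a mask. A sketch in C; `struct decoded`, `decode2`, `is_odd` and `is_multiple_of_4` are names made up for the illustration. The demo uses the byte pair 8Ah D5h, which is one valid encoding of MOV DL,CH.

```c
#include <assert.h>
#include <stdint.h>

struct decoded {
    unsigned opcode;        /* upper six bits of the first byte */
    unsigned d;             /* direction: 1 = to register */
    unsigned w;             /* width: 0 = byte, 1 = word */
    unsigned mod, reg, rm;  /* the three fields of the second byte */
};

/* Splits the first two bytes of a 2-operand instruction into their bit fields. */
struct decoded decode2(uint8_t first, uint8_t second)
{
    struct decoded r;
    r.opcode = first >> 2;
    r.d      = (first >> 1) & 1;
    r.w      = first & 1;
    r.mod    = second >> 6;
    r.reg    = (second >> 3) & 7;
    r.rm     = second & 7;
    return r;
}

int is_odd(unsigned x)           { return x & 1; }
int is_multiple_of_4(unsigned x) { return (x & 3) == 0; }

void demo_masks(void)
{
    struct decoded m = decode2(0x8A, 0xD5);     /* MOV DL,CH */
    assert(m.opcode == 0x22);                   /* 100010b, the MOV opcode from the text */
    assert(m.d == 1 && m.w == 0);               /* to register, byte-sized */
    assert(m.mod == 3 && m.reg == 2 && m.rm == 5);
    assert(is_odd(7) && !is_odd(10));
    assert(is_multiple_of_4(12) && !is_multiple_of_4(6));
}
```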
  • 90. Q: how do we check if a number is divisible by 2^k? if ((x & ((1<<k)-1))==0) { ... } 3.11.3 Sets How do we represent a set? Consider {2,3,5,7,11,13} Idea #1: Representation as a linked list: 2 ⇒ 3 ⇒ 5 ⇒ 7 ⇒ 11 ⇒ 13 ⇒ # Idea #2: Representation as an array: 2 3 5 7 11 13 Idea #3: Representation as a bit array (set) 0 0 1 1 0 1 0 1 0 0 0 1 0 1 0 0
  • 91. Total space : 2 bytes. Search time: • list Linear • array Linear or log (binary search) • set Constant! Exercise: write the formula If A,B are similar sets, one can use & to compute the intersection: unsigned char A[N]; unsigned char B[N]; unsigned char C[N]; for (int i=0;i<N;i++) C[i]=A[i]&B[i]; Set representation is not always the right one, consider {1,10000000}!.
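One possible bit-array layout, sketched in C. It is an assumption of this sketch that element n is stored in byte n/8 at bit n%8, counting from the least significant bit; other layouts work equally well. The function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element n lives in byte n/8, bit n%8 (least significant bit first). */
void set_add(uint8_t *set, unsigned n)
{
    set[n >> 3] |= (uint8_t)(1u << (n & 7));
}

/* Constant-time membership test: one index, one shift, one AND. */
int set_contains(const uint8_t *set, unsigned n)
{
    return (set[n >> 3] >> (n & 7)) & 1;
}

/* Intersection of two sets of the same size, byte by byte. */
void set_intersect(uint8_t *c, const uint8_t *a, const uint8_t *b, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        c[i] = a[i] & b[i];
}

void demo_sets(void)
{
    uint8_t primes[2] = { 0, 0 }, odds[2] = { 0, 0 }, both[2];
    const unsigned p[] = { 2, 3, 5, 7, 11, 13 };

    for (size_t i = 0; i < sizeof p / sizeof p[0]; i++)
        set_add(primes, p[i]);
    for (unsigned n = 1; n < 16; n += 2)
        set_add(odds, n);

    set_intersect(both, primes, odds, 2);
    assert(set_contains(both, 3) && set_contains(both, 13));
    assert(!set_contains(both, 2));     /* prime but even */
    assert(!set_contains(both, 9));     /* odd but not prime */
}
```

The set {1,10000000} would need more than a megabyte in this layout, which is the drawback noted above.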
  • 92. 3.11.4 Images Consider images. Actual representation (b/w image): one dimensional bit array. What would this produce? (figure: two b/w images combined with &) Conclusion: Bitwise AND is a much better choice for assembler implementation. 3.12 The AND instruction AND dst,src (dst &= src) AND r m i
  • 93. r X X X m X × X i × × × Syntax is the same as ADD,SUB,... Examples: MOV AX,5 AND AX,6 ; AX is 4 AND AX,1 ; check evenness AND AX,7 ; what does this do? AND AX,0 ; what does this do? AND AX,AX ; what does this do? AND AX,CH ; what does this do? 3.13 The TEST instruction TEST dst,src (dst & src)
  • 94. TEST is a non-destructive AND, cf. CMP/SUB. TEST r m i r X X X m X × X i × × × Syntax is the same as AND, ADD, SUB,... Encoding is less efficient than for other operations – TEST is less common. TEST is helpful when we want to extract different bits from the same number. 3.14 The OR instruction OR dst,src (dst |= src) OR r m i
  • 95. r X X X m X × X i × × × Syntax is the same as ADD,SUB, AND... In C, bitwise or is coded as |. Hardware implements the bitwise form. Examples: MOV AX,5 OR AX,6 ; AX is 7 0101 == 5 | 0110 == 6 ---- 0111 == 7 OR AX,0 (set flags – not as efficient as RR form)
  • 96. OR AX,AX (set flags – just as efficient as RR form of AND) OR AX,1 (set last bit to 1; numerically round up to an odd number.) OR AX,2 (set the next-to-last bit to 1.) OR AX,3 (set last two bits to 1) For sets (and graphics) OR implements union. 3.15 The XOR instruction XOR dst,src (dst ^= src)
  • 97. XOR r m i r X X X m X × X i × × × Syntax is the same as ADD,SUB, OR... Truth table for XOR: 1 XOR 1 = 0 1 XOR 0 = 1 0 XOR 1 = 1 0 XOR 0 = 0 XOR AX,AX (clear register – just as efficient as RR form of SUB)
  • 98. XOR AX,0 (only sets flags) XOR AX,1 (toggle last bit) XOR AX,7 (toggle last three bits) XOR AX,0FFFFh (toggle all bits) Involution property of XOR: ((d XOR k) XOR k) == (d XOR (k XOR k)) == (d XOR 0) == d Application #1: temporary lines: (same idea can be used for the mouse pointer or inverting selected text).
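The involution property can be checked directly in C: applying the same XOR pass twice gives back the original bytes. A minimal sketch; `xor_buffer` and `demo_xor` are names made up for the illustration, and the key value is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* XORs every byte of data[] with the same one-byte key, in place. */
void xor_buffer(unsigned char *data, size_t n, unsigned char key)
{
    for (size_t i = 0; i < n; i++)
        data[i] ^= key;
}

void demo_xor(void)
{
    unsigned char buf[5];
    memcpy(buf, "HELLO", 5);

    xor_buffer(buf, 5, 0x5A);               /* first pass changes the bytes */
    assert(memcmp(buf, "HELLO", 5) != 0);

    xor_buffer(buf, 5, 0x5A);               /* second pass restores them */
    assert(memcmp(buf, "HELLO", 5) == 0);
}
```

The same two-pass idea is what makes a temporary line or a mouse pointer disappear cleanly: draw it with XOR, then draw it again.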
  • 99. Application #2: cipher: ((data XOR key) XOR key) == data char data[N]; for (int i=0; i<N; i++) data[i]=data[i] ^ key; In pure form, variation of a substitution cipher: char data[N]; char subst[256]; // contains permutation of ASCII chars for (int i=0; i<N; i++) data[i]=subst[(unsigned char)data[i]]; Not secure, but good for puzzles! Cryptogram puzzles. With key chaining: char data[N];
  • 100. for (int i=0; i<N; i++) { char c; c=data[i]; data[i]=c ^ key; key=f(key,c); } 77 https://api.razzlepuzzles.com/cryptogram nearly unbreakable; cf. the Type 1 font encryption episode. The unbreakable Adobe encryption was actually implemented with this code: unsigned short int r; // this is the key
  • 101. unsigned short int c1 = 52845; unsigned short int c2 = 22719; unsigned char Encrypt(plain) unsigned char plain; {unsigned char cipher; cipher = (plain ^ (r>>8)); r = (cipher + r) * c1 + c2; return cipher; } (Naturally, there are myriads of small changes that can be made to the code, beginning with change of the constants... and they would lead to equally strong security!) Question: why is C not providing ^^ ? 3.16 The NOT instruction
  • 102. NOT dst (dst = ~dst) NOT r m i X X × Follows the format of INC, DEC, NEG. Example: XOR AX,0FFFFh NOT AX ; same result as above ? Warning: NOT may produce drastically different results depending on the size of the number! #include <stdlib.h> #include <stdio.h>
  • 103. int main() { unsigned short s; unsigned char c; unsigned int i; c=1; c=~c; printf("\n negated byte %u ",c); s=1; s=~s; printf("\n negated short %u ",s); i=1; i=~i; printf("\n negated int %u ",i); } results in negated byte 254 negated short 65534 negated int 4294967294 3.17 Shift and Rotate operations This group of instructions shifts the bits in a register in different ways. The format of all instructions in the group is the same; we will
  • 104. start by considering the 3.17.1 SHL SHL r/m,amount (SHift Left) instruction. The first argument of the instruction must be a register or a memory location; the second specifies the number of positions by which the bits of the first argument will be shifted. There are three ways the shift amount can be specified 1. as number 1 2. as register CL (no other register is allowed) 3. as integer number greater than one. We separate this case
  • 105. from the first one since the encoding is different and this format did not exist in the original 8086 (it was added with the 80186). Here is a simple example of SHL execution: MOV AL,11 SHL AL,1 To see the resulting value in the target, let’s write AL in binary: 00001011. Shifting the bits left by one position results in 00010110, where the leftmost bit (0) is moved out and a zero is written to the last position. Numerically the value is 22. It is not an accident that 22 = 11 × 2; the effect of appending a
  • 106. zero to a binary number is multiplication by 2 (just like appending a zero to a decimal number is multiplication by 10). Unlike the “generic” multiplication described in a later section, this is an efficient and cheap operation: moving bits is naturally cheaper than invoking the multiplier. If the first bit of the number is 1, pushing it out of the register results in an “overflow” and an incorrect result. For example, shifting 128 left in an 8-bit register will produce 0. (Notice that the correct value of 256 cannot be represented in an 8-bit register.) The first bit, incidentally, is not lost: it is moved to the Carry flag; checking the carry flag therefore offers a way to detect an overflow and
  • 107. possibly correct the calculation. After the shift above: CF=0, AL=00010110. The other two forms of SHL arguments allow shifting the register by more than one position. Consider MOV AL,11 SHL AL,3 or MOV AL,11 MOV CL,3 SHL AL,CL Both snippets do the same: shift AL by 3 positions to the left. The three leftmost bits are pushed out and three zeros are appended at the end, changing
  • 108. CF=?, AL=00001011 to CF=0, AL=01011000. It is an efficient way to multiply by 8 = 2^3, and in general a left shift by n positions is equivalent to multiplication by 2^n. The bits pushed out cannot all be placed into the carry; two of them are irreversibly lost and only the last bit to be pushed out ends up in the carry flag. In the previous example that is the third bit from the left of the original value (a 0). Unless an overflow occurs, SHL multiplication often works correctly even with negative numbers. For example, -1 (binary 11111111
  • 109. becomes 11111110, which is indeed -2). Like the arithmetic instructions, SHL sets the zero and sign flags according to the result. NOTE: shift instructions like SHL existed on many processors even before Intel and got incorporated into the C language ( << ). Thus, one can write in C either x=y+y; or x=y<<1; instead of the slower x=y*2. Which of the two options is better? In most cases, about the same. The right choice also depends on the particular formula. For instance y=f(x)*2; // function call should never be changed to y=f(x)+f(x); since this would result in a double call of the function: definitely slower and possibly a bug, if the function has side effects. Using the shift here is certainly fine. On the other hand, some languages do not even offer a shift operator, so that addition
  • 110. is the only alternative to (slower) multiplication. 3.17.2 SHR SHR r/m,amount (SHift Right) instruction has the same syntax as SHL but shifts the bits to the right. We again look at an example: MOV AL,11 SHR AL,1 In binary, 11 is CF=?, AL=00001011; shifting this pattern right results in the last bit (1) expelled from the register into the carry flag, the remaining bits moving right by one
  • 111. position and a zero written into the leftmost position, obtaining CF=1, AL=00000101 (decimal 5). It is not an accident that 5 = 11/2 (rounded down): for unsigned (or nonnegative) numbers shifting right by 1 is equivalent to division by 2, and shifting right by n bits is division by 2^n. (Analogy: with decimal numbers deleting the last digit is the same as division by 10.) When shifting by more than one position, it is the last bit shifted out of the register that ends up in the carry. One more example to consider: MOV AL,255 SHR AL,1
  • 112. the result of shifting CF=?, AL=11111111 is CF=1, AL=01111111; in terms of division we see 127 = 255/2, correct. However, in MOV AL,-1 SHR AL,1 we see the same exact values (255 is -1), but an obviously incorrect result: 127 = −1/2. In fact the results will always be incorrect when shifting a negative number: after the shift the sign bit (1) is replaced by 0! NOTE: in C/C++ the bitshift operators << and >> use SHL and SHR for unsigned (or signed but positive) numbers. Applying them to negative numbers
  • 113. is supported by the syntax but not portable: right-shifting a negative value is implementation-defined and left-shifting one is undefined, so the result may or may not be correct, depending on the compiler (it is correct under BCC). C/C++ syntax also allows negative shift amounts, but the behavior is undefined (under BCC, the result would always be 0). Division of negative numbers via a shift is, however, possible in assembler with the 3.17.3 SAR SAR r/m,amount (Shift Arithmetic Right) instruction. SAR is nearly identical to SHR; the only difference is that SHR inserts zeros at the left, while SAR duplicates the first bit. Thus the number 255 (11111111)
  • 114. CF=?, AL=11111111 would become CF=1, AL=01111111 after a SHR (as seen above), but after SAR it is CF=1, AL=11111111, where the first 1 is copied to the right but also left where it was. SAR produces correct results for division of signed/negative numbers: the calculation in the example above is division of -1 by 2, resulting in -1, the correct result assuming rounding to −∞. NOTE: Java implements both SAR and SHR, as >> and >>>. 3.17.4 SAL
  • 115. For symmetry, Intel also has SAL r/m,amount (Shift Arithmetic Left). While SAL is named differently, it has the same functionality as SHL (Intel's manuals list the same encoding for both mnemonics). 3.17.5 ROL ROL r/m,amount (ROtate Left) rotates bits leftward; the high bit(s) pushed out appear on the low end. MOV AL,79h ; AL is 01111001 ROL AL,1 ; AL is 11110010 ROL AL,1 ; AL is 11100101
  • 116. MOV CL,2 ROL AL,CL ; AL is 10010111 ROL never loses any bits: the Carry flag receives only a copy of the last bit rotated out and is not part of the rotation. Notice that ROL AL,8 leaves AL unchanged, yet another equivalent of NOP (apart from the flags). Likewise ROL AX,16 does not change any registers. Exercise: What does ROL AX,8 do? 3.17.6 ROR ROR r/m,amount (ROtate Right)
  • 117. is similar, except that the bits are rotated right. Notice that the two instructions below are equivalent: ROL AL,3 ROR AL,5 ROL and ROR are rarely used, although this example should be of interest: ROL word ptr [100],8 3.17.7 RCL RCL r/m,amount (Rotate Carry Left) rotates the bits in the register and the carry flag left. High bits being rotated out are moved to the carry while the carry flag is entered on the
  • 118. low end of the register. MOV AL,79h ; AL is 01111001 CY=c (undefined) RCL AL,1 ; AL is 1111001c CY=0 RCL AL,1 ; AL is 111001c0 CY=1 RCL AL,1 ; AL is 11001c01 CY=1 Rotation by more than one position can be seen as repeated rotations by 1, except done as a single instruction. 3.17.8 RCR RCR r/m,amount (Rotate Carry Right) is similar to RCL except for the direction of the rotation.
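The rotate-through-carry step can be imitated in C, which may help when checking the RCL trace above by hand. This is only a sketch: the function name rcl8 and the use of an unsigned variable in place of the Carry flag are assumptions made for the illustration, not part of any assembler or library.

```c
#include <stdint.h>

/* Hypothetical helper: one-position RCL on an 8-bit value.
   *carry stands in for the Carry flag and must hold 0 or 1. */
uint8_t rcl8(uint8_t value, unsigned *carry)
{
    unsigned out = (value >> 7) & 1u;          /* high bit leaves the register */
    uint8_t result = (uint8_t)((value << 1) | (*carry & 1u)); /* old carry enters at the low end */
    *carry = out;                              /* the expelled bit becomes the new carry */
    return result;
}
```

Starting from 79h with the carry cleared (c=0 in the trace above), three calls give F2h, E4h and C9h, with the carry being 0, 1 and 1 after each step.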
  • 119. !!! The code below uses some assembler syntax that has not been covered yet !!! While RCL and RCR may appear strange, they have a number of uses. One of them is the ability to shift (that is, multiply or divide by a power of two) numbers that are larger than the register size. Assume that x is a 100-byte (800-bit) number stored in memory locations [500] through [599]. We would like to multiply it by 2. Remember that the bytes are stored in reverse order (lowest byte first). SHL byte ptr DS:[500],1 ;high bit is moved to the carry RCL byte ptr DS:[501],1 ;carry to 2nd byte, high bit to carry. RCL byte ptr DS:[502],1 ;carry to 3rd byte, high bit to carry. ... RCL byte ptr DS:[599],1
  • 120. or, as a loop: CLC MOV BX,500 MOV CX,100 L: RCL byte ptr DS:[BX],1 INC BX LOOP L Exercise: How would you divide this 800-bit number by 2? Exercise: Would signed and unsigned numbers use the same code? To multiply by 2^k, you need to shift by k positions. This must be
  • 121. done as 1-position shifts repeated k times since the carry can hold only one bit; therefore a double loop is required. Another application for these instructions is shifting a monochrome image by one or several pixels. The code is similar to the one used for shifting long numbers above. The first two forms of all eight shift/rotate instructions are encoded together, using opcodes D0 through D3. The “register” field in the 2nd byte of the encoding actually denotes the operation. The third form uses C0 and C1, in a similar fashion. The 80386 and all newer processors have double-register shift operations SHLD and SHRD that allow writing more efficient code; we do not describe them here.
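The carry chain used to double a long number can also be written out in C. The sketch below assumes the number is stored lowest byte first, as in the 100-byte example; the name shl_bytes and its return value (the bit pushed out of the top byte) are choices made for this illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift a multi-byte number (lowest byte first) left by one bit,
   i.e. multiply it by 2. The local variable carry plays the role
   of the Carry flag passed from one RCL to the next.
   Returns the bit shifted out of the most significant byte. */
unsigned shl_bytes(uint8_t *num, size_t len)
{
    unsigned carry = 0;                         /* CLC */
    for (size_t i = 0; i < len; i++) {
        unsigned out = (num[i] >> 7) & 1u;      /* high bit of this byte */
        num[i] = (uint8_t)((num[i] << 1) | carry);
        carry = out;                            /* handed to the next byte */
    }
    return carry;
}
```

A nonzero return value corresponds to the overflow case: the doubled number no longer fits into len bytes.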
  • 122. 3.18 Data conversion The arithmetic instructions studied in the previous sections are sufficient to implement many formulas, but not all. One particular problem is the mixing of variables of different sizes, as seen in the following C-language example: int x; short y; char z; x = y+z; If all three variables were declared the same way (for instance, all three as int), we could have implemented the computation with already known
  • 123. instructions: MOV EAX,y ADD EAX,z MOV x,EAX but the “size must match” rule prevents us from mixing operands of different sizes. Thus, we need data conversion instructions: ones that change the size of a variable without changing the value. Before showing the available instructions, let’s understand the exact problem. In the example above we would like to add the byte (char) quantity z to the word (short) quantity y. To do this, we need to convert z to a word quantity that has the same value as the original z; this word quantity can then be added to y. (We would later need to convert the
  • 124. sum, a word quantity, to a doubleword, to be able to store it into x.) We actually have a way to do this conversion in some cases. For example, if we assume z to be equal to 1, the following assembler code would work: MOV AL,z SUB AH,AH Bingo: by zeroing the high bits in the AX register, we extended the value into a word, and can add it to the value of y: MOV AL,z SUB AH,AH
  • 125. ADD AX,y Assuming y is equal to 3, the result would be 4, stored in AX. To save it into x we can append two more zero bytes: MOV AL,z SUB AH,AH ADD AX,y MOV word ptr x,AX MOV word ptr x[2],0 ZERO EXTENSION, used in the example, is in fact a correct technique, just not for our formula; with differently chosen numbers it will fail. Consider int x; short y=1;
  • 126. char z=-1; x = y+z; ZERO extending z as we did above will result in the value 255 in AX; x will therefore be computed as 256. Why is this happening? ZERO EXTENSION always produces non-negative numbers: even if the original value had the sign (high) bit set, the extended value will always have zero in the high position. Thus, it is suitable for unsigned data types; the example unsigned int x; unsigned short y; unsigned char z; x = y+z; will work fine with any values of y and z. For signed data types
  • 127. we need a different type of extension. For simplicity, let’s look only at the conversion AL ⇒ AX. The value in AL is considered as signed. If it happens to be nonnegative, we can ZERO-EXTEND, as we did above: SUB AH,AH If it happens to be negative, then we should instead fill AH with 1s, not 0s: MOV AH,0FFh In general, the extension byte (or bytes) will be filled with the sign bit of the number; the procedure itself is therefore called SIGN EXTENSION. One practical difference between the two types of extensions: while
  • 128. the unsigned (ZERO) extension can be implemented efficiently with the instructions we already have, an implementation of the signed extension would require coding logic like if (AL>=0) AH=0; else AH=255; Can this pseudocode be written in assembly? Yes, of course. Can it be written efficiently? No. Exercise: Implement the pseudocode above in assembler. Therefore, the instruction set provides sign conversion instructions, which we will now introduce. CBW (Convert Byte to Word) CBW, just like the other instructions in this group, does not have arguments; they all work on the accumulator. CBW specifically converts AL to AX, using sign extension.
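The difference between the two extensions is easy to see in a small C sketch. The function names zero_extend8 and sign_extend8 are invented for this illustration; the second one spells out the effect the text attributes to CBW rather than relying on C's own conversion rules.

```c
#include <stdint.h>

/* Zero extension: the high byte is always 0 (what SUB AH,AH achieves). */
uint16_t zero_extend8(uint8_t al)
{
    return (uint16_t)al;
}

/* Sign extension: the high byte is filled with copies of the sign bit,
   i.e. AH becomes 0FFh for a negative AL and 00h otherwise. */
uint16_t sign_extend8(uint8_t al)
{
    uint8_t ah = (al & 0x80u) ? 0xFFu : 0x00u;
    return (uint16_t)(((uint16_t)ah << 8) | al);
}
```

With z=-1 (byte FFh) and y=1, adding y to sign_extend8(0xFF) wraps around to 0 in 16 bits, the expected sum, while adding it to zero_extend8(0xFF) gives 256, the wrong answer discussed above.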
  • 129. CWD (Convert Word to Doubleword) CWD converts the 2-byte value stored in AX into a 4-byte value stored in DX and AX; DX has the high bits. Why not convert to a single doubleword register? Two reasons: firstly, CWD was added before the processor had 32-bit registers (but see CWDE below). Secondly, in some cases this is actually more convenient; see the examples related to division below. Our example program, written in assembler, would use both of these instructions: MOV AL,z
  • 130. CBW ADD AX,y CWD MOV word ptr x,AX MOV word ptr x[2],DX The following two instructions did not exist in the original 8086 and were added only in the 80386, when 32-bit registers were first introduced: CWDE (Convert Word to Doubleword Extended) sign-extends AX into EAX. CDQ (Convert Doubleword to Quadword) sign-extends EAX into EDX:EAX (to store a quadword in a single register one would need a 64-bit register!)
  • 131. The four forms given above are the most efficient to use, and the code should be written in such a way as to have the data to be sign-converted in the accumulator; unsigned data conversion, on the other hand, can be done in other registers: MOV BL,z SUB BH,BH ; now BX has zero-extended value of z. In addition to these instructions, Intel, beginning with the 80386, offers more general forms: MOVSX target,source (MOV with Sign eXtension) MOVZX target,source (MOV with Zero eXtension)
  • 132. where the target must be a register and the source is either a register or memory (memory-to-memory operations are not allowed), and, unlike the usual “size must match” rule, here the target must be larger than the source. For example: MOVSX AX,AL MOVZX EBX,DL MOVSX BX,byte ptr DS:[10] MOVZX EAX,byte ptr DS:[BX] Notice that the first instruction in the example is equivalent to CBW in terms of what it does, but its encoding is 4 bytes vs only 1 and execution is slower. Thus, the accumulator forms are still the most efficient.
  • 133. Encoding: CBW is encoded as a single byte 0x98, CWD as a single byte 0x99. Data conversion instructions are not considered to be arithmetic; therefore no flags are altered. WARNING: the X86 emulator does not support 32-bit instructions, so CWDE, CDQ, MOVSX, MOVZX would be rejected by it. Maybe one day :P Now, what about conversion from a larger data size to a smaller one? Such situations surely happen, for example: short a; int b; a=b;
  • 134. Well, in general this cannot be done: the data range of int is larger than that of short, so not every int value can be correctly represented as a short. Most compilers would issue a warning and proceed by saving only the low word: MOV EAX,b MOV a,AX Half of the EAX register is not saved at all! This may or may not work correctly, depending on the value of b, and assembler cannot solve what is impossible to solve. Assembler, however, offers a simple test that checks if the operation will work correctly: MOV AX,word ptr b ; lo half of b MOV BX,word ptr b+2 ; hi half of b CWD
  • 135. CMP DX,BX JNZ error MOV a,AX In other words, if the true value of b can be recovered from the lo half, the number fits in the short range. For unsigned types, the check is even simpler: CMP word ptr b+2,0 ; hi half of b JNZ error MOV AX,word ptr b ; lo half of b MOV a,AX 3.19 Multiplication and Division These two operations were left to the end, because of the differences from
  • 136. other operations. One of the differences is the higher cost: multiplication is slower than other operations (processor-dependent, but 3 times slower is generally a correct estimate); division is slower yet. (Compare this with performing the operation on paper: addition and subtraction are easy, multiplication is more difficult, and if you forgot how unpleasant division is, review your elementary school notes!) Because of this one should always try to use these operations sparingly, or, if possible, not use them at all. For example, multiplication by a power of 2 should never be done as multiplication; the same goes for division
  • 137. by a power of 2. This is far from the only possible saving. Another problem, specific to multiplication, is the growth of the length of the result. With the binary operations studied so far the length of the result is the same as the length of the arguments (AND, OR, . . . ) or longer by only one bit, which can be saved in the carry (ADD, SUB, . . . ). Multiplication may double the length; this requires a different type of syntax, and a different understanding of the “size must match” rule. (With division, we will see yet another form of syntax and a direct violation of the “size must match” rule.) Further, not every register is capable of doing these operations. The most common/usable forms require the accumulator (or work best on the accumulator). Finally, multiplication and division provide different operations
  • 138. for signed and unsigned arithmetic. With this in mind, let’s look at a sample instruction: IMUL r/m (Integer (signed) MULtiplication) We notice that while multiplication operation requires two arguments, the format specifies only one. The register or memory operand is multiplied by the accumulator of the same size as the operand. For example: IMUL CL ; multiply CL by AL (8 bit) IMUL word ptr DS:[BX]; multiply memory word by AX (16 bit) IMUL ESI ; multiply ESI by EAX (32 bit) the result is saved in the extended accumulator, it is AX for 8 bit operands, DX:AX for 16 bit arguments and EDX:EAX for 32 bit arguments. In all cases the length of the extended accumulator is twice the length
  • 139. of the arguments. This ensures that the result is computed correctly, but does not suggest what to do with the result if it is too large to work with (64 bit in a 32-bit program, or 128 bit after a 64-bit multiplication, which also exists). Example: MOV AL,7 MOV CL,13 IMUL CL ; results in AX being 7*13=91 MOV AL,255 MOV CL,255
  • 140. IMUL CL ; results in AX being 1. In the second multiplication the arguments are interpreted as signed numbers, so we are actually squaring -1! (Intel opcode names are not always consistent. IMUL uses the word Integer to mean Signed! The unsigned multiplication that we look at next is also an integer multiplication; all the x86 registers are integer!) The unsigned counterpart is MUL r/m ((unsigned) MULtiplication) and follows the same format as IMUL. The results, however, may be different: MOV AL,7 MOV CL,13 MUL CL ; results in AX being 7*13=91
  • 141. MOV AL,255 MOV CL,255 MUL CL ; results in AX being 0xFE01 = 65025 We now turn our attention to division. In addition to the differences already mentioned, it has one extra feature: division is not always possible. But let us begin with the format: IDIV r/m (Integer (signed) DIVision) Once again, some of the arguments are implicit; the explicit argument is the divisor in the operation. The quotient will appear in the accumulator, of the same size as the divisor; the dividend is taken from the extended accumulator. For example:
  • 142. MOV AX,100 MOV BL,7 IDIV BL ; result (14) in AL ... IDIV word ptr ds:[10] ; DX:AX is divided by memory word, result in AX ... IDIV ESI ; EDX:EAX is divided by ESI, result in EAX Division operations also compute the remainder, which is stored in the 2nd half of the extended accumulator (AH, DX, EDX); this makes a separate MOD operation unnecessary. DIV r/m ( (unsigned) DIVision)
  • 143. is the unsigned counterpart of IDIV. Exercise: find inputs for which IDIV and DIV produce different results. In some cases division cannot be performed and results in an exception (often meaning a program crash or even an OS crash): SUB CX,CX IDIV CX Division by zero is the primary example of it; the above example will fail regardless of the value of the dividend. In case of a division overflow (division by zero is one example of it, but not the only one), the hardware will call INT 0 (see explanation of
  • 144. interrupts and handlers which is TBW). What happens next fully depends on the installed interrupt handler; the outcome may be an error message and program termination, a crash of the operating system, or a normal exit from the program with an error message like “your program has performed an illegal operation”. Division overflow may occur with a non-zero divisor too: MOV AX,1000 MOV CL,2 IDIV CL While we are dividing by 2, this division cannot be done: 1000/2 is 500, too large for a byte register (AL). The outcome is the same as with division by zero, and in fact some exception handlers may report this as a division by zero, which it is not.
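Both failure cases of the byte-sized IDIV can be tested for before dividing. The C sketch below is only an illustration of that check: the name idiv8_checked and its interface are hypothetical, and the arithmetic is done in long so that the check itself cannot overflow.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical pre-check for IDIV with an 8-bit divisor.
   dividend models the 16-bit value in AX, divisor the 8-bit operand.
   Returns false in the two cases that would raise the divide exception;
   otherwise stores the quotient (AL) and the remainder (AH). */
bool idiv8_checked(int16_t dividend, int8_t divisor,
                   int8_t *quotient, int8_t *remainder)
{
    if (divisor == 0)
        return false;                       /* division by zero */
    long q = (long)dividend / divisor;      /* truncates toward zero, like IDIV */
    if (q < INT8_MIN || q > INT8_MAX)
        return false;                       /* quotient does not fit in a byte */
    *quotient = (int8_t)q;
    *remainder = (int8_t)((long)dividend % divisor);
    return true;
}
```

For 100/7 the check succeeds with quotient 14 and remainder 2; for 1000/2 it fails, matching the overflow example above.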
  • 145. In the next example, division result (same 500) is to be placed into 2 byte accumulator. MOV AX,1000 MOV CX,2 IDIV CX this appears possible, but an overflow still may occur. Exercise: Understand why In general to avoid overflows one should use longer registers, a byte division is very overflow prone. IMUL, unlike the other instructions in this group, has additional formats, including multiplication by an immediate. TODO : discuss MULDIV TODO : discuss DIVMOD
  • 146. 3.20 LEA : Load effective address LEA s,t (Load Effective Address) s=&t; LEA is not an arithmetic operation sensu stricto, but it is closely related to the address computation used in arithmetic instructions. We begin with a closer look at what arithmetic instructions actually do, for example ADD AX,[BX+SI+10] Two parts of the computation are: 1. compute the memory address : BX+SI+10
  • 147. 2. retrieve the data from the computed address and add it to the AX register. We notice that our sample instruction actually does three additions, not one! The first part of the computation is exactly what LEA does, and using LEA we can break the addition into two steps: LEA DI,[BX+SI+10] ADD AX,[DI] This is an equivalent, and obviously slower, computation; we show it for illustration purposes only, not as a suggestion for coding. LEA is more restrictive than other 2-operand instructions in terms of the allowed operands: the second operand must be a memory operand (otherwise the concept of an address does not apply); the first argument is
  • 148. therefore always a register. LEA, not being an arithmetic instruction, does not affect flags. The “size must match” rule does not apply to LEA, since it does not actually move data. LEA allows the arguments to be of different sizes, although such forms of the instruction are rarely useful. The following are all valid examples with different sizes of the arguments (16 and 32): LEA DI,[BX+SI+3] ; 16 and 16 LEA EDI,[BX+SI+3] ; 32 and 16, 0-extension used LEA DI,[EAX+EBX+3] ; 16 and 32, truncation LEA EDI,[EAX+EBX+3] ; 32 and 32
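The four size combinations can be modelled in C as plain integer additions followed by a widening or narrowing conversion. This is a sketch of the arithmetic only; the helper names ea16 and ea32 are invented for the illustration and registers are passed in as ordinary parameters.

```c
#include <stdint.h>

/* 16-bit effective address [BX+SI+disp]: the sum wraps around at 16 bits. */
uint16_t ea16(uint16_t bx, uint16_t si, uint16_t disp)
{
    return (uint16_t)((uint32_t)bx + si + disp);
}

/* 32-bit effective address [EAX+EBX+disp]: the sum wraps around at 32 bits. */
uint32_t ea32(uint32_t eax, uint32_t ebx, uint32_t disp)
{
    return eax + ebx + disp;
}
```

In this model LEA EDI,[BX+SI+3] corresponds to (uint32_t)ea16(bx, si, 3), the zero-extended 16-bit address, and LEA DI,[EAX+EBX+3] corresponds to (uint16_t)ea32(eax, ebx, 3), the truncated 32-bit address.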
  • 149. With static addresses, LEA does not accomplish anything more than MOV can do: msg DB 'Hello, World!' LEA SI,msg MOV SI,offset msg (but notice the difference in the syntax). In this case, MOV is slightly more efficient (3-byte encoding rather than 4) and some assemblers (TASM!) will actually compile LEA as MOV! With addresses that include registers, LEA can be emulated as two or three instructions: nums DD 1000 dup(?) ...... LEA AX,nums[BX]
  • 150. ...... MOV AX,offset nums ADD AX,BX One important but little-known detail is that the ability of LEA to perform “free” additions translates to an ability to do free multiplications, leading to a new fast way to multiply by small numbers. Recall (from a section that has not been written yet) that in 32-bit mode the addresses are in the form [base+index*scale+offset] with base being any 32-bit register, index being any 32-bit register but ESP, and scale being 1, 2, 4 or 8. Now, consider the following example:
  • 151. LEA EAX,[EAX+EAX*2] This is a valid 32-bit addressing instruction; we can use the same register as the “index” and the “base”. The effect of the instruction is multiplication by 3, not surprisingly faster than IMUL. Using LEA EAX,[EAX+EAX*2] LEA EAX,[EAX+EAX*4] to multiply by 15 is also faster than IMUL, despite being two instructions. LEA EAX,[EAX+EAX*2] SHL EAX,2 is a faster way to multiply by 12, and so on. For many small
  • 152. numbers there are similar tricks. 3.21 Long integers: ADC and SBB Integer arithmetic can be performed on numbers that consist of more bits than the register size. This was done already on 8-bit processors in the Dark Ages of CP/M; exactly the same approach works on modern 32- and 64-bit processors to work with numbers that are longer yet. While the technique is the same, it is less needed now, since for most computations even 32-bit integers are sufficient. We will first work out the technique mathematically using 8-bit registers only, while trying to add 16-bit numbers. (The technique is fully scalable; the choice of a small register size is to make the examples easier to understand.) Consider this calculation:
  • 153. short x,y,z; z=x+y; Using 16-bit registers we can write it in assembler as ; program 1 MOV AX,x ADD AX,y MOV z,AX We now assume that the 16-bit registers are not available and try to perform it using 8-bit registers only. How about this code? ; program 2
  • 154. MOV AL,byte ptr x ADD AL,byte ptr y MOV byte ptr z,AL MOV AH,byte ptr x[1] ADD AH,byte ptr y[1] MOV byte ptr z[1],AH For some values it will indeed work. For example, for x=1234h and y=5678h, the result would be 68ACh, the same in both programs: 1234h + 5678h = 68ACh with 16-bit registers, and (12h 34h) + (56h 78h) = (68h ACh) byte by byte. For others, the result would be different. If x=80h (128 decimal)
  • 155. and y=80h, the 16-bit program will produce 100h (256 decimal) correctly, whereas the 8-bit program above will produce 0: adding 80h to itself in an 8-bit register gives 0! 0080h + 0080h = 0100h with 16-bit registers, but (00h 80h) + (00h 80h) = (00h 00h) byte by byte. The source of the problem is the carry produced by the first 8-bit addition: we are not using it. 80h+80h indeed results in 8 zero bits, but also a carry, indicating an extra one to be added to the 9th bit, and we are ignoring it. It is actually possible to correct the above program without introducing
  • 156. new instructions: ; program 3 MOV AL,byte ptr x ADD AL,byte ptr y MOV byte ptr z,AL MOV AH,byte ptr x[1] JNC skip INC AH skip: ADD AH,byte ptr y[1] MOV byte ptr z[1],AH
  • 157. (the INC instruction adds the previously ignored carry), but the Intel instruction set has a shortcut that results in faster code: ; program 4 MOV AL,byte ptr x ADD AL,byte ptr y MOV byte ptr z,AL MOV AH,byte ptr x[1] ADC AH,byte ptr y[1] MOV byte ptr z[1],AH The new ADC instruction adds the 2nd argument and the carry to the 1st argument, which is exactly the computation needed. ADC t,s; t=t+s+carry (ADd with Carry)
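The ADD/ADC pair of program 4 can be mirrored in C to check the two numeric examples. The function name add16_via_bytes is made up for this sketch, and the carry is kept in an ordinary variable instead of a flag.

```c
#include <stdint.h>

/* Add two 16-bit numbers using only 8-bit additions, as in program 4. */
uint16_t add16_via_bytes(uint16_t x, uint16_t y)
{
    unsigned lo = (x & 0xFFu) + (y & 0xFFu);     /* ADD AL,byte ptr y    */
    unsigned carry = lo >> 8;                    /* Carry flag after ADD */
    unsigned hi = (x >> 8) + (y >> 8) + carry;   /* ADC AH,byte ptr y[1] */
    return (uint16_t)(((hi & 0xFFu) << 8) | (lo & 0xFFu));
}
```

With x=1234h and y=5678h this yields 68ACh, and with x=y=80h it yields 100h, the value that program 2 got wrong.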
  • 158. Except for the addition of the carry, ADC follows exactly the same rules as ADD, including the syntax and setting of the flags. We now scale the example: how about adding 128-bit numbers? This cannot be done with a single addition on any of the current processors: you would need 128-bit registers for that! In this example we will assume that x occupies 16 bytes (128 = 16 × 8) of memory beginning with address 100; y and z (both of the same size) are located at 200 and 300. We further assume that the bytes in our 128-bit integers are written in the Intel reversed order. The objective is, as before, to compute z=x+y. How about this?
  • 159. ; program 5 MOV EAX,dword ptr x ADD EAX,dword ptr y MOV dword ptr z,EAX MOV EAX,dword ptr x[4] ADC EAX,dword ptr y[4] MOV dword ptr z[4],EAX MOV EAX,dword ptr x[8] ADC EAX,dword ptr y[8] MOV dword ptr z[8],EAX MOV EAX,dword ptr x[12] ADC EAX,dword ptr y[12]
  • 160. MOV dword ptr z[12],EAX Notice that this computation could have been done using only two 64-bit additions, or using 8 16-bit additions, or using 16 8-bit additions. The code would look similar in all cases; naturally the code using longer registers will have fewer instructions and will run faster. The code shown in the previous program can be extended to handle integers of any size. It may be preferable, however, to write this code as a loop. We notice that all the additions are done using the ADC instruction, except for the very first one, done by ADD; this is because there is no carry on the very first addition. We can use ADC for all additions if we are certain that the initial value of the carry flag is 0. The loop form of the previous program thus becomes: ; program final??
  • 161. CLC ; clear carry MOV CX,4 SUB BX,BX L: MOV EAX,dword ptr x[BX] ADC EAX,dword ptr y[BX] MOV dword ptr z[BX],EAX ADD BX,4 LOOP L NOTE: the program above should be seen as just an idea; to make it work one needs to ensure that the carry flag is not corrupted by other instructions! This is left to the reader. The new instruction
  • 162. CLC (CLear Carry) sets carry to 0, thus allowing us to use ADC for the first loop iteration. We may as well introduce two other instructions that also alter the carry flag: STC (SeT Carry) sets carry to 1; CMC (CompleMent Carry) toggles the carry bit. Yet longer numbers can be handled with the code above by simply increasing the number of times the loop is executed. The execution time will
  • 163. be proportional to n/r, where n is the bit length of the numbers and r is the bit length of the register. A similar problem for subtraction will not be examined in detail. We shall only say that the analog of ADC for subtraction is the SBB instruction: SBB t,s; t=t-s-carry (SuBtract with Borrow) and all of the above examples can be adapted to subtraction by changing all ADDs to SUBs and all ADCs to SBBs. 3.21.1 24-bit case As a special example, consider 24-bit (3 byte) numbers. Exercise: Why would one even want to look at such?
  • 164. Addition of such numbers can be done as three byte size additions or as one byte size and one word size. ; program 6 x db 3 dup(?) y db 3 dup(?) .... MOV AL,byte ptr x[0] ADD AL,byte ptr y[0] MOV byte ptr z[0],AL MOV AL,byte ptr x[1] ADC AL,byte ptr y[1] MOV byte ptr z[1],AL MOV AL,byte ptr x[2]
  • 165. ADC AL,byte ptr y[2] MOV byte ptr z[2],AL Notice that even in column addition of decimal numbers, chunks do not need to be of the same size. In 923 + 278 we can add 23+78, obtaining 01 and a carry of 1, and then add 9+2 plus the carry, obtaining 12, for a total of 1201. In the programs below we break the numbers into byte+word and word+byte chunks.
  • 166. ; program 6a x db 3 dup(?) y db 3 dup(?) .... MOV AL,byte ptr x[0] ADD AL,byte ptr y[0] MOV byte ptr z[0],AL MOV AX,word ptr x[1] ADC AX,word ptr y[1] MOV word ptr z[1],AX ; program 6b x db 3 dup(?) y db 3 dup(?)
  • 167. .... MOV AX,word ptr x[0] ADD AX,word ptr y[0] MOV word ptr z[0],AX MOV AL,byte ptr x[2] ADC AL,byte ptr y[2] MOV byte ptr z[2],AL Exercise: Are 6a and 6b equivalent? 3.21.2 other operations • Bitwise AND, OR, XOR, NOT simply repeat the same operation on each chunk (loops for long data). • Shifts propagate carry.
  • 168. • NEG is left as a (nice!) exercise. • (I)MUL is more complicated • (I)DIV is even more complicated The idea of long MUL: Assume x and y are twice the size of the register that is capable of multiplication (the register being N bits long). Then x = xh*R + xl, y = yh*R + yl, where xl, yl are the low halves of the values, xh and yh are the high halves, and R is 2^N. x*y = (xh*R + xl)*(yh*R + yl) = xh*yh*R^2 + (xh*yl + xl*yh)*R + xl*yl The formula contains 4 multiplications (multiplications by R are simply
  • 169. data movements) Assume now that x and y are 64 bit, z is 128 bit. x dd ?,? y dd ?,? z dd ?,?,?,? SUB EAX,EAX MOV z[8],EAX MOV z[12],EAX MOV EAX,x[0] MUL y[0] MOV z[0],EAX MOV z[4],EDX MOV EAX,x[0]
  • 170. MUL y[4] ADD z[4],EAX ADC z[8],EDX ; ADC z[12],0 MOV EAX,x[4] MUL y[0] ADD z[4],EAX ADC z[8],EDX ADC z[12],0 MOV EAX,x[4]
  • 171. MUL y[4] ADD z[8],EAX ADC z[12],EDX This seems to require 4 multiplications (and generally numbers of m*N bits would require m^2 multiplications); in reality only 3 are needed (Karatsuba's algorithm, later generalized by Toom and Cook; google for more). Idea: x*y = (xh*R + xl)*(yh*R + yl) = xh*yh*R^2 + (xh*yl + xl*yh)*R + xl*yl requires 4 multiplications, but x*y = xh*yh*R^2 + [(xh + xl)*(yh + yl) − xh*yh − xl*yl]*R + xl*yl requires only 3: • xh ∗ yh
  • 172. • xl ∗ yl • (xh + xl) ∗ (yh + yl) Division algorithms 112 https://en.wikipedia.org/wiki/Division_algorithm Chapter 4 Jumps 4.1 Unconditional Jumps JMP label (unconditionally JuMP (goto) to label) where label is any identifier, not a keyword, unique. Some scope rules apply. JMP lab
  • 173. lab:
        JMP lab
    (Jumps can go both forward and backward; multiple JMP instructions may target the same label.)
        l: JMP l
    (fine, but an infinite loop)
    NOTE: label itself is a keyword, so it cannot be used as a label name. In fact, the colon : is an abbreviation for label near.
    Other forms of JMP syntax exist, including
    JMP r/m (jump (goto) to the address stored in r/m)
  • 174. Only a trivial example for now:
        MOV AX,offset lab
        JMP AX
        ....
        lab:
    (offset is a keyword, cf. the “take address” & operator in C).
    4.2 Conditional Jumps
    J<cc> label (Jump based on cond code)
    General idea: conditional jumps implement an equivalent of
        if (<condition>) goto lab;
    where <condition> is usually expressed in flags.
  • 175. NOTE: of course, this is exactly the programming style not recommended for high-level languages!
    4.2.1 Using Zero Flag
    if (ZeroFlag) goto lab; corresponds to an assembler instruction, JZ.
    JZ label (Jump if Zero flag)
    Example:
        ; int x;
        ;
        ; if (x!=0)
  • 176. ; { <BODY> }
        ;
        CMP x,0
        JZ skip
        ; <BODY>
        skip:
    If x has just been computed by an arithmetic or logical instruction (which sets the flags), CMP may not be needed:
        ; int x,y,z;
        ;
        ; if (x+y+z!=0)
        ; { <BODY> }
        ;
        MOV EAX,x
  • 177. ADD EAX,y
        ADD EAX,z
        JZ skip
        ; <BODY>
        skip:
    JE label (Jump if Equal) is a synonym of JZ
        ; int x,y;
        ;
        ; if (x!=y)
  • 178. ; { <BODY> }
        ;
        MOV EAX,x
        SUB EAX,y
        JE skip
        ; <BODY>
        skip:
    JNZ label (Jump if Not Zero flag)
    JNE label (Jump if Not Equal)
        ; int x;
        ;
        ; if (x==0)
        ; { <BODY> }
  • 179. ;
        MOV EAX,x
        CMP EAX,0
        JNZ skip
        ; <BODY>
        skip:

        ; int x,y;
        ;
        ; if (x==y)
        ; { <BODY> }
  • 180. ;
        MOV EAX,x
        SUB EAX,y
        JNE skip
        ; <BODY>
        skip:
    NOTE: The four (actually, two) instructions JZ, JE, JNZ, JNE are based on the Zero Flag only.
    NOTE: The N “prefix” in the instruction name is used with other instructions as well.
    NOTE: Conditions in conditional jumps are always reversed: we specify skipping, not doing!
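The reversed condition is easy to see if the same pattern is written in C with an explicit goto. This is only a sketch: the function name `body_runs_if_nonzero` and the `ran` flag are hypothetical, introduced so the effect can be observed.

```c
/* Mirrors:  CMP x,0 / JZ skip / <BODY> / skip:
   Returns 1 if <BODY> was executed, 0 if it was skipped. */
int body_runs_if_nonzero(int x)
{
    int ran = 0;
    if (x == 0) goto skip;   /* JZ skip: the reverse of (x != 0) */
    ran = 1;                 /* <BODY> */
skip:
    return ran;
}
```

The source condition is `x != 0`, but the test that appears in the code is `x == 0`, because the jump describes when to skip the body.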
  • 181. 4.2.2 Using Sign Flag
    JS label (Jump if Sign)
    JNS label (Jump if Not Sign)
        ; int x;
        ;
        ; if (x<0)
        ; { <BODY> }
        ;
        CMP x,0
        JNS skip
        ; <BODY>
  • 182. skip:
    JL label (Jump if Less)
    JLE label (Jump if Less or Equal)
    JG label (Jump if Greater)
    JGE label (Jump if Greater or Equal)
    JNL label (Jump if Not Less)
    JNLE label (Jump if Not Less or Equal)
    JNG label (Jump if Not Greater)
    JNGE label (Jump if Not Greater or Equal)
    (There are only 4 different instructions above!)
  • 183. ; int x,y;
        ;
        ; if (x<=y)
        ; { <BODY> }
        ;
        MOV EAX,x
        SUB EAX,y
        JNLE skip ; reverse Less-or-Equal cond
        ; <BODY>
        skip:
  • 184. WARNING: the explanation below is simplified; actually JS and JL are different, and the Overflow flag plays a role. For nearly all practical code the results are the same. For more details (and the true picture) see the Stack Overflow link below. Why this extra mess? Because CMP may overflow!
    How does this work? Primary instructions: JL (==JNGE), JLE (==JNG), JNL (==JGE), JNLE (==JG). We assume JL ≈ JS. Then
    JL jumps if the sign flag is set
    JLE jumps if the sign or zero flag is set
    JNL jumps if the sign flag is not set
    JNLE jumps if neither the sign nor the zero flag is set
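To see where the simplification JL ≈ JS breaks down, the flags can be modelled in C. This is a sketch under stated assumptions, not a description of any particular assembler or CPU manual text: the names `Flags`, `cmp8`, `js_taken`, `jl_taken` and `jle_taken` are hypothetical, and the model covers only an 8-bit CMP. It uses the actual x86 rule that JL is taken when the Sign and Overflow flags differ.

```c
#include <stdint.h>

/* Flags left by an 8-bit CMP a,b, which computes a - b and discards it. */
typedef struct { int zf, sf, of; } Flags;

Flags cmp8(uint8_t a, uint8_t b)
{
    uint8_t r = (uint8_t)(a - b);            /* result wraps to 8 bits */
    Flags f;
    f.zf = (r == 0);
    f.sf = (r >> 7) & 1;                     /* top bit of the result  */
    /* Signed overflow on subtraction: the operands have different signs
       and the sign of the result differs from the sign of a. */
    f.of = (((a ^ b) & (a ^ r)) >> 7) & 1;
    return f;
}

int js_taken(Flags f)  { return f.sf; }                    /* JS  */
int jl_taken(Flags f)  { return f.sf != f.of; }            /* JL  */
int jle_taken(Flags f) { return f.zf || f.sf != f.of; }    /* JLE */
```

For a = 80h (-128) and b = 1 the subtraction overflows: the result is 7Fh, so the Sign flag is clear and JS is not taken, while JL is taken, which is the correct answer for -128 < 1.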
  • 185. Details on JS vs JL: https://stackoverflow.com/questions/25031860/difference-between-js-and-jl-x86-instructions/25055804
    4.2.3 A few examples
    Example: implement the ABS function
    $\mathrm{abs}(x) = \begin{cases} -x & \text{if } x < 0 \\ x & \text{if } x \ge 0 \end{cases}$
    Buggy!
        ; EAX=abs(EAX)
        OR EAX,EAX
        JNZ done
        NEG EAX
  • 186. done:
    Fix the error above! The correct code is
        ; EAX=abs(EAX)
        OR EAX,EAX
        JNS done
        NEG EAX
        done:
    Example: implement the SIGN function. Sign definition:
    $\mathrm{sign}(x) = \begin{cases} -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases}$
    Actually,
  • 187. $\forall x:\ x = \mathrm{abs}(x) \times \mathrm{sign}(x)$
        ; EAX=sign(EAX)
        OR EAX,EAX
        JZ done
        MOV EAX,1       ; MOV does not change the flags set by OR
        JG done
        NEG EAX
        done:
    Example: What does this do?
        MOV CX,10
  • 188. SUB AX,AX
        l: ADD AX,CX
        DEC CX
        JNZ l
    The signed jumps (JL, JG, ...) are not suitable for unsigned arithmetic! Example:
        MOV AL,-1
        CMP AL,1
        JL lab
        ...
        lab:
  • 189. This works correctly, the jump is taken.
        MOV AL,255
        CMP AL,1
        JL lab
        ...
        lab:
    Counter-intuitively, the jump is taken; this is because the byte 255 has the same bit pattern as −1 and JL interprets it as such! The correct way to program this is
        MOV AL,255
        CMP AL,1
        JB lab
        ...
  • 190. lab:
    4.2.4 Using Carry Flag
    JC label (Jump if Carry)
    JNC label (Jump if Not Carry)
        MOV EAX,x
        ADD EAX,y
        JC error ; unsigned overflow!
        ...
        error:
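C has no direct access to the Carry flag, but for unsigned addition the same information can be recovered from the wrapped result. A minimal sketch, assuming 32-bit operands as in the example above; the function name `add_with_carry_out` is hypothetical.

```c
#include <stdint.h>

/* Stores the wrapped 32-bit sum in *sum and returns the carry (0 or 1)
   that ADD EAX,y would leave in the Carry flag. */
int add_with_carry_out(uint32_t x, uint32_t y, uint32_t *sum)
{
    *sum = x + y;        /* wraps modulo 2^32, like ADD on a 32-bit register */
    return *sum < x;     /* the sum is smaller than an operand only on wrap  */
}
```

A caller would branch on the return value the same way the assembly branches with JC.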
  • 191. JB label (Jump if Below)
    JBE label (Jump if Below or Equal)
    JA label (Jump if Above)
    JAE label (Jump if Above or Equal)
    JNB label (Jump if Not Below)
    JNBE label (Jump if Not Below or Equal)
    JNA label (Jump if Not Above)
    JNAE label (Jump if Not Above or Equal)
    Only 4 different instructions here; further, JB==JC, JNB==JNC.
    Exercise: Work out how the flags are used.
  • 192. 4.2.5 Mixing signed and unsigned
    Consider two examples:
        //example S
        int x,y;
        if (x<y) ....
        //example U
        unsigned int x,y;
        if (x<y) ....
    In Example S, the compiler will generate a CMP instruction followed by an inverted signed jump, specifically here JNL. In Example U, the compiler will generate a CMP instruction followed by an inverted unsigned jump, specifically here JNB.
  • 193. This makes it easy for the programmer, who does not need to think about which instruction to use. But there is a problem:
        //example M
        int x; unsigned y;
        if (x<y) ....
    Using either the signed or the unsigned instruction will work incorrectly in some cases! Compilers would typically issue a warning (e.g. “signed/unsigned mismatch”), often ignored by the programmer. How to do this correctly? Let’s consider a smaller case first.
        //example MB
  • 194. unsigned char x; signed char y;
        if (x<y) ....
    A single 8-bit comparison instruction will not work. But conversion into a range that includes both char and unsigned char will:
        //example MB1
        unsigned char x; signed char y;
        short sx,sy;
        sx=x; sy=y;
        if (sx<sy) ....
    or, simply:
        //example MB2
        unsigned char x; signed char y;
        if ((short)x<(short)y) ....
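The failure and the fix can be checked in C by modelling what an 8-bit CMP followed by JL or JB would decide. All function names below are hypothetical helpers for illustration. The signed operand is passed as its raw bit pattern so that the sketch does not depend on how a compiler converts out-of-range values. The last function is one possible way (an assumption, not the author's method) to compare an int with an unsigned without a wider type: a negative int is below every unsigned value, and otherwise both can be compared as unsigned.

```c
#include <stdint.h>

/* Value of a byte interpreted as a two's complement signed number. */
static int as_signed8(uint8_t b) { return b >= 128 ? (int)b - 256 : (int)b; }

/* Decision of an 8-bit CMP followed by a signed jump (JL). */
int less_8bit_signed(uint8_t x_bits, uint8_t y_bits)
{
    return as_signed8(x_bits) < as_signed8(y_bits);
}

/* Decision of an 8-bit CMP followed by an unsigned jump (JB). */
int less_8bit_unsigned(uint8_t x_bits, uint8_t y_bits)
{
    return x_bits < y_bits;
}

/* Widening fix: x is an unsigned byte, y_bits holds a signed byte.
   Both values fit into int, so a signed comparison is always right. */
int less_widened(uint8_t x, uint8_t y_bits)
{
    return (int)x < as_signed8(y_bits);
}

/* int vs unsigned: handle the sign first, then compare as unsigned. */
int less_int_unsigned(int x, unsigned y)
{
    return x < 0 || (unsigned)x < y;
}
```

With x = 200 and y = 1 the signed 8-bit comparison gives the wrong answer; with x = 1 and y = -1 (bit pattern FFh) the unsigned one does. The widened comparison is right in both cases.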
  • 195. Since short is a signed datatype, the generated instruction will be signed.
    When comparing signed and unsigned shorts, we can convert both to int (in fact, we could have used int above too). But what do we do when comparing signed and unsigned ints?
    In assembler, the above is of course also doable, but so is the 32-bit comparison! (The code cannot be shown at this time: we have skipped the conversion instructions.) In C a proper comparison function can be written too.
    4.3 Jumps encoding
  • 196. A conditional jump is encoded in two bytes,
        opc del
    where opc is the opcode (0x74 for JE, 0x78 for JS, ...) and del is a signed byte. IP is first incremented to point to the next instruction; then, if the jump is taken, IP=IP+del.
    The short form of JMP is encoded similarly, using the opcode byte 0xEB:
        0xEB del
    The near form of JMP uses opcode 0xE9 and a 2-byte (16-bit segs) or 4-byte (32-bit segs) delta:
        16 bit seg : 0xE9 d1 d2
        32 bit seg : 0xE9 d1 d2 d3 d4
    If a conditional jump cannot reach its target, one can overjump:
  • 197. JZ lab
        ; more than 128 bytes
        lab:
    may not assemble, but
        JNZ temp
        JMP lab
        temp:
        ; more than 128 bytes
        lab:
    will. 80386+ has alternative longer forms of conditional jump
  • 198. instructions, with a 16-bit (or 32-bit) delta; they begin with bytes 0Fh 8?h ..
    4.4 LOOPS
    Revisiting the previously seen example:
        MOV CX,10
        SUB AX,AX
        l: ADD AX,CX
        DEC CX
        JNZ l
    We notice that adding the numbers 10 down to 1 is more efficient than doing 1 to 10.
  • 199. MOV CX,1
        SUB AX,AX
        l: ADD AX,CX
        INC CX
        CMP CX,11 ; EXTRA
        JNZ l
    Generally, running loops toward 0 is more efficient!
    We next notice that we actually implemented a do while loop.
        ; cx=1; ax=0; do { ax=ax+cx; cx++; } while (cx<=10);
    What about a while?
        ; cx=1; ax=0; while (cx<=10) {ax=ax+cx; cx++; }
        MOV CX,1
        SUB AX,AX
  • 200. l: CMP CX,10 ; EXTRA
        JA done
        ADD AX,CX
        INC CX
        JMP l ; EXTRA
        done:
    A while loop of n iterations executes about 2n jump instructions vs n for do while. Generally, do-while is more efficient than while! We will not consider for, which is generally equivalent to while.
    Returning to
        MOV CX,10
        SUB AX,AX
  • 201. l: ADD AX,CX
        DEC CX
        JNZ l
    This is a very common form of a loop, and Intel allows us to optimize it further.
    LOOP label (CX--; if (CX!=0) JMP label)
    Thus:
        MOV CX,10
        SUB AX,AX
        l: ADD AX,CX
        LOOP l
  • 202. for one instruction fewer! We can generalize this into
        MOV CX,n
        SUB AX,AX
        l: ADD AX,CX
        LOOP l
    But what will happen if the supplied n is 0? One solution:
        MOV CX,n
        SUB AX,AX
        OR CX,CX
        JZ done
  • 203. l: ADD AX,CX
        LOOP l
        done:
    Again, Intel provides a shortcut,
    JCXZ label (if (CX==0) JMP label)
    Thus:
        MOV CX,n
        SUB AX,AX
        JCXZ done
        l: ADD AX,CX
        LOOP l
  • 204. done:
    LOOP and JCXZ are encoded similarly to conditional jumps. LOOP’s opcode is 0E2h, JCXZ’s is 0E3h.
    Mentioning only: there are also the less commonly needed LOOPZ and LOOPNZ. All four instructions can work off ECX.
    Double LOOPs should save CX!
        MOV CX,n
        L1: ....
        **SAVE** CX
        MOV CX,m
        L2:
  • 205. ....
        LOOP L2
        **RESTORE** CX
        LOOP L1
    SAVE/RESTORE can be done using another register (for example, XCHG CX,DX), memory, or the stack (usually the best).
    One more “real” example:
        MOV CX,100
        MOV AX,1
        MOV BX,1
        L: ; print AX
  • 206. ADD BX,AX
        XCHG BX,AX
        LOOP L
    4.5 Control statements templates
        ; goto lab;
        ;
        JMP <lab>
        lab:

        ; if <COND> <BODY>
        ;
        ; convert condition into a flag
        JN<cond> skip
  • 207. <BODY>
        skip:

        ; if <COND> <BODY1>
        ; else <BODY2>
        ;
        ; convert condition into a flag
        JN<cond> lelse
        <BODY1>
        JMP done
        lelse:
        <BODY2>
        done:
  • 208. ; while (true) <BODY>
        ; for (;;) <BODY>
        lagain:
        <BODY>
        JMP lagain
        ldone: ; may be used for break in <BODY>

        ; while <COND> <BODY>
        ;
        lagain:
        ; convert condition into a flag
        JN<cond> ldone
        <BODY>
  • 209. JMP lagain
        ldone:

        ; do <BODY> while <COND>;
        ;
        lagain:
        <BODY>
        ; convert condition into a flag
        J<cond> lagain

        ; while <COND> { <BODY1> break; <BODY2>}
        ;
        lagain:
  • 210. ; convert condition into a flag
        JN<cond> ldone
        <BODY1>
        JMP ldone
        <BODY2>
        JMP lagain
        ldone:

        ; while <COND> { <BODY1> continue; <BODY2>}
        ;
        lagain:
  • 211. ; convert condition into a flag
        JN<cond> ldone
        <BODY1>
        JMP lcont
        <BODY2>
        lcont:
        JMP lagain
        ldone:
    NOT covered: for, switch, and how to deal with more complex expressions (see the “complete and shortcut” section).
    Chapter 5
  • 212. Variables
    5.1 Declaring variables
    Assembler language includes operators for declaring variables. The primitive data declaration operators correspond to the primitive types in the assembler language:
        [name] DB value[s] (Declare Byte)
        [name] DW value[s] (Declare Word)
        [name] DD value[s] (Declare Doubleword)
        [name] DQ value[s] (Declare Qword)
        [name] DT value[s] (Declare Tenbyte)
  • 213. The [name] field of the declaration should be a unique identifier; this field is usually present but not required. The [name] field is used to refer to the variable; if it is absent, there is no way to refer to the variable by name.
    The value field gives the initial value of the variable; this field is always required. Examples:
        year dw 2020
        a1 db 'a' ; as char
        a2 db 97 ; as decimal
        a3 db 61h ; as hex
        a4 db 01100001b ; as binary
        dwrd dd 12345678
    Notice that a1, a2, a3 and a4 all provide the same initial value, in
  • 214. different formats. 'a' is 97 since the character 'a' has code 97 in the ASCII table.
    Multiple initialization values are allowed; in such cases the declared variable is actually an array:
        primes DW 2,3,5,7,11,13
        str1 DB 'H','e','l','l','o'
        str2 DB 'Hello'
        str3 DB "H",'e',"l","l",'o'
        str4 DB "Hello"
        str5 DB 'He',"llo"
  • 215. All five strings above contain identical data: both single and double quotes are allowed, but single quotes must match single quotes and double quotes must match double. Further, strings can be entered either as strings or as sequences of characters.
        s DB 'Hello, World',0
    shows a zero-terminated string (C language string format). One would use single quotes to enter a string that includes double quotes and vice versa:
        m DB '"Hello", he said'
    and will have to break the string into parts if it includes both single and double quotes:
        m DB '"Don',"'",'t", he said'
  • 216. A long sequence of initialization values can be broken into multiple declarations:
        msg DB "ERROR: you must not use"
            DB "this operator the way you do"
            DB "please delete the program and"
            DB 'read the textbook',0
    In the example above we do not name the lines after the first; the program will not refer to them.
    Often, one needs to calculate the length of a string like msg above. This can be done at run time, by counting the characters up to the terminator, or during compilation:
  • 217. msg DB "ERROR: you must not use"
            DB "this operator the way you do"
            DB "please delete the program and"
            DB 'read the textbook'
        msgend DB 0
        msglen DW offset msgend-offset msg
        ....
        MOV CX,offset msgend-offset msg
    The offset operator is similar to the address operator & in C/C++. offset msgend-offset msg is a constant, computed at compile time; it is equal to the number of characters in the message, excluding the terminating null character. With this value known we actually do not
  • 218. need the null terminator at all and can instead use
        msg DB "ERROR: you must not use"
            DB "this operator the way you do"
            DB "please delete the program and"
            DB 'read the textbook'
        msgend LABEL BYTE
        msglen DW offset msgend-offset msg
        ....
        MOV CX,offset msgend-offset msg
    where LABEL declares a byte-type name at the current position without allocating any storage. LABEL is used instead of DB since the latter will always allocate at least one byte. Other types that may follow LABEL include WORD, DWORD, QWORD, TBYTE.
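For comparison, C offers the same compile-time length computation through sizeof. This is a sketch only: the shortened message text, the constant name `MSGLEN` and the helper `msglen_runtime` are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Adjacent string literals are concatenated by the compiler,
   much like consecutive unnamed DB lines. */
static const char msg[] =
    "ERROR: you must not use "
    "this operator the way you do";

/* sizeof counts the terminating zero byte, so subtract 1.
   The value is a compile-time constant, like offset msgend-offset msg. */
enum { MSGLEN = sizeof msg - 1 };

/* The run-time alternative: count characters up to the terminator. */
size_t msglen_runtime(void) { return strlen(msg); }
```

Both approaches give the same number; the first costs nothing at run time.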
  • 219. To specify an uninitialized variable, use “?”:
        x DD ?
        y DD ?
    Declared variables can be referred to in code:
        x DW ?
        y DW ?
        z DW ?
        .....
        MOV AX,x
        ADD AX,y
        MOV z,AX
  • 220. Notice that declared variables have a size and the usual size rules apply:
        MOV x,10 ; OK, size known from x
        MOV AL,x ; error, size mismatch
        MOV AL,byte ptr x ; OK, size cast to byte
    The dup operator can be used to specify large arrays without having to list all the initial values:
        ar1 DW 1,1,1,1,1
        ar2 DW 5 dup(1)
    The two arrays above contain exactly the same data. A more practically useful example would be
        ar3 DW 1000 dup(?)
    – a 1000-element uninitialized array.
  • 221. Initialized and uninitialized entries can be used within the same declaration:
        aa DW 1,2,?,?,5,6
        fib DW 1,1,998 dup(?)
    The fib array will be used to compute the Fibonacci numbers below. The first two elements are initialized statically; the subsequent entries will be computed.
    Data declarations are not executable instructions and in general one should separate them from code so they do not get executed. This is usually accomplished by placing them into a separate segment. If there is only one
  • 222. segment (.com files), then the typical approach is to structure the program as follows:
        start: ; entry point of the program
        JMP real_start
        .....
        data and procedure declarations
        .....
        real_start: ; code begins here
    Because we jump over the data, it will never get executed.
    An exception to the above is the case when we know exactly which instructions the data corresponds to. For example,
        MOV AX,BX
  • 223. DB 90h
        ADD SI,DI
    is totally safe: 90h is the NOP operation! While there is no good reason to code NOP as data, a similar approach can be used to enter an instruction that is not supported by the assembler, or a form of an instruction that the assembler will not produce.
    There are two different ways to encode ADD BX,CX as machine code. One of them is normally produced by the assembler, the other can only be entered using db’s.
    Exercise: Do it.
    5.2 Using arrays
    This section contains several examples of common snippets of code.
  • 224. Example 1: initialize a 256-element byte array to contain all possible 256 characters in ascending order.
        ba DB 256 dup(?)
        .....
        SUB BX,BX
        MOV CX,256
        l: MOV ba[BX],BL
        INC BX
        LOOP l
    Notice that we use the same register as both the index and the current value to
  • 225. be stored; this is possible only because of the byte size of the data.
    Example 2: initialize a 1000-element word array to contain numbers 0 through 999 in ascending order.
        nums DW 1000 dup(?)
        .....
        SUB AX,AX
        SUB BX,BX
        MOV CX,1000
        l: MOV nums[BX],AX
        INC AX
        ADD BX,2
        LOOP l
  • 226. Exercise: Explain what would happen if BX is incremented only by 1.
    Generally, the index in loops like the one above needs to be incremented by the byte size of the data: the index counts bytes, not elements. One very important implication is that in high-level language loops of this type there is a hidden multiplication by the element size; depending on the compiler, it may or may not be implemented as a multiplication. Consider two ways to implement the above code in C:
        short s[1000];
        int i;
        for (i=0; i<1000; i++) s[i]=i;
  • 227. This code has a hidden multiplication by 2: index i needs to be translated into an offset to compute the address of s[i]. The other way to code the example,
        short s[1000];
        int i;
        short *p=s;
        for (i=0; i<1000; i++) *p++=i;
    does not have a hidden multiplication and often would result in better compiled code.
    Example 3: for completeness, let us also initialize a 1000-element doubleword (int) array to contain numbers 0 through 999 in ascending order.
        nums DD 1000 dup(?)
        .....
  • 228. SUB EAX,EAX
        SUB BX,BX
        MOV CX,1000
        l: MOV nums[BX],EAX
        INC EAX
        ADD BX,4
        LOOP l
    Example 4: Compute the sum and average of the elements in a 1000-element word array (in this and subsequent examples we assume nums to be initialized with some values first).
        nums DW 1000 dup(?)
        sum DW ?
        ave DW ?
  • 229. .....
        SUB AX,AX
        SUB BX,BX
        MOV CX,1000
        l: ADD AX,nums[BX]
        ADD BX,2
        LOOP l
        MOV sum,AX
        CWD
        MOV BX,1000
  • 230. IDIV BX
        MOV ave,AX
    (note that the answer will be truncated to an integer)
    Example 5: Find the largest element in an array of signed numbers
        nums DW 1000 dup(?)
        max DW ?
        .....
        MOV AX,nums[0]
        MOV BX,2
        MOV CX,999
        l: CMP AX,nums[BX]
        JGE skip ; use JAE for unsigned
        MOV AX,nums[BX]
  • 231. skip: ADD BX,2
        LOOP l
        MOV max,AX
    Use JLE and JBE if searching for the smallest entry.
    Example 6: Compute Fibonacci numbers
        fibs DW 1,1,998 dup(?)
        .....
        MOV BX,4
        MOV CX,998
        l: MOV AX,fibs[BX-4]
        ADD AX,fibs[BX-2]
        MOV fibs[BX],AX
        ADD BX,2
  • 232. LOOP l
    Exercise: Would all the values be computed correctly?
    Example 7: Find value t in an array of numbers
        nums DW 1000 dup(?)
        t DW ?
        ......
        MOV AX,t
        SUB BX,BX
        MOV CX,1000
        l: CMP AX,nums[BX]
  • 233. JE found
        ADD BX,2
        LOOP l
        not_found:
        .....
        found:
        .....
    A more interesting program would be a binary search... it is not all that difficult in assembler. We assume nums contains signed numbers in ascending order.
        nums DW 1000 dup(?)
        ......
        MOV AX,t
  • 234. MOV SI,0 ; left interval bound
        MOV DI,999*2 ; right interval bound
        l: CMP SI,DI
        JG not_found ; signed: DI becomes -2 when t is below nums[0]
        MOV BX,SI
        ADD BX,DI
        SHR BX,2
        SHL BX,1 ; middle point, as an even byte offset
        CMP AX,nums[BX] ; AX holds t; CMP cannot take two memory operands
        JE found
  • 235. JG right
        left: MOV DI,BX
        SUB DI,2
        JMP l
        right: MOV SI,BX
        ADD SI,2
        JMP l
        ....
        found:
        ....
        not_found:
  • 236. ....
    Exercise: Reverse the elements in a word array.
    Exercise: Implement bubble sort.
    5.3 Memory addressing syntax
    16 bit addressing offers only the following nine schemes of entering the offset:
    • const
    • BX+const
    • BP+const (***)
    • SI+const
  • 237. • DI+const
    • BX+SI+const
    • BX+DI+const
    • BP+SI+const (***)
    • BP+DI+const (***)
    Notes:
    How to remember: Base, or Index, or Both, or Neither.
    Double register addressing allows efficient access to a two-dimensional array, but more often it is used to access dynamic one-dimensional arrays (the base points to the beginning of an array, the index “indexes” it).
    This limitation allows for a compact encoding of the addressing scheme. Despite the nine choices offered, only three bits are needed to select the register combination (two more bits select the size of const).
    Options marked with (***) imply SS: segment use (others
  • 238. default to DS:).
    5.4 Video memory in text mode.
    For the default text video mode, the video memory begins in the middle of the B band, corresponding to the segment value B800 (the reasons are historical).
    Let us begin with an example showing the idea; we will specify the exact rules later.
        MOV AX,0B800h
        MOV ES,AX ; ES=>video!
        MOV byte ptr ES:[0],'A'
  • 239. INT 20h ; terminate the program
    If compiled (you would need to add extra lines to the file per assembler language syntax requirements) and run, this program will deposit the letter A into the upper left corner of the screen! The general layout of video memory is the following:
    The screen has 25 rows and 80 columns of characters; each cell corresponds to two bytes in the video memory. The even bytes (0, 2, 4, ...) are the ASCII characters, the odd bytes define the attributes (colors) of the character at the preceding even address. Let us provide an example of this:
        MOV AX,0B800h
        MOV ES,AX ; ES=>video!
        MOV byte ptr ES:[0],'A'
  • 240. MOV byte ptr ES:[1],71h
        INT 20h ; terminate the program
    71h defines the attributes of the letter A on screen. The first nibble (7) is the background color, the second (1) is the foreground color: A would appear as blue on white, following these definitions:
    • 0 – black (0x000000)
    • 1 – blue (0x0000AA)
    • 2 – green (0x00AA00)
    • 3 – cyan (0x00AAAA)
    • 4 – red (0xAA0000)
    • 5 – magenta (0xAA00AA)
  • 241. • 6 – brown (0xAA5500)
    • 7 – white/light gray (0xAAAAAA)
    • 8 – (dark) gray (0x555555)
    • 9 – bright blue (0x5555FF)
    • 10 – bright green (0x55FF55)
    • 11 – bright cyan (0x55FFFF)
    • 12 – bright red (0xFF5555)
    • 13 – bright magenta (0xFF55FF)
    • 14 – yellow (0xFFFF55)
    • 15 – bright white (0xFFFFFF)
    (In some modes the high bit of the background color indicates blinking rather than color; other variations of interpretation exist.)
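Packing the two nibbles into an attribute byte can be sketched in C as follows. The function name `make_attr` and the color constant names are hypothetical; the numeric values are the ones from the table above.

```c
#include <stdint.h>

/* The eight non-bright colors from the table above. */
enum { BLACK = 0, BLUE = 1, GREEN = 2, CYAN = 3,
       RED = 4, MAGENTA = 5, BROWN = 6, LIGHT_GRAY = 7 };

/* Background goes into the high nibble, foreground into the low nibble. */
uint8_t make_attr(uint8_t background, uint8_t foreground)
{
    return (uint8_t)(((background & 0x0F) << 4) | (foreground & 0x0F));
}
```

For example, `make_attr(LIGHT_GRAY, BLUE)` yields 71h, the blue-on-white attribute used above.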
  • 242. While MOV is the most common instruction to change video memory, any other instruction can be used. Consider:
        MOV AX,0B800h
        MOV ES,AX ; ES=>video!
        MOV byte ptr ES:[0],'A'
        MOV CX,256
        L: INC byte ptr ES:[0]
        LOOP L
        INT 20h ; terminate the program
    The program above will show A, then change it to B, ... eventually coming back to an A.
  • 243. Will you actually be able to see the letters changing onscreen? Well... no! This will happen too fast to notice.
    Exercise: Fix it.
    Writing to video memory is the best way of doing full-screen output, most suitable for tables and games; other methods are used to produce scrollable console output (see INT 21h).
    Assuming that the rows are numbered top to bottom 0 through 24, and the columns are numbered left to right 0 through 79, to write at position (X,Y) we should store information at offset 160 ∗ Y + 2 ∗ X in the video segment.
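The offset formula can be tried out in C against an ordinary buffer that stands in for the video segment. This is a sketch, not DOS code: `cell_offset`, `put_cell` and the buffer parameter are hypothetical names, and nothing here touches real video memory.

```c
#include <stdint.h>

enum { ROWS = 25, COLS = 80 };   /* default text mode geometry */

/* Byte offset of the cell at column x (0..79), row y (0..24):
   each row takes 2*COLS = 160 bytes, each cell 2 bytes. */
unsigned cell_offset(unsigned x, unsigned y)
{
    return 2u * COLS * y + 2u * x;
}

/* Store a character and its attribute into a buffer that plays
   the role of the memory at B800:0000. */
void put_cell(uint8_t *video, unsigned x, unsigned y, char ch, uint8_t attr)
{
    unsigned off = cell_offset(x, y);
    video[off]     = (uint8_t)ch;   /* even byte: ASCII character */
    video[off + 1] = attr;          /* odd byte: attribute        */
}
```

The last cell of the screen, (79,24), lands at offset 3998, so the whole page occupies 4000 bytes.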
  • 244. Chapter 6
    Project #1
    Due Date: 04/19/2021
    Goal: implement the Game of 1024. This is a new simple board game (rather, a puzzle); you can play the online version at https://1024game.org/.
    Project specifics:
    1. Use direct writes to the video memory (seg 0xB800).
    2. Output will be explained (today); input can be done using int 16h (look up).
    3. It is recommended to use color and line draw characters (2nd half of the ASCII OEM set).
    4. Projects are individual and should not be copies of source code found elsewhere; such submissions will not be accepted.
  • 245. 5. You do not need to follow the layout exactly; shortcuts that simplify programming but essentially keep the game the same are fine.
    Basic grading guidelines:
    • C something works (but perhaps not really playable)
    • B playable with glitches/deficiencies.
    • A enjoyable to play (well... to the degree the entire idea of such a puzzle is)
    6.0.1 Program outline
    Hopefully of some help: this is an overall structure of the program one can use.
  • 246. 0 Initialize the program, display.
    1 Initialize the configuration.
    2 Display the board.
    3 Wait for a key (int 16h).
    4 If the key indicates a move of a piece, update the configuration, go to step 2.
    5 If the key indicates new game, go to step 1.
    6 If the key indicates quit, quit.
    7 Ignore the key, go to step 3.
    6.0.2 Submission
    Submit the source (.asm) and the executable (.com or .exe), ideally by email.
  • 247. Please note that some email providers kill attachments of certain types. Google is likely to reject .com or .exe, CCNY email does not like .asm. To avoid problems: rename the files to give them a “safe” extension, then zip (or rar) the files together. Please do not name the archive “project.zip”; rather, use your name to name the archive. (Example: rename project.com to project.com.txt)
    When submitting, CC: yourself; this way you will see that the project does come through. Submission can be made either by CCNY email or to [email protected] (the latter address does not kill any attachments!).
  • 248. Submission is your responsibility.
    6.1 How to compile and run a program.
    Assembler distribution: http://tinyurl.com/cjdyruy
    (Download, unpack, make sure it works. The distribution contains DOSBOX, TASM, TLINK.)
    At this time we only need to know how to create a binary from an assembler program; a sample HELLO.ASM is provided.
        .model tiny
        .code
        org 100h
        start:
  • 249. ; BEGIN BODY
        mov dx, offset hello
        mov ah, 9
        int 21h
        mov ah, 4ch
        int 21h
        hello db 'Hello, world.',13,10,'$'
        ; END BODY
        end start
    The part between BEGIN BODY and END BODY is replaced by your own code.
    To compile (assuming you are in the directory where hello.asm is
  • 250. unpacked; under DOSBOX or a 32-bit OS):
        C> BIN\TASM hello
    This will create file HELLO.OBJ. To link:
        C> BIN\TLINK hello /t
    This will create file HELLO.COM. To run:
        C> Hello
    This will say (guess what?)
    This output method (console write) will not work right for full-screen apps; write into video memory directly instead:
        .model tiny
  • 251. .code
        org 100h
        start:
        jmp real_start
        ; place your variables and/or procedures here.
        real_start:
        ; BEGIN BODY
        MOV AX,0B800h
        MOV ES,AX ; ES=>video!
        MOV byte ptr ES:[0],'A'
        MOV CX,256
        L: INC byte ptr ES:[0]
        LOOP L
        INT 20h ; terminate the program
  • 252. end start
    Chapter 7
    Stack and procedures
    The hardware stack discussed in this chapter is not a necessary feature of hardware; many earlier designs did not provide it. However, having it helps in many ways, most notably in the ability to implement procedures efficiently.
    7.1 Using stack
    We will begin working with the stack by using it. For this section it is not important just how it works physically; we only state that there is –
  • 253. somewhere in memory – a data structure, implemented in hardware, that parallels the stack software data structure. Namely, we can push things on the stack, we can pop them back, and the operations work in the LIFO fashion: Last In, First Out.
    Let us begin with the syntax:
    PUSH w/d m/r (PUSH word or doubleword)
    POP w/d m/r (POP word or doubleword)
    Notice that byte operations are not supported. Further, notice that pushing or popping memory actually results in a memory-to-memory operation! These stack operations are one of the exceptions to the general no memory-