This document discusses data storage formats for different types of multimedia files. It begins by explaining that sound, pictures, video, text and numbers are stored digitally in different formats. It then discusses various file formats for storing audio like MIDI, MP3, WAV and lossy/lossless compression techniques. For images, it covers JPEG, GIF and vector/bitmap formats. Video compression techniques like MPEG and MP4 are also summarized. The document concludes by covering text/number representation using ASCII and error detection methods like parity checks and checksums.
1. UNIT 1.1.3- DATA STORAGE
• show understanding that sound (music), pictures, video,
text and numbers are stored in different formats
• identify and describe methods of error detection and
correction, such as parity checks, check digits, checksums
and Automatic Repeat reQuests (ARQ)
• show understanding of the concept of Musical Instrument
Digital Interface (MIDI) files, JPEG files, MP3 and MP4 files
• show understanding of the principles of data
compression (lossless and lossy) applied to music/ video,
photos and text files
2. WHAT IS "DIGITAL"?
Multimedia: means multiple methods of communication. It is a way of representing computer
information using several media, e.g. audio, video and animation, plus traditional media like text,
graphics/drawings and images.
Multimedia Application: An application that uses a collection of multiple media sources e.g. text,
graphics, sound/audio, animation or video. Multimedia can be stored in different formats.
SOUND - Sound reaches human ears as an analogue signal but is stored in digital form in a computer
3. ANALOGUE & DIGITAL
SIGNAL
Analogue signals vary continuously.
The sound waves from the mouth are
analogue and can be converted into
electrical signal by a microphone.
Sound is an audio signal (analogue)
and has to be converted into digital
for a computer to interpret.
Digital Sound can either be sampled
or synthesized
4. Sampling: The standard method of capturing the analogue sound wave of naturally occurring
or pre-recorded sounds in digital form.
Synthesized Sound: Computer generated involving use of software or digital
synthesizers.
Sound Sample: When a sound is recorded, the sound card takes a measurement of
the height of the signal many times per second.
Sampling Rate: Number of samples taken per second, measured in hertz (Hz) or
kilohertz (kHz).
The higher the sampling rate, the better is the sound reproduction but the larger the
file size.
Users need to exchange sound between computers in various standard file
formats, such as WAV, MP3 & MIDI.
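The link between sampling rate and file size can be checked with a little arithmetic. This is a minimal sketch; the function name `pcm_size_bytes` and the chosen rates (44.1 kHz CD-style audio, 16 bits per sample, stereo) are assumptions for illustration and do not come from the slides.

```python
def pcm_size_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of uncompressed sampled audio in bytes, ignoring any file header."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


# One minute of stereo, 16-bit audio at two sampling rates.
cd_quality = pcm_size_bytes(44_100, 16, 2, 60)
quarter_rate = pcm_size_bytes(11_025, 16, 2, 60)

print(cd_quality)    # 10584000 bytes, roughly 10 MB
print(quarter_rate)  # 2646000 bytes: a quarter of the rate gives a quarter of the size
```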
5. Multimedia files can be quite large, and larger files mean more storage space
and slower downloads. Compression is the process of removing data to shrink
the file size (file size reduction). Compression schemes can be lossy or lossless
REASONS FOR COMPRESSION
To reduce file size
To save space on disk
To increase data transfer speed
To allow real-time transfer at a given rate
6.
TWO TYPES OF COMPRESSION METHODS:
LOSSY→ It creates smaller files by discarding some information from the original
file. It is most commonly used to compress multimedia data (audio, video & still
images), especially in streaming media & internet telephony.
•It discards some detail in order to get more compression, but this is done
cleverly so that the loss is barely noticeable. (JPEG, MPEG & MP3)
Lossy: MP3, WMA, Ogg Vorbis (.ogg). Redundant and inaudible data is removed
to allow for more compact storage, so some data is lost.
LOSSLESS→ Stores data in less space by removing redundancy, without any
degradation in quality or loss of data. (PNG, GIF, ZIP). Lossless compression
allows the original file to be recovered in full.
•Most computer data files must be compressed without loss & ZIP is the most
common method. [WINZIP & WINRAR] →Compression is also built into some
hardware, e.g. modems (to make transmission more efficient).
Lossless: WAVE (files are large and complete; nothing has been lost).
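The difference between the two methods can be seen in a few lines. This is only a sketch: it uses Python's standard `zlib` module for the lossless case, and a made-up rounding step (not a real codec such as MP3 or JPEG) to stand in for lossy compression.

```python
import zlib

# Lossless: the decompressed bytes are identical to the original.
original = b"AAAAAAAABBBBBBBBCCCCCCCC" * 100
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed))  # the compressed copy is much smaller
print(restored == original)            # True

# Lossy (toy example): round sample values down to multiples of 16.
# Fewer distinct values compress better, but the detail cannot be recovered.
samples = [3, 18, 37, 52, 200, 201]
quantised = [s // 16 * 16 for s in samples]
print(quantised)  # [0, 16, 32, 48, 192, 192]
```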
7. LOSSY compression is used with
music, photos, videos, medical
images, scanned documents and
fax machines
LOSSLESS compression is used
with databases, emails,
spreadsheets, office documents
and source code.
8. AUDIO COMPRESSION
MP3 {MPEG-1 Audio Layer 3}: MP3 recordings have a high level of compression & yet
retain a reasonably high sound quality. MP3 is widely used for audio files on the
internet. [1 minute of near-CD-quality audio can be stored in about 1 MB]. File
extension: .MP3
MP3 files are roughly one-tenth to one-twelfth the size of the equivalent WAV
files. This is why an MP3 player can hold hundreds of songs in a small amount of
storage space. MP3 files are small in size but of good quality.
Example: an 80 MB CD track shrinks to about 8 MB when converted to MP3.
Most devices come with MP3 players installed (even car/home CD players,
portable players, iPads etc.)
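The "1 minute in about 1 MB" and "about one-tenth" figures can be sanity-checked from bit rates. This is a rough sketch; the MP3 bit rate of 128 kbit/s is an assumption (the slides do not state one), and `stream_size_bytes` is a hypothetical helper name.

```python
def stream_size_bytes(bits_per_second, seconds):
    """Bytes needed to hold a stream of the given bit rate and duration."""
    return bits_per_second * seconds // 8


cd_bit_rate = 44_100 * 16 * 2   # samples/s * bits per sample * 2 channels
mp3_bit_rate = 128_000          # assumed MP3 bit rate

cd_minute = stream_size_bytes(cd_bit_rate, 60)
mp3_minute = stream_size_bytes(mp3_bit_rate, 60)

print(cd_minute)                         # 10584000
print(mp3_minute)                        # 960000, i.e. just under 1 MB
print(round(cd_minute / mp3_minute, 1))  # 11.0
```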
9. Advantages
-Distribution of music is less expensive.
-Files can be shared easily on the internet.
-MP3 can be played by many devices.
-A large number of files ripped from CDs can be stored
on one disc since the file size is small.
Disadvantages
-Lower audio quality than the original, as MP3 uses a 'lossy'
algorithm that deletes the "less audible" music
content.
-Data is susceptible to losses due to malware or
virus attacks.
Note: To rip audio is to copy tracks from a CD into files on a computer. It is unrelated
to RIP (raster image processor), which converts a vector image into a raster image.
AUDIO COMPRESSION - ADVANTAGES & DISADVANTAGES
10. AUDIO COMPRESSION
WAVE: (created by Microsoft & IBM) it is a standard method of storing analogue sound
in digital form. WAV was one of the first audio file types developed for the PC. WAV
files come in 8, 16, 24 & 32-bit formats. To reduce file size & transfer time, large WAV
files are often converted to a lossy format, which lowers the sound quality. File extension: .WAV
WAV files can be downloaded from the internet & played through your browser using
an audio plug-in (e.g. Real Audio Player).
MIDI {Musical Instrument Digital Interface}: A standard for converting the notes played on a
musical instrument into a digital format that the computer can manipulate and store. MIDI files
are made up of a series of standard parameters which describe the sound. {E.g. note,
pitch, length, volume, stereo position, attack & delay}. The sound itself is not stored,
only the instructions on how to recreate it; MIDI files are therefore small in size.
File extension: .MID
• MIDI files can be downloaded quickly & easily on the internet.
• Many electronic musical instruments have a MIDI port for input into a MIDI interface in the
computer.
• Most musical keyboards & many electric guitars have MIDI connections.
• The notes are converted into digital data & saved as a file on a computer and this
data can be converted back into notes or edited by a computer software
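The idea that a MIDI file stores instructions rather than sound can be sketched as a list of note events. The `NoteEvent` structure below is hypothetical and is not the real MIDI byte format; it only shows why such data is small and easy to edit.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    pitch: int      # MIDI note number (60 is middle C)
    start_ms: int   # when the note begins
    length_ms: int  # how long it is held
    volume: int     # 0 to 127


# Four notes: a whole two-second phrase described by 16 small numbers.
melody = [
    NoteEvent(60, 0, 500, 100),
    NoteEvent(62, 500, 500, 100),
    NoteEvent(64, 1000, 500, 100),
    NoteEvent(65, 1500, 500, 100),
]


def transpose(events, semitones):
    """Editing is simple: shift every note up or down without re-recording."""
    return [
        NoteEvent(e.pitch + semitones, e.start_ms, e.length_ms, e.volume)
        for e in events
    ]


print([e.pitch for e in transpose(melody, 2)])  # [62, 64, 66, 67]
```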
11. AUDIO COMPRESSION
Advantages of MIDI
Data from musical instruments is easily
captured & edited with a computer.
MIDI files are smaller.
Sound effects can be added.
Voice tracks can be integrated with music.
Disadvantages of MIDI
Audio cannot be recorded directly as audio
files such as MP3.
Only the notes & the timing are stored.
Playback depends on the instrument / sound
card & may not sound the same as the
original.
12. AUDIO COMPRESSION
WINDOWS MEDIA AUDIO (.wma) - It was developed to compete with the MP3 format for Windows
Media Player. Microsoft claims that the WMA files are compressed three times more than MP3s yet
retain their original sound quality.
OGG VORBIS (.ogg) – It is another compressed audio format similar to MP3, but like WMA, more
compressed. It is also open source (free to all, unlicensed, no strings attached). While MP3
files are often encoded at a constant bit rate, Ogg Vorbis uses a variable bit rate.
Other Audio File Types
Audio Interchange File (.aif, .aifc or .aiff.) - was developed for the Macintosh computer to store
audio files.
Sun Audio (.au) - Sun Audio (.au) or Audio/Basic was developed by Sun Microsystems for use on
UNIX systems.
Emblaze Audio (.ea) - was created by Geo and offers compression similar to MP3 formats, but its
purpose is to be played with a JAVA applet-a miniature Internet program. Online greeting cards
often use JAVA applet programs for motion and .ea sound files to play music.
A WAV file is an audio file which is either uncompressed or uses lossless compression for
encoding. Audio CDs hold similar uncompressed audio, which Windows lists as .cda tracks.
MIDI files are music files too, but they don't contain the recorded sound. Instead they only contain
commands. Using these commands, the file can control the audio hardware on the computer or
any compatible system to play the music.
13. PICTURE COMPRESSION
PICTURE - Pictures created by drawing or paint programs, or scanned,
can be stored in a variety of formats (Raster or Vector).
Raster / bitmap: Image is composed of a grid of pixels. It loses quality when enlarged.
Vector format: Images are represented as mathematical formulae. Created
by a drawing/CAD program & consist of features like Curves, Shades &
Characters & not pixels. Vector Graphics are more flexible than bitmapped
graphics because they look the same when resized. Fonts stored this way are called
scalable fonts or vector fonts (e.g. PostScript)
Vector Graphics require less memory than bit-mapped images.
Most output devices like monitors & printers are raster devices (plotters use
Vector Graphics). All vector graphics must therefore be translated into bitmaps
before being output on a raster device, but they are not translated until the
last possible moment.
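The translation from vector to bitmap can be sketched with a single shape. In this illustration a circle is stored only as a formula (centre and radius in relative units, both assumed values) and is turned into pixels at whatever size the output device needs.

```python
def rasterise_circle(size):
    """Render a circle of radius 0.4, centred in a size x size grid of pixels."""
    rows = []
    for y in range(size):
        row = ""
        for x in range(size):
            # Map the centre of each pixel into 0..1 coordinates.
            px = (x + 0.5) / size
            py = (y + 0.5) / size
            inside = (px - 0.5) ** 2 + (py - 0.5) ** 2 <= 0.4 ** 2
            row += "#" if inside else "."
        rows.append(row)
    return rows


# The same formula produces a bitmap at any resolution.
for line in rasterise_circle(9):
    print(line)
print(len(rasterise_circle(40)))  # 40 rows from the same description
```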
14. VIDEO COMPRESSION
VIDEO - Technology of electronically capturing, recording, processing, storing,
transmitting & reconstructing a sequence of still images representing scenes in
motion. Videos consist of a series of still images sometimes embedded with audio
information, united in such a way as to produce a single playable file.
→MPEG & Quicktime (examples of movie formats on the internet)
→Raw video can be regarded as being a series of single images. There are typically
25, 30 or 50 frames per second.
Examples
•Monochrome images (512*512 pixels, 1 byte per pixel): 0.25 MB per frame * 25 frames
= 6.25 MB to store one second uncompressed.
•PAL digital video (720*576 pixels per colour frame, 3 bytes per pixel): 1.24 MB per
frame * 25 frames = 31 MB to store one second uncompressed.
•High definition video on Blu-ray (1920*1080 = 2 Megapixels per frame): 6.2 MB per
frame * 25 frames = 155 MB to store one second uncompressed.
Digital video clearly needs to be compressed most of the time.
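The figures above can be reproduced with a short calculation. This sketch assumes 1 byte per pixel for monochrome and 3 bytes per pixel for colour at 25 frames per second; it reports decimal megabytes, so the monochrome result prints as 6.6 rather than 6.25 (the slide figure treats 1 MB as 1024 * 1024 bytes).

```python
def bytes_per_second(width, height, bytes_per_pixel, frames_per_second=25):
    """Storage needed for one second of uncompressed video."""
    return width * height * bytes_per_pixel * frames_per_second


formats = [
    ("Monochrome 512x512", 512, 512, 1),
    ("PAL 720x576", 720, 576, 3),
    ("HD 1920x1080", 1920, 1080, 3),
]

for name, width, height, bytes_per_pixel in formats:
    size = bytes_per_second(width, height, bytes_per_pixel)
    print(name, size, "bytes =", round(size / 1_000_000, 1), "MB per second")
```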
15. VIDEO COMPRESSION
MPEG 4 (MP4)
A format to store video, audio, still images & subtitles together in a single file & have the capability
to include advanced content like menus, user interaction and graphics.
Advantages
-They offer a greater degree of compression
(smaller files) without noticeable loss of
quality.
-It is an open standard that everyone can use.
Disadvantages
-Require pre-buffering before viewing content,
thus increasing the time it takes to start
viewing the video.
-Even if the file size is small, it takes time to
download the file.
16. TEXT & NUMBER COMPRESSION
TEXT & NUMBERS - ASCII (American standard code for information
interchange) is a code representing characters such as text and numbers as
binary codes
JPEG FILES (Joint photographic experts group)
Jpg files are true colour (16.7 million colours, or 24-bit) images that are
compressed. Files may degrade in quality when they are JPEG encoded. This
degradation is not noticeable in most scanned photographs & images with
smoothly coloured areas. JPEG files are significantly smaller than most other
formats & can be opened & saved on all platforms.
PRINCIPLES of data compression - Compression is the process of removing data
to shrink the file size. Data is compressed to save memory space or
transmission time
17. TEXT & NUMBER COMPRESSION
ASCII CODE (American Standard code for information interchange)
– A code to represent characters.
Transferring information from one computer to another used to
be difficult because computers used different sets of codes. Most
PCs use the ASCII code but many mainframes use EBCDIC
(Extended Binary Coded Decimal Interchange Code), an 8-bit
code with 256 characters.
ASCII originally used 7-bit codes (128 characters) and later
extended ASCII (8-bit) was developed. The first 32 ASCII
codes are control codes used for communication protocols and
are not printable characters.
UNICODE – An international coding scheme, originally 16-bit, that can
represent characters in any language (Chinese, Arabic, Hindi,
Egyptian hieroglyphics etc.)
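Character codes are easy to inspect. This short sketch uses Python's built-in `ord` and `chr` to show the 7-bit ASCII codes of a few characters, and one character (the euro sign, chosen only as an example) whose code lies outside the 8-bit range.

```python
for ch in "Aa7":
    code = ord(ch)
    print(ch, code, format(code, "07b"))
# A 65 1000001
# a 97 1100001
# 7 55 0110111

print(chr(65))           # A

euro = ord("€")
print(euro, euro > 255)  # 8364 True: too large for 8-bit extended ASCII
```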
18. BINARY CODED DECIMAL (BCD)
It is a 4-bit binary code: each denary digit is stored as its own 4-bit group
Example: 0 is represented by 0000, 1 by 0001 etc...
3765 = 0011 0111 0110 0101
Advantages
Easy to convert BCD to denary and
vice versa.
When storing fractional numbers, no
rounding of numbers occurs. BCD
arithmetic is used in business
applications.
Disadvantages
More bits are required to store a
number than when using pure
binary.
Calculations are more complex than
with pure binary.
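The 4-bit encoding, and the cost in extra bits, can be sketched as follows. The helper names `to_bcd` and `from_bcd` are made up for this illustration.

```python
def to_bcd(number):
    """Encode each denary digit as its own 4-bit group."""
    return " ".join(format(int(digit), "04b") for digit in str(number))


def from_bcd(bits):
    """Decode space-separated 4-bit groups back to a denary number."""
    return int("".join(str(int(group, 2)) for group in bits.split()))


print(to_bcd(3765))                     # 0011 0111 0110 0101
print(from_bcd("0011 0111 0110 0101"))  # 3765

# BCD needs 16 bits here, while pure binary needs only 12.
print(len(to_bcd(3765).replace(" ", "")), (3765).bit_length())  # 16 12
```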
19. ERROR-CHECKING METHODS
There is a high risk that data may be corrupted while being transmitted.
Checking for errors is therefore important, as a computer cannot otherwise
tell that the data it received is wrong. Examples: parity check, ARQ,
checksum and echo checking.
When data is entered into a computer or is sent over transmission
lines or cables within a computer, unintentional errors can occur.
Minute particles of dirt or grease can corrupt data on a disk, for
example.
It is very important to be sure that data has not been corrupted.
Error detection checks for errors that occur in the transmission or
storage of data.
Error correction determines that an error has occurred and tries to fix
the mistake.
Error checking methods are: parity, ARQ, Checksum , Echo check &
Check digit
20. PARITY CHECKING
A parity bit is an extra bit that is associated with a word of storage. The value of 1 or 0 is assigned to
the parity bit to make the total number of 1s in the word odd if odd parity is used, and even if even
parity is used.
When parity is in use on a computer system, one parity bit is stored in DRAM along with every 8 bits (1
byte) of data.
For example, the ASCII code for ‘A’ is 0100 0001. Using odd parity, it is 1 0100 0001.
The extra bit is the parity bit, and it is set to 1 because 0100 0001 has an even number of 1s.
On the other hand, the ASCII code for ‘C’ is 0100 0011. This code already has an odd number of 1s, so
the representation using odd parity would be 0 0100 0011.
The primary advantages of parity are its simplicity and ease of use.
Parity does have its limitations.
Its primary disadvantage is that it may fail to catch errors.
Parity can detect errors but cannot make corrections, because the parity technology can’t determine
which of the 8 data bits are invalid.
If two data bits are corrupted, parity will not detect the error.
If two bits are transposed (change places), the computer could be fooled into thinking the data is
correct and not corrupted.
Finally, if two random bits change state then the system could also be fooled
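The rule for choosing the parity bit, and the two-bit weakness described above, can be sketched in a few lines. The function name `parity_bit` and its `scheme` parameter are assumptions made for this illustration.

```python
def parity_bit(bits, scheme="even"):
    """Return the parity bit ('0' or '1') for a string of data bits."""
    ones_are_odd = bits.count("1") % 2 == 1
    if scheme == "even":
        return "1" if ones_are_odd else "0"
    return "0" if ones_are_odd else "1"


print(parity_bit("01000001", "odd"))  # 1  ('A' has an even number of 1s)
print(parity_bit("01000011", "odd"))  # 0  ('C' already has an odd number)

# Limitation: if two bits swap or flip, the count of 1s keeps the same
# parity, so the receiver computes the same parity bit and sees no error.
original = "01000001"
corrupted = "01000010"  # last two bits transposed
print(parity_bit(original, "odd") == parity_bit(corrupted, "odd"))  # True
```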
21. PARITY CHECKING
Example 1 - Suppose you are using odd parity.
What should the binary word “1010” look like after
adding the parity bit?
Answer:
o There is an even number of 1-bits.
o So, we need to add another 1-bit
o Our new word will look like “10101”.
Example 2 - Suppose you are using even parity.
What should the binary word “1010” look like after
adding a parity bit?
Answer:
o There is an even number of 1’s.
o So we need to add another 0
o Our new word will look like “10100”.
Example 3 - Suppose the sender wants to send the word "world" using even parity. In ASCII the five characters are coded as:
1110111 1101111 1110010 1101100 1100100
The following shows the actual bits sent: 1110111 0 1101111 0 1110010 0 1101100 0 1100100 1
Suppose the word is received without being corrupted in transmission: 11101110 11011110 11100100
11011000 11001001
The receiver counts the 1s in each character and comes up with even numbers (6, 6, 4, 4, 4). The data are
accepted.
Now suppose the word is corrupted during transmission: 11111110 11011110 11101100 11011000 11001001
The receiver counts the 1s in each character and comes up with even and odd numbers (7, 6, 5, 4, 4).
The receiver knows that the data are corrupted, discards them, and asks for retransmission.
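Example 3 can be replayed in code. This is a sketch under the example's own assumptions (7-bit ASCII with an even parity bit appended as the eighth bit); the helper names are made up.

```python
def encode(ch):
    """7-bit ASCII code followed by an even parity bit."""
    bits = format(ord(ch), "07b")
    return bits + ("1" if bits.count("1") % 2 else "0")


def count_ones(byte):
    return byte.count("1")


sent = [encode(ch) for ch in "world"]
print(sent)
# ['11101110', '11011110', '11100100', '11011000', '11001001']
print([count_ones(b) for b in sent])       # [6, 6, 4, 4, 4] all even: accept

# Two single-bit errors, one in the first byte and one in the third.
received = ["11111110", "11011110", "11101100", "11011000", "11001001"]
print([count_ones(b) for b in received])   # [7, 6, 5, 4, 4]
bad = [i for i, b in enumerate(received) if count_ones(b) % 2]
print(bad)                                 # [0, 2] odd counts: ask for resend
```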
22. AUTOMATIC REPEAT REQUEST (ARQ)
It uses an ACKNOWLEDGEMENT (a message sent by the receiver indicating
that data has been received correctly) and TIMEOUT (this is the time allowed
to elapse before an acknowledgement is received).
The sending computer transmits a block of data
The sending computer waits a period of time to see if the receiving computer
acknowledges receipt of the data
If no acknowledgement arrives within the set period of time, a timeout occurs which
triggers the data to be automatically resent by the sending computer
This will continue until the receiving computer acknowledges the data has
been received (until the resent packet is error free or a limit on the number of
resend requests is reached)
ARQs are often used to ensure reliable transmissions over an unreliable
service. ARQ is sometimes used with Global System for Mobile (GSM)
communication to guarantee data integrity
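The send, wait and resend loop described above can be sketched as a small simulation. Everything here is hypothetical: `channel` stands in for the network and simply reports whether an acknowledgement arrived before the timeout, and `max_attempts` is an assumed resend limit.

```python
def send_with_arq(block, channel, max_attempts=5):
    """Resend the block until it is acknowledged or the limit is reached.

    Returns the number of attempts used, or None if every attempt timed out.
    """
    for attempt in range(1, max_attempts + 1):
        acknowledged = channel(block)
        if acknowledged:
            return attempt
        # No acknowledgement before the timeout: loop round and resend.
    return None


def make_flaky_channel(failures):
    """Build a channel that loses the first `failures` transmissions."""
    state = {"remaining": failures}

    def channel(block):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return False  # timeout, no acknowledgement
        return True       # acknowledgement received

    return channel


print(send_with_arq(b"DATA", make_flaky_channel(2)))   # 3
print(send_with_arq(b"DATA", make_flaky_channel(10)))  # None
```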
23. CHECKSUM
A block of data is sent alongside a calculated checksum value. The receiving
computer also calculates what it believes should be the checksum. The
checksum values are then compared to see if an error has occurred during
transmission.
Method
The sending computer uses the block of data to be sent, and a predefined
mathematical algorithm, to calculate a checksum value
The sending computer sends the data, plus the checksum value
The receiving computer uses the data it receives to also calculate what it
believes should be the checksum, using the same mathematical algorithm
The two checksum values are compared by the receiving computer
Due to the nature of the algorithm, it is highly unlikely that corruption has
occurred if the checksum values match
If the checksum values don’t match, the receiving computer requests that the
data is transmitted again
24. CHECKSUM
Checksums are used to ensure the integrity of a file after it has been transmitted
from one storage device to another. This can be across the Internet or simply
between two computers on the same network. Either way, if you want to ensure that
the transmitted file is exactly the same as the source file, you can use a checksum.
Checksums are used not only to ensure a corrupt-free transmission, but also to
ensure that the file has not been tampered with. When a good checksum algorithm is
used, even a tiny change to the file will result in a completely different checksum
value.
To explain how this works, we will assume the checksum of a block of data is 1 byte
in length. This gives a maximum value of 2^8 – 1 (i.e. 255). The value 0000 0000 is
ignored in this calculation.
If the sum of all the bytes in the transmitted block of data is <= 255, then the
checksum is this value.
However, if the sum of all the bytes in the data block > 255, then the checksum is
found using the simple algorithm in Figure 2.15.
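The earlier point that a good checksum algorithm turns even a tiny change into a completely different value can be seen with a cryptographic hash. SHA-256 from Python's standard `hashlib` module is used here purely as an example of such an algorithm; the slides do not name one, and the messages are made up.

```python
import hashlib

original = b"Transfer 100 to account 12345"
tampered = b"Transfer 900 to account 12345"  # one character changed

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(tampered).hexdigest()

print(digest_a)
print(digest_b)
print(digest_a == digest_b)  # False: the receiver can tell the file changed
```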
25. CHECKSUM
Example 1 - Suppose the following block of 16 bits is to be sent using a checksum of 8 bits.
10101001 00111001
The numbers are added as:
Message1    10101001
Message2    00111001
Sum         11100010
Checksum    00011101 (the complement of the sum)
The pattern sent is 10101001 00111001 00011101
Suppose there is no error at the receiver side: 10101001 00111001 00011101
When the receiver adds the three sections, it will get all 1s, which, after complementing, is all 0s and
shows that there is no error.
Message1    10101001
Message2    00111001
Checksum    00011101
Sum         11111111
Complement  00000000
Complement 00000000 means that the pattern is OK.
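The worked example can be reproduced in code. This is a sketch of an 8-bit complement checksum; folding any carry back into the low 8 bits is an assumption borrowed from ordinary one's complement addition (the example itself produces no carry), and the function names are made up.

```python
def add_blocks(blocks):
    """Add 8-bit blocks, folding any carry back into the low 8 bits."""
    total = 0
    for block in blocks:
        total += int(block, 2)
        total = (total & 0xFF) + (total >> 8)
    return total


def make_checksum(blocks):
    """Sender: the checksum is the complement of the sum."""
    return format(~add_blocks(blocks) & 0xFF, "08b")


def is_valid(blocks_and_checksum):
    """Receiver: data plus checksum must add up to all 1s."""
    return add_blocks(blocks_and_checksum) == 0xFF


message = ["10101001", "00111001"]
checksum = make_checksum(message)
print(checksum)                                         # 00011101
print(is_valid(message + [checksum]))                   # True
print(is_valid(["10101001", "00111000", "00011101"]))   # False: one bit changed
```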
26. ECHO CHECK
The receiving computer sends a copy of the data immediately back
to the sending computer for comparison. The sending computer
compares the two sets of data to check if any errors occurred
during the transmission process. If an error has occurred, the data
will be transmitted again. This method isn’t very reliable, because
a mismatch may have been introduced on the return trip. However,
if the two sets of data match, it is another way to confirm that
the data was transmitted correctly.
Drawback of echo checks
If the two sets of data are different you will have no way of
knowing whether the error occurred when originally sent, or when
it was sent back
Echo checks require a lot of extra data to be transmitted
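The first drawback can be demonstrated with a tiny simulation. The two "link" functions are hypothetical stand-ins for the outbound and return transmissions; the second case shows data that arrived correctly but still fails the echo check because the copy was damaged on the way back.

```python
def perfect_link(data):
    return data


def noisy_link(data):
    # Flip the lowest bit of the last byte to imitate a transmission error.
    return data[:-1] + bytes([data[-1] ^ 1])


def echo_check(data, outbound, inbound):
    """Return (check passed?, what the receiver actually got)."""
    received = outbound(data)
    echoed = inbound(received)
    return echoed == data, received


passed, received = echo_check(b"HELLO", perfect_link, perfect_link)
print(passed)                        # True

passed, received = echo_check(b"HELLO", perfect_link, noisy_link)
print(passed, received == b"HELLO")  # False True: good data, failed check
```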
27. CHECK DIGIT
It is the single final digit in a code of numbers. It is calculated from all the other digits in the
code. Its purpose is to spot human errors on data entry.
Check digits are often found in barcodes, product codes or ISBN book numbers.
Check digit algorithms are generally designed to capture human transcription errors. These
include the following:
single digit errors, such as 1 → 2
transposition errors, such as 12 → 21
twin errors, such as 11 → 22
jump transpositions errors, such as 132 → 231
jump twin errors, such as 131 → 232
phonetic errors, such as 60 → 16 ("sixty" to "sixteen"), a0 → 1a
omitting or adding a digit
Benefits
Good for spotting human errors such as:
Incorrect digit entered
Transposition error (two numbers change order)
Omitted digit or extra digit
28. CHECK DIGIT 10 DIGITS
Example 1: ISBN 1 84146 201 2 (Using modulus 11 technique)
Step 1: Multiply each digit by the weight underneath it
ISBN      1   8   4   1   4   6   2   0   1   SUM
WEIGHT   10   9   8   7   6   5   4   3   2
RESULT   10  72  32   7  24  30   8   0   2   185
Step 2: Add the results from the bottom row together: 10 + 72 + 32 + 7 + 24 + 30 + 8 + 0 + 2 = 185
Step 3: Divide the total by 11 and record the remainder. 185 divided by 11 = 16 with 9 remaining.
Step 4: Take the remainder away from 11. 11 - 9 (the remainder from step 3) = 2
Step 5: Compare the result with the check digit. If the numbers are the same then the check
digit has confirmed the original numbers were entered correctly.
NOTE:
If the remainder is 0 then the check digit is 0
If the result of step 4 is 10 then the check digit is X
WORKOUT: 184146208(-); 184146202(-); 817245124(-); 086163432(-); 817029584(-)
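The modulus 11 steps above can be sketched as a short function. The name `isbn10_check_digit` is made up for this illustration; it takes the first nine digits and returns the check character.

```python
def isbn10_check_digit(first_nine):
    """Check character for a 10-digit ISBN, using weights 10 down to 2."""
    digits = [int(c) for c in first_nine if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected exactly nine digits")
    total = sum(d * w for d, w in zip(digits, range(10, 1, -1)))
    result = (11 - total % 11) % 11  # a remainder of 0 gives check digit 0
    return "X" if result == 10 else str(result)


print(isbn10_check_digit("1 84146 201"))  # 2, matching ISBN 1 84146 201 2
```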
29. CHECK DIGIT 13 DIGITS
Multiply each digit by the weight underneath it (alternately 1 and 3)
ISBN      9   7   8   1   1   0   7   5   7   9   0   9   SUM
WEIGHT    1   3   1   3   1   3   1   3   1   3   1   3
PRODUCT   9  21   8   3   1   0   7  15   7  27   0  27   125
Add the results from the bottom row together: 9+21+8+3+1+0+7+15+7+27+0+27 = 125
Round the result up to the nearest multiple of 10: what number should be added to 125 to make it a
multiple of 10? The answer is 5. Hence 5 is the check digit.
NOTE:
If the total is already a multiple of 10 then the check digit is 0
A 13-digit code never uses X as its check digit
WORKOUT: 978-981-086-524(-); 501-324-215-701(-); 400-638-133-393(-); 973-594-056-482(-)
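The 13-digit calculation can be sketched in the same way. The function name `isbn13_check_digit` is made up; it takes the first twelve digits (hyphens are ignored) and applies the alternating weights 1 and 3.

```python
def isbn13_check_digit(first_twelve):
    """Check digit for a 13-digit code, using alternating weights 1 and 3."""
    digits = [int(c) for c in first_twelve if c.isdigit()]
    if len(digits) != 12:
        raise ValueError("expected exactly twelve digits")
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    # Amount needed to reach the next multiple of 10 (0 if already there).
    return str((10 - total % 10) % 10)


print(isbn13_check_digit("978110757909"))  # 5
```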
30. Questions - 1) a) Two bytes are
transmitted [A-level J14/P11/Qu 9]
Byte 1: 0101011
Byte 2: 1011011
The system uses even parity. An extra
eighth (8th) bit is used as the parity bit.
Give the parity bit values for byte 1 and
byte 2 to achieve even parity.
Parity bit value in byte 1: _______
Parity bit value in byte 2: ________ [2]
b) The vending machine transmits eight
codes (bytes), followed by a parity byte.
The following bytes have been received:
One of the eight bytes of data contains
an error that occurred during data
transmission.
Using an arrow, identify the byte
where the error has occurred.
Circle the bit that has been altered.
Explain your reason for choosing the
byte and bit identified above. [3]
31. 2 (a) A computer system uses even
parity. The leftmost position of each
byte is the parity bit. [A-level
J14/P16/Qu 7]
(i) Complete the byte below: [1]
(ii) The parity bit is used to perform a
parity check when a byte is transmitted
from computer A to computer B.
Explain how computer B will establish
whether or not the byte has been
transmitted correctly. [2]
(b) In addition to a parity bit check on a
byte, a parity block check is also carried
out. Computer A transmits four bytes
followed by a parity byte. The following
sequence of bytes has just been
received by computer B.
One of the four bytes has an error in
one of the bits.
(i) Identify the byte where the error
has occurred with an arrow. Circle the
bit that has been altered. [2]
(ii) Write down the corrected byte: [1]
(iii) Explain what the computer system
needs to do if more than 1 bit has
been transmitted wrongly. [2]
32. ANSWERS:
WORKOUT: 184146208(X); 184146202(0); 817245124(5); 086163432(2);
817029584(X)
WORKOUT: 978-981-086-524(5); 501-324-215-701(7); 400-638-133-393(1);
973-594-056-482(4)
ANSWERS
1) a) 0,1
b) – byte 7: 0 0 1 1 1 0 1 1 has odd parity (shown by an arrow)
– column 5 (counting from the left) indicates that parity byte is incorrect in
position 5
– therefore, bit in row 7, column 5 is in error
– the bit in that position should change from 1 to 0 to make even parity in all
bytes
– this gives the corrected byte as: 0 0 1 1 0 0 1 1
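The row-and-column reasoning in this answer can be sketched as code. The block of bytes below is hypothetical (the bytes from the exam question are not reproduced here); it uses even parity in every row and every column, with the last row acting as the parity byte.

```python
def locate_single_error(block):
    """Return (row, column) of a single flipped bit, or None if all is well.

    Rows and columns are counted from 0. Raises ValueError when the pattern
    cannot be explained by exactly one flipped bit.
    """
    bad_rows = [r for r, byte in enumerate(block) if byte.count("1") % 2]
    bad_cols = [
        c for c in range(len(block[0]))
        if sum(byte[c] == "1" for byte in block) % 2
    ]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("more than one bit in error: request retransmission")


clean = ["10100101", "01100110", "11110000", "00011011", "00101000"]
print(locate_single_error(clean))      # None

corrupted = ["10100101", "01100110", "11111000", "00011011", "00101000"]
print(locate_single_error(corrupted))  # (2, 4): third byte, fifth bit
```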
33. 2 (a) i) 1
ii) computer “B” counts number of 1-bits
if number of 1-bits is even then byte has been transmitted correctly
if number of 1-bits is odd then byte has been corrupted during transmission
(b)
(i) 1 mark for identifying the third byte and 1 mark for identifying the fifth bit as the error
iii) • for example, a check sum
• brief description of check sum
• description of alternative checking method
• ask for data to be re-sent
34. THIS IS THE END OF UNIT 1.1.3
DATA STORAGE
YOU CAN GET MORE EXERCISES
FROM PAST EXAM PAPERS