12 November 2013

University of Virginia cs4414

1
Why is storage complicated?

Delay Lines

Mercury Delay Lines
0/1

Why Mercury?

Speed of Sound
Air       343 m/s
Mercury   1450 m/s (40° C)
Water     1500 m/s (25° C)
Magnetic Core Memory

MIT Project Whirlwind, 1951
2K 16-bit words
with “no waiting”!

SRAM

[Figure: SRAM bit drawn with two NOT gates]
Four-Transistor SRAM Bit

Modern DRAM

After Turning off Power

5 seconds
30 seconds
5 minutes
11 cycles (at 800MHz) to read a particular row = 13.75ns
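
The relationship between cycle count and access time can be checked with a little arithmetic. The sketch below assumes the 800MHz clock from the slide; the count of 11 cycles is inferred from 13.75ns × 800MHz, and the function names are made up for illustration.

```python
# Convert a DRAM access latency between clock cycles and nanoseconds.
CLOCK_HZ = 800e6  # 800 MHz, the clock rate quoted on the slide


def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    """Time taken by `cycles` clock ticks, in nanoseconds."""
    return cycles / clock_hz * 1e9


def ns_to_cycles(ns, clock_hz=CLOCK_HZ):
    """Number of clock ticks that fit in `ns` nanoseconds."""
    return ns * 1e-9 * clock_hz


if __name__ == "__main__":
    print(cycles_to_ns(11))      # latency of 11 cycles, in ns
    print(ns_to_cycles(13.75))   # cycles corresponding to 13.75 ns
```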

= 185° F

Storage Systems

Device                   | Example                            | Time to Access      | Cost per Bit
Mercury (Gin) Delay Line | UNIVAC (1951)                      | 220,000ns (average) | $ 0.38 (1968) (a bazillion n$)
DRAM                     | Kingston KVR16N11/4 4GB DDR3 ($40) | 13.75ns             | 1.16 n$

UNIVAC 1968 (Core memory): $823,500 for 131 K 16-bit words
Cheaper, More Persistent Storage

How big is a TB?
Storage Systems

Device                   | Example                                      | Time to Access      | Cost per Bit
Mercury (Gin) Delay Line | UNIVAC (1951)                                | 220,000ns (average) | $ 0.38 (1968) (a bazillion n$)
DRAM                     | Kingston KVR16N11/4 4GB DDR3 ($40)           | 13.75ns             | 1.16 n$
Hard Drive               | Seagate Desktop HDD 4 TB SATA 6Gb/s NCQ 64MB | ?                   | 0.0046 n$
Accessing a Hard Drive

5900 rpm spindle
“seek time”: ~ 0.1ms
rotate time: 1/5900rpm ~ max 10ms
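
The rotate-time estimate follows directly from the spindle speed. A minimal sketch of the arithmetic, assuming the worst case is one full revolution and the average case is half a revolution (function names are illustrative):

```python
# Rotational latency of a spinning disk, derived from its spindle speed.
RPM = 5900  # spindle speed from the slide


def rotation_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60.0 / rpm * 1000.0


def average_rotational_latency_ms(rpm):
    """On average the target sector is half a revolution away."""
    return rotation_ms(rpm) / 2.0


if __name__ == "__main__":
    print(f"max wait: {rotation_ms(RPM):.2f} ms")
    print(f"average wait: {average_rotational_latency_ms(RPM):.2f} ms")
```

The half-revolution average is roughly 5ms for a 5900 rpm drive.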
Passing the Drop Test
Storage Systems

Device                   | Example                                      | Time to Access      | Cost per Bit
Mercury (Gin) Delay Line | UNIVAC (1951)                                | 220,000ns (average) | $ 0.38 (1968) (a bazillion n$)
DRAM                     | Kingston KVR16N11/4 4GB DDR3 ($40)           | 13.75ns             | 1.16 n$
Hard Drive               | Seagate Desktop HDD 4 TB SATA 6Gb/s NCQ 64MB | 5ms (ave)           | 0.0046 n$
Storage Abstractions

Storage Abstractions

Memory Location
File

Do we really need both?
What about: database, URI?
Unix File Abstraction

Which are files?
class24.pptx
/Users/dave/OS/classes/
OS-provided random numbers

“Everything is a File”

class24.pptx
/Users/dave/OS/classes/
/mnt/cdrom
/dev/tty0
OS-provided random numbers: /dev/random
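
One way to see the idea in practice is to ask the OS what kind of file each path is, and to read random bytes with the ordinary file API. This is a small sketch for a Unix-like system; it uses /dev/urandom rather than /dev/random (an assumption made so the read never blocks), and the helper names are illustrative.

```python
import os
import stat


def kind(path):
    """Classify a path using the file mode reported by stat."""
    mode = os.stat(path).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    return "other"


def read_random_bytes(n, device="/dev/urandom"):
    """Read n bytes from the OS random source as if it were a plain file."""
    with open(device, "rb") as f:
        return f.read(n)


if __name__ == "__main__":
    for path in ("/", "/dev/urandom"):
        print(path, "->", kind(path))
    print(read_random_bytes(8).hex())
```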
inode: represents a file

Size of File (bytes)
Device ID
User ID
Group ID
File Mode (permission bits)
Link count (number of hard links to node)
…
Diskmap
include/linux/fs.h

stat

Size of File (bytes)
Device ID
User ID
Group ID
File Mode (permission bits)
Link count (number of hard links to node)
…
Diskmap

> stat -x class24.pptx
File: "class24.pptx"
Size: 5855495 FileType: Regular File
Mode: (0644/-rw-r--r--)
Uid: ( 501/ dave) Gid: ( 20/ staff)
Device: 1,2 Inode: 6706357 Links: 1
Access: Wed Nov 20 15:00:41 2013
Modify: Wed Nov 20 14:23:13 2013
Change: Wed Nov 20 14:23:13 2013
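
The same inode fields are available to programs through the stat system call. A minimal sketch using Python's os.stat on a temporary file; the dictionary keys are illustrative names, not part of any API.

```python
import os
import stat
import tempfile


def describe(path):
    """Collect the inode fields that the stat command prints."""
    st = os.stat(path)
    return {
        "size": st.st_size,                  # Size of File (bytes)
        "mode": stat.filemode(st.st_mode),   # File Mode (permission bits)
        "uid": st.st_uid,                    # User ID
        "gid": st.st_gid,                    # Group ID
        "device": st.st_dev,                 # Device ID
        "inode": st.st_ino,                  # inode number
        "links": st.st_nlink,                # Link count
    }


# Create a scratch file so the example does not depend on any existing path.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"hello")
    tmp.flush()
    info = describe(tmp.name)
    print(info)
```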

> ln class24.pptx todays-class.pptx
> stat -x class24.pptx
File: "class24.pptx"
Size: 5855495 FileType: Regular File
Mode: (0644/-rw-r--r--)
Uid: ( 501/ dave) Gid: ( 20/ staff)
Device: 1,2 Inode: 6706357 Links: 2
Access: ..
> stat -x todays-class.pptx
File: "todays-class.pptx"
Size: 5855495 FileType: Regular File
Mode: (0644/-rw-r--r--)
Uid: ( 501/ dave) Gid: ( 20/ staff)
Device: 1,2 Inode: 6706357 Links: 2
> rm class24.pptx
> stat -x class24.pptx
stat: class24.pptx: stat: No such file or directory
> stat -x todays-class.pptx
File: "todays-class.pptx"
Size: 5855495 FileType: Regular File
Mode: (0644/-rw-r--r--)
Uid: ( 501/ dave) Gid: ( 20/ staff)
Device: 1,2 Inode: 6706357 Links: 1
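
The shell session above can be reproduced programmatically. This sketch repeats the same steps (create, link, remove) in a temporary directory; the file names mirror the slide, but the contents are placeholder bytes.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file, hard-link it, remove the original name, and report."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "class24.pptx")
        alias = os.path.join(d, "todays-class.pptx")
        with open(original, "wb") as f:
            f.write(b"slides")

        os.link(original, alias)  # same effect as: ln class24.pptx todays-class.pptx
        same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
        links_after_ln = os.stat(alias).st_nlink

        os.remove(original)       # same effect as: rm class24.pptx
        links_after_rm = os.stat(alias).st_nlink
        with open(alias, "rb") as f:
            data = f.read()       # the data is still reachable via the other name

        return same_inode, links_after_ln, links_after_rm, data


if __name__ == "__main__":
    print(hard_link_demo())
```

Removing one name only decrements the link count; the inode and its data blocks stay until the count reaches zero.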
Removing a linked file like this is very confusing for PowerPoint…

Diskmap (Unix System 5)

Size of File (bytes)
Device ID
User ID
Group ID
File Mode (permission bits)
Link count (number of hard links to node)
…
Diskmap: entries 0, 1, 2, …, 9, 10, 11, 12

[Figure: diskmap entries pointing to Disk Blocks (1K bytes each)]
Diskmap (Unix System 5)

Diskmap entries: 0, 1, 2, …, 9, 10, 11, 12

[Figure: diskmap entries pointing to Disk Blocks (1K bytes each), with one entry pointing to an Indirect Disk Block (1K bytes)]

4 bytes for each = 256 pointers
Diskmap (Unix System 5)

Diskmap entries: 0, 1, 2, …, 9, 10, 11, 12

[Figure: diskmap entries pointing to Disk Blocks (1K bytes each), an Indirect Disk Block (1K bytes), and a Double Indirect Disk Block whose entries point to Indirect Disk Blocks (1K bytes each)]

4 bytes for each = 256 pointers
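
To make the diskmap structure concrete, here is a sketch of how a byte offset could be mapped to a diskmap level, and how large a file this scheme can address. The block size (1K) and pointer size (4 bytes) come from the slides. Treating entries 0 to 9 as direct and entries 10, 11, and 12 as single, double, and triple indirect is an assumption based on the classic System V layout; the slides show only the indirect and double indirect cases.

```python
BLOCK_SIZE = 1024                              # 1K-byte disk blocks (from the slide)
POINTER_SIZE = 4                               # 4 bytes per block pointer (from the slide)
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE    # 256 pointers per indirect block
NUM_DIRECT = 10                                # assumed: diskmap entries 0..9 are direct


def locate(offset):
    """Return (level, indices) describing where a byte offset lives.

    `indices` lists the slot used at each step: the diskmap slot for a
    direct block, or the slots inside each indirect block on the way down.
    """
    block = offset // BLOCK_SIZE
    if block < NUM_DIRECT:
        return ("direct", [block])
    block -= NUM_DIRECT
    if block < PTRS_PER_BLOCK:
        return ("indirect", [block])
    block -= PTRS_PER_BLOCK
    if block < PTRS_PER_BLOCK ** 2:
        return ("double indirect", [block // PTRS_PER_BLOCK, block % PTRS_PER_BLOCK])
    block -= PTRS_PER_BLOCK ** 2
    if block < PTRS_PER_BLOCK ** 3:
        return ("triple indirect", [block // PTRS_PER_BLOCK ** 2,
                                    (block // PTRS_PER_BLOCK) % PTRS_PER_BLOCK,
                                    block % PTRS_PER_BLOCK])
    raise ValueError("offset beyond maximum file size")


def max_file_bytes():
    """Largest file addressable with direct plus three indirect levels (assumed)."""
    blocks = NUM_DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2 + PTRS_PER_BLOCK ** 3
    return blocks * BLOCK_SIZE


if __name__ == "__main__":
    print(locate(0))
    print(locate(5_000_000))
    print(max_file_bytes())
```

Under these assumptions small files need no extra lookups, while each indirect level adds one more block read before the data block is reached.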
Diskmap (Unix System 5)

How would you determine if your file system has this structure?
Directories are Files Too!

ls -ali

Filename    Inode
.           494211
..          494205
.DS_Store   494212
class0      6565946
class1      6565826
class10     1467012
class11     2252968
…           …
class19     5649155
class2      494218
…           …
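
A directory is essentially a table from names to inode numbers, which is what ls -ali displays. The sketch below builds the same table with os.listdir and os.lstat; the function name is illustrative, and "." and ".." are added by hand because os.listdir omits them.

```python
import os


def list_inodes(directory):
    """Return (inode, name) pairs for a directory, similar to `ls -ai`."""
    names = [".", ".."] + sorted(os.listdir(directory))
    return [(os.lstat(os.path.join(directory, name)).st_ino, name)
            for name in names]


if __name__ == "__main__":
    for inode, name in list_inodes("."):
        print(f"{inode:>10} {name}")
```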
> brew install tree # needed on MacOS X, but builtin to most Unixes

How to create a new file?

Finding a Free Block

[Figure: disk layout, not to scale! Regions: Boot block, Superblock, I-List (inodes), Data. Two lists with entries numbered 0, 1, …, 98, 99 are shown.]

Superblock: List of free disk blocks
Finding a Free inode

[Figure: disk layout, not to scale! Regions: Boot block, Superblock, I-List (inodes), Data. Entries 0, 1, 2, 3, … appear alongside the values 0, 1, 0, 0, …]

Superblock keeps a cache of free inodes
Modern File Systems

What should a modern file system do that Unix S5FS doesn’t?
Handling Failures

ZFS
Developed for Solaris, 2005
Now open source:
http://open-zfs.org/

“MacZFS is free data storage and protection software
for all Mac OS users. It's for people who have Mac OS,
who have any data, and who really like their data.
Whether on a single-drive laptop or on a massive
server, it'll store your petabytes with ragingly redundant
RAID reliability, and it'll keep the bit-rotted bleeps and
bloops out of your iTunes library.”
Block Checksums

S5FS: diskmap entries 0, 1, 2, …, 9, 10, 11, 12 point to Disk Blocks (1K bytes)

ZFS:
Block   Checksum (SHA-256)
0       40a3dc…
1       2c5829d…
2       955d253…
…       …

How do you check the checksums?
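
The idea of keeping a checksum beside each block pointer can be sketched in a few lines. This is an illustration of the general technique using SHA-256 from Python's hashlib, not ZFS's actual on-disk format; the 1K block size follows the slides and the function names are made up.

```python
import hashlib

BLOCK_SIZE = 1024  # 1K-byte blocks, as in the slides


def checksum(block):
    """SHA-256 digest of one block, as a hex string."""
    return hashlib.sha256(block).hexdigest()


def split_blocks(data):
    """Cut a byte string into BLOCK_SIZE pieces (the last may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def build_checksums(blocks):
    """The table stored alongside the block pointers."""
    return [checksum(b) for b in blocks]


def find_corrupt(blocks, checksums):
    """Indices of blocks whose contents no longer match their checksum."""
    return [i for i, (b, c) in enumerate(zip(blocks, checksums))
            if checksum(b) != c]


if __name__ == "__main__":
    blocks = split_blocks(bytes(3000))
    table = build_checksums(blocks)
    blocks[1] = b"\x01" + blocks[1][1:]   # simulate bit rot in block 1
    print(find_corrupt(blocks, table))
```

This detects a damaged data block, but it still trusts the checksum table itself, which is the question the slide raises.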
Hashing the Hashes

Hash(B1)   Hash(B2)   Hash(B3)   Hash(B4)
Block 1    Block 2    Block 3    Block 4
Merkle Tree
Ralph Merkle

Hash(B1)   Hash(B2)   Hash(B3)   Hash(B4)
Block 1    Block 2    Block 3    Block 4
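
A Merkle tree hashes the block hashes pairwise until a single root remains, so verifying the root verifies every block below it. The sketch below is a generic illustration with SHA-256; duplicating the last hash on odd-sized levels is one common convention chosen here as an assumption, and none of this is meant to describe ZFS's exact layout.

```python
import hashlib


def h(data):
    """SHA-256 digest as raw bytes."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Hash each block, then hash pairs of hashes until one root is left."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])   # assumed convention: repeat the last hash
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    blocks = [b"Block 1", b"Block 2", b"Block 3", b"Block 4"]
    print(merkle_root(blocks).hex())
```

Only the root needs to be stored somewhere trusted; any change to any block changes the root.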
Recovery

copies = 2

Keep 2 copies of every block: if checksum fails for first copy read, try reading second copy.

[Figure: One Copy compared with Copy 1 and Copy 2]
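
The read path described above can be sketched as a loop over the stored copies. This is a simplified illustration, assuming the expected checksum is already known and trusted; the function names are hypothetical.

```python
import hashlib


def checksum(block):
    """SHA-256 digest of a block, as a hex string."""
    return hashlib.sha256(block).hexdigest()


def read_block(copies, expected):
    """Return (data, index) for the first copy whose checksum matches.

    `copies` holds the redundant copies of one block in read order.
    Raises IOError if every copy fails verification.
    """
    for index, data in enumerate(copies):
        if checksum(data) == expected:
            return data, index
    raise IOError("all copies failed checksum verification")


if __name__ == "__main__":
    good = b"x" * 1024
    damaged = b"y" + good[1:]
    data, used = read_block([damaged, good], checksum(good))
    print("recovered from copy", used + 1)
```

With copies = 3 the same loop simply has one more candidate to try.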
For the truly paranoid…

copies = 3

[Figure: One Copy compared with Copy 1, Copy 2, and Copy 3]
For the fairly paranoid but cheap…

RAID: Redundant Arrays of Inexpensive Disks
ACM SIGMOD 1988

whitehouse.gov
Case for RAID

Redundancy

Improving Performance
Cache (64MB DRAM)
Adaptive Replacement Cache

Adaptive Replacement Cache

Blocks in Cache
T1: Recent Cache Entries
Accessed Again
T2: Frequently-Used Blocks

“Ghost” Entries
B1: Evicted from T1 (LRU)
B2: Evicted from T2 (LRU)

Size of T1 adapts

How should relative size of T1 and T2 be adjusted?
Adaptive Replacement Cache

Blocks in Cache
T1: Recent Cache Entries
Accessed Again
T2: Frequently-Used Blocks

“Ghost” Entries
B1: Evicted from T1 (LRU)
B2: Evicted from T2 (LRU)

Size of T1 adapts

Hit in B1: should increase size of T1, drop entry from T2 to B2
Hit in B2: should increase size of T2, drop entry from T1 to B1
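
The adaptation rule can be made concrete with a compact model of the four lists. The sketch below follows the published ARC algorithm in outline, tracking keys only (no cached data); the class name, the `p` attribute (target size of T1), and the method names are illustrative, and this should be read as a teaching sketch rather than any system's actual implementation.

```python
from collections import OrderedDict


class ARC:
    """Keys-only model of an Adaptive Replacement Cache of capacity c."""

    def __init__(self, c):
        self.c = c
        self.p = 0                    # target size of T1; adapts on ghost hits
        self.t1 = OrderedDict()       # recent entries (seen once)
        self.t2 = OrderedDict()       # frequently used entries (seen again)
        self.b1 = OrderedDict()       # ghosts evicted from T1
        self.b2 = OrderedDict()       # ghosts evicted from T2

    def _replace(self, key):
        """Evict the LRU entry of T1 or T2 into the matching ghost list."""
        if self.t1 and (len(self.t1) > self.p or
                        (key in self.b2 and len(self.t1) == self.p)):
            old, _ = self.t1.popitem(last=False)
            self.b1[old] = None
        else:
            old, _ = self.t2.popitem(last=False)
            self.b2[old] = None

    def access(self, key):
        """Record an access; return True on a cache hit."""
        if key in self.t1:                       # accessed again: promote to T2
            del self.t1[key]
            self.t2[key] = None
            return True
        if key in self.t2:
            self.t2.move_to_end(key)
            return True
        if key in self.b1:                       # ghost hit: grow T1's share
            self.p = min(self.c, self.p + max(1, len(self.b2) // len(self.b1)))
            self._replace(key)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:                       # ghost hit: grow T2's share
            self.p = max(0, self.p - max(1, len(self.b1) // len(self.b2)))
            self._replace(key)
            del self.b2[key]
            self.t2[key] = None
            return False
        # Complete miss: make room if needed, then insert into T1.
        l1 = len(self.t1) + len(self.b1)
        total = l1 + len(self.t2) + len(self.b2)
        if l1 == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(key)
            else:
                self.t1.popitem(last=False)
        elif total >= self.c:
            if total == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(key)
        self.t1[key] = None
        return False


if __name__ == "__main__":
    cache = ARC(2)
    for k in "abacb":
        print(k, "hit" if cache.access(k) else "miss")
    print("T1:", list(cache.t1), "T2:", list(cache.t2), "p:", cache.p)
```

In the short trace above, the final access to `b` hits in B1, so the target size of T1 grows and an entry is dropped from T2 into B2, which matches the first rule on the slide.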
IBM Almaden Research Center
Do you actually have a disk like this on your main computing device?

Cache (64MB DRAM)
Flash Memory

Solid State Drive

Storage Systems

Device                   | Example                                      | Time to Access      | Cost per Bit
Mercury (Gin) Delay Line | UNIVAC (1951)                                | 220,000ns (average) | $ 0.38 (1968) (a bazillion n$)
DRAM                     | Kingston KVR16N11/4 4GB DDR3 ($40)           | 13.75ns             | 1.16 n$
Hard Drive               | Seagate Desktop HDD 4 TB SATA 6Gb/s NCQ 64MB | 5,000,000ns         | 0.0046 n$
SSD                      | Samsung 500GB ($300)                         | ?                   | 0.075 n$
Storage Systems

Device                   | Example                                      | Time to Access               | Cost per Bit
Mercury (Gin) Delay Line | UNIVAC (1951)                                | 220,000ns (average)          | $ 0.38 (1968) (a bazillion n$)
DRAM                     | Kingston KVR16N11/4 4GB DDR3 ($40)           | 13.75ns                      | 1.16 n$
SSD                      | Samsung 500GB ($300)                         | ~10,000 ns (for random read) | 0.075 n$
Disk Drive               | Seagate Desktop HDD 4 TB SATA 6Gb/s NCQ 64MB | 5,000,000ns                  | 0.0046 n$
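
The cost-per-bit column can be reproduced from the prices and capacities quoted on the slides. The sketch below makes the unit assumptions explicit: the table's figures appear to match binary units (K = 1024, GB = 2^30 bytes) for core memory and DRAM, and decimal units (GB = 10^9 bytes) for the SSD. The hard drive is left out because its price is not given in the slides.

```python
NANO = 1e-9  # one nano-dollar (n$) in dollars


def cost_per_bit(price_usd, bits):
    """Price divided by capacity, in dollars per bit."""
    return price_usd / bits


# UNIVAC 1968 core memory: $823,500 for 131 K 16-bit words (assuming K = 1024).
core_usd_per_bit = cost_per_bit(823_500, 131 * 1024 * 16)

# Kingston 4GB DDR3 at $40 (assuming GB = 2**30 bytes), expressed in n$.
dram_nusd_per_bit = cost_per_bit(40, 4 * 2**30 * 8) / NANO

# Samsung 500GB SSD at $300 (assuming GB = 10**9 bytes), expressed in n$.
ssd_nusd_per_bit = cost_per_bit(300, 500 * 10**9 * 8) / NANO

if __name__ == "__main__":
    print(f"core memory: ${core_usd_per_bit:.2f} per bit")
    print(f"DRAM: {dram_nusd_per_bit:.2f} n$ per bit")
    print(f"SSD: {ssd_nusd_per_bit:.3f} n$ per bit")
```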
Charge

Storage systems should be designed around hardware capabilities and workload.

Today’s OSes mostly use filesystems designed around 1990s disks and 1960s workloads! But, with lots of clever hacks to make them work okay on today’s hardware and workloads.

More from Wilkes 1967:

Storage Systems

  • 1.
  • 2. 12 November 2013 University of Virginia cs4414 1
  • 3. Why is storage complicated? 12 November 2013 University of Virginia cs4414 2
  • 4. Delay Lines 12 November 2013 University of Virginia cs4414 3
  • 5. Mercury Delay Lines 0/1 12 November 2013 University of Virginia cs4414 4
  • 6. 12 November 2013 University of Virginia cs4414 5
  • 7. 12 November 2013 University of Virginia cs4414 6
  • 8. Speed of Sound Air 343 m/s Mercury 1450 m/s (40° C) Water 1500 m/s (25° C) 12 November 2013 Why Mercury? 0/1 University of Virginia cs4414 7
  • 9. Magnetic Core Memory MIT Project Whirlwind, 1951 2K 16-bit words with “no waiting”! 12 November 2013 University of Virginia cs4414 8
  • 11. Four-Transistor SRAM Bit 12 November 2013 University of Virginia cs4414 10
  • 12. Modern DRAM 12 November 2013 University of Virginia cs4414 11
  • 13. After Turning off Power 5 seconds 12 November 2013 30 seconds University of Virginia cs4414 5 minutes 12
  • 14. cycles (at 800MHz) to read a particular row = 13.75ns = 185° F 12 November 2013 University of Virginia cs4414 13
  • 15. Storage Systems Device Mercury (Gin) Delay Line Example UNIVAC (1951) Time to Access 220,000ns (average) Cost per Bit $ 0.38 (1968) (a bazillion n$) DRAM Kingston KVR16N11/4 4GB DDR3 ($40) 13.75ns 1.16 n$ UNIVAC 1968 (Core memory): $823,500 for 131 K 16-bit words 12 November 2013 University of Virginia cs4414 14
  • 16. Cheaper, More Persistent Storage
  • 17. How big is a TB?
  • 18. Storage Systems
      Device | Example | Time to Access | Cost per Bit
      Mercury (Gin) Delay Line | UNIVAC (1951) | 220,000ns (average) | $ 0.38 (1968) (a bazillion n$)
      DRAM | Kingston KVR16N11/4 4GB DDR3 ($40) | 13.75ns | 1.16 n$
      Hard Drive | Seagate Desktop HDD 4 TB SATA 6Gb/s NCQ 64MB | ? | 0.0046 n$
  • 19. Accessing a Hard Drive
      5900 rpm spindle
      “seek time” ~ 0.1ms
      rotate time: 1/5900rpm ~ max 10ms
  • 20. Passing the Drop Test
  • 21. Passing the Drop Test
  • 22. Storage Systems
      Device | Example | Time to Access | Cost per Bit
      Mercury (Gin) Delay Line | UNIVAC (1951) | 220,000ns (average) | $ 0.38 (1968) (a bazillion n$)
      DRAM | Kingston KVR16N11/4 4GB DDR3 ($40) | 13.75ns | 1.16 n$
      Hard Drive | Seagate Desktop HDD 4 TB SATA 6Gb/s NCQ 64MB | 5ms (ave) | 0.0046 n$
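The "max 10ms" rotate time and the "5ms (ave)" access time both follow from the 5900 rpm spindle speed. The sketch below is an illustration, not part of the slides. It assumes the average wait is half a rotation and ignores seek and transfer time.

```python
RPM = 5900  # spindle speed from the slides

# One full rotation is the worst-case wait for a sector to come around.
ms_per_rotation = 60_000 / RPM

# Assumption: on average the desired sector is half a rotation away.
average_rotational_wait_ms = ms_per_rotation / 2

# Compare against the 13.75ns DRAM access time from the table.
dram_ns = 13.75
slowdown = average_rotational_wait_ms * 1_000_000 / dram_ns

print(f"full rotation: {ms_per_rotation:.2f} ms")
print(f"average wait:  {average_rotational_wait_ms:.2f} ms")
print(f"roughly {slowdown:,.0f} times slower than DRAM")
```

The full rotation comes out just over 10ms and the average just over 5ms, which matches the rounded values on the slides.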
  • 23. Storage Abstractions
  • 26. Storage Abstractions: Memory Location, File. Do we really need both? What about: database, URI?
  • 27. Unix File Abstraction
  • 28. Which are files?
      class24.pptx
      /Users/dave/OS/classes/
      OS-provided random numbers
  • 29. “Everything is a File”
      class24.pptx
      /mnt/cdrom
      /Users/dave/OS/classes/
      OS-provided random numbers
      /dev/tty0
      /dev/random
  • 30. inode represents a file
      Size of File (bytes)
      Device ID
      User ID
      Group ID
      File Mode (permission bits)
      Link count (number of hard links to node)
      …
      Diskmap
  • 32. stat
      inode fields: Size of File (bytes), Device ID, User ID, Group ID, File Mode (permission bits), Link count (number of hard links to node), …, Diskmap
      > stat -x class24.pptx
        File: "class24.pptx"
        Size: 5855495 FileType: Regular File
        Mode: (0644/-rw-r--r--) Uid: ( 501/ dave) Gid: ( 20/ staff)
        Device: 1,2 Inode: 6706357 Links: 1
        Access: Wed Nov 20 15:00:41 2013
        Modify: Wed Nov 20 14:23:13 2013
        Change: Wed Nov 20 14:23:13 2013
  • 33. > ln class24.pptx todays-class.pptx
      > stat -x class24.pptx
        File: "class24.pptx"
        Size: 5855495 FileType: Regular File
        Mode: (0644/-rw-r--r--) Uid: ( 501/ dave) Gid: ( 20/ staff)
        Device: 1,2 Inode: 6706357 Links: 2
        Access: ..
      > stat -x todays-class.pptx
        File: "todays-class.pptx"
        Size: 5855495 FileType: Regular File
        Mode: (0644/-rw-r--r--) Uid: ( 501/ dave) Gid: ( 20/ staff)
        Device: 1,2 Inode: 6706357 Links: 2
      > rm class24.pptx
      > stat -x class24.pptx
        stat: class24.pptx: stat: No such file or directory
      > stat -x todays-class.pptx
        File: "todays-class.pptx"
        Size: 5855495 FileType: Regular File
        Mode: (0644/-rw-r--r--) Uid: ( 501/ dave) Gid: ( 20/ staff)
        Device: 1,2 Inode: 6706357 Links: 1
  • 34. Removing a linked file like this is very confusing for PowerPoint…
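The ln / stat / rm sequence above can be reproduced programmatically. This is a minimal sketch, assuming a POSIX-style file system that supports hard links. The file names mirror the slides, but the function name and the temporary directory are illustrative choices.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file, hard-link it, remove the original name, and report
    (same_inode, links_before_rm, links_after_rm, still_readable)."""
    with tempfile.TemporaryDirectory() as directory:
        original = os.path.join(directory, "class24.pptx")
        alias = os.path.join(directory, "todays-class.pptx")

        with open(original, "wb") as handle:
            handle.write(b"slides")

        # Equivalent of: ln class24.pptx todays-class.pptx
        os.link(original, alias)
        same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
        links_before_rm = os.stat(alias).st_nlink

        # Equivalent of: rm class24.pptx (removes one name, not the inode)
        os.remove(original)
        links_after_rm = os.stat(alias).st_nlink

        with open(alias, "rb") as handle:
            still_readable = handle.read() == b"slides"

        return same_inode, links_before_rm, links_after_rm, still_readable


print(hard_link_demo())
```

Both names resolve to the same inode, the link count drops from 2 to 1 after the removal, and the data stays reachable through the remaining name.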
  • 35. Diskmap (Unix System 5)
      inode fields: Size of File (bytes), Device ID, User ID, Group ID, File Mode (permission bits), Link count (number of hard links to node), …, Diskmap
      [Diagram: diskmap entries 0, 1, 2, …, 9, 10, 11, 12; entries point to Disk Blocks (1K bytes)]
  • 36. Diskmap (Unix System 5)
      [Diagram: diskmap entries 0, 1, 2, …, 9, 10, 11, 12; direct entries point to Disk Blocks (1K bytes); Indirect Disk Block (1K bytes), 4 bytes for each = 256 pointers to Disk Blocks (1K bytes)]
  • 37. Diskmap (Unix System 5)
      [Diagram: diskmap entries 0, 1, 2, …, 9, 10, 11, 12; Indirect Disk Block (1K bytes), 4 bytes for each = 256 pointers to Disk Blocks (1K bytes); Double Indirect Disk Block pointing to Indirect Disk Blocks (1K bytes)]
  • 38. Diskmap (Unix System 5)
      [Diagram: same as slide 37]
      How would you determine if your file system has this structure?
  • 39. Diskmap (Unix System 5)
      [Diagram: same as slide 37, with a further Disk Block (1K bytes)]
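The diskmap layout determines both where a given file block lives and how large a file can get. The sketch below uses the figures from the slides (1K blocks, 4-byte pointers, entries 0 to 9 direct). The assignment of entries 10, 11, and 12 to single, double, and triple indirection follows the classic System V layout and is an assumption here, since the diagram is garbled in this transcript.

```python
BLOCK_SIZE = 1024      # bytes per disk block (from the slides)
POINTER_SIZE = 4       # bytes per block pointer (from the slides)
DIRECT_ENTRIES = 10    # diskmap entries 0..9
POINTERS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 256


def level_for_block(block_index):
    """Return which kind of diskmap entry reaches the given file block."""
    remaining = block_index
    if remaining < DIRECT_ENTRIES:
        return "direct"
    remaining -= DIRECT_ENTRIES
    if remaining < POINTERS_PER_BLOCK:
        return "indirect"
    remaining -= POINTERS_PER_BLOCK
    if remaining < POINTERS_PER_BLOCK ** 2:
        return "double indirect"
    remaining -= POINTERS_PER_BLOCK ** 2
    # Assumption: entry 12 is a triple indirect block.
    if remaining < POINTERS_PER_BLOCK ** 3:
        return "triple indirect"
    raise ValueError("block index exceeds what the diskmap can address")


max_blocks = (DIRECT_ENTRIES + POINTERS_PER_BLOCK
              + POINTERS_PER_BLOCK ** 2 + POINTERS_PER_BLOCK ** 3)
max_file_bytes = max_blocks * BLOCK_SIZE

print(level_for_block(5), level_for_block(10), level_for_block(266))
print(f"maximum file size: {max_file_bytes:,} bytes")
```

With these parameters the first 10K of a file needs no extra lookups, and the addressable maximum is a little over 16 GiB.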
  • 40. Directories are Files Too! (ls -ali)
      Filename | Inode
      . | 494211
      .. | 494205
      .DS_Store | 494212
      class0 | 6565946
      class1 | 6565826
      class10 | 1467012
      class11 | 2252968
      … | …
      class19 | 5649155
      class2 | 494218
      … | …
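The filename-to-inode mapping that a directory stores can also be read from a program. This is a small sketch using Python's os.scandir, with a hypothetical helper name. Unlike ls -a, os.scandir does not report the "." and ".." entries.

```python
import os


def list_inodes(path="."):
    """Return sorted (inode, name) pairs for the entries of a directory."""
    with os.scandir(path) as entries:
        return sorted((entry.inode(), entry.name) for entry in entries)


# Print the mapping for the current directory, similar to `ls -i`.
for inode, name in list_inodes("."):
    print(inode, name)
```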
  • 41. > brew install tree # needed on MacOS X, but builtin to most Unixes
  • 42. How to create a new file?
  • 43. Finding a Free Block
      [Diagram, not to scale: disk layout of Boot block, Superblock, I-List (inodes), Data; the Superblock holds a list of free disk blocks (entries 0, 1, …, 98, 99)]
  • 44. Finding a Free inode
      [Diagram, not to scale: disk layout of Boot block, Superblock, I-List (inodes), Data]
      Superblock keeps a cache of free inodes
  • 45. Modern File Systems
  • 46. What should a modern file system do that Unix S5FS doesn’t?
  • 47. Handling Failures: ZFS
      Developed for Solaris, 2005
      Now open source: http://open-zfs.org/
      “MacZFS is free data storage and protection software for all Mac OS users. It's for people who have Mac OS, who have any data, and who really like their data. Whether on a single-drive laptop or on a massive server, it'll store your petabytes with ragingly redundant RAID reliability, and it'll keep the bit-rotted bleeps and bloops out of your iTunes library.”
  • 48. Block Checksums
      Block | Checksum (SHA-256)
      0 | 40a3dc…
      1 | 2c5829d…
      2 | 955d253…
      … | …
      [Diagram: ZFS block pointers with checksums beside the S5FS diskmap (entries 0, 1, 2, …, 9, 10, 11, 12 pointing to Disk Blocks (1K bytes))]
      How do you check the checksums?
  • 49. Hashing the Hashes
      Hash(B1) Hash(B2) Hash(B3) Hash(B4)
      Block 1 Block 2 Block 3 Block 4
  • 50. Merkle Tree (Ralph Merkle)
      Hash(B1) Hash(B2) Hash(B3) Hash(B4)
      Block 1 Block 2 Block 3 Block 4
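"Hashing the hashes" can be sketched in a few lines. The example below is illustrative only and is not ZFS's actual on-disk format. It hashes each block with SHA-256, then repeatedly hashes pairs of hashes until one root remains. Duplicating the last hash when a level has an odd count is one common convention and an assumption here.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Return the Merkle root of a non-empty list of byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [sha256(block) for block in blocks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pad odd levels by repeating the last hash.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"Block 1", b"Block 2", b"Block 3", b"Block 4"]
root = merkle_root(blocks)

# Corrupting any single block changes the root.
corrupted = [b"Block 1", b"Block 2", b"Block X", b"Block 4"]
print(root.hex())
print(merkle_root(corrupted) != root)
```

Checking the single root is enough to detect corruption anywhere below it, which is one answer to "How do you check the checksums?".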
  • 51. Recovery (copies = 2)
      Keep 2 copies of every block: if checksum fails for first copy read, try reading second copy.
      [Diagram: One Copy; Copy 1, Copy 2]
  • 52. For the truly paranoid… (copies = 3)
      [Diagram: One Copy; Copy 1, Copy 2, Copy 3]
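The recovery rule (try the next copy when a checksum fails) can be sketched as follows. This is a simplified illustration with hypothetical names, where copies are in-memory byte strings rather than reads from separate disk locations.

```python
import hashlib


def read_verified(copies, expected_hex_digest):
    """Return (index, data) for the first copy whose SHA-256 matches.

    copies is a list of byte strings standing in for redundant copies
    of one block. Raises IOError if every copy fails its checksum.
    """
    for index, data in enumerate(copies):
        if hashlib.sha256(data).hexdigest() == expected_hex_digest:
            return index, data
    raise IOError("all copies failed checksum verification")


good = b"important data"
checksum = hashlib.sha256(good).hexdigest()

# The first copy has suffered bit rot, so the second copy is used.
index, data = read_verified([b"importent data", good], checksum)
print(index, data)
```

With copies = 3 the same loop simply has one more chance before giving up.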
  • 53. For the fairly paranoid but cheap… RAID: Redundant Arrays of Inexpensive Disks. ACM SIGMOD 1988. whitehouse.gov
  • 54. Case for RAID
  • 56. Redundancy
  • 58. Improving Performance: Cache (64MB DRAM); Adaptive Replacement Cache
  • 59. Adaptive Replacement Cache
      T1: Recent Cache Entries
      T2: Frequently-Used Blocks (Blocks in Cache Accessed Again)
      “Ghost” Entries: B1: Evicted from T1 (LRU); B2: Evicted from T2 (LRU)
      Size of T1 adapts
      How should relative size of T1 and T2 be adjusted?
  • 60. Adaptive Replacement Cache
      T1: Recent Cache Entries
      T2: Frequently-Used Blocks (Blocks in Cache Accessed Again)
      “Ghost” Entries: B1: Evicted from T1 (LRU); B2: Evicted from T2 (LRU)
      Size of T1 adapts
      Hit in B1: should increase size of T1, drop entry from T2 to B2
      Hit in B2: should increase size of T2, drop entry from T1 to B1
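The sizing rule on the slide above can be expressed as a small function. This sketch covers only the adaptation step, not a full cache. The step size (at least 1, scaled by the ratio of the ghost-list lengths) is the commonly described ARC rule, which the slides do not spell out, so treat it and all names here as assumptions.

```python
def adapt_target(target_t1, capacity, b1_len, b2_len, ghost_hit):
    """Return the new target size for T1 after a hit in a ghost list.

    target_t1: current target number of entries for T1
    capacity:  total cache size (T1 + T2)
    b1_len, b2_len: current lengths of ghost lists B1 and B2
    ghost_hit: "B1" or "B2", the ghost list that was hit
    """
    if ghost_hit == "B1":
        if b1_len == 0:
            raise ValueError("a hit in B1 requires B1 to be non-empty")
        # Recency is under-served: grow T1 (T2 will shed entries to B2).
        step = max(1, b2_len // b1_len)
        return min(capacity, target_t1 + step)
    if ghost_hit == "B2":
        if b2_len == 0:
            raise ValueError("a hit in B2 requires B2 to be non-empty")
        # Frequency is under-served: shrink T1 (T1 will shed entries to B1).
        step = max(1, b1_len // b2_len)
        return max(0, target_t1 - step)
    raise ValueError("ghost_hit must be 'B1' or 'B2'")


print(adapt_target(4, 8, b1_len=2, b2_len=6, ghost_hit="B1"))
print(adapt_target(4, 8, b1_len=6, b2_len=2, ghost_hit="B2"))
```

The target is clamped between 0 and the cache capacity, so neither list can be starved below zero or grow past the whole cache.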
  • 61. IBM Almaden Research Center
  • 62. Do you actually have a disk like this on your main computing device? Cache (64MB DRAM)
  • 63. Flash Memory: Solid State Drive
  • 64. Storage Systems
      Device | Example | Time to Access | Cost per Bit
      Mercury (Gin) Delay Line | UNIVAC (1951) | 220,000ns (average) | $ 0.38 (1968) (a bazillion n$)
      DRAM | Kingston KVR16N11/4 4GB DDR3 ($40) | 13.75ns | 1.16 n$
      Hard Drive | Seagate Desktop HDD 4 TB SATA 6Gb/s NCQ 64MB | 5,000,000ns | 0.0046 n$
      SSD | Samsung 500GB ($300) | ? | 0.075 n$
  • 68. Storage Systems
      Device | Example | Time to Access | Cost per Bit
      Mercury (Gin) Delay Line | UNIVAC (1951) | 220,000ns (average) | $ 0.38 (1968) (a bazillion n$)
      DRAM | Kingston KVR16N11/4 4GB DDR3 ($40) | 13.75ns | 1.16 n$
      SSD | Samsung 500GB ($300) | ~10,000 ns (for random read) | 0.075 n$
      Disk Drive (Modern Hard Drive) | Seagate Desktop HDD 4 TB SATA 6Gb/s NCQ 64MB | 5,000,000ns | 0.0046 n$
  • 69. Charge
      Storage systems should be designed around hardware capabilities and workload.
      Today’s OSes mostly use filesystems designed around 1990s disks and 1960s workloads! But, with lots of clever hacks to make them work okay on today’s hardware and workloads.
      More from Wilkes 1967: