2. Contents
Introduction to GFS
System Architecture
System Features
Working of GFS
Latest advancement
Conclusion
Questions
3. Introduction
More than 15,000 commodity-class PC's.
Multiple clusters distributed worldwide.
Thousands of queries served per second.
One query reads 100's of MB of data.
One query consumes 10's of billions of CPU
cycles.
Google stores dozens of copies of the entire Web!
Conclusion: Need large, distributed, highly
fault tolerant file system.
4. System Architecture
A GFS cluster consists of a single master and multiple
chunk-servers and is accessed by multiple clients.
5. Large Chunk
GFS uses large chunk: 64MB (1G = 1024 MB = 16 chunks)
Stored as a plain Linux file, which will be lazily extended up to 64MB.
Opt to many read and write on a given chunk
Reduces network overhead by keeping a connection to the chunk
server.
See also Map-Reduce, Big-Table.
6. Architecture (cont’d)
Chunkserver
Files are divided into fixed-size chunks (64 MB)
Each chunk is identified by an immutable and globally
unique 64 bit chunkhandle assigned by the master at the
time of chunkcreation
Chunkservers store chunks on local disks as Linux files
and read or write chunk data specified by a chunkhandle
For reliability, each chunk is replicated on multiple
chunkservers. (default 3 replicas)
GFS Client
GFS client code linked into each application implements
the file system API and communicates with the master
and chunkservers to read or write data on behalf of the
7. System Metadata
The master stores three major types of metadata:
The file and chunk namespaces
The mapping from files to chunks
The locations of each chunk’s replicas
All metadata is kept in the master’s memory
The first two types are also kept persistent by logging
mutations to an operation log stored on the master’s local disk
and replicated on remote machines.
The master does not store third type persistently. Instead, it
asks each chunkserver about its chunks at master startup
8. System Features
Page Rank- Probability that a random surfer visits the site
Citations (Back links)
•
How is Page Rank calculated??
•
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where,
PR -> Page Rank of a page
T1….Tn -> Pages that point to Page A (citations)
d -> Damping Factor (0<d<1)
C(A) -> No. of Links going out from A
Page Rank of a page depends on-
Number of pages pointing to it.
Page Rank of the page that points to it.
9. System Features
Anchor Text- text associated with the link
Association with the page the link is on
•
• Association with the page the link points to( unique
to Google)
Advantages:
• Anchors contain more information than the pages
themselves
• Documents that cannot be indexed can be displayed
Other Features:
Proximity of location information in search for all hits
•
Track of visual presentation details
•
12. Google Query Evaluation
Parse the query.
1.
Convert words into wordIDs.
2.
Seek to the start of the doclist in the short barrel for every word.
3.
Scan through the doclists until there is a document that matches all the
4.
search terms.
Compute the rank of that document for the query.
5.
If in the short barrels and at the end of any doclist, seek to the start of
6.
the doclist in the full barrel for every word and go to step 4.
If we are not at the end of any doclist go to step 4.
7.
Sort the documents that have matched by rank and return the top k.
13. Client Read
Client sends master:
read(file name, chunk index)
Master’s reply:
chunk ID, chunk version number, locations of
replicas
Client sends “closest” chunkserver w/replica:
read(chunk ID, byte range)
“Closest” determined by IP address on simple rack-
based network topology
Chunkserver replies with data
14. Client Write
Some chunkserver is primary for each chunk
Master grants lease to primary (typically for 60 sec.)
Leases renewed using periodic heartbeat messages
between master and chunkservers
Client asks server for primary and secondary replicas for
each chunk
Client sends data to replicas in daisy chain
Pipelined: each replica forwards as it receives
Takes advantage of full-duplex Ethernet links
15. Client Write (2)
All replicas acknowledge data write to client
Client sends write request to primary
Primary assigns serial number to write
request, providing ordering
Primary forwards write request with same serial
number to secondaries
Secondaries all reply to primary after completing
write
Primary replies to client
17. What Happen If the Master Reboots?
Replays log from disk
Recovers namespace (directory) information
Recovers file-to-chunk-ID mapping
Asks chunkservers which chunks they hold
Recovers chunk-ID-to-chunkserver mapping
If chunk server has older chunk, it’s stale
Chunk server down at lease renewal
If chunk server has newer chunk, adopt its version
number
Master may have failed while granting lease
18. What Happen if Chunkserver Fails?
Master notices missing heartbeats
Master decrements count of replicas for all chunks on
dead chunkserver
Master re-replicates chunks missing replicas in
background
Highest priority for chunks missing greatest
number of replicas
19. Latest Advancement
gMail - An easily configurable email
service with 1GB of web space.
Blogger- A free web-based service that
helps consumers publish on the
web without writing code or installing
software.
Google “next generation corporate s/w”
- A smaller version of the google
software, modified for private use.
20. Conclusion
Success: used actively by Google to support search service and
other applications
Availability and recoverability on cheap hardware
High throughput by decoupling control and data
Supports massive data sets and concurrent appends
Semantics not transparent to apps
Must verify file contents to avoid inconsistent
regions, repeated appends (at-least-once semantics)
Performance not good for all apps
Assumes read-once, write-once workload (no client
caching!)