7–11.
3000 BC – Recording
1200 BC – Aggregating
300 BC – Storing at scale
300 AD – Random access
1400 AD – Mass distribution
1700 AD – Infographics
1930s – Computation theory (Turing)
1940s – Information theory (Shannon)
1950s – Computer languages (1GL, 2GL, 3GL)
1960s – Standardized metadata (Avram)
1970s – Relational databases (IBM)
1980s – WWW (Berners-Lee)
1990s – Internet archive (Kahle)
18. Data sources: tables on web pages, open APIs, commercial data sources

Augmentation
Name            ZIP    Average Rent
Walter Cureton  78701  $400-$599
Ivy Caldwell    94103  >$1500
Regina Wootton  10027  $1000-$1499

Completion
Name           Address           City           ZIP
Brian James    901 Red River     Austin         78701
Terri Becraft  262 7th St.       San Francisco  94103
Paz Brummit    603 W. 114th St.  New York       10027

Normalization
Name         Address                     Normalized Address
Cecil Bartz  901 red river austin texas  901 Red River, Austin, TX 78701
Genaro Luz   702 w. 32nd st austin       702 W. 32nd St., Austin, TX 78705
Ruth Brown   114th + broadway, nyc       W. 114th St. & Broadway, New York, NY 10027
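The three operations on slide 18 can each be sketched as a lookup keyed on ZIP code. The minimal Python sketch below assumes small hypothetical in-memory reference tables (`ZIP_TO_CITY`, `ZIP_TO_RENT`) standing in for the web tables, open APIs, or commercial sources the slide lists; the helper names are also hypothetical. Unlike the slide, the normalization step here assumes the ZIP is already known, since inferring it from a free-form address would require a geocoder or postal database.

```python
# Hypothetical reference tables keyed by ZIP code. The values mirror the
# example rows on the slide; they are not drawn from a real data source.
ZIP_TO_CITY = {
    "78701": ("Austin", "TX"),
    "78705": ("Austin", "TX"),
    "94103": ("San Francisco", "CA"),
    "10027": ("New York", "NY"),
}
ZIP_TO_RENT = {"78701": "$400-$599", "94103": ">$1500", "10027": "$1000-$1499"}

# Toy abbreviation rules for street names (an assumption, not a postal standard).
ABBREVIATIONS = {"st": "St.", "st.": "St.", "street": "St.", "w": "W.", "w.": "W."}


def augment(row):
    """Augmentation: add a new column (Average Rent) looked up by ZIP."""
    return {**row, "Average Rent": ZIP_TO_RENT.get(row["ZIP"])}


def complete(row):
    """Completion: fill in a missing City using the ZIP code."""
    if row.get("City"):
        return row
    city, _state = ZIP_TO_CITY.get(row["ZIP"], (None, None))
    return {**row, "City": city}


def normalize_street(street):
    """Apply abbreviation rules and title-case the remaining words."""
    words = []
    for token in street.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token[0].isdigit():
            words.append(token)  # keep house numbers and "32nd" unchanged
        else:
            words.append(token.capitalize())
    return " ".join(words)


def normalize_address(street, zip_code):
    """Normalization: build a canonical 'street, city, state ZIP' string."""
    city, state = ZIP_TO_CITY[zip_code]
    return f"{normalize_street(street)}, {city}, {state} {zip_code}"


if __name__ == "__main__":
    print(augment({"Name": "Walter Cureton", "ZIP": "78701"}))
    print(complete({"Name": "Brian James", "Address": "901 Red River", "ZIP": "78701"}))
    print(normalize_address("702 w. 32nd st", "78705"))
```

All three functions return new dictionaries or strings rather than mutating their input, so the original records stay available for comparison.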
Babylon was the first society to systematically record knowledge, including the first census, which counted and recorded people and commodities for taxation and other purposes
The library at Thebes was the first known effort to gather many sources of knowledge and make them available in one place
Charged with collecting all the world's knowledge, the Library of Alexandria collected what is thought to have been nearly a half million objects
The codex replaces the scroll, enabling random access to information, or browsing.
Gutenberg’s printing press enables mass production and distribution of information
William Playfair invents the line, bar and pie charts, paving the way for Charles Minard’s famous graphical representation of Napoleon’s March
Alan Turing showed that any reasonable computation could be carried out by programming a machine.
Claude Shannon solved the engineering problem of transmitting information over a noisy channel.
Computer languages advanced quickly from first-generation languages to third-generation languages such as COBOL.
Henriette Avram created the MARC (Machine-Readable Cataloging) format for tagging books with metadata.
Relational databases made it possible to store and look up data at scale.
Tim Berners-Lee creates the World Wide Web, which leads to mass adoption of the internet; the web quickly grows to billions of pages, prompting Brewster Kahle to begin systematically capturing and storing that information.
1.8 ZB of data, but it is still hard to find the pieces you want
Aggregated, organized, accessible. When you can easily identify, understand and access the pieces, you can build anything.
Map by Charles Joseph Minard portrays the losses suffered by Napoleon's army in the Russian campaign of 1812