Big Data = Bigger Metadata


Ian White

Ian White of Urban Mapping from the O'Reilly Strata conference in Feb 2012


Big Data = Bigger Meta
O’Reilly Strata Conference
February 29 2012

Pivot/Skate, etc…
Founded 2003
Poor man’s GIS
Panamap

Refounded 2006
Neighborhood boundaries
Mass transit data

Refocused 2009
SaaS for mapping + on-demand data

Achtung!

NoSQL is no panacea
Big Data isn’t about data
Big Data isn’t new
Big Data doesn’t present a Boolean quandary
With power comes responsibility
AWS bills
Lady Gaga tweets
Innumeracy (correlation v causation)

Big v Important

Big                    Important
Heterogeneous          Well-defined schema
Raw                    High value (not free)
Distributed            Test-driven
Streaming/real time    Relational
Search for meaning     Historical
Time-sensitive         Enterprise-focused
Philosophical

Data Exhaust

Analytics Probes

Social Media Gov 2.0

Platforms

Commoditization of compute and storage

A Brief History of Metadata

Callimachus Library of Alexandria, Egypt

A Brief History of Metadata

“Pinakes” (lists)
Title
Category
Author
Author birthplace
Father
Word count

Callimachus

A Brief History of Metadata

Card catalog room,
Library of Congress c. 1920

A Brief History of Metadata

Dewey Decimal System goes electronic in 1967

Out with the Old, in with the New

Archiving card catalogs
after digitization

Why Can’t We Be Together?

Metadata Data

Exponential Growth in Data

Unprecedented rate of data creation, 1995-today
Chart of data volume over time, with milestones: Pinakes (300 BC), Catalog (1595 AD), Taxonomy (1876), Database (1970)

Oh, How I’ve Missed You

The reunification of metadata
and the artifact

Enter the Data Curator

Part social scientist, part librarian,
part statistician, part RDBMS wiz

DIKW Model
Data
Fact, Signal, Symbol
Information
Structural v Functional
Symbolic v Subjective
Knowledge
Processed
Procedural
Propositional

Thank you!
ian@urbanmapping.com
@urbanmapping

R.I.P.
Schema


1. Big Data = Bigger Meta O’Reilly Strata Conference February 29 2012

2. Pivot/Skate, etc… Founded 2003 Poor man’s GIS Panamap Refounded 2006 Neighborhood boundaries Mass transit data Refocused 2009 SaaS for mapping + on-demand data

3. Achtung! NoSQL is no panacea Big Data isn’t about data Big Data isn’t new Big Data doesn’t present a Boolean quandary With power comes responsibility AWS bills Lady Gaga tweets Innumeracy (correlation v causation)

4. Big v Important Big Important Heterogeneous Well-defined schema Raw High value (not free) Distributed Test-driven Streaming/real time Relational Search for meaning Historical Time-sensitive Enterprise-focused Philosophical

5. Data Exhaust Analytics Probes Social Media Gov 2.0

6. Platforms Commoditization of compute and storage

7. A Brief History of Metadata Callimachus Library of Alexandria, Egypt

8. A Brief History of Metadata “Pinakes” (lists) Title Category Author Author birthplace Father Word count Callimachus

9. A Brief History of Metadata

10. A Brief History of Metadata

11. A Brief History of Metadata Card catalog room, Library of Congress c. 1920

12. A Brief History of Metadata Dewey Decimal System goes electronic in 1967

13. Out with the Old, in with the New Archiving card catalogs after digitization

14. Why Can’t We Be Together? Metadata Data

15. Exponential Growth in Data Unprecedented rate of data creation, 1995-today Data Pinakes Catalog Taxonomy Database 300 BC 1595 AD 1876 1970

16. Oh, How I’ve Missed You The reunification of metadata and the artifact

17. Together At Last

18. GIS Data is Unevolved + =

19. Enter the Data Curator Part social scientist, part librarian, part statistician, part RDBMS wiz

20. DIKW Model Data Fact, Signal, Symbol Information Structural v Functional Symbolic v Subjective Knowledge Processed Procedural Propositional

21. Popularity (Google Trends)

22. Words to Live By dx / dt

23. Thank you! ian@urbanmapping.com @urbanmapping R.I.P. Schema

Editor's notes

Some background on Urban Mapping. It wasn’t a straightforward path, but it’s very relevant. We started close to 10 years ago with a printed map that reveals different layers of thematic imagery (streets, subways, neighborhoods) depending on the viewing angle. We all know what happened to print, so I shifted the business to a new medium: in 2006 or so we collected much of the same data, but now using a spatial database as opposed to regular old vector files in Adobe Illustrator. The writing was on the wall for licensing content to local web publishers, so we shifted again. This time we moved upstream: we continue to develop our own data, but have greatly expanded that effort to include commercial data and deliver it through our own mapping service. We do this for customers in various market segments, like Tableau Software, where we perform a few geo-services like hosting the base map and overlaying data.
I can be a bit of a curmudgeon and I hope a cautionary point of view has a place. Let’s talk about what Big Data is not. I’ll talk later about what it is. First thing to note is that Big Data isn’t really about data at all. But I am. It’s about tools and processes to manage and exploit info-nuggets. There’s nothing revolutionary about saying this, but I wanted to make it explicit. Second, Big Data isn’t especially new: Wall St and Walmart have been processing data and deriving value from it for decades, but they don’t talk about it. Why? Because they make money doing so and don’t need to alert the competition. Anybody hear of Teradata? Whenever companies want to talk about what they are doing, it’s usually a red flag for me, meaning the technology, industry or something else hasn’t sufficiently evolved. But I’m also not saying Big Data is a rehash of enterprise software. More on that later. Finally, Big Data has democratized access to powerful tools at little cost. This doesn’t necessarily mean everybody knows how to use these tools. There can be some blowback, such as high credit card bills, analysis without direction or objective, and a lack of knowledge about basic statistics.
There’s been exponential growth in data and it comes from any number of places. Some are shown here: mobile devices as probes, with vast capabilities to record all kinds of environmental variables; open government; social media; and a desire for analytics, which has been rebranded as business intelligence.
Processing and storage costs are dropping like rocks. Enterprise software has been offering big solutions to banking and others for decades, but with incredibly low barriers to entry virtually anybody can now participate.
Callimachus (Kal-i-um-akus) was a noted poet at the Library of Alexandria in the 3rd century BC.
He created the Pinakes (pin-a-keez), or Lists, a way of organizing works in the library. He embarked on the effort to organize 120k scrolls by title, author, birthplace, father, education, summary of contents and other info. This was the first effort to systematically create a bibliographic system, a direct link to metadata two millennia later.
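As a rough illustration of why the Pinakes reads like an early schema, the fields listed on the slide can be written down as a record type. This is only a sketch: the class name, field names and placeholder values are assumptions made for the example, not a reconstruction of any actual Pinakes entry.

```python
from dataclasses import dataclass, asdict

@dataclass
class PinakesEntry:
    """Hypothetical record mirroring the fields named on the Pinakes slide."""
    title: str
    category: str
    author: str
    author_birthplace: str
    father: str
    word_count: int

# Placeholder values only; they do not describe a real scroll.
entry = PinakesEntry(
    title="<title>",
    category="<category>",
    author="<author>",
    author_birthplace="<birthplace>",
    father="<father>",
    word_count=0,
)

# The record describes the scroll but does not contain its text:
# metadata kept apart from the artifact.
field_names = list(asdict(entry).keys())
```

The point of the sketch is that the list of fields is fixed up front, which is what makes it a schema, and that the scroll itself lives somewhere else.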
In 1595, Johan van der Does published the Nomenclator, the first instance of a printed catalog of library holdings. It represented a significant advancement over Callimachus’s lists, but it took close to two millennia to get here.
The modern cataloging system: the Dewey Decimal System, created in 1876. Its father was Melvil Dewey. The Dewey Decimal System attempted to organize all knowledge into ten main classes. Each class is further subdivided into ten divisions, and each division into ten sections, giving ten main classes, 100 divisions and 1000 sections. It allows for infinite hierarchy and is numerical and faceted (linking content from different areas). Other systems followed: Universal Decimal Classification, Library of Congress, etc.
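The ten-by-ten-by-ten arithmetic above can be sketched in a few lines. The function name and the returned keys are assumptions for illustration, and the sketch only covers the three-digit part of a Dewey number, not the decimal extensions that give the deeper hierarchy.

```python
def dewey_levels(number: int) -> dict:
    """Split a three-digit Dewey number into its class, division and section."""
    if not 0 <= number <= 999:
        raise ValueError("expected a number from 0 to 999")
    return {
        "main_class": number // 100 * 100,  # hundreds digit, e.g. 500
        "division": number // 10 * 10,      # hundreds and tens digits, e.g. 510
        "section": number,                  # all three digits, e.g. 516
    }

# Each level multiplies the previous one by ten.
level_counts = {
    "main_classes": 10,
    "divisions": 10 * 10,
    "sections": 10 * 10 * 10,
}

levels = dewey_levels(516)
```

Each digit narrows the subject by one level, which is why the number alone is enough to place a work in the hierarchy.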
This photo is from the Card Division at the Library of Congress in the 1920s. The amount of physical metadata is astounding: millions of library cards.
The next major advancement was in the late 1960s. Early attempts at electronic indexing focused on a taxonomy of keywords and related information. This was efficient for reporting on what the system contained, but it also kept the long-running divorce between artifact and metadata. The Online Computer Library Center (OCLC) was created as a nonprofit to further access to library resources across institutions and decrease costs. The OCLC acquired the Dewey Decimal System and, as any standards body does, sought to perpetuate its existence over the decades. Then the internet happened.
That meant out with the old, in with the new. This photo shows library cards going into storage. Not sure why they’d even be archived after the transition to databases was made, but that’s for another time.
So this is the situation. Beginning in the late 60s, electronically stored metadata began to grow. The library cards (at left) went away, but the bifurcation was complete: total separation of the thing from the description of the thing. And it sort of made sense: IT was in its infancy, so storage and processing costs were high. Publishers also exerted a great deal of control over how they permitted libraries to index works and make them available.
To put the last 2000 years in perspective: Callimachus created the first crude schema, leaving a place for metadata to be stored. The Nomenclator gave us the first bibliographic catalog, printed and bound, produced annually. The Dewey Decimal System was born in 1876 and was the basis of an extensive metadata system for published works. Then… the internet happened. In the top right you see the corner of a cloud. That’s my way of representing what happens next. The volume of data produced grows exponentially, overtaking 2000-plus years of history in no time.
So how about the bifurcation/divorce I mentioned? The web brought the artifact and metadata together again.
Google Books. Sure, we have the Dewey Decimal type stuff along with ISBN, retail price, etc., but we also threw in the whole damn book: full-text search. Amazon does it too.
In my industry, the state of metadata is horrendous. We’re stuck in the green-screen days. Proprietary data formats and slow-moving vendors don’t help. While I’m the first person to admit GIS needs to get off its ass and change, radically, there’s also something the real-time streaming web can learn from us.
We hear about the rise of the curator: part social scientist, part librarian, part RDBMS wiz and statistician. This is increasingly important across all industries. When dealing with a torrent of data, domain experts will be required to help make sense of it.
The Knowledge Hierarchy, as it is sometimes known, has been used to represent relationships between the stuff that turns into something meaningful. You could look at this as going from a letter to a sentence to a paragraph, or an ingredient to a recipe to a meal, or something else. The details don’t matter here, but I think about the fundamental building block of data. One geocoded tweet has little or no value on its own. Contrast that with per capita income for this ZIP code. By amassing enough geocoded tweets, it’s clear we can get to something meaningful, but I don’t know how many tweets that is. I do know that per capita income can directly inform my marketing plans for selling a new shampoo.
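To make the tweet example concrete, here is a minimal sketch of the first step of that amassing: grouping geocoded tweets by ZIP code and counting them. The sample records are made up for illustration, the function name is an assumption, and a count per ZIP code is only one of many possible aggregations.

```python
from collections import Counter

# Made-up sample of geocoded tweets as (ZIP code, text) pairs. Not real data.
tweets = [
    ("10001", "coffee line is long"),
    ("10001", "subway delays again"),
    ("94103", "foggy morning"),
]

def tweets_per_zip(records):
    """Count how many geocoded tweets fall in each ZIP code."""
    return Counter(zip_code for zip_code, _text in records)

counts = tweets_per_zip(tweets)
```

A single record says almost nothing; the counts only start to resemble information once enough records pile up per ZIP code, and the sketch deliberately says nothing about how many is enough.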
With that, here’s some more wet blanket for everybody. Using Google Trends, I looked at a number of terms that might indicate the old-fashioned RDBMS, SQL way of life, and most seem to follow the blue line, which represents the term ‘metadata.’ ‘Big Data,’ coincidentally, first appears a few months before the first Strata conference in 2011. ‘Curation’ has a longer life but doesn’t show the surge of Big Data, and everybody’s favorite, ‘data scientist,’ doesn’t register as much more than a rounding error. I’m not using Google Trends to fully substantiate my argument, but I do hope you take a dose of skepticism before fully embracing ‘this.’
In closing, I’d like to leave you with an emergent cliché. It’s also my measure of how geeky an audience I have: one person’s metadata is another person’s data.
