Big Data Challenges: Getting Some
March 31, 2011




Gil Elbaz
   @factual
   @gilelbaz
Road to Information Singularity




                          Confidential
Networks Underlying Information Flow


• Density: number of connecting paths
• Plasticity: ease of forming new paths
• Speed & Flow: rate of information transfer

http://www.bloggingforbusinessbook.com/blogging-for-business/


The Internet




               http://www.amazon.com/Digital-Crossroads-American-Telecommunications-Internet/dp/0262140918




Search Engines




Social Networks: Facebook




                            http://2.bp.blogspot.com/



             600 million Facebook users
                 130 average friends
              8 friend requests / month

              15 messages / day / user
Trending of Unfriending




Unfriending




Another Network: The Brain




                  100 billion neurons

               1000 ‘hardwired’ synapses




                       !""#$%%&2)4*52"*G57/"'4*5%A@CC%@C




Web 3.0: Data Web




Web Scale Data = More Pain


                     Findability
                       Access
                       Rights
                     Economics
                     Standards
             Integration & Aggregation
                        Trust
Web 2.0 Model: Scale-Free Networks




www.futureexploration.net
Book Data: Progress Being Made




    Google Book Search API
      Open Library Books API
         ISBNdb
           Amazon API
             LibraryThing
                GoodReads
                   WorldCat


Google Book Search API                 Amazon API
    Open Library Books API                LibraryThing
         ISBNdb     WorldCat           GoodReads



            Findability            Access
            Rights                 Economics
            Standards              Trust




Another Case Study: Local Data




                                        http://stevecheney.posterous.com/




Another Case Study: Local Data


    Possibilities are endless, but how do you get the data?

    Twitter, Facebook, Loopt, Foursquare, Yelp, Zillow, Google, Home Junction

    Examine Twitter sentiment (avoid dirty coffee shops)
    Identify areas of highest bike thefts
    Correlate check-ins with property values

HomeJunction




Factual Is an Example of a New Information Network

      Citizens   Enterprises    Government       Machine Learned

                           Inbound Data




                 Aggregate      Mash Curate
                   Dedupe       Canonicalize



          Developers      Publishers          Search Engines


        New Services, Capabilities, Software Apps

Factual’s Open Data Model

  Free access via APIs, SDKs, and downloads, BUT…
     we ask you to contribute back into the ecosystem.

                                           Benefits

                                           • Drive down costs
                                           • Rapid iteration
                                           • Differentiate on user experience
                                           • Only need small % participation from world
                                             (e.g. Wikipedia)



Equivalence Measurements




                     =?
    Subway Sandwiches                 Subway
    52 E Court St                     52 West Court St
    Cincinnati, OH                    45202
    (513)-241-6699                    (800)-653-2323
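
One way to make the "=?" question concrete is to score the two listings field by field. The sketch below is only an illustration under stated assumptions: the field names (`name`, `street`, `locality`, `postcode`, `phone`) are hypothetical, and the standard-library `difflib.SequenceMatcher` is swapped in as a generic string-similarity measure; it is not presented as the method used by Factual.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def field_scores(rec_a: dict, rec_b: dict) -> dict:
    """Score only the fields that both records actually provide."""
    shared = rec_a.keys() & rec_b.keys()
    return {field: similarity(rec_a[field], rec_b[field]) for field in sorted(shared)}


# The two listings from the slide; field names are assumptions.
record_a = {
    "name": "Subway Sandwiches",
    "street": "52 E Court St",
    "locality": "Cincinnati, OH",
    "phone": "(513)-241-6699",
}
record_b = {
    "name": "Subway",
    "street": "52 West Court St",
    "postcode": "45202",
    "phone": "(800)-653-2323",
}

scores = field_scores(record_a, record_b)
for field, score in scores.items():
    print(f"{field}: {score:.2f}")
```

The example also shows why the question is hard: the records share only three comparable fields, the names and streets overlap partially, and the phone numbers differ entirely, so any yes/no answer depends on how the per-field scores are weighted.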


Large-Scale Aggregation Technologies




Large-Scale Aggregation Technologies

                      Apartments = Apts
                      Center = Ctr
                      Corp = Corporation
                      Service = Svc
                      Attorney = Atty
                      Assoc = Associates
                      Inc = Incorporated
                      Assn = Association
                      Co = Company
                      Mount = Mt
                      Bros = Brothers
                      BBQ = Barbeque .....
                      [?]ori = Ted
Large-Scale Aggregation Technologies

                 Restaurant = Rstrnt
                 Restaurant = Restuarant
                 Hosp = Hospital
                 Billards = Billiards
                 Salon = Sln
                 Buffet = Buffett
                 Center = Ctr
                 Apartments = Apts
                 Boutique = Btq
                 Jewelers = Jewlers
                 Cleaners = Clnrs
                 Market = Mktz .....
                 Kragen = O'Reilly ?
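
Name variants such as Apts for Apartments or Bros for Brothers can be folded into one canonical form with a lookup table before records are compared. The following is a minimal sketch, assuming a small hand-written table; the `CANONICAL` dictionary and the `canonicalize` helper are hypothetical names introduced for illustration, and a production table would be far larger.

```python
import re

# Hypothetical lookup table holding a few abbreviation and misspelling
# variants; keys are variants, values are the chosen canonical token.
CANONICAL = {
    "apts": "apartments",
    "ctr": "center",
    "corp": "corporation",
    "svc": "service",
    "assn": "association",
    "co": "company",
    "bros": "brothers",
    "rstrnt": "restaurant",
    "restuarant": "restaurant",
    "jewlers": "jewelers",
}


def canonicalize(name: str) -> str:
    """Lowercase, tokenize, and replace each known variant with its canonical token."""
    tokens = re.findall(r"[a-z0-9']+", name.lower())
    return " ".join(CANONICAL.get(token, token) for token in tokens)


print(canonicalize("Smith Bros. Jewlers"))
print(canonicalize("Smith Brothers Jewelers"))
```

Both calls produce the same string, so the two spellings can be treated as one business name. A pair like Kragen and O'Reilly cannot be handled this way, because the link between them is not a spelling variant.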
Kragen O'Reilly?




Large-Scale Deduping




   • Specialized data compression & folding techniques
   • Eliminate redundant entities - endpoints and authority pages
   • Improves precision & recall
   • Enables real-time dedupe and crosswalks
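
As a rough illustration of eliminating redundant entities, the sketch below groups records under a normalized key and keeps one survivor per group. It is a plain exact-key grouping, named here as a stand-in; it does not attempt the specialized compression and folding techniques listed on the slide, and the `dedupe_key` and `dedupe` helpers and the sample records are hypothetical.

```python
import re
from collections import defaultdict


def dedupe_key(record: dict) -> tuple:
    """Hypothetical key: normalized name plus the digits of the phone number."""
    name = re.sub(r"[^a-z0-9]+", " ", record["name"].lower()).strip()
    phone = re.sub(r"\D", "", record.get("phone", ""))
    return (name, phone)


def dedupe(records: list) -> list:
    """Group records by key and keep the first record of each group."""
    groups = defaultdict(list)
    for record in records:
        groups[dedupe_key(record)].append(record)
    return [group[0] for group in groups.values()]


# Sample input, invented for illustration.
records = [
    {"name": "Subway", "phone": "(513) 241-6699"},
    {"name": "SUBWAY", "phone": "513-241-6699"},
    {"name": "Subway", "phone": "(800) 653-2323"},
]

unique = dedupe(records)
print(len(unique))
```

Because the key lookup is a dictionary access, each incoming record can be checked against known entities in constant time, which is one simple way a real-time dedupe could be approached.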

Shared Foundational Data

  • Commoditization of data
  • Head attributes for people, places, things decreasing in value
    • hCard data value driven to zero (visual of local data being
      identical on thousands of apps)
    • Entertainment: IMDB exposed all their data for noncommercial
      use (link to site map)
    • Yet, there are still lots of errors in foundational data, thus
      need “living” model




LA Neighborhoods: Another Crowdsourcing Example




 • LA Times started with 87 neighborhoods based on census tracts
 • Incorporated 650+ user maps
 • Ended with 114 neighborhoods for LA City
 • Added 158 additional neighborhoods for LA County




Ownership & Rights: LA Neighborhoods


  • Terms of Service: Creative Commons Attribution, Noncommercial,
    Share-Alike license
  • Can share and remix as long as it’s for noncommercial uses,
    attributed to the LA Times, and shared under the same terms



Evolving “Buy” Model


 • Data Marketplaces (“iTunes of data?”)
 • Data Search Engines
 • Microformats / Semantic Web Markups / Other Standards
 • Electronic Forms of T&Cs




Summary: Road to the Information Singularity


 • Rise in community storage and access
 • New common schemas and standards
 • Definitive, accountable sources of “open” data
 • Trends towards sharing of foundational data
 • 'Buy' models based on unique data, novel access methods, SLAs,
   value-added services




Thank you!
              Questions......

Gil Elbaz
  @factual
  @gilelbaz

Factual 2011 Web 2.0 Presentation
