Lift12Fr - Stephane Grumbach

    1. BIG DATA? THE GLOBAL IMBALANCE! Stéphane Grumbach, INRIA
    2-5. The digital universe: 2.7 zettabytes
        Data deluge in all sectors of activity
        • U.S. Library of Congress: 235 terabytes of data
        • Walmart: 2.5 petabytes of data, 1 million customer transactions per hour
        • Facebook: 30 petabytes of user data
        • Google: processing 20 petabytes a day (2008)
        • World: 5 billion people calling, tweeting, browsing on mobile phones
        Exponential increase
        • doubles every two years
        • 35 zettabytes in 2020
        • followed by the capacity to store, compute, and communicate
        Prefixes: kilo 10^3, mega 10^6, giga 10^9, tera 10^12, peta 10^15, exa 10^18, zetta 10^21, yotta 10^24
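
The orders of magnitude on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper names `to_bytes` and `project` are hypothetical, and the base year 2012 for the 2.7 zettabyte figure is an assumption, since the slide does not state it. Under that assumption, a strict doubling every two years gives roughly 43 zettabytes in 2020, the same order of magnitude as the 35 zettabytes quoted.

```python
# Decimal prefixes as listed on the slide (exponents of ten).
PREFIX_EXPONENT = {
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12,
    "peta": 15, "exa": 18, "zetta": 21, "yotta": 24,
}


def to_bytes(value, prefix):
    """Convert a quantity such as (2.5, 'peta') to bytes."""
    return value * 10 ** PREFIX_EXPONENT[prefix]


def project(volume, years, doubling_period=2.0):
    """Volume after `years`, assuming it doubles every `doubling_period` years."""
    return volume * 2 ** (years / doubling_period)


universe = to_bytes(2.7, "zetta")   # the digital universe
facebook = to_bytes(30, "peta")     # Facebook user data
library = to_bytes(235, "tera")     # U.S. Library of Congress

# Ratios between the figures quoted on the slide.
print(f"universe / Facebook: {universe / facebook:,.0f}")
print(f"Facebook / Library of Congress: {facebook / library:,.0f}")

# ASSUMPTION: 2.7 ZB refers to 2012, so 2020 is 8 years (4 doublings) later.
print(f"projected 2020 volume: {project(2.7, 8):.1f} ZB")
```

The ratios make the scale gap concrete: the digital universe is about 90,000 times Facebook's user data, which is itself over a hundred times the Library of Congress figure.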
    6. The Big Data Industry: advertising
        • Capture user data
        • Generate user profiles
        • Target ads
    7. The Big Data Industry beyond advertising
        • $300 billion/year: US health care
        • €250 billion/year: Europe public administration [McKinsey 2011]
        Tremendous economic impact: teraeuros (thousands of billions)
    8-9. First challenge: data harvesting
        70% of the data is produced by individuals
        • directly produced by users: email, photos, blogs, etc. (less than half)
        • indirectly, as a digital shadow/footprint: surveillance, web usage, transactions
        The free paradigm of the Web 2.0
        • Free services traded for private user data
        • Free exploitation of the accumulated data
    10-11. Second challenge: knowledge extraction
        • User profiles (business) => ad targeting
        • Automatic discovery (science) => Google Flu: monitoring of flu-related queries
        • A search engine company knows everything => biological, sociological data...
        • NSA (security) => ambition to handle yottabytes (10^24)!
    12-14. Data: raw material of the 21st century (much like crude oil)
        • Oil: extraction from natural reservoirs, transport, refining, consumption at users
        • Data: production of data at users, Internet, accumulation in large repositories, data analytics
    15-17. Where are these data?
        Huge concentration of data
        • 85% of data handled by (large) corporations
        • Virtualization/dematerialization of infrastructures
        • Social networks, Cloud, ...
        Most of the prominent corporations are based in the USA
        • Google, Facebook, Amazon, Twitter, ...
        • Storage capacity of Europe = 70% of the USA [McKinsey 2011]
        • 1/3 of world data stored in the cloud by 2020
    18-20. Geopolitics of big data
        Data from the Web 2.0 is produced by users everywhere in the world, but accumulated by corporations most often abroad.
        Percentage of national web corporations among the top 25 by country (source: Alexa.com)
        • USA: 100%
        • China: 92% (only Google makes it in the top 25)
        • France: 36% (but mostly marginal sites, not data intensive): leboncoin, Orange, Free, commentcamarche, lemonde, lequipe, lefigaro, pagesjaunes, sfr
    21-23. Geopolitics of big data: the top 50 websites worldwide
        • USA: 72%
        • China: 16% (Baidu: 5; QQ: 8; Taobao: 13; Sina: 17; 163: 28; Soso: 29; Sina Weibo: 31; Sohu: 43)
        • Russia: 6% (Yandex: 21; VKontakte: 30; Mail: 33)
        • Israel: 2% (Babylon: 22)
        • UK: 2% (BBC: 46)
        • Netherlands: 2% (AVG: 47)
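
The percentages on this slide follow directly from counting sites per country among the 50. A minimal sketch of that arithmetic is given below; the function name `country_shares` is hypothetical, and the US count is not listed on the slide, so it is inferred here under the assumption that every site not attributed to another country is US-based.

```python
TOP_N = 50

# Ranks quoted on the slide for the non-US sites in the worldwide top 50.
NON_US_SITES = {
    "China": {"Baidu": 5, "QQ": 8, "Taobao": 13, "Sina": 17,
              "163": 28, "Soso": 29, "Sina Weibo": 31, "Sohu": 43},
    "Russia": {"Yandex": 21, "VKontakte": 30, "Mail": 33},
    "Israel": {"Babylon": 22},
    "UK": {"BBC": 46},
    "Netherlands": {"AVG": 47},
}


def country_shares(non_us_sites, top_n=TOP_N):
    """Return each country's share of the top `top_n` sites, in percent."""
    counts = {country: len(sites) for country, sites in non_us_sites.items()}
    # ASSUMPTION: every site not listed above is US-based.
    counts["USA"] = top_n - sum(counts.values())
    return {country: 100 * n / top_n for country, n in counts.items()}


shares = country_shares(NON_US_SITES)
for country, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {pct:.0f} %")
```

With the eight Chinese, three Russian, and three other sites listed, the remainder is 36 sites, which reproduces the 72% quoted for the USA.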
    24. Geopolitics of big data: diversity of search engines
        • USA: Google 65%; Bing 15%; Yahoo 15%
        • China: Baidu 78%; Google 16%
        • Russia: Yandex 60%; Google 25%
        • UK: Google 91%; Bing 5%
        • France: Google 92%; Bing 3%
        In France:
        • Google has a de facto monopoly
        • Google knows more about France than INSEE
    25. The global imbalance: information asymmetry
        “Since asymmetries of information give rise to market power, and perfect competition is required if markets are to be efficient, it is perhaps not surprising that markets with information asymmetries and other information imperfections are far from efficient.” JOSEPH E. STIGLITZ
    26. Impact of the global imbalance
        Regulation
        • What legislation over a dematerialized global industry?
        • Aren’t the rules defined by those who have the control?
        Business
        • How to face monopolistic positions?
        • How to handle the information asymmetry?
        Security
        • Data at the core of nations’ independence
    27-28. The power of data
        [Figure: map of the Ecological Footprint. Source: http://www.csa.com/discoveryguides/china/review.php]
    29. What’s at stake in Europe?
        Suspicion (fear?) regarding data
        • concern for privacy protection high in Europe
        • active legislative work
        • historical reasons?
        Weak industrial/innovation environment
        • no strong corporation emerging
        But essential dependence on foreign systems
    30-32. Are there alternatives?
        Dominant (centralized) model
        • unclear privacy
        • lost property
        • active (centralized) business
        • little share of business capacity
        Decentralized ‘utopian’ model (Faroo, Yacy, Diaspora)
        • high privacy
        • real ownership
        • little business
        An alternative path?
        • active (competitive) business
        • symmetry of information
        • ownership & privacy
        • anti-monopoly
    33-34. An alternative path for Europe?
        The information society
        • it is only emerging
        • it will continue to evolve
        • it will impact political systems
        • new business models, new equilibrium will appear
        Europe should embrace the future
    35-36. 谢谢 (Thank you)
