Real Time-Big Data-Social
 Network-Data Science-Gamified!




                                                  Jason Capehart
a.k.a. The Cascade Project                            12/12/12

(Okay … that last part of the title isn’t true)
1. Visualization

2. Data

3. Analysis
Show Me!
The Good, The Bad, The Ugly
Surely, You Must Be Joking.



Store        Examples
Key-Value    Hadoop, Memcached, Redis
Document     MongoDB, CouchDB
Graph        Neo4j, Giraph, Titan
Real Time    Storm, Impala
Citation:
Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News
Media? Proceedings of the 19th International World Wide Web (WWW) Conference (pp. 591-600).
Raleigh, NC: ACM.
Citation:
A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" SIAM
Review 51(4), 661-703 (2009). (arXiv:0706.1062, doi:10.1137/070710111)
800,000,000
   (that’s a lot of users)


   (cost = 200k for fire hose)
Sampled

                  Not Sampled




Citation:
Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free:
Sampling properties of networks. Proceedings of the National Academy of Sciences, 4221-4224.
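# Pseudo Code

from random import randint

# Guess a user ID uniformly at random (10**9, because ^ is XOR in Python)
id_guess = randint(0, 10**9)

# `api` is the API client the slide assumes; a missing ID means "guess again"
user = api.get_user(id = id_guess)

# Repeat until tired or rate limited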
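The pseudo code above can be sketched as a loop. This is only an illustration under stated assumptions: `fetch_user` and `RateLimited` are hypothetical stand-ins for whatever API client is in use, and the slides do not say how missing IDs or rate limits were actually handled.

```python
import random


class RateLimited(Exception):
    """Hypothetical signal that the API quota has been used up."""


def sample_users(fetch_user, max_id=10**9, attempts=1000, seed=None):
    """Probe random IDs and return {id: user} for the IDs that exist.

    fetch_user is a hypothetical callable: it returns a user record,
    raises LookupError when the ID is unassigned, and raises RateLimited
    when the API refuses further calls.
    """
    rng = random.Random(seed)
    found = {}
    for _ in range(attempts):
        id_guess = rng.randint(0, max_id)  # inclusive on both ends
        try:
            found[id_guess] = fetch_user(id_guess)
        except LookupError:
            continue  # no account behind this ID; guess again
        except RateLimited:
            break  # stop and keep what has been collected so far
    return found
```

Because every ID in the range is equally likely to be probed, the accounts that come back are a uniform sample of existing users, rather than a sample biased toward well-connected accounts as a crawl would be.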
Power Law (xmin = 281, α = 2.19)
Lognormal

Discrete Power Law vs. Lognormal
Loglikelihood Ratio       89.46
Vuong’s Test Statistic    7.14
p-val (1-sided)           >0.99
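A likelihood-ratio comparison of this kind can be sketched as follows. This is a minimal version of the normalized log-likelihood ratio described by Clauset et al., assuming the per-observation log-likelihoods under each fitted model are already available. The function name is made up for illustration, and the one-sided p-value convention used here (upper tail of a standard normal) is an assumption that may differ from the one behind the slide's numbers.

```python
import math


def vuong_statistic(loglik_a, loglik_b):
    """Compare two fitted models from their per-observation log-likelihoods.

    Returns (ratio, z, p_upper): the summed log-likelihood ratio, the
    normalized statistic, and the upper-tail probability P(Z >= z) under
    the null that neither model fits better.
    """
    if len(loglik_a) != len(loglik_b) or len(loglik_a) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    ratio = sum(diffs)
    mean = ratio / n
    variance = sum((d - mean) ** 2 for d in diffs) / n
    if variance == 0:
        raise ValueError("pointwise differences are constant")
    z = ratio / math.sqrt(n * variance)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))
    return ratio, z, p_upper
```

A positive ratio favors the first model, and the normalized statistic indicates whether the sign of that ratio is distinguishable from noise.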
Power Law (xmin = 222, α = 2.33)
Lognormal
Stretched Exponential
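For a fixed xmin, an exponent like the α values quoted above can be approximated in closed form. The sketch below uses the approximate estimator for discrete data given by Clauset et al., α ≈ 1 + n / Σ ln(x_i / (xmin − 1/2)). The slides do not say which estimator produced the reported values, so treat this as illustrative only; the function name is hypothetical.

```python
import math


def estimate_alpha(values, xmin):
    """Approximate power-law exponent for discrete data at or above xmin."""
    if xmin < 1:
        raise ValueError("xmin must be at least 1 for discrete data")
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    # Each term is positive because x >= xmin > xmin - 0.5.
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum
```

Only observations at or above xmin enter the estimate, so the choice of xmin changes both the sample size and the fitted exponent.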
• Conclusions = None!
  – All work is in progress



• Discussion
  – Cascade uses open source
  – Opportunities to give back?
References
1. Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review,
   51(4), 661-703. (arXiv:0706.1062, doi:10.1137/070710111)
      – Code: http://tuvalu.santafe.edu/~aaronc/powerlaws/
2. Newman, M. (2005, September-October). Power laws, Pareto distributions and Zipf's law. Contemporary Physics,
   46(5), 323-351.
3. Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? Proceedings
   of the 19th International World Wide Web (WWW) Conference (pp. 591-600). Raleigh, NC: ACM.
4. Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free: Sampling properties of
   networks. Proceedings of the National Academy of Sciences, 4221-4224.

Editor's Notes

  1. Nature of the Beast, comparison to weblogs