Ori Pekelman @ Open World Forum 2014 
Combining Big Data & Open Source strategy
Me 
I am an entrepreneur and a consultant. Check out http://platform.sh, which I have been working on a lot lately.
I am the originator and co-organizer of a number of meetups, such as the Functional Languages User Group (which, by the way, is happening right now…) and the big informal data group we call ParisDataGeeks (with people like Olivier Grisel and Sam Bessalah; by the way, this one runs all day tomorrow!)
On social media I am @OriPekelman
OWF 2014 
2
Big Data Small Talk 
This is a short talk. There won’t be anything overly technical here. 
I don’t remember how this came to be the title of the talk...
If you come tomorrow, you will get an incredible bird's-eye view of current trends in real-time, big, machine-learning-y data applications.
Data this Data that 
Data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data … data 
Everybody loves the data. 
Data this Data that 
Well, are contractions so hard that, with 100 petabytes, we can’t do some simple Markov chains in the 24th century?
We say “Big Data” so often these days that it has become an extremely vague term.
And when we say “Open Data” we get the same form of vagueness. Let’s try to frame this.
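For readers who have not met one, a "simple Markov chain" over text fits in a few lines. This is an illustrative first-order word model, not something taken from the talk: it records which word follows which, then walks those transitions at random. The names `build_chain` and `generate` are hypothetical.

```python
import random
from collections import defaultdict


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words observed immediately after it."""
    words = text.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return dict(chain)


def generate(chain: dict[str, list[str]], start: str, length: int, seed: int = 0) -> list[str]:
    """Walk the chain from `start`. Because followers are stored with
    repetition, each next word is picked in proportion to how often it
    followed the current word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:  # dead end: this word never had a successor
            break
        out.append(rng.choice(followers))
    return out
```

For example, `build_chain("data this data that data this")` maps `"data"` to `["this", "that", "this"]`, so a walk from `"data"` picks `"this"` twice as often as `"that"`. Larger corpora and longer contexts change the scale, not the mechanism.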
What applications of big data are we talking about here? 
The machine learning kind; everything else is mostly trivial or just a bit of engineering away.
When we say machine learning, it basically means:
Ridiculous amounts of data
100% proprietary, mostly about intimate human interactions
Software 
Mostly Open Source 
Model 
Mostly Opaque. Mostly Closed. Some with APIs 
Robust Predictions 
Mostly about human behaviour
The ingredients 
Data sources 
Proprietary and closed (property of whom?) 
Proprietary with some APIs 
Open with an open license 
Software 
Proprietary 
Free 
Stuff to run the software on 
Proprietary 
Models 
Proprietary and closed 
Proprietary with some APIs 
? 
Data property 
Us as individuals to a very faint degree 
Governments
Google 
Apple 
Credit Card Companies and banks 
People we haven’t heard about 
On the software ingredient 
Big Data is predominantly an Open Source game 
How much big data software is not prefixed by « Apache »? 
Laws of data. I like laws. 
« Data expands to fill the space available for storage. » 
Parkinson’s law applied to data 
« Free disk space is always pronounced in percentage, and the percentage is always a single digit » 
My father 
Parkinson’s law 
Cloud technologies represent the ultimate phase in the commoditization of storage and computing power.
Capacity becomes limited only by cost (well, at least in theory).
So if we take Parkinson’s law to the letter, data will expand until we have spent humanity’s last dime.
The Cloud, Data and Free Software 
The cloud is, at the very least, orthogonal to the basic idea of free software (the libre variety).
What makes free software economically possible is that the marginal cost of duplicating code tends to zero.
The marginal cost of duplicating data, by contrast, grows at best linearly with its volume, and because of Parkinson's law, probably faster than that.
This means that, among the ingredients listed earlier, “data” will by nature be mostly proprietary, because its cost is directly linked to that of machines, and Moore’s law is of no help.
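The marginal-cost argument can be written out as a rough model. All symbols here are assumptions introduced for illustration: C(n) is the total cost of producing the original plus n copies, c_0 the one-off cost of producing the original (it differs between code and data but drops out of the marginal cost), ε the cost of one extra copy of code, s(t) the size of the dataset at time t and p(t) the price of storing one unit of data at time t.

```latex
\begin{aligned}
C_{\text{code}}(n) &= c_0 + n\,\varepsilon, &\quad \Delta C_{\text{code}} &= C_{\text{code}}(n+1) - C_{\text{code}}(n) = \varepsilon \approx 0,\\
C_{\text{data}}(n) &= c_0 + n\,p(t)\,s(t), &\quad \Delta C_{\text{data}} &= C_{\text{data}}(n+1) - C_{\text{data}}(n) = p(t)\,s(t).
\end{aligned}
```

Under this reading, Parkinson’s law says s(t) grows to fill whatever storage p(t) makes affordable, so the product p(t)s(t), the cost of one more copy, does not trend toward zero the way ε does.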
Models 
Models are better than data 
They are less sparse, more dense
They are data-reduced
They always give an answer
They are immediately useful
It’s like the whole Data -> Information -> Knowledge (+ Wisdom?) thing
As we noted before the models we are talking about are mostly Opaque, they do not generate Wisdom. 
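A toy illustration of "data reduced" and "they always give an answer", using ordinary least squares as a stand-in for the far larger, opaque models the talk has in mind. The choice of linear regression, and the names `fit_line` and `LinearModel`, are illustrative assumptions. However many points go in, the model that comes out is two numbers, and it answers for inputs that never appeared in the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """A fitted line y = slope * x + intercept: two numbers standing in for the data."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        # Always returns an answer, even for x values never seen in the data.
        return self.slope * x + self.intercept


def fit_line(xs: list[float], ys: list[float]) -> LinearModel:
    """Ordinary least squares for a single input variable."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all x values are identical; slope is undefined")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return LinearModel(slope, mean_y - slope * mean_x)
```

Fitting five sparse points of y = 2x + 1 recovers a slope of 2 and an intercept of 1, and `predict(100.0)` then answers 201 for an x far outside the data. Behind a predictive API you would get `predict` and not the two numbers, which is the opacity the slide is pointing at.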
Laws of data. I like laws. 
«Information wants to be free » 
Stewart Brand 
Well, this one is less a law in the physical sense and more a moral one. We will get back to this at the end.
Laws.. 
“Hybrid data makes all your data big” 
I think that's me... but you know, zeitgeist.
Hybrid data denotes “Data Applications” where the data comes from both your own internal sources and external sources, whether open or proprietary.
Often enough, mixing data sources has a combinatorial effect, so data locality becomes really important.
Using Predictive APIs means building a Hybrid Data application where you only have access to the resulting model.
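One concrete way mixing sources "makes all your data big": when an internal table and an external one share a non-unique key (a customer ID, a postcode), an inner join emits one row per matching pair, so the output can be larger than either input. The sketch below only counts those rows; `join_cardinality` is a hypothetical helper written for this illustration.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def join_cardinality(left_keys: Iterable[Hashable], right_keys: Iterable[Hashable]) -> int:
    """Number of rows an inner equi-join produces: for every key present on
    both sides, multiply how often it appears on the left by how often it
    appears on the right."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    return sum(count * right[key] for key, count in left.items() if key in right)


# Hypothetical keys: 3 internal rows joined with 5 external rows.
internal = ["alice", "alice", "bob"]
external = ["alice", "alice", "alice", "bob", "carol"]
rows = join_cardinality(internal, external)  # 2*3 for alice + 1*1 for bob
```

Here 3 internal rows and 5 external rows yield 7 joined rows. Repeat the exercise across several external sources and the growth compounds, which is also why it pays to run the join where the larger dataset already lives rather than moving it.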
Watson in the mix 
ML requires data. The bigger it gets, the more robust your models will be.
Open source mostly commoditizes the algorithmic and software layer; there is not a lot of secret sauce there.
Players with the most data will probably be able to build more robust models.
And as basically all “Data Applications” will be hybrid ones, we will see more and more applications dependent on external, derived, opaque models.
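A minimal illustration of "the bigger it gets, the more robust you will be", using the simplest possible model, an estimate of a mean, as a stand-in for real ML models. This is an assumption for illustration; the talk makes no specific statistical claim. For this estimator the error shrinks roughly as 1/sqrt(n) with sample size n, so whoever holds a hundred times more data gets roughly ten times less error from the same open source code.

```python
import random
import statistics


def mean_abs_error(sample_size: int, trials: int = 200, seed: int = 42) -> float:
    """Average absolute error when estimating the mean (true value 0.0) of a
    standard normal distribution from `sample_size` draws, over `trials` runs."""
    rng = random.Random(seed)  # fixed seed so the comparison is reproducible
    errors = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        errors.append(abs(statistics.fmean(sample)))
    return statistics.fmean(errors)
```

Comparing `mean_abs_error(10)` with `mean_abs_error(1000)` shows the larger sample giving a much smaller error with identical code; only the amount of data differs.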
Predictive APIs 
The “as a service” crowd is becoming the more potent rival to free software
While most of them will run open source solutions anyway
Most of the value will remain proprietary, and these robust models are going to be at least as important as the software
As a company, going into this blindly means you might very well find yourself extremely dependent on others for some of your core operations
Free software alone will not defend you
In 2014, this is already happening
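A sketch of the dependency described above, with hypothetical names throughout: `PredictiveAPI`, `FakeVendorModel` and `rank_customers` are illustrative and do not correspond to any vendor's interface. The application owns its internal data and its open source code, but the step that turns that data into a decision is a call to a model it cannot inspect, reproduce or move.

```python
from typing import Protocol


class PredictiveAPI(Protocol):
    """What a hybrid data application sees of an external model: a score
    for an input, and nothing about how the score was produced."""

    def predict(self, features: dict[str, float]) -> float: ...


class FakeVendorModel:
    """Hypothetical local stand-in for a remote, proprietary model."""

    def predict(self, features: dict[str, float]) -> float:
        # In the real case this logic lives on someone else's servers;
        # the caller cannot inspect or retrain it.
        return 0.8 if features.get("visits", 0.0) > 10 else 0.2


def rank_customers(internal: dict[str, dict[str, float]], model: PredictiveAPI) -> list[str]:
    """Combine internal customer data with an external model's scores.
    The ranking, a core operation, depends entirely on the opaque model."""
    scores = {cid: model.predict(feats) for cid, feats in internal.items()}
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)
```

Swapping in a different vendor's `predict` silently changes the ranking without any change to your own code or data, which is exactly the kind of dependency free software alone does not protect against.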
It’s a social issue too 
There is a strong ethical reason we want to fight not only for open source but also for open data 
The advent of opaque systems with smart algorithms and an extreme amount of data about us (the proprietary data + as-a-service model) is not only going to be bad for our privacy; it is going to have tangible effects on our livelihoods and on our place in society, because it can introduce an extreme form of information asymmetry at a scale not seen before.
In this domain more than in others, the actors of free software need to be vigilant and, by working with the other actors of freedom, make sure we are not building the tools of our own demise.
Information wants to be free 
Well, if you are stuck in the 2000s and do nightly batches, you are probably not managing your own internal data wealth well. So get on it.
Learn about what we can currently do in machine learning. Start having a plan.
Don’t hoard the data. Open it, at least to some extent.
Collaborate on the economic and social framework for open data and open models.
Either because you are a government and you have a moral obligation to defend your citizens. 
Or because if you become a consumer only you will not be able to manage your dependency on external opaque sources. 
#ParisDataGeeks 
… and come tomorrow starting at 9am for talks such as: 
Algebird: algebra for efficient big data processing. Abstract algebra for data mining, by Sam Bessalah (Software Engineer, Independent)
Context Awareness: from NEST to Google Now and IFTTT. In this talk we will go through some of the most successful use cases of context awareness and explain some of the technology behind the pocket brain we are currently building at Snips, by Dr. Rand Hindi
Apache Kafka: distributed publish-subscribe messaging system, by Charly Clairmont (CTO, Altic)
Data Encoding and Metadata for Streams, by Jonathan Winandy (Founder, Primatice)
Next Open Source Big Data Suite: a new low-level approach for big data, by Emmanuel Keller (CEO/CTO, OpenSearchServer)
State of the Art in Machine Learning, by Olivier Grisel (Software Engineer, Inria)
Take Back Control of Your Web Tracking: go further by doing it yourself, by Clément Stenac (CTO, Dataiku)
Real-Time Energy Data Analysis with Apache Storm, by Simon Maby (Software Architect, Octo Technology)
OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix

  • 1. Ori Pekelman @ Open World Forum 2014 Combining Big Data & Open Source strategy
  • 2. Me I am an entrepreneur and a consultant check out http://platform.sh on which I have been working a lot on lately I am the originator and co-organizer of a bunch of meetups such as the Functional Languages User Group (btw happenning right now…) and the big informal Data group we call ParisDataGeeks (with people like Olivier Grisel and Sam Bessalah.. And btw this one happens all day tomorrow!) On Social Media I am @OriPekelman OWF 2014 2
  • 3. Big Data Small Talk This is a short talk. There won’t be anything overly technical here. I don’t remember how this got to be the title of the talk.. If you come tomorrow you will get an incredible birds-eye view of current trends in real time big machine learningy data applications OWF 2014 3
  • 4. Data this Data that Data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data data … data Everybody loves the data. OWF 2014 4
  • 5. Data this Data that
  • 6. Data this Data that: Well, are contractions so hard that with 100 petabytes we can’t do some simple Markov chains in the 24th century? We say “Big Data” so often these days that it has become an extremely vague term. And when we say “Open Data” we get the same kind of vagueness. Let’s try to frame this.
  • 7. What applications of big data are we talking about here? The machine learning kind; everything else is mostly trivial or just a bit of engineering away. When we say Machine Learning it basically means: Data: a ridiculous amount of it, 100% proprietary, mostly about intimate human interactions. Software: mostly Open Source. Model: mostly opaque, mostly closed, some with APIs. Robust predictions: mostly about human behaviour.
  • 8. The ingredients: Data sources: proprietary and closed (property of whom?); proprietary with some APIs; open, with an open license. Software: proprietary; free. Stuff to run the software on: proprietary. Models: proprietary and closed; proprietary with some APIs; ?
  • 9. Data property: us as individuals (to a very faint degree); governments; Google; Apple; credit card companies and banks; people we haven’t heard about.
  • 10. On the software ingredient: Big Data is predominantly an Open Source game. How much big data software is not prefixed by « Apache »?
  • 11. Laws of data. I like laws. « Data expands to fill the space available for storage. » (Parkinson’s law applied to data) « Free disk space is always expressed as a percentage, and the percentage is always a single digit. » (my father)
  • 12. Parkinson’s law: Cloud technologies represent the ultimate phase in the commoditization of storage and computing power, which become limited only by cost (well, at least in theory). So if we take Parkinson’s law to the letter, data will expand until we have spent humanity’s last dime.
  • 13. The Cloud, Data and Free Software: The cloud is, at the very least, orthogonal to the basic idea of free software (the libre variety), because what makes free software economically possible is that the marginal cost of duplicating code tends to zero. The marginal cost of duplicating data grows at best linearly and, because of Parkinson’s law, probably faster than that. This means that in the list of ingredients we noted before, “data” will by nature be mostly proprietary, because its cost is directly linked to that of machines and Moore’s law is of no help.
  • 14. Models Models are better than data They are less sparse , more dense They are data reduced They always give an answer They are immediately useful Its like the thing with Data->Information->Knowledge + (Wisdom?) As we noted before the models we are talking about are mostly Opaque, they do not generate Wisdom. OWF 2014 14
  • 15. Laws of data. I like laws. « Information wants to be free » (Stewart Brand). Well, this one is less of a law in the sense of a physical one and more of a moral one. We will get back to this at the end.
  • 16. Laws... “Hybrid data makes all your data big” (I think that’s me... but you know, zeitgeist). Hybrid data denotes “Data Applications” where the data comes from your own internal data sources combined with open or proprietary external sources. Mixing data sources often has a combinatorial effect. Data locality becomes really important. Using Predictive APIs means building a Hybrid Data application where you only have access to the resulting model.
  • 17. Watson in the mix: ML requires data; the bigger the data gets, the more robust your models will be. Open Source mostly commoditizes the algorithmic and software layer; there is not a lot of secret sauce there. Players with the most data will probably be able to build more robust models. And as basically all “Data Applications” will be hybrid ones, we will see more and more applications dependent on external, derived, opaque models.
  • 18. Predictive APIs: The “as a service” crowd is becoming the more potent rival to free software. While most of them will run Open Source solutions in any case, most of the value will remain proprietary, and these robust models are going to be at least as important as the software. As a company, going into this blindly means you might very well find yourself extremely dependent on others for some of your core operations. Free software alone will not defend you.
  • 19. 2014: this is happening already
  • 20. It’s a social issue too: There is a strong ethical reason to fight not only for open source but also for open data. The advent of opaque systems with smart algorithms and an extreme amount of data about us (the proprietary data + as-a-service model) is not only going to be bad for our privacy; it is going to have tangible effects on our livelihoods and on our place in society, as it can introduce an extreme form of information asymmetry at a scale not seen before. In this domain, more than in others, the actors of Free Software need to be more vigilant and, by working with the other actors of freedom, make sure we are not constructing the tools of our demise.
  • 21. Information wants to be free: Well, if you are stuck in the 2000s and do nightly batches, you are probably not managing your own internal data wealth well. So get on it. Learn about what we can currently do in Machine Learning. Start having a plan. Don’t hoard the data; open it, at least to some extent. Collaborate on the economic and social framework for open data and open models, either because you are a government and you have a moral obligation to defend your citizens, or because if you become a consumer only, you will not be able to manage your dependency on external opaque sources.
  • 22. #ParisDataGeeks … and come tomorrow starting at 9am for talks such as: Algebird: algebra for efficient big data processing (abstract algebra for data mining), by Sam Bessalah (Software Engineer, Independent); Context Awareness: from NEST to Google Now and IFTTT, in this talk we will go through some of the most successful use cases of context awareness and explain some of the technology behind the pocket brain we are currently building at Snips, by Dr. Rand Hindi; Apache Kafka: distributed publish-subscribe messaging system, by Charly Clairmont (CTO, Altic); Data encoding and metadata for streams, by Jonathan Winandy (Founder, Primatice); Next Open Source Big Data Suite: a new low-level approach for BigData, by Emmanuel Keller (CEO/CTO, OpenSearchServer); State of the Art in Machine Learning, by Olivier Grisel (Software Engineer, Inria); Take back control of your web tracking: go further by doing it yourself, by Clément Stenac (CTO, Dataiku); Real-time energy data analysis with Apache Storm, by Simon Maby (Software Architect, Octo Technology).
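Slide 13’s cost argument can be written as a back-of-the-envelope formula. This is a sketch under simplifying assumptions that are not in the talk: a piece of software of size s_c and a dataset of size s_d, every copy stored at a constant price p per byte, a one-off development cost C_dev and collection cost C_collect, and n copies in total.

```latex
\begin{aligned}
C_{\text{code}}(n) &= C_{\text{dev}} + n\, s_c\, p, &\quad \frac{\partial C_{\text{code}}}{\partial n} &= s_c\, p \approx 0 \quad (s_c \ll s_d)\\
C_{\text{data}}(n) &= C_{\text{collect}} + n\, s_d\, p, &\quad \frac{\partial C_{\text{data}}}{\partial n} &= s_d\, p
\end{aligned}
```

Under these assumptions the marginal cost of a data copy is linear in the dataset size, and if s_d itself keeps growing (Parkinson’s law), the cost of each additional copy grows with it.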
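Slide 14’s claim that models are “data reduced” and “always give an answer” can be made concrete with the simplest model there is. The sketch below is an illustration chosen for this page, not something from the talk: it fits a straight line by ordinary least squares to hypothetical, noise-free points, summarizing 2,000 numbers in two parameters, and then answers for an input that never appeared in the data. The names `fit_line` and `predict` are hypothetical.

```python
"""Illustration only: a fitted model as a compressed summary of data."""


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical observations: 1,000 noise-free points on y = 2x + 1.
xs = [float(i) for i in range(1000)]
ys = [2.0 * x + 1.0 for x in xs]

# 2,000 numbers in, 2 parameters out.
a, b = fit_line(xs, ys)


def predict(x):
    """The model answers for any x, including inputs not in the data."""
    return a * x + b


print(len(xs) * 2, "numbers reduced to 2 parameters:", a, b)
print(predict(1234.5))
```

Unlike the opaque models the talk worries about, this one is fully inspectable: its two parameters are the whole model.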
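The “combinatorial effect” of mixing sources on slide 16 appears as soon as two sources are joined on a key that is not unique on either side: each key contributes the product of its matching row counts. The sketch below uses made-up tables (`internal` orders and `external` weather readings, both hypothetical) and a minimal hand-written hash join, not any particular library’s API.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Hash join: index `right` by key, then pair every matching row."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # .get() avoids inserting empty entries for keys missing from `right`.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical internal data: 3 orders per city.
internal = [{"city": c, "order_id": f"{c}-{i}"}
            for c in ("Paris", "Lyon") for i in range(3)]
# Hypothetical external source: 4 readings per city.
external = [{"city": c, "reading": j}
            for c in ("Paris", "Lyon") for j in range(4)]

joined = inner_join(internal, external, "city")
print(len(internal), len(external), len(joined))  # 6 8 24
```

Six internal rows and eight external rows become 24 joined rows (3 × 4 per city); at real volumes the same multiplication is one way hybrid data can make all your data big.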