What is hack/reduce?

• A Home for the Big Data Community
• 24/7 Access to Cluster Compute Power
• Regular Hackathons
hack/reduce
• 2011: Montreal, Toronto, Boston, Ottawa
• 2012: hack/reduce, Boston’s Big Data Hackspace
Why should you care?
• Work with millions and billions of records
• Find patterns in Big Data sets
• Use data to detect, predict and forecast
• Extract new information from raw data
APIs Suck
In Big Data there are:
    •   no requests,
    •   no predefined parameters,
    •   no structured responses.
You are free to intersect anything with anything.
You can analyse, mutate, group, split and reorder data in any
way you can imagine.
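"Intersect anything with anything" can be made concrete with a reduce-side (repartition) join, which is one common way to combine two data sets in MapReduce. The map step tags each record with the data set it came from and emits it under the join key. Grouping then brings matching keys together, and the reduce step pairs the two sides. This is a minimal sketch that runs in memory in plain Java, not on Hadoop. The class and method names, the "key,payload" line format and the sample records are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoin {

    // Map + group: emit every "key,payload" line from either data set under
    // its key, tagged "left" or "right" so the reducer knows its origin.
    static Map<String, List<Map.Entry<String, String>>> mapAndGroup(
            List<String> left, List<String> right) {
        Map<String, List<Map.Entry<String, String>>> groups = new TreeMap<>();
        for (String line : left) emit(groups, "left", line);
        for (String line : right) emit(groups, "right", line);
        return groups;
    }

    // Assumes every line contains a comma separating the key from the payload.
    static void emit(Map<String, List<Map.Entry<String, String>>> groups,
                     String tag, String line) {
        String[] kv = line.split(",", 2);
        groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(Map.entry(tag, kv[1]));
    }

    // Reduce: pair every left payload with every right payload for one key.
    static List<String> reduce(String key, List<Map.Entry<String, String>> tagged) {
        List<String> lefts = new ArrayList<>();
        List<String> rights = new ArrayList<>();
        for (Map.Entry<String, String> t : tagged) {
            if (t.getKey().equals("left")) {
                lefts.add(t.getValue());
            } else {
                rights.add(t.getValue());
            }
        }
        List<String> joined = new ArrayList<>();
        for (String l : lefts) {
            for (String r : rights) {
                joined.add(key + "," + l + "," + r);
            }
        }
        return joined;
    }

    // Local driver: group both inputs, then reduce each key in key order.
    static List<String> join(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        mapAndGroup(left, right).forEach((key, tagged) -> out.addAll(reduce(key, tagged)));
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: station names, and bike counts from status updates.
        List<String> stations = List.of("s1,Station A", "s2,Station B");
        List<String> updates = List.of("s1,12", "s1,9", "s3,4");
        join(stations, updates).forEach(System.out::println);
        // prints "s1,Station A,12" then "s1,Station A,9"
    }
}
```

Keys that appear on only one side (here `s2` and `s3`) produce no output, so this is an inner join. An outer join would emit those keys as well.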
What you can do today
• Access the hack/reduce GoGrid cluster:
 • 240 cores
 • 240 GB of RAM
 • 10 TB of disk
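Here is a back-of-the-envelope reading of these numbers. The RAM figure is an average over the whole cluster. The storage figure assumes two things the slide does not state: that all 10 TB is given to HDFS, and that HDFS keeps Hadoop's default replication factor of 3.

```latex
\frac{240\ \text{GB}}{240\ \text{cores}} = 1\ \text{GB of RAM per core (on average)}
\qquad
\frac{10\ \text{TB}}{3\ \text{replicas}} \approx 3.3\ \text{TB of unique data in HDFS}
```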
What you can do today
Use Hadoop to explore big open data sets, such as:
  • 20 years of the federal Parliament's Hansard
  • Hourly Canadian weather, 1953 to 2001
  • The 1881 census: details about 4.3M people
  • One summer of Bixi station status updates
What is Map/Reduce?
• Framework for distributed computing on
  large data sets across clusters of computers
• MapReduce is patented by Google
• Hadoop is an open-source implementation
  modelled on Google's design
• Michael Stonebraker hates it (with David DeWitt,
  he called it "a major step backwards")
What is Map/Reduce?
• Map = function applied in parallel to every
  item in the dataset, emitting (key, value) pairs
• Reduce = function applied in parallel to each
  group of values the Map function emitted under the same key
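The two definitions above can be mimicked on a single machine with Java's parallel streams. This is a minimal sketch, not Hadoop. The class and method names and the "stationId,temperature" record format are hypothetical. The in-memory grouping step stands in for Hadoop's shuffle, which moves data between machines. As in Hadoop, the order of values within a group is not guaranteed.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MapReduceSketch {

    // Map: apply `mapper` to every input record in parallel; each call may emit
    // any number of (key, value) pairs. The emitted pairs are then grouped by
    // key (a local stand-in for the shuffle) and `reducer` is applied to each
    // key's group of values, again in parallel.
    public static <I, K, V, R> Map<K, R> run(List<I> input,
                                             Function<I, List<Map.Entry<K, V>>> mapper,
                                             BiFunction<K, List<V>, R> reducer) {
        Map<K, List<V>> groups = input.parallelStream()
                .flatMap(record -> mapper.apply(record).stream())
                .collect(Collectors.groupingByConcurrent(
                        Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
        return groups.entrySet().parallelStream()
                .collect(Collectors.toConcurrentMap(
                        Map.Entry::getKey,
                        group -> reducer.apply(group.getKey(), group.getValue())));
    }

    public static void main(String[] args) {
        // Hypothetical hourly weather records in "stationId,temperatureC" form.
        List<String> readings = List.of(
                "station-1,-12.5", "station-2,-3.0", "station-1,-4.0",
                "station-3,-15.5", "station-2,1.5");
        // Map: parse one record and emit (station, temperature).
        Function<String, List<Map.Entry<String, Double>>> mapper = line -> {
            String[] fields = line.split(",");
            return List.of(Map.entry(fields[0], Double.parseDouble(fields[1])));
        };
        // Reduce: keep the highest temperature seen for one station.
        BiFunction<String, List<Double>, Double> reducer =
                (station, temps) -> Collections.max(temps);
        Map<String, Double> maxByStation = run(readings, mapper, reducer);
        // prints {station-1=-4.0, station-2=1.5, station-3=-15.5}
        System.out.println(new TreeMap<>(maxByStation));
    }
}
```

Because `reducer` sees the whole group for a key, it is not limited to associative operations like `max`. It could just as well compute a median or build a list.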
What is Map/Reduce?

map(String docId, String document):
  for each word w in document: emit(w, 1);

reduce(String word, Iterator counts):
  int sum = 0;
  for each count in counts: sum += count;
  emit(word, sum);
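The word-count pseudocode above translates almost line for line into plain Java. This sketch runs on one machine without Hadoop. The `run` method is a hypothetical local driver that plays the framework's role: it calls `map` on every document, groups the emitted pairs by word, and calls `reduce` once per word. Words are split on whitespace, an assumption the pseudocode leaves open. In a real Hadoop job, the two functions would instead live in subclasses of `org.apache.hadoop.mapreduce.Mapper` and `Reducer`, with `emit` becoming `context.write(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCount {

    // map(docId, document): emit (w, 1) for each word w in the document.
    // docId is unused, as in the pseudocode.
    static List<Map.Entry<String, Integer>> map(String docId, String document) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String w : document.split("\\s+")) {
            if (!w.isEmpty()) { // a leading space yields one empty token
                emitted.add(Map.entry(w, 1));
            }
        }
        return emitted;
    }

    // reduce(word, counts): sum all the counts emitted for one word.
    static Map.Entry<String, Integer> reduce(String word, Iterable<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return Map.entry(word, sum);
    }

    // Local driver: map every document, group values by word, reduce each group.
    static Map<String, Integer> run(Map<String, String> documents) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(doc.getKey(), doc.getValue())) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            Map.Entry<String, Integer> out = reduce(g.getKey(), g.getValue());
            result.put(out.getKey(), out.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of(
                "doc1", "to be or not to be",
                "doc2", "to hack is to reduce");
        // prints {be=2, hack=1, is=1, not=1, or=1, reduce=1, to=4}
        System.out.println(run(docs));
    }
}
```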
Private key ("hackreduce"):
http://bit.ly/X13pNh

Wiki:
http://github.com/hackreduce/Hackathon

SSH:
ssh -i hackreduce hackreduce@cluster-

MapReduce:
http://cluster-1-master.gg.hackreduce

Hack reduce introduction

Speaker notes

  1. We are Hopper. Hopper uses Big Data to solve travel planning.
  2. Hopper’s Montreal office was home to the inaugural hack/reduce event two years ago.
  3. hack/reduce is a community. We held 4 events, in Montreal, Toronto, Boston and Ottawa, and more than 300 hackers participated. Now we’re building a permanent hack/reduce community hackspace in Boston.
  4. We are Hopper. Hopper uses Big Data to solve travel planning.
  5. GoGrid is sponsoring the cluster.
  6. GoGrid is sponsoring the cluster.
  7. If you’re interested in learning something different, come talk to us.
  8. If you’re interested in learning something different, come talk to us.