Data Engine

•

0 j'aime•900 vues

Slides to accompany Patrick McSweeney's winning pitch in the Open Repositories 2012 DevCSI Developer Challenge. More information about this entry can be found at http://devcsi.ukoln.ac.uk/or2012-developer-challenge-data-engine

Formation Technologie

Dave Mills
● PhD electrical engineering
● 10-15 Number of experiments

per month
● Raw data: 1 GB

● Processed data : 5-10 MB

● Processed with MATLAB.

●
Raw data when zipped: 450 MB

Dave
Patrick

State of the onion
Researchers are using many different methods to collect or generate data from sensors
and CCDs to supercomputers and particle colliders. When the data finally shows up in
your computer, what do you do with all this information that is now in your digital
shoebox? People are continually seeking me out and saying, “Help! I’ve got all this data.
What am I supposed to do with it? My Excel spreadsheets are getting out of hand!”

The suggestion that I have been making is that we now have terrible data management
tools for most of the science disciplines. Commercial organizations like Walmart can
afford to build their own data management software, but in science we do not have that
luxury. At present, we have hardly any data visualization and analysis tools. Some
research communities use MATLAB, for example, but the funding agencies in the U.S.
and elsewhere need to do a lo more to foster the building of tools to make scientists
more productive. When you go and look at what scientists are doing, day in and day out,
in terms of data analysis, it is truly dreadful. And I suspect that many of you are in the
same state that I am in where essentially the only tools I have at my disposal are
MATLAB and Excel!

Take home
● An important step on the road to data science
● Make the repository a tool
● Get the data at the point of creatation
● Repeatable experiments

Contenu connexe

Tendances

Introduction to Python for Data ScienceArc & Codementor

Session 10 handling bigger databodaceacat

Open Data, Big Data and Machine LearningSteven Van Vaerenbergh

Data Science Project LifecycleJason Geng

Introduction to Data ScienceEdureka!

Session 01 designing and scoping a data science projectbodaceacat

Is one enough? Data warehousing for biomedical researchGreg Landrum

Quantitative Methods for Lawyers - R Boot Camp Bonus Module - Professor Danie...Daniel Katz

Machine Learning using Big data Vaibhav Kurkute

Big Data AnalyticsIMC Institute

Intro to Python for Data ScienceTJ Stalcup

Application of Clustering in Data Science using Real-life Examples Edureka!

Data Science Isn't a Fad: Let's Keep it That WayMelinda Thielbar

Introduction_OF_Hadoop_and_BigDataNilay Mishra

Python for Data ScienceGabriel Moreira

Data Science: Past, Present, and FutureGregory Piatetsky-Shapiro

Overview of bigdataAbinaya B

Make Accumulated Data in Companies Eloquent by SQL Statement Constructors (PDF)Toshiyuki Shimono

Life of a data scientist (pub)Buhwan Jeong

"Research Data: Management, Access, Control" Symposium at the University at B...Charles Lyons

Tendances (20)

Introduction to Python for Data Science

Session 10 handling bigger data

Open Data, Big Data and Machine Learning

Data Science Project Lifecycle

Introduction to Data Science

Session 01 designing and scoping a data science project

Is one enough? Data warehousing for biomedical research

Quantitative Methods for Lawyers - R Boot Camp Bonus Module - Professor Danie...

Machine Learning using Big data

Big Data Analytics

Intro to Python for Data Science

Application of Clustering in Data Science using Real-life Examples

Data Science Isn't a Fad: Let's Keep it That Way

Introduction_OF_Hadoop_and_BigData

Python for Data Science

Data Science: Past, Present, and Future

Overview of bigdata

Make Accumulated Data in Companies Eloquent by SQL Statement Constructors (PDF)

Life of a data scientist (pub)

"Research Data: Management, Access, Control" Symposium at the University at B...

Similaire à Data Engine

Data science presentationMSDEVMTL

Getting to Know Your Data with RStephen Withington

Bridging Big Data and Data Science Using Scalable WorkflowsIlkay Altintas, Ph.D.

NYC Open Data Meetup-- Thoughtworks chief data scientist talkVivian S. Zhang

Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey

Thinkful DC - Intro to Data Science TJ Stalcup

Materials for getting started with data scienceihansel

JavaZone 2018 - A Practical(ish) Introduction to Data ScienceMark West

POWRR Tools: Lessons learned from an IMLS National Leadership GrantLynne Thomas

What Managers Need to Know about Data ScienceAnnie Flippo

Intro to Data ScienceTJ Stalcup

NDC Oslo : A Practical Introduction to Data ScienceMark West

Big data visualization frameworks and applications at Kitwarebigdataviz_bay

Building Data Products with Python (Georgetown)Benjamin Bengfort

Big Data Talent in Academic and Industry R&DUniversity of Washington

OpenML data@SheffieldJoaquin Vanschoren

Python's Role in the Future of Data AnalysisPeter Wang

Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesKrishna Sankar

00-01 DSnDA.pdfSugumarSarDurai

Data Science towards the Digital EnterpriseJake Bouma

Similaire à Data Engine (20)

Data science presentation

Getting to Know Your Data with R

Bridging Big Data and Data Science Using Scalable Workflows

NYC Open Data Meetup-- Thoughtworks chief data scientist talk

Data Science - An emerging Stream of Science with its Spreading Reach & Impact

Thinkful DC - Intro to Data Science

Materials for getting started with data science

JavaZone 2018 - A Practical(ish) Introduction to Data Science

POWRR Tools: Lessons learned from an IMLS National Leadership Grant

What Managers Need to Know about Data Science

Intro to Data Science

NDC Oslo : A Practical Introduction to Data Science

Big data visualization frameworks and applications at Kitware

Building Data Products with Python (Georgetown)

Big Data Talent in Academic and Industry R&D

OpenML data@Sheffield

Python's Role in the Future of Data Analysis

Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes

00-01 DSnDA.pdf

Data Science towards the Digital Enterprise

Dernier

How to Add Barcode on PDF Report in Odoo 17Celine George

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1

FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxConquiztadors- the Quiz Society of Sri Venkateswara College

Karra SKD Conference Presentation Revised.pptxAshokKarra1

Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan

Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105

ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli

Keynote by Prof. Wurzer at Nordex about IP-designMIPLM

ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood

Field Attribute Index Feature in Odoo 17Celine George

ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing

Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George

ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1

MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma

LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxConquiztadors- the Quiz Society of Sri Venkateswara College

Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup

INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña

Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543

Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2

Dernier (20)

How to Add Barcode on PDF Report in Odoo 17

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...

FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx

Karra SKD Conference Presentation Revised.pptx

Gas measurement O2,Co2,& ph) 04/2024.pptx

Barangay Council for the Protection of Children (BCPC) Orientation.pptx

ACC 2024 Chronicles. Cardiology. Exam.pdf

Keynote by Prof. Wurzer at Nordex about IP-design

ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx

Field Attribute Index Feature in Odoo 17

ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY

Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17

ANG SEKTOR NG agrikultura.pptx QUARTER 4

MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx

LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx

Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf

INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx

Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)

Grade 9 Q4-MELC1-Active and Passive Voice.pptx

Data Engine

1. DataEngine By Patrick McSweeney

2. Dave Mills ● PhD electrical engineering ● 10-15 Number of experiments per month ● Raw data: 1 GB ● Processed data : 5-10 MB ● Processed with MATLAB. ● Raw data when zipped: 450 MB Dave Patrick

3. State of the onion Researchers are using many different methods to collect or generate data from sensors and CCDs to supercomputers and particle colliders. When the data finally shows up in your computer, what do you do with all this information that is now in your digital shoebox? People are continually seeking me out and saying, “Help! I’ve got all this data. What am I supposed to do with it? My Excel spreadsheets are getting out of hand!” The suggestion that I have been making is that we now have terrible data management tools for most of the science disciplines. Commercial organizations like Walmart can afford to build their own data management software, but in science we do not have that luxury. At present, we have hardly any data visualization and analysis tools. Some research communities use MATLAB, for example, but the funding agencies in the U.S. and elsewhere need to do a lo more to foster the building of tools to make scientists more productive. When you go and look at what scientists are doing, day in and day out, in terms of data analysis, it is truly dreadful. And I suspect that many of you are in the same state that I am in where essentially the only tools I have at my disposal are MATLAB and Excel!

4. State of the onion

5. Data imported

6. Data provenance

7. Data manipulation

8. Choose Visualsiation

9. Save visualisation

10. Lots of possibilities

11. Take home ● An important step on the road to data science ● Make the repository a tool ● Get the data at the point of creatation ● Repeatable experiments

12. The outlook is good

Data Engine

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à Data Engine

Similaire à Data Engine (20)

Dernier

Dernier (20)

Data Engine