Winning with Big Data: Secrets of the Successful Data Scientist

•

46 j'aime•5,216 vues

A new class of professionals, called data scientists, have emerged to address the Big Data revolution. In this talk, I discuss nine skills for munging, modeling, and visualizing Big Data. Then I present a case study of using these skills: the analysis of billions of call records to predict customer churn at a North American telecom. http://en.oreilly.com/datascience/public/schedule/detail/15316

Technologie

WINNING WITH BIG DATA Secrets of the Successful Data Scientist Making Data Work June 9, 2010 Michael Driscoll @dataspora

WHAT IS BIG DATA? Data that is distributed.

1. CHOOSE THE RIGHT TOOL You don’t need a chainsaw to cut butter.

2. COMPRESS EVERYTHING mysqldump -u myuser -p mypasssourceDB| br />gzip| sshmike@dataspora.com "cat - | br />gunzip | mysql-u myuser -p mypasstargetDB" The world is IO-bound.

3. SPLIT UP YOUR DATA Split, apply, combine. See Hadley Wickham’s paper at http://had.co.nz/plyr/plyr-intro-090510.pdf

4. WORK WITH SAMPLES perl -ne "print if (rand() < 0.01)" data.csv > sample.csv Big Data is heavy, samples are light.

COPY FROM OTHERS git clone git://github.com/kevinweil/hadoop-lzo Use open source.

7. ESCAPE CHART TYPOLOGIES Charts are compositions, not containers.

8. USE COLOR WISELY Color can enhance or insult.

WHY DO TELCO CUSTOMERS LEAVE? Sign up Leave Goal: “less churn.”

DATA: BILLIONS OF CALLS … and millions of callers.

DOES CALL QUALITY MATTER? … a difference, but not significant.

BUILD THE CALL GRAPH … but is it predictive?

700% INCREASE IN CHURN when a cancellation occurs in a call network.

THANKS! QUESTIONS? Michael Driscoll twitter @dataspora http://www.dataspora.com/blog Making Data Work June 9, 2010

Contenu connexe

En vedette

Multi Level Modelling&Weights Workshop Kiel09

egebhardt72

A Survey Of R Graphics

Dataspora

ForecastIT 6. Multi-Variable Linear Regression

DeepThought, Inc.

Social Network Analysis for Telecoms

Dataspora

R is a fun and versatile language for statistical analysis, visualization, and data exploration. Target audience are software engineers/programmers who can code comfortably in another language. Emphasis in this lesson is on data structures, and light on analysis examples (to be covered at later date) but you are exposed to the basic concepts and commands. Email me for the pptx file which has notes.

Introduction to R

Stacy Irwin

An Interactive Introduction To R (Programming Language For Statistics)

Dataspora

How to Become a Data Scientist

ryanorban

A graph is a data structure composed of vertices/dots and edges/lines. A graph database is a software system used to persist and process graphs. The common conception in today's database community is that there is a tradeoff between the scale of data and the complexity/interlinking of data. To challenge this understanding, Aurelius has developed Titan under the liberal Apache 2 license. Titan supports both the size of modern data and the modeling power of graphs to usher in the era of Big Graph Data. Novel techniques in edge compression, data layout, and vertex-centric indices that exploit significant orders are used to facilitate the representation and processing of a single atomic graph structure across a multi-machine cluster. To ensure ease of adoption by the graph community, Titan natively implements the TinkerPop 2 Blueprints API. This presentation will review the graph landscape, Titan's techniques for scale by distribution, and a collection of satellite graph technologies to be released by Aurelius in the coming summer months of 2012.

Titan: The Rise of Big Graph Data

Marko Rodriguez

Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on November 20, 2013, at the "IBM Developer Days 2013" in Zurich, Switzerland. ABSTRACT There is no question that big data has hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms big data and data science. This presentation gives a professional statistician's view on these terms and illustrates the connection between data science and statistics. The presentation is also available at http://www.statoo.com/BigDataDataScience/.

A Statistician's View on Big Data and Data Science (Version 1)

Prof. Dr. Diego Kuonen

We at Revolution Analytics are often asked “What is the best way to learn R?” While acknowledging that there may be as many effective learning styles as there are people we have identified three factors that greatly facilitate learning R. For a quick start: - Find a way of orienting yourself in the open source R world - Have a definite application area in mind - Set an initial goal of doing something useful and then build on it In this webinar, we focus on data mining as the application area and show how anyone with just a basic knowledge of elementary data mining techniques can become immediately productive in R. We will: - Provide an orientation to R’s data mining resources - Show how to use the "point and click" open source data mining GUI, rattle, to perform the basic data mining functions of exploring and visualizing data, building classification models on training data sets, and using these models to classify new data. - Show the simple R commands to accomplish these same tasks without the GUI - Demonstrate how to build on these fundamental skills to gain further competence in R - Move away from using small test data sets and show with the same level of skill one could analyze some fairly large data sets with RevoScaleR Data scientists and analysts using other statistical software as well as students who are new to data mining should come away with a plan for getting started with R.

Introduction to R for Data Mining

Revolution Analytics

En vedette (10)

Multi Level Modelling&Weights Workshop Kiel09

A Survey Of R Graphics

ForecastIT 6. Multi-Variable Linear Regression

Social Network Analysis for Telecoms

Introduction to R

An Interactive Introduction To R (Programming Language For Statistics)

How to Become a Data Scientist

Titan: The Rise of Big Graph Data

A Statistician's View on Big Data and Data Science (Version 1)

Introduction to R for Data Mining

Similaire à Winning with Big Data: Secrets of the Successful Data Scientist

Date: 13th November 2018 Location: Data Ops Theatre Time: 13:50 - 14:20 Speakers: Terry McCann, Adatis & Chris Conroy, Rank Group About: Rank Group approached Adatis Consulting Ltd in 2017 to help tackle a key issue their data science team were encountering – “How do you to gracefully transition from one machine learning model to another as models are retrained and rewritten?” Rank Group is the owner of many popular gaming brands in the UK, including Grosvenor Casinos and Mecca Bingo. Rank use Machine Learning to optimise and influence business decisions across their enterprise. Models are deployed to identify customers churn, improve cross sale, enhance retention and most importantly to identify customers who are at risk of having a gambling addiction. These models are constantly being evaluated and retrained as gambling habits change and new games are introduced. London-based advanced analytics consultancy Adatis implemented a new advanced Machine Learning Model Management service based on the "Rendezvous" Architecture created by Ellen Friedman and Ted Dunning. Rendezvous handles the distribution of a single request to multiple models, scoring all in parallel, then decides on the most appropriate output to return. This is a massively scalable, flexible architecture that solves one of the key problems encountered by Data Science teams today. In this session we will look at the original problem and the architecture which was used to solve it.

Big Data LDN 2018: HOW RANK GAMING PRODUCTIONISED & AUTOMATED THE MANAGEMENT ...

Matt Stubbs

Around Data Science

Frieda Brioschi

Literacy in the Age of Big Data

Centre for Advanced Management Education

Starting with outlining the history of conventional version control before diving into explaining QoDs (Quantitative Oriented Developers) and the unique problems their ML systems pose from an operations perspective (MLOps). With the only status quo solutions being proprietary in-house pipelines (exclusive to Uber, Google, Facebook) and manual tracking/fragile "glue" code for everyone else. Datmo works to solve this issue by empowering QoDs in two ways: making MLOps manageable and simple (rather than completely abstracted away) as well as reducing the amount of glue code so to ensure more robust pipelines.

Version Control in AI/Machine Learning by Datmo

Nicholas Walsh

Big Data Maturity and its Evolution

Sriram Murali K J

Gluecon miller horizon

Mike Miller

Around Data Science (v. 2021 ITA)

Frieda Brioschi

Satyam open analytics nyc

Open Analytics

Big data - Aditya Yadav

Aditya Yadav

Big data myths busted

Gary Allemann

big data

Jisha Aravind

BigData Analytics

Mayank Kumar Sharma

HadoopWorkshopJuly2014

Dieter De Witte

Big Data Analytics

IMC Institute

Horizon 20110928

Mike Miller

Introduction Data Warehouse With BigQuery

Yatno Sudar

Technology Outlook - The new Era of computing

Swiss Big Data User Group

Big data

FACTS Computer Software L.L.C

Hadoop Tutorial

Ujjwal Gupta

The New Era of Cognitive Computing

IBM Research

Similaire à Winning with Big Data: Secrets of the Successful Data Scientist (20)

Big Data LDN 2018: HOW RANK GAMING PRODUCTIONISED & AUTOMATED THE MANAGEMENT ...

Around Data Science

Literacy in the Age of Big Data

Version Control in AI/Machine Learning by Datmo

Big Data Maturity and its Evolution

Gluecon miller horizon

Around Data Science (v. 2021 ITA)

Satyam open analytics nyc

Big data - Aditya Yadav

Big data myths busted

big data

BigData Analytics

HadoopWorkshopJuly2014

Big Data Analytics

Horizon 20110928

Introduction Data Warehouse With BigQuery

Technology Outlook - The new Era of computing

Big data

Hadoop Tutorial

The New Era of Cognitive Computing

Dernier

2024: Domino Containers - The Next Step. News from the Domino Container commu...

Martijn de Jong

In this session, we will delve into strategic approaches for optimizing knowledge management within Microsoft 365, amidst the evolving landscape of Copilot. From leveraging automatic metadata classification and permission governance with SharePoint Premium, to unlocking Viva Engage for the cultivation of knowledge and communities, you will gain actionable insights to bolster your organization's knowledge-sharing initiatives. In this session, we will also explore how to facilitate solutions to enable your employees to find answers and expertise within Microsoft 365. You will leave equipped with practical techniques and a deeper understanding of how there is more to effective knowledge management than just enabling Copilot, but building actual solutions to prepare the knowledge that Copilot and your employees can use.

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

Drew Madelung

Following the popularity of "Cloud Revolution: Exploring the New Wave of Serverless Spatial Data," we're thrilled to announce this much-anticipated encore webinar. In this sequel, we'll dive deeper into the Cloud-Native realm by uncovering practical applications and FME support for these new formats, including COGs, COPC, FlatGeoBuf, GeoParquet, STAC, and ZARR. Building on the foundation laid by industry leaders Michelle Roby of Radiant Earth and Chris Holmes of Planet in the first webinar, this second part offers an in-depth look at the real-world application and behind-the-scenes dynamics of these cutting-edge formats. We will spotlight specific use-cases and workflows, showcasing their efficiency and relevance in practical scenarios. Discover the vast possibilities each format holds, highlighted through detailed discussions and demonstrations. Our expert speakers will dissect the key aspects and provide critical takeaways for effective use, ensuring attendees leave with a thorough understanding of how to apply these formats in their own projects. Elevate your understanding of how FME supports these cutting-edge technologies, enhancing your ability to manage, share, and analyze spatial data. Whether you're building on knowledge from our initial session or are new to the serverless spatial data landscape, this webinar is your gateway to mastering cloud-native formats in your workflows.

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

Safe Software

DBX First Quarter 2024 Investor Presentation

Dropbox

Created by Mozilla Research in 2012 and now part of Linux Foundation Europe, the Servo project is an experimental rendering engine written in Rust. It combines memory safety and concurrency to create an independent, modular, and embeddable rendering engine that adheres to web standards. Stewardship of Servo moved from Mozilla Research to the Linux Foundation in 2020, where its mission remains unchanged. After some slow years, in 2023 there has been renewed activity on the project, with a roadmap now focused on improving the engine’s CSS 2 conformance, exploring Android support, and making Servo a practical embeddable rendering engine. In this presentation, Rakhi Sharma reviews the status of the project, our recent developments in 2023, our collaboration with Tauri to make Servo an easy-to-use embeddable rendering engine, and our plans for the future to make Servo an alternative web rendering engine for the embedded devices industry. (c) Embedded Open Source Summit 2024 April 16-18, 2024 Seattle, Washington (US) https://events.linuxfoundation.org/embedded-open-source-summit/ https://ossna2024.sched.com/event/1aBNF/a-year-of-servo-reboot-where-are-we-now-rakhi-sharma-igalia

A Year of the Servo Reboot: Where Are We Now?

Igalia

presentation ICT roal in 21st century education

jfdjdjcjdnsjd

Strategies for Landing an Oracle DBA Job as a Fresher

Remote DBA Services

Accelerating FinTech Innovation: Unleashing API Economy and GenAI Vasa Krishnan, Chief Technology Officer - FinResults Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...

apidays

AXA XL - Insurer Innovation Award Americas 2024

The Digital Insurer

Effective data discovery is crucial for maintaining compliance and mitigating risks in today's rapidly evolving privacy landscape. However, traditional manual approaches often struggle to keep pace with the growing volume and complexity of data. Join us for an insightful webinar where industry leaders from TrustArc and Privya will share their expertise on leveraging AI-powered solutions to revolutionize data discovery. You'll learn how to: - Effortlessly maintain a comprehensive, up-to-date data inventory - Harness code scanning insights to gain complete visibility into data flows leveraging the advantages of code scanning over DB scanning - Simplify compliance by leveraging Privya's integration with TrustArc - Implement proven strategies to mitigate third-party risks Our panel of experts will discuss real-world case studies and share practical strategies for overcoming common data discovery challenges. They'll also explore the latest trends and innovations in AI-driven data management, and how these technologies can help organizations stay ahead of the curve in an ever-changing privacy landscape.

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery

TrustArc

This presentations targets students or working professionals. You may know Google for search, YouTube, Android, Chrome, and Gmail, but did you know Google has many developer tools, platforms & APIs? This comprehensive yet still high-level overview outlines the most impactful tools for where to run your code, store & analyze your data. It will also inspire you as to what's possible. This talk is 50 minutes in length.

Powerful Google developer tools for immediate impact! (2023-24 C)

wesley chun

ICT role in 21st century education and its challenges

rafiqahmad00786416

As privacy and data protection regulations evolve rapidly, organizations operating in multiple jurisdictions face mounting challenges to ensure compliance and safeguard customer data. With state-specific privacy laws coming up in multiple states this year, it is essential to understand what their unique data protection regulations will require clearly. How will data privacy evolve in the US in 2024? How to stay compliant? Our panellists will guide you through the intricacies of these states' specific data privacy laws, clarifying complex legal frameworks and compliance requirements. This webinar will review: - The essential aspects of each state's privacy landscape and the latest updates - Common compliance challenges faced by organizations operating in multiple states and best practices to achieve regulatory adherence - Valuable insights into potential changes to existing regulations and prepare your organization for the evolving landscape

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments

TrustArc

The Good, the Bad and the Governed - Why is governance a dirty word? David O'Neill, Chief Operating Officer - APIContext Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...

apidays

Building Digital Trust in a Digital Economy Veronica Tan, Director - Cyber Security Agency of Singapore Apidays Singapore 2024: Connecting Customers, Business and Technology (April 17 & 18, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...

apidays

Manulife - Insurer Transformation Award 2024

The Digital Insurer

Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows. We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases. This video focuses on the deployment of external web forms using Jotform for Bonterra Impact Management. This solution can be customized to your organization’s needs and deployed to support the common use cases below: - Intake and consent - Assessments - Surveys - Applications - Program registration Interested in deploying web form automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...

Jeffrey Haguewood

AWS Community Day CPH - Three problems of Terraform

Andrey Devyatkin

MINDCTI Revenue Release Quarter One 2024

MIND CTI

MS Copilot expands with MS Graph connectors

Nanddeep Nachan

Dernier (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME

DBX First Quarter 2024 Investor Presentation

A Year of the Servo Reboot: Where Are We Now?

presentation ICT roal in 21st century education

Strategies for Landing an Oracle DBA Job as a Fresher

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...

AXA XL - Insurer Innovation Award Americas 2024

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery

Powerful Google developer tools for immediate impact! (2023-24 C)

ICT role in 21st century education and its challenges

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...

Manulife - Insurer Transformation Award 2024

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...

AWS Community Day CPH - Three problems of Terraform

MINDCTI Revenue Release Quarter One 2024

MS Copilot expands with MS Graph connectors

Winning with Big Data: Secrets of the Successful Data Scientist

1. WINNING WITH BIG DATA Secrets of the Successful Data Scientist Making Data Work June 9, 2010 Michael Driscoll @dataspora

2. WHY DATA MATTERS

3. THE INDUSTRIAL AGE OF DATA

4. WHAT IS BIG DATA? Data that is distributed.

5. WHAT IS DATA SCIENCE?

6. NINE WAYS TO WIN

7. 1. CHOOSE THE RIGHT TOOL You don’t need a chainsaw to cut butter.

8. 2. COMPRESS EVERYTHING mysqldump -u myuser -p mypasssourceDB| br />gzip| sshmike@dataspora.com "cat - | br />gunzip | mysql-u myuser -p mypasstargetDB" The world is IO-bound.

9. 3. SPLIT UP YOUR DATA Split, apply, combine. See Hadley Wickham’s paper at http://had.co.nz/plyr/plyr-intro-090510.pdf

10. 4. WORK WITH SAMPLES perl -ne "print if (rand() < 0.01)" data.csv > sample.csv Big Data is heavy, samples are light.

11. 5. USE STATISTICS

12. COPY FROM OTHERS git clone git://github.com/kevinweil/hadoop-lzo Use open source.

13. 7. ESCAPE CHART TYPOLOGIES Charts are compositions, not containers.

14. 8. USE COLOR WISELY Color can enhance or insult.

15. 9. TELL A STORY People are listening.

16. ONE SUCCESS STORY

17. WHY DO TELCO CUSTOMERS LEAVE? Sign up Leave Goal: “less churn.”

18. DATA: BILLIONS OF CALLS … and millions of callers.

19. DOES CALL QUALITY MATTER? … a difference, but not significant.

20. WHAT ABOUT SOCIAL NETWORKS? Hmmm...

21. BUILD THE CALL GRAPH … but is it predictive?

22. EVOLUTION OF A CALL GRAPH April

23. EVOLUTION OF A CALL GRAPH May

24. EVOLUTION OF A CALL GRAPH June

25. EVOLUTION OF A CALL GRAPH July

26. 700% INCREASE IN CHURN when a cancellation occurs in a call network.

27. THANKS! QUESTIONS? Michael Driscoll twitter @dataspora http://www.dataspora.com/blog Making Data Work June 9, 2010

Notes de l'éditeur

If you had to put your finger on the beginning of the information age, it might be the creation of the first telegraph in 1792, in France, by a pair of brothers.The first time that man-made information began at the speed of light, over long distances.Cars, cash registers, subway turnstyles, gene chips, TiVos, and cell phones are streaming billions of data points.Prof. Joe Hellerstein of Berkeley has dubbed it “The Industrial Revolution of Data” – where machines, not people, are the dominant producers of data.
In this talk I’m also going to be talking about tools for medium data; b/c these translate well into the Big Data space.
In this talk I’m also going to be talking about tools for medium data; b/c these translate well into the Big Data space.I’m defining data Science is: applying tools to data to answer questions. It is at the intersection of these tools. And it is a growing field, because data is getting bigger, and our tools are getting better. (Suffice to say, the questions we ask have been around since time immemorial: whoAnother word for questions is hypotheses.
Do you really need Hadoop for that job? Think twice about it.Can you do everything on one machine?Escalate only as necessary… don’t solve problems that don’t yet exist.At the same time, optimize for scalability, not performance. Cleverness is usually punished in the long run.
Compressing gives you a 6-8x bump immediately in network and disk IO, out of the gate.This example also illustrates another piece: avoid hitting disk at all costs.If you’re working on the cloud,
This is the essence of parallelism: find some independent dimension on which to split your data.* Even your data isn’t in a database, split it up the old-fashioned way – one file per hour, day, or month, depending on its size – these often form natural samples to work from.* Learn & understand how to partition, shard, or otherwise distribute your data in a database.* Parallel load is your friend: Several databases have parallel load features; Hadoop has distcp.
do you want to moving GBs and TBs around?sometimes you want to visualize and work on the data locally…so sample!* reservoir sampling is a fixed-memory algorithm for achieving a defined-sized sample* the above illustrates how to get a basic 1% uniform sample method in a perl one-liner
When we compare two real-valued measures, they will almost always be different.The critical question is: How confident are we in the difference? Is it significant?
Don’t reinvent the wheel, steal someone else’s wheels of 1s and 0s.Statistics is hard – so go ahead & use someone else’s stuff. Go ahead. It’s there. That what’s great about R. 2000 statistical libraries written by professors.
Not machines, people.
Okay, now I want you to try and forget everything you just heard about base graphics.ggplot2 is a new visualization package formally released in 2009, developed by Professor Hadley Wickham.It is a based a different perspective of developing graphics, and has its own set of functions and parameters.
Most telcos lose 1-2% of their customers every month.It’s 7x more expensive to acquire a customer, than to retain.
Not machines, people.
This illustrates what we said earlier: statistics matters. We needed to rule this out.(If anything the correlation occurs opposite of what we expected).
“A Survey of R Graphics” – presented to the LA R Users Group, June 18, 2009.Today I’m going to go through a survey of data visualization functions and packages in R. In particular, I’ll discuss three approaches for data visualization in R: (i) the built-in base graphics functions, (ii) the ggplot2 package, and (iii) the lattice package.I’ll also discuss some methods for visualizing large data sets.I’ll end with an overview of Rapache, a tool for embedding R in web applications.For questions beyond this talk, I can be contacted at:Michael E Driscollhttp://www.dataspora.commike@dataspora.com.
Windowing functions in Greenplum, which is a modified Postgres distributed database.
“A Survey of R Graphics” – presented to the LA R Users Group, June 18, 2009.Today I’m going to go through a survey of data visualization functions and packages in R. In particular, I’ll discuss three approaches for data visualization in R: (i) the built-in base graphics functions, (ii) the ggplot2 package, and (iii) the lattice package.I’ll also discuss some methods for visualizing large data sets.I’ll end with an overview of Rapache, a tool for embedding R in web applications.For questions beyond this talk, I can be contacted at:Michael E Driscollhttp://www.dataspora.commike@dataspora.com.
“A Survey of R Graphics” – presented to the LA R Users Group, June 18, 2009.Today I’m going to go through a survey of data visualization functions and packages in R. In particular, I’ll discuss three approaches for data visualization in R: (i) the built-in base graphics functions, (ii) the ggplot2 package, and (iii) the lattice package.I’ll also discuss some methods for visualizing large data sets.I’ll end with an overview of Rapache, a tool for embedding R in web applications.For questions beyond this talk, I can be contacted at:Michael E Driscollhttp://www.dataspora.commike@dataspora.com.
“A Survey of R Graphics” – presented to the LA R Users Group, June 18, 2009.Today I’m going to go through a survey of data visualization functions and packages in R. In particular, I’ll discuss three approaches for data visualization in R: (i) the built-in base graphics functions, (ii) the ggplot2 package, and (iii) the lattice package.I’ll also discuss some methods for visualizing large data sets.I’ll end with an overview of Rapache, a tool for embedding R in web applications.For questions beyond this talk, I can be contacted at:Michael E Driscollhttp://www.dataspora.commike@dataspora.com.
“A Survey of R Graphics” – presented to the LA R Users Group, June 18, 2009.Today I’m going to go through a survey of data visualization functions and packages in R. In particular, I’ll discuss three approaches for data visualization in R: (i) the built-in base graphics functions, (ii) the ggplot2 package, and (iii) the lattice package.I’ll also discuss some methods for visualizing large data sets.I’ll end with an overview of Rapache, a tool for embedding R in web applications.For questions beyond this talk, I can be contacted at:Michael E Driscollhttp://www.dataspora.commike@dataspora.com.
“A Survey of R Graphics” – presented to the LA R Users Group, June 18, 2009.Today I’m going to go through a survey of data visualization functions and packages in R. In particular, I’ll discuss three approaches for data visualization in R: (i) the built-in base graphics functions, (ii) the ggplot2 package, and (iii) the lattice package.I’ll also discuss some methods for visualizing large data sets.I’ll end with an overview of Rapache, a tool for embedding R in web applications.For questions beyond this talk, I can be contacted at:Michael E Driscollhttp://www.dataspora.commike@dataspora.com.

Winning with Big Data: Secrets of the Successful Data Scientist

Recommandé

Recommandé

Contenu connexe

En vedette

En vedette (10)

Similaire à Winning with Big Data: Secrets of the Successful Data Scientist

Similaire à Winning with Big Data: Secrets of the Successful Data Scientist (20)

Dernier

Dernier (20)

Winning with Big Data: Secrets of the Successful Data Scientist

Notes de l'éditeur