SlideShare une entreprise Scribd logo
1  sur  8
Télécharger pour lire hors ligne
Thought leadership whitepaper
Cloud data services Big data and Analytics
Introducing Notebooks:
A power tool for data scientists
2 Introducing Notebooks: A power tool for data scientists
Fast, flexible and collaborative data
exploration and analysis
Data exploration and analysis is a repetitive, iterative process, but
in order to meet business demands, data scientists do not always
have the luxury of long development cycles. What if data scientists
could answer bigger and tougher questions faster? What if they
could more easily and rapidly experiment, test hypotheses and
work more collaboratively on interactive analytics?
the business’s high expectations. For both the data scientist and
business stakeholder, wrong turns and dead ends are often the
rule, not the exception. The trick is to be able to quickly spot
errors, change course, and find the right path to the conclusions
and insights that could be game changers for the business.
Rapid, iterative development and
collaboration using Notebooks
Consider a common scenario: a company’s vice president of
sales notices a sharp increase in the sale of replacement parts
for one of its most popular machines. She enlists the help of a
data scientist to engage in dynamic data exploration to uncover
the source of the anomaly. The data scientist first explores the
data by extracting and shaping it, the ultimate goal being
discovery of patterns and trends that can guide the business’s
future sales strategy and forecasts for the particular replace-
ment part in question. This requires the data scientist to
understand the business and be able to communicate facts,
insights, patterns, and problems regarding the data in an easy,
understandable way. This may include the delivery of quick,
visual data models for business stakeholders to view and confer
with the data scientist to make sure their investigation is on the
right path. Unfortunately, many existing tools are not conducive
to this “quick and dirty” approach and require switching back
and forth between tools for extracting, shaping, and visualizing
data, which is time consuming and does not always allow the
business to respond to events in a timely fashion.
Data science — necessary for digital
business transformation and survival
Leading companies are disrupting markets using data
science — and they are doing it in agile and fast-moving
environments. The pressure to be this responsive gets passed to
experts with deep analytical skills, most notably the data scientist.
The data scientist must grapple with difficult constraints and
revise cumbersome development processes in order to meet
Big data and Analytics
Figure1: View of the Jupyter Notebook interface.
What does a Notebook look like?
4 Introducing Notebooks: A power tool for data scientists
Figure 2: Easily load data to see and analyze that data in a flexible and iterative analysis environment.
Big data and Analytics
Open source analytic notebooks —
 the game-changer for data scientists
What, exactly are analytic notebooks? The first answers that
often come to mind are a laptop computer or pad of paper,
which do not exactly conjure up images of speedy, advanced
analysis. In fact, a notebook is a web application that contains
all the code, text, and visualizations for a particular data-
intensive project.
These documents can also easily be shared with collaborators
and stakeholders. They are primarily used by data scientists for
data diagnosis, simulation, statistical modeling, and machine
learning. Use of a notebook speeds up the process of trying out
data models and frameworks and testing hypotheses, enabling
the data science team and their business counterparts to work
quickly, iteratively and collaboratively.
Just like a paper notebook that can be used to jot down
methodologies, results, and quick graphical representations,
a notebook serves as a single working document for data
science teams and is very trial-and-error friendly. The code a
data scientist is working with is exposed in the notebook so
that he or she can refer back to previous steps and track
progress through each iteration. Code can be hidden or
removed when sharing a notebook with a business stakeholder
who only needs to look at data visualizations.
Jupyter Notebooks, an evolution of IPython notebooks, are one
of the most established and widely adopted open source data
science tools available today. Instead of having to work on the
command line, the data scientist can perform an analysis, generate
visualizations, and add media and annotations within a single
notebook artifact. Notebooks are also shareable, which means the
data scientist can work collaboratively with others in the business
and are themselves an invaluable learning tool. Data scientists can
annotate the analyses for future reference and keep track of each
step in the data discovery process so that the notebook becomes a
source of project communication and documentation, as well as a
learning resource for other data scientists in the company.
Let’s look at three key areas in which Jupyter Notebooks are
being used by data scientists to dramatically improve data
exploration and discovery.
Jupyter Notebooks, an evolution of IPython
notebooks, are one of the most established and
widely adopted open source data science tools
available today.
6 Introducing Notebooks: A power tool for data scientists
Navigate the learning curve
Like any new tool, there is a learning curve to getting started
with notebooks. Fortunately, there is a vast user community
that data scientists can tap into to make the process easier.
With over 150,000 publically available Jupyter Notebooks on
GitHub and access to an extensive network of experienced
notebook users, there’s no need for data scientists just getting
started with notebooks to go at it alone or build everything
from scratch. This level of support reduces the need for data
scientists to solve technical problems, freeing up more time to
manipulate data, test hypotheses and more quickly reach
conclusions and actionable insights.
Rapidly iterate
The data scientist spends a significant amount of their time
getting data ready for analysis. This is not a job for the faint
of heart. Data coming from multiple sources is not consistently
formatted. Data is not always complete. It’s not unusual for
data scientists to find themselves drowning in huge volumes of
dirty data. While notebooks do not eliminate the need for data
cleaning, they are very valuable in enabling the data scientist to
quickly see where the problems lie so that they can work
“smarter” by focusing on the right areas as they transform data
for further analysis. As we mentioned earlier, businesses need
insights quickly.
Notebooks support the data scientist’s need to iterate along
with the business’s need for speed.
In a notebook, an analysis is performed in a sequence of code
cells embedded within the document. These code cells can be
used to build upon each other in stages, neatly containing
different operations that the data scientist is performing on the
data. Should the data scientist want to change the experiment
and modify a particular stage of the analysis, they easily make
those changes and rerun the appropriate cells, without
necessarily needing to rework the entire analysis. And because
notebooks can also produce visualizations, which often make
it easier to spot and communicate a discovery, the data scientist
can quickly change their experiment and see updated
visualizations without the need to employ another tool.
Collaborate
Data scientists are often expected to work with employees across
the company in IT, in engineering, on the business side, and with
others in the data science ecosystem like engineers, developers,
and analysts. The ability to share information and work
collaboratively is not optional. It’s also important to note that
data science is not necessarily centralized within a company; it
often lives at the departmental or business unit level. This means
that data analysis, exploration, and visualization can be taking
place in different areas of the company that is not being shared.
As a result, the business does not realize the full benefit of its data
science efforts.
The ability to share notebooks and collaborate on projects is
critical when working on a team with di- verse skill sets. Data
scientists possess considerable math, statistical, and computer
science skills, but this is not true of everyone across the IT and
business organizations. Additionally, data scientists may wish to
confer with peers. One example of this is when data scientists
use notebooks in collaboration with developers. As developers
work hand in hand with data scientists and become familiar
with analytical models, they will gain a better understanding
of what the data scientist needs and adjust their data pipelines
accordingly. All of these capabilities — analysis of big data,
multi-language support and collaboration — enable the data
scientist to work through iterations faster and move on to
the next project. By working collaboratively on a shareable
platform, the data exploration project — and consequently, the
business — realizes the full benefit of everyone’s strengths and
skill sets.
Big data and Analytics
Conclusion
Open source notebooks like Jupyter provide an environment
in which data science can work collaboratively with multiple
stakeholders, in rapid iterations, all while managing the
learning curve. Notebooks connected to a managed big data
service give data scientists power, scale, and time to focus on
the last mile. In being so empowered, the data scientist forges
a path and leads the business through formerly uncharted
territory to operate predictively and proactively.
To learn more about notebooks and how to get started
with them today, visit:
Learn more about the IBM Data Science Experience
Get started with notebooks in the Data Science
Experience
The future of notebooks and the promise
for data scientists
Notebooks are a powerful data science tool and can be made
even more powerful when embedded with big data technology
platforms like Apache Spark, a lightning-fast in-memory
cluster computing engine for large-scale data processing. A
notebook connected to a big data engine like Apache Spark
can produce output up to one hundred times faster than in a
traditional distributed architecture, further speeding iterative
exploration. Jupyter Notebooks are dependent on an IPython
kernel, which is automatically included with the Notebook;
however, there are currently over 65 kernels that can be
installed with Jupyter Notebooks to support over 40 languages,
and the list of available kernels continues to grow. Because the
most popular analytic notebooks are open source, they are
available in a variety of languages that support the most
common and burgeoning languages in data science, such as
Python, R, and Scala. Notebooks may also be shared and used
as APIs that connect to programs and apps around a company
or organization.
© Copyright IBM Corporation 2016
IBM Corporation
Route 100
Somers, NY 10589
Produced in the United States of America
June 2016
IBM, the IBM logo and ibm.com are trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product
and service names might be trademarks of IBM or other companies. A
current list of IBM trademarks is available on the Web at “Copyright and
trademark information” at www.ibm.com/legal/copytrade.shtml.
This document is current as of the initial date of publication and may be
changed by IBM at any time. Not all offerings are available in every country
in which IBM operates.
THE INFORMATION IN THIS DOCUMENT IS PROVIDED
“AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED,
INCLUDING WITHOUT ANY WARRANTIES OF MERCHANT­
ABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY
WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM
products are warranted according to the terms and conditions of the
agreements under which they are provided.
Please Recycle
CDW12358USEN-00

Contenu connexe

Tendances

The Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data ManagementThe Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data Management
mark madsen
 
20151016 Data Science For Project Managers
20151016 Data Science For Project Managers20151016 Data Science For Project Managers
20151016 Data Science For Project Managers
Tze-Yiu Yong
 

Tendances (20)

Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and...
 
How to understand trends in the data & software market
How to understand trends in the data & software marketHow to understand trends in the data & software market
How to understand trends in the data & software market
 
Operationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the EnterpriseOperationalizing Machine Learning in the Enterprise
Operationalizing Machine Learning in the Enterprise
 
1. introduction to data science —
1. introduction to data science —1. introduction to data science —
1. introduction to data science —
 
The Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data ManagementThe Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data Management
 
Make compliance fulfillment count double
Make compliance fulfillment count doubleMake compliance fulfillment count double
Make compliance fulfillment count double
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data Analysis
 
Unit 2
Unit 2Unit 2
Unit 2
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
Data science tutorial
Data science tutorialData science tutorial
Data science tutorial
 
Agile Data
Agile DataAgile Data
Agile Data
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Embracing data science
Embracing data scienceEmbracing data science
Embracing data science
 
Datascienceindia article
Datascienceindia articleDatascienceindia article
Datascienceindia article
 
Data science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi PeriasamyData science vs. Data scientist by Jothi Periasamy
Data science vs. Data scientist by Jothi Periasamy
 
Data Analytics
Data AnalyticsData Analytics
Data Analytics
 
Assumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slidesAssumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slides
 
Better Architecture for Data: Adaptable, Scalable, and Smart
Better Architecture for Data: Adaptable, Scalable, and SmartBetter Architecture for Data: Adaptable, Scalable, and Smart
Better Architecture for Data: Adaptable, Scalable, and Smart
 
Data science
Data scienceData science
Data science
 
20151016 Data Science For Project Managers
20151016 Data Science For Project Managers20151016 Data Science For Project Managers
20151016 Data Science For Project Managers
 

En vedette (17)

Imnswp 16038 analytics manifesto_final
Imnswp 16038 analytics manifesto_finalImnswp 16038 analytics manifesto_final
Imnswp 16038 analytics manifesto_final
 
Newspapers
NewspapersNewspapers
Newspapers
 
i made dis fr u
i made dis fr ui made dis fr u
i made dis fr u
 
Mensajes subliminales
Mensajes subliminales Mensajes subliminales
Mensajes subliminales
 
Famous Democrats and Nations Established a Prosper Society in the History
Famous Democrats and Nations Established a Prosper Society in the History Famous Democrats and Nations Established a Prosper Society in the History
Famous Democrats and Nations Established a Prosper Society in the History
 
Tv commercials
Tv commercialsTv commercials
Tv commercials
 
Props for media
Props for mediaProps for media
Props for media
 
Question 2 & 3 of evaluation
Question 2 & 3 of evaluation Question 2 & 3 of evaluation
Question 2 & 3 of evaluation
 
Interactive media
Interactive mediaInteractive media
Interactive media
 
Photographic imaging
Photographic imagingPhotographic imaging
Photographic imaging
 
Advertising and marketing
Advertising and marketingAdvertising and marketing
Advertising and marketing
 
Potential Genre 2
Potential Genre 2Potential Genre 2
Potential Genre 2
 
Ibm test data_management_v0.4
Ibm test data_management_v0.4Ibm test data_management_v0.4
Ibm test data_management_v0.4
 
Potential Genre
Potential GenrePotential Genre
Potential Genre
 
Commercial radio
Commercial radioCommercial radio
Commercial radio
 
Representation of Youth
Representation of YouthRepresentation of Youth
Representation of Youth
 
Koncepti i sw drejtws publike
Koncepti i sw drejtws publikeKoncepti i sw drejtws publike
Koncepti i sw drejtws publike
 

Similaire à Notebooks in IBM

Untitled document.pdf
Untitled document.pdfUntitled document.pdf
Untitled document.pdf
MuhammadTahiriqbal13
 

Similaire à Notebooks in IBM (20)

Next generation of data scientist
Next generation of data scientistNext generation of data scientist
Next generation of data scientist
 
How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)How Data Virtualization Puts Machine Learning into Production (APAC)
How Data Virtualization Puts Machine Learning into Production (APAC)
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
Ch1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptxCh1IntroductiontoDataScience.pptx
Ch1IntroductiontoDataScience.pptx
 
Welcome to Data Science
Welcome to Data ScienceWelcome to Data Science
Welcome to Data Science
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
Top 10 areas of expertise in data science
Top 10 areas of expertise in data scienceTop 10 areas of expertise in data science
Top 10 areas of expertise in data science
 
Mighty Guides- Data Disruption
Mighty Guides- Data DisruptionMighty Guides- Data Disruption
Mighty Guides- Data Disruption
 
Research Methodology (how to choose Datasets ).pptx
Research Methodology (how to choose Datasets ).pptxResearch Methodology (how to choose Datasets ).pptx
Research Methodology (how to choose Datasets ).pptx
 
Career in Data Science
Career in Data ScienceCareer in Data Science
Career in Data Science
 
Harness the power of data
Harness the power of dataHarness the power of data
Harness the power of data
 
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
Untitled document.pdf
Untitled document.pdfUntitled document.pdf
Untitled document.pdf
 
Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)
 
DATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptxDATASCIENCE vs BUSINESS INTELLIGENCE.pptx
DATASCIENCE vs BUSINESS INTELLIGENCE.pptx
 
Complete-SRS.doc
Complete-SRS.docComplete-SRS.doc
Complete-SRS.doc
 
What is data science ?
What is data science ?What is data science ?
What is data science ?
 
Data Science- Basics.pptx
Data Science- Basics.pptxData Science- Basics.pptx
Data Science- Basics.pptx
 

Dernier

Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Chandigarh Call girls 9053900678 Call girls in Chandigarh
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
@Chandigarh #call #Girls 9053900678 @Call #Girls in @Punjab 9053900678
 
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
 

Dernier (20)

Al Barsha Night Partner +0567686026 Call Girls Dubai
Al Barsha Night Partner +0567686026 Call Girls  DubaiAl Barsha Night Partner +0567686026 Call Girls  Dubai
Al Barsha Night Partner +0567686026 Call Girls Dubai
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
 
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
 
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
Low Sexy Call Girls In Mohali 9053900678 🥵Have Save And Good Place 🥵
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
 
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft DatingDubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
Dubai Call Girls Milky O525547819 Call Girls Dubai Soft Dating
 
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
 
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...(+971568250507  ))#  Young Call Girls  in Ajman  By Pakistani Call Girls  in ...
(+971568250507 ))# Young Call Girls in Ajman By Pakistani Call Girls in ...
 
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
 
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
Katraj ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For S...
Katraj ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For S...Katraj ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For S...
Katraj ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For S...
 
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
valsad Escorts Service ☎️ 6378878445 ( Sakshi Sinha ) High Profile Call Girls...
 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
 
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
 
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
Russian Call Girls in %(+971524965298  )#  Call Girls in DubaiRussian Call Girls in %(+971524965298  )#  Call Girls in Dubai
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
 

Notebooks in IBM

  • 1. Thought leadership whitepaper Cloud data services Big data and Analytics Introducing Notebooks: A power tool for data scientists
  • 2. 2 Introducing Notebooks: A power tool for data scientists Fast, flexible and collaborative data exploration and analysis Data exploration and analysis is a repetitive, iterative process, but in order to meet business demands, data scientists do not always have the luxury of long development cycles. What if data scientists could answer bigger and tougher questions faster? What if they could more easily and rapidly experiment, test hypotheses and work more collaboratively on interactive analytics? the business’s high expectations. For both the data scientist and business stakeholder, wrong turns and dead ends are often the rule, not the exception. The trick is to be able to quickly spot errors, change course, and find the right path to the conclusions and insights that could be game changers for the business. Rapid, iterative development and collaboration using Notebooks Consider a common scenario: a company’s vice president of sales notices a sharp increase in the sale of replacement parts for one of its most popular machines. She enlists the help of a data scientist to engage in dynamic data exploration to uncover the source of the anomaly. The data scientist first explores the data by extracting and shaping it, the ultimate goal being discovery of patterns and trends that can guide the business’s future sales strategy and forecasts for the particular replace- ment part in question. This requires the data scientist to understand the business and be able to communicate facts, insights, patterns, and problems regarding the data in an easy, understandable way. This may include the delivery of quick, visual data models for business stakeholders to view and confer with the data scientist to make sure their investigation is on the right path. Unfortunately, many existing tools are not conducive to this “quick and dirty” approach and require switching back and forth between tools for extracting, shaping, and visualizing data, which is time consuming and does not always allow the business to respond to events in a timely fashion. Data science — necessary for digital business transformation and survival Leading companies are disrupting markets using data science — and they are doing it in agile and fast-moving environments. The pressure to be this responsive gets passed to experts with deep analytical skills, most notably the data scientist. The data scientist must grapple with difficult constraints and revise cumbersome development processes in order to meet
  • 3. Big data and Analytics Figure1: View of the Jupyter Notebook interface. What does a Notebook look like?
  • 4. 4 Introducing Notebooks: A power tool for data scientists Figure 2: Easily load data to see and analyze that data in a flexible and iterative analysis environment.
  • 5. Big data and Analytics Open source analytic notebooks —  the game-changer for data scientists What, exactly are analytic notebooks? The first answers that often come to mind are a laptop computer or pad of paper, which do not exactly conjure up images of speedy, advanced analysis. In fact, a notebook is a web application that contains all the code, text, and visualizations for a particular data- intensive project. These documents can also easily be shared with collaborators and stakeholders. They are primarily used by data scientists for data diagnosis, simulation, statistical modeling, and machine learning. Use of a notebook speeds up the process of trying out data models and frameworks and testing hypotheses, enabling the data science team and their business counterparts to work quickly, iteratively and collaboratively. Just like a paper notebook that can be used to jot down methodologies, results, and quick graphical representations, a notebook serves as a single working document for data science teams and is very trial-and-error friendly. The code a data scientist is working with is exposed in the notebook so that he or she can refer back to previous steps and track progress through each iteration. Code can be hidden or removed when sharing a notebook with a business stakeholder who only needs to look at data visualizations. Jupyter Notebooks, an evolution of IPython notebooks, are one of the most established and widely adopted open source data science tools available today. Instead of having to work on the command line, the data scientist can perform an analysis, generate visualizations, and add media and annotations within a single notebook artifact. Notebooks are also shareable, which means the data scientist can work collaboratively with others in the business and are themselves an invaluable learning tool. Data scientists can annotate the analyses for future reference and keep track of each step in the data discovery process so that the notebook becomes a source of project communication and documentation, as well as a learning resource for other data scientists in the company. Let’s look at three key areas in which Jupyter Notebooks are being used by data scientists to dramatically improve data exploration and discovery. Jupyter Notebooks, an evolution of IPython notebooks, are one of the most established and widely adopted open source data science tools available today.
  • 6. 6 Introducing Notebooks: A power tool for data scientists Navigate the learning curve Like any new tool, there is a learning curve to getting started with notebooks. Fortunately, there is a vast user community that data scientists can tap into to make the process easier. With over 150,000 publically available Jupyter Notebooks on GitHub and access to an extensive network of experienced notebook users, there’s no need for data scientists just getting started with notebooks to go at it alone or build everything from scratch. This level of support reduces the need for data scientists to solve technical problems, freeing up more time to manipulate data, test hypotheses and more quickly reach conclusions and actionable insights. Rapidly iterate The data scientist spends a significant amount of their time getting data ready for analysis. This is not a job for the faint of heart. Data coming from multiple sources is not consistently formatted. Data is not always complete. It’s not unusual for data scientists to find themselves drowning in huge volumes of dirty data. While notebooks do not eliminate the need for data cleaning, they are very valuable in enabling the data scientist to quickly see where the problems lie so that they can work “smarter” by focusing on the right areas as they transform data for further analysis. As we mentioned earlier, businesses need insights quickly. Notebooks support the data scientist’s need to iterate along with the business’s need for speed. In a notebook, an analysis is performed in a sequence of code cells embedded within the document. These code cells can be used to build upon each other in stages, neatly containing different operations that the data scientist is performing on the data. Should the data scientist want to change the experiment and modify a particular stage of the analysis, they easily make those changes and rerun the appropriate cells, without necessarily needing to rework the entire analysis. And because notebooks can also produce visualizations, which often make it easier to spot and communicate a discovery, the data scientist can quickly change their experiment and see updated visualizations without the need to employ another tool. Collaborate Data scientists are often expected to work with employees across the company in IT, in engineering, on the business side, and with others in the data science ecosystem like engineers, developers, and analysts. The ability to share information and work collaboratively is not optional. It’s also important to note that data science is not necessarily centralized within a company; it often lives at the departmental or business unit level. This means that data analysis, exploration, and visualization can be taking place in different areas of the company that is not being shared. As a result, the business does not realize the full benefit of its data science efforts. The ability to share notebooks and collaborate on projects is critical when working on a team with di- verse skill sets. Data scientists possess considerable math, statistical, and computer science skills, but this is not true of everyone across the IT and business organizations. Additionally, data scientists may wish to confer with peers. One example of this is when data scientists use notebooks in collaboration with developers. As developers work hand in hand with data scientists and become familiar with analytical models, they will gain a better understanding of what the data scientist needs and adjust their data pipelines accordingly. All of these capabilities — analysis of big data, multi-language support and collaboration — enable the data scientist to work through iterations faster and move on to the next project. By working collaboratively on a shareable platform, the data exploration project — and consequently, the business — realizes the full benefit of everyone’s strengths and skill sets.
  • 7. Big data and Analytics Conclusion Open source notebooks like Jupyter provide an environment in which data science can work collaboratively with multiple stakeholders, in rapid iterations, all while managing the learning curve. Notebooks connected to a managed big data service give data scientists power, scale, and time to focus on the last mile. In being so empowered, the data scientist forges a path and leads the business through formerly uncharted territory to operate predictively and proactively. To learn more about notebooks and how to get started with them today, visit: Learn more about the IBM Data Science Experience Get started with notebooks in the Data Science Experience The future of notebooks and the promise for data scientists Notebooks are a powerful data science tool and can be made even more powerful when embedded with big data technology platforms like Apache Spark, a lightning-fast in-memory cluster computing engine for large-scale data processing. A notebook connected to a big data engine like Apache Spark can produce output up to one hundred times faster than in a traditional distributed architecture, further speeding iterative exploration. Jupyter Notebooks are dependent on an IPython kernel, which is automatically included with the Notebook; however, there are currently over 65 kernels that can be installed with Jupyter Notebooks to support over 40 languages, and the list of available kernels continues to grow. Because the most popular analytic notebooks are open source, they are available in a variety of languages that support the most common and burgeoning languages in data science, such as Python, R, and Scala. Notebooks may also be shared and used as APIs that connect to programs and apps around a company or organization.
  • 8. © Copyright IBM Corporation 2016 IBM Corporation Route 100 Somers, NY 10589 Produced in the United States of America June 2016 IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml. This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates. THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANT­ ABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided. Please Recycle CDW12358USEN-00