A walk through the maze of understanding
“DATA VISUALIZATION”
Analytics in Every Domain
"In God We Trust… All Others Bring Data,"
Deming
AGENDA
1. Exploratory Data Analysis
2. Fundamentals of Effective Data Visualization
3. Tools for Data Visualization
4. Demo using Python, R and KNIME to create visualizations
5. Creating insightful reports with visual tools
6. Q & A
What is EDA?
• Exploratory data analysis is a
data analysis approach to reveal the
important characteristics of a dataset,
mainly through visualization.
• Get to know your data!
• Distributions (symmetric, normal, skewed)
• Data quality problems
• Outliers
• Correlations and inter-relationships
• Functional relationships
• Derived attributes and keys (such as primary and foreign keys)
• Static attributes, dynamic attributes, etc.
Get a good look at, and feel for, the data.
• Always check your datasets
• Mean
• Median
• Quantiles
• Histograms
• Boxplots
• Scatter diagrams
Consider looking at every attribute; you will understand what it represents!
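The checks listed above can be sketched with Python's standard library alone. The sample values and the bin width of 10 are hypothetical, chosen only so that one extreme value pulls the mean away from the median; the quartiles use the default method of `statistics.quantiles`.

```python
import statistics
from collections import Counter

# Hypothetical sample: one numeric attribute containing a single extreme value.
values = [12, 15, 14, 10, 18, 16, 15, 13, 95, 14, 15]
BIN_WIDTH = 10  # assumed bin width for the text histogram

mean = statistics.mean(values)
median = statistics.median(values)
# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)


def histogram(data, bin_width):
    """Count values per fixed-width bin, keyed by the bin's lower edge."""
    return Counter((v // bin_width) * bin_width for v in data)


hist = histogram(values, BIN_WIDTH)

print(f"mean={mean:.2f} median={median} Q1={q1} Q3={q3}")
for edge in sorted(hist):
    # One '#' per value that falls in the bin.
    print(f"{edge:>3}-{edge + BIN_WIDTH - 1:<3} {'#' * hist[edge]}")
```

A mean well above the median, together with a lone bar far from the rest of the histogram, is the kind of signal that suggests inspecting the attribute more closely.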
Visualization before Analysis
(Anscombe’s Quartet)
     I              II            III             IV
   x      y      x      y      x      y      x      y
10.0   8.04   10.0   9.14   10.0   7.46    8.0   6.58
 8.0   6.95    8.0   8.14    8.0   6.77    8.0   5.76
13.0   7.58   13.0   8.74   13.0  12.74    8.0   7.71
 9.0   8.81    9.0   8.77    9.0   7.11    8.0   8.84
11.0   8.33   11.0   9.26   11.0   7.81    8.0   8.47
14.0   9.96   14.0   8.10   14.0   8.84    8.0   7.04
 6.0   7.24    6.0   6.13    6.0   6.08    8.0   5.25
 4.0   4.26    4.0   3.10    4.0   5.39   19.0  12.50
12.0  10.84   12.0   9.13   12.0   8.15    8.0   5.56
 7.0   4.82    7.0   7.26    7.0   6.42    8.0   7.91
 5.0   5.68    5.0   4.74    5.0   5.73    8.0   6.89
For all four datasets:
Property                                                Value               Accuracy
Mean of x                                               9                   exact
Sample variance of x                                    11                  exact
Mean of y                                               7.50                to 2 decimal places
Sample variance of y                                    4.125               plus/minus 0.003
Correlation between x and y                             0.816               to 3 decimal places
Linear regression line                                  y = 3.00 + 0.500x   to 2 and 3 decimal places, respectively
Coefficient of determination of the linear regression   0.67                to 2 decimal places
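The figures in this table can be reproduced from the quartet's data points. The following is a minimal sketch in plain Python; the helper name `summarize` is an illustrative choice, and the least-squares formulas are the standard textbook ones rather than anything specific to this deck.

```python
import math

# Anscombe's quartet, copied from the table above.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
datasets = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics listed in the table for one dataset."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # least-squares slope
    intercept = mean_y - slope * mean_x     # least-squares intercept
    r = sxy / math.sqrt(sxx * syy)          # Pearson correlation
    return {
        "mean_x": mean_x, "var_x": sxx / (n - 1),
        "mean_y": mean_y, "var_y": syy / (n - 1),
        "r": r, "slope": slope, "intercept": intercept, "r2": r * r,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in datasets.items()}
for name, s in summaries.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

All four rows of output agree to the accuracy stated in the table, even though the four scatter plots look nothing alike.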
• The first scatter plot (top left) appears to show a simple linear relationship, corresponding to two correlated variables that follow the assumption of normality.
• The second graph (top right) is not distributed normally; while a relationship between the two variables is obvious, it is not linear, and the Pearson correlation coefficient is not relevant. A more general regression and the corresponding coefficient of determination would be more appropriate.
• In the third graph (bottom left), the relationship is linear, but it should have a different regression line (a robust regression would have been called for). The calculated regression is offset by the one outlier, which exerts enough influence to lower the correlation coefficient from 1 to 0.816.
• Finally, the fourth graph (bottom right) shows an example in which one outlier is enough to produce a high correlation coefficient, even though the other data points do not indicate any relationship between the variables.
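The influence of a single outlier described in the last two bullets can be checked numerically. This sketch recomputes the Pearson correlation for datasets III and IV with and without their outlying point; the helper names `pearson` and `drop` are illustrative, and returning `None` for a constant variable is a choice made here, since the correlation is undefined in that case.

```python
import math


def pearson(xs, ys):
    """Pearson correlation, or None when either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # undefined: no variation to correlate
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def drop(xs, ys, index):
    """Return copies of both lists with the point at `index` removed."""
    return xs[:index] + xs[index + 1:], ys[:index] + ys[index + 1:]


# Dataset III: the outlier is the point (13.0, 12.74) at index 2.
x3 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
# Dataset IV: the outlier is the point (19.0, 12.50) at index 7.
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

r3_all = pearson(x3, y3)
r3_trimmed = pearson(*drop(x3, y3, 2))
r4_all = pearson(x4, y4)
r4_trimmed = pearson(*drop(x4, y4, 7))

print(f"III with outlier:    {r3_all:.3f}")
print(f"III without outlier: {r3_trimmed:.5f}")
print(f"IV with outlier:     {r4_all:.3f}")
print(f"IV without outlier:  {r4_trimmed}")
```

Without its outlier, dataset III is almost perfectly linear, while dataset IV has no variation in x left at all, so no correlation can be computed.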
Get a general sense of the data
• Make sure your first visualization is data-driven (model-free)
• Think interactive and visual
• Humans are the best pattern recognizers
• Use as many dimensions as your data will permit (2, 3, …)
• x, y, z, space, color, time, …
• Visualization is useful in early stages of data mining
• detect outliers (e.g. assess data quality)
• test assumptions (e.g. normal distributions or skewed?)
• identify useful raw data & transforms (e.g. log(x))
Take Away: it is always well worth looking at your data!
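As one way to make "detect outliers" concrete, the sketch below flags values lying more than 1.5 interquartile ranges outside the quartiles, the same rule a conventional box plot uses for its whiskers. The 1.5 multiplier and the function name `iqr_outliers` are assumptions of this example, not something the slides prescribe.

```python
import statistics

# y-values of Anscombe's third dataset, taken from the table in this deck.
ys = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]


def iqr_outliers(data, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < low or v > high]


outliers = iqr_outliers(ys)
print(outliers)
```

A rule like this only nominates candidates; whether a flagged point is an error or a genuine observation still requires looking at the data.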
FUNDAMENTALS OF EFFECTIVE DATA VISUALIZATION
• Data quality issues
• A good understanding of statistical theories
• How to move volumes of data
• Should I use machine learning?
DATA QUALITY ISSUES
• Duplicates
• Incomplete data
• Too much data
• Inconsistent data
• Poor organization of data
• Incorrect data
• Poorly defined data
• Poor data security
• And more…
A GOOD UNDERSTANDING OF STATISTICAL THEORIES
VOLUMES OF DATA
Introduction to Information Visualization - Fall 2013
*Adapted from The ParaView Tutorial, Moreland
Visualization: converting raw data into graphics that are understandable to people.
Information visualization: where the data does not have a well-defined representation in 2D or 3D space; the data is abstract.
Network Visualization
Geo Data Visualization
https://archive.nytimes.com/www.nytimes.com/interactive/2009/03/10/us/20090310-immigration-explorer.html
⚫ Demo
HEATMAP VISUALIZATION
• A heatmap is a two-dimensional graphical representation of data where the individual values contained in a matrix are represented as colors.
• The seaborn Python package allows the creation of annotated heatmaps, which can be tweaked using Matplotlib tools as per the creator's requirement.
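A minimal sketch of an annotated heatmap follows. The three columns are hypothetical data; the correlation matrix is computed in plain Python, and `draw_heatmap` (an illustrative name) imports seaborn and Matplotlib only when called, so both packages are assumed to be installed before it is used.

```python
import math

# Hypothetical data: three numeric columns of equal length.
columns = {
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 58, 69, 80, 91],
    "errors": [9, 7, 6, 3, 1],
}


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


labels = list(columns)
# The matrix whose cells the heatmap will color.
matrix = [[pearson(columns[a], columns[b]) for b in labels] for a in labels]


def draw_heatmap(matrix, labels, path="heatmap.png"):
    """Save the matrix as an annotated heatmap and return the file path."""
    import matplotlib
    matplotlib.use("Agg")  # render off-screen, so no display is required
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(4, 3))
    sns.heatmap(matrix, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, xticklabels=labels, yticklabels=labels, ax=ax)
    ax.set_title("Correlation heatmap")  # ordinary Matplotlib tweak
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `draw_heatmap(matrix, labels)` writes the image to disk; the title line shows how the returned Matplotlib axes can be adjusted after seaborn has drawn the cells.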
Tools for Data Visualization
- KNIME
- Python Libraries
- R Libraries
- Google Data Studio
- D3.js
KNIME DATA VISUALIZATION TOOLS
• KNIME Analytics Platform provides many nodes for data visualization,
including scatter plots, pie charts, box plots, histograms as well as tag
clouds and visualizations of networks.
Data Visualization Nodes
• KNIME has a number of dedicated native visualization nodes.
• Hiliting
• Geo-location
• R Choropleths
KNIME FEATURES
KNIME uses a modular workflow approach, which documents and stores the analysis process in the exact order in which it was conceived and implemented. All results in the workflow are instantly available for review by the user, aiding debugging at every stage of the workflow.
Core KNIME features include:
• Scalability through sophisticated data handling
(intelligent automatic caching of data in the background
while maximizing throughput performance)
• Highly and easily extensible via a well-defined API for
plugin extensions
• Intuitive user interface
• Import/export of workflows (for exchanging with other
KNIME users)
• Parallel execution on multi-core systems
• Command line version for "headless" batch executions
KNIME FUNCTIONALITIES
Available KNIME modules cover a vast range of functionality,
such as:
• I/O: retrieves data from files or databases
• Data Manipulation: pre-processes your input data with
filtering, group-by, pivoting, binning, normalization,
aggregation, joining, sampling, partitioning, etc.
• Views: inspects the data and results with several
interactive views, supporting interactive data exploration
• Hiliting: ensures that data points hilited in one view are
immediately hilited in all other views
• Mining: uses state-of-the-art data mining algorithms like
clustering, rule induction, decision tree, association rules,
naïve bayes, neural networks, support vector machines,
etc. to better understand your data
Tools for Data Visualization
Major Python Libraries
- Matplotlib
- Seaborn
- Ggplot
- Bokeh
- Plotly
- Pygal
- Altair
- Geoplotlib
etc.
MATPLOTLIB – 2D Graphics
Simple and powerful visualizations can be generated
using the Matplotlib Python Library.
It is the most widely-used library for plotting in the
Python community.
Libraries like pandas provide “wrappers” over Matplotlib,
allowing access to a number of Matplotlib’s methods
with less code.
The versatility of Matplotlib can be used to make many
visualization types:
• Scatter plots
• Bar charts and histograms
• Line plots
• Pie charts
• Stem plots
• Contour plots, etc.
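Two of the plot types listed above, a scatter plot and a histogram, can be sketched as follows. The data is Anscombe's first dataset from earlier in this deck; the function name `plot_overview` and the output file name are illustrative, and Matplotlib is imported only when the function is called, so it is assumed to be installed at that point.

```python
# Anscombe's first dataset, reused here as sample data.
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def plot_overview(xs, ys, path="overview.png"):
    """Save a scatter plot and a histogram side by side; return the file path."""
    import matplotlib
    matplotlib.use("Agg")  # off-screen backend, so no display is needed
    import matplotlib.pyplot as plt

    fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

    # Left panel: relationship between the two variables.
    ax_scatter.scatter(xs, ys)
    ax_scatter.set_xlabel("x")
    ax_scatter.set_ylabel("y")
    ax_scatter.set_title("Scatter plot")

    # Right panel: distribution of y in five bins.
    ax_hist.hist(ys, bins=5)
    ax_hist.set_xlabel("y")
    ax_hist.set_title("Histogram")

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_overview(xs, ys)` writes a single PNG with both panels, which is often enough for a first look at a pair of attributes.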
SEABORN
• Seaborn is a popular data
visualization library that is built
on top of Matplotlib.
• Seaborn’s default styles and
color palettes are much more
sophisticated than Matplotlib’s.
• Seaborn is a higher-level library,
meaning it’s easier to generate
certain kinds of plots, including
heat maps, time series, and
violin plots.
ggplot
• ggplot is a Python visualization library
based on R’s ggplot2 and the Grammar of
Graphics.
• Ggplot operates differently compared to
Matplotlib: it lets users layer components
to create a full plot.
• The Grammar of Graphics has been hailed
as an “intuitive” method for plotting,
though seasoned Matplotlib users might
need time to adjust to this new mindset.
Bokeh
https://bokeh.pydata.org/en/latest/docs/gallery.html#gallery
• Bokeh is native to Python rather than ported over from R, unlike ggplot. Bokeh, like ggplot, is also based on The Grammar of Graphics.
• It supports streaming and real-time data. Its unique selling proposition is the ability to create interactive, web-ready plots, which can easily be output as JSON objects, HTML documents, or interactive web applications.
• Bokeh has three interfaces with varying degrees of control to accommodate
different types of users.
• The topmost level is for creating charts quickly. It includes methods for creating
common charts such as bar plots, box plots, and histograms.
• The middle level allows the user to control the basic building blocks of each chart (for
example, the dots in a scatter plot) and has the same specificity as Matplotlib.
• The bottom level is geared toward developers and software engineers. It has no pre-
set defaults and requires the user to define every element of the chart.
https://demo.bokehplots.com/apps/crossfilter
https://realpython.com/python-data-visualization-bokeh/
PLOTLY
• Plotly is widely known as an online platform for
data visualization.
• It can be accessed from a Python notebook.
• Like Bokeh, Plotly’s strength lies in making
interactive plots, and it offers some charts not
found in most libraries, like contour plots.
• People with no technical background can also use it to create interactive plots by uploading their data and using the Plotly GUI.
• Plotly is compatible with ggplot in R and Python.
• It allows interactive plots to be embedded in projects or websites using iframes or HTML.
https://plot.ly/python/line-and-scatter/
https://plot.ly/feed/?q=plottype:choropleth
PYGAL
• Offers interactive plots that can be embedded in a web browser. The ability to output charts as SVGs is its prime differentiator. For work involving smaller datasets, SVGs will do just fine. However, for charts with hundreds of thousands of data points, they become sluggish and have trouble rendering.
• It’s easy to create a nice-looking chart with just a few lines of code, since each chart type is packaged into a method and the built-in styles are pretty.
ALTAIR
Altair is a declarative statistical visualization library for Python, based on Vega-Lite.
Declarative means you only need to declare the links between data columns and encoding channels, such as x-axis, y-axis, color, etc.; the rest of the plotting details are handled automatically.
Being declarative makes Altair simple, friendly and consistent. It is easy to design effective and beautiful visualizations with a minimal amount of code using Altair.
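To illustrate what "declarative" means here, the sketch below writes a small Vega-Lite specification by hand as a Python dictionary, using only the standard library. It is not Altair's API: Altair generates specifications of this kind from chained calls such as `alt.Chart(data).mark_point().encode(...)`. The rows and the helper name `point_chart_spec` are hypothetical.

```python
import json

# Hypothetical rows: each dict is one record with named columns.
rows = [
    {"x": 10, "y": 8.04, "dataset": "I"},
    {"x": 8, "y": 6.95, "dataset": "I"},
    {"x": 10, "y": 9.14, "dataset": "II"},
    {"x": 8, "y": 8.14, "dataset": "II"},
]


def point_chart_spec(rows, x, y, color):
    """Build a Vega-Lite scatter-plot spec that maps columns to channels."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "point",
        # The encoding block is the declarative part: it states which column
        # drives which visual channel and leaves scales, axes and legends
        # to the renderer.
        "encoding": {
            "x": {"field": x, "type": "quantitative"},
            "y": {"field": y, "type": "quantitative"},
            "color": {"field": color, "type": "nominal"},
        },
    }


spec = point_chart_spec(rows, "x", "y", "dataset")
print(json.dumps(spec, indent=2))
```

Nothing in the specification says how to draw axes, pick colors or place a legend; those details are derived from the column-to-channel mapping.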
Geoplotlib
• It is a toolbox used for plotting
geographical data and map creation.
• It can be used to create a variety of map-
types, like choropleths, heatmaps, and dot
density maps.
• It provides a set of in-built tools for the
most common tasks such as density
visualization, spatial graphs, and shape
files.
• Simply put, Geoplotlib is a Python library dedicated to the visualization of maps.
Major R Visual Libraries
• Plotly - Plotly's R graphing library makes interactive, publication-quality
graphs online. Can be used to make line plots, scatter plots, area
charts, bar charts, error bars, box plots, histograms, heatmaps,
subplots, multiple-axes, and 3D (WebGL based) charts.
• ggplot2 - The ggplot2 package lets you make beautiful and
customizable plots of your data. It implements the grammar of
graphics, an easy-to-use system for building plots.
• Shiny - Shiny is an R package that makes it easy to build interactive web
apps straight from R. You can host standalone apps on a webpage or
embed them in R Markdown documents or build dashboards. You can
also extend your Shiny apps with CSS themes, htmlwidgets, and
JavaScript actions.
https://shiny.rstudio.com/gallery/genome-browser.html
https://rdrr.io/snippets/
http://gallery.htmlwidgets.org/
docs.ggplot2.or
GOOGLE DATA STUDIO
• Currently in beta, Google Data Studio allows you to create branded reports with data visualizations to share with your clients. ... Google Data Studio is part of the Google Analytics 360 Suite, the high-end (i.e., pricey) Google Analytics Enterprise package.
• Data Studio is Google's reporting solution for power users who want to go beyond the data and dashboards of Google Analytics. The data widgets in Data Studio are notable for their variety, customization options, live data and interactive controls (such as column sorting and table pagination).
• Earlier, you could create up to five custom reports for free; now you can create as many as required.
https://datastudio.google.com/reporting/1Rg5y6r0640X8uo2xo2XY48sG9IyMiYEN/page/wcCU
D3.JS
• D3.js is a JavaScript library for
manipulating documents based on data.
• D3 helps you bring data to life using
HTML, SVG, and CSS. D3’s emphasis on
web standards gives you the full
capabilities of modern browsers without
tying yourself to a proprietary framework,
combining powerful visualization
components and a data-driven approach
to DOM manipulation.
Reporting and Analysis
• Reporting is “the process of
organizing data into informational
summaries in order to monitor
how different areas of a business
are performing.”
• Analytics is “the process of
exploring data and reports in order
to extract meaningful insights,
which can be used to better
understand and improve business
performance.”
What Is an Analytical Report?
An analytical report is a business report that:
• Uses qualitative and quantitative data to analyze and evaluate a business strategy or process.
• Empowers decision makers to make data-driven decisions based on evidence and analytics.
CREATING THE REPORT THROUGH PYTHON
Sequence of Steps:
Collecting metrics is easy; generating insights is what nails it!
Generate actionable insight
• “Actionable insight” is a term in data analytics and big data for information that can be acted upon, or information that gives enough insight into the future that the actions that should be taken become clear to decision makers.
• Analytics (mathematical ways of synthesizing metrics) must illuminate business conditions, sentiment and directional changes over time.
• Insights are what humans make from analytics: once you have data and perform the analysis, you have the knowledge to form insights and change your actions or responses.
This session is for educational purposes. The material used in this presentation has been compiled from various free and readily available resources; a full acknowledgement list can be furnished on request.
Thank You
Moushmi Dasgupta
contact@analyticsdomain.com
www.analyticsdomain.com

Contenu connexe

Tendances

Introduction to Data Visualization
Introduction to Data Visualization Introduction to Data Visualization
Introduction to Data Visualization
Ana Jofre
 

Tendances (20)

Data visualisation & analytics with Tableau
Data visualisation & analytics with Tableau Data visualisation & analytics with Tableau
Data visualisation & analytics with Tableau
 
Pandas
PandasPandas
Pandas
 
Data Visualization
Data VisualizationData Visualization
Data Visualization
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using Python
 
Python Matplotlib Tutorial | Matplotlib Tutorial | Python Tutorial | Python T...
Python Matplotlib Tutorial | Matplotlib Tutorial | Python Tutorial | Python T...Python Matplotlib Tutorial | Matplotlib Tutorial | Python Tutorial | Python T...
Python Matplotlib Tutorial | Matplotlib Tutorial | Python Tutorial | Python T...
 
Python For Data Analysis | Python Pandas Tutorial | Learn Python | Python Tra...
Python For Data Analysis | Python Pandas Tutorial | Learn Python | Python Tra...Python For Data Analysis | Python Pandas Tutorial | Learn Python | Python Tra...
Python For Data Analysis | Python Pandas Tutorial | Learn Python | Python Tra...
 
Python Pandas
Python PandasPython Pandas
Python Pandas
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 
Tableau PPT Intro, Features, Advantages, Disadvantages
Tableau PPT Intro, Features, Advantages, DisadvantagesTableau PPT Intro, Features, Advantages, Disadvantages
Tableau PPT Intro, Features, Advantages, Disadvantages
 
Basic of python for data analysis
Basic of python for data analysisBasic of python for data analysis
Basic of python for data analysis
 
Seaborn.pptx
Seaborn.pptxSeaborn.pptx
Seaborn.pptx
 
Data Visualization
Data VisualizationData Visualization
Data Visualization
 
Data visualization
Data visualizationData visualization
Data visualization
 
Tableau ppt
Tableau pptTableau ppt
Tableau ppt
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Introduction to Data Visualization
Introduction to Data Visualization Introduction to Data Visualization
Introduction to Data Visualization
 
pandas: a Foundational Python Library for Data Analysis and Statistics
pandas: a Foundational Python Library for Data Analysis and Statisticspandas: a Foundational Python Library for Data Analysis and Statistics
pandas: a Foundational Python Library for Data Analysis and Statistics
 
DATA VISUALIZATION USING MATPLOTLIB (PYTHON)
DATA VISUALIZATION USING MATPLOTLIB (PYTHON)DATA VISUALIZATION USING MATPLOTLIB (PYTHON)
DATA VISUALIZATION USING MATPLOTLIB (PYTHON)
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Presentation on data preparation with pandas
Presentation on data preparation with pandasPresentation on data preparation with pandas
Presentation on data preparation with pandas
 

Similaire à Data visualization

Creating Effective Data Visualizations in Excel 2016: Some Basics
Creating Effective Data Visualizations in Excel 2016:  Some BasicsCreating Effective Data Visualizations in Excel 2016:  Some Basics
Creating Effective Data Visualizations in Excel 2016: Some Basics
Shalin Hai-Jew
 
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...Data Warehouse techniques on Intermediate Census and Demographic Statistics W...
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...
Vincenzo Patruno
 
Book Recommendations.pptx
Book Recommendations.pptxBook Recommendations.pptx
Book Recommendations.pptx
DishaSharma337110
 

Similaire à Data visualization (20)

Visual Analytics in Big Data
Visual Analytics in Big DataVisual Analytics in Big Data
Visual Analytics in Big Data
 
Big data visualization
Big data visualizationBig data visualization
Big data visualization
 
chapter 6 data visualization ppt.pptx
chapter 6 data visualization ppt.pptxchapter 6 data visualization ppt.pptx
chapter 6 data visualization ppt.pptx
 
datavisualization-5thUnit.pdf
datavisualization-5thUnit.pdfdatavisualization-5thUnit.pdf
datavisualization-5thUnit.pdf
 
WWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big dataWWV2015: Jibes Paul van der Hulst big data
WWV2015: Jibes Paul van der Hulst big data
 
UNit4.pdf
UNit4.pdfUNit4.pdf
UNit4.pdf
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data science
 
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
 
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-shareBigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
BigData Visualization and Usecase@TDGA-Stelligence-11july2019-share
 
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
 
Creating Effective Data Visualizations in Excel 2016: Some Basics
Creating Effective Data Visualizations in Excel 2016:  Some BasicsCreating Effective Data Visualizations in Excel 2016:  Some Basics
Creating Effective Data Visualizations in Excel 2016: Some Basics
 
Visualising montioring and evaluation data
Visualising montioring and evaluation dataVisualising montioring and evaluation data
Visualising montioring and evaluation data
 
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...Data Warehouse techniques on Intermediate Census and Demographic Statistics W...
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...
 
Book Recommendations.pptx
Book Recommendations.pptxBook Recommendations.pptx
Book Recommendations.pptx
 
bookrecommendations-230615063942-3b1016c9 (1).pdf
bookrecommendations-230615063942-3b1016c9 (1).pdfbookrecommendations-230615063942-3b1016c9 (1).pdf
bookrecommendations-230615063942-3b1016c9 (1).pdf
 
Power BI vs Tableau vs Cognos: A Data Analytics Research
Power BI vs Tableau vs Cognos: A Data Analytics ResearchPower BI vs Tableau vs Cognos: A Data Analytics Research
Power BI vs Tableau vs Cognos: A Data Analytics Research
 
Data Science and Analysis.pptx
Data Science and Analysis.pptxData Science and Analysis.pptx
Data Science and Analysis.pptx
 
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ PanoraysQuick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
Quick dive into the big data pool without drowning - Demi Ben-Ari @ Panorays
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Nagios Conference 2013 - Andy Brist - Data Visualizations and Nagios XI
Nagios Conference 2013 - Andy Brist - Data Visualizations and Nagios XINagios Conference 2013 - Andy Brist - Data Visualizations and Nagios XI
Nagios Conference 2013 - Andy Brist - Data Visualizations and Nagios XI
 

Dernier

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Dernier (20)

Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Data visualization

  • 2. "In God we trust; all others bring data." (Deming)
  • 3. AGENDA: 1. Exploratory Data Analysis 2. Fundamentals of Effective Data Visualization 3. Tools for Data Visualization 4. Demo using Python, R and KNIME to create visualizations 5. Creating insightful reports with visual tools 6. Q & A
  • 4. What is EDA? • Exploratory data analysis (EDA) is an approach that reveals the important characteristics of a dataset, mainly through visualization. • Get to know your data! • Distributions (symmetric, normal, skewed) • Data quality problems • Outliers • Correlations and inter-relationships • Functional relationships • Derived attributes and keys (primary, foreign) • Static attributes, dynamic attributes, etc.
  • 5. Get a good look at, and feel for, the data. • Always check your datasets • Means • Medians • Quantiles • Histograms • Boxplots • Scatter diagrams. Consider looking at every attribute: you will understand what it represents!
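The numeric checks listed on slide 5 can be sketched with Python's standard library alone. This is a minimal illustration, not the presenter's demo code: the `summarize` helper and the `ages` values are hypothetical and made up for this example.

```python
import statistics

def summarize(values):
    """Return basic summary statistics for one numeric attribute."""
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "min": min(values),
        "max": max(values),
    }

# Hypothetical attribute values, invented for illustration only.
ages = [23, 25, 31, 35, 35, 41, 48, 52, 95]
summary = summarize(ages)

for name, value in summary.items():
    print(f"{name}: {value}")
```

A mean well above the median, as in this made-up sample, is a quick hint that the attribute is right-skewed or contains a large value worth inspecting in a histogram or boxplot.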
  • 6. Visualization before Analysis (Anscombe’s Quartet)

           I               II              III              IV
        x      y        x      y        x      y        x      y
      10.0   8.04     10.0   9.14     10.0   7.46      8.0   6.58
       8.0   6.95      8.0   8.14      8.0   6.77      8.0   5.76
      13.0   7.58     13.0   8.74     13.0  12.74      8.0   7.71
       9.0   8.81      9.0   8.77      9.0   7.11      8.0   8.84
      11.0   8.33     11.0   9.26     11.0   7.81      8.0   8.47
      14.0   9.96     14.0   8.10     14.0   8.84      8.0   7.04
       6.0   7.24      6.0   6.13      6.0   6.08      8.0   5.25
       4.0   4.26      4.0   3.10      4.0   5.39     19.0  12.50
      12.0  10.84     12.0   9.13     12.0   8.15      8.0   5.56
       7.0   4.82      7.0   7.26      7.0   6.42      8.0   7.91
       5.0   5.68      5.0   4.74      5.0   5.73      8.0   6.89
  • 7. For all the Datasets

      Property                                                Value                Accuracy
      Mean of x                                               9                    exact
      Sample variance of x                                    11                   exact
      Mean of y                                               7.50                 to 2 decimal places
      Sample variance of y                                    4.125                plus/minus 0.003
      Correlation between x and y                             0.816                to 3 decimal places
      Linear regression line                                  y = 3.00 + 0.500x    to 2 and 3 decimal places, respectively
      Coefficient of determination of the linear regression   0.67                 to 2 decimal places
  • 8. • The first scatter plot (top left) appears to be a simple linear relationship, corresponding to two variables correlated and following the assumption of normality. • The second graph (top right) is not distributed normally; while a relationship between the two variables is obvious, it is not linear, and the Pearson correlation coefficient is not relevant. A more general regression and the corresponding coefficient of determination would be more appropriate.
  • 9. • In the third graph (bottom left), the distribution is linear, but should have a different regression line (a robust regression would have been called for). The calculated regression is offset by the one outlier which exerts enough influence to lower the correlation coefficient from 1 to 0.816. • Finally, the fourth graph (bottom right) shows an example when one outlier is enough to produce a high correlation coefficient, even though the other data points do not indicate any relationship between the variables.
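The shared summary statistics on slide 7 can be checked directly from the values on slide 6. The sketch below uses only the standard library; the `describe` helper is a hypothetical name introduced here, and the least-squares formulas are the textbook ones rather than anything taken from the presenter's demo.

```python
import statistics

# Anscombe's quartet, copied from the table on slide 6.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def describe(x, y):
    """Summary statistics and least-squares line for one dataset."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    var_x, var_y = statistics.variance(x), statistics.variance(y)  # sample variances
    # Sample covariance, using the same n - 1 denominator as the variances.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    slope = cov / var_x
    return {
        "mean_x": mean_x,
        "var_x": var_x,
        "mean_y": mean_y,
        "var_y": var_y,
        "r": cov / (var_x * var_y) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }

results = {name: describe(x, y) for name, (x, y) in quartet.items()}
for name, stats in results.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four datasets print nearly identical numbers, which is exactly why the scatter plots are needed to tell them apart.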
  • 10. Get a general sense of the data • Make sure your first visualization is data-driven (model-free) • Think interactive and visual • Humans are the best pattern recognizers • Use as many dimensions as your data will permit, not just 2 or 3: x, y, z, space, color, time… • Visualization is useful in the early stages of data mining: • detect outliers (e.g. to assess data quality) • test assumptions (e.g. is the distribution normal or skewed?) • identify useful raw data and transforms (e.g. log(x)). Takeaway: it is always well worth looking at your data!
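Two of the early-stage checks on slide 10, detecting outliers and trying a log transform, can be sketched as follows. The 1.5 × IQR rule used here is the common boxplot convention and is an assumption of this example, not something the slides specify; `iqr_outliers` and the `incomes` values are hypothetical.

```python
import math
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (the boxplot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical, strongly right-skewed attribute invented for illustration.
incomes = [21, 24, 25, 27, 30, 32, 35, 38, 41, 400]

outliers = iqr_outliers(incomes)
print("outliers:", outliers)

# A log transform compresses the long right tail so the bulk of the data
# is no longer squeezed into one corner of a plot.
log_incomes = [math.log(v) for v in incomes]
print("raw range:", max(incomes) - min(incomes))
print("log range:", round(max(log_incomes) - min(log_incomes), 2))
```

Whether a flagged value is an error or a genuine extreme is a data-quality question that the plot raises but cannot answer on its own.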
  • 12. FUNDAMENTALS OF EFFECTIVE DATA VISUALIZATION • Data quality issues • A good understanding of statistical theories • How to move volumes of data • Should I use machine learning?
  • 15. A GOOD UNDERSTANDING OF STATISTICAL THEORIES
  • 16. A GOOD UNDERSTANDING OF STATISTICAL THEORIES
  • 18. Visualization: converting raw data into graphics that people can understand. (Introduction to Information Visualization, Fall 2013; adapted from The ParaView Tutorial, Moreland)
  • 19. Information visualization: visualization of data that has no well-defined representation in 2D or 3D space, i.e. the data is abstract.
  • 20. Network Visualization
  • 22. HEATMAP VISUALIZATION • A heatmap is a two-dimensional graphical representation of data in which the individual values contained in a matrix are represented as colors. • The seaborn Python package allows the creation of annotated heatmaps, which can be tweaked with Matplotlib tools to suit the creator's requirements.
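An annotated heatmap of the kind described on slide 22 might look like the sketch below. It assumes seaborn and Matplotlib are installed; the matrix values, labels and output file name are hypothetical and chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the example runs without a display
import seaborn as sns

# Hypothetical 3x3 matrix (e.g. pairwise correlations), invented for illustration.
matrix = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
labels = ["a", "b", "c"]

# annot=True writes each cell's value on top of its color; fmt controls the format.
ax = sns.heatmap(matrix, annot=True, fmt=".2f", cmap="viridis",
                 xticklabels=labels, yticklabels=labels)

# sns.heatmap returns a Matplotlib Axes, so ordinary Matplotlib calls can tweak it.
ax.set_title("Annotated heatmap")
ax.figure.savefig("heatmap.png")
```

Because the return value is a regular Axes object, titles, tick labels and figure size are all adjusted the same way as for any other Matplotlib plot.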
  • 24. Tools for Data Visualization • KNIME • Python libraries • R libraries • Google Data Studio • D3.js
  • 25. KNIME DATA VISUALIZATION TOOLS • KNIME Analytics Platform provides many nodes for data visualization, including scatter plots, pie charts, box plots and histograms, as well as tag clouds and visualizations of networks. Data Visualization Nodes • KNIME has a number of native, dedicated visualization nodes. • Hiliting • Geo-location • R Choropleths
  • 26. KNIME FEATURES KNIME uses a modular workflow approach, which documents and stores the analysis process in the exact order in which it was conceived and implemented. All results in the workflow are instantly available for review by the user, which aids debugging at every stage of the workflow. Core KNIME features include: • Scalability through sophisticated data handling (intelligent automatic caching of data in the background while maximizing throughput performance) • Highly and easily extensible via a well-defined API for plugin extensions • Intuitive user interface • Import/export of workflows (for exchanging with other KNIME users) • Parallel execution on multi-core systems • Command-line version for "headless" batch executions
  • 27. KNIME FUNCTIONALITIES Available KNIME modules cover a vast range of functionality, such as: • I/O: retrieves data from files or databases • Data Manipulation: pre-processes your input data with filtering, group-by, pivoting, binning, normalization, aggregation, joining, sampling, partitioning, etc. • Views: inspects the data and results with several interactive views, supporting interactive data exploration • Hiliting: ensures that data points hilited in one view are immediately hilited in all other views as well • Mining: uses state-of-the-art data mining algorithms such as clustering, rule induction, decision trees, association rules, naïve Bayes, neural networks, support vector machines, etc. to better understand your data
  • 28. Tools for Data Visualization: Major Python Libraries • Matplotlib • Seaborn • ggplot • Bokeh • Plotly • Pygal • Altair • Geoplotlib, etc.
  • 29. MATPLOTLIB: 2D Graphics. Simple and powerful visualizations can be generated using the Matplotlib Python library. It is the most widely used plotting library in the Python community. Libraries such as pandas act as “wrappers” over Matplotlib, giving access to a number of Matplotlib’s methods with less code. Matplotlib is versatile enough to produce many visualization types: • Scatter plots • Bar charts and histograms • Line plots • Pie charts • Stem plots • Contour plots, etc.
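Two of the plot types listed on slide 29, a scatter plot and a histogram, can be drawn side by side as in the sketch below. It assumes Matplotlib is installed; the data values and the output file name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the example runs without a display
import matplotlib.pyplot as plt

# Hypothetical measurements, invented for illustration.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# One figure with two side-by-side axes.
fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

ax_scatter.scatter(x, y)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_title("Scatter plot")

# hist returns the per-bin counts, the bin edges and the drawn bar patches.
counts, bin_edges, _ = ax_hist.hist(y, bins=4)
ax_hist.set_xlabel("y")
ax_hist.set_title("Histogram")

fig.tight_layout()
fig.savefig("matplotlib_demo.png")
```

The explicit `fig`/`ax` style shown here is one of two Matplotlib interfaces; the shorter `plt.plot(...)` style works as well for quick, single-plot exploration.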
  • 30. SEABORN • Seaborn is a popular data visualization library built on top of Matplotlib. • Seaborn’s default styles and color palettes are more sophisticated than Matplotlib’s. • Seaborn is a higher-level library, meaning it is easier to generate certain kinds of plots, including heat maps, time series, and violin plots.
  • 31. ggplot • ggplot is a Python visualization library based on R’s ggplot2 and the Grammar of Graphics. • ggplot operates differently from Matplotlib: it lets users layer components to create a full plot. • The Grammar of Graphics has been hailed as an “intuitive” method for plotting, though seasoned Matplotlib users might need time to adjust to this new mindset.
  • 32. Bokeh (gallery: https://bokeh.pydata.org/en/latest/docs/gallery.html#gallery) • Bokeh is native to Python rather than ported over from R, unlike ggplot. Like ggplot, Bokeh is based on the Grammar of Graphics. • It supports streaming and real-time data, and its unique selling proposition is its ability to create interactive, web-ready plots that can easily be output as JSON objects, HTML documents, or interactive web applications. • Bokeh has three interfaces with varying degrees of control to accommodate different types of users. • The topmost level is for creating charts quickly. It includes methods for creating common charts such as bar plots, box plots, and histograms. • The middle level allows the user to control the basic building blocks of each chart (for example, the dots in a scatter plot) and has the same specificity as Matplotlib. • The bottom level is geared toward developers and software engineers. It has no preset defaults and requires the user to define every element of the chart. Further examples: https://demo.bokehplots.com/apps/crossfilter https://realpython.com/python-data-visualization-bokeh/
  • 33. PLOTLY • Plotly is widely known as an online platform for data visualization. • It can be accessed from a Python notebook. • Like Bokeh, Plotly’s strength lies in making interactive plots, and it offers some charts not found in most libraries, like contour plots. • People with no technical background can also use it to create interactive plots by uploading data and using the Plotly GUI. • Plotly is compatible with ggplot in both R and Python. • It allows interactive plots to be embedded in projects or websites using iframes or HTML. Examples: https://plot.ly/python/line-and-scatter/ https://plot.ly/feed/?q=plottype:choropleth
  • 34. PYGAL • Pygal offers interactive plots that can be embedded in a web browser. • Its prime differentiator is the ability to output charts as SVGs. For work involving smaller datasets, SVGs will do just fine. However, charts with hundreds of thousands of data points become sluggish and have trouble rendering. • It is easy to create a nice-looking chart with just a few lines of code, since each chart type is packaged into a method and the built-in styles are attractive.
  • 35. ALTAIR Altair is a declarative statistical visualization library for Python, based on Vega-Lite. Declarative means you only need to specify how data columns map to encoding channels such as the x-axis, y-axis, and color; the rest of the plotting details are handled automatically. Being declarative makes Altair simple, friendly and consistent, so effective and attractive visualizations can be designed with a minimal amount of code.
  • 36. Geoplotlib • Geoplotlib is a toolbox for plotting geographical data and creating maps. • It can be used to create a variety of map types, such as choropleths, heatmaps, and dot density maps. • It provides a set of built-in tools for the most common tasks, such as density visualization, spatial graphs, and shapefiles. • Simply put, Geoplotlib is a Python library dedicated to the visualization of maps.
  • 37. Major R Visual Libraries • Plotly: Plotly's R graphing library makes interactive, publication-quality graphs online. It can be used to make line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, and 3D (WebGL-based) charts. • ggplot2: The ggplot2 package lets you make beautiful and customizable plots of your data. It implements the grammar of graphics, an easy-to-use system for building plots. • Shiny: Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage, embed them in R Markdown documents, or build dashboards. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions. Links: https://shiny.rstudio.com/gallery/genome-browser.html https://rdrr.io/snippets/ http://gallery.htmlwidgets.org/ docs.ggplot2.or
  • 38. GOOGLE DATA STUDIO • Currently in beta, Google Data Studio allows you to create branded reports with data visualizations to share with your clients. ... Google Data Studio is part of the Google Analytics 360 Suite, the high-end (i.e., pricey) Google Analytics Enterprise package. • Data Studio is Google's reporting solution for power users who want to go beyond the data and dashboards of Google Analytics. The data widgets in Data Studio are notable for their variety, customization options, live data and interactive controls (such as column sorting and table pagination). • Earlier, only up to five custom reports could be created for free; now you can create as many as required. Example: https://datastudio.google.com/reporting/1Rg5y6r0640X8uo2xo2XY48sG9IyMiYEN/page/wcCU
  • 39. D3.JS • D3.js is a JavaScript library for manipulating documents based on data. • D3 helps you bring data to life using HTML, SVG, and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.
  • 42. Reporting and Analysis • Reporting is “the process of organizing data into informational summaries in order to monitor how different areas of a business are performing.” • Analytics is “the process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance.”
  • 43. An Analytical Report? An analytical report is a business report that: • Uses qualitative and quantitative data to analyze and evaluate a business strategy or process. • Empowers decision makers to make data-driven decisions based on evidence and analytics.
  • 45. Collecting metrics is easy; generating insights is what nails it! Generate actionable insight • "Actionable insights" is a term in data analytics and big data for information that can be acted upon, or that gives enough insight into the future that the actions to be taken become clear to decision makers. • Analytics (mathematical ways of synthesizing metrics) must illuminate business conditions, sentiment and directional changes over time. • Insights are what humans make from analytics: once you have data and perform the analysis, you have the knowledge to form insights and change your actions or responses.
  • 47. This session is for educational purposes. The material used in this presentation has been compiled from various free and readily available resources; a full acknowledgement list can be furnished on request.