4. Defining Data Science
• Data Science deals with the science and algorithms
related to data.
• Data is generated from various sorts of sources.
• Report says, “Every day, approximately 2 quintillion bytes
of data is generated. If it grows at this pace, then by the
next 3 years, it is expected that 2MB of data will be
created every second for every individual on this planet.”
• The last 2 years have witnessed the creation of 90% of data over
5. • Data has two sources:
• Structured sources include information that is compatible
with a relational database.
• E.g. ATM transactions and flight tickets, which can be queried
and changed with SQL.
• Unstructured data is generated from tweets and comments
on social media, audio and video files which the SQL cannot
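A minimal sketch of why structured data suits SQL, using Python's built-in sqlite3 module. The table name, columns, and values are hypothetical and invented purely for illustration; they are not taken from the slides.

```python
import sqlite3

# Hypothetical ATM transactions table (schema and rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE atm_transactions (id INTEGER PRIMARY KEY, account TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO atm_transactions (account, amount) VALUES (?, ?)",
    [("A-100", 200.0), ("A-100", 50.0), ("B-200", 120.0)],
)

# Every row has the same columns, so SQL can change and aggregate it directly.
conn.execute("UPDATE atm_transactions SET amount = 60.0 WHERE id = 2")
totals = dict(
    conn.execute(
        "SELECT account, SUM(amount) FROM atm_transactions GROUP BY account"
    ).fetchall()
)
print(totals)
```

A tweet or a video file has no such fixed columns, which is why it cannot be handled this way without further processing.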
“Data Science is a broad field which is an assembly of scientific techniques,
methods, processes used to clean the data and then extract some useful
patterns and insights in form of visualizations.”
• Visualizations are crucial for making important business decisions and coming up
with strategies that are instrumental to an organization’s well-being.
In 1997, C. F. Jeff Wu at the University of Michigan stated that the concepts below
should be studied under the phrase Data Science.
• Data Collection
• Data Modeling
8. Role of Data Science on Statistics
• Computer Science
• Problem Solving
• Machine Learning
9. Data Science??
In 2012, Harvard Business Review called it “the sexiest job of the
21st century”.
• Statistics is the branch of mathematics that deals with data collection,
categorization, interpretation and presentation.
• These techniques helped with the processing and analyzing of the data at a
12. Statistics Techniques To Deal with Data
• Data Collection
– Collecting relevant data/information
– Primary data includes surveys, observations and experiments.
– Secondary data includes internal records and government-published data.
• Data Categorization and Classification
– Organized to get some insights
For example, we have data of heights of 10 people
160cm, 165cm, 155cm, 190cm, 177cm, 181cm, 179cm, 185cm, 159cm, 173cm
This data in an ordered array will look like
155cm, 159cm, 160cm, 165cm, 173cm, 177cm, 179cm, 181cm, 185cm, 190cm
The above data tells us that 155cm is the shortest height while 190cm is the tallest.
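The ordering step above can be sketched in a few lines of Python. The variable names are illustrative; the heights are the ten values from the slide.

```python
# Heights of the 10 people from the slide, in centimetres.
heights_cm = [160, 165, 155, 190, 177, 181, 179, 185, 159, 173]

# Arrange the raw data as an ordered array.
ordered = sorted(heights_cm)

# In an ordered array, the extremes sit at the two ends.
shortest, tallest = ordered[0], ordered[-1]

print(ordered)
print(shortest, tallest)
```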
13. Statistics Techniques To Deal with Data
• Data Classification
– Assembly of relevant facts/data into different categories/groups as per features.
– Factors are:
• Chronological (basis of time)
• Data Presentation
– Includes frequency distribution using histograms.
– For example, assume you are looking for prospective clients for your new
product which is an electric bike.
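A frequency distribution can be sketched without any plotting library. The example below reuses the ten heights from the earlier slide rather than the electric bike data, which is not given here. The 10 cm bin width and the text-based bars are assumptions made for illustration.

```python
from collections import Counter

# Heights from the earlier slide, in centimetres.
heights_cm = [160, 165, 155, 190, 177, 181, 179, 185, 159, 173]
BIN_WIDTH = 10  # assumed bin width, not from the slides


def frequency_distribution(values, bin_width):
    """Count how many values fall in each [start, start + bin_width) interval."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


freq = frequency_distribution(heights_cm, BIN_WIDTH)

# Print a simple text histogram, one row per bin.
for start, count in freq.items():
    print(f"{start}-{start + BIN_WIDTH - 1} cm | {'#' * count}")
```

The same counts could be passed to a charting tool to draw a proper histogram.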
• Data Science has tons of real-world applications.
• Recommender Systems
– Content based – keeps track of users’ watching habits.
– Collaborative based – recognizes users with similar tastes.
• Voice and Image Recognition
• Spam and Fraud Detection
• Many more…
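The collaborative idea of recognizing users with similar tastes can be sketched as follows. This is a toy illustration, not how any particular service works. The user names, titles, and ratings are all hypothetical, and cosine similarity is just one of several possible similarity measures.

```python
from math import sqrt

# Hypothetical ratings (user -> {title: rating out of 5}); all values invented.
ratings = {
    "asha": {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "ben": {"Movie A": 4, "Movie B": 5, "Movie D": 4},
    "cara": {"Movie A": 1, "Movie C": 5, "Movie E": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts; unrated titles count as 0."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user):
    """Find the most similar other user and suggest titles the user has not rated."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: cosine_similarity(ratings[user], ratings[u]))
    unseen = set(ratings[nearest]) - set(ratings[user])
    return nearest, sorted(unseen)


print(recommend("asha"))
```

A content based system would instead compare the features of the titles a user has already watched.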
15. Data Scientists and Their Role
• Data Scientist is a Rockstar!!!
• A Data Scientist is an individual who has the power and freedom to
experiment with tons of different kinds of data.
• Based on knowledge of:
– Problem solving
– Critical thinking
– Careful analysis
16. • Anyone who is willing to carry this “tag” along should be well-versed with a lot
Some of them are
• Data wrangling or data munging
• Coding prowess in both R and Python
• Machine learning and AI
• Data visualization
• Communication skills
17. Data Analyst v/s Data Scientist
• Data Analyst has a lot to do with converting the data into a structured
format in order to process it further.
• Focuses more on Data Mining and Data Auditing
• Data mining involves retrieving information from large databases with the help of SQL to
extract new data/information.
• Data auditing involves checking the quality of the data and figuring out whether the data is
good enough to yield useful insights.
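A data audit can start with something as simple as counting missing values and duplicate keys. The sketch below is one possible shape for such a check. The records, field names, and the `audit` helper are hypothetical and invented for illustration.

```python
# Hypothetical customer records; the last row repeats an id and has blanks.
records = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},
    {"id": 3, "age": 29, "city": ""},
    {"id": 3, "age": 29, "city": ""},
]


def audit(rows, key="id"):
    """Report row count, missing values per field, and repeated key values."""
    missing = {}
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                missing[field] = missing.get(field, 0) + 1

    seen, duplicates = set(), 0
    for row in rows:
        if row[key] in seen:
            duplicates += 1
        seen.add(row[key])

    return {"rows": len(rows), "missing": missing, "duplicate_keys": duplicates}


report = audit(records)
print(report)
```

If the report shows many gaps or duplicates, the data needs cleaning before it is handed over for modeling.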
18. Data Analyst v/s Data Scientist
• A Data Scientist takes the clean data and tries to gain some meaningful
• A classification or regression algorithm is implemented in
order to create a model that is robust enough to yield
business insights with the help of visualization tools.
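As a rough sketch of the regression case, the code below fits a straight line by ordinary least squares in plain Python. The spend and sales figures are hypothetical, and a real project would more likely use a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical clean data: advertising spend vs. sales (invented units).
spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(spend, sales)

# Use the fitted model to predict sales for an unseen spend value.
predicted = slope * 5 + intercept
print(slope, intercept, predicted)
```

The fitted line and its predictions are what would then be charted in a visualization tool.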
20. Are There Enough Skilled Data Scientists In The Industry?
• According to a survey conducted by IBM, the demand for data
scientists will soar by 28% by 2020.
• That includes all jobs which require machine learning, big data,
visualization like Tableau and PowerBI expertise and knowledge of
• This is divided among the industries looking for such professionals in
finance, insurance, professional services, and IT sectors.
21. A candidate who is always thirsty for new challenges and loves problem-solving
of any kind is capable of becoming a skilled data scientist.
He likes observing and defining a problem from different angles and
Coding is his daily hustle, and he loves doing it, not because the problem demands
it, but because he knows how interesting it becomes to come up with new findings
and insights and then make a cute little story out of it!
22. Data Science Effects
How Can Data Science Help A Business/Company Grow?
• Data Science has been alive in the IT industry for a long time.
• The sudden increase in the amount of data prompted companies to make it a norm slowly and steadily.
• There are numerous ways in which this emerging discipline can help an organization grow and achieve
• Business logistics, including supply chain optimization
• Health and wellness
• Education and electronic teaching
• Climate and energy
23. Popular Data Processing TOOLS in Data Science
• Jupyter – open source tool to create and distribute documents
• RStudio – open source tool for R programming.
• SAS – analytics tool.
• Apache Spark – open source software that specializes in cluster computing.
• Microsoft Excel – spreadsheet.
• SQL – query language for relational databases.
• Tableau – data visualization tool used for representing data in terms of charts.
• PowerBI – business intelligence tool developed by Microsoft.
• Data Science is turning out to be one of the fastest growing fields in the US and India.
• Today, it has its foot in weather forecasting, sales prediction, fraud and spam detection, pattern recognition, taxi fare
prediction, sentiment analysis, and neural networks.
• The future of data science is going to be dominated by Artificial Intelligence and Automation.
• These two giants have the capability of changing the current market scenario into something that data scientists describe
as the “age of revolution”.
• Machines are being enriched with new concepts and technology every passing second, which is making them smarter
and sharper than humans.
• Looking at the current scenario of the market, data science is slowly and gradually making its
way into businesses and enterprises.