What is data science and why is it important now?
Author – Bohitesh Misra (bohitesh.misra@gmail.com), September 2017
Data Science!
Fundamentally, in layman's terms, data scientists collect data from various
sources, clean it, organize it and shape it so that it can be analyzed. The
data can be split into training and testing sets to build and evaluate an
algorithm or statistical model, which can then be applied to any area or
sector we find suitable. Data mining helps end users extract useful business
information from large databases.
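The split into training and testing data can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the records,
the function name train_test_split and the 25% test fraction are made up for
the example and do not come from any particular project.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records: (hours_studied, passed_exam)
records = [(hours, hours >= 5) for hours in range(1, 13)]
train, test = train_test_split(records)
print(len(train), "training rows;", len(test), "testing rows")
```

The model is fitted on the training rows only, and the testing rows are held
back to check how well it performs on data it has not seen.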
Asking the right questions
Asking the right questions is extremely important, and hence good
communication skills are essential for data scientists. With the advent of
technology and the internet, we now have instant access to data and the
tools to test our interpretations and make decisions quickly.
Data scientist
Data scientists use their data and analytical abilities to find and interpret
rich data sources; manage large volumes of data; merge data sources; ensure
the consistency of datasets; create visualizations that help in understanding
the data; build mathematical models using the data; and present and
communicate the insights and findings to business decision makers.
"Data scientist" has become a popular buzzword, with Harvard Business
Review dubbing it "The Sexiest Job of the 21st Century" and McKinsey &
Company projecting a global excess demand of 1.5 million new data
scientists.
Statistical models
How does data mining work? It works much the same way a human being does:
it uses historical information to learn for the future. Mathematical
foundations such as linear algebra, probability, statistics and calculus,
together with techniques such as regression, clustering and predictive
analysis, are indispensable in data science. Python and R are the preferred
programming languages, as both have packages and libraries built
specifically for data science that allow us to learn programming and start
applying it. I've begun with R and use basic libraries for text and data
mining.
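To make "learning from historical information" concrete, here is a minimal
sketch of simple linear regression in Python (one of the two languages named
above). The monthly sales figures are invented and deliberately lie on a
straight line, and fit_line is a hypothetical helper written for this
example rather than a function from any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: sales observed in months 1 to 6
months = [1, 2, 3, 4, 5, 6]
sales = [110.0, 120.0, 130.0, 140.0, 150.0, 160.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 7 + intercept  # prediction for the unseen month 7
print(f"sales = {slope:.1f} * month + {intercept:.1f}; month 7 forecast: {forecast:.1f}")
```

Real data is noisy, so the fitted line would only approximate the past, but
the idea is the same: estimate parameters from history, then use them to
predict what has not happened yet.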
Data Cleaning
It is often said that around 80% of a data scientist's work is data
cleaning. Data is sometimes available in convenient formats such as csv and
xls, but you'll find very little data that can be used directly in a program
without preparation. APIs, web scraping and SQL come to the rescue of data
scientists. Spark and MapReduce are used to clean and analyze large,
distributed datasets.
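A small example shows what cleaning typically involves. The sketch below
uses only Python's standard csv module on an invented csv snippet; the
column names, the sample rows and the clean_rows helper are all assumptions
made for illustration. It trims stray whitespace, normalises letter case,
turns missing or non-numeric ages into None and drops duplicate rows.

```python
import csv
import io

# Hypothetical raw export with typical problems: stray spaces,
# inconsistent case, missing values and a duplicated record.
RAW = """name,age,city
 Alice ,34,Delhi
Bob,,Mumbai
alice,34,delhi
Carol,n/a,Pune
Dave,29,
"""

def clean_rows(text):
    """Return a list of cleaned, de-duplicated row dictionaries."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip().title()
        city = row["city"].strip().title() or None  # empty string -> None
        age_text = row["age"].strip()
        age = int(age_text) if age_text.isdigit() else None
        key = (name, age, city)
        if key in seen:  # skip rows that are duplicates after cleaning
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age, "city": city})
    return cleaned

rows = clean_rows(RAW)
for row in rows:
    print(row)
```

Note that the two "Alice" rows only become recognisable as duplicates after
the whitespace and case have been normalised, which is why cleaning comes
before de-duplication here.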
It’s everywhere!
Data-driven solutions are being used everywhere, from e-commerce websites
and social networking sites to financial visualization and interpretation.
Companies have increasingly adopted data-driven practices over the last few
years. In fact, it would be difficult to find a sector in which data science
cannot be used to make better decisions, and companies are slowly realizing
this and adopting it.
Want to learn it?
I came across data science, decided it was the right fit for me and recently
completed the Executive Management Programme from the Indian Institute of
Technology Delhi in the same subject. Getting started with data science is
easy and convenient, given the large number of MOOCs and eBooks available
for free online.
I urge you to think about how it may be applied to you. In your business,
you can gather data in the form of customer reviews and opinions to make
better data-driven decisions. In everyday life, you can use the data from
movie review sites to choose your next movie.
Data science for Startups
Startups critically need a data strategy covering the collection, storage
and usage of large volumes of data, so that the data supports the startup's
core selling point and can also open up additional monetisation avenues in
the future.
A common case is a recommendation engine, which can benefit from all kinds
of information about the users: age, gender, purchases, offerings and
discounts. Designing the platform in a way that improves information
collection from its users results in a large database that can be used to
manage discount deals better and to improve advertising or even the user
experience on the platform.
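As a rough sketch of how purchase data can drive recommendations, the Python
example below scores items by how much each other user's purchases overlap
with the target user's. The user names, the items and the recommend function
are hypothetical, and a production engine would use far richer signals (such
as the age, gender and discount data mentioned above) than this simple
overlap count.

```python
from collections import Counter

# Hypothetical purchase history: user -> set of items bought
purchases = {
    "u1": {"laptop", "mouse", "keyboard"},
    "u2": {"laptop", "mouse"},
    "u3": {"mouse", "keyboard", "monitor"},
    "u4": {"laptop", "monitor"},
}

def recommend(user, purchases, top_n=2):
    """Suggest items bought by users whose purchases overlap with this user's."""
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(owned & items)  # number of shared purchases
        if overlap == 0:
            continue
        for item in items - owned:  # only items the user does not own yet
            scores[item] += overlap
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

suggestions = recommend("u2", purchases)
print(suggestions)
```

The more data the platform collects about its users, the more such overlaps
there are to learn from, which is why the data strategy matters from the
start.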
A clear data strategy can open up additional revenue streams for startups
and can also give them a competitive advantage.