3. • ‘Big Data’ is similar to ‘small data’, but
bigger
•…but bigger data requires different
approaches:
• Techniques, tools and architecture
•…with an aim to solve new problems
• …or old problems in a better way
4.
5. Why Big Data
• Key enablers of the emergence and growth of Big Data
are
–Increase in storage capacity
–Increase in processing power
–Availability of data
–Every day we create 2.5 quintillion bytes of data;
90% of the data in the world today has been
created in the last two years alone
6. Big Data Analytics
• Examining large amounts of data
• Extracting the appropriate information
• Identification of hidden patterns, unknown correlations
• Competitive advantage
• Better business decisions: strategic and operational
• Effective marketing, customer satisfaction, increased
revenue
7. Applications for Big Data Analytics
Homeland Security
Finance
Smarter Healthcare
Multi-channel sales
Telecom
Manufacturing
Traffic Control
Trading Analytics
Fraud and Risk
Log Analysis
Search Quality
Retail: Churn, NBO
8. Healthcare
• 80% of medical data is unstructured yet clinically
relevant
• Data resides in multiple places, such as individual EMRs,
lab and imaging systems, physician notes, medical
correspondence, claims, etc.
• Leveraging Big Data
• Build sustainable healthcare systems
• Collaborate to improve care and outcomes
• Increase access to healthcare
9. Market Size
Source: Wikibon Taming Big Data
By 2015, 4.4 million IT jobs in Big Data; 1.9 million of them in the US alone
10. India – Big Data
• Gaining traction
• Huge market opportunities for IT services (82.9% of
revenues) and analytics firms (17.1%)
• Current market size is $200 million; expected to reach $1
billion by 2015
• The opportunity for Indian service providers lies in
offering services around Big Data implementation
and analytics for global multinationals
11. India will require a minimum of 1 lakh (100,000) data scientists in the next couple of years,
in addition to data analysts and data managers, to support the Big Data space.
12.
13.
14. NoSQL: non-relational, or at least non-SQL, database
solutions such as HBase (also a part of the Hadoop
ecosystem), Cassandra, MongoDB, Riak, CouchDB, and
many others.
Hadoop: an ecosystem of software packages,
including MapReduce, HDFS, and a whole host of other
tools
NoSQL: an approach to data management and database design that is useful for very large sets of distributed data.
Hadoop: a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
MapReduce: a software framework that allows developers to write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers.
Map: a function that parcels out work to different nodes in the distributed cluster.
Reduce: another function that collates the work and resolves the results into a single value.
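The Map and Reduce roles described above can be illustrated with a small word-count sketch. This is a single-process toy in plain Python, not Hadoop's actual API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and each input string merely stands in for a data split that a real cluster would hand to a separate node.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key (done by the framework between phases)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single value."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all input splits."""
    intermediate = []
    for doc in documents:  # on a cluster, each split would be mapped in parallel
        intermediate.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

counts = word_count(["big data is big", "data is everywhere"])
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In a real deployment the map calls run on different machines and the grouping step moves data across the network; the sketch only shows how the two functions divide the work.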