In today's competitive environment, companies face many kinds of challenges, and implementing Big Data is one of them. Dr. Nilesh Karnik takes us through some of these challenges.
2. What we will discuss
The Challenge of BIG Data
ADVANCED Analytics
SOLUTIONS in the Pipeline
Copyright 2013 RESTRICTED CIRCULATION
3. Big Data: Distributed Processing
Aureus Claims Solution
OLD IDEA
NEW IDEA
4. EXAMPLE 1: Task of storing books on a shelf
Image source Flickr. Image copyright belongs with original artist.
Simple, right?
5. EXAMPLE 1: Task of storing books on a shelf
And now?
8. EXAMPLE 2: Summarizing a Report
And now?
9. EXAMPLE 3: Baking a Cake
Simple, right?
And now?
Image source PINTEREST. Image copyright belongs with original artist.
10. Advanced Analytics
• Well-developed tool set for the “small data” environment
• Challenges in the Big Data environment
13. Advanced Analytics: MapReduce Difficulties
BATCH LEARNING SCANS ALL DATA IN ONE GO
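To make this concrete, here is a minimal sketch (not taken from the slides) of batch gradient descent for a one-parameter least-squares fit, written in a map/reduce style. The toy data, the function names `partial_gradient` and `batch_gradient_descent`, and the learning rate are illustrative assumptions. The point it tries to show is that every single model update needs one full scan over all partitions, so an iterative algorithm turns into many full passes over the data.

```python
from functools import reduce


def partial_gradient(partition, w):
    """Map step: squared-error gradient contribution of one partition (model y = w * x)."""
    return sum(2.0 * (w * x - y) * x for x, y in partition)


def batch_gradient_descent(partitions, w=0.0, lr=0.01, iterations=50):
    """Run batch gradient descent; returns the fitted weight and the number of full scans."""
    n = sum(len(p) for p in partitions)
    full_scans = 0
    for _ in range(iterations):
        # Map: every partition is read in full to compute its partial gradient.
        grads = [partial_gradient(p, w) for p in partitions]
        # Reduce: the partial gradients are summed into one global gradient.
        total = reduce(lambda a, b: a + b, grads, 0.0)
        w -= lr * total / n
        full_scans += 1  # one update costs one complete pass over the data
    return w, full_scans


# Hypothetical toy data following y = 2 * x, split into two partitions.
partitions = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w, scans = batch_gradient_descent(partitions)
print(w, scans)
```

In this sketch the number of full scans equals the number of iterations, which is cheap for four data points but becomes the dominant cost when each scan is a separate job over a large distributed data set.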
14. Some Solutions Data Scientists are working on
New frameworks
• E.g., HaLoop*, PrIter# (Extensions of Hadoop)
• Percolator$ (Proprietary Google framework)
* Y. Bu, B. Howe, M. Balazinska, and M. Ernst, “HaLoop: Efficient iterative data processing on large clusters”, VLDB, 2010.
# Y. Zhang, Q. Gao, L. Gao and C. Wang, “PrIter: A distributed framework for prioritized iterative computations”, SoCC, 2011.
$ D. Peng and F. Dabek, “Large-scale incremental processing using distributed transactions and notifications”, OSDI, 2010.
15. Some Solutions Data Scientists are working on
Smarter algorithms / Different implementations
• Random forest
• Parallelized Stochastic Gradient Descent
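As an illustration of the second bullet, the following is a rough sketch of one way stochastic gradient descent can be parallelized: each worker runs SGD on its own shard of the data, and the resulting models are averaged. The slides do not say which variant is meant, so the averaging scheme, the function names `sgd_on_shard` and `parallel_sgd`, the toy data, and the hyperparameters are all assumptions. The workers are run one after another here for simplicity; in a real cluster each shard would be processed on a separate machine.

```python
import random


def sgd_on_shard(shard, w=0.0, lr=0.01, epochs=100, seed=0):
    """Plain SGD for the model y = w * x, using only the examples in one shard."""
    rng = random.Random(seed)
    data = list(shard)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # Update from a single example; no full pass is needed per update.
            w -= lr * 2.0 * (w * x - y) * x
    return w


def parallel_sgd(shards, **kwargs):
    """Train one model per shard independently, then average the weights."""
    local_models = [sgd_on_shard(s, seed=i, **kwargs) for i, s in enumerate(shards)]
    return sum(local_models) / len(local_models)


# Hypothetical toy data following y = 2 * x, split into two shards.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w_avg = parallel_sgd(shards)
print(w_avg)
```

The appeal of this design is that the workers only communicate once, at the end, which suits frameworks where every round of communication is expensive.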