1. Krishna Tewari
Global Head
Digital Publishing & Retail solutions
Datamatics Global Services Ltd
Leveraging Big Data Opportunities for Growth
2. Agenda
1) Challenges for publishers
2) Big Data in publishing industry
3) The technology landscape
4) Use cases for publishing
5) Planning for Big Data
3. This is ‘The Library of Alexandria’
Here the Egyptians once collected and managed every scroll of information
then available in the world
The classical content
Artist: O. Von Corven, Source: Wikipedia
9. Data & content in the publishing world
Data types: Structured, Semi-structured, Unstructured
• Content
– Databases, XML Files, PDFs, Headers, Metadata, Image Banks, Application Files, Adverts, Feeds, Info Graphics, Audio, Video, Content sharing, Ratings
• Readers / Content Consumers
– Subscriptions, Customer Information, CRM Data, Purchase History, Demographics, Service Logs, Reading Modes, Interest Areas, Buying Patterns, Searches, Emails, Spend Analysis, Likes, Tweets, Shares, Ratings, Reading, Chats
• Sales Channels
– Geo Spread, Publication type Performance, Geographical Performance, Campaign Data, Discounts, Bundled offers, Geo preferences, Channel data, Hit counts, Events, Surveys, Marketing copies, Test runs
• Authors / Data providers
– Author Databases, Contracts, Permissions, Rights, Market performances, Subject expertise, Qualifications, Affiliations, Emails, Payments, Tweets, Shares, Peer Reviews
80% of the data existing in any enterprise today is unstructured
10. What Does Big Data Consist Of?
Big Data Integration: Big Transaction Data, Big Interaction Data, Big Data Processing
• Transactional Data: Orders, Invoices, Payments, Plans, Deliverables, Travel records
• Analytical Data: Historical Data, Machine Streams, Clickstream data, Log files
• Other Interaction Data
Volume, Velocity, Variety, Complexity
Big Data is the confluence of three trends: Big Transaction Data, Big Interaction Data, and Big Data Processing
14. Use Case: Large Scale Data Archival
Data segregated in disparate platforms and in different file formats can be acquired & organized easily using Big Data
Publishing House
• Transactional Data
• Historical Data
– Millions of Images
– Millions of Data Files
– Thousands of articles from hundreds of authors
• Contracts
• Board Comments
• Mails & Tweets
Integrated Data Repository (Powered by Big Data)
• Automatically indexed and tagged and made available for end users through a portal
15. Case Study : Archiving at RSC
• About the Royal Society of Chemistry
– Europe’s largest society for advancing the chemical sciences
• Business Challenge
– To organize assets accumulated since the 1840s
– Content Summary:
• 1 million images
• Millions of Scientific data files
• Hundreds of thousands of articles from 200,000 authors
• Recent Captures – Social Media, Video and Digital Assets
• Solution
– MarkLogic (a NoSQL solution) was used to create a repository accessible to RSC’s online users, entrepreneurs, researchers and educators
– Content stored as XML documents (using a document-centric model)
• Benefits
– Allows RSC to publish 3x as many journals and 4x as many articles
Source: http://is.gd/oyEu01
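The document-centric model in the RSC case can be illustrated with a small sketch. This is not MarkLogic's API; it uses Python's standard xml.etree module, and the element names (article, title, author, subject) and the sample documents are made up for illustration. Each article stays a whole XML document, and an index maps element values back to document ids.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical article documents; element names are illustrative only.
DOCS = {
    "a1": "<article><title>Catalysis in water</title>"
          "<author>A. Smith</author><subject>catalysis</subject></article>",
    "a2": "<article><title>Polymer films</title><author>B. Jones</author>"
          "<subject>polymers</subject><subject>catalysis</subject></article>",
}

def build_index(docs):
    """Map (element name, lower-cased text) -> set of document ids."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            text = (elem.text or "").strip()
            if text:
                index[(elem.tag, text.lower())].add(doc_id)
    return index

index = build_index(DOCS)
print(sorted(index[("subject", "catalysis")]))
```

A production repository would add full-text search, versioning and access control; the sketch only shows why storing whole documents and indexing their elements makes mixed content searchable.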
16. Case Study: Converting Images at Large Scale at NYT
• About the New York Times
– American daily newspaper, published in New York City since 1851
• Business Challenge
– NYT decided to make all public domain articles dated 1851-1922 available to readers free of charge
– 11 million articles available as images were to be converted to PDF format
– Previously, PDFs were generated dynamically, but this approach became infeasible as traffic scaled
• Solution
– Pre-generating articles & serving them as static files to readers
• Amazon S3 as File System
• Amazon EC2 for Web Services
• Hadoop to convert articles into PDF files
• Benefits
– NYT avoided substantial IT investment and was able to deliver over 1.5 TB of data to users instantaneously
Source: http://is.gd/kMqKSe
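The pre-generation approach in the NYT case can be sketched roughly as follows. This is an illustration under stated assumptions, not the NYT implementation: a thread pool stands in for the Hadoop conversion job, a Python dict stands in for Amazon S3, and render_pdf, the key layout and the article ids are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def render_pdf(article_id, page_images):
    """Hypothetical placeholder for the real image-to-PDF conversion."""
    pages = str(len(page_images)).encode()
    return b"%PDF-stub " + article_id.encode() + b" pages=" + pages

def pregenerate(articles, store, workers=4):
    """Convert every article once and write it under a stable key."""
    def job(item):
        article_id, pages = item
        return f"pdf/{article_id}.pdf", render_pdf(article_id, pages)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for key, blob in pool.map(job, articles.items()):
            store[key] = blob

def serve(store, article_id):
    """Serving is a plain lookup; nothing is converted per request."""
    return store.get(f"pdf/{article_id}.pdf")

# Made-up article ids and page image names.
articles = {
    "1851-09-18-001": ["p1.tif", "p2.tif"],
    "1900-01-01-007": ["p1.tif"],
}
static_store = {}  # stand-in for an object store such as Amazon S3
pregenerate(articles, static_store)
print(serve(static_store, "1900-01-01-007"))
```

The design choice is to pay the conversion cost once, in a batch job that can be parallelized, so that request-time cost no longer grows with traffic.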
17. Use Case: Leveraging Value in Social Media
• Goodreads reviews
• Facebook page likes and comments
• Number of tweets with the book's hashtag
Source: Twitter, Facebook, Goodreads pages of Railsea [Author: China Miéville, Publishers: Random House]
Publishing Companies can leverage Big Data to
aggregate and track social data in real time
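Tracking hashtag mentions over time can be sketched as a simple aggregation. The posts, timestamps and field names (time, text) below are made up, and a real pipeline would read from each platform's own feed rather than a list in memory.

```python
from collections import Counter
from datetime import datetime

def mentions_per_hour(posts, hashtag):
    """Count posts containing the hashtag, bucketed by hour."""
    tag = hashtag.lower()
    counts = Counter()
    for post in posts:
        # Normalize words so "#RailSea." and "#railsea" match.
        words = {w.strip(".,!?").lower() for w in post["text"].split()}
        if tag in words:
            stamp = datetime.fromisoformat(post["time"])
            counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts

# Hypothetical sample posts.
POSTS = [
    {"time": "2012-06-01T18:05:00", "text": "Loving #Railsea so far!"},
    {"time": "2012-06-01T18:40:00", "text": "#railsea review coming soon"},
    {"time": "2012-06-01T19:10:00", "text": "Reading something else today"},
    {"time": "2012-06-01T19:30:00", "text": "Finished #RailSea."},
]

hourly = mentions_per_hour(POSTS, "#Railsea")
for hour, n in sorted(hourly.items()):
    print(hour, n)
```

Calling the same function on each new batch of posts and adding the counters together gives a running view that approximates real-time tracking.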
18. Case Study: Personalizing Interactions at De Persgroep
• About De Persgroep
– Leading publishing and broadcasting network in Belgium and the Netherlands
• Business Challenge
– Millions of readers and viewers tune in to De Persgroep’s print, digital, TV and radio channels
– With users accessing content through multiple devices (iPad, Kindle, iPhone), consumer data outgrew the bounds of siloed solutions
• Solution
– De Persgroep used Lily 2.0 (with help from NGData, a customer intelligence management company) to get an intelligent view of how customers use the content generated by the group
• Creating personalized interactions, messages, and offers based on user preferences and purchase history
• They realized an increase in Customer Lifetime Value
• Benefits
– The adoption enabled De Persgroep to understand the viewing and content preferences of customers, and to create and share timely and relevant content along those lines
Source: http://is.gd/M7lVWw
20. Insights for growing the business
Reader / Content Consumer (Data | Analytics | Actions)
• Past Searches
– Analytics: 67% Life Sciences; Entomology; Coleoptera 56%; Lady bird beetle (72%), Beetles (28%)
– Actions: Ad banners in the website; Display Lady bird research articles; Discount coupons for subject books; Customize bundled offers
• Demographics
– Analytics: Prof in Humboldt Universität, Berlin; Dept of Agricultural entomology; Editor in Chief, Life sciences journal
– Actions: Customized bills with focused ads; Upcoming publications; Discounts
• Time of reading
– Analytics: Subject related searches 10 AM – 4 PM; Device read 8 PM – 10 PM; Device content share 9 – 9.30 PM; 80% tweets 6 PM – 7 PM
– Actions: Customized ad release timings; Ad release in devices; Do not disturb timings; Tailored call center action
• Spend Analysis
– Analytics: Total monthly spend euro 350; Research articles euro 250; Books euro 45; Journal subscription euro 55
– Actions: Ads of publications in price range; Bundled savings; Spend trend and alerts to sales
• Social Media Activity
– Analytics: Very active social media; FB shares 27% XYZ | 80% ABC; Tweets 18% XYZ | 82% ABC; Low share of wallet
– Actions: Watch customer surveys; Alert customer Account Manager
• Reading Device
– Analytics: 24% online searches on desktop; 76% book reading on iPad
– Actions: More focus on iPad alerts for books; Offers on ebook versions
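The data, analytics and actions flow on this slide can be sketched as a small rule set. The spend figures and percentages are taken from the slide; the field names, the 50% thresholds and the reading of XYZ as the publisher's own brand are assumptions made for illustration.

```python
def spend_shares(spend):
    """Return total spend and each category's share in percent."""
    total = sum(spend.values())
    shares = {k: round(100 * v / total, 1) for k, v in spend.items()}
    return total, shares

def suggest_actions(profile):
    """Turn reader analytics into candidate actions (illustrative rules)."""
    actions = []
    total, shares = spend_shares(profile["monthly_spend_eur"])
    top = max(shares, key=shares.get)
    actions.append(
        f"Promote bundled offers around {top} ({shares[top]}% of EUR {total})"
    )
    if profile["ipad_book_reading_pct"] > 50:  # hypothetical threshold
        actions.append("Send book alerts and ebook offers to iPad")
    if profile["own_brand_tweet_share_pct"] < 50:  # hypothetical threshold
        actions.append("Alert customer account manager: low share of wallet")
    return actions

profile = {
    "monthly_spend_eur": {
        "research articles": 250,
        "books": 45,
        "journal subscription": 55,
    },
    "ipad_book_reading_pct": 76,
    "own_brand_tweet_share_pct": 18,  # assumes XYZ is the publisher's brand
}

total, shares = spend_shares(profile["monthly_spend_eur"])
for action in suggest_actions(profile):
    print(action)
```

In practice such rules would be tuned against campaign results; the sketch only shows how each analytics figure can be tied to a concrete action.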
22. Recommended Steps When Considering Big Data
• Identify the business problem that you are trying to solve
• Identify the relevant technology that will be able to address the problem
• Break down organizational silos and form cross-functional teams
• Assign responsibility to a mix of ‘left brain’ analytical and ‘right brain’ depicter-type people
• Start small, with proofs of concept that use existing commodity hardware and free solutions
• Strike a balance between the existing technology infrastructure and the introduction of Big Data technologies
24. Leveraging Big Data Opportunities for Growth
Krishna Tewari
Global Head
Digital Publishing & Retail solutions
Datamatics Global Services Ltd
Krishna.tewari@datamatics.com