A New Data Architecture for the App Economy - StampedeCon 2013
1. ©2013 Apigee. Confidential – All Rights Reserved.
Apps + Data + APIs
A New Data Architecture for the App Economy
Anant Jhingran, Apigee
2.
Developer User
Digital Business Value Chain
API, App, Backend Services
Internal
Partner
External
Customer
Employee
Partner
Existing
Partner
New
3.
Digital Signals come in Three Forms in this value chain
Digital Assets
B&M
Web
Events
Entities
Context
4.
• /timestamp:
  {"timestamp": 134578901234,
   "payload": {
     "sending entity": UUID1,
     "receiving entity": UUID2,
     "data": {
       "field1": value1,
       …
     }
   }
  }
• Outside the billionaires’ club, volumes are more typically 30 – 50 MM events/day
Event Structure – generalization of “Facts” in Data Warehouse
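A minimal Python sketch of the event structure above. The helper name `make_event`, the choice of milliseconds for the timestamp, and the sample `field1` value are assumptions for illustration; the slide gives only the field layout.

```python
import json
import time
import uuid


def make_event(sender, receiver, data, timestamp=None):
    """Build an event dict shaped like the slide's example (hypothetical helper)."""
    if timestamp is None:
        # Assumption: milliseconds since the Unix epoch; the slide does not state the unit.
        timestamp = int(time.time() * 1000)
    return {
        "timestamp": timestamp,
        "payload": {
            "sending entity": str(sender),
            "receiving entity": str(receiver),
            "data": dict(data),
        },
    }


if __name__ == "__main__":
    # Two random UUIDs stand in for UUID1 and UUID2 from the slide.
    event = make_event(uuid.uuid4(), uuid.uuid4(), {"field1": "value1"})
    print(json.dumps(event, indent=2))
```

Every event carries the same envelope (timestamp plus sender and receiver), so only the inner `data` dict varies by event type.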
5.
• POST/GET
• /users
• /developers
• /buddies
• /locations
• /products
• …
• Typical environments: ~100,000 – 1MM entities
Entity Structure – generalization of “Dimensions” in Data Warehouse
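The POST/GET pattern over entity collections might be sketched as a toy in-memory store. The class `EntityStore`, its method names, and the `uuid`/`type` fields are hypothetical; the slide lists only the collection paths.

```python
import uuid
from collections import defaultdict


class EntityStore:
    """Toy in-memory store: one collection per entity type, e.g. "users"."""

    def __init__(self):
        self._collections = defaultdict(dict)

    def post(self, collection, properties):
        """Create an entity (like POST /<collection>) and return it with its id."""
        entity_id = str(uuid.uuid4())
        entity = {"uuid": entity_id, "type": collection, **properties}
        self._collections[collection][entity_id] = entity
        return entity

    def get(self, collection, entity_id=None):
        """Read one entity by id, or the whole collection (like GET /<collection>)."""
        if entity_id is None:
            return list(self._collections[collection].values())
        return self._collections[collection].get(entity_id)


if __name__ == "__main__":
    store = EntityStore()
    user = store.post("users", {"name": "example"})
    print(store.get("users", user["uuid"]))
```

At the entity counts the slide mentions, a real deployment would sit on a database; the dict here only illustrates the collection-per-type shape.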
6.
Context
= “Secondary Entities + Events”
7.
★
Time of
Event
Context = Other nearby relevant and interesting events
Time as Context
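One way to read "nearby events" is a symmetric time window around the event of interest. The sketch below assumes events are dicts with a numeric `timestamp` field and are already sorted by it; `events_in_window` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def events_in_window(events, center_ts, window):
    """Return events whose timestamp lies within +/- window of center_ts.

    `events` must be sorted by their "timestamp" field. The event at
    center_ts itself, if present, is included in the result.
    """
    timestamps = [e["timestamp"] for e in events]
    lo = bisect_left(timestamps, center_ts - window)
    hi = bisect_right(timestamps, center_ts + window)
    return events[lo:hi]


if __name__ == "__main__":
    sample = [{"timestamp": t} for t in (10, 20, 30, 40, 50)]
    print(events_in_window(sample, center_ts=30, window=10))
```

Deciding which of the returned events are "relevant and interesting" is a separate filtering step that this sketch does not attempt.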
8.
The Rugby World Cup’s Effect on Beer Consumption in AU
Context
Analysis
9.
Context = Nearby, interesting, relevant locations
Location as Context
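A hedged sketch of finding nearby locations, using the haversine great-circle distance. The function names, the `lat`/`lon` field names, and the radius parameter are assumptions; the slide does not say how proximity is measured.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearby(locations, lat, lon, radius_km):
    """Return the locations (dicts with "lat" and "lon") within radius_km of a point."""
    return [
        loc for loc in locations
        if haversine_km(lat, lon, loc["lat"], loc["lon"]) <= radius_km
    ]
```

A linear scan like this is fine for small location sets; larger sets would normally use a spatial index instead.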
10.
Where does a User fulfill her needs?
/storelocator
/product
/search
/buy
/findinstore
< 3 days
< 1 day
Context
Analysis
11.
Context = Complementary, supplementary and substitute entities (products, services, data)
Related Entities as Context
12.
• /addtocart/product/12345
• /addtocart/product/34577
• Context is
– Product Categories
– /addtocart/product/12345?category=menscoats
– /addtocart/product/34577?category=menscoats
• Analysis is
– Promotion Effectiveness (within a 1-week window) grouped by product category (not product)
Determining effectiveness of promotions
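The category-level grouping can be sketched as follows. Comparing add-to-cart counts in the week before and the week after a promotion start is an assumed effectiveness measure, not one the slides define; `category_of`, `addtocart_lift`, and the `(unix_seconds, path)` event shape are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

WEEK_SECONDS = 7 * 24 * 60 * 60


def category_of(path):
    """Extract the category query parameter from an API path, or None if absent."""
    values = parse_qs(urlparse(path).query).get("category")
    return values[0] if values else None


def addtocart_lift(events, promo_start):
    """Count add-to-cart calls per category in the week before and after promo_start.

    `events` is an iterable of (unix_seconds, path) pairs.
    Returns {category: (count_before, count_after)}.
    """
    before, after = Counter(), Counter()
    for ts, path in events:
        if not urlparse(path).path.startswith("/addtocart/"):
            continue  # ignore other API calls
        category = category_of(path)
        if category is None:
            continue  # no context attached to this call
        if promo_start - WEEK_SECONDS <= ts < promo_start:
            before[category] += 1
        elif promo_start <= ts < promo_start + WEEK_SECONDS:
            after[category] += 1
    return {c: (before[c], after[c]) for c in set(before) | set(after)}
```

Because the category travels with each call as context, products 12345 and 34577 roll up into the same `menscoats` bucket without a separate product lookup.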
13.
Developer Activity as Context
• Developer Activity
– Checkins, Repos, Follows
• Developer Profile
– Skills, Languages, Platforms
• Developer Network
– Follows, Followers, Watchers
14.
Building the right APIs, Hackathons, SDKs for developers
Context
Analysis
15.
Information and Use as Context
Reviews
Description
Category
Demand
User Action
(e.g. Purchase)
Context = Information leading to decisions in end user use cases
16.
Behavior Patterns as Context (Habits)
• User Activity on Apps establishes patterns of Behavior and Actions
• Deviations from the behavior profile are also interesting
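One simple way to flag a deviation from a behavior profile is a z-score test against the user's own history. This is an illustrative technique, not one the slides name; `is_deviation` and the default threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, pstdev


def is_deviation(history, observed, threshold=3.0):
    """Return True if `observed` is far from the user's historical behavior.

    `history` is a list of past measurements (e.g. daily API calls).
    With fewer than two points there is no profile yet, so nothing is flagged.
    """
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly constant history: any change at all is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold
```

For example, a user who normally makes about ten calls a day and suddenly makes thirty would be flagged, while eleven calls would not.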
17.
Public Profiles and Social Activity as Context
• Social Profile, Network and Activity describe users
• Features like the Facebook Timeline indicate users’ preferences
18.
Critical Technical Features
19.
The Big Data System for the App Economy must understand…
DATA: Events, Entities, Context
ANALYSIS: Both “Batch” and “Real-Time”
20.
• Half Life of Data
• ETL
• Data Modeling
• Real-Time Complement
Many things are Different
21.
Half Life of Data
[Chart: Volume and Value over time (axis labels: NOW, NOW – 1 YEAR); series: App Economy, “Old” Economy]
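The slide names a half life for data without giving a formula. As a sketch only, assuming the value of a record decays exponentially with age, a weight could be computed as below; `value_weight` and both of its parameters are hypothetical.

```python
def value_weight(age_days, half_life_days):
    """Relative value of a record, assuming it halves every half_life_days.

    Returns 1.0 for brand-new data and approaches 0.0 as the data ages.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)
```

Under this assumption, a shorter half life means that a year-old record contributes almost nothing, while a longer one keeps older history relevant.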
22.
APIs displace ETL
ETL | APIs
Fed by handful of core apps | Myriad apps and services
Concise data | Verbose data
Data optimized for storage | Data optimized for consumption
Well-modeled business systems and data owned by enterprise | Disparate, dynamic data in fast-paced mobile, social apps ecosystems
Works as self-contained ‘cubes’ | Works by mixing with other APIs
23.
The new Broad Data Platform needs some new constructs
Enterprise Systems
External Online Data
Data Collection
Data Processing
Entity and Event
Model
APIs
API Data, App Data
SQL
Dimensions
and Facts
Joins and
Aggregations
ETL
Map Reduce, Pig, Hive
Key Value
Aggregations
Bulk Loads, Flume…
REST, OData?
Collections, Time
Series
Entity Resolution,
Signal Amplification,…
API based access
Warehousing Big Data Broad Data
24.
Batch must also Affect Real-Time traffic, and vice-versa
Big Data “Batch” Analysis
?
Real-Time “Gateway”
25.
Computer Science is about Abstractions
RDBMS
Map/Reduce
Entities, Events and
Context
Abstractions
Flexibility
File System
Abstractions Reduce the Number of Problems that can be Solved, but Significantly Improve Time to Value
26.
One Possible Architectural Block Diagram
RDBMS, Cassandra, Hadoop
Entities and Events in the App Economy
Data Import and Access
APIs
CRUD and Analytical Libraries
• Tailored for “data” and use cases in the App Economy
• Built around fundamental transformations of ETL, Warehousing and Big Data
27.
And also requires a different approach given that context can be overwhelming
Insights
Data
API Traffic
Developer
Activity
Mobile App
Activity
28.
• New Big Data Abstractions of
– Entities
– Events
– Context (secondary entities and events)
• New Data Processing Techniques
– Determining “value” of the data
– Data Stitching for enhancing signal to noise
• New Analytical Techniques
– Time Series Analysis
– Graph Traversals
– Real-Time Complement to Batch Analysis
• New Approach to Data Science
Summary