Apigee is releasing a free developer version of Insights, a predictive analytics service built on Hadoop. This webcast will give enterprise architects and Hadoop enthusiasts an overview of how to build a real-world system on top of Hadoop.
In this webcast, two Apigee engineering leads will explain how to:
- secure a shared Hadoop service using APIs
- make Hadoop elastically scale on demand
- maximize the efficiency of Hadoop across multiple tenants
- monitor and operate a multi-tenant Hadoop service
10. Insights approach for Apigee Developer
• Accelerated development
• Descriptive & predictive
• Behavior-based algorithms
• E2E experience
• Free
11. Architecture
DATA → INSIGHTS
1. Data upload: structured or unstructured
2. Scalable: volume, variety & velocity
3. Core IP: machine learning, graph processing, unstructured data
4. Analytics offerings: predictive & journey analytics, segmentation
Architecture layers, as labeled in the diagram:
• User Interactions
• Prediction, Journey, Segmentation
• Computational Algorithms
• Machine Learning Library
• Data Pipelines
• Unstructured Data Processors
• GRASP Processor
• Distributed Processing Foundation
• Distributed Data and Job Management
• Apache Usergrid
• Query Language
• Modeling Workbench User Interface
12. System Components
Figure: system components diagram (storage legend: S3, HDFS). Components as labeled:
• Applications: UI, Modeling Workbench; Application Data
• Transactional Datastore
• Ephemeral Hadoop Cluster: modeling, scoring, data transformation, aggregation/reporting
• Management Service
• Software Libraries: GRASP, unstructured data, machine learning
• Insights Master
• Data Staging Area
• Monitoring Service
• Ingestion Datastore
• GRASP Query Service: Query Datastore, Query Server
• Real Time Service (Edge): Real Time Datastore (usergrid), node
• Persistent Datastore
• Metadata Service
  Runtime metadata: job queue, job dependencies, data set partitions
  Metadata store (static metadata): datastore & dataset, application, job
• Interfaces: HTTPS, AWS APIs; HTTP(S); API
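The metadata service keeps a job queue and job dependencies. As a hypothetical illustration (not Insights code), assuming the dependencies are available as a plain dictionary with made-up job names, Python's standard `graphlib` module (3.9+) can group jobs into waves that respect those dependencies; jobs within one wave have no dependencies on each other and could run concurrently.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into waves: every job in a wave depends only on earlier waves.

    `dependencies` maps a job name to the set of jobs that must finish first.
    It is a hypothetical stand-in for the runtime metadata (job queue and
    job dependencies). Circular dependencies raise graphlib.CycleError.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates the graph and detects cycles
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


# Hypothetical jobs: ingest -> transform -> {score, report}
jobs = {
    "transform": {"ingest"},
    "score": {"transform"},
    "report": {"transform"},
}
print(execution_waves(jobs))  # [['ingest'], ['transform'], ['report', 'score']]
```

A real scheduler would also need retries and partition-level tracking; the sketch only shows the dependency ordering.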
13. How does Insights work?
• Ingest customer data: batch or browser based; event based or customer profile
• Aggregate behavior graphs: cross-channel, domain-agnostic customer journey graphs, enriched with customer profile
• Query capability and machine learning: customer journey visualization; models & scores
• Data scientist + developer support: R interface for predictive modeling on Hadoop; integrated with API Edge (incl. BaaS, node.js)
Data Flow
1. Customer data store: data ingestion (batch or browser based)
2. Persistent data store: data moved to persistent storage
3. HDFS on the compute cluster: data brought to the compute cluster for processing
4. Serving data store (customer, usergrid): processed data exported to the appropriate location
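The data flow moves records through four stores. A minimal in-memory sketch, assuming plain Python lists and dicts stand in for the customer store, persistent storage (e.g. S3), HDFS on the compute cluster, and the serving store (e.g. usergrid); the class, method names and the "count events per customer" processing step are all hypothetical:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DataFlow:
    """Hypothetical in-memory model of the four stores in the data flow."""

    customer_store: list[dict]                                   # customer's own data
    persistent_store: list[dict] = field(default_factory=list)   # e.g. S3
    compute_store: list[dict] = field(default_factory=list)      # HDFS on the compute cluster
    serving_store: dict[str, int] = field(default_factory=dict)  # e.g. usergrid

    def ingest(self) -> None:
        """Ingest customer records and move them to persistent storage."""
        self.persistent_store.extend(self.customer_store)

    def load_to_cluster(self) -> None:
        """Bring persisted data to the compute cluster for processing."""
        self.compute_store = list(self.persistent_store)

    def process_and_export(self) -> None:
        """Process (here: count events per customer) and export the result."""
        counts = Counter(record["customer_id"] for record in self.compute_store)
        self.serving_store.update(counts)


flow = DataFlow([
    {"customer_id": "c1", "event": "view"},
    {"customer_id": "c1", "event": "buy"},
    {"customer_id": "c2", "event": "view"},
])
flow.ingest()
flow.load_to_cluster()
flow.process_and_export()
print(flow.serving_store)  # {'c1': 2, 'c2': 1}
```

Keeping the persistent copy separate from the compute copy is what lets the compute cluster be thrown away after processing.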
14. Data-level Multi-tenancy
Figure: the system components diagram shown in slide 12, annotated for multi-tenancy:
• Datasets segregated/sharded by account ID
• Data keyed by account ID
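Keying and sharding by account ID can be sketched as follows. This is an assumption-laden illustration, not the Insights storage layout: the key format, the `/` separator, and the shard count of 8 are all hypothetical.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count


def storage_key(account_id: str, dataset: str, record_id: str) -> str:
    """Prefix every key with the account ID (assumed not to contain '/')."""
    return f"{account_id}/{dataset}/{record_id}"


def shard_for(account_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an account to a shard with a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    SHA-256 digest is used to keep the mapping stable across runs.
    """
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def records_for_account(store: dict[str, object], account_id: str) -> dict[str, object]:
    """Return only one tenant's records; the trailing '/' keeps account
    'acme' from matching keys that belong to 'acme2'."""
    prefix = f"{account_id}/"
    return {key: value for key, value in store.items() if key.startswith(prefix)}


store = {
    storage_key("acme", "events", "1"): {"action": "view"},
    storage_key("acme2", "events", "1"): {"action": "buy"},
}
print(records_for_account(store, "acme"))  # {'acme/events/1': {'action': 'view'}}
```

Because all of an account's data lands on one shard, per-tenant queries touch a single shard, at the cost of uneven shard sizes when tenants differ greatly in volume.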
15. Scalability
Figure: the system components diagram shown in slide 12, annotated for scalability:
• Horizontal scaling
• Elastic/ephemeral scaling
• Sharding
16. Insights UI & APIs
• HTML5 single-page application
• Interacts with RESTful APIs
• Guides novice users through the experience, helping them understand important predictive/machine learning concepts
• Scalable REST API infrastructure
19. Try it out Apigee Developer
https://accounts-beta.apigee.com
20. Summary
• Be practical when approaching multi-tenancy
• Cost can be drastically reduced with elastic scaling & multi-tenancy
• Developer Experience requires continual refinement
• Try our free service for yourself!
Editor's Notes
Aggregate behavior graphs: a technology platform that applies whether you are in retail or telco. It’s NOT a social graph.
Query capability: import multiple independent streams of activity, and the system does the joins and finds patterns. It’s NOT traditional BI, because it provides time-dimension access in behavior graphs.
Machine learning: feature selection is automated, so models are easier to maintain. It’s NOT a black-box, vertical-specific algorithm.
Data scientist + developer support: a last-mile solution.
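The note says the system joins independent activity streams into behavior graphs. A minimal sketch of that idea, assuming a hypothetical `(customer_id, timestamp, action)` event tuple and streams already sorted by timestamp (the real event schema is not described here): `heapq.merge` interleaves the streams in time order, events are grouped into per-customer journeys, and consecutive actions are counted as weighted edges. Treating the aggregate graph as a transition count is an illustrative choice, not the GRASP algorithm.

```python
import heapq
from collections import Counter, defaultdict

# Hypothetical event schema: (customer_id, timestamp, action)
Event = tuple[str, int, str]


def build_journeys(*streams: list[Event]) -> dict[str, list[str]]:
    """Join independent, timestamp-sorted streams into per-customer journeys."""
    journeys = defaultdict(list)
    # heapq.merge lazily interleaves already-sorted inputs, ordered by timestamp.
    for customer_id, _ts, action in heapq.merge(*streams, key=lambda event: event[1]):
        journeys[customer_id].append(action)
    return dict(journeys)


def aggregate_graph(journeys: dict[str, list[str]]) -> Counter:
    """Count action -> next-action transitions across all customers."""
    edges: Counter = Counter()
    for steps in journeys.values():
        for current, following in zip(steps, steps[1:]):
            edges[(current, following)] += 1
    return edges


web = [("c1", 1, "view"), ("c2", 2, "view"), ("c1", 5, "cart")]
mobile = [("c1", 3, "search"), ("c2", 4, "buy")]
journeys = build_journeys(web, mobile)
print(journeys)  # {'c1': ['view', 'search', 'cart'], 'c2': ['view', 'buy']}
```

The time dimension is what separates this from a plain BI join: the order of events per customer is preserved, so the graph records which action followed which.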
Data isolation: one customer's data should not be accessible to other customers.
System isolation: for compliance-driven customers (health, finance), the system should not be shared with other customers.
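The two isolation levels can be expressed as a simple placement rule. A hypothetical sketch, assuming tenants are tagged with a sector and that only the two sectors named in the note require a dedicated system; the `Tenant` type and the system names are made up for illustration:

```python
from dataclasses import dataclass

# Sectors named in the note; treating them as a fixed set is an assumption.
COMPLIANCE_SECTORS = frozenset({"health", "finance"})


@dataclass(frozen=True)
class Tenant:
    account_id: str
    sector: str


def assign_system(tenant: Tenant) -> str:
    """Return the (hypothetical) name of the system a tenant runs on."""
    if tenant.sector in COMPLIANCE_SECTORS:
        # System isolation: nothing is shared with other customers.
        return f"dedicated-{tenant.account_id}"
    # Data isolation only: a shared system, with every record keyed by account ID.
    return "shared"


print(assign_system(Tenant("acme", "retail")))    # shared
print(assign_system(Tenant("clinic", "health")))  # dedicated-clinic
```

Keeping most tenants on the shared system is what makes multi-tenancy cheaper; the dedicated path is reserved for the cases where compliance demands it.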
Amazon Auto Scaling
Beacon servers and real-time servers scale up to a maximum number and down to a minimum number of instances, depending on traffic
Amazon load balancer
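Amazon Auto Scaling applies this behavior through its own scaling policies. The sketch below only models the rule described in the note and does not use any AWS API: the instance count tracks traffic but stays within the configured minimum and maximum. The per-server throughput figure is a hypothetical parameter.

```python
import math


def desired_capacity(requests_per_sec: float, per_server_rps: float,
                     min_servers: int, max_servers: int) -> int:
    """Servers needed for the current traffic, clamped to [min_servers, max_servers]."""
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    if not 0 <= min_servers <= max_servers:
        raise ValueError("need 0 <= min_servers <= max_servers")
    needed = math.ceil(requests_per_sec / per_server_rps)
    return max(min_servers, min(max_servers, needed))


for rps in (0, 450, 5000):
    print(rps, desired_capacity(rps, per_server_rps=100, min_servers=2, max_servers=10))
# 0 2
# 450 5
# 5000 10
```

The minimum keeps the service available at zero traffic; the maximum caps cost during spikes.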
Ephemeral Scaling
The ephemeral Hadoop cluster scales up to a maximum size driven by SLA, data and job concurrency.
Amazon elastic storage expands with your data needs
Cluster is terminated when the last job is finished.
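The ephemeral cluster's lifecycle can be modeled as a size that follows the running jobs. A hypothetical sketch, not the Insights cluster manager: each job asks for a node count (assumed to come from its SLA and data volume), the cluster grows with concurrency up to `max_nodes`, and it drops to zero nodes (terminated) when the last job finishes.

```python
class EphemeralCluster:
    """Hypothetical model: cluster size follows the running jobs, capped at max_nodes."""

    def __init__(self, max_nodes: int) -> None:
        self.max_nodes = max_nodes
        self.jobs: dict[str, int] = {}  # job_id -> nodes the job asks for
        self.nodes = 0                  # 0 nodes means no cluster exists

    @property
    def running(self) -> bool:
        return self.nodes > 0

    def submit(self, job_id: str, nodes_needed: int) -> None:
        # nodes_needed stands in for a sizing decision based on SLA and data volume.
        if nodes_needed < 1:
            raise ValueError("nodes_needed must be at least 1")
        self.jobs[job_id] = nodes_needed
        self._resize()

    def finish(self, job_id: str) -> None:
        self.jobs.pop(job_id, None)
        # With no jobs left the size drops to 0, i.e. the cluster is terminated.
        self._resize()

    def _resize(self) -> None:
        # More concurrent jobs -> more nodes, but never beyond the maximum size.
        self.nodes = min(self.max_nodes, sum(self.jobs.values()))


cluster = EphemeralCluster(max_nodes=10)
cluster.submit("daily-scores", 4)
cluster.submit("journey-graph", 8)
print(cluster.nodes)    # 10 (capped)
cluster.finish("daily-scores")
print(cluster.nodes)    # 8
cluster.finish("journey-graph")
print(cluster.running)  # False
```

Because data lives in persistent storage rather than on the cluster, terminating the cluster between jobs loses nothing and stops the compute bill.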
Horizontal Scaling
The GRASP Query cluster scales linearly with data & concurrency as machines are added.
Admin-controlled
Sharding
Ingestion datastore