Presentation at ODTUG KScope'18 on the data engineering and advanced analytics capabilities in Oracle Analytics Cloud Data Lake Edition, Oracle Big Data Cloud and Oracle Event Hub Cloud Service
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
1. Mark Rittman, Oracle ACE Director
ODTUG KScope’18, Orlando June 2018
From BI Developer to Data Engineer with
Oracle Analytics Cloud Data Lake Edition
2. • Oracle ACE Director, Independent Analyst
• Past ODTUG Exec Board Member + Oracle Scene Editor
• Author of two books on Oracle BI
• Co-founder & CTO of Rittman Mead
• 15+ Years in Oracle BI, DW, ETL + now Big Data
• Host of the Drill to Detail Podcast (www.drilltodetail.com)
• Based in Brighton & work in London, UK
About the Presenter
4. •Data now landed in Hadoop clusters, NoSQL databases and Cloud Storage
•Flexible data storage platform with cheap storage, flexible schema support + compute
•Solves problem of how to store new types of data and flexibility on when to process
•Typically used by data scientists as source for new models or insights of interest
•Data Warehouses still have their place
•But very few new ones are being built
•Nobody leaves college dreaming of being an ETL developer
•Except Michael Rainey
Meet the New Data Warehouse : The “Data Lake”
From “What is a Data Lake”,
https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/
6. What is Data Engineering?
•When “Big Data” first became popular, all users were termed “data scientists”
•Over time, this evolved into two distinct roles:
• Data Scientists who focus on new insights + models working from laptops using R + sampled data
• Data Engineers, who make at-scale data consumable in some form, either directly or by data scientists
7. •Data Engineers
•Can code, run clusters
•Create data pipelines & prepare data
•Train and build predefined ML models
•Knowledge of the math of ML limited
•They may be DBAs, BI developers
•Experience with DevOps, cloud
and….
What is Data Engineering?
8. •Oracle’s Cloud Analytics platform, built on Oracle BI EE and Oracle DV technology
•Available as customer-managed and Oracle-managed (Autonomous Analytics Cloud)
•Available as three packaging options
•Oracle Analytics Cloud Standard (aka Oracle DV in Oracle Cloud)
•Oracle Analytics Cloud Enterprise (aka OBIEE12c in Oracle Cloud)
•Oracle Analytics Cloud Data Lake (aka …?)
Oracle Analytics Cloud Data Lake Edition
10. Oracle Analytics Cloud
Data Sources: social, sensors, enterprise, personal, SaaS, mobile
Users: Developers, Executives, Data Stewards, Analysts, Data Engineers
Data Catalog: One place to collect, search, explore & curate all data
Data Preparation: Prepare enriched, sharable, & reliable datasets across all data
Data Analysis: Understand & act using smarts: search, visualization, & storytelling
Platform services: Oracle Database Services, Oracle Big Data Cloud, Oracle Storage Cloud
11. •All functionality in OAC Standard Edition plus
•Integration with Oracle Big Data Cloud
•Additional data flow/data prep operators
•ML model build and train capability
•Text analytics and NLP processing
•Data flow execution in Apache Spark (*)
•Replicate from Cloud and On-Premise Apps
•Oracle Service Cloud – Taleo, Fusion Apps
•Incremental Ingest from DBs, Cloud + files
•Continuous Ingest from GoldenGate
OAC Data Lake Edition: Key Features
13. Long-Term Replacement for Big Data Discovery
•Visual Face of Data in Hadoop
•Data Preparation and Enrichment
•Spark Data Transformations
•Standalone technology + processes
•Visual Face of Data in Cloud
•Data Preparation and Enrichment
•Spark Data Transformations
•Oracle Analytics Cloud
14. •Explore, catalog and discover data in Oracle Big Data Cloud, Oracle Database
•Enrich and transform raw data into valuable information and insights
•Analyze at-scale data using Data Visualization
•Combine data from SaaS, social and real-time
•Create predictive and classification models
•Analyze the sentiment in social media feeds
•Data engineering without the hand-coding
OAC Data Lake Edition Use-Cases
18. Scenario : Ingest and Analyze Real-Time Feeds
[Diagram components: IoT events via Fluentd; Social Media data via Fluentd; Firewall; Event Hub Cloud REST Proxy; Event Hub Cloud Kafka Connect (shown twice)]
23. Scenario : Ingest and Analyze Real-Time Feeds
[Diagram components: IoT events via Fluentd; Social Media data via Fluentd; Firewall; Event Hub Cloud REST Proxy; Event Hub Cloud Kafka Connect (shown twice); Oracle Big Data Cloud; Oracle Analytics Cloud Data Lake Edition. Stage labels: INGEST, TRANSFORM, ANALYZE]
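A minimal sketch of the ingest step in this scenario, assuming the Event Hub Cloud REST Proxy accepts the Confluent-style v2 produce format (a "records" array posted to /topics/<topic>). The slides do not confirm the request format, and the proxy URL, topic name and event fields below are all hypothetical. The request is only built here, not sent.

```python
import json
from urllib.request import Request

# Content type used by Kafka REST proxies that follow the Confluent v2 API.
# Assumption: the Event Hub Cloud REST Proxy accepts this format.
KAFKA_JSON_V2 = "application/vnd.kafka.json.v2+json"


def build_produce_request(proxy_url, topic, events):
    """Build (but do not send) an HTTP POST that would publish events to a topic."""
    # Each event becomes one record; the proxy is expected to forward them to Kafka.
    payload = {"records": [{"value": event} for event in events]}
    body = json.dumps(payload).encode("utf-8")
    return Request(
        url=f"{proxy_url.rstrip('/')}/topics/{topic}",
        data=body,
        headers={"Content-Type": KAFKA_JSON_V2},
        method="POST",
    )


# Hypothetical proxy address, topic and IoT event, for illustration only.
request = build_produce_request(
    "https://example.invalid/restproxy",
    "iot-events",
    [{"device_id": "sensor-1", "temperature_c": 21.5}],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it; authentication and TLS settings are deliberately left out because they depend on the service configuration.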
24. Scenario : Ingest and Analyze Real-Time Feeds
[Diagram components: ID & Access Management; Auditing; Object Storage; VCN]
25. Scenario : Ingest and Analyze Real-Time Feeds
[Diagram components: ID & Access Management; Auditing; Object Storage; VCN; Availability Domain 1]
32. •Catalog of all data assets
•Projects
•Connection to Hive Thrift Server
•IoT and Social Media Data Sets
•Data Flows and Sequences
•Managed data lake store
•Control the lifecycle of your data lake assets
•Security
•Scheduling
Managing and Cataloging the Cloud Data Lake
33. Data Preparation Features from OAC Standard Edition
1. Split timestamp field that’s not in valid format
2. Choose “space” character as delimiter
3. Convert the first split column into a date datatype
4. Choose the correct date format for this field’s values
5. Repeat for the TIME split column, concatenate with ’T’ in-between and finally convert resulting field into TIMESTAMP
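The same five steps can be sketched in plain Python, as a rough equivalent of what the data preparation UI does rather than how OAC implements it. The input layout (day/month/year, then a space, then a 24-hour time) is an assumption; the slides do not show the actual field values.

```python
from datetime import datetime


def parse_split_timestamp(raw, date_format="%d/%m/%Y", time_format="%H:%M:%S"):
    """Apply the split / convert / concatenate steps to a single raw value."""
    # Steps 1-2: split the raw field using the space character as delimiter.
    date_part, time_part = raw.split(" ", 1)
    # Steps 3-4: convert the first split column to a date using the chosen format.
    date_value = datetime.strptime(date_part, date_format).date()
    # Step 5: repeat for the time column, join the two with 'T', then convert
    # the combined string into a timestamp.
    time_value = datetime.strptime(time_part, time_format).time()
    return datetime.fromisoformat(f"{date_value.isoformat()}T{time_value.isoformat()}")


# Hypothetical raw value in the assumed format.
print(parse_split_timestamp("14/06/2018 09:15:30"))
```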
34. Extended Data Flow Capability for Data Lake Edition
Data Flows are sequences of data transformations executed on the BI Server - Spark execution on roadmap for OAC DL
•Create Essbase Cube
•Time Series Forecast
•Sentiment Analysis
•Predictive / ML Model Train and Build
•Run custom R and other Python scripts
Data Flows are based on the technology previously announced as “Dataflow ML”, now delivered as part of Oracle Analytics Cloud
35. Example : Enrich With Sentiment, Then Visualize
1. Add Sentiment Analysis step to data flow, persist final enriched dataset back to Hive table
2. Add a calculation to convert sentiment description values to positive/negative cumulative score
3. Analyze Results in Data Visualization UI
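Step 2 amounts to mapping each sentiment label to a signed number and keeping a running total. A minimal sketch follows; the label names and the +1 / 0 / -1 weights are assumptions, not values taken from the slides.

```python
from itertools import accumulate

# Assumed label-to-score mapping (hypothetical; adjust to the labels the
# sentiment step actually emits).
SENTIMENT_SCORES = {"Positive": 1, "Neutral": 0, "Negative": -1}


def cumulative_sentiment(labels):
    """Convert sentiment labels to scores and return the running total."""
    scores = [SENTIMENT_SCORES[label] for label in labels]
    return list(accumulate(scores))


# Illustrative labels for five social media posts, in arrival order.
labels = ["Positive", "Negative", "Negative", "Neutral", "Positive"]
print(cumulative_sentiment(labels))
```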
36. Using Explain Feature to Automate Deriving Context
1. Right-click on attribute column to “explain” the drivers of its values
2. ML algorithm explains basic facts, drivers, anomalies and identifies segments of interest
38. Transform, Aggregate and Join Datasets
Multi-step dataset joins
Aggregate Datasets
Binning and Grouping
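As a rough illustration of what these three data flow steps do to the data, the sketch below joins two small datasets, bins a numeric column and aggregates by group. The datasets, column names and bin boundaries are all made up for the example.

```python
from collections import defaultdict

# Hypothetical datasets: ride facts and a rider lookup keyed by rider_id.
rides = [
    {"ride_id": 1, "rider_id": "a", "distance_km": 12.0},
    {"ride_id": 2, "rider_id": "b", "distance_km": 48.5},
    {"ride_id": 3, "rider_id": "a", "distance_km": 75.0},
]
riders = {"a": {"club": "Brighton"}, "b": {"club": "London"}}


def distance_bin(km):
    """Binning: place a distance into one of three assumed bands."""
    if km < 25:
        return "short"
    if km < 60:
        return "medium"
    return "long"


def join_bin_aggregate(rides, riders):
    """Join rides to riders, bin the distance, then sum distance per group."""
    totals = defaultdict(float)
    for ride in rides:
        rider = riders.get(ride["rider_id"])
        if rider is None:
            continue  # inner join: skip rides with no matching rider
        key = (rider["club"], distance_bin(ride["distance_km"]))
        totals[key] += ride["distance_km"]
    return dict(totals)


print(join_bin_aggregate(rides, riders))
```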
39. Predictive Modeling and Forecasting
1. Select Prediction Model best suited to predicting Kudos from Strava bike rides
2. Select column whose values are to be predicted, and model parameter values
3. Train model and then test against remaining dataset
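The train-then-test pattern in step 3 can be sketched as follows. A one-variable least-squares line stands in for whichever prediction model is selected in the UI, and the (distance, kudos) pairs are invented toy values, not real Strava data.

```python
import random


def train_test_split(rows, train_fraction=0.75, seed=42):
    """Shuffle the rows and hold back the remainder for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(rows, slope, intercept):
    """Average absolute gap between predicted and actual values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)


# Toy (distance_km, kudos) pairs, deliberately linear so the fit is easy to check.
rides = [(10, 3), (20, 5), (30, 7), (40, 9), (50, 11), (60, 13), (70, 15), (80, 17)]
train, test = train_test_split(rides)
slope, intercept = fit_line(train)
print(slope, intercept, mean_absolute_error(test, slope, intercept))
```

Holding back rows the model never saw during training is what makes the test error a fair estimate of how it would predict kudos for new rides.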
41. •Data Flow feature enables multi-step transform of ingested data
•Sentiment Analysis operator useful for social/text data enrichment
•Enables BI developers to train and build predictive models
•ML-driven Explain feature automates understanding of context
•Basic data engineering for BI developers
•More data lake features expected in v5, v6
OAC Data Lake: What Works, What’s Coming?
Integration of features from Oracle Big Data Preparation Cloud Service
Enhanced Summary view highlights data shape and data quality
42. Coming soon to London, Autumn/Fall 2018
https://mjr-analytics.com