We produce quite a lot of data. Some of it arrives as business transactions and is stored in a relational database. This relational data is often combined with other unstructured, high-volume, rapidly changing datasets known in the industry as Big Data. The challenge for us as data integration professionals is to combine all of this data and transform it into something useful. Not only that, but we must do it in near real-time and deliver it to a big data target system such as Hadoop. The topic of this session, real-time data streaming, provides a great solution to that challenging task. By combining GoldenGate, Oracle’s premier data replication technology, and Apache Kafka, a leading open-source streaming and messaging system for big data, we can implement a fast, durable, and scalable solution. This session walks through the implementation of GoldenGate and Kafka.
Presented at Collaborate16 in Las Vegas.
3. info@rittmanmead.com www.rittmanmead.com @rittmanmead
About Rittman Mead
•World’s leading specialist partner for technical
excellence, solutions delivery and innovation in
Oracle Data Integration, Business Intelligence,
Analytics and Big Data
•Providing our customers targeted expertise; we are a
company that doesn’t try to do everything… only
what we excel at
•70+ consultants worldwide including 1 Oracle ACE
Director and 3 Oracle ACEs, offering training
courses, global services, and consulting
•Founded on the values of collaboration, learning,
integrity and getting things done
Unlock the potential of your organization’s data
•Comprehensive service portfolio designed to
support the full lifecycle of any analytics solution
• Visual Redesign
• Business User Training
• Ongoing Support
• Engagement Toolkit
Average user adoption for BI
platforms is below 25%
Rittman Mead’s User Engagement Service can help
More info: http://ritt.md/ue
Typical Example - Marketing
• Financial data stored in RDBMS
• Social media data, web logs, Google analytics, etc all in
various formats
• Bring it all together for analysis
‣ Marketing campaign effect on sales
Kafka - How is it used?
• Pure Event Streams
• System Metrics
• Derived Streams
• Hadoop Data Loads / Data Publishing
• Application Logs
• Database Changes
- Log Compaction
- Data cleansing
Let’s Jump Right In
• An example…near and dear to my heart
- One single view of the Oracle Data Integrator logs!
- Oracle Data Integrator session logs stored in the repository
- ODI Agent logs are text based log files
- To see the full picture of your ODI environment, they must be
combined
Extract from the ODI Repository with GoldenGate 12c
• Prepare the database
• Set up GoldenGate for Oracle Database
- Install and configure
• Set up Manager, Extract and Pump parameter files
• Add Extract and Pump process groups
• Start!
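The Extract and Pump parameter files above might look like the following sketch. Process names, the credential alias, schema, host, and table names are all hypothetical examples, not values from the session:

```
-- ext.prm: capture changes from the ODI repository schema to a local trail
EXTRACT EXT
USERIDALIAS gg_odi_alias
EXTTRAIL ./dirdat/lt
TABLE ODI_REPO.SNP_SESSION;
TABLE ODI_REPO.SNP_SESS_TASK_LOG;

-- pmp.prm: pump the local trail to the target host
EXTRACT PMP
USERIDALIAS gg_odi_alias
RMTHOST bigdatalite, MGRPORT 7809
RMTTRAIL ./dirdat/rt
TABLE ODI_REPO.*;
```

The process groups are then registered and started in GGSCI, e.g. ADD EXTRACT / ADD EXTTRAIL / ADD RMTTRAIL followed by START, per the GoldenGate for Oracle Database documentation.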
Stream ODI Agent Logs to Kafka via Logstash
• Application log processing is a standard use for Kafka
- Many approaches to extract logs
• Logstash
- Part of the Elastic (formerly ELK) stack
- Robin Moffatt blogged about this: http://ritt.md/kafka-elk
- Producer configuration for Kafka
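A minimal Logstash pipeline for this could be sketched as below. The log path and topic name are made-up examples, and option names vary between versions of the Kafka output plugin, so check the plugin documentation for your release:

```
# logstash.conf: tail the ODI agent log file and publish each line to Kafka
input {
  file {
    path => "/u01/odi_agent/log/odiagent.log"
    start_position => "beginning"
  }
}
output {
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "odi-agent-logs"
  }
}
```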
Logstash to Kafka - Setup and Startup
• Startup Zookeeper
- Already installed on Big Data Lite
• Set Kafka server.properties
- Broker ID
- Number of partitions
- Log retention period
- Zookeeper connection
• Start Kafka
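The server.properties settings listed above map to entries like these (example values only; tune retention and partition counts for your environment):

```
# config/server.properties (excerpt)
broker.id=0
num.partitions=2
# keep messages for 7 days
log.retention.hours=168
zookeeper.connect=localhost:2181
```

With Zookeeper already running, the broker starts with bin/kafka-server-start.sh config/server.properties.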
Oracle GoldenGate for Big Data
• Kafka one of many handlers
- HDFS, HBase, Flume
• Pluggable Formatters
- Convert trail file transactions to alternate format
- Avro, delimited text, JSON, XML
• Metadata Provider
- Handles mapping of source to target columns that differ in structure/name
- Similar to SOURCEDEF file in GoldenGate
- Avro or Hive
Oracle GoldenGate for Big Data - Kafka Handler
• Standard GoldenGate Extract / Pump processes
- We just set this up
• Replicat parameter file & process group
• Kafka Handler configuration
• Kafka Producer properties
- Note: Kafka 0.8.2.0 & 0.8.2.1 are certified with GoldenGate
• But, I’ve heard 0.9.0+ works…
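Assuming a Replicat named RKAFKA, the parameter file and handler properties might be sketched as follows. The topic name, schema, and file locations are hypothetical; the gg.handler property names follow the Oracle GoldenGate for Big Data documentation, but verify them against your version:

```
-- rkafka.prm: Replicat delivering trail data through the Kafka Handler
REPLICAT RKAFKA
TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka.props
GROUPTRANSOPS 1000
MAP ODI_REPO.*, TARGET ODI_REPO.*;
```

```
# dirprm/kafka.props (excerpt)
gg.handlerlist=kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
gg.handler.kafkahandler.TopicName=odi-logs
gg.handler.kafkahandler.format=avro_op
```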
GoldenGate and Kafka…Prerequisites
• Zookeeper & Kafka up and running
• Add topic to broker up front vs dynamically
• Kafka Handler must have access to broker server
• Kafka libraries must match Kafka version
GoldenGate and Kafka…Startup
• Create a topic in Kafka
• Add Replicat process group to GoldenGate on target
• Start Kafka console consumer
• Start GoldenGate extract/pump on source, replicat on target
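In 0.8.x-era Kafka CLI terms, the first and third steps above look roughly like this (the topic name is an example; these commands need a running Zookeeper and broker):

```
# create the target topic up front
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic odi-logs

# tail the topic to confirm records arrive once Replicat is running
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic odi-logs
```

The Replicat process group is added and started from GGSCI on the target in the usual way (ADD REPLICAT pointing at the remote trail, then START).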
GoldenGate Big Data Adapter Challenges
• GoldenGate could be a single point of failure
- Kafka is a fault-tolerant, distributed system
• Source transactions may end up larger than expected
- max.request.size
• Need for speed?
- batch.size
- linger.ms
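These producer settings live in the properties file referenced by the Kafka Handler. The values below are illustrative starting points, not recommendations from the session (note that in Java properties files a comment must start its own line):

```
# custom_kafka_producer.properties (excerpt)
bootstrap.servers=localhost:9092
# raise the cap so an unexpectedly large source transaction does not fail the send
max.request.size=5242880
# larger batches plus a small linger trade a little latency for throughput
batch.size=65536
linger.ms=10
```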