Strategies For Migrating From SQL to NoSQL — The Apache Kafka Way

Today, enterprise technology is entering a watershed moment: businesses are moving to end-to-end automation, which requires integrating data from different sources and destinations in real time. Every industry, from Internet to retail to services, is leveraging NoSQL database technology for more agile development, reduced operational costs, and scalable operations. This creates a need to model relational data as documents, to define ways to access those documents within applications, and to identify ways to migrate data out of a relational database. This is where streaming data pipelines come into play.

Over the years, as the cloud's on-demand, full-service, API-driven, pay-per-use model became popular and competitive, cloud infrastructure consolidation began, which in turn required automated infrastructure deployment to be simple and scalable.


This session details one of the easiest ways to deploy an end-to-end streaming data pipeline that moves data in real time, with low latency, from an on-premises relational datastore such as an Oracle pluggable database (PDB) to MarkLogic, a document-oriented NoSQL database, all deployed on Kubernetes clusters provided by Google Cloud (GKE). Apache Kafka® is provided through Confluent Cloud on AWS, making this a true multi-cloud deployment.
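As a rough, non-authoritative sketch of the source half of that pipeline, the following shows how an Oracle CDC source connector could be declared from ksqlDB. The connector class is Confluent's Oracle CDC Source connector, but the host, credentials, table pattern, and other property values are invented placeholders rather than the configuration used in the session; the demo repository linked on the Resources slide carries the real setup.

    -- Sketch only: capture Oracle PDB changes into Kafka topics via CDC.
    -- Hostname, credentials, and table regex below are hypothetical.
    -- By default the connector writes each table to its own change-event topic.
    CREATE SOURCE CONNECTOR orcl_cdc_source WITH (
      'connector.class'       = 'io.confluent.connect.oracle.cdc.OracleCdcSourceConnector',
      'oracle.server'         = 'onprem-oracle.example.com',
      'oracle.port'           = '1521',
      'oracle.sid'            = 'ORCLCDB',
      'oracle.pdb.name'       = 'ORCLPDB1',
      'oracle.username'       = 'C##CDC_USER',
      'oracle.password'       = '********',
      'table.inclusion.regex' = 'ORCLPDB1[.]ADMIN[.]CUSTOMERS'
    );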

Strategies For Migrating From SQL to NoSQL — The Apache Kafka Way

1. Strategies For Migrating From SQL to NoSQL — The Apache Kafka Way. Geetha Anne, Sr. Solutions Engineer
2. Geetha Anne ■ Silicon Valley ■ 2 daughters ■ Cloudera, ServiceNow, and Hawaiian Airlines prior to joining Confluent ■ 10 years in the space ■ Key areas of expertise: software development, automation engineering, and presales ■ Cooking, singing, hiking
3. Agenda ■ The Problem: migrating to a modern NoSQL database is a complex process ■ Why Confluent: database and data modernization with Confluent ■ The Solution: proposed architecture and action plan ■ Takeaways: food for thought and next steps
4. The Problem
5. Modern, cloud-native databases power business-critical applications with lower operational overhead. Self-Managed Databases ● Rigid architecture that makes it hard to integrate with other systems ● Expensive in both upfront and ongoing maintenance costs ● Slower to scale to meet evolving demands. Cloud Databases ● Lower TCO by decoupling storage from compute and leveraging consumption-based pricing ● Increased overall flexibility and business agility ● Worry-free operations with built-in auto-scaling and maintenance cycles
6. Integrating multiple legacy systems with the cloud can be a complex, multi-year process. Time and resource intensive: replacing or refactoring legacy data systems across environments is not easy, and data visibility can be limited while the work is underway. Insight blind spots: getting actionable data from disparate data sources is cumbersome; most data insight comes from nightly loads, merges, and batch updates to create a complete view. Data silos across environments: difficulties with integrating multiple data silos and data formats. (Diagram: an on-prem legacy database, CRM, and SaaS apps feed a cloud database and nightly reporting applications through ETL apps, batch jobs, and database syncs.)
7. Easily modernize your database by integrating legacy systems with the cloud using Confluent. 1. Simplify and accelerate migration: link on-prem and cloud for easy data movement across environments, and process data in flight with ksqlDB stream processing. 2. Stay synchronized in real time: move from batch to real-time streaming and access change data capture technology using Confluent and our CDC connectors. 3. Reduce total cost of ownership: leverage fully managed services and avoid prohibitive licensing costs from existing solutions offered by legacy vendors.
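To make "process data in flight" concrete, a minimal sketch follows: it registers the change-event topic produced by the hypothetical Oracle CDC connector above as a ksqlDB stream. The topic name is an assumption carried over from that earlier example.

    -- Assumed topic name from the hypothetical Oracle CDC connector above.
    -- With Avro and Schema Registry, ksqlDB infers the columns from the schema.
    CREATE STREAM customers_cdc WITH (
      KAFKA_TOPIC  = 'ORCLPDB1.ADMIN.CUSTOMERS',
      VALUE_FORMAT = 'AVRO'
    );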
8. Why Confluent
9. A new paradigm is required for data in motion: continuously process streams of real-time and historical data (a sale, a shipment, a trade, a customer interaction). "We need to shift our thinking from everything at rest, to everything in motion." Real-time stream processing powers rich, front-end customer experiences and real-time, software-driven business operations.
10. Operationalizing Kafka on your own is difficult. Kafka is hard even in experimentation, and it gets harder (and riskier) as you add mission-critical data and use cases: ● Architecture planning ● Cluster sizing ● Cluster provisioning ● Broker settings ● Zookeeper management ● Partition placement & data durability ● Source/sink connector development & maintenance ● Monitoring & reporting tools setup ● Software patches and upgrades ● Security controls and integrations ● Failover design & planning ● Mirroring & geo-replication ● Streaming data governance ● Load rebalancing & monitoring ● Expansion planning & execution ● Utilization optimization & visibility ● Cluster migrations ● Infrastructure & performance upgrades/enhancements. (Chart: value grows across five adoption stages, from experimentation/early interest, to identifying a project, to mission-critical disparate LOBs, to mission-critical connected LOBs, to a central nervous system.) Key challenges: Operational burden & resources (manage and scale the platform to support ever-growing demand); Security & governance (ensure streaming data is as safe & secure as data at rest as Kafka usage scales); Real-time connectivity & processing (leverage valuable legacy data to power modern, cloud-based apps & experiences); Global availability (maintain high availability across environments with minimal downtime).
11. Cloud-native. Infinite: store unlimited data on Confluent to enhance your real-time apps and use cases with a broader set of data. Global: create a consistent data fabric throughout your organization by linking clusters across your different environments. Elastic: scale up instantly to meet any demand and scale back down to avoid over-provisioning infrastructure.
12. Everywhere: Confluent provides deployment flexibility to span all of your environments. Self-managed software: Confluent Platform, the enterprise distribution of Apache Kafka, deployed on-premises or in your private cloud (VMs). Fully managed service: Confluent Cloud, a cloud-native service for Apache Kafka, available on the leading public clouds.
13. The Solution
14. Three-Phase Plan: modernize your databases with Confluent. 1. Migrate ● Choose the workloads that you'd like to migrate to the cloud ● Seamlessly integrate your data source via managed Confluent source connectors. 2. Optimize ● Perform real-time data transformations using ksqlDB ● Find the most useful queries for your cloud data ● Work with our ecosystem of partners to find the best use of your data. 3. Modernize ● Use our managed sink connectors to send data into your cloud database of choice ● Continue migrating workloads into the cloud as opportunities arise.
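For the Optimize step, a real-time transformation in ksqlDB might look like the sketch below, which reshapes flat relational rows from the customers_cdc stream defined earlier into nested, document-friendly records before they reach a sink connector. Every column name here is invented for illustration.

    -- Illustrative only: turn flat CDC rows into a nested, document-shaped stream.
    CREATE STREAM customers_documents
      WITH (KAFKA_TOPIC = 'customers_documents', VALUE_FORMAT = 'JSON') AS
      SELECT
        CAST(CUSTOMER_ID AS STRING)                         AS DOC_ID,
        CONCAT(FIRST_NAME, ' ', LAST_NAME)                  AS FULL_NAME,
        STRUCT(STREET := STREET, CITY := CITY, ZIP := ZIP)  AS ADDRESS
      FROM customers_cdc
      EMIT CHANGES;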
15. Migrate: Source Connectors
16. Instantly connect popular data sources & sinks: 130+ pre-built connectors, of which 100+ are Confluent supported and 30+ are partner supported and Confluent verified (for example, AWS Lambda).
17. Modernize and bridge your entire data architecture with Confluent's robust connector portfolio. (Diagram: source connectors replace expensive, custom-built integrations from legacy data systems such as Oracle Database, mainframes, and applications; ksqlDB processes the streams; sink connectors feed modern, cloud-based data systems such as cloud-native/SaaS apps and Azure Synapse Analytics.)
18. Modernize: Sink Connectors
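On the sink side, the same declarative style could deliver the transformed topic to MarkLogic. This is a sketch based on the open-source MarkLogic Kafka connector; the connector class and the ml.* property names are assumptions to be checked against that connector's documentation, and the connection details are made up.

    -- Sketch only: class name and ml.* properties assumed from the MarkLogic
    -- Kafka connector; host, port, and credentials are placeholders.
    CREATE SINK CONNECTOR marklogic_sink WITH (
      'connector.class'         = 'com.marklogic.kafka.connect.sink.MarkLogicSinkConnector',
      'topics'                  = 'customers_documents',
      'ml.connection.host'      = 'marklogic.internal.example.com',
      'ml.connection.port'      = '8000',
      'ml.connection.username'  = 'kafka_writer',
      'ml.connection.password'  = '********',
      'ml.document.collections' = 'customers',
      'ml.document.format'      = 'JSON'
    );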
19. 3 Modalities of Stream Processing with Confluent: Kafka clients, Kafka Streams, and ksqlDB, ordered from most flexibility to most simplicity.

Kafka clients (most flexible, most code):

    // Manual counting plus hand-rolled retries against an external state store
    ConsumerRecords<String, Integer> records = consumer.poll(100);
    Map<String, Integer> counts = new HashMap<>();
    for (ConsumerRecord<String, Integer> record : records) {
      String key = record.key();
      int c = counts.getOrDefault(key, 0);
      c += record.value();
      counts.put(key, c);
    }
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      int attempts = 0;
      while (attempts++ < MAX_RETRIES) {
        try {
          int stateCount = stateStore.getValue(entry.getKey());
          stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
          break;
        } catch (StateStoreException e) {
          RetryUtils.backoff(attempts);
        }
      }
    }

Kafka Streams:

    builder
      .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
      .groupBy((key, value) -> value)
      .count()
      .toStream()
      .to("counts", Produced.with(Serdes.String(), Serdes.Long()));

ksqlDB (simplest):

    SELECT x, count(*) FROM stream GROUP BY x EMIT CHANGES;
20. ksqlDB at a Glance. What is it? ksqlDB is an event streaming database for working with streams and tables of data, with all the key features of a modern streaming solution: aggregations, joins, windowing, event-time, dual query support, exactly-once semantics, out-of-order handling, and user-defined functions. For example:

    CREATE TABLE activePromotions AS
      SELECT rideId, qualifyPromotion(distanceToDst) AS promotion
      FROM locations
      GROUP BY rideId
      EMIT CHANGES;

How does it work? It separates compute from storage and scales elastically in a fault-tolerant manner, remaining highly available during disruption, even in the face of failure of a quorum of its servers.
21. Built on the Best Technology, Available as a Fully Managed Service. Kafka is the backbone of ksqlDB: ksqlDB is built on top of Kafka's battle-tested streaming foundation, and its design re-uses Kafka to achieve elasticity, fault tolerance, and scalability for stream processing & analytics. Use a fully managed service: with Confluent Cloud ksqlDB, you need not worry about any of the details of running it. You can forget about: ● Clusters ● Brokers ● Scaling ● Upgrading ● Monitoring. Pay only for what you use. (Diagram: a ksqlDB server runs the Kafka Streams engine with transient local state and serves push & pull queries on the compute side, while Kafka topics and changelog topics provide the storage side.)
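As a small usage note on the dual query support mentioned above, the same ksqlDB table can be read two ways. Using the activePromotions table from the earlier slide, a push query subscribes to changes as they happen, while a pull query fetches the current value for a key (the ride id below is made up).

    -- Push query: stream every change to the table as it happens.
    SELECT rideId, promotion FROM activePromotions EMIT CHANGES;

    -- Pull query: look up the current promotion for one ride (hypothetical key).
    SELECT promotion FROM activePromotions WHERE rideId = 'ride-42';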
22. Accelerate your migration from legacy on-prem systems to modern, cloud-based technologies. (Same architecture diagram as slide 17: source connectors replace expensive, custom-built integrations from legacy data systems such as Oracle Database, mainframes, and applications; ksqlDB processes data in flight; sink connectors feed modern, cloud-based data systems such as cloud-native/SaaS apps and Azure Synapse Analytics.)
23. Confluent: the central nervous system of data.
24. Confluent Cloud Fully Managed Connectors ● A limited subset of the larger connector catalogue ● Elastic scaling with no infrastructure to manage ● Connector networking configuration depends on your cluster's networking ● Limited configuration options ● Stable source IPs are available for certain connectors
25. Proposed Architecture (diagram; the pipeline terminates in a NoSQL DB).
26. Three-Phase Plan: modernize your database with Confluent. 1. Migrate ● Choose the workloads that you'd like to migrate to the cloud ● Seamlessly integrate your data source via managed Confluent source connectors. 2. Optimize ● Perform real-time data transformations using ksqlDB ● Find the most useful queries for your cloud data ● Work with our ecosystem of partners to find the best use of your data. 3. Modernize ● Use our managed sink connectors to send data into your cloud database of choice ● Continue migrating workloads into the cloud as opportunities arise.
27. Cloud-native, Complete, Everywhere, with Kafka at its core: infinite storage; security & data governance; ksqlDB & stream processing, analytics; connectors; APIs, UIs, CLIs; fully managed 'NoOps' on AWS, Azure, and GCP.
28. Resources:
https://github.com/confluentinc/demo-database-modernization
https://www.confluent.io/blog/real-time-cdc-pipelines-with-oracle-on-gke-using-confluent-connector/?utm_source=linkedin&utm_medium=organicsocial&utm_campaign=tm.devx_ch.bp_building-a-real-time-data-pipeline-with-oracle-cdc-and-marklogic-using-cfk-and-cloud_content.pipelines
29. Thank You. Stay in touch: Geetha Anne, geethaanne.sjsu@gmail.com, Geethaay, github.com/GeethaAnne, www.linkedin.com/in/geetha-anne-8646011a/
