This document discusses Redis, MongoDB, and Amazon DynamoDB. It begins with an overview of NoSQL databases and the differences between SQL and NoSQL databases. It then covers Redis data types such as strings, hashes, lists, sets, sorted sets, and streams. Example use cases for Redis are also provided, such as leaderboards, geospatial queries, and message queues. The document also discusses MongoDB design patterns such as embedding data, embracing duplication, and modeling relationships. Finally, it provides a high-level overview of DynamoDB concepts such as tables, items, attributes, and primary keys.
Big Data Redis Mongodb Dynamodb Sharding
1. Araf Karsh Hamid, Co-Founder / CTO, MetaMagic Global Inc., NJ, USA (@arafkarsh)
Microservice Architecture Series: Building Cloud-Native Apps, Part 4 of 11
NoSQL vs. SQL
Redis / MongoDB / DynamoDB
Scalability: Shards and Partitions
Distributed Transactions
2. Slides are color coded based on the topic colors:
1. NoSQL vs. SQL
2. Redis, MongoDB, DynamoDB
3. Scalability: Sharding & Partitions
4. Distributed Transactions
3. Developer Journey
Monolithic:
- Waterfall or Agile with Scrum (4-6 weeks); releases every 6/12 months
- Domain-Driven Design, Event Sourcing and CQRS; design patterns optional
- Continuous Integration (CI)
- Enterprise Service Bus
- Relational Database [SQL] / NoSQL
- Separate Development, QA/QC, and Ops teams
Microservices:
- Scrum / Kanban (1-5 days)
- Domain-Driven Design, Event Sourcing and CQRS; design patterns mandatory
- Infrastructure design patterns
- CI/CD and DevOps
- Event Streaming / Replicated Logs
- SQL and NoSQL
- Container Orchestrator and Service Mesh
6. NoSQL Databases
- Couchbase: Document and Key-Value; Open Source; ACID: Yes; Query: N1QL; Use cases: Financial Services, Inventory, IoT
- Cassandra: Wide Column; Open Source; ACID: No; Query: CQL; Use cases: Social Analytics, Retail, Messaging
- Neo4j: Graph; Open Source / Commercial; ACID: Yes; Query: Cypher; Use cases: AI, Master Data Management, Fraud Protection
- Redis: Key-Value; Open Source; ACID: Yes; Query: many languages; Use cases: Caching, Queuing
- MongoDB: Document; Open Source / Commercial; ACID: Yes; Query: JS; Use cases: IoT, Real-Time Analytics, Inventory
- Amazon DynamoDB: Key-Value and Document; Vendor; ACID: Yes; Query: DQL; Use cases: Gaming, Retail, Financial Services
Source: https://searchdatamanagement.techtarget.com/infographic/NoSQL-database-comparison-to-help-you-choose-the-right-store.
7. @arafkarsh arafkarsh
SQL Vs NoSQL
| SQL | NoSQL
Database Type | Relational | Non-Relational
Schema | Pre-Defined | Dynamic Schema
Database Category | Table Based | Documents, Key-Value Stores, Graph Stores, Wide Column Stores
Queries | Complex queries (standard SQL for all relational databases) | Each type of NoSQL DB needs its own special query language
Hierarchical Storage | Not a good fit | Perfect
Scalability | Scales well for traditional applications | Scales well for modern, data-heavy applications
Query Language | SQL - a standard language across all the databases | Non-standard query language, as each NoSQL DB is different
ACID Support | Yes | Only for some of the databases (e.g. MongoDB)
Data Size | Good for traditional applications | Handles massive amounts of data for modern app requirements
7
8. @arafkarsh arafkarsh
SQL Vs NoSQL (MongoDB)
1. In MongoDB, transactional properties are scoped at the document level.
2. One or more fields can be written atomically in a single operation.
3. This includes updates to multiple sub-documents, including nested arrays.
4. Any error causes the entire operation to roll back.
5. This is on par with the data integrity guarantees provided by traditional databases.
8
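To make the document-level guarantee concrete, here is a minimal PyMongo sketch (assuming a local MongoDB instance; the database, collection and field names are hypothetical) in which several fields and a nested array element change in one atomic operation: either every change applies, or none does.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client.shop.orders   # hypothetical database and collection names

# A single update_one touches several fields and a nested array element;
# MongoDB applies all of these changes atomically, or none of them.
orders.update_one(
    {"_id": "order-1001"},
    {
        "$set": {"status": "SHIPPED", "items.0.qty": 2},
        "$inc": {"total": -49.99},
    },
)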
9. @arafkarsh arafkarsh
Multi Table / Doc ACID Transactions
Examples – Systems of Record or Line of Business (LoB) Applications
1. Finance
1. Moving funds between Bank Accounts,
2. Payment Processing Systems
3. Trading Platforms
2. Supply Chain
• Transferring ownership of Goods & Services through Supply
Chains and Booking Systems – Ex. Adding Order and Reducing
inventory.
3. Billing System
1. Adding a Call Detail Record and then updating Monthly Plan.
Source: ACID Transactions in MongoDB
9
11. @arafkarsh arafkarsh
Redis
• Data Structures
• Design Patterns
11
In-Memory Databases
2020 | 2019 | NoSQL Database | Model
1 | 1 | Redis | Key-Value, Multi-Model
2 | 2 | Amazon DynamoDB | Multi-Model
3 | 3 | Microsoft Cosmos DB | Multi-Model
4 | 4 | Memcached | Key-Value
12. @arafkarsh arafkarsh
Why do you need In-Memory Databases
12
1. Users: 1 Million+
2. Data Volume: Terabytes to Petabytes
3. Locality: Global
4. Performance: Microsecond Latency
5. Request Rate: Millions per Second
6. Access: Mobile, IoT, Devices
7. Economics: Pay as you go
8. Developer Access: Open API
Source: AWS re:Invent 2020: https://www.youtube.com/watch?v=2WkJeofqIJg
13. @arafkarsh arafkarsh
Tables / Docs (JSON) – Why Redis is different?
13
• Redis is a multi-model key-value store
• Commands operate on keys
• The data type of a key can change over time
Source: https://www.youtube.com/watch?v=ELk_W9BBTDU
14. @arafkarsh arafkarsh
Keys, Values & Data Types
14
movie:StarWars “Sold Out”
Key Name Value
String
Hash
List
Set
Sorted Set
Basic Data Types
Key Properties
• Unique
• Binary Safe (Case Sensitive)
• Max Size = 512 MB
Expiration / TTL
• By default, keys are retained (no expiry)
• Expiry can be set in seconds, milliseconds, or as a Unix epoch timestamp
• A TTL can be added to or removed from a key
SET movie:StarWars "Sold Out" EX 5000 (expires in 5000 seconds)
PEXPIRE movie:StarWars 5 (expires in 5 milliseconds)
https://redis.io/commands/set
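As a quick illustration of key expiry, a minimal redis-py sketch (assuming a local Redis instance; the key name mirrors the slide's example):

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
r.set("movie:StarWars", "Sold Out", ex=5000)   # expire after 5000 seconds
r.pexpire("movie:StarWars", 5)                 # shrink the TTL to 5 milliseconds
print(r.ttl("movie:StarWars"))                 # remaining time to live, in seconds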
15. @arafkarsh arafkarsh
Redis – Remote Dictionary Server
15
Distributed In-Memory Data Store
String Standard String data
Hash { A: "John Doe", B: "New York", C: "USA" }
List [ A -> B -> C -> D -> E ]
Set { A, B, C, D, E }
Sorted Set { A:10, B:12, C:14, D:20, E:32 }
Stream … msg1, msg2, msg3
Pub / Sub … msg1, msg2, msg3
https://redis.io/topics/data-types
16. @arafkarsh arafkarsh
Data Type: Hash
16
Key Name: movie:The-Force-Awakens
Field -> Value
Director -> J. J. Abrams
Writer -> L. Kasdan, J. J. Abrams, M. Arndt
Cinematography -> Dan Mindel
HGET movie:The-Force-Awakens Director
"J. J. Abrams"
• Field & Value pairs
• Single level
• Add and remove fields
• Set Operations: Intersect, Union
https://redis.io/topics/data-types
https://redis.io/commands#hash
Use Cases
• Session Cache
• Rate Limiting
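A minimal redis-py sketch of the session-cache use case (assuming a local Redis instance; key and field names are hypothetical): one hash per session, expired after 30 minutes.

import redis

r = redis.Redis(decode_responses=True)

# One hash per session; the whole hash expires with the session key.
r.hset("session:abc123", mapping={"user": "jane", "cart_items": 3})
r.expire("session:abc123", 1800)          # 30-minute TTL
print(r.hget("session:abc123", "user"))   # "jane"
print(r.hgetall("session:abc123"))        # all field/value pairs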
17. @arafkarsh arafkarsh
Data Type: List
17
movies
Key Name
“Force Awakens, The” “Last Jedi, The” “Rise of Skywalker, The”
LPOP movies
“Force Awakens, The”
LPOP movies
“Last Jedi, The”
RPOP movies
“Rise of Skywalker, The”
RPOP movies
“Last Jedi, The”
• Ordered List (FIFO or LIFO)
• Duplicates Allowed
• Elements added from Left or Right or By Position
• Max 4 Billion elements per List
Type of Lists
• Queues
• Stacks
• Capped List
https://redis.io/topics/data-types
https://redis.io/commands#list
Use Cases
• Communication
• Activity List
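A minimal redis-py sketch of the queue and stack patterns above (assuming a local Redis instance; the key mirrors the slide's example):

import redis

r = redis.Redis(decode_responses=True)

# Queue (FIFO): push on the right, pop from the left.
r.rpush("movies", "Force Awakens, The", "Last Jedi, The", "Rise of Skywalker, The")
print(r.lpop("movies"))   # "Force Awakens, The"

# Stack (LIFO): push and pop on the same end.
print(r.rpop("movies"))   # "Rise of Skywalker, The"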
18. @arafkarsh arafkarsh
Data Type: Set
18
movies
Member / Element
“Force Awakens, The”
“Last Jedi, The”
“Rise of Skywalker, The”
SMEMBERS movies
“Force Awakens, The”
“Last Jedi, The”
“Rise of Skywalker, The”
• Unordered collection of unique elements
• Set Operations
• Difference
• Intersect
• Union
https://redis.io/topics/data-types
https://redis.io/commands#set
Key Name
Use Cases
• Unique Visitors
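A minimal redis-py sketch of the unique-visitors use case (key and member names are hypothetical): duplicates are ignored, so the set cardinality is the unique count.

import redis

r = redis.Redis(decode_responses=True)

# The second "user:42" is ignored, so only two members are stored.
r.sadd("visitors:2024-06-01", "user:42", "user:77", "user:42")
print(r.scard("visitors:2024-06-01"))      # 2
print(r.smembers("visitors:2024-06-01"))   # {"user:42", "user:77"}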
19. @arafkarsh arafkarsh
Data Type: Sorted Set
19
Key Name: movies
Member -> Score
"Force Awakens, The" -> 3
"Last Jedi, The" -> 1
"Rise of Skywalker, The" -> 2
ZRANGE movies 0 1
"Last Jedi, The"
"Rise of Skywalker, The"
• Ordered list of unique elements, ranked by score
• Set Operations: Intersect, Union
https://redis.io/topics/data-types
https://redis.io/commands#set
Use Cases
• Leaderboard
• Priority Queues
20. @arafkarsh arafkarsh
Redis: Transactions
20
• Transactions are Atomic and Isolated
• Redis commands are queued after MULTI
• All the queued commands are executed sequentially as an atomic unit when EXEC is called
MULTI
SET movie:The-Force-Awakens:Review Good
INCR movie:The-Force-Awakens:Rating
EXEC
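The same transaction expressed with redis-py (a sketch, assuming a local Redis instance): pipeline(transaction=True) wraps the queued commands in MULTI ... EXEC.

import redis

r = redis.Redis(decode_responses=True)

# Both commands are queued, then executed sequentially as one atomic unit.
pipe = r.pipeline(transaction=True)
pipe.set("movie:The-Force-Awakens:Review", "Good")
pipe.incr("movie:The-Force-Awakens:Rating")
print(pipe.execute())   # e.g. [True, 1]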
21. @arafkarsh arafkarsh
Redis In-Memory Data Store Use cases
21
• Caching
• Session Store
• Gaming Leaderboards
• Geospatial
• Machine Learning
• Message Queues
• Media Streaming
• Real-time Analytics
22. @arafkarsh arafkarsh
Use Case: Sorted Set – Leader Board
22
• Collection of Sorted Distinct
Entities
• Set Operations and Range
Queries based on Score
value: John
score: 610
value : Jane
score: 987
value : Sarah
score: 1597
value : Maya
score: 144
value : Fred
score: 233
value : Ann
score: 377
Game Scores
ZADD game:1 987 Jane 1597 Sarah 144 Maya 610 John 377 Ann 233 Fred
ZREVRANGE game:1 0 3 WITHSCORES (Get top 4 Scores)
• Sarah 1597
• Jane 987
• John 610
• Ann 377
Source: AWS re:Invent 2020: https://www.youtube.com/watch?v=2WkJeofqIJg
https://redis.io/commands/zadd
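The same leaderboard in redis-py (a sketch, assuming a local Redis instance; scores match the slide's example):

import redis

r = redis.Redis(decode_responses=True)

r.zadd("game:1", {"Jane": 987, "Sarah": 1597, "Maya": 144,
                  "John": 610, "Ann": 377, "Fred": 233})
# Top 4 scores, highest first.
print(r.zrevrange("game:1", 0, 3, withscores=True))
# [('Sarah', 1597.0), ('Jane', 987.0), ('John', 610.0), ('Ann', 377.0)]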
23. @arafkarsh arafkarsh
Use Case: Geospatial
23
• Compute distance between
members
• Find all members within a
radius
Source: AWS re:Invent 2020: https://www.youtube.com/watch?v=2WkJeofqIJg
GEOADD cities -87.6298 41.8781 Chicago
GEOADD cities -122.3321 47.6062 Seattle
ZRANGE cities 0 -1
• "Chicago"
• "Seattle"
GEODIST cities Chicago Seattle mi
• "1733.4089"
GEORADIUS cities -122.4194 37.7749 1000 mi WITHDIST
• "Seattle"
• "679.4848"
Units: m for meters, km for kilometres, mi for miles, ft for feet
https://redis.io/commands/geodist
26. @arafkarsh arafkarsh
MongoDB Docs – Prefer Embedding
• Use sub-document structure to keep related data within a single document
• Include bounded arrays to hold multiple records
26
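A minimal PyMongo sketch of the embedding guidance above (database, collection and field names are hypothetical): related data lives inside the customer document, and the embedded array is kept bounded with $slice.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
customers = client.shop.customers   # hypothetical database and collection names

# Addresses and recent orders are embedded in the customer document.
customers.insert_one({
    "_id": "cust-42",
    "name": "Jane Doe",
    "addresses": [{"type": "home", "city": "Kochi", "country": "IN"}],
    "recent_orders": [],
})

# Keep the embedded array bounded: only the 10 most recent orders are retained.
customers.update_one(
    {"_id": "cust-42"},
    {"$push": {"recent_orders": {"$each": [{"order_id": "o-1"}], "$slice": -10}}},
)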
27. @arafkarsh arafkarsh
MongoDB Docs – Embrace Duplication
• Field info duplicated from the Customer Profile
• Address duplicated from the Customer Profile
27
28. @arafkarsh arafkarsh
Know When Not to Embed
As Item is used outside of Order, you don't need to embed the whole object here; instead, give the Item a reference ID. (Do not embed)
The Name is duplicated to decouple it from the Item (Product) Service. (Embrace duplication)
28
30. @arafkarsh arafkarsh
MongoDB – Tips & Best Practices
1. MongoDB will abort any multi-document transaction that runs for more than 60 seconds.
2. No more than 1,000 documents should be modified within a transaction.
3. Developers need to add logic to retry a transaction that is aborted due to a transient network error.
4. Transactions that affect multiple shards incur a greater performance cost, as operations are coordinated across multiple participating nodes over the network.
5. Performance will be impacted if a transaction runs against a collection that is subject to rebalancing.
30
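A minimal PyMongo sketch illustrating points 1-3 (the replica-set URI, database and collection names are hypothetical): the unit of work is kept small, and with_transaction retries the callback when the transaction is aborted by a transient (e.g. network) error.

from pymongo import MongoClient

client = MongoClient("mongodb://mongodb0,mongodb1,mongodb2/?replicaSet=rsOmega")
db = client.shop   # hypothetical database

def transfer(session):
    # Keep the transaction small: far fewer than 1,000 modified documents,
    # and well within the 60-second limit.
    db.orders.insert_one({"_id": "o-2", "item": "item-7", "qty": 1}, session=session)
    db.inventory.update_one({"_id": "item-7"}, {"$inc": {"stock": -1}}, session=session)

with client.start_session() as session:
    # with_transaction retries on transient transaction errors before giving up.
    session.with_transaction(transfer)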
32. @arafkarsh arafkarsh
Amazon DynamoDB Concept
[Diagram: a single DynamoDB table holding multiple entities as Items - Customer (ID, Name, Category, State), Catalogue / Product (ID, Name, Value, Description, Image), Cart (User ID + Item ID, Quantity, Value, Currency), Order, and Payments.]
1. A single Table holds multiple Entities (Customer, Catalogue, Cart, Order, etc.), aka Items.
2. An Item contains a collection of Attributes.
3. The Primary Key plays a key role in performance, scalability, and avoiding joins (in the typical RDBMS way).
4. The Primary Key contains a Partition Key and an optional Sort Key.
5. The Item data model is JSON, and an Attribute can be a field or a custom object.
33. @arafkarsh arafkarsh
DynamoDB – Under the Hood
One single table holds multiple entities, each with multiple items (records, in RDBMS terms) - e.g. 1 Org record and 2 Employee records.
1. The DynamoDB structure is JSON (document model); however, it has no resemblance to MongoDB in terms of DB implementation or schema design patterns.
2. Multiple entities are part of the single table, and this helps avoid expensive joins. For example, PK = ORG#Magna retrieves all 3 records: 1 record from the Org entity and 2 records from the Employee entity.
3. The Partition Key helps with sharding and horizontal scalability.
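A minimal boto3 sketch of the single-table idea (the table name and key attribute names are hypothetical; assumes AWS credentials are configured): one Query on the partition key returns the Org item and its Employee items with no join.

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("AppTable")   # hypothetical single-table design

# Two entity types stored under the same partition key.
table.put_item(Item={"PK": "ORG#Magna", "SK": "ORG#Magna", "type": "Org", "name": "Magna"})
table.put_item(Item={"PK": "ORG#Magna", "SK": "EMP#101", "type": "Employee", "name": "John"})

# One Query on PK retrieves the org and all of its employees - no join needed.
resp = table.query(KeyConditionExpression=Key("PK").eq("ORG#Magna"))
print(resp["Items"])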
36. @arafkarsh arafkarsh
App scalability based on a microservices architecture
Source: The New Stack; based on "The Art of Scalability" by Martin Abbott & Michael Fisher
36
38. @arafkarsh arafkarsh
Scalability Best Practices: Lessons from eBay
Best Practices Highlights
#1 Partition By Function
• Decouple the Unrelated Functionalities.
• Selling functionality is served by one set of applications, bidding by another, search by yet another.
• 16,000 App Servers in 220 different pools
• 1000 logical databases, 400 physical hosts
#2 Split Horizontally
• Break the workload into manageable units.
• eBay’s interactions are stateless by design
• All App Servers are treated equal and none retains any transactional state
• Data Partitioning based on specific requirements
#3
Avoid Distributed
Transactions
• 2-Phase Commit is a pessimistic approach that comes with a big COST
• CAP Theorem (Consistency, Availability, Partition Tolerance): you can pick only two at any point in time.
• @ eBay No Distributed Transactions of any kind and NO 2 Phase Commit.
#4
Decouple Functions
Asynchronously
• If component A calls component B synchronously, they are tightly coupled; to scale A, you also need to scale B.
• If asynchronous, A can move forward irrespective of the state of B.
• SEDA (Staged Event Driven Architecture)
#5
Move Processing to
Asynchronous Flow
• Move as much processing towards Asynchronous side
• Anything that can wait should wait
#6 Virtualize at All Levels • Virtualize everything. eBay created their own O/R layer for abstraction
#7 Cache Appropriately • Cache slow-changing, read-mostly data: metadata, configuration and static data.
Source: http://www.infoq.com/articles/ebay-scalability-best-practices
38
40. @arafkarsh arafkarsh
CAP Theorem by Eric Allen Brewer
Pick Any 2!! Say NO to 2 Phase Commit
Source: https://en.wikipedia.org/wiki/CAP_theorem | http://en.wikipedia.org/wiki/Eric_Brewer_(scientist)
CAP 12 years later: How the “Rules have changed”
“In a network subject to communication failures, it is
impossible for any web service to implement an atomic
read / write shared memory that guarantees a response
to every request.”
Partition Tolerance
The system continues to operate despite an arbitrary
number of messages being dropped (or delayed) by
the network between nodes.
Consistency
Every read receives the
most recent write or an
error.
Availability
Every request receives a (non-error) response – without
guarantee that it contains the most recent write.
40
41. @arafkarsh arafkarsh
Sharding / Partitioning
Method | Scalability | Table
Sharding | Horizontal | Rows - Same Schema with Unique Rows
Sharding | Vertical | Columns - Different Schema
Partition | Vertical | Rows - Same Schema with Unique Rows
1. Optimize the database.
2. Separate rows or columns into multiple smaller tables.
3. Each table either has the same schema with unique rows,
4. or has a schema that is a subset of the original.
Original Table
Customer ID | Customer Name | DOB | City
1 | ABC | | Bengaluru
2 | DEF | | Tokyo
3 | JHI | | Kochi
4 | KLM | | Pune
Horizontal Sharding - 1
Customer ID | Customer Name | DOB | City
1 | ABC | | Bengaluru
2 | DEF | | Tokyo
Horizontal Sharding - 2
Customer ID | Customer Name | DOB | City
3 | JHI | | Kochi
4 | KLM | | Pune
Vertical Sharding - 1
Customer ID | Customer Name | DOB
1 | ABC |
2 | DEF |
3 | JHI |
4 | KLM |
Vertical Sharding - 2
Customer ID | City
1 | Bengaluru
2 | Tokyo
3 | Kochi
4 | Pune
41
42. @arafkarsh arafkarsh
Sharding Scenarios
1. Horizontal Scaling: Single Server is unable to handle the load
even after partitioning the data sets.
2. Data can be partitioned in such a way that specific server(s)
can serve the search query based on the partition. For Ex. In
an e-Commerce Application Searching the data based on
1. Product Type
2. Product Brand
3. Sellers Region (for Local Shipping)
4. Orders based on Year / Months
42
43. @arafkarsh arafkarsh
Geo Partitioning
• Geo-partitioning is the ability to control the location of
data at the row level.
• CockroachDB lets you control which tables are replicated
to which nodes. But with geo-partitioning, you can control
which nodes house data with row-level granularity.
• This allows you to keep customer data close to the user,
which reduces the distance it needs to travel, thereby
reducing latency and improving user experience.
Source: https://www.cockroachlabs.com/blog/geo-partition-data-reduce-latency/
43
45. @arafkarsh arafkarsh
Oracle Sharding and Geo
CREATE SHARDED TABLE customers (
  cust_id NUMBER NOT NULL,
  name VARCHAR2(50),
  address VARCHAR2(250),
  geo VARCHAR2(20),
  class VARCHAR2(3),
  signup_date DATE,
  CONSTRAINT cust_pk PRIMARY KEY (geo, cust_id)
)
PARTITIONSET BY LIST (geo)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO (
  PARTITIONSET AMERICA VALUES ('AMERICA') TABLESPACE SET tbs1,
  PARTITIONSET ASIA VALUES ('ASIA') TABLESPACE SET tbs2
);
Linear Scalability
Primary Shards | Standby Shards | Read/Write Tx per Second | Read-Only Tx per Second
25 | 25 | 1.18 Million | 1.62 Million
50 | 50 | 2.11 Million | 3.26 Million
75 | 75 | 3.57 Million | 5.05 Million
100 | 100 | 4.38 Million | 6.82 Million
Source: https://www.oracle.com/a/tech/docs/sharding-wp-12c.pdf
45
48. @arafkarsh arafkarsh
MongoDB Replication
Application
(Client App Driver)
Replica Set1
(mongos)
RS 2
(mongos)
RS 3
(mongos)
Secondary Servers
Primary Server
Replication
Replication
Heartbeat
Source: MongoDB Replication https://docs.mongodb.com/manual/replication/
Replication provides redundancy and high availability. It provides fault tolerance: keeping multiple copies of the data on different database servers ensures that the loss of a single database server will not affect the application.
What does the Primary do?
1. Receives all write operations.
What do the Secondaries do?
1. Replicate the primary's oplog.
2. Apply the operations to their data sets so that the secondaries' data sets reflect the primary's data set.
3. Apply these operations asynchronously.
Replica Set Connection Configuration
mongodb://mongodb0.example.com:27017,mongodb1.example.com:27017,mongodb2.example.com:27017/?replicaSet=myRepl
Use a secure connection:
mongodb://myDBReader:D1fficultP%40ssw0rd@mongodb0.example.com:27017
48
49. @arafkarsh arafkarsh
MongoDB Replication: Automatic Failover
Source: MongoDB Replication https://docs.mongodb.com/manual/replication/
If the Primary is not reachable for more than the configured electionTimeoutMillis (default: 10 seconds), one of the Secondaries becomes the Primary after an election process. The most up-to-date Secondary becomes the next Primary. An election should not take more than 12 seconds to elect a Primary.
[Diagram: a Primary and two Secondary servers exchange heartbeats; when the Primary becomes unreachable, the Secondaries hold an election for a new Primary, one of them is promoted, and replication resumes.]
Write operations are blocked until the new Primary is selected.
The Secondaries can serve read operations while the election is in progress, provided they are configured for that.
MongoDB 4.2+ compatible drivers enable retryable writes by default.
MongoDB 4.0 and 3.6-compatible drivers must explicitly enable retryable writes by including retryWrites=true in the connection string.
49
50. @arafkarsh arafkarsh
MongoDB Replication: Arbiter
Application
(Client App Driver)
Replica Set1
(mongos)
RS 2
(mongos)
Arbiter
(mongos)
Secondary Servers
Primary Server
Replication
An Arbiter can be used to save the cost of adding an additional Secondary server. The Arbiter holds no data and takes part only in the election process to select a Primary.
Source: MongoDB Replication https://docs.mongodb.com/manual/replication/
50
51. @arafkarsh arafkarsh
MongoDB Replication: Secondary Reads
Replica Set1
(mongos)
RS 2
(mongos)
RS 3
(mongos)
Secondary Servers
Primary Server
Replication
Replication
Heartbeat
Source: MongoDB Replication https://docs.mongodb.com/manual/core/read-preference/
Asynchronous replication to secondaries means
that reads from secondaries may return data that
does not reflect the state of the data on the
primary.
Multi-document transactions that contain read
operations must use read preference primary. All
operations in a given transaction must route to
the same member.
Write to Primary and Read from Secondary
Application
(Client App Driver)
Read from
the
Secondary
Write
mongo 'mongodb://mongodb0,mongodb1,mongodb2/?replicaSet=rsOmega&readPreference=secondary'
$ >
51
52. @arafkarsh arafkarsh
MongoDB – Deploy Replica Set
mongod --replSet "rsOmega" --bind_ip localhost,<hostname(s)|ip address(es)>
$ >
replication:
replSetName: "rsOmega"
net:
bindIp: localhost,<hostname(s)|ip address(es)>
Config File
mongod --config <path-to-replica-config>
$ >
Use Config file to set the Replica Config to each Mongo Instance
Use Command Line to set Replica details to each Mongo Instance
1
Source: MongoDB Replication https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
52
53. @arafkarsh arafkarsh
MongoDB – Deploy Replica Set
mongo
$ >
Initiate the Replica Set
Connect to Mongo DB
2
> rs.initiate( {
_id : "rsOmega",
members: [
{ _id: 0, host: "mongodb0.host.com:27017" },
{ _id: 1, host: "mongodb1.host.com:27017" },
{ _id: 2, host: "mongodb2.host.com:27017" }
]
})
3
Run rs.initiate() on just one and only one mongod instance for
the replica set.
Source: MongoDB Replication https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
53
54. @arafkarsh arafkarsh
MongoDB – Deploy Replica Set
mongo 'mongodb://mongodb0,mongodb1,mongodb2/?replicaSet=rsOmega'
$ >
> rs.conf()
Show Config
Show the Replica Config
4
Source: MongoDB Replication
https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
> rs.status()
5
Ensure that the replica set has a primary
mongo
$ >
6
Connect to the Replica Set
54
57. @arafkarsh arafkarsh
Distributed Transactions : 2 Phase Commit
2 PC or not 2 PC, Wherefore Art Thou XA?
57
How does 2PC impact scalability?
• Transactions are committed in two phases.
• This involves communicating with every database (XA
Resources) involved to determine if the transaction will commit
in the first phase.
• During the second phase each database is asked to complete
the commit.
• While all of this coordination is going on, locks in all of the data
sources are being held.
• The longer duration locks create the risk of higher contention.
• Additionally, the two phases require more database
processing time than a single phase commit.
• The result is lower overall TPS in the system.
Transaction
Manager
XA Resources
Request to Prepare
Commit
Prepared
Prepare
Phase
Commit
Phase
Done
Source : Pat Helland (Amazon) : Life Beyond Distributed Transactions Distributed Computing : http://dancres.github.io/Pages/
Solution : Resilient System
• Event Based
• Design for failure
• Asynchronous Recovery
• Make all operations idempotent.
• Each DB operation is a 1 PC
58. @arafkarsh arafkarsh
Distributed Tx: SAGA Design Pattern instead of 2PC
58
Long Lived Transactions (LLTs) hold on to DB resources for relatively long periods of time, significantly delaying
the termination of shorter and more common transactions.
Source: SAGAS (1987) Hector Garcia Molina / Kenneth Salem,
Dept. of Computer Science, Princeton University, NJ, USA
T1 T2 Tn
Local Transactions
C1 C2 Cn-1
Compensating Transaction
Divide long–lived, distributed transactions into quick local ones with compensating actions for
recovery.
Travel : Flight Ticket & Hotel Booking Example
BASE (Basic Availability, Soft
State, Eventual Consistency)
Room Reserved
T1
Room Payment
T2
Seat Reserved
T3
Ticket Payment
T4
Cancelled Room Reservation
C1
Cancelled Room Payment
C2
Cancelled Ticket Reservation
C3
59. @arafkarsh arafkarsh
SAGA Design Pattern Features
59
Type 1 - Backward Recovery (Rollback): T1 T2 T3 T4 C3 C2 C1
Examples: Order Processing, Banking Transactions, Ticket Booking
Type 2 - Forward Recovery with Save Points: T1 (sp) T2 (sp) T3 (sp)
Example: Updating individual scores in a Team Game
• To recover from hardware failures, the SAGA state needs to be persistent.
• Save Points are available for both Forward and Backward Recovery.
Source: SAGAS (1987) Hector Garcia Molina / Kenneth Salem, Dept. of Computer Science, Princeton University, NJ, USA
60. @arafkarsh arafkarsh
Handling Invariants – Monolithic to Micro Services
60
In a typical monolithic app, the customer credit-limit info and the order processing are part of the same application. The following is typical pseudo code.
Order Created
T1
Order
Microservice
Credit Reserved
T2
Customer
Microservice
In the microservices world with event sourcing, this is a distributed environment. The order is cancelled if the credit is NOT available. If payment processing fails, then the credit reservation is cancelled.
Payment
Microservice
Payment Processed
T3
Order Cancelled
C1
Credit Cancelled due to
payment failure
C2
Begin Transaction
  If Order Value <= Available Credit
    Process Order
    Process Payments
End Transaction
Monolithic 2 Phase Commit
https://en.wikipedia.org/wiki/Invariant_(computer_science)
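A small, framework-free Python sketch of the same saga (function names are hypothetical and this is not the deck's annotation API): each step is a single-phase local transaction, and when a later step fails, compensating transactions for the completed steps are raised in reverse order.

def create_order(order):     print("T1: order created")
def reserve_credit(order):   print("T2: credit reserved")
def process_payment(order):  raise RuntimeError("payment failed")   # T3 fails
def cancel_credit(order):    print("C2: credit reservation cancelled")
def cancel_order(order):     print("C1: order cancelled")

def place_order_saga(order):
    # Each (transaction, compensation) pair is a local, single-phase commit.
    steps = [(create_order, cancel_order),
             (reserve_credit, cancel_credit),
             (process_payment, None)]
    completed = []
    for action, compensation in steps:
        try:
            action(order)
            completed.append(compensation)
        except Exception:
            # Raise compensating transactions for completed steps, newest first.
            for compensate in reversed([c for c in completed if c]):
                compensate(order)
            return False
    return True

place_order_saga({"value": 100})   # prints T1, T2, then C2 and C1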
61. @arafkarsh arafkarsh 61
Use Case : Restaurant – Forward Recovery
Domain
The example focuses on the concept of a Restaurant, which tracks the visit of an individual or a group to the Restaurant. When people arrive at the Restaurant and take a table, a table is opened. They may then order drinks and food. Drinks are served immediately by the table staff; however, food must be cooked by a chef. Once the chef has prepared the food, it can then be served.
Payment
Billing
Dining
Source: http://cqrs.nu/tutorial/cs/01-design
Soda Cancelled
Table Opened
Juice Ordered
Soda Ordered
Appetizer Ordered
Soup Ordered
Food Ordered
Juice Served
Food Prepared
Food Served
Appetizer Served
Table Closed
Aggregate Root: Dining Order
Billed Order
T1
Payment CC
T2
Payment Cash
T3
T1 (sp) T2 (sp) T3 (sp)
Event Stream
Aggregate Root : Food Bill
The transaction doesn't roll back if one payment method fails; it moves forward to the NEXT one.
sp
Network
Error
C1 sp
62. @arafkarsh arafkarsh
Local SAGA Features
1. Part of the Micro Services
2. Local Transactions and Compensation
Transactions
3. SAGA State is persisted
4. All the Local transactions are based on
Single Phase Commit (1 PC)
5. Developers need to ensure that
appropriate compensating
transactions are Raised in the event of
a failure.
API Examples
@StartSaga(name="HotelBooking")
public void reserveRoom(…) {
}
@EndSaga(name="HotelBooking")
public void payForTickets(…) {
}
@AbortSaga(name="HotelBooking")
public void cancelBooking(…) {
}
@CompensationTx()
public void cancelReservation(…) {
}
62
63. @arafkarsh arafkarsh
SAGA Execution Container
1. SEC is a separate Process
2. Stateless in nature and Saga state is stored in a
messaging system (Kafka is a Good choice).
3. An SEC process failure MUST not affect Saga execution: on restart, the SEC must resume from where the Saga left off.
4. SEC – No Single Point of Failure (Master Slave Model).
5. Distributed SAGA Rules are defined using a DSL.
63
64. @arafkarsh arafkarsh
Use Case : Travel Booking – Distributed Saga (SEC)
Hotel Booking
Car Booking
Flight Booking
Saga
Execution
Container
Start Saga
{Booking Request}
Payment
End
Saga
Start
Saga
Start Hotel
End Hotel
Start Car
End Car
Start Flight
End Flight
Start Payment
End Payment
Saga Log
End Saga
{Booking Confirmed}
The SEC knows the structure of the distributed Saga: for each request, which service needs to be called and what kind of recovery mechanism needs to be followed.
The SEC can parallelize calls to multiple services to improve performance.
Whether to roll back or roll forward depends on the business case.
Source: Distributed Sagas by Caitie McCaffrey, June 6, 2017
64
65. @arafkarsh arafkarsh
Use Case : Travel Booking – Rollback
Hotel Booking
Car Booking
Flight Booking
Saga
Execution
Container
Start Saga
{Booking Request}
Payment
Start
Comp
Saga
End
Comp
Saga
Start Hotel
End Hotel
Start Car
Abort Car
Cancel Hotel
Cancel Flight
Saga Log
End Saga
{Booking Cancelled}
Kafka is a good choice for implementing the SEC log.
The SEC is completely stateless; a master-slave model can be implemented to avoid a single point of failure.
Source: Distributed Sagas by Caitie McCaffrey, June 6, 2017
65
66. @arafkarsh arafkarsh
Summary: Databases
66
1. DB Sharding / Partitioning
2. 2-Phase Commit: doesn't scale well in a cloud environment.
3. SAGA Design Pattern: raise compensating events when a local transaction fails.
4. SAGA supports Rollbacks & Roll Forwards: a critical pattern for addressing distributed transactions.
67. @arafkarsh arafkarsh
Scalability Best Practices: Lessons from eBay
Best Practices Highlights
#1 Partition By Function
• Decouple the Unrelated Functionalities.
• Selling functionality is served by one set of applications, bidding by another, search by yet another.
• 16,000 App Servers in 220 different pools
• 1000 logical databases, 400 physical hosts
#2 Split Horizontally
• Break the workload into manageable units.
• eBay’s interactions are stateless by design
• All App Servers are treated equal and none retains any transactional state
• Data Partitioning based on specific requirements
#3
Avoid Distributed
Transactions
• 2-Phase Commit is a pessimistic approach that comes with a big COST
• CAP Theorem (Consistency, Availability, Partition Tolerance): you can pick only two at any point in time.
• @ eBay No Distributed Transactions of any kind and NO 2 Phase Commit.
#4
Decouple Functions
Asynchronously
• If component A calls component B synchronously, they are tightly coupled; to scale A, you also need to scale B.
• If asynchronous, A can move forward irrespective of the state of B.
• SEDA (Staged Event Driven Architecture)
#5
Move Processing to
Asynchronous Flow
• Move as much processing towards Asynchronous side
• Anything that can wait should wait
#6 Virtualize at All Levels • Virtualize everything. eBay created their own O/R layer for abstraction
#7 Cache Appropriately • Cache slow-changing, read-mostly data: metadata, configuration and static data.
Source: http://www.infoq.com/articles/ebay-scalability-best-practices
68. @arafkarsh arafkarsh 68
100s Microservices
1,000s Releases / Day
10,000s Virtual Machines
100K+ User actions / Second
81 M Customers Globally
1 B Time series Metrics
10 B Hours of video streaming
every quarter
Source: NetFlix: : https://www.youtube.com/watch?v=UTKIT6STSVM
10s OPs Engineers
0 NOC
0 Data Centers
So what does Netflix think about DevOps?
No DevOps
Don't do a lot of processes / procedures
Freedom for developers, and be accountable
Trust the people you hire
No Controls / Silos / Walls / Fences
Ownership – You Build it, You Run it.
69. @arafkarsh arafkarsh 69
50M Paid Subscribers
100M Active Users
60 Countries
Cross Functional Team
Full, End to End ownership of features
Autonomous
1000+ Microservices
Source: https://microcph.dk/media/1024/conference-microcph-2017.pdf
1000+ Tech Employees
120+ Teams
70. @arafkarsh arafkarsh 70
Design Patterns are solutions to general problems that software developers face during software development.
Design Patterns
74. @arafkarsh arafkarsh
References
1. July 15, 2015 – Agile is Dead : GoTo 2015 By Dave Thomas
2. Apr 7, 2016 - Agile Project Management with Kanban | Eric Brechner | Talks at Google
3. Sep 27, 2017 - Scrum vs Kanban - Two Agile Teams Go Head-to-Head
4. Feb 17, 2019 - Lean vs Agile vs Design Thinking
5. Dec 17, 2020 - Scrum vs Kanban | Differences & Similarities Between Scrum & Kanban
6. Feb 24, 2021 - Agile Methodology Tutorial for Beginners | Jira Tutorial | Agile Methodology Explained.
Agile Methodologies
74
75. @arafkarsh arafkarsh
References
1. Vmware: What is Cloud Architecture?
2. Redhat: What is Cloud Architecture?
3. Cloud Computing Architecture
4. Cloud Adoption Essentials:
5. Google: Hybrid and Multi Cloud
6. IBM: Hybrid Cloud Architecture Intro
7. IBM: Hybrid Cloud Architecture: Part 1
8. IBM: Hybrid Cloud Architecture: Part 2
9. Cloud Computing Basics: IaaS, PaaS, SaaS
75
1. IBM: IaaS Explained
2. IBM: PaaS Explained
3. IBM: SaaS Explained
4. IBM: FaaS Explained
5. IBM: What is Hypervisor?
Cloud Architecture
76. @arafkarsh arafkarsh
References
Microservices
1. Microservices Definition by Martin Fowler
2. When to use Microservices By Martin Fowler
3. GoTo: Sep 3, 2020: When to use Microservices By Martin Fowler
4. GoTo: Feb 26, 2020: Monolith Decomposition Pattern
5. Thought Works: Microservices in a Nutshell
6. Microservices Prerequisites
7. What do you mean by Event Driven?
8. Understanding Event Driven Design Patterns for Microservices
76
77. @arafkarsh arafkarsh
References – Microservices – Videos
77
1. Martin Fowler – Micro Services : https://www.youtube.com/watch?v=2yko4TbC8cI&feature=youtu.be&t=15m53s
2. GOTO 2016 – Microservices at NetFlix Scale: Principles, Tradeoffs & Lessons Learned. By R Meshenberg
3. Mastering Chaos – A NetFlix Guide to Microservices. By Josh Evans
4. GOTO 2015 – Challenges Implementing Micro Services By Fred George
5. GOTO 2016 – From Monolith to Microservices at Zalando. By Rodrigue Scaefer
6. GOTO 2015 – Microservices @ Spotify. By Kevin Goldsmith
7. Modelling Microservices @ Spotify : https://www.youtube.com/watch?v=7XDA044tl8k
8. GOTO 2015 – DDD & Microservices: At last, Some Boundaries By Eric Evans
9. GOTO 2016 – What I wish I had known before Scaling Uber to 1000 Services. By Matt Ranney
10. DDD Europe – Tackling Complexity in the Heart of Software By Eric Evans, April 11, 2016
11. AWS re:Invent 2016 – From Monolithic to Microservices: Evolving Architecture Patterns. By Emerson L, Gilt D. Chiles
12. AWS 2017 – An overview of designing Microservices based Applications on AWS. By Peter Dalbhanjan
13. GOTO Jun, 2017 – Effective Microservices in a Data Centric World. By Randy Shoup.
14. GOTO July, 2017 – The Seven (more) Deadly Sins of Microservices. By Daniel Bryant
15. Sept, 2017 – Airbnb, From Monolith to Microservices: How to scale your Architecture. By Melanie Cubula
16. GOTO Sept, 2017 – Rethinking Microservices with Stateful Streams. By Ben Stopford.
17. GOTO 2017 – Microservices without Servers. By Glynn Bird.
78. @arafkarsh arafkarsh
References
78
Domain Driven Design
1. Oct 27, 2012 What I have learned about DDD Since the book. By Eric Evans
2. Mar 19, 2013 Domain Driven Design By Eric Evans
3. Jun 02, 2015 Applied DDD in Java EE 7 and Open Source World
4. Aug 23, 2016 Domain Driven Design the Good Parts By Jimmy Bogard
5. Sep 22, 2016 GOTO 2015 – DDD & REST Domain Driven API’s for the Web. By Oliver Gierke
6. Jan 24, 2017 Spring Developer – Developing Micro Services with Aggregates. By Chris Richardson
7. May 17. 2017 DEVOXX – The Art of Discovering Bounded Contexts. By Nick Tune
8. Dec 21, 2019 What is DDD - Eric Evans - DDD Europe 2019. By Eric Evans
9. Oct 2, 2020 - Bounded Contexts - Eric Evans - DDD Europe 2020. By. Eric Evans
10. Oct 2, 2020 - DDD By Example - Paul Rayner - DDD Europe 2020. By Paul Rayner
79. @arafkarsh arafkarsh
References
Event Sourcing and CQRS
1. IBM: Event Driven Architecture – Mar 21, 2021
2. Martin Fowler: Event Driven Architecture – GOTO 2017
3. Greg Young: A Decade of DDD, Event Sourcing & CQRS – April 11, 2016
4. Nov 13, 2014 GOTO 2014 – Event Sourcing. By Greg Young
5. Mar 22, 2016 Building Micro Services with Event Sourcing and CQRS
6. Apr 15, 2016 YOW! Nights – Event Sourcing. By Martin Fowler
7. May 08, 2017 When Micro Services Meet Event Sourcing. By Vinicius Gomes
79
80. @arafkarsh arafkarsh
References
80
Kafka
1. Understanding Kafka
2. Understanding RabbitMQ
3. IBM: Apache Kafka – Sept 18, 2020
4. Confluent: Apache Kafka Fundamentals – April 25, 2020
5. Confluent: How Kafka Works – Aug 25, 2020
6. Confluent: How to integrate Kafka into your environment – Aug 25, 2020
7. Kafka Streams – Sept 4, 2021
8. Kafka: Processing Streaming Data with KSQL – Jul 16, 2018
9. Kafka: Processing Streaming Data with KSQL – Nov 28, 2019
81. @arafkarsh arafkarsh
References
Databases: Big Data / Cloud Databases
1. Google: How to Choose the right database?
2. AWS: Choosing the right Database
3. IBM: NoSQL Vs. SQL
4. A Guide to NoSQL Databases
5. How does NoSQL Databases Work?
6. What is Better? SQL or NoSQL?
7. What is DBaaS?
8. NoSQL Concepts
9. Key Value Databases
10. Document Databases
11. Jun 29, 2012 – Google I/O 2012 - SQL vs NoSQL: Battle of the Backends
12. Feb 19, 2013 - Introduction to NoSQL • Martin Fowler • GOTO 2012
13. Jul 25, 2018 - SQL vs NoSQL or MySQL vs MongoDB
14. Oct 30, 2020 - Column vs Row Oriented Databases Explained
15. Dec 9, 2020 - How do NoSQL databases work? Simply Explained!
1. Graph Databases
2. Column Databases
3. Row Vs. Column Oriented Databases
4. Database Indexing Explained
5. MongoDB Indexing
6. AWS: DynamoDB Global Indexing
7. AWS: DynamoDB Local Indexing
8. Google Cloud Spanner
9. AWS: DynamoDB Design Patterns
10. Cloud Provider Database Comparisons
11. CockroachDB: When to use a Cloud DB?
81
82. @arafkarsh arafkarsh
References
Docker / Kubernetes / Istio
1. IBM: Virtual Machines and Containers
2. IBM: What is a Hypervisor?
3. IBM: Docker Vs. Kubernetes
4. IBM: Containerization Explained
5. IBM: Kubernetes Explained
6. IBM: Kubernetes Ingress in 5 Minutes
7. Microsoft: How Service Mesh works in Kubernetes
8. IBM: Istio Service Mesh Explained
9. IBM: Kubernetes and OpenShift
10. IBM: Kubernetes Operators
11. 10 Consideration for Kubernetes Deployments
Istio – Metrics
1. Istio – Metrics
2. Monitoring Istio Mesh with Grafana
3. Visualize your Istio Service Mesh
4. Security and Monitoring with Istio
5. Observing Services using Prometheus, Grafana, Kiali
6. Istio Cookbook: Kiali Recipe
7. Kubernetes: Open Telemetry
8. Open Telemetry
9. How Prometheus works
10. IBM: Observability vs. Monitoring
82
83. @arafkarsh arafkarsh
References
83
1. Feb 6, 2020 – An introduction to TDD
2. Aug 14, 2019 – Component Software Testing
3. May 30, 2020 – What is Component Testing?
4. Apr 23, 2013 – Component Test By Martin Fowler
5. Jan 12, 2011 – Contract Testing By Martin Fowler
6. Jan 16, 2018 – Integration Testing By Martin Fowler
7. Testing Strategies in Microservices Architecture
8. Practical Test Pyramid By Ham Vocke
Testing – TDD / BDD
84. @arafkarsh arafkarsh 84
1. Simoorg: LinkedIn's own failure-inducer framework. It was designed to be easy to extend, and most of the important components are pluggable.
2. Pumba : A chaos testing and network emulation tool for Docker.
3. Chaos Lemur : Self-hostable application to randomly destroy virtual machines in a BOSH-
managed environment, as an aid to resilience testing of high-availability systems.
4. Chaos Lambda : Randomly terminate AWS ASG instances during business hours.
5. Blockade : Docker-based utility for testing network failures and partitions in distributed
applications.
6. Chaos-http-proxy : Introduces failures into HTTP requests via a proxy server.
7. Monkey-ops : Monkey-Ops is a simple service implemented in Go, which is deployed into an
OpenShift V3.X and generates some chaos within it. Monkey-Ops seeks some OpenShift
components like Pods or Deployment Configs and randomly terminates them.
8. Chaos Dingo : Chaos Dingo currently supports performing operations on Azure VMs and VMSS
deployed to an Azure Resource Manager-based resource group.
9. Tugbot : Testing in Production (TiP) framework for Docker.
Testing tools
85. @arafkarsh arafkarsh
References
CI / CD
1. What is Continuous Integration?
2. What is Continuous Delivery?
3. CI / CD Pipeline
4. What is CI / CD Pipeline?
5. CI / CD Explained
6. CI / CD Pipeline using Java Example Part 1
7. CI / CD Pipeline using Ansible Part 2
8. Declarative Pipeline vs Scripted Pipeline
9. Complete Jenkins Pipeline Tutorial
10. Common Pipeline Mistakes
11. CI / CD for a Docker Application
85
86. @arafkarsh arafkarsh
References
86
DevOps
1. IBM: What is DevOps?
2. IBM: Cloud Native DevOps Explained
3. IBM: Application Transformation
4. IBM: Virtualization Explained
5. What is DevOps? Easy Way
6. DevOps?! How to become a DevOps Engineer???
7. Amazon: https://www.youtube.com/watch?v=mBU3AJ3j1rg
8. NetFlix: https://www.youtube.com/watch?v=UTKIT6STSVM
9. DevOps and SRE: https://www.youtube.com/watch?v=uTEL8Ff1Zvk
10. SLI, SLO, SLA : https://www.youtube.com/watch?v=tEylFyxbDLE
11. DevOps and SRE : Risks and Budgets : https://www.youtube.com/watch?v=y2ILKr8kCJU
12. SRE @ Google: https://www.youtube.com/watch?v=d2wn_E1jxn4
87. @arafkarsh arafkarsh
References
87
1. Lewis, James, and Martin Fowler. “Microservices: A Definition of This New Architectural Term”, March 25, 2014.
2. Miller, Matt. "Innovate or Die: The Rise of Microservices". The Wall Street Journal, October 5, 2015.
3. Newman, Sam. Building Microservices. O’Reilly Media, 2015.
4. Alagarasan, Vijay. “Seven Microservices Anti-patterns”, August 24, 2015.
5. Cockcroft, Adrian. “State of the Art in Microservices”, December 4, 2014.
6. Fowler, Martin. “Microservice Prerequisites”, August 28, 2014.
7. Fowler, Martin. “Microservice Tradeoffs”, July 1, 2015.
8. Humble, Jez. “Four Principles of Low-Risk Software Release”, February 16, 2012.
9. Zuul Edge Server, Ketan Gote, May 22, 2017
10. Ribbon, Hysterix using Spring Feign, Ketan Gote, May 22, 2017
11. Eureka Server with Spring Cloud, Ketan Gote, May 22, 2017
12. Apache Kafka, A Distributed Streaming Platform, Ketan Gote, May 20, 2017
13. Functional Reactive Programming, Araf Karsh Hamid, August 7, 2016
14. Enterprise Software Architectures, Araf Karsh Hamid, July 30, 2016
15. Docker and Linux Containers, Araf Karsh Hamid, April 28, 2015
88. @arafkarsh arafkarsh
References
88
16. MSDN – Microsoft https://msdn.microsoft.com/en-us/library/dn568103.aspx
17. Martin Fowler : CQRS – http://martinfowler.com/bliki/CQRS.html
18. Udi Dahan : CQRS – http://www.udidahan.com/2009/12/09/clarified-cqrs/
19. Greg Young : CQRS - https://www.youtube.com/watch?v=JHGkaShoyNs
20. Bertrand Meyer – CQS - http://en.wikipedia.org/wiki/Bertrand_Meyer
21. CQS : http://en.wikipedia.org/wiki/Command–query_separation
22. CAP Theorem : http://en.wikipedia.org/wiki/CAP_theorem
23. CAP Theorem : http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
24. CAP 12 years how the rules have changed
25. EBay Scalability Best Practices : http://www.infoq.com/articles/ebay-scalability-best-practices
26. Pat Helland (Amazon) : Life beyond distributed transactions
27. Stanford University: Rx https://www.youtube.com/watch?v=y9xudo3C1Cw
28. Princeton University: SAGAS (1987) Hector Garcia Molina / Kenneth Salem
29. Rx Observable : https://dzone.com/articles/using-rx-java-observable
Editor's Notes
Distributed Transactions and Multi-Document Transactions
Starting in MongoDB 4.2, the two terms are synonymous. Distributed transactions refer to multi-document transactions on sharded clusters and replica sets. Multi-document transactions (whether on sharded clusters or replica sets) are also known as distributed transactions starting in MongoDB 4.2.
http://microservices.io/articles/scalecube.html
Sharding in the context of databases is the process of splitting very large databases into smaller parts, or shards. As experience can tell us, some statements that we issue to our database can take a considerable amount of time to execute. During these statements' execution, the database becomes locked and unavailable for the application. This means that we are introducing a period of downtime to our users.
Pat Helland (Amazon) : Life beyond distributed transactions… http://adrianmarriott.net/logosroot/papers/LifeBeyondTxns.pdf
In computer science, an invariant is a condition that can be relied upon to be true during execution of a program, or during some portion of it. It is a logical assertion that is held to always be true during a certain phase of execution