dotCloud was a PaaS provider that built Docker to automate the deployment of applications in containers.
Docker containers use an execution environment called libcontainer, which is an interface to various Linux kernel isolation features, like namespaces and cgroups. Docker gives you this level of abstraction.
Namespaces and cgroups are the two main kernel technologies underpinning the recent trend toward software containerization that Docker rides on. To put it simply, cgroups are a metering and limiting mechanism: they control how much of a system resource (CPU, memory) a process can use. Namespaces, on the other hand, limit what a process can see. Thanks to namespaces, processes have their own view of the system’s resources.
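The division of labor between the two can be seen on any Linux host, no Docker required. A minimal sketch (the /proc paths are standard on modern Linux kernels):

```shell
# Every Linux process already lives inside a set of namespaces and cgroups.
# Inspect them for the current shell:
ls -l /proc/self/ns/     # one entry per namespace type: pid, net, mnt, uts, ipc, ...
cat /proc/self/cgroup    # the cgroup hierarchies metering this process
```

A container is essentially a process whose namespace links differ from the host's and whose cgroup limits are set by the container runtime.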
This architecture allows multiple containers to run in complete isolation from one another while sharing the same Linux kernel. Because a Docker container instance doesn’t require a dedicated OS, it is much more portable and lightweight than a virtual machine.
I would like to spend a few minutes discussing what Docker is (most of you will have at least heard of it) and why it is important.
An image is the build component of a container. It is a read-only template from which one or more container instances can be launched. Conceptually, it’s similar to an AMI.
Registries are used to store images. Registries can be local or remote. When we launch a container, Docker first searches the local registry for the image. If it’s not found locally, Docker then searches a public remote registry called Docker Hub.
Finally, a container is a running instance of an image. Docker uses containers to run the software packaged in the image.
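The image/registry/container relationship can be sketched with a few CLI commands. This assumes a running Docker daemon; busybox is just a small public image chosen for illustration:

```shell
docker pull busybox            # fetch the image (local registry first, then Docker Hub)
docker images                  # list the read-only image templates stored locally
docker run busybox echo hello  # launch a container instance from the image
docker ps -a                   # list container instances, running and exited
```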
Here is an example Dockerfile, which contains all of the instructions for building a Docker image. Take the time to get this right from the beginning.
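No Dockerfile is reproduced here, so the sketch below is purely illustrative: the base image, package, file names, and port are all assumptions, not part of the original.

```dockerfile
FROM ubuntu:14.04                  # start from a base image (hypothetical choice)
RUN apt-get update && \
    apt-get install -y python      # bake dependencies into the image
COPY app.py /opt/app/app.py        # add the application code (illustrative path)
EXPOSE 8080                        # document the port the app listens on
CMD ["python", "/opt/app/app.py"]  # default command when a container starts
```

Each instruction creates a read-only layer; `docker build -t myapp .` turns the file into an image.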
Developers can add new application features more quickly by taking advantage of automated building, testing, integration, and packaging - at the speed of containers.
Idle containers take up virtually no CPU, memory, or I/O resources.
You can move workload between private and public clouds more quickly. Instead of moving gigabytes between clouds, you can move megabytes.
Containerized applications can boot and restart in seconds, compared to minutes for virtual machines.
Instead of building one application (monolithic architecture), developers build a suite of components, called microservices, which come together over the network. Each component is written in the best programming language for the task, and each component can be deployed and scaled independently of one another.
At the core of the application is the business logic, which is implemented by modules that define services, domain objects, and events. Surrounding the core are adapters that interface with the external world. Examples of adapters include database access components, messaging components that produce and consume messages, and web components that either expose APIs or implement a UI.
Despite having a logically modular architecture, the application is packaged and deployed as a monolith.
Many organizations, such as eBay and Netflix, have adopted the Microservices architecture pattern. Instead of building a single, monolithic application, the idea is to split your application into a set of smaller, interconnected services.
Each microservice is a mini-application that has its own architecture consisting of business logic along with various adapters. Some microservices would expose an API that’s consumed by other microservices or by the application’s clients. Other microservices might implement a web UI. At runtime, each instance is often a cloud VM or a Docker container.
Looking at the evolution of deployment and applications: from 1 day, to 15 minutes, to 10 seconds. Only one host OS to manage. Small learning curve.
Rise of the container between 2013 and 2015, spearheaded by Docker.
A typical DSE node runs the following processes on a single instance within the cluster:
A single core DSE JVM – including Apache Cassandra, integrated DSE Search, and Spark Master (for HA)
One or more Spark executor processes
A single Spark Worker process
Multiple processes for the integrated Hadoop stack
Multiple processes which may be started in an adhoc manner (e.g. Spark Job server, SparkSQL CLI, etc.)
A single OpsCenter agent responsible for monitoring all processes on that DSE instance
Container 2 - All the JVMs running on a single DSE node (uniformly deployed across each machine within the cluster)
The OpsCenter daemon is (logically) separate from the cluster, and there is usually one instance for the entire deployment.
To provide cluster specific configuration, the following environment variables should be provided via the Docker run command:
a. CLUSTER_NAME: the name of the cluster to create/connect to
b. SEEDS: the comma-separated list of seed IP addresses,
e.g. SEEDS=127.0.0.2,127.0.0.3
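Putting the two variables together, a container launch might look like the following (the image name `dse` is a placeholder, not an official image name):

```shell
docker run -d \
  -e CLUSTER_NAME=my_cluster \
  -e SEEDS=127.0.0.2,127.0.0.3 \
  dse
```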
DSE uses mlockall to prevent swapping and page faults, which can fail inside a container unless the locked-memory limit is raised. The simplest workaround is to add -XX:+AlwaysPreTouch to the JVM arguments and disable swap on the host OS.
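As a sketch, the two workarounds look like this (JVM_OPTS follows the convention used by cassandra-env.sh; adapt to your own startup scripts):

```shell
# On the host OS: turn off swap entirely.
sudo swapoff --all
# In the JVM arguments: touch every heap page at startup instead of
# relying on mlockall succeeding inside the container.
JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
```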
All containers inherit ulimits from the Docker daemon by default. DSE containers should have them set to unlimited or reasonably high values (e.g. max locked memory and max memory size).
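Rather than raising the daemon-wide defaults, limits can also be set per container with docker run's --ulimit flag. A hedged example (the `dse` image name and the nofile value are illustrative):

```shell
# memlock=-1:-1 means unlimited locked memory (soft:hard limits).
docker run --ulimit memlock=-1:-1 --ulimit nofile=100000:100000 dse
```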
Docker’s default networking (via a Linux bridge) is not recommended for production use, as it slows networking considerably, by up to 50%. Development and testing benefit from running DSE clusters on a single Docker host, and for such scenarios the default networking is just fine.
Instead, use host networking (docker run --net=host) or a plugin that can manage IP ranges across clusters of hosts. Host networking limits the number of DSE nodes per Docker host to one, but it is the recommended configuration for production. Using Docker doesn’t mean putting it all on one host: think about the disks!
Use pipework or Weave if consistent IP address allocation is needed.
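The recommended production setup is therefore host networking, one DSE node per host (again, `dse` is a placeholder image name):

```shell
docker run -d --net=host dse   # container shares the host's network stack and IP
```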
Data volumes are required for the commitlog, saved_caches, and data directories (everything in /var/lib/cassandra). The data volume must use a supported file system (usually xfs or ext4).
A data volume is a specially-designated directory within one or more containers that bypasses the Union filesystem.
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization.
Data volumes can be shared and reused among containers.
Changes to a data volume are made directly.
Changes to a data volume will not be included when you update an image.
Data volumes persist even if the container itself is deleted.
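A minimal sketch of the volume setup, assuming a host directory /mnt/cassandra on xfs or ext4 (the paths and the `dse` image name are illustrative):

```shell
# Bind-mount the host directory over /var/lib/cassandra so the commitlog,
# saved_caches, and data directories bypass the Union filesystem and
# survive container deletion.
docker run -d -v /mnt/cassandra:/var/lib/cassandra dse
```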
All of this works great for test/dev/prod environments.
Deploying DSE within Docker isn’t trivial, but with adequate guidance and pre-production validation, it’s not that difficult. As the container ecosystem evolves, it is expected that future DSE releases will have additional guidelines to make the most of DSE installations under Docker. Some future areas that DataStax is investigating are:
Further splitting of DSE processes into separate containers (e.g. running the Spark executors and the DSE core JVM within a single container, and all other DSE processes within separate containers)
Integration of container based deployment with workload management infrastructure components such as Kubernetes, Mesos, etc.
Enabling the deployment model on a variety of public and private clouds
Using volumes for data storage is a must for durability and performance.
Avoid bridge/NAT networking and run containers with --net=host. This is the simplest way to connect to the outside world and guarantees a stable IP address for the guest. Host networking also has the lowest overhead performance-wise, so your cluster should perform nearly as well as it does on bare metal.
DataStax acknowledges that containers have rapidly become one of the building blocks of modern infrastructure; the guidelines and examples above should reduce the amount of time required to run DSE in Docker.