Learning is an analytic process of exploring the past in order to predict the future. Hence, being able to travel back in time to create features is critical for machine learning projects to be successful. To enable this, we built a time machine that computes features for any arbitrary time in the recent past for offline experimentation. We also built a real-time stream processing system to capture the interests of members during different times of the day and to quickly adapt to changes in the collective interests of members as they happen during real-world events.
Building the time machine for offline experimentation and the real-time infrastructure for online recommendations with Apache Spark (Streaming) and Apache Cassandra empowered us to both scale up the data size by an order of magnitude and train and validate the models in less time. We will delve into the architecture and use case details, cover the data models used for Cassandra, and share our learnings.
About the Speakers
Prasanna Padmanabhan Engineering Manager, Netflix
Prasanna leads the Data Systems for Personalization team at Netflix. His primary focus is on building various big data infrastructure components that help their algorithmic engineers innovate faster and improve personalization for Netflix members. In the past, he has built distributed data systems that leverage both batch and stream processing.
Roopa Tangirala Engineering Manager, Netflix
Roopa Tangirala is an experienced engineering leader with an extensive background in databases, be they distributed or relational. She manages the database engineering team at Netflix, responsible for operating the cloud persistent and semi-persistent runtime stores for Netflix, which include Cassandra, Elasticsearch, Dynomite, and MySQL databases, ensuring data availability, durability, and scalability to meet the growing business needs.
5. Ranking
Everything is a Recommendation
Over 80% of what members watch comes from our recommendations. Recommendations are driven by Machine Learning Algorithms.
6. Data Driven
[Flow diagram] Offline Experiment using Historical Data → (Success) → Online A/B Testing → (Success) → Rollout Feature to ALL Members; (Fail → start over)
Algorithmic Page Generation
Trending Now
9. Algorithmic Page Generation
Without Algorithmic Page Generation vs. With Algorithmic Page Generation
Drawbacks of the rule-based approach: it ignores the diversity of the page and members' affinity for specific rows
16. Algorithmic Page Generation
Production, Variant 1, Variant 2: evaluate the best variant based on the actual plays
17. Offline Experiment Architecture
[Architecture diagram] Components: Member Selection (runs once a day); Ratings Service, Viewing History Service, and MyList Service → Snapshot → S3 Snapshot Store; Snapshot Forklift → Data Snapshots; Generate Pages → Evaluate Metrics → A/B Test
18. Data Model - Requirements
• Need for historical service data
• Optimize for Batch Writes and Point Reads
28. Trending Now - Data Infrastructure
[Architecture diagram] Components: Impression Service (captures videos shown in the view port); Viewing History Service (captures videos played by members); UI and Online Services; Compute Trends → Trends Store; Model Training (inputs: trends, Viewing History, Ratings) → Publish Models
29. State Management in Cassandra
Video                      Number of Plays
Stranger Things            100
Narcos                     200
Orange Is the New Black    300
30. State Management in Cassandra
[Flow diagram] Read Events → State Present? (Yes → Load State; No → Init State from Cassandra) → Compute Trends → Update State → Trends Store
31. Data Model - Requirements
• Trending data is for a specific interval of time
• Optimize for Batch Writes and Batch Reads
Good afternoon everyone. My name is Prasanna. I lead the Data Systems for Personalization team at Netflix. Our team builds the Machine Learning infrastructure that powers Netflix recommendations and I have Roopa with me who leads the Cloud Database Engineering team at Netflix. Today, we are going to talk about a few use cases where we use Spark + Cassandra in our data pipelines and share some of the learnings from it.
At Netflix, we aspire to a day when our members can turn on Netflix and the absolute best content for them has already started playing. While we know we are far from realizing this dream, it sets a vision for us to improve the recommendations that span our service. So, where do we use recommendations in our service?
Our journey of building recommendation systems started with predicting the rating our members would give a video and, based on that, recommending appropriate videos.
That later evolved into creating meaningful grouping of videos and being able to personalize the videos within each group.
Today, we have a multitude of algorithms for doing recommendations. Not only are the videos within a row personalized for you, but the rows themselves are personalized for you. Over 80% of what our members watch comes from the videos that are recommended to them, which are driven by machine learning algorithms. So how do we improve these algorithms to realize our grand vision?
Just like everything else at Netflix, we follow a data-driven approach to improving our recommendations. Once we have an idea, we run an offline experiment using historical data to see if the new idea would have made better recommendations. If it did, we deploy it to an online A/B test to see if it performs well in production too.
We look at various metrics such as viewing hours, member retention, and member satisfaction to evaluate the success of an A/B test. If the A/B test is a success, we roll out that feature to ALL members. If not, we go back to the whiteboard, come up with a better idea, and start the offline experiment over.
For the rest of my talk, I'm going to take one use case of offline experimentation and one use case of an online A/B test and walk through how we use Spark and Cassandra to help improve recommendations.
As we saw earlier, Offline Experiment is a step prior to doing an A/B test. It helps us decide if an idea is even worth doing an A/B test.
Let's take the use case of algorithmic page generation for offline experimentation. How can we personalize the ordering of rows on the homepage for each member?
We initially had a rule-based approach to page generation. For example, the rules could specify that the 1st row be Continue Watching, the 2nd row be Top Picks, and so on. The drawback of this approach is that it takes into account neither the diversity of the page nor our members' affinity for specific rows.
Algorithmic page generation addresses these issues by personalizing the rows and their ordering on the home page based on our members' viewing patterns, the diversity of the page, and many more attributes.
Let’s take an example to see how we evaluate different pages algorithmically. Say this is a page that a member sees based on the current Production algorithm.
Variant 1 is a new page that was generated with a new algorithm
And Variant 2 is another page that was generated with yet another new algorithm. We first look at some basic things, like the row distribution (for example, how many members see a Continue Watching row) and the TV/movie ratio (does one variant over-index on, say, TV shows?).
More importantly, we look at the actual videos that were played by the member and find the best page that could have made those videos easily discoverable. In this case, say the member played Hot Rod, The Short Game and Family Guy.
We can see that Hot Rod was recommended in all three versions of the page, but Production and Variant 2 recommended it much higher on the page.
Similarly, The Short Game was also recommended in all three versions of the page, but Variant 2 again recommended it much higher on the page.
We also look at negative samples. Family Guy was a video that was played but not recommended in any version of the page (probably our members searched for it). We typically consider this a failure in our recommendations. Given this data, we would choose Variant 2 as the winning page algorithm, as it surfaces the videos our members played much higher on the page.
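The rank-based comparison just described can be sketched in a few lines. This is a toy illustration: the page layouts, video lists, and scoring rule are made up for the example, not Netflix's actual evaluation metric.

```python
# Toy sketch of offline page evaluation: reward pages that surface the
# videos a member actually played as high up the page as possible.
# Page contents and the scoring rule are illustrative assumptions.

def page_score(page, played, miss_penalty=100):
    """Lower is better: sum of the ranks at which played videos appear.
    A played video missing from the page costs a fixed penalty."""
    rank = {video: i for i, video in enumerate(page, start=1)}
    return sum(rank.get(video, miss_penalty) for video in played)

production = ["Narcos", "Hot Rod", "Bojack", "The Short Game"]
variant_1 = ["Bojack", "Narcos", "The Short Game", "Hot Rod"]
variant_2 = ["Hot Rod", "The Short Game", "Narcos", "Bojack"]

# Family Guy was played but is not on any page: a miss for all variants.
played = ["Hot Rod", "The Short Game", "Family Guy"]

scores = {
    "production": page_score(production, played),
    "variant_1": page_score(variant_1, played),
    "variant_2": page_score(variant_2, played),
}
best = min(scores, key=scores.get)
```

With these toy layouts, Variant 2 wins because it places both played videos at the top of the page while all variants pay the same penalty for the missing Family Guy.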
Now let's look at the offline experimentation architecture that made this possible. The most critical requirement for building an offline experiment is the ability to travel back in time and generate the page our members would have seen if they had used our service at a given time in the past. We built this ability to time travel by snapshotting data from our various online services and using that snapshot data to generate the experimental page.
The first step in building the snapshot infrastructure is to select the set of members for whom we need to snapshot data. Snapshotting data for all our members would be an expensive operation. Instead, we select a stratified set of members based on tenure, viewing patterns, the devices they use, etc.
Once we have the set of members, the next step is to snapshot the data of various online services, such as Ratings, Viewing History, and MyList, that help improve personalization. As you folks might be aware, Netflix embraces a fine-grained service-oriented architecture for our cloud-based deployment model.
The snapshot data is then stored in S3 in nested Parquet format for both space and time efficiency. Many of our offline experiments run inside Spark and can directly consume the snapshot data from S3. However, for algorithmic page generation, we need to consume this snapshot data one member at a time. This is because we reuse our existing online systems, which generate the page for a live user request, to also generate the experimental page given the snapshot data. S3 is not suited for random seeks into the data stored in it.
We know Cassandra is well suited for this use case. To that end, we used Spark to read the snapshot data from S3 and write it into Cassandra. We used the Spark Cassandra connector, which took care of the nitty-gritty details of connecting to all the Cassandra nodes in the ring, maintaining the connection pool, doing retries, and optimizing the reads and writes to Cassandra.
Once the data is available in Cassandra, we can get the state of the Netflix data services for any given member and timestamp in the past. We can then generate the experimental pages for this member based on the new algorithm, evaluate the metrics needed to see which of the page algorithms could have made better recommendations, and, if there is a clear winner, deploy it to an A/B test.
Before we look into the data models we used in Cassandra, there were two requirements we needed to address when building the data model:
• The need for storing historical data from various data services such as Ratings and Viewing History; this is the core of building the time machine.
• Optimize the data model for the batch writes that happen from Spark and for point reads from the online systems during page generation.
Here is the data model that we used for storing our members' MyList data for the offline experiment. So yes, the obvious thing is to have different column families for different data services.
Date and MemberId concatenated together formed the row key. The column name was a static string, and its value was a blob of the MyList data for that member. With this data model, a query for a member's MyList data at a given timestamp in the past translates to a point read, which is very efficient in Cassandra.
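A minimal sketch of this key scheme, with a plain dict standing in for the Cassandra column family (all names and data shapes are illustrative, not the production schema):

```python
# Sketch of the MyList snapshot data model: (date, member_id) forms the
# row key, and the value is an opaque blob of that member's MyList data.
# An in-memory dict stands in for the Cassandra column family.

mylist_cf = {}

def row_key(date, member_id):
    # Date and MemberId concatenated together form the row key.
    return f"{date}:{member_id}"

def write_snapshot(date, member_id, blob):
    mylist_cf[row_key(date, member_id)] = blob

def point_read(date, member_id):
    # Equivalent of a single-partition point query in Cassandra.
    return mylist_cf.get(row_key(date, member_id))

write_snapshot("2017-06-05", 42, b"[Narcos, Bojack]")
```

The point is that a time-travel lookup (member + date) maps to exactly one key, so the online page-generation path pays only a single point read.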
However, a similar data model would not work for storing Viewing History. This is because the viewing history data for a member could be very big and would become a wide row, causing heap pressure, which in turn would affect latencies.
To avoid the issues of having a wide row, we divided the rows into a predefined set of shards. In this case, Date, MemberId, and the shard index become the row key, and the viewing data blob is the column value.
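The sharding scheme can be sketched like this; the shard count and the hash choice are illustrative assumptions, not the production values:

```python
# Sketch of sharding a wide Viewing History row across a fixed number of
# shards: (date, member_id, shard_index) forms the row key, so no single
# Cassandra partition holds a member's entire viewing history.
NUM_SHARDS = 10  # illustrative; the real shard count is a tuning choice

def shard_key(date, member_id, view_event_id):
    # A real system would use a stable hash; Python's hash() of an int
    # is deterministic, which is enough for this sketch.
    shard = hash(view_event_id) % NUM_SHARDS
    return (date, member_id, shard)

# One member's events spread over at most NUM_SHARDS row keys.
keys = {shard_key("2017-06-05", 42, event_id) for event_id in range(1000)}
```

Bounding each partition to a slice of the history keeps row sizes predictable, at the cost of fanning a full-history read out to NUM_SHARDS point reads.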
Now let's focus on the next use case: how we used Spark (Spark Streaming, to be precise) + Cassandra for an online A/B test.
The Trending Now row captures the videos that are trending, personalized for you.
Here is a screenshot of my Trending Now row at 7 pm on a Monday, when my daughter takes control of our remote.
Here is a screenshot of my Trending Now row at 10 pm on a Saturday. It's ME time, and the ACTION finally begins.
Oh yeah, Pokemon’s impact was seen on Netflix too
The key to building a Trending Now row is a fast feedback loop. Netflix is supported on thousands of devices, each sending various types of data that help improve personalization, such as a play event, or the fact that a video was recommended to a member and not played, and so on. We built various data systems that capture this data in real time. Once we have the required data for personalization, we built several Spark Streaming applications that read it and compute the trends data, all in real time. The trends data is fed as input to our recommender systems, which then look at a member's taste and personalize the Trending Now row.
Let's dive a little deeper into the architecture. We capture all user interactions within our service. For example, the videos that are recommended to our members and shown in their view port are captured by the Impression Service. Similarly, the videos that are played by our members are captured by the Viewing History Service. Both these data services send those events into Kafka.
Spark Streaming jobs consume these events from Kafka and compute the Trends data required for building the Trending Now row. This trends data is persisted into Cassandra. Again, we use the Spark Cassandra connector in the Spark Streaming job to batch write all the Trends data into Cassandra. The one thing that we had to configure in the connector was the connection timeout, which is different for a Streaming job.
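A toy, non-Spark simulation of what one micro-batch of that computation does. Event shapes and field names are made up for illustration; the real job consumes Kafka via Spark Streaming and batch-writes the result to Cassandra through the connector.

```python
# Toy simulation of the trend computation for a single micro-batch:
# aggregate play and impression events into per-video counts, the shape
# of the record that would be batch-written to the Trends store.
from collections import Counter

def compute_trends(micro_batch):
    plays = Counter(e["video"] for e in micro_batch if e["type"] == "play")
    impressions = Counter(
        e["video"] for e in micro_batch if e["type"] == "impression")
    return {v: {"plays": plays[v], "impressions": impressions[v]}
            for v in plays.keys() | impressions.keys()}

batch = [
    {"type": "play", "video": "Narcos"},
    {"type": "impression", "video": "Narcos"},
    {"type": "impression", "video": "Bojack"},
]
trends = compute_trends(batch)
```

Keeping impressions alongside plays is what lets the downstream model reason about take rate (plays per impression) rather than raw play counts alone.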
The trends data is then combined with data from services such as Viewing History and Ratings and fed as input to the model training job, whose output is a model consumed by online services to produce the personalized Trending Now recommendations for the next time interval.
We also use Cassandra for managing the state of our Spark Streaming jobs. So what is state in Spark Streaming? Think of it as a simple key-value pair that gets updated continuously as events happen in real time. Let's say we simply want to count the number of times a video is played. In that case, the videoId and the count form the state.
Spark provides a way to bootstrap the state when your streaming job is restarted or started for the first time.
For Trending Now, as we read events from Kafka, we first check if the state is present. If it is, we combine the existing state with the new event read from Kafka, perform the computations, and update the new state back into Cassandra. If not, we load the state data from Cassandra into Spark as part of bootstrapping.
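That bootstrap-or-update flow can be sketched with plain dicts standing in for the Spark Streaming state and the Cassandra trends store. This is a simplified illustration of the logic, not the actual streaming job.

```python
# Sketch of state management: on each event, use in-memory state if
# present; otherwise bootstrap it from Cassandra first. Dicts stand in
# for Spark Streaming state and the Cassandra store.
cassandra_state = {"Narcos": 200}  # counts persisted by a prior run
spark_state = {}                   # in-memory streaming state

def on_play_event(video):
    if video not in spark_state:
        # State not present: init from Cassandra (0 if never seen).
        spark_state[video] = cassandra_state.get(video, 0)
    spark_state[video] += 1
    # Write the updated state back so a restarted job can bootstrap.
    cassandra_state[video] = spark_state[video]

on_play_event("Narcos")
on_play_event("Stranger Things")
```

Persisting the state externally is what lets a restarted (or newly deployed) streaming job resume counting where the previous run left off.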
Let's look into the data model we used for Trending Now. The two main requirements to address were: 1) trends data is applicable only for a specific interval of time, and 2) we should optimize for both batch writes and batch reads, as model training happens inside Spark.
It's kind of obvious that we had to create separate column families for separate intervals of time, primarily because, given a time interval, we only need the data from that interval. VideoId, along with some metadata such as country and timezone, formed the row key. We had two columns that contained the plays and impressions data for each video.
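A dict-based sketch of that layout, with one table per interval; the interval format, key fields, and values are illustrative assumptions, not the production schema:

```python
# Sketch of the Trending Now data model: one table (column family) per
# time interval; (video_id, country, timezone) forms the row key, with
# plays and impressions as the two columns.
trends_by_interval = {}

def write_trend(interval, video_id, country, tz, plays, impressions):
    table = trends_by_interval.setdefault(interval, {})
    table[(video_id, country, tz)] = {
        "plays": plays, "impressions": impressions}

def read_interval(interval):
    # A batch read for model training touches only the table for the
    # interval of interest; other intervals are never scanned.
    return trends_by_interval.get(interval, {})

write_trend("2017-06-05T19", "Narcos", "US", "PST",
            plays=200, impressions=500)
```

Partitioning by interval also makes expiry cheap: an old interval's table can be truncated or dropped wholesale instead of deleting rows one by one.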
With that, I would like to introduce Roopa, who will walk us through a few more use cases for Spark + Cassandra and our learnings from them.
You just saw two main data-driven Spark + Cassandra use cases. Once Prasanna's A/B test is a success, the feature needs to be rolled out to ALL members, and the resulting huge growth in the dataset needs a bigger Cassandra cluster.
So you want to move a bulk dataset from one cluster to another. How would you go about doing it quickly?
Meet Forklift. Why is it called Forklift? Because we are moving data across clusters! Let's look into the architecture in detail now.
Meson is a Netflix-built, general-purpose workflow orchestration and scheduling framework. It manages the lifecycle of several ML pipelines that execute workloads across heterogeneous systems. This same framework was used for Forklift too.
Spark Cassandra Connector: this library lets you expose Cassandra tables as Spark RDDs, write Spark RDDs to Cassandra tables, and execute arbitrary CQL queries in your Spark applications.
Mesos provides task isolation and excellent abstraction of CPU, memory, storage, and other compute resources. Meson leverages these features to achieve scale and fault tolerance for its tasks.
Spark jobs submitted from Meson share the same Mesos slaves to run the tasks.
What makes this expansion different? The nodes are not being doubled, which Cassandra can do well for you; instead they are being increased by a few percent. We don't use vnodes, so the only options for adding capacity to a cluster are doubling it or creating a new cluster and populating the data. We are talking about clusters with hundreds of nodes, and doubling does not always work. Forklift comes in very handy for this type of use case.
We were very early adopters of Cassandra and started using it in production at version 0.5. So of course we were using Thrift, and all our streaming microservices built over the years were based on Thrift's schemaless design for accessing Cassandra. With the advent of CQL, there are apps that want to use the richer data model of CQL and migrate to it for better performance. Forklift does a great job with the migration, since you can map the data model and transform from source to destination in the relevant format.
This is another use case we use Forklift for: for certain big clusters, instead of replacing nodes one at a time, which would take weeks, we create a new cluster in trusty and forklift the data after dual writes are enabled.
Performance
We get good support from DataStax, and one of the options was using DSE Spark instead of running the DataStax Spark connector talking to Cassandra. But the performance of our Cassandra clusters was a concern, since these clusters are in the streaming path, serving all the members watching great streaming content, and have very strict SLAs. Running Spark alongside Cassandra would constrain the limited resources we have in AWS, which was a big concern.
Cassandra is stateful, and if we had Spark and Cassandra running together, it would not be easy to scale up the cluster when running into resource constraints. With Spark running separately, we can scale the Spark cluster up and down without affecting Cassandra.
Running separate Spark and Cassandra clusters is cost effective too, since we can use instances from the shared pool and release them when the job is complete.
This can lead to an NPE: a TTL of 0, when written, becomes a null in C*; when read, this TTL is null, and the null cannot be written back to C* as a TTL. This was fixed in 3.1; we used a workaround of translating the data when reading from the source and writing to the destination.
--------
In Thrift you could define column-level TTLs, and different columns could have different TTLs. In CQL there is a row TTL, and there is no way to define a column TTL in a single mutation. So when you are copying data from Thrift to CQL, you need to split the writes into multiple mutations by batching.
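One way to sketch that splitting step: group the source columns by TTL so each group becomes its own mutation. The data shapes and the tuple representation of a mutation are illustrative assumptions, not the actual migration code.

```python
# Sketch of migrating Thrift column-level TTLs to CQL: since a single
# CQL mutation carries one TTL, columns with different TTLs must be
# split into separate batched mutations.
from collections import defaultdict

def split_by_ttl(row_key, columns):
    """columns: list of (name, value, ttl_seconds) tuples. Returns one
    mutation per distinct TTL as a (row_key, ttl, {name: value}) tuple."""
    by_ttl = defaultdict(dict)
    for name, value, ttl in columns:
        by_ttl[ttl][name] = value
    return [(row_key, ttl, cols) for ttl, cols in sorted(by_ttl.items())]

mutations = split_by_ttl(
    "member:42", [("a", 1, 86400), ("b", 2, 86400), ("c", 3, 3600)])
```

Columns sharing a TTL still travel together in one mutation, so the extra write amplification is bounded by the number of distinct TTLs per row, not the number of columns.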
input.split.size_in_mb uses an internal system table in C* (>= 2.1.5) to determine the size of the data in C*. That table, system.size_estimates, is not meant to be absolutely accurate, so there will be some inaccuracy with smaller tables and split sizes. When you use the Spark Cassandra connector's cassandraTable() function to load data from Cassandra into Spark, it automatically creates Spark partitions aligned to the Cassandra partition key. It tries to create an appropriate number of partitions by estimating the size of the table and dividing it by the parameter spark.cassandra.input.split.size_in_mb (64 MB by default). (One instance where the default needs changing is a small source table; in that case, use withReadConf() to override the parameter.)
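A back-of-the-envelope sketch of that partition-count estimate. The connector's exact rounding and grouping logic is internal; this shows only the rough arithmetic of size divided by split size.

```python
# Rough arithmetic behind the connector's partitioning: estimated table
# size divided by input.split.size_in_mb, with at least one partition.
import math

def estimated_spark_partitions(table_size_mb, split_size_in_mb=64):
    return max(1, math.ceil(table_size_mb / split_size_in_mb))

# A 6400 MB table with the default 64 MB split yields ~100 partitions;
# a tiny table still gets at least one, which is why small source tables
# may need a smaller split size to get any parallelism.
```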
--conf spark.cassandra.connection.keep_alive_ms (default 5000; we set 900000): period of time to keep unused connections open.
--conf spark.cassandra.connection.timeout_ms (default 5000; we set 50): maximum period of time to attempt connecting to a node.
--conf spark.driver.maxResultSize (default 1g; we used 4g): limit of the total size of serialized results of all partitions for each Spark action (e.g. collect). Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size exceeds this limit. A high limit may cause out-of-memory errors in the driver (depending on spark.driver.memory and the memory overhead of objects in the JVM); setting a proper limit can protect the driver from out-of-memory errors.
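Collected into a config fragment, the connection and driver settings above might look like this. It is PySpark-style and not runnable on its own; the values are the ones discussed in this talk, not general recommendations.

```python
# Config fragment (illustrative, not runnable standalone): connection
# and driver settings as they might be applied to a SparkConf.
conn_overrides = {
    # Keep unused connections open far longer than the 5000 ms default;
    # this matters for long-running streaming jobs.
    "spark.cassandra.connection.keep_alive_ms": "900000",
    # Cap the serialized results collected to the driver (default 1g).
    "spark.driver.maxResultSize": "4g",
}
# e.g. SparkConf().setAll(conn_overrides.items())
```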
This usually means that the size of the partitions you are attempting to create are larger than the executor's heap can handle. Remember that all of the executors run in the same JVM so the size of the data is multiplied by the number of executor slots.
Either increase the heap size of the executors (spark.executor.memory) or shrink the size of the partitions by decreasing spark.cassandra.input.split.size_in_mb.
spark.cassandra.output.batch.size.bytes (default 1024): maximum total size of the batch in bytes; overridden by spark.cassandra.output.batch.size.rows.
spark.cassandra.output.batch.size.rows (default auto, meaning batch size is determined by batch.size.bytes): number of rows per single batch; 'auto' means the connector adjusts the number of rows based on the amount of data in each row.
spark.cassandra.output.concurrent.writes (default 5): maximum number of batches executed in parallel by a single Spark task.
spark.cassandra.output.throughput_mb_per_sec (default unlimited): maximum write throughput allowed per single core in MB/s.
Limit this on long (8+ hour) runs to 70% of your max throughput, as measured on a smaller job, for stability.
Spark is able to issue write requests much more quickly than Cassandra can handle them. This can lead to GC issues and a buildup of hints. In connector version 1.2 and higher, spark.cassandra.output.throughput_mb_per_sec allows you to control the amount of data written to C* per Spark core per second.
If this is the case with your application on older versions, try lowering the number of concurrent writes and the batch size using the following options: spark.cassandra.output.batch.size.rows and spark.cassandra.output.concurrent.writes.
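The write-path knobs above, collected as a config fragment; the throughput cap shown is an illustrative value for demonstration, not a recommendation.

```python
# Config fragment (illustrative): throttle the connector's write path
# when Spark outruns Cassandra. The throughput cap is an example value.
write_tuning = {
    "spark.cassandra.output.batch.size.rows": "auto",      # default
    "spark.cassandra.output.concurrent.writes": "5",       # default
    "spark.cassandra.output.throughput_mb_per_sec": "10",  # per-core cap
}
```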
I would like to leave you all with this thought: the Spark Cassandra Connector library makes it very easy to create Spark applications that need access to Cassandra! We have used it and seen it work, and you all should too. If you are excited about ML algorithms and realizing that very first dream, a member turning on Netflix and their favorite movie or TV show just starting to play, or if you are excited about the scale and challenges of providing persistence as a service, do talk to Prasanna or me, as we are always looking for great talent to join our teams!