C* Summit 2013: High Throughput Analytics with Cassandra by Aaron Stannard

Building analytics systems is an increasingly common requirement for BI teams inside companies both big and small, and a feat made even more challenging when analytic results have to be produced in real-time. In this presentation the team from MarkedUp Analytics will show you techniques for leveraging Cassandra, Hadoop, and Hive to build a manageable and scalable analytics system capable of handling a wide range of business cases and needs.

Real Time Analytics with Cassandra, Hive, and Solr
Aaron Stannard, Founder & CEO of MarkedUp

Powerful analytics tools for native apps
Understand your audience. Gain valuable data on your users.
Monitor your app’s health. Log errors and crashes remotely.
Drive more sales. Better data = more revenue.

Real time analytics isn’t inherently superior or necessary.

Building your own real-time analytics service with Cassandra and DataStax Enterprise

Analytics Schema Strategy
•  All row keys should be predictable (not always possible)
•  Utilize physical sortability of columns
•  Use predictably sortable data types for column names (integers, dates)
•  Learn to love composite keys
•  Batch mutations are your friend
•  Use distributed counters for real-time metrics
•  Use TTL for automatic data expiration (if necessary)
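
As a rough illustration of the key-design points above (not the schema used in the talk), the sketch below builds a predictable composite row key and an integer column name that sorts chronologically, then increments an in-memory stand-in for a distributed counter. The function names `row_key`, `column_name`, and `record_event`, the `app_id:metric:YYYYMMDD` key layout, and the sample values are all assumptions made for this example; a real system would write to Cassandra counter columns instead of a Python dictionary.

```python
from collections import defaultdict
from datetime import datetime, timezone

# In-memory stand-in for a counter column family:
# row key -> {column name -> count}. Hypothetical, for illustration only.
counters = defaultdict(lambda: defaultdict(int))


def row_key(app_id: str, metric: str, ts: datetime) -> str:
    """Composite, predictable row key: one row per app, metric, and day.

    ts is assumed to already be in UTC.
    """
    return f"{app_id}:{metric}:{ts.strftime('%Y%m%d')}"


def column_name(ts: datetime) -> int:
    """Minute of the day (0..1439): an integer, so columns sort chronologically."""
    return ts.hour * 60 + ts.minute


def record_event(app_id: str, metric: str, ts: datetime) -> None:
    """Increment the counter for the minute bucket that contains ts."""
    counters[row_key(app_id, metric, ts)][column_name(ts)] += 1


ts = datetime(2013, 6, 11, 14, 30, tzinfo=timezone.utc)
record_event("app42", "sessions", ts)
record_event("app42", "sessions", ts)
```

Because the key is predictable, a reader can compute `row_key("app42", "sessions", ts)` directly and fetch a contiguous slice of columns for a time range, rather than scanning for matching rows.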

Time Series Schema 1: Bounded Number of Unknowns

Time Series Schema 2: Unbounded Number of Unknowns

Adding Hive and Hadoop to the Mix
Mo’ data, mo’ problems

When is Hadoop necessary?
•  Large volumes of data (100GB+)
•  Queries require retrospective / historical analysis
•  Need consistent results
•  Need to perform multi-stage analysis
•  Speed isn’t a concern (Hadoop is sloooooooooow)

Hadoop on easy mode: Hive
•  SQL abstraction on top of Hadoop (more familiar)
•  Easier to deploy and test
•  Simplifies data warehousing
•  Easy to automatically import data from Cassandra
•  DSE eliminates need for HDFS

Hive Syntax
Query: count the number of items for each “key” greater than 100
RDBMS> select key, count(1) from kv1 where key > 100 group by key;
Hive> select key, count(1) from kv1 where key > 100 group by key;
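
To make the result of that query concrete, here is a minimal sketch that runs the same statement against an in-memory SQLite database standing in for the RDBMS side. The table name `kv1` and the `key` column come from the slide; the `value` column, the sample rows, and the added `ORDER BY` (used only to make the output order deterministic) are assumptions for this example.

```python
import sqlite3

# In-memory database standing in for the "RDBMS" side of the comparison.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv1 (key INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO kv1 VALUES (?, ?)",
    [(50, "a"), (150, "b"), (150, "c"), (200, "d")],  # made-up sample rows
)

# Same query as on the slide, plus ORDER BY so the row order is stable.
rows = conn.execute(
    "SELECT key, COUNT(1) FROM kv1 WHERE key > 100 GROUP BY key ORDER BY key"
).fetchall()
print(rows)  # [(150, 2), (200, 1)]
```

The row with key 50 is filtered out by the `WHERE` clause, and the two rows sharing key 150 collapse into a single group with a count of 2.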

Hive Tips and Tricks
•  Don’t write data from Hive back to a hot Cassandra column family
•  If writing data from Hive to Cassandra, use dedicated column families
•  You can write to multiple places on a single Hive read (table, CSV file, etc…)
•  Use sampling to test Hive queries on scaled-down data sets

How do you count millions of distinct items in real-time?

•  Solr: Lucene-based indexing engine
•  Part of Apache Foundation
•  Full-text search
•  Faceted search
•  Distributed
•  Integrates well with Cassandra
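
For readers unfamiliar with faceting, the toy sketch below shows the idea in plain Python: a facet on a field returns a count per distinct value, so the number of facet buckets is the number of distinct items. It does not use Solr or its API; the `docs` list, its field names, and the `facet_counts` helper are hypothetical and exist only for this illustration.

```python
from collections import Counter

# Hypothetical event documents; field names are invented for this example.
docs = [
    {"app": "app42", "user_id": "u1"},
    {"app": "app42", "user_id": "u2"},
    {"app": "app42", "user_id": "u1"},
    {"app": "app99", "user_id": "u3"},
]


def facet_counts(documents, field):
    """Return {value: number of documents with that value} for one field."""
    return Counter(doc[field] for doc in documents if field in doc)


facets = facet_counts(docs, "user_id")
distinct_users = len(facets)  # one facet bucket per distinct user
```

A plain counter can only say how many events occurred; grouping by value, as a facet does, is what exposes how many different users produced them.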

Questions or Comments?
aaron@markedup.com

https://markedup.com/


04-2024-HHUG-Sales-and-Marketing-Alignment.pptx

HampshireHUG

Explore 'The Codex of Business: Writing Software for Real-World Solutions,' a compelling SlideShare presentation that delves into digital transformation in healthcare. Discover through a detailed case study how Agile methodologies empower healthcare providers to develop, iterate, and refine digital solutions that address real-world challenges. Learn how strategic planning, user feedback, and continuous improvement drive success in deploying technologies that enhance patient care and operational efficiency. Ideal for healthcare professionals, IT specialists, and digital transformation advocates seeking actionable insights and practical examples of technology making a real difference.

The Codex of Business Writing Software for Real-World Solutions 2.pptx

Malak Abu Hammad

The 7 Things I Know About Cyber Security After 25 Years | April 2024

Rafal Los

With more memory available, system performance of three Dell devices increased, which can translate to a better user experience Conclusion When your system has plenty of RAM to meet your needs, you can efficiently access the applications and data you need to finish projects and to-do lists without sacrificing time and focus. Our test results show that with more memory available, three Dell PCs delivered better performance and took less time to complete the Procyon Office Productivity benchmark. These advantages translate to users being able to complete workflows more quickly and multitask more easily. Whether you need the mobility of the Latitude 5440, the creative capabilities of the Precision 3470, or the high performance of the OptiPlex Tower Plus 7010, configuring your system with more RAM can help keep processes running smoothly, enabling you to do more without compromising performance.

Boost PC performance: How more available memory can improve productivity

Principled Technologies

Dernier (20)

Breaking the Kubernetes Kill Chain: Host Path Mount

Automating Google Workspace (GWS) & more with Apps Script

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...

Scaling API-first – The story of a global engineering organization

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

IAC 2024 - IA Fast Track to Search Focused AI Solutions

Artificial Intelligence: Facts and Myths

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

What Are The Drone Anti-jamming Systems Technology?

08448380779 Call Girls In Civil Lines Women Seeking Men

08448380779 Call Girls In Greater Kailash - I Women Seeking Men

Real Time Object Detection Using Open CV

A Year of the Servo Reboot: Where Are We Now?

Histor y of HAM Radio presentation slide

Handwritten Text Recognition for manuscripts and early printed texts

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx

The Codex of Business Writing Software for Real-World Solutions 2.pptx

The 7 Things I Know About Cyber Security After 25 Years | April 2024

Boost PC performance: How more available memory can improve productivity

C* Summit 2013: High Throughput Analytics with Cassandra by Aaron Stannard

1. Real Time Analytics with Cassandra, Hive, and Solr

2. Real Time Analytics with Cassandra, Hive, and Solr Aaron Stannard, Founder & CEO of MarkedUp

3. Powerful analytics tools for native apps Understand your audience. Gain valuable data on your users. Monitor your app’s health. Log errors and crashes remotely. Drive more sales. Better data = more revenue.

5. Do we really need real-time analytics?

7. Real time analytics isn’t inherently superior or necessary.

9. Building your own real-time analytics service with Cassandra and DataStax Enterprise

10. Cassandra Setup on EC2

11. Write Strategy

12. Read Strategy

13. Analytics Schema Strategy •  All row keys should be predictable (not always possible) •  Utilize physical sortability of columns •  Use predictably sortable data types for column names (integers, dates) •  Learn to love composite keys •  Batch mutations are your friend •  Use distributed counters for real-time metrics •  Use TTL for automatic data expiration (if necessary)
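The schema points on slide 13 can be illustrated with a small in-memory sketch. It is not taken from the deck and does not use a Cassandra client: `CounterStore`, `row_key`, `column_name`, the `app42` identifier, and the one-row-per-day bucketing are all hypothetical choices made for illustration. The sketch assumes a row key built only from values the reader already knows (app, metric, day) and integer column names that sort naturally, so a time range becomes a single row lookup plus a column slice.

```python
from collections import defaultdict
from datetime import datetime, timezone


def row_key(app_id: str, metric: str, ts: datetime) -> str:
    """Predictable row key: one row per app, metric, and UTC day (assumed layout)."""
    return f"{app_id}:{metric}:{ts:%Y%m%d}"


def column_name(ts: datetime) -> int:
    """Sortable column name: the hour of the day as an integer."""
    return ts.hour


class CounterStore:
    """In-memory stand-in for a counter column family (hypothetical, not a Cassandra API)."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def increment(self, key: str, column: int, delta: int = 1) -> None:
        """Add delta to the counter stored at (key, column)."""
        row = self._rows[key]
        row[column] = row.get(column, 0) + delta

    def slice(self, key: str, start: int, end: int) -> list:
        """Return (column, count) pairs with start <= column <= end, in column order."""
        row = self._rows.get(key, {})
        return [(c, row[c]) for c in sorted(row) if start <= c <= end]


store = CounterStore()
# Four made-up session events on the same day, at hours 9, 9, 10, and 14.
events = [datetime(2013, 6, 11, h, 5, tzinfo=timezone.utc) for h in (9, 9, 10, 14)]
for ts in events:
    store.increment(row_key("app42", "sessions", ts), column_name(ts))

# The reader can rebuild the row key without a lookup, then slice the morning hours.
morning = store.slice("app42:sessions:20130611", 0, 11)
```

Because the key is derivable from the query itself, no secondary index or scan is needed to find the row; a real deployment would replace the dictionary with counter columns and could attach a TTL to non-counter data.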

14. Time Series Schema 0: All Knowns

15. Time Series Schema 1: Bounded Number of Unknowns

16. Time Series Schema 2: Unbounded Number of Unknowns

17. Schema Tips

18. Adding Hive and Hadoop to the Mix Mo’ data, mo’ problems

19. When is Hadoop necessary? •  Large volumes of data (100GB+) •  Queries require retrospective / historical analysis •  Need consistent results •  Need to perform multi-stage analysis •  Speed isn’t a concern (Hadoop is sloooooooooow)

20. Hadoop on easy mode: Hive •  SQL abstraction on top of Hadoop (more familiar) •  Easier to deploy and test •  Simplifies data warehousing •  Easy to automatically import data from Cassandra •  DSE eliminates need for HDFS

21. C* to Hive

22. Hive Syntax Query: count the number of items where “key” is greater than 100 RDBMS> select key, count(1) from kv1 where key > 100 group by key; Hive> select key, count(1) from kv1 where key > 100 group by key;
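As a rough check of the RDBMS half of slide 22, the same query can be run against an in-memory SQLite database from Python. The `kv1` column types and the sample rows below are assumptions made for this sketch, SQLite merely stands in for "an RDBMS", and an `ORDER BY` is added only so the result order is deterministic.

```python
import sqlite3

# In-memory database standing in for the RDBMS side of the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv1 (key INTEGER, value TEXT)")  # assumed schema

# Made-up sample rows: one key at or below 100, two distinct keys above it.
conn.executemany(
    "INSERT INTO kv1 (key, value) VALUES (?, ?)",
    [(50, "a"), (150, "b"), (150, "c"), (200, "d")],
)

# The slide's query, with ORDER BY appended for a stable result order.
rows = conn.execute(
    "SELECT key, count(1) FROM kv1 WHERE key > 100 GROUP BY key ORDER BY key"
).fetchall()
conn.close()
```

With these rows, the key 50 is filtered out and the remaining keys are grouped and counted, which is the behavior the slide attributes to both engines.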

23. Hive Tips and Tricks •  Don’t write data from Hive back to a hot Cassandra column family •  If writing data from Hive to Cassandra, use dedicated column families •  You can write to multiple places on a single Hive read (table, CSV file, etc…) •  Use sampling to test Hive queries on scaled-down data sets

24. How do you count millions of distinct items in real-time?

25. •  Solr: Lucene-based indexing engine •  Part of Apache Foundation •  Full-text search •  Faceted search •  Distributed •  Integrates well with Cassandra

26. Solr Index Setup

27. Solr Search

28. Questions or Comments? aaron@markedup.com https://markedup.com/
