17. Data Primitives
Column Family
  key → name : value; name : value; name : value
  key → name : value; name : value; name : value
  key → name : value; name : value
  key → name : value; name : value; name : value; name : value
18.
19. Users
Column Family: Users
  alice → name : Allison; password : *
  bob → name : Roberto; password : *
  eve → name : Evelyn; password : *
  chuck → name : Carlos; password : *; site : datastax.com
20. Status
Column Family: Tweets
  LMNO → timestamp : 175695372; body : am I so mistaken?; user : carlos
  AXML → timestamp : 125695372; body : did you get my message?; user : alice
  DEXDL → timestamp : 155695372; body : I heard what you said.; user : eve
  BADFO → timestamp : 135695372; body : send me the password.; user : mallory
21.
22. Who Follows Alice?
Column Family: Followers
  alice → bob : ; mallory : ; trent :
23. Who Is Followed by Alice?
Column Family: Following
  alice → bob : ; carlos : ; david :
24. Alice's UserLine
Column Family: UserLine
  alice → 125695372 : AXML; 126695372 : XCVL; 127695372 : XENY
  bob → 125795372 : SDFG; 126895372 : XCVN
  eve → 125694372 : FDHL
  arthur → 125600000 : AXML; 125800000 : XCVL; 127900000 : XENY
25. Alice's TimeLine
Column Family: Timeline
  bob → 125795372 : SDFG; 126895372 : XCVN
  eve → 1279900000 : FDHL
  arthur → 125600000 : AXML; 125800000 : XCVL; 127900000 : XENY
  alice → 125795372 : SDFG; 125800000 : XCVL; 1279900000 : FDHL
26. Bob Posts a New Status
Column Family: Tweets
  LUMP → timestamp : 200000000; body : Hi Alice; user : bob
  AXML → timestamp : 125695372; body : did you get my message?; user : alice
  DEXDL → timestamp : 155695372; body : I heard what you said.; user : eve
  BADFO → timestamp : 135695372; body : send me the password!; user : mallory
27. UserLine and TimeLine
Column Family: UserLine | Column Family: TimeLine
  bob → 127695372 : XENY; 126695372 : XCVL; 200000000 : LUMP
  alice → 125795372 : SDFG; 125800000 : XCVL; 1279900000 : FDHL; 200000000 : LUMP; 125695372 : AXML
  bob → 125795372 : SDFG; 126895372 : XCVN; 200000000 : LUMP
  eve → 1279900000 : FDHL; 200000000 : LUMP
28. Using the Timeline
Column Family: Tweets
  alice
  LMNO → timestamp : 175695372; body : am I always the third wheel?; user : chuck
  AXML → timestamp : 125695372; body : bob did you get my message?; user : alice
  DEXDL → timestamp : 155695372; body : I heard what you said; user : eve
  BADFO → timestamp : 135695372; body : send me your password!; user : mallory
----- Meeting Notes (10/21/11 14:57) ----- To compensate, we create another node and write the duplicated data to it. Perhaps we can also read data from this node.
----- Meeting Notes (10/21/11 14:57) ----- When we need ----- Meeting Notes (10/21/11 15:28) ----- Until we create one more node to serve the clients that want to read more data.
----- Meeting Notes (10/21/11 15:28) ----- Until something else fails. And if the failure is in the master, we have a problem. We have to investigate: has the master really failed? Perhaps it is only busy, or the network
Apache Cassandra is a flexible solution for modeling many types of data and scenarios. Its data model is based on Google's Bigtable. Cassandra's adaptation of Bigtable has given it a flexible data model that fits many use cases, while preserving the scalability that its Amazon Dynamo-based architecture affords.
Column families
In a column-based database the smallest logical unit of data is called a 'column' (the slide here animates the creation of a column by combining a name and a value into a single element). This element should not be confused with a traditional RDBMS-style column: it consists of both a name and a value. Columns can be combined to create rows (animation), and each row is assigned a key. A key combined with a set of columns constitutes a row. Column families consist of a set of rows (animation) and some metadata. Column families determine how columns are sorted within rows. Most non-trivial use cases of Apache Cassandra take advantage of this sorting mechanism to build indexes and materialized views of the data stored in the system.
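To make the terminology concrete, here is a toy in-memory model of a column family in Python. It is not Cassandra's API. `ColumnFamily`, `insert`, `get_row` and the `comparator` parameter are hypothetical names. A real column family also stores a timestamp with each column and carries more metadata. The sketch only shows the nesting (column family, row key, columns) and how the column family's sort order controls the order of columns within a row.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class ColumnFamily:
    """Toy in-memory column family: row key -> {column name: column value}.

    `comparator` is a hypothetical stand-in for the column family metadata
    that decides how column names are ordered inside every row.
    """

    def __init__(self, name: str, comparator: Optional[Callable[[Any], Any]] = None):
        self.name = name
        self.comparator = comparator or (lambda column_name: column_name)
        self.rows: Dict[str, Dict[Any, Any]] = {}

    def insert(self, key: str, column_name: Any, value: Any) -> None:
        # A column is one (name, value) element; writing an existing name
        # in the same row replaces its value.
        self.rows.setdefault(key, {})[column_name] = value

    def get_row(self, key: str) -> List[Tuple[Any, Any]]:
        # Columns are returned in the order defined by the comparator,
        # regardless of the order in which they were inserted.
        columns = self.rows.get(key, {})
        return sorted(columns.items(), key=lambda item: self.comparator(item[0]))


cf = ColumnFamily("Example")
cf.insert("key1", "b", 2)
cf.insert("key1", "a", 1)
print(cf.get_row("key1"))  # [('a', 1), ('b', 2)]
```

The comparator is a plain key function so that swapping the sort order, for example numeric or reversed, does not change the storage code.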
A widely used example application for Cassandra is Twissandra, a sample implementation that demonstrates how you might build a social media application on Cassandra.
To build this model in Cassandra, we first need a column family to hold our user data. This column family looks much like a table in any relational database. We include the user's name, email and a few other bits of information, all keyed by username. Most of the columns are the same across rows, but they can differ (animate adding the rows). In this column family we don't really use the sorting of the columns in any meaningful way.
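The Users column family can be sketched as a plain dictionary keyed by username. This is only an in-memory illustration, not a Cassandra client call. `add_user` is a hypothetical helper, and the rows reuse the sample data from slide 19.

```python
from typing import Dict

# Toy stand-in for the Users column family: username (row key) -> columns.
users: Dict[str, Dict[str, str]] = {}


def add_user(username: str, **columns: str) -> None:
    # Rows do not need identical column sets; each row stores only
    # the columns that were written to it.
    users.setdefault(username, {}).update(columns)


add_user("alice", name="Allison", password="*")
add_user("bob", name="Roberto", password="*")
add_user("eve", name="Evelyn", password="*")
add_user("chuck", name="Carlos", password="*", site="datastax.com")

print(users["chuck"]["site"])    # datastax.com
print("site" in users["alice"])  # False
```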
We also need a column family to hold the data from the user's activity on the website. These status updates are rows that consist of a randomly generated key, a timestamp, a body and a user alias (animate the creation of this column family). (2:25)
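A status update row can be sketched the same way, under a few assumptions. `uuid.uuid4()` stands in for the "randomly generated key", whereas the slides only show short placeholder ids such as AXML. Timestamps are treated as seconds since the epoch. `add_tweet` is a hypothetical helper, not part of Twissandra or any Cassandra client.

```python
import time
import uuid
from typing import Dict, Optional

# Toy stand-in for the Tweets column family: tweet id (row key) -> columns.
tweets: Dict[str, Dict[str, object]] = {}


def add_tweet(user: str, body: str, timestamp: Optional[int] = None) -> str:
    # Randomly generated row key (uuid4 is an assumption for illustration).
    tweet_id = uuid.uuid4().hex
    tweets[tweet_id] = {
        # Seconds since the epoch unless the caller supplies a value.
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "body": body,
        "user": user,
    }
    return tweet_id


tweet_id = add_tweet("alice", "did you get my message?", 125695372)
print(tweets[tweet_id]["user"])  # alice
```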
For the model to be useful we need to be able to query data from Cassandra. Queries in this system answer one of four questions. Who is following a user? Who does a user follow? What are the status updates for a user in the order they were entered? And finally, give me all of the status updates of a user and those people he follows in the order in which they were added to the system.
To answer the question "Who is following a user?", we create a 'Followers' column family. It consists of a key representing the user and a set of columns whose names are the names of all of this user's followers (animation).
The model for the 'Following' column family is exactly the same as the model for 'Followers'. We have a key representing the user, and for every other individual he follows we add a column whose name is that person's user ID.
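One way to keep the two column families in step is to write to both whenever someone follows someone else. The sketch below does that with hypothetical helpers (`follow`, `get_followers`, `get_following`). The column values are left empty because the column name carries all the information. The data matches slides 22 and 23.

```python
from typing import Dict, List

# Toy stand-ins for the Followers and Following column families.
# Row key = user; column names = other users; values are left empty.
followers: Dict[str, Dict[str, str]] = {}
following: Dict[str, Dict[str, str]] = {}


def follow(user: str, target: str) -> None:
    # One follow action writes to both column families, so each question
    # can later be answered by reading a single row.
    following.setdefault(user, {})[target] = ""
    followers.setdefault(target, {})[user] = ""


def get_followers(user: str) -> List[str]:
    return sorted(followers.get(user, {}))


def get_following(user: str) -> List[str]:
    return sorted(following.get(user, {}))


for name in ("bob", "mallory", "trent"):
    follow(name, "alice")
for name in ("bob", "carlos", "david"):
    follow("alice", name)

print(get_followers("alice"))  # ['bob', 'mallory', 'trent']
print(get_following("alice"))  # ['bob', 'carlos', 'david']
```

The design trades extra writes for cheap reads, which is the usual denormalization pattern the slides describe.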
The 'UserLine' answers this question: in a given time frame, what status updates have I made? To construct this column family we again use the user's name as the row key. Each time this user updates his status, we insert a new column into this row. The new column consists of a timestamp (as the column name) and a unique identifier (as the value) that references the status update.
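Because columns are kept sorted by name, and here the name is the timestamp, a time-frame query becomes a range scan over one row. The sketch below assumes integer timestamps and keeps each row as a sorted list of `(timestamp, tweet_id)` pairs with Python's `bisect` module. `add_to_userline` and `updates_between` are hypothetical names.

```python
import bisect
from typing import Dict, List, Tuple

# Toy stand-in for the UserLine column family: username (row key) ->
# columns kept sorted by name, each column being (timestamp, tweet id),
# mirroring "125695372 : AXML" on the slides.
userline: Dict[str, List[Tuple[int, str]]] = {}


def add_to_userline(user: str, timestamp: int, tweet_id: str) -> None:
    # insort keeps the row ordered by timestamp, as a column family whose
    # comparator sorts column names numerically would.
    bisect.insort(userline.setdefault(user, []), (timestamp, tweet_id))


def updates_between(user: str, start: int, end: int) -> List[str]:
    # Tweet ids whose timestamp lies in [start, end].
    # The sentinels "" and "\uffff" bracket any ASCII tweet id.
    row = userline.get(user, [])
    lo = bisect.bisect_left(row, (start, ""))
    hi = bisect.bisect_right(row, (end, "\uffff"))
    return [tweet_id for _, tweet_id in row[lo:hi]]


# Inserted out of order on purpose; the row stays sorted.
add_to_userline("alice", 127695372, "XENY")
add_to_userline("alice", 125695372, "AXML")
add_to_userline("alice", 126695372, "XCVL")

print(updates_between("alice", 126000000, 128000000))  # ['XCVL', 'XENY']
```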
Finally, the 'Timeline' helps us answer the question: give me the status updates for a user and everyone he follows. It is modeled exactly like the UserLine, except that the tweets inserted into this column family include not only the user's own tweets but also the tweets of everyone he follows.
In a live system, when Bob logs into the Twissandra website and enters a new status update, a new row is added to the Status column family.
Next, the timestamp and identifier for that tweet are inserted into Bob's row in the UserLine. After Bob's UserLine is updated, the tweet is propagated to his own timeline and to the timelines of each of his followers. In this example Alice, Eve and Bob receive updates in their timelines.
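The write path just described is a fan-out on write: one status update becomes one Tweets row plus one column in several UserLine and Timeline rows. The sketch below replays Bob's "LUMP" update from slide 26. It assumes in-memory dictionaries, and the follower list (Alice and Eve follow Bob) is taken from this example. `post_tweet` is a hypothetical helper.

```python
import bisect
from typing import Dict, List, Tuple

# Toy stand-ins for the column families touched when a status is posted.
tweets: Dict[str, Dict[str, object]] = {}
userline: Dict[str, List[Tuple[int, str]]] = {}
timeline: Dict[str, List[Tuple[int, str]]] = {}
# Followers of each user (example data: Alice and Eve follow Bob).
followers: Dict[str, List[str]] = {"bob": ["alice", "eve"]}


def post_tweet(user: str, tweet_id: str, timestamp: int, body: str) -> None:
    # 1. Store the status update itself as a new row.
    tweets[tweet_id] = {"timestamp": timestamp, "body": body, "user": user}
    # 2. Record it in the author's UserLine row.
    bisect.insort(userline.setdefault(user, []), (timestamp, tweet_id))
    # 3. Fan out: the author's own Timeline row plus the Timeline row
    #    of every follower receives a (timestamp, id) column.
    for reader in [user] + followers.get(user, []):
        bisect.insort(timeline.setdefault(reader, []), (timestamp, tweet_id))


post_tweet("bob", "LUMP", 200000000, "Hi Alice")
print(sorted(timeline))  # ['alice', 'bob', 'eve']
```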
Over time, Alice's timeline may accumulate hundreds of thousands of entries. To query her timeline efficiently, Cassandra lets us take a slice of the entries rather than reading every column in the row. After retrieving the entries in the timeline, we can request the corresponding status updates, in timestamp order, from the Statuses column family in bulk.
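The read path pairs a bounded slice of one Timeline row with a single bulk lookup of the referenced tweets. The sketch below is only an assumption-laden illustration. A count plus a newest-first flag is roughly what a slice with `count` and `reversed` expresses in Cassandra's Thrift `SliceRange`, but `timeline_slice` and `read_timeline` are hypothetical helpers. The dictionary lookup stands in for a multiget-style request. The data comes from slide 28.

```python
from itertools import islice
from typing import Dict, List, Tuple

# Toy data from slide 28: alice's Timeline row holds (timestamp, tweet id)
# columns sorted ascending, and the Tweets column family holds the bodies.
timeline: Dict[str, List[Tuple[int, str]]] = {
    "alice": [(125695372, "AXML"), (135695372, "BADFO"),
              (155695372, "DEXDL"), (175695372, "LMNO")],
}
tweets: Dict[str, Dict[str, object]] = {
    "LMNO": {"timestamp": 175695372, "body": "am I always the third wheel?", "user": "chuck"},
    "AXML": {"timestamp": 125695372, "body": "bob did you get my message?", "user": "alice"},
    "DEXDL": {"timestamp": 155695372, "body": "I heard what you said", "user": "eve"},
    "BADFO": {"timestamp": 135695372, "body": "send me your password!", "user": "mallory"},
}


def timeline_slice(user: str, count: int, newest_first: bool = True) -> List[Tuple[int, str]]:
    # Read at most `count` columns from one end of the row
    # instead of materializing the whole (possibly huge) row.
    row = timeline.get(user, [])
    columns = reversed(row) if newest_first else iter(row)
    return list(islice(columns, count))


def read_timeline(user: str, count: int) -> List[Dict[str, object]]:
    ids = [tweet_id for _, tweet_id in timeline_slice(user, count)]
    # One bulk lookup by key, standing in for a multiget-style request.
    return [tweets[tweet_id] for tweet_id in ids if tweet_id in tweets]


for tweet in read_timeline("alice", 2):
    print(tweet["user"], "-", tweet["body"])
# chuck - am I always the third wheel?
# eve - I heard what you said
```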
The data model in Apache Cassandra is both simple and powerful. Hopefully this tutorial helps you understand the role of column families in Apache Cassandra: how they can be used both to model the data in your application and to supply query results that go beyond simply getting and setting values in the database.
X compression (Pavel)
X leveldb (Ben)
X memtable thresholds (Jonathan)
X performance (Jonathan)
  - slice reads 25% (ArrayBackedSortedColumns)
  - 2498
  - Network performance (1788)
  - arena allocation -> writes
- misc (Jonathan)
  - windows service
  - hinted handoff improvements
  - multi-threaded compaction
  - node replacement
  - new CQL clients
Pavel Yaskevitch
1. New rules for how we allocate new rows in memtables. In the past, every new allocation was the size of its data; now allocations are of a fixed size. This reduces fragmentation in the heap. Previously, in Cassandra, garbage collection would reclaim old data files on disk. In 1.0 the files are released immediately once they are no longer in use.
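The fixed-size allocation idea (the "arena allocation" item in the notes above) can be illustrated with a bump-pointer allocator. It copies values into regions that all have one fixed size, instead of giving each value its own differently sized allocation. This is a hypothetical sketch. `SlabAllocator` and `region_size` are illustrative names, and the region size is arbitrary rather than Cassandra's. Python does not manage a JVM heap, so the sketch shows the bookkeeping, not the fragmentation benefit itself.

```python
from typing import List


class SlabAllocator:
    """Hypothetical bump-pointer allocator over fixed-size regions."""

    def __init__(self, region_size: int = 1024):
        self.region_size = region_size
        self.regions: List[bytearray] = []
        self.offset = region_size  # forces a fresh region on first use

    def allocate(self, data: bytes) -> memoryview:
        size = len(data)
        if size > self.region_size:
            # Oversized values fall back to an allocation of their own.
            return memoryview(bytearray(data))
        if self.offset + size > self.region_size:
            # Current region is full: start a new fixed-size region.
            self.regions.append(bytearray(self.region_size))
            self.offset = 0
        region = self.regions[-1]
        start = self.offset
        region[start:start + size] = data  # same-length copy, no resize
        self.offset += size
        return memoryview(region)[start:start + size]


slab = SlabAllocator(region_size=8)
a = slab.allocate(b"abc")
b = slab.allocate(b"defgh")  # exactly fills the first region
c = slab.allocate(b"ij")     # does not fit, so a second region is created
print(len(slab.regions), bytes(a), bytes(b), bytes(c))  # 2 b'abc' b'defgh' b'ij'
```

Every region has the same size, so memory is obtained and later freed in uniform blocks rather than as many small objects of assorted sizes.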
Cassandra 1.0 optimizes slice reads by using a lighter-weight data structure to represent a row fragment returned by a read than the one used for a row fragment in a memtable, where updates are accumulated. This yields roughly a 25% improvement in throughput. With named reads we can make a bigger improvement by deserializing only the most recent versions of the requested columns. This sounds obvious, and in fact Cassandra did it all the way back in 0.3. The trick is dealing with data that arrives in the “wrong” order because of network partitions, node failures and so forth. Early Cassandra couldn't handle this correctly, so we removed the optimization in 0.4 and finally added it back, correctly, for 1.0.
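The named-read optimization can be sketched under one assumption: each data file (SSTable) knows the maximum write timestamp it contains. Files are visited newest-timestamp first rather than in flush order. Reading stops once every requested column has been found with a timestamp newer than anything the remaining files could hold. This is a simplified illustration of the idea, not Cassandra's code. It ignores memtables, tombstones and timestamp ties, and `read_named_columns` and `max_timestamp` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

# Toy SSTable: column name -> (value, write timestamp).
SSTable = Dict[str, Tuple[str, int]]


def max_timestamp(sstable: SSTable) -> int:
    return max((ts for _, ts in sstable.values()), default=-1)


def read_named_columns(sstables: Iterable[SSTable], names: List[str]) -> Tuple[Dict[str, str], int]:
    wanted = list(dict.fromkeys(names))  # de-duplicate, keep order
    found: Dict[str, Tuple[str, int]] = {}
    visited = 0
    # Order files by the newest timestamp they contain, not by flush order,
    # so late-arriving old writes cannot shadow newer values.
    for sstable in sorted(sstables, key=max_timestamp, reverse=True):
        if len(found) == len(wanted) and min(ts for _, ts in found.values()) > max_timestamp(sstable):
            # Every requested column is newer than anything this file (or
            # any later, older file) could contain: stop early.
            break
        visited += 1
        for name in wanted:
            if name in sstable:
                value, ts = sstable[name]
                if name not in found or ts > found[name][1]:
                    found[name] = (value, ts)
    return {name: value for name, (value, _) in found.items()}, visited


recent = {"name": ("Alice", 90), "site": ("datastax.com", 80)}
replayed = {"name": ("Stale", 10)}  # flushed last, but holds an old write
values, files_read = read_named_columns([recent, replayed], ["name", "site"])
print(values, files_read)  # {'name': 'Alice', 'site': 'datastax.com'} 1
```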
Hinted handoff is intended to be more reliable, although it is still a best-effort system: you could still lose your hints. Hints are now stored as serialized mutations in the hints table, rather than requiring a seek into the data directory to replay the hint itself. Node replacement was previously a two-step process. We have now added a switch that you can use at node startup to replace a node directly. 0.8 introduced concurrent compaction; in 1.0 we introduced multi-threaded compaction, where the rows of a single compaction can be processed by multiple threads.