3 Configure the address the Thrift server listens on
Simply leave it empty:
<ThriftAddress></ThriftAddress>
4 Configure the cluster name
Each cluster should have a distinct name:
<ClusterName>gpcuster.cnblogs.com</ClusterName>
5 Enable automatic bootstrapping, so that new nodes join the cluster automatically
<AutoBootstrap>true</AutoBootstrap>
6 Configure the number of replicas of the data
<ReplicationFactor>3</ReplicationFactor>
7 Tune memory and disk performance
These settings depend on your actual environment; see the Cassandra wiki for guidance.
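The ReplicationFactor in step 6 decides how many nodes keep a copy of each row. The sketch below assumes the simple rack-unaware placement. Under that placement, the first replica goes to the first node whose token is greater than or equal to the row's token, and the remaining replicas go to the following nodes clockwise around the ring. The node names and small integer tokens are made up for illustration, and real tokens are far larger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class ReplicaPlacementSketch {
    /**
     * Returns the nodes that would store a row with the given token.
     * ring maps each node's token to its (hypothetical) node name.
     */
    static List<String> replicasFor(TreeMap<Integer, String> ring, int rowToken,
                                    int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // Never ask for more replicas than there are nodes.
        int wanted = Math.min(replicationFactor, ring.size());
        // Primary replica: smallest node token >= rowToken, wrapping to the start of the ring.
        Integer token = ring.ceilingKey(rowToken);
        if (token == null) {
            token = ring.firstKey();
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            // Walk clockwise to the next node, wrapping around.
            Integer next = ring.higherKey(token);
            token = (next != null) ? next : ring.firstKey();
        }
        return replicas;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        ring.put(0, "node-a");
        ring.put(25, "node-b");
        ring.put(50, "node-c");
        ring.put(75, "node-d");
        // ReplicationFactor = 3, as in the configuration above.
        System.out.println(replicasFor(ring, 60, 3)); // [node-d, node-a, node-b]
    }
}
```

With a factor of 3 on a four-node ring, a row whose token lands past the last node wraps around to the start of the ring.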
2 Run Cassandra
On every node, run bin/cassandra. If you see INFO - Starting up server gossip, the node has started successfully.
3 Check the cluster status
Once all nodes are running, we can check the cluster's status through JMX.
Summary
Deploying Cassandra on Windows and on Linux is essentially the same. The only difference is that on Linux every script in the bin directory can be run, whereas on Windows only two of them can.
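On Cassandra clients
I recently spent some time trying out Cassandra and imported data from Oracle into it. I ran into problems, solved them, and learned a lot along the way.
Apart from designing a sensible data model, most of the work was interacting with Cassandra through its API.
Cassandra was designed from the start to support Thrift, which means we can write clients in many languages.
For Cassandra development, this is the advantage of Thrift: multi-language support. The drawback is just as obvious. The Thrift API is too bare-bones to be used directly in production.
The Cassandra wiki also lists higher-level APIs built on top of the Thrift API, for many languages. For details, see http://wiki.apache.org/cassandra/ClientExamples.
This time I will only discuss the following two kinds of Java client:
1 Thrift Java API
2 Hector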
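Thrift Java API
This is the simplest API, and it ships with Cassandra. The generated classes are included in apache-cassandra-0.5.1.jar and can be used directly. We can also install Thrift ourselves and generate the bindings from the cassandra.thrift file.
If you want to use Cassandra, you need to understand the Thrift API, because all the other, higher-level APIs are wrappers around it.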
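Features
Inserting data
Inserting data requires specifying the keyspace, ColumnFamily, Column, Key, Value, timestamp and consistency level. (For an introduction to Cassandra's data model, see the article "An Informal Guide to the Cassandra Data Model".)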
/**
 * Insert a Column consisting of (column_path.column, value, timestamp) at the given
 * column_path.column_family and optional column_path.super_column. Note that
 * column_path.column is here required, since a SuperColumn cannot directly contain
 * binary values -- it can only contain sub-Columns.
 *
 * @param keyspace
 * @param key
 * @param column_path
 * @param value
 * @param timestamp
 * @param consistency_level
 */
public void insert(String keyspace, String key, ColumnPath column_path, byte[] value,
    long timestamp, int consistency_level) throws InvalidRequestException,
    UnavailableException, TimedOutException, TException;
/**
 * Insert Columns or SuperColumns across different Column Families for the same row key.
 * batch_mutation is a map<string, list<ColumnOrSuperColumn>> -- a map which pairs
 * column family names with the relevant ColumnOrSuperColumn objects to insert.
 *
 * @param keyspace
 * @param key
 * @param cfmap
 * @param consistency_level
 */
public void batch_insert(String keyspace, String key,
    Map<String,List<ColumnOrSuperColumn>> cfmap, int consistency_level) throws
    InvalidRequestException, UnavailableException, TimedOutException, TException;
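Every write above carries a timestamp because, when two writes hit the same column, Cassandra keeps the one with the higher timestamp. The sketch below models only that rule. It uses a hypothetical ColumnVersion class, is not Cassandra's internal code, and simply keeps the first argument on a tie.

```java
final class LastWriteWinsSketch {
    /** Hypothetical, simplified column version: name, value and client-supplied timestamp. */
    static final class ColumnVersion {
        final String name;
        final String value;
        final long timestamp;

        ColumnVersion(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Keeps the version with the larger timestamp; on a tie this sketch keeps a. */
    static ColumnVersion reconcile(ColumnVersion a, ColumnVersion b) {
        return (b.timestamp > a.timestamp) ? b : a;
    }

    public static void main(String[] args) {
        ColumnVersion older = new ColumnVersion("url", "http://old.example", 1000L);
        ColumnVersion newer = new ColumnVersion("url", "http://gpcuster.cnblogs.com", 2000L);
        // Arrival order does not matter: the higher timestamp wins either way.
        System.out.println(reconcile(older, newer).value);
        System.out.println(reconcile(newer, older).value);
    }
}
```

This is also why all clients writing to the same data should generate timestamps the same way (the example program later uses System.currentTimeMillis()).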
Reading data
Get a single value addressed by an exact column path.
/**
 * Get the Column or SuperColumn at the given column_path. If no value is present,
 * NotFoundException is thrown. (This is the only method that can throw an exception
 * under non-failure conditions.)
 *
 * @param keyspace
 * @param key
 * @param column_path
 * @param consistency_level
 */
public ColumnOrSuperColumn get(String keyspace, String key, ColumnPath column_path,
    int consistency_level) throws InvalidRequestException, NotFoundException,
    UnavailableException, TimedOutException, TException;
/**
 * Perform a get for column_path in parallel on the given list<string> keys. The return
 * value maps keys to the ColumnOrSuperColumn found. If no value corresponding to a key
 * is present, the key will still be in the map, but both the column and super_column
 * references of the ColumnOrSuperColumn object it maps to will be null.
 *
 * @param keyspace
 * @param keys
 * @param column_path
 * @param consistency_level
 */
public Map<String,ColumnOrSuperColumn> multiget(String keyspace, List<String> keys,
    ColumnPath column_path, int consistency_level) throws InvalidRequestException,
    UnavailableException, TimedOutException, TException;
Get the data under a given keyspace, Key, ColumnFamily and SuperColumn (the SuperColumn must be specified if there is one), returning only the columns whose names match the given condition (SlicePredicate).
/**
 * Get the group of columns contained by column_parent (either a ColumnFamily name or
 * a ColumnFamily/SuperColumn name pair) specified by the given SlicePredicate. If no
 * matching values are found, an empty list is returned.
 *
 * @param keyspace
 * @param key
 * @param column_parent
 * @param predicate
 * @param consistency_level
 */
public List<ColumnOrSuperColumn> get_slice(String keyspace, String key,
    ColumnParent column_parent, SlicePredicate predicate, int consistency_level)
    throws InvalidRequestException, UnavailableException, TimedOutException, TException;
/**
 * Performs a get_slice for column_parent and predicate for the given keys in parallel.
 *
 * @param keyspace
 * @param keys
 * @param column_parent
 * @param predicate
 * @param consistency_level
 */
public Map<String,List<ColumnOrSuperColumn>> multiget_slice(String keyspace,
    List<String> keys, ColumnParent column_parent, SlicePredicate predicate,
    int consistency_level) throws InvalidRequestException, UnavailableException,
    TimedOutException, TException;
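The range form of SlicePredicate is a SliceRange with start, finish, reversed and count. In that form, an empty start or finish means "no bound on that side", and with reversed set the walk runs from start downwards. The in-memory model below is a sketch that assumes String column names compared with String.compareTo. The server itself works on byte[] names and uses the column family's comparator. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class SliceRangeSketch {
    /**
     * Returns up to count column names of row, in comparator order.
     * An empty start or finish means "unbounded" on that side; when reversed is true
     * the walk goes backwards, so start is the upper bound. Like any TreeMap sub-view,
     * this sketch throws IllegalArgumentException if finish lies before start.
     */
    static List<String> slice(TreeMap<String, String> row, String start, String finish,
                              boolean reversed, int count) {
        NavigableMap<String, String> view = reversed ? row.descendingMap() : row;
        if (!start.isEmpty()) {
            view = view.tailMap(start, true);   // inclusive start
        }
        if (!finish.isEmpty()) {
            view = view.headMap(finish, true);  // inclusive finish
        }
        List<String> names = new ArrayList<>();
        for (String name : view.keySet()) {
            if (names.size() == count) {
                break;                          // count caps the result size
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        TreeMap<String, String> row = new TreeMap<>();
        row.put("a", "1");
        row.put("b", "2");
        row.put("c", "3");
        row.put("d", "4");
        System.out.println(slice(row, "", "", false, 10)); // [a, b, c, d]
        System.out.println(slice(row, "b", "", false, 2)); // [b, c]
        System.out.println(slice(row, "c", "a", true, 10)); // [c, b, a]
    }
}
```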
Query a range of keys (this requires the order-preserving partitioner).
/**
 * @deprecated; use get_range_slice instead
 *
 * @param keyspace
 * @param column_family
 * @param start
 * @param finish
 * @param count
 * @param consistency_level
 */
public List<String> get_key_range(String keyspace, String column_family, String start,
    String finish, int count, int consistency_level) throws InvalidRequestException,
    UnavailableException, TimedOutException, TException;
/**
 * returns a subset of columns for a range of keys.
 *
 * @param keyspace
 * @param column_parent
 * @param predicate
 * @param start_key
 * @param finish_key
 * @param row_count
 * @param consistency_level
 */
public List<KeySlice> get_range_slice(String keyspace, ColumnParent column_parent,
    SlicePredicate predicate, String start_key, String finish_key, int row_count,
    int consistency_level) throws InvalidRequestException, UnavailableException,
    TimedOutException, TException;
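The reason key ranges need the order-preserving partitioner is that, with the random partitioner, rows are placed by a hash of the key. The order of keys around the ring therefore has nothing to do with the order of the keys themselves. The sketch assumes an MD5-based token in the spirit of RandomPartitioner, and Cassandra's exact token computation may differ in detail. It only shows how key order and token order diverge.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class PartitionerOrderSketch {
    /** MD5-based, non-negative token (assumption: mirrors RandomPartitioner only roughly). */
    static BigInteger md5Token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available in every JDK", e);
        }
    }

    /** Order-preserving partitioner: ring order is simply key order. */
    static List<String> orderPreserving(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    /** Random partitioner: ring order follows the hashed token, not the key. */
    static List<String> randomOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(PartitionerOrderSketch::md5Token));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user1", "user2", "user3", "user4");
        System.out.println("order-preserving: " + orderPreserving(keys));
        System.out.println("random:           " + randomOrder(keys));
    }
}
```

With the random order, a range "from user1 to user3" on the ring would not correspond to the keys user1..user3.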
Query system information.
/**
 * get property whose value is of type string.
 *
 * @param property
 */
public String get_string_property(String property) throws TException;
/**
 * get property whose value is list of strings.
 *
 * @param property
 */
public List<String> get_string_list_property(String property) throws TException;
/**
 * describe specified keyspace
 *
 * @param keyspace
 */
public Map<String,Map<String,String>> describe_keyspace(String keyspace) throws
    NotFoundException, TException;
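These operations let us find out about the state of the system.
One particularly interesting piece of information is the token map, which tells us which Cassandra nodes are available to serve requests.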
Deleting data
/**
 * Remove data from the row specified by key at the granularity specified by
 * column_path, and the given timestamp. Note that all the values in column_path
 * besides column_path.column_family are truly optional: you can remove the entire
 * row by just specifying the ColumnFamily, or you can remove a SuperColumn or a
 * single Column by specifying those levels too.
 *
 * @param keyspace
 * @param key
 * @param column_path
 * @param timestamp
 * @param consistency_level
 */
public void remove(String keyspace, String key, ColumnPath column_path, long timestamp,
    int consistency_level) throws InvalidRequestException, UnavailableException,
    TimedOutException, TException;
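Note that, because of Cassandra's consistency model, a delete does not immediately remove the data from every machine. The replicas do, however, become consistent eventually.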
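One way to picture why a delete is only eventually consistent: remove does not erase data in place but records a deletion marker (a tombstone) carrying the timestamp you pass. A replica that has not yet seen the tombstone keeps returning the old value until the replicas reconcile. The sketch models two replicas with a hypothetical Cell type. Its merge rule (highest timestamp wins, tombstones included) is a simplification of how Cassandra converges.

```java
import java.util.HashMap;
import java.util.Map;

final class TombstoneSketch {
    /** Hypothetical cell: a value, or a tombstone when value is null, plus a timestamp. */
    static final class Cell {
        final String value;   // null means "deleted" (tombstone)
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Merges other into target, keeping the cell with the higher timestamp per column. */
    static void merge(Map<String, Cell> target, Map<String, Cell> other) {
        for (Map.Entry<String, Cell> e : other.entrySet()) {
            Cell mine = target.get(e.getKey());
            if (mine == null || e.getValue().timestamp > mine.timestamp) {
                target.put(e.getKey(), e.getValue());
            }
        }
    }

    /** What a read sees: a missing cell or a tombstone both read as null. */
    static String read(Map<String, Cell> replica, String column) {
        Cell c = replica.get(column);
        return (c == null) ? null : c.value;
    }

    public static void main(String[] args) {
        Map<String, Cell> replicaA = new HashMap<>();
        Map<String, Cell> replicaB = new HashMap<>();
        replicaA.put("url", new Cell("http://gpcuster.cnblogs.com", 1000L));
        replicaB.put("url", new Cell("http://gpcuster.cnblogs.com", 1000L));

        // The delete reached only replica A: it stores a tombstone with a newer timestamp.
        replicaA.put("url", new Cell(null, 2000L));
        System.out.println(read(replicaA, "url")); // null
        System.out.println(read(replicaB, "url")); // http://gpcuster.cnblogs.com

        // Once the replicas exchange data, both converge on the tombstone.
        merge(replicaB, replicaA);
        System.out.println(read(replicaB, "url")); // null
    }
}
```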
Example program
import java.util.List;
import java.io.UnsupportedEncodingException;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.TException;
import org.apache.cassandra.service.*;

public class CClient
{
    public static void main(String[] args)
        throws TException, InvalidRequestException, UnavailableException,
        TimedOutException, UnsupportedEncodingException, NotFoundException
    {
        // Connect to the Thrift port (9160) of a local node.
        TTransport tr = new TSocket("localhost", 9160);
        TProtocol proto = new TBinaryProtocol(tr);
        Cassandra.Client client = new Cassandra.Client(proto);
        tr.open();
        try
        {
            // Row key: the blog's name; UTF-8 text is fine as a key.
            String key_user_id = "逖靖寒的世界";

            // insert data: column "网址" (URL) and column "作者" (author)
            long timestamp = System.currentTimeMillis();
            client.insert("Keyspace1",
                key_user_id,
                new ColumnPath("Standard1", null, "网址".getBytes("UTF-8")),
                "http://gpcuster.cnblogs.com".getBytes("UTF-8"),
                timestamp,
                ConsistencyLevel.ONE);
            client.insert("Keyspace1",
                key_user_id,
                new ColumnPath("Standard1", null, "作者".getBytes("UTF-8")),
                "逖靖寒".getBytes("UTF-8"),
                timestamp,
                ConsistencyLevel.ONE);

            // read single column: use a column that was actually inserted above,
            // otherwise get() throws NotFoundException
            ColumnPath path = new ColumnPath("Standard1", null, "网址".getBytes("UTF-8"));
            System.out.println(client.get("Keyspace1", key_user_id, path,
                ConsistencyLevel.ONE));

            // read entire row: empty start/finish means no bounds, at most 10 columns
            SlicePredicate predicate = new SlicePredicate(null,
                new SliceRange(new byte[0], new byte[0], false, 10));
            ColumnParent parent = new ColumnParent("Standard1", null);
            List<ColumnOrSuperColumn> results = client.get_slice("Keyspace1", key_user_id,
                parent, predicate, ConsistencyLevel.ONE);
            for (ColumnOrSuperColumn result : results)
            {
                Column column = result.column;
                System.out.println(new String(column.name, "UTF-8") + " -> "
                    + new String(column.value, "UTF-8"));
            }
        }
        finally
        {
            // Always close the socket, even if a call fails.
            tr.close();
        }
    }
}
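Pros and cons
Pros: simple and efficient.
Cons: limited functionality. It provides no connection pooling, error handling or similar features, so it is not suitable for direct use in production.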
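To make the missing connection pool concrete, here is a rough idea of what a wrapper adds, sketched as a generic blocking pool. SimplePool and its borrow/release methods are hypothetical and are not Hector's API. In a real client the pooled objects would be open Thrift connections (for example a TSocket plus a Cassandra.Client), not the placeholder strings used here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    /** Pre-creates size objects (e.g. connections) using the given factory. */
    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Waits up to timeoutMillis for a free object; throws if the pool stays exhausted. */
    T borrow(long timeoutMillis) {
        try {
            T item = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (item == null) {
                throw new IllegalStateException("pool exhausted");
            }
            return item;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pool", e);
        }
    }

    /** Returns an object to the pool; call this from a finally block. */
    void release(T item) {
        idle.offer(item);
    }

    int available() {
        return idle.size();
    }

    public static void main(String[] args) {
        int[] counter = {0};
        // Strings stand in for real Thrift connections in this sketch.
        SimplePool<String> pool = new SimplePool<>(2, () -> "conn-" + (counter[0]++));
        String conn = pool.borrow(100);
        try {
            System.out.println("using " + conn);
        } finally {
            pool.release(conn);
        }
        System.out.println("available: " + pool.available());
    }
}
```

The borrow-in-try, release-in-finally pattern is the part that matters. It is the same pattern the Hector example below follows with its own pool.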
Hector
Hector is a Java client that wraps the Thrift Java API and provides a higher-level abstraction.
Example program
package me.prettyprint.cassandra.service;

import static me.prettyprint.cassandra.utils.StringUtils.bytes;
import static me.prettyprint.cassandra.utils.StringUtils.string;
import org.apache.cassandra.service.Column;
import org.apache.cassandra.service.ColumnPath;

public class ExampleClient {
    public static void main(String[] args) throws IllegalStateException,
            PoolExhaustedException, Exception {
        CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get();
        CassandraClient client = pool.borrowClient("localhost", 9160);
        // A load balanced version would look like this:
        // CassandraClient client = pool.borrowClient(new String[] {"cas1:9160",
        //     "cas2:9160", "cas3:9160"});
        try {
            Keyspace keyspace = client.getKeyspace("Keyspace1");
            // column name "网址" means "URL"
            ColumnPath columnPath = new ColumnPath("Standard1", null, bytes("网址"));
            // insert
            keyspace.insert("逖靖寒的世界", columnPath, bytes("http://gpcuster.cnblogs.com"));
            // read
            Column col = keyspace.getColumn("逖靖寒的世界", columnPath);
            System.out.println("Read from cassandra: " + string(col.getValue()));
        } finally {
            // return the client to the pool, even if a call failed
            pool.releaseClient(client);
        }
    }
}
my name is Kevin
3. Modify an existing environment variable
Continuing from the previous example:
$ MYNAME="change name to jack"
$ echo $MYNAME
change name to jack
4. Use the env command to display all environment variables
$ env
HOSTNAME=localhost.localdomain
SHELL=/bin/bash
TERM=xterm
HISTSIZE=1000
Super columns
Super columns are a great way to store one-to-many indexes to other records: make
the sub column names TimeUUIDs (or whatever you'd like to use to sort the index),
and have the values be the foreign key.
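For example, a user's friends: the sub column name is the time the friend was added, and the sub column value is the friend's ID, which can serve as a foreign key into the friends' profile table.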
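To make the friend-list example concrete, picture one super column per user whose sub columns are ordered by the time each friend was added. Because sub columns are kept sorted by name, a TreeMap keyed by the join time is a reasonable in-memory stand-in. The sketch uses plain millisecond longs instead of TimeUUIDs to stay self-contained. With real TimeUUIDs, two friends added in the same millisecond would not collide. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FriendIndexSketch {
    // user ID -> (time the friend was added -> friend's ID): one "super column" per user.
    private final Map<String, TreeMap<Long, String>> friendsByUser = new TreeMap<>();

    void addFriend(String userId, long addedAtMillis, String friendId) {
        friendsByUser.computeIfAbsent(userId, u -> new TreeMap<>()).put(addedAtMillis, friendId);
    }

    /** The n most recently added friends, newest first (like a reversed slice). */
    List<String> latestFriends(String userId, int n) {
        List<String> result = new ArrayList<>();
        TreeMap<Long, String> friends = friendsByUser.get(userId);
        if (friends == null) {
            return result;
        }
        for (String friendId : friends.descendingMap().values()) {
            if (result.size() == n) {
                break;
            }
            result.add(friendId); // each ID is a "foreign key" into the user records
        }
        return result;
    }

    public static void main(String[] args) {
        FriendIndexSketch index = new FriendIndexSketch();
        index.addFriend("user-1", 1000L, "friend-a");
        index.addFriend("user-1", 3000L, "friend-c");
        index.addFriend("user-1", 2000L, "friend-b");
        System.out.println(index.latestFriends("user-1", 2)); // [friend-c, friend-b]
    }
}
```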
A composite key can stand in for super columns, with the column names being times.
Alternatively, we could preface the status keys with the user key, which has less temporal locality.
If we used user_id:status_id as the status key, we could do range queries on the user fragment
to get tweets-by-user, avoiding the need for a user_timeline super column.
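The composite-key alternative can be sketched the same way. With an order-preserving partitioner, row keys of the form user_id:status_id sort by user first, so all of one user's statuses form one contiguous key range. In the sketch, a TreeMap stands in for the ordered row keys, and the helper names are hypothetical. Zero-padded status IDs are assumed so that string order matches numeric order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class CompositeKeySketch {
    // Row key "user_id:status_id" -> status text, kept in key order
    // the way an order-preserving partitioner would keep it.
    private final TreeMap<String, String> statuses = new TreeMap<>();

    void addStatus(String userId, String paddedStatusId, String text) {
        statuses.put(userId + ":" + paddedStatusId, text);
    }

    /** Range query on the user fragment: every key in [userId + ":", userId + ";"). */
    List<String> tweetsByUser(String userId) {
        // ';' is the character right after ':', so this range is exactly the "userId:" prefix.
        return new ArrayList<>(statuses.subMap(userId + ":", true, userId + ";", false).values());
    }

    public static void main(String[] args) {
        CompositeKeySketch timeline = new CompositeKeySketch();
        timeline.addStatus("alice", "0001", "first tweet");
        timeline.addStatus("bob", "0001", "hello");
        timeline.addStatus("alice", "0002", "second tweet");
        System.out.println(timeline.tweetsByUser("alice")); // [first tweet, second tweet]
    }
}
```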
In column-orientation, the column names are the data
Using TimeUUIDs as column names and JSON-formatted column values solves a number of problems.
http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/
How Twitter uses Cassandra: Twitter's data model and a blog data model
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model
A complete example provided by Digg
http://wiki.apache.org/cassandra/DataModel
http://wiki.apache.org/cassandra/CassandraLimitations
http://www.hellodba.net/2010/02/cassandra.html
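(Chinese translation, not entirely accurate) An introduction to Twitter's data model; useful as a reference.
Changing the schema definition requires two restarts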
https://issues.apache.org/jira/browse/CASSANDRA-44
Creating column families dynamically without restarting the server
http://github.com/NZKoz/cassandra_object
Starting a single-node cluster
Install JDK 6
tar -zxvf cassandra-$VERSION.tgz
cd cassandra-$VERSION
sudo mkdir -p /var/log/cassandra
sudo chown -R `whoami` /var/log/cassandra
sudo mkdir -p /var/lib/cassandra
sudo chown -R `whoami` /var/lib/cassandra
Change the JMX port in /bin/cassandra.in.sh (-Dcom.sun.management.jmxremote.port=8080)
bin/cassandra -f
View the log
tail -f /var/log/cassandra/system.log
Connect with the client
cd /home/bmb/apache-cassandra-0.5.1
bin/cassandra-cli --host 192.168.2.79 --port 9160
cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
Value inserted.
cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
Value inserted.
cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
Value inserted.
cassandra> get Keyspace1.Standard1['jsmith']
(column=age, value=42; timestamp=1249930062801)
(column=first, value=John; timestamp=1249930053103)
(column=last, value=Smith; timestamp=1249930058345)
Returned 3 rows.
cassandra>
Java clients
Raw Thrift
http://apache.freelamp.com/incubator/thrift/0.2.0-incubating/thrift-0.2.0-incubating.tar.gz
Wrappers
http://github.com/charliem/OCM
Building the Java Thrift library manually
cd D:7gPersonalResourcesArchitectureCassandrathrift-0.2.0libjava
ant
hector
http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/
http://github.com/rantav/hector/downloads
OCM
http://github.com/charliem/OCM/downloads
Git
Download Git
http://kernel.org/pub/software/scm/git/git-1.7.0.3.tar.gz
Install
cd /home/bmb/apache-cassandra-0.5.1/git-1.7.0.3
./configure
make
make install
{name: 3, value: "101010101010"},
{name: 123, value: "hello there"},
{name: 976, value: "kjjkbcjkcbbd"},
{name: 832416, value: "kjjkbcjkcbbd"}
If we specify UTF8Type as the comparator
<!--
ColumnFamily is defined in storage-conf.xml
-->
<ColumnFamily CompareWith="UTF8Type" Name="CF_NAME_HERE"/>
the sorted data looks like this:
{name: 123, value: "hello there"},
{name: 3, value: "101010101010"},
{name: 832416, value: "kjjkbcjkcbbd"},
{name: 976, value: "kjjkbcjkcbbd"}
As you can see, different comparator types produce completely different orderings.
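Both orderings can be reproduced with ordinary Java comparators. The sketch assumes that the first ordering comes from a numeric comparator (LongType-style). It also compares Strings rather than the raw bytes Cassandra compares, which gives the same result for ASCII names like these. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class ComparatorSketch {
    /** Numeric order of the column names (LongType-style, assumed for the first listing). */
    static List<String> sortNumeric(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.comparingLong((String s) -> Long.parseLong(s)));
        return sorted;
    }

    /** Character-by-character string order (UTF8Type-style). */
    static List<String> sortUtf8(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("832416", "3", "976", "123");
        System.out.println(sortNumeric(names)); // [3, 123, 976, 832416]
        System.out.println(sortUtf8(names));    // [123, 3, 832416, 976]
    }
}
```

As text, "123" sorts before "3" because the first characters are compared ('1' < '3'), which is exactly the UTF8Type listing above.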
A SuperColumn has an additional sorting dimension, so we can specify CompareSubcolumnsWith to set the comparator for that second dimension.
Suppose our original data is as follows:
{ // first SuperColumn from a Row
name: "workAddress",
// and the columns within it
value: {
street: {name: "street", value: "1234 x street"},
city: {name: "city", value: "san francisco"},
zip: {name: "zip", value: "94107"}
}
},
{ // another SuperColumn from same Row
name: "homeAddress",
// and the columns within it
value: {
street: {name: "street", value: "1234 x street"},
city: {name: "city", value: "san francisco"},
zip: {name: "zip", value: "94107"}
}
}
If we then set both CompareSubcolumnsWith and CompareWith to UTF8Type, the sorted result is:
{
  // "homeAddress" comes first because, compared as UTF-8 strings,
  // "homeAddress" sorts before "workAddress"
  {
    name: "homeAddress",
    // the columns within this SuperColumn are also sorted by their names
    value: {
      city: {name: "city", value: "san francisco"},
      street: {name: "street", value: "1234 x street"},
      zip: {name: "zip", value: "94107"}
    }
  },
  {
    name: "workAddress",
    // the columns within this SuperColumn are also sorted by their names
    value: {
      city: {name: "city", value: "san francisco"},
      street: {name: "street", value: "1234 x street"},
      zip: {name: "zip", value: "94107"}
    }
  }
}
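Both levels of ordering can be mimicked with nested sorted maps. The outer map plays the role of CompareWith (SuperColumn names), and each inner map plays the role of CompareSubcolumnsWith (sub column names). This is only an in-memory model that uses String.compareTo as a stand-in for UTF8Type, and the class name is hypothetical.

```java
import java.util.TreeMap;

final class SuperColumnSortSketch {
    /** Builds the row from the example: SuperColumn name -> (sub column name -> value). */
    static TreeMap<String, TreeMap<String, String>> exampleRow() {
        TreeMap<String, TreeMap<String, String>> row = new TreeMap<>(); // like CompareWith
        for (String superColumn : new String[] {"workAddress", "homeAddress"}) { // unsorted input
            TreeMap<String, String> columns = new TreeMap<>();            // like CompareSubcolumnsWith
            columns.put("zip", "94107");
            columns.put("street", "1234 x street");
            columns.put("city", "san francisco");
            row.put(superColumn, columns);
        }
        return row;
    }

    public static void main(String[] args) {
        exampleRow().forEach((sc, cols) -> System.out.println(sc + " " + cols.keySet()));
        // homeAddress [city, street, zip]
        // workAddress [city, street, zip]
    }
}
```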
One more thing: Cassandra lets us implement our own sort order. All you need to do is extend org.apache.cassandra.db.marshal.IType.
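At its core, a custom sort type is a comparison function over the raw column-name bytes. The sketch below is only a plain java.util.Comparator<byte[]>, not a subclass of the Cassandra class named above, whose exact required methods are not shown here. As a hypothetical example, it orders names case-insensitively after decoding them as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CaseInsensitiveNameComparator implements Comparator<byte[]> {
    @Override
    public int compare(byte[] a, byte[] b) {
        // Decode the raw column names, then compare ignoring case.
        String sa = new String(a, StandardCharsets.UTF_8);
        String sb = new String(b, StandardCharsets.UTF_8);
        return String.CASE_INSENSITIVE_ORDER.compare(sa, sb);
    }

    /** Demo helper: sorts strings by round-tripping them through the byte[] comparator. */
    static List<String> sortNames(List<String> names) {
        List<byte[]> raw = new ArrayList<>();
        for (String n : names) {
            raw.add(n.getBytes(StandardCharsets.UTF_8));
        }
        raw.sort(new CaseInsensitiveNameComparator());
        List<String> out = new ArrayList<>();
        for (byte[] r : raw) {
            out.add(new String(r, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortNames(Arrays.asList("zip", "City", "street"))); // [City, street, zip]
    }
}
```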
References
WTF is a SuperColumn? An Intro to the Cassandra Data Model
DataModel