SlideShare a Scribd company logo
1 of 34
Download to read offline
Introduction to
Tokyo Products

          Mikio Hirabayashi
              <hirarin@gmail.com>
Tokyo Products
• Tokyo Cabinet
  – database library
• Tokyo Tyrant                                    applications
                                                                               Prome
  – database server              custom storage    Tyrant           Dystopia
                                                                                nade


• Tokyo Dystopia                                        Cabinet
  – full-text search engine
                                                      file system
• Tokyo Promenade
  – content management system


• open source
  – released under LGPL
• powerful, portable, practical
  – written in the standard C, optimized to POSIX
Tokyo Cabinet
 - database library -
Features
• modern implementation of DBM
 – key/value database
   • e.g.) DBM, NDBM, GDBM, TDB, CDB, Berkeley DB
 – simple library = process embedded
 – Successor of QDBM
   • C99 and POSIX compatible, using Pthread, mmap, etc...
   • Win32 porting is work-in-progress

• high performance
 – insert: 0.4 sec/1M records (2,500,000 qps)
 – search: 0.33 sec/1M records (3,000,000 qps)
• high concurrency
  – multi-thread safe
  – read/write locking by records
• high scalability
  – hash and B+tree structure = O(1) and O(log N)
  – no actual limit size of a database file (to 8 exabytes)
• transaction
  – write ahead logging and shadow paging
  – ACID properties
• various APIs
  – on-memory list/hash/tree
  – file hash/B+tree/array/table
• script language bindings
  – Perl, Ruby, Java, Lua, Python, PHP, Haskell, Erlang, etc...
TCHDB: Hash Database
• static hashing                  bucket array

  – O(1) time complexity

• separate chaining                key           value

  – binary search tree             key           value

  – balances by the second hash    key           value

• free block pool                  key           value

                                   key           value
  – best fit allocation
  – dynamic defragmentation        key           value


• combines mmap and                key

                                   key
                                                 value

                                                 value
  pwrite/pread                     key           value
  – saves calling system calls
                                   key           value
• compression                      key           value
  – deflate(gzip)/bzip2/custom
TCBDB: B+ Tree Database
• B+ tree                                     key   value
  – O(log N) time complexity   B tree index   key   value

• page caching                                key
                                              key
                                                    value
                                                    value
  – LRU removing
                                              key   value
  – speculative search
                                              key   value
• stands on hash DB                           key   value
  – records pages in hash DB                  key   value
  – succeeds time and space                   key   value
    efficiency                                key   value
• custom comparison                           key   value

  function                                    key   value

  – prefix/range matching                     key   value
                                              key   value
• cursor                                      key   value
  – jump/next/prev                            key   value
TCFDB: Fixed-length Database
• array of fixed-
  length elements          array
  – O(1) time complexity     value   value   value   value
  – natural number keys      value   value   value   value
  – addresses records by     value   value   value   value
    multiple of key          value   value   value   value

• most effective             value

                             value
                                     value

                                     value
                                             value

                                             value
                                                     value

                                                     value
  – bulk load by mmap        value   value   value   value
  – no key storage per       value   value   value   value
    record                   value   value   value   value
  – extremely fast and       value   value   value   value
    concurrent
TCTDB: Table Database
• column based
  – the primary key and named
    columns
  – stands on hash DB                                                    bucket array

• flexible structure                   primary key   name value name value name value
  – no data scheme, no data type
  – various structure for each
    record                             primary key   name value name value name value


• query mechanism                      primary key   name value name value name value
  – various operators matching
    column values                      primary key   name value name value name value
  – lexical/decimal orders by column
    values                             primary key   name value name value name value

• column indexes
  –   implemented with B+ tree         primary key   name value name value name value

  –   typed as string/number
  –   inverted index of token/q-gram                            column index
  –   query optimizer
On-memory Structures
• TCXSTR: extensible string
  – concatenation, formatted allocation
• TCLIST: array list (dequeue)
  – random access by index
  – push/pop, unshift/shift, insert/remove
• TCMAP: map of hash table
  – insert/remove/search
  – iterator by order of insertion
• TCTREE: map of ordered tree
  – insert/remove/search
  – iterator by order of comparison function
Other Mechanisms
• abstract database
  – common interface of 6 schema
     • on-memory hash, on-memory tree
     • file hash, file B+tree, file array, file table
  – decides the concrete scheme in runtime

• remote database
  – network interface of the abstract database
  – yes, it's Tokyo Tyrant!

• miscellaneous utilities
  – string processing, filesystem operation
  – memory pool, encoding/decoding
Example Code
#include   <tcutil.h>
#include   <tchdb.h>
#include   <stdlib.h>
#include   <stdbool.h>
#include   <stdint.h>

int main(int argc, char **argv){

  TCHDB *hdb;
  int ecode;
  char *key, *value;

  /* create the object */
  hdb = tchdbnew();

  /* open the database */
  if(!tchdbopen(hdb, "casket.hdb", HDBOWRITER | HDBOCREAT)){
    ecode = tchdbecode(hdb);                                       /* traverse records */
    fprintf(stderr, "open error: %s¥n", tchdberrmsg(ecode));       tchdbiterinit(hdb);
  }                                                                while((key = tchdbiternext2(hdb)) != NULL){
                                                                     value = tchdbget2(hdb, key);
  /* store records */                                                if(value){
  if(!tchdbput2(hdb, "foo", "hop") ||                                  printf("%s:%s¥n", key, value);
     !tchdbput2(hdb, "bar", "step") ||                                 free(value);
     !tchdbput2(hdb, "baz", "jump")){                                }
    ecode = tchdbecode(hdb);                                         free(key);
    fprintf(stderr, "put error: %s¥n", tchdberrmsg(ecode));        }
  }
                                                                   /* close the database */
  /* retrieve records */                                           if(!tchdbclose(hdb)){
  value = tchdbget2(hdb, "foo");                                     ecode = tchdbecode(hdb);
  if(value){                                                         fprintf(stderr, "close error: %s¥n", tchdberrmsg(ecode));
    printf("%s¥n", value);                                         }
    free(value);
  } else {                                                         /* delete the object */
    ecode = tchdbecode(hdb);                                       tchdbdel(hdb);
    fprintf(stderr, "get error: %s¥n", tchdberrmsg(ecode));
  }                                                                return 0;
                                                               }
Tokyo Tyrant
- database server -
Features
• network server of Tokyo Cabinet
 – client/server model
 – multi applications can access one database
 – effective binary protocol
• compatible protocols
 – supports memcached protocol and HTTP
 – available from most popular languages
• high concurrency/performance
 – resolves "c10k" with epoll/kqueue/eventports
 – 17.2 sec/1M queries (58,000 qps)
• high availability
  – hot backup and update log
  – asynchronous replication between servers
• various database schema
  – using the abstract database API of Tokyo Cabinet
• effective operations
  – no-reply updating, multi-record retrieval
  – atomic increment
• Lua extension
  – defines arbitrary database operations
  – atomic operation by record locking
• pure script language interfaces
  – Perl, Ruby, Java, Python, PHP, Erlang, etc...
Asynchronous Replication
 master and slaves                                 dual master
 (load balancing)                                  (fail over)

                          write query

      master server                      client
                                                          client
            database                                                 query if the master is dead
                                                  query
                            read query
           update log                             active master                 standby master
                            with load balancing

                                                      database                        database
replicate
the difference
slave server            slave server                  update log                     update log
                                                                   replicate
      database              database
                                                                   the difference



     update log             update log
Thread Pool Model
                                 epoll/kqueue            listen
accept the client connection
if the event is about the listener           queue   first of all, the listening socket is enqueued into
                                                     the epoll queue
           accept           epoll_ctl(add)
                                                                           queue back if keep-alive
                            epoll_wait

                                                         task manager
                            epoll_ctl(del)
                                                                  queue
                                                     enqueue
          move the readable client socket
          from the epoll queue to the task queue

                                                       deque
                                                                                 worker thread

                                                                                 worker thread

                                                     do each task                worker thread
Lua Extention
• defines DB operations as Lua functions
  – clients send the function name and record data
  – the server returns the return value of the function

• options about atomicity
  – no locking / record locking / global locking


  front end           request           back end
                      - function name
                      - key data            Tokyo Tyrant
                      - value data
    Clients
                                            Tokyo Cabinet   Lua processor
                     response
                     - result data

                                                                  script
                                              database          user-defined
                                                                operations
case: Timestamp DB at mixi.jp
• 20 million records               mod_perl
  – each record size is 20 bytes
                                      home.pl          update
• more than 10,000                  show_friend.pl
                                                                 TT (active)


  updates per sec.                   view_diary.pl
                                                                  database

  – keeps 10,000 connections
                                         search.pl               replication
• dual master                            other pages

  replication                                                     TT (standby)

  – each server is only one                                         database
                                                         fetch
• memcached                         list_friend.pl



  compatible protocol
                                    list_bookmark.pl



  – reuses existing Perl clients
case: Cache for Big Storages
• works as proxy                   clients                        1. inserts to the storage
  – mediates insert/search                                        2. inserts to the cache

  – write through, read through

• Lua extension
                                                 Tokyo Tyrant          MySQL/hBase


  – atomic operation by record                    atomic insert
                                                                          database
    locking                                             Lua
  – uses LuaSocket to access
    the storage                                                           database
                                                 atomic search
• proper DB scheme                                      Lua
  – TCMDB: for generic cache                                              database

  – TCNDB: for biased access
  – TCHDB: for large records                           cache
                                                                          database
    such as image                 1. retrieves from the cache
  – TCFDB: for small records           if found, return
                                  2. retrieves from the storage
    such as timestamp             3. inserts to the cache
Example Code
#include   <tcrdb.h>
#include   <stdlib.h>
#include   <stdbool.h>
#include   <stdint.h>

int main(int argc, char **argv){

  TCRDB *rdb;
  int ecode;
  char *value;

  /* create the object */
  rdb = tcrdbnew();

  /* connect to the server */
  if(!tcrdbopen(rdb, "localhost", 1978)){
    ecode = tcrdbecode(rdb);
    fprintf(stderr, "open error: %s¥n", tcrdberrmsg(ecode));
  }

  /* store records */
  if(!tcrdbput2(rdb, "foo", "hop") ||
     !tcrdbput2(rdb, "bar", "step") ||
     !tcrdbput2(rdb, "baz", "jump")){
    ecode = tcrdbecode(rdb);
    fprintf(stderr, "put error: %s¥n", tcrdberrmsg(ecode));
  }                                                                /* close the connection */
                                                                   if(!tcrdbclose(rdb)){
  /* retrieve records */                                             ecode = tcrdbecode(rdb);
  value = tcrdbget2(rdb, "foo");                                     fprintf(stderr, "close error: %s¥n", tcrdberrmsg(ecode));
  if(value){                                                       }
    printf("%s¥n", value);
    free(value);                                                   /* delete the object */
  } else {                                                         tcrdbdel(rdb);
    ecode = tcrdbecode(rdb);
    fprintf(stderr, "get error: %s¥n", tcrdberrmsg(ecode));        return 0;
  }                                                            }
Tokyo Dystopia
- full-text search engine -
Features
• full-text search engine
  – manages databases of Tokyo Cabinet as an inverted
    index

• combines two tokenizers
  – character N-gram (bi-gram) method
     • perfect recall ratio
  – simple word by outer language processor
     • high accuracy and high performance

• high performance/scalability
  – handles more than 10 million documents
  – searches in milliseconds
• optimized to professional use
  – layered architecture of APIs
  – no embedded scoring system
    • to combine outer scoring system
  – no text filter, no crawler, no language
    processor
• convenient utilities
  – multilingualism with Unicode
  – set operations
  – phrase matching, prefix matching, suffix
    matching, and token matching
  – command line utilities
Inverted Index
• stands on key/value database
  – key = token
     • N-gram or simple word
  – value = occurrence data (posting list)
     • list of pairs of document number and offset in the document

• uses B+ tree database
  – reduces write operations into the disk device
  – enables common prefix search for tokens
  – delta encoding and deflate compression

       ID:21            text: "abracadabra"
          a    -   21:10          ca - 21:1, 21:8
          ab   -   21:0,21:7      da - 21:4
          ac   -   21:3           ra - 21:2, 21:9
          br   -   21:5
Layered Architecture
• character N-gram index
  – "q-gram index" (only index), and "indexed database"
  – uses embedded tokenizer

• word index
  – "word index" (only index), and "tagged index"
  – uses outer tokenizer
                                                Applications
                             Character N-gram Index            Tagging Index

                               indexed database         tagged database

                                 q-gram index              word index

                                              Tokyo Cabinet
case: friend search at mixi.jp
• 20 million records
  – each record size is 1K bytes                   query               user interface
  – name and self introduction
                                      merger
• more than 100 qps                       TT's        social               query

• attribute narrowing                    cache        graph

  – gender, address, birthday                                         searcher
  – multiple sort orders              copy the social graph
                                                                        inverted   attribute

• distributed processing
                                                                          index       DB


  – more than 10 servers              indexer
  – indexer, searchers, merger                                      copy the index and the DB
                                        inverted     attribute
• ranking by social                       index         DB


  graph                                                          dump profile data

  – the merger scores the result by
    following the friend links                                    profile DB
Example Code
#include   <dystopia.h>
#include   <stdlib.h>
#include   <stdbool.h>
#include   <stdint.h>

int main(int argc, char **argv){
  TCIDB *idb;
  int ecode, rnum, i;
  uint64_t *result;
  char *text;
                                                                   /* search records */
  /* create the object */                                          result = tcidbsearch2(idb, "john || thomas", &rnum);
  idb = tcidbnew();                                                if(result){
                                                                     for(i = 0; i < rnum; i++){
  /* open the database */                                              text = tcidbget(idb, result[i]);
  if(!tcidbopen(idb, "casket", IDBOWRITER | IDBOCREAT)){               if(text){
    ecode = tcidbecode(idb);                                             printf("%d¥t%s¥n", (int)result[i], text);
    fprintf(stderr, "open error: %s¥n", tcidberrmsg(ecode));             free(text);
  }                                                                    }
                                                                     }
  /* store records */                                                free(result);
  if(!tcidbput(idb, 1, "George Washington") ||                     } else {
     !tcidbput(idb, 2, "John Adams") ||                              ecode = tcidbecode(idb);
     !tcidbput(idb, 3, "Thomas Jefferson")){                         fprintf(stderr, "search error: %s¥n", tcidberrmsg(ecode));
    ecode = tcidbecode(idb);                                       }
    fprintf(stderr, "put error: %s¥n", tcidberrmsg(ecode));
  }                                                                /* close the database */
                                                                   if(!tcidbclose(idb)){
                                                                     ecode = tcidbecode(idb);
                                                                     fprintf(stderr, "close error: %s¥n", tcidberrmsg(ecode));
                                                                   }

                                                                   /* delete the object */
                                                                   tcidbdel(idb);

                                                                   return 0;
                                                               }
Tokyo Promenade
- content management system -
Features
• content management system
  – manages Web contents easily with a browser
  – available as BBS, Blog, and Wiki

• simple and logical interface
  – aims at conciseness like LaTeX
  – optimized for text browsers such as w3m and Lynx
  – complying with XHTML 1.0 and considering WCAG 1.0

• high performance/throughput
  – implemented in pure C
  – uses Tokyo Cabinet and supports FastCGI
  – 0.836ms/view (more than 1,000 qps)
• sufficient functionality
  – simple Wiki formatting
  – file uploader and manager
  – user authentication by the login form
  – guest comment authorization by a riddle
  – supports the sidebar navigation
  – full-text/attribute search, calendar view
  – Atom feed
• flexible customizability
  – thorough separation of logic and presentation
  – template file to generate the output
  – server side scripting by the Lua extension
  – post processing by outer commands
Example Code
#!   Introduction to Tokyo Cabinet
#c   2009-11-05T18:58:39+09:00
#m   2009-11-05T18:58:39+09:00
#o   mikio
#t   database,programming,tokyocabinet

This article describes what is [[Tokyo
Cabinet|http://1978th.net/tokyocabinet/]] and
how to use it.

@ upfile:1257415094-logo-ja.png

* Features

- modern implementation of DBM
-- key/value database
-- e.g.) DBM, NDBM, GDBM, TDB, CDB, Berkeley
DB
- simple library = process embedded
- Successor of QDBM
-- C99 and POSIX compatible, using Pthread,
mmap, etc...
-- Win32 porting is work-in-progress
- high performance
- insert: 0.4 sec/1M records (2,500,000 qps)
- search: 0.33 sec/1M records (3,000,000 qps)
innovating more and yet more...
               http://1978th.net/
Introduction to Tokyo Products

More Related Content

What's hot

Docker Compose入門~今日から始めるComposeの初歩からswarm mode対応まで
Docker Compose入門~今日から始めるComposeの初歩からswarm mode対応までDocker Compose入門~今日から始めるComposeの初歩からswarm mode対応まで
Docker Compose入門~今日から始めるComposeの初歩からswarm mode対応までMasahito Zembutsu
 
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムMySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムKouhei Sutou
 
DockerとPodmanの比較
DockerとPodmanの比較DockerとPodmanの比較
DockerとPodmanの比較Akihiro Suda
 
いまさら聞けないselectあれこれ
いまさら聞けないselectあれこれいまさら聞けないselectあれこれ
いまさら聞けないselectあれこれlestrrat
 
BuildKitによる高速でセキュアなイメージビルド
BuildKitによる高速でセキュアなイメージビルドBuildKitによる高速でセキュアなイメージビルド
BuildKitによる高速でセキュアなイメージビルドAkihiro Suda
 
TensorFlow XLAは、 中で何をやっているのか?
TensorFlow XLAは、 中で何をやっているのか?TensorFlow XLAは、 中で何をやっているのか?
TensorFlow XLAは、 中で何をやっているのか?Mr. Vengineer
 
急速に進化を続けるCNIプラグイン Antrea
急速に進化を続けるCNIプラグイン Antrea 急速に進化を続けるCNIプラグイン Antrea
急速に進化を続けるCNIプラグイン Antrea Motonori Shindo
 
Tanzu Mission Control における Open Policy Agent (OPA) の利用
Tanzu Mission Control における Open Policy Agent (OPA) の利用Tanzu Mission Control における Open Policy Agent (OPA) の利用
Tanzu Mission Control における Open Policy Agent (OPA) の利用Motonori Shindo
 
トランザクションをSerializableにする4つの方法
トランザクションをSerializableにする4つの方法トランザクションをSerializableにする4つの方法
トランザクションをSerializableにする4つの方法Kumazaki Hiroki
 
コンテナとimmutableとわたし。あとセキュリティ。(Kubernetes Novice Tokyo #15 発表資料)
コンテナとimmutableとわたし。あとセキュリティ。(Kubernetes Novice Tokyo #15 発表資料)コンテナとimmutableとわたし。あとセキュリティ。(Kubernetes Novice Tokyo #15 発表資料)
コンテナとimmutableとわたし。あとセキュリティ。(Kubernetes Novice Tokyo #15 発表資料)NTT DATA Technology & Innovation
 
RTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMP
RTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMPRTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMP
RTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMPMasashi Shibata
 
DSIRNLP #3 LZ4 の速さの秘密に迫ってみる
DSIRNLP #3 LZ4 の速さの秘密に迫ってみるDSIRNLP #3 LZ4 の速さの秘密に迫ってみる
DSIRNLP #3 LZ4 の速さの秘密に迫ってみるAtsushi KOMIYA
 
kubernetes初心者がKnative Lambda Runtime触ってみた(Kubernetes Novice Tokyo #13 発表資料)
kubernetes初心者がKnative Lambda Runtime触ってみた(Kubernetes Novice Tokyo #13 発表資料)kubernetes初心者がKnative Lambda Runtime触ってみた(Kubernetes Novice Tokyo #13 発表資料)
kubernetes初心者がKnative Lambda Runtime触ってみた(Kubernetes Novice Tokyo #13 発表資料)NTT DATA Technology & Innovation
 
10分でわかる Cilium と XDP / BPF
10分でわかる Cilium と XDP / BPF10分でわかる Cilium と XDP / BPF
10分でわかる Cilium と XDP / BPFShuji Yamada
 
Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門Etsuji Nakai
 
Grafana LokiではじめるKubernetesロギングハンズオン(NTT Tech Conference #4 ハンズオン資料)
Grafana LokiではじめるKubernetesロギングハンズオン(NTT Tech Conference #4 ハンズオン資料)Grafana LokiではじめるKubernetesロギングハンズオン(NTT Tech Conference #4 ハンズオン資料)
Grafana LokiではじめるKubernetesロギングハンズオン(NTT Tech Conference #4 ハンズオン資料)NTT DATA Technology & Innovation
 
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Age
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Ageクラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Age
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native AgeYoichi Kawasaki
 
実環境にTerraform導入したら驚いた
実環境にTerraform導入したら驚いた実環境にTerraform導入したら驚いた
実環境にTerraform導入したら驚いたAkihiro Kuwano
 

What's hot (20)

Docker Compose入門~今日から始めるComposeの初歩からswarm mode対応まで
Docker Compose入門~今日から始めるComposeの初歩からswarm mode対応までDocker Compose入門~今日から始めるComposeの初歩からswarm mode対応まで
Docker Compose入門~今日から始めるComposeの初歩からswarm mode対応まで
 
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システムMySQL・PostgreSQLだけで作る高速あいまい全文検索システム
MySQL・PostgreSQLだけで作る高速あいまい全文検索システム
 
DockerとPodmanの比較
DockerとPodmanの比較DockerとPodmanの比較
DockerとPodmanの比較
 
HTTP/2 入門
HTTP/2 入門HTTP/2 入門
HTTP/2 入門
 
いまさら聞けないselectあれこれ
いまさら聞けないselectあれこれいまさら聞けないselectあれこれ
いまさら聞けないselectあれこれ
 
Docker Compose 徹底解説
Docker Compose 徹底解説Docker Compose 徹底解説
Docker Compose 徹底解説
 
BuildKitによる高速でセキュアなイメージビルド
BuildKitによる高速でセキュアなイメージビルドBuildKitによる高速でセキュアなイメージビルド
BuildKitによる高速でセキュアなイメージビルド
 
TensorFlow XLAは、 中で何をやっているのか?
TensorFlow XLAは、 中で何をやっているのか?TensorFlow XLAは、 中で何をやっているのか?
TensorFlow XLAは、 中で何をやっているのか?
 
急速に進化を続けるCNIプラグイン Antrea
急速に進化を続けるCNIプラグイン Antrea 急速に進化を続けるCNIプラグイン Antrea
急速に進化を続けるCNIプラグイン Antrea
 
Tanzu Mission Control における Open Policy Agent (OPA) の利用
Tanzu Mission Control における Open Policy Agent (OPA) の利用Tanzu Mission Control における Open Policy Agent (OPA) の利用
Tanzu Mission Control における Open Policy Agent (OPA) の利用
 
トランザクションをSerializableにする4つの方法
トランザクションをSerializableにする4つの方法トランザクションをSerializableにする4つの方法
トランザクションをSerializableにする4つの方法
 
コンテナとimmutableとわたし。あとセキュリティ。(Kubernetes Novice Tokyo #15 発表資料)
コンテナとimmutableとわたし。あとセキュリティ。(Kubernetes Novice Tokyo #15 発表資料)コンテナとimmutableとわたし。あとセキュリティ。(Kubernetes Novice Tokyo #15 発表資料)
コンテナとimmutableとわたし。あとセキュリティ。(Kubernetes Novice Tokyo #15 発表資料)
 
RTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMP
RTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMPRTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMP
RTMPのはなし - RTMP1.0の仕様とコンセプト / Concepts and Specification of RTMP
 
DSIRNLP #3 LZ4 の速さの秘密に迫ってみる
DSIRNLP #3 LZ4 の速さの秘密に迫ってみるDSIRNLP #3 LZ4 の速さの秘密に迫ってみる
DSIRNLP #3 LZ4 の速さの秘密に迫ってみる
 
kubernetes初心者がKnative Lambda Runtime触ってみた(Kubernetes Novice Tokyo #13 発表資料)
kubernetes初心者がKnative Lambda Runtime触ってみた(Kubernetes Novice Tokyo #13 発表資料)kubernetes初心者がKnative Lambda Runtime触ってみた(Kubernetes Novice Tokyo #13 発表資料)
kubernetes初心者がKnative Lambda Runtime触ってみた(Kubernetes Novice Tokyo #13 発表資料)
 
10分でわかる Cilium と XDP / BPF
10分でわかる Cilium と XDP / BPF10分でわかる Cilium と XDP / BPF
10分でわかる Cilium と XDP / BPF
 
Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門Linux女子部 systemd徹底入門
Linux女子部 systemd徹底入門
 
Grafana LokiではじめるKubernetesロギングハンズオン(NTT Tech Conference #4 ハンズオン資料)
Grafana LokiではじめるKubernetesロギングハンズオン(NTT Tech Conference #4 ハンズオン資料)Grafana LokiではじめるKubernetesロギングハンズオン(NTT Tech Conference #4 ハンズオン資料)
Grafana LokiではじめるKubernetesロギングハンズオン(NTT Tech Conference #4 ハンズオン資料)
 
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Age
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Ageクラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Age
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Age
 
実環境にTerraform導入したら驚いた
実環境にTerraform導入したら驚いた実環境にTerraform導入したら驚いた
実環境にTerraform導入したら驚いた
 

Similar to Introduction to Tokyo Products

Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataRoger Xia
 
Doug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop EcosystemDoug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop EcosystemCloudera, Inc.
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...SignalFx
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsKorea Sdec
 
Ruby on Rails & PostgreSQL - v2
Ruby on Rails & PostgreSQL - v2Ruby on Rails & PostgreSQL - v2
Ruby on Rails & PostgreSQL - v2John Ashmead
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014Avinash Ramineni
 
第17回Cassandra勉強会: MyCassandra
第17回Cassandra勉強会: MyCassandra第17回Cassandra勉強会: MyCassandra
第17回Cassandra勉強会: MyCassandraShun Nakamura
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Spark After Dark - LA Apache Spark Users Group - Feb 2015
Spark After Dark - LA Apache Spark Users Group - Feb 2015Spark After Dark - LA Apache Spark Users Group - Feb 2015
Spark After Dark - LA Apache Spark Users Group - Feb 2015Chris Fregly
 
Spark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of DatabricksSpark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of DatabricksData Con LA
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014clairvoyantllc
 
Hash Functions FTW
Hash Functions FTWHash Functions FTW
Hash Functions FTWsunnygleason
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesHaohui Mai
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresDataWorks Summit
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkThoughtWorks
 
Modern classification techniques
Modern classification techniquesModern classification techniques
Modern classification techniquesmark_landry
 

Similar to Introduction to Tokyo Products (20)

Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Doug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop EcosystemDoug Cutting on the State of the Hadoop Ecosystem
Doug Cutting on the State of the Hadoop Ecosystem
 
Hadoop london
Hadoop londonHadoop london
Hadoop london
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
 
SDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and modelsSDEC2011 NoSQL concepts and models
SDEC2011 NoSQL concepts and models
 
Ruby on Rails & PostgreSQL - v2
Ruby on Rails & PostgreSQL - v2Ruby on Rails & PostgreSQL - v2
Ruby on Rails & PostgreSQL - v2
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
 
第17回Cassandra勉強会: MyCassandra
第17回Cassandra勉強会: MyCassandra第17回Cassandra勉強会: MyCassandra
第17回Cassandra勉強会: MyCassandra
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Spark After Dark - LA Apache Spark Users Group - Feb 2015
Spark After Dark - LA Apache Spark Users Group - Feb 2015Spark After Dark - LA Apache Spark Users Group - Feb 2015
Spark After Dark - LA Apache Spark Users Group - Feb 2015
 
Spark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of DatabricksSpark after Dark by Chris Fregly of Databricks
Spark after Dark by Chris Fregly of Databricks
 
MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014MongoDB Replication fundamentals - Desert Code Camp - October 2014
MongoDB Replication fundamentals - Desert Code Camp - October 2014
 
Hash Functions FTW
Hash Functions FTWHash Functions FTW
Hash Functions FTW
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of Files
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software Framework
 
Modern classification techniques
Modern classification techniquesModern classification techniques
Modern classification techniques
 
Hadoop
HadoopHadoop
Hadoop
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 

Introduction to Tokyo Products

  • 1. Introduction to Tokyo Products Mikio Hirabayashi <hirarin@gmail.com>
  • 2. Tokyo Products • Tokyo Cabinet – database library • Tokyo Tyrant applications Prome – database server custom storage Tyrant Dystopia nade • Tokyo Dystopia Cabinet – full-text search engine file system • Tokyo Promenade – content management system • open source – released under LGPL • powerful, portable, practical – written in the standard C, optimized to POSIX
  • 3. Tokyo Cabinet - database library -
  • 4. Features • modern implementation of DBM – key/value database • e.g.) DBM, NDBM, GDBM, TDB, CDB, Berkeley DB – simple library = process embedded – Successor of QDBM • C99 and POSIX compatible, using Pthread, mmap, etc... • Win32 porting is work-in-progress • high performance – insert: 0.4 sec/1M records (2,500,000 qps) – search: 0.33 sec/1M records (3,000,000 qps)
  • 5. • high concurrency – multi-thread safe – read/write locking by records • high scalability – hash and B+tree structure = O(1) and O(log N) – no actual limit size of a database file (to 8 exabytes) • transaction – write ahead logging and shadow paging – ACID properties • various APIs – on-memory list/hash/tree – file hash/B+tree/array/table • script language bindings – Perl, Ruby, Java, Lua, Python, PHP, Haskell, Erlang, etc...
  • 6. TCHDB: Hash Database • static hashing bucket array – O(1) time complexity • separate chaining key value – binary search tree key value – balances by the second hash key value • free block pool key value key value – best fit allocation – dynamic defragmentation key value • combines mmap and key key value value pwrite/pread key value – saves calling system calls key value • compression key value – deflate(gzip)/bzip2/custom
  • 7. TCBDB: B+ Tree Database • B+ tree key value – O(log N) time complexity B tree index key value • page caching key key value value – LRU removing key value – speculative search key value • stands on hash DB key value – records pages in hash DB key value – succeeds time and space key value efficiency key value • custom comparison key value function key value – prefix/range matching key value key value • cursor key value – jump/next/prev key value
  • 8. TCFDB: Fixed-length Database • array of fixed- length elements array – O(1) time complexity value value value value – natural number keys value value value value – addresses records by value value value value multiple of key value value value value • most effective value value value value value value value value – bulk load by mmap value value value value – no key storage per value value value value record value value value value – extremely fast and value value value value concurrent
  • 9. TCTDB: Table Database • column based – the primary key and named columns – stands on hash DB bucket array • flexible structure primary key name value name value name value – no data scheme, no data type – various structure for each record primary key name value name value name value • query mechanism primary key name value name value name value – various operators matching column values primary key name value name value name value – lexical/decimal orders by column values primary key name value name value name value • column indexes – implemented with B+ tree primary key name value name value name value – typed as string/number – inverted index of token/q-gram column index – query optimizer
  • 10. On-memory Structures • TCXSTR: extensible string – concatenation, formatted allocation • TCLIST: array list (dequeue) – random access by index – push/pop, unshift/shift, insert/remove • TCMAP: map of hash table – insert/remove/search – iterator by order of insertion • TCTREE: map of ordered tree – insert/remove/search – iterator by order of comparison function
  • 11. Other Mechanisms • abstract database – common interface of 6 schema • on-memory hash, on-memory tree • file hash, file B+tree, file array, file table – decides the concrete scheme in runtime • remote database – network interface of the abstract database – yes, it's Tokyo Tyrant! • miscellaneous utilities – string processing, filesystem operation – memory pool, encoding/decoding
  • 12. Example Code #include <tcutil.h> #include <tchdb.h> #include <stdlib.h> #include <stdbool.h> #include <stdint.h> int main(int argc, char **argv){ TCHDB *hdb; int ecode; char *key, *value; /* create the object */ hdb = tchdbnew(); /* open the database */ if(!tchdbopen(hdb, "casket.hdb", HDBOWRITER | HDBOCREAT)){ ecode = tchdbecode(hdb); /* traverse records */ fprintf(stderr, "open error: %s¥n", tchdberrmsg(ecode)); tchdbiterinit(hdb); } while((key = tchdbiternext2(hdb)) != NULL){ value = tchdbget2(hdb, key); /* store records */ if(value){ if(!tchdbput2(hdb, "foo", "hop") || printf("%s:%s¥n", key, value); !tchdbput2(hdb, "bar", "step") || free(value); !tchdbput2(hdb, "baz", "jump")){ } ecode = tchdbecode(hdb); free(key); fprintf(stderr, "put error: %s¥n", tchdberrmsg(ecode)); } } /* close the database */ /* retrieve records */ if(!tchdbclose(hdb)){ value = tchdbget2(hdb, "foo"); ecode = tchdbecode(hdb); if(value){ fprintf(stderr, "close error: %s¥n", tchdberrmsg(ecode)); printf("%s¥n", value); } free(value); } else { /* delete the object */ ecode = tchdbecode(hdb); tchdbdel(hdb); fprintf(stderr, "get error: %s¥n", tchdberrmsg(ecode)); } return 0; }
  • 14. Features • network server of Tokyo Cabinet – client/server model – multi applications can access one database – effective binary protocol • compatible protocols – supports memcached protocol and HTTP – available from most popular languages • high concurrency/performance – resolves "c10k" with epoll/kqueue/eventports – 17.2 sec/1M queries (58,000 qps)
  • 15. • high availability – hot backup and update log – asynchronous replication between servers • various database schema – using the abstract database API of Tokyo Cabinet • effective operations – no-reply updating, multi-record retrieval – atomic increment • Lua extension – defines arbitrary database operations – atomic operation by record locking • pure script language interfaces – Perl, Ruby, Java, Python, PHP, Erlang, etc...
  • 16. Asynchronous Replication master and slaves dual master (load balancing) (fail over) write query master server client client database query if the master is dead query read query update log active master standby master with load balancing database database replicate the difference slave server slave server update log update log replicate database database the difference update log update log
  • 17. Thread Pool Model epoll/kqueue listen accept the client connection if the event is about the listener queue first of all, the listening socket is enqueued into the epoll queue accept epoll_ctl(add) queue back if keep-alive epoll_wait task manager epoll_ctl(del) queue enqueue move the readable client socket from the epoll queue to the task queue deque worker thread worker thread do each task worker thread
  • 18. Lua Extention • defines DB operations as Lua functions – clients send the function name and record data – the server returns the return value of the function • options about atomicity – no locking / record locking / global locking front end request back end - function name - key data Tokyo Tyrant - value data Clients Tokyo Cabinet Lua processor response - result data script database user-defined operations
  • 19. case: Timestamp DB at mixi.jp • 20 million records mod_perl – each record size is 20 bytes home.pl update • more than 10,000 show_friend.pl TT (active) updates per sec. view_diary.pl database – keeps 10,000 connections search.pl replication • dual master other pages replication TT (standby) – each server is only one database fetch • memcached list_friend.pl compatible protocol list_bookmark.pl – reuses existing Perl clients
  • 20. case: Cache for Big Storages • works as proxy clients 1. inserts to the storage – mediates insert/search 2. inserts to the cache – write through, read through • Lua extension Tokyo Tyrant MySQL/hBase – atomic operation by record atomic insert database locking Lua – uses LuaSocket to access the storage database atomic search • proper DB scheme Lua – TCMDB: for generic cache database – TCNDB: for biased access – TCHDB: for large records cache database such as image 1. retrieves from the cache – TCFDB: for small records if found, return 2. retrieves from the storage such as timestamp 3. inserts to the cache
  • 21. Example Code #include <tcrdb.h> #include <stdlib.h> #include <stdbool.h> #include <stdint.h> int main(int argc, char **argv){ TCRDB *rdb; int ecode; char *value; /* create the object */ rdb = tcrdbnew(); /* connect to the server */ if(!tcrdbopen(rdb, "localhost", 1978)){ ecode = tcrdbecode(rdb); fprintf(stderr, "open error: %s¥n", tcrdberrmsg(ecode)); } /* store records */ if(!tcrdbput2(rdb, "foo", "hop") || !tcrdbput2(rdb, "bar", "step") || !tcrdbput2(rdb, "baz", "jump")){ ecode = tcrdbecode(rdb); fprintf(stderr, "put error: %s¥n", tcrdberrmsg(ecode)); } /* close the connection */ if(!tcrdbclose(rdb)){ /* retrieve records */ ecode = tcrdbecode(rdb); value = tcrdbget2(rdb, "foo"); fprintf(stderr, "close error: %s¥n", tcrdberrmsg(ecode)); if(value){ } printf("%s¥n", value); free(value); /* delete the object */ } else { tcrdbdel(rdb); ecode = tcrdbecode(rdb); fprintf(stderr, "get error: %s¥n", tcrdberrmsg(ecode)); return 0; } }
  • 22. Tokyo Dystopia - full-text search engine -
  • 23. Features • full-text search engine – manages databases of Tokyo Cabinet as an inverted index • combines two tokenizers – character N-gram (bi-gram) method • perfect recall ratio – simple word by outer language processor • high accuracy and high performance • high performance/scalability – handles more than 10 million documents – searches in milliseconds
  • 24. • optimized to professional use – layered architecture of APIs – no embedded scoring system • to combine outer scoring system – no text filter, no crawler, no language processor • convenient utilities – multilingualism with Unicode – set operations – phrase matching, prefix matching, suffix matching, and token matching – command line utilities
  • 25. Inverted Index • stands on key/value database – key = token • N-gram or simple word – value = occurrence data (posting list) • list of pairs of document number and offset in the document • uses B+ tree database – reduces write operations into the disk device – enables common prefix search for tokens – delta encoding and deflate compression ID:21 text: "abracadabra" a - 21:10 ca - 21:1, 21:8 ab - 21:0,21:7 da - 21:4 ac - 21:3 ra - 21:2, 21:9 br - 21:5
  • 26. Layered Architecture • character N-gram index – "q-gram index" (only index), and "indexed database" – uses embedded tokenizer • word index – "word index" (only index), and "tagged index" – uses outer tokenizer Applications Character N-gram Index Tagging Index indexed database tagged database q-gram index word index Tokyo Cabinet
  • 27. case: friend search at mixi.jp • 20 million records – each record size is 1K bytes query user interface – name and self introduction merger • more than 100 qps TT's social query • attribute narrowing cache graph – gender, address, birthday searcher – multiple sort orders copy the social graph inverted attribute • distributed processing index DB – more than 10 servers indexer – indexer, searchers, merger copy the index and the DB inverted attribute • ranking by social index DB graph dump profile data – the merger scores the result by following the friend links profile DB
  • 28. Example Code #include <dystopia.h> #include <stdlib.h> #include <stdbool.h> #include <stdint.h> int main(int argc, char **argv){ TCIDB *idb; int ecode, rnum, i; uint64_t *result; char *text; /* search records */ /* create the object */ result = tcidbsearch2(idb, "john || thomas", &rnum); idb = tcidbnew(); if(result){ for(i = 0; i < rnum; i++){ /* open the database */ text = tcidbget(idb, result[i]); if(!tcidbopen(idb, "casket", IDBOWRITER | IDBOCREAT)){ if(text){ ecode = tcidbecode(idb); printf("%d¥t%s¥n", (int)result[i], text); fprintf(stderr, "open error: %s¥n", tcidberrmsg(ecode)); free(text); } } } /* store records */ free(result); if(!tcidbput(idb, 1, "George Washington") || } else { !tcidbput(idb, 2, "John Adams") || ecode = tcidbecode(idb); !tcidbput(idb, 3, "Thomas Jefferson")){ fprintf(stderr, "search error: %s¥n", tcidberrmsg(ecode)); ecode = tcidbecode(idb); } fprintf(stderr, "put error: %s¥n", tcidberrmsg(ecode)); } /* close the database */ if(!tcidbclose(idb)){ ecode = tcidbecode(idb); fprintf(stderr, "close error: %s¥n", tcidberrmsg(ecode)); } /* delete the object */ tcidbdel(idb); return 0; }
  • 29. Tokyo Promenade - content management system -
  • 30. Features • content management system – manages Web contents easily with a browser – available as BBS, Blog, and Wiki • simple and logical interface – aims at conciseness like LaTeX – optimized for text browsers such as w3m and Lynx – complying with XHTML 1.0 and considering WCAG 1.0 • high performance/throughput – implemented in pure C – uses Tokyo Cabinet and supports FastCGI – 0.836ms/view (more than 1,000 qps)
  • 31. • sufficient functionality – simple Wiki formatting – file uploader and manager – user authentication by the login form – guest comment authorization by a riddle – supports the sidebar navigation – full-text/attribute search, calendar view – Atom feed • flexible customizability – thorough separation of logic and presentation – template file to generate the output – server side scripting by the Lua extension – post processing by outer commands
  • 32. Example Code #! Introduction to Tokyo Cabinet #c 2009-11-05T18:58:39+09:00 #m 2009-11-05T18:58:39+09:00 #o mikio #t database,programming,tokyocabinet This article describes what is [[Tokyo Cabinet|http://1978th.net/tokyocabinet/]] and how to use it. @ upfile:1257415094-logo-ja.png * Features - modern implementation of DBM -- key/value database -- e.g.) DBM, NDBM, GDBM, TDB, CDB, Berkeley DB - simple library = process embedded - Successor of QDBM -- C99 and POSIX compatible, using Pthread, mmap, etc... -- Win32 porting is work-in-progress - high performance - insert: 0.4 sec/1M records (2,500,000 qps) - search: 0.33 sec/1M records (3,000,000 qps)
  • 33. innovating more and yet more... http://1978th.net/