
Polyglot ClickHouse -- ClickHouse SF Meetup Sept 10

Presentation by Robert Hodges introducing the many ways that ClickHouse can read and write data from other systems, including MySQL, Kafka, S3, and Snowflake.

  1. Polyglot ClickHouse, aka integrating with remote data. Robert Hodges, SF Bay ClickHouse Meetup, 10 September 2020
  2. Introduction to Presenter: Robert Hodges, Altinity CEO. 30+ years on DBMS plus virtualization and security; ClickHouse is DBMS #20. Altinity (www.altinity.com) is a leading software and services provider for ClickHouse and a major committer and community sponsor in the US and Western Europe.
  3. Introduction to ClickHouse: SQL optimized for analytics; runs on bare metal to cloud; stores data in columns; parallel and vectorized execution; scales to many petabytes; open source (Apache 2.0); is WAY fast on analytic queries. (Diagram: a table's columns a, b, c, d stored separately.)
  4. What do we mean by a polyglot database? (Diagram: a spectrum from 100% non-polyglot to 100% polyglot. The more polyglot the database, the more built-in "translators" it offers to object storage, RDBMS, NoSQL, and event queues; the less polyglot, the more direct access the apps must handle themselves.)
  5. ClickHouse is definitely polyglot: S3, ODBC, MySQL, Kafka, Redis, HDFS, File, MongoDB, Cassandra, JDBC.
  6. Accessing remote data from ClickHouse: four mechanisms connect ClickHouse to a remote data source: database engines, table engines, table functions, and dictionaries (a table function sketch follows below).
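     The table function path is the lightest-weight of the four: it queries remote data inline, with no prior DDL. A minimal sketch using the mysql() table function (host, credentials, and table are placeholders, not from the deck):

     -- Query a MySQL table in place; no ClickHouse-side table required
     SELECT *
     FROM mysql('127.0.0.1:3306', 'repl', 'customer', 'root', 'secret')
     WHERE id = 5;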
  7. Accessing MySQL data using a database engine
  8. MySQL sample tables, in database repl: traffic (request_id*, datetime, date, customer_id, sku_id); customer (id*, name); sku (id*, name). Starred columns are keys.
  9. Access a MySQL database from ClickHouse via the MySQL database engine:
     CREATE DATABASE mysql_repl
     ENGINE = MySQL('127.0.0.1:3306', 'repl', 'root', 'secret');
     USE mysql_repl;
     SHOW TABLES;
  10. Selecting MySQL data from ClickHouse (the WHERE predicate is pushed down to MySQL):
     SELECT t.datetime, t.date, t.request_id, t.name customer, s.name sku
     FROM (
       SELECT t.*, c.name
       FROM traffic t JOIN customer c ON t.customer_id = c.id) AS t
     JOIN sku s ON t.sku_id = s.id
     WHERE customer_id = 5
     ORDER BY t.request_id
     LIMIT 10
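     The deck names dictionaries as a fourth access path but never demonstrates one. A minimal sketch of a MySQL-backed dictionary over the customer table from slide 8, reusing the connection details of slide 9 (the dictionary name and lifetime values are illustrative assumptions):

     CREATE DICTIONARY customer_dict (
         id UInt64,
         name String
     )
     PRIMARY KEY id
     SOURCE(MYSQL(host '127.0.0.1' port 3306 user 'root' password 'secret'
                  db 'repl' table 'customer'))
     LAYOUT(HASHED())
     LIFETIME(MIN 300 MAX 360);

     -- Point lookups then need no JOIN at all
     SELECT dictGet('customer_dict', 'name', toUInt64(5)) AS customer;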
  11. Accessing Kafka using a Table Engine
  12. Standard flow from Kafka to ClickHouse: a topic contains messages; a Kafka table engine encapsulates the topic within ClickHouse; a materialized view fetches rows from the engine; a target table stores the rows.
  13. Create target table:
     CREATE TABLE readings (
         readings_id Int32 Codec(DoubleDelta, LZ4),
         time DateTime Codec(DoubleDelta, LZ4),
         date ALIAS toDate(time),
         temperature Decimal(5,2) Codec(T64, LZ4)
     ) Engine = MergeTree
     PARTITION BY toYYYYMM(time)
     ORDER BY (readings_id, time);
  14. Create Kafka engine table (note the connection info and format settings):
     CREATE TABLE readings_queue (
         readings_id Int32,
         time DateTime,
         temperature Decimal(5,2)
     )
     ENGINE = Kafka
     SETTINGS kafka_broker_list = 'kafka-headless.kafka:9092',
              kafka_topic_list = 'readings',
              kafka_group_name = 'readings_consumer_group1',
              kafka_num_consumers = 1,
              kafka_format = 'CSV';
  15. Create materialized view to transfer data:
     CREATE MATERIALIZED VIEW readings_queue_mv TO readings AS
     SELECT readings_id, time, temperature
     FROM readings_queue;
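     Once the view is attached, consumed messages land in the target table automatically. A quick sanity check (a generic query, not from the deck):

     -- Row count should grow as the consumer group reads the topic
     SELECT count() AS row_count, max(time) AS latest_reading
     FROM readings;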
  16. Accessing S3 and Snowflake using table functions
  17. Select from S3 CSV to ClickHouse table:
     SET max_insert_threads = 32;  -- parallelize!
     INSERT INTO sdata
     SELECT * FROM s3(
         -- use the host/bucket URL form to enable wildcards
         'https://s3.us-east-1.amazonaws.com/d1-altinity/data/sdata*.csv',
         'aws_access_key_id', 'aws_secret_access_key',
         -- format and schema
         'CSVWithNames',
         'DevId Int32, Type String, MDate Date, MDatetime DateTime, Value Float64');
  18. Write from ClickHouse to S3 Parquet file:
     INSERT INTO TABLE FUNCTION s3(
         -- where to write; this single-host URL form does not allow wildcards
         'https://d1-altinity.s3.amazonaws.com/data/sdata.parquet',
         'aws_access_key_id', 'aws_secret_access_key',
         'Parquet',
         'DevId Int32, Type String, MDate Date, MDatetime DateTime, Value Float64')
     -- what to write
     SELECT DevId, Type, MDate, MDatetime, Value FROM sdata;
  19. Select directly from S3 Parquet:
     SELECT * FROM s3(
         'https://d1-altinity.s3.amazonaws.com/data/sdata.parquet',
         'aws_access_key_id', 'aws_secret_access_key', 'Parquet',
         'DevId Int32, Type String, MDate Date, MDatetime DateTime, Value Float64')

     ┌─DevId─┬─Type─┬──────MDate─┬───────────MDatetime─┬─Value─┐
     │     0 │ test │ 2020-08-30 │ 2020-08-30 01:00:00 │     0 │
     │     0 │ test │ 2020-08-30 │ 2020-08-30 01:00:15 │   150 │
     │     0 │ test │ 2020-08-30 │ 2020-08-30 01:00:30 │   300 │
     │     0 │ test │ 2020-08-30 │ 2020-08-30 01:00:45 │   450 │
     . . .
  20. Experimental: Connect ODBC to Snowflake! Define a Snowflake data source (e.g., in odbc.ini):
     [snowflake]
     Description=SnowflakeDB
     Driver=SnowflakeDSIIDriver
     Locale=en-US
     SERVER=gxa99999.snowflakecomputing.com
     PORT=443
     SSL=on
     ACCOUNT=gxa99999
     DATABASE=snowflake_sample_data
     UID=<user>
     PWD=<password>
  21. Moving data from Snowflake to ClickHouse:
     CREATE TABLE nation (
         N_NATIONKEY UInt64,
         N_NAME String,
         N_REGIONKEY UInt64,
         N_COMMENT String
     ) ENGINE = Log;

     INSERT INTO nation
     -- 'TPCH_SF001' is the schema, 'NATION' the table; names are case-sensitive!
     SELECT * FROM odbc('DSN=snowflake', 'TPCH_SF001', 'NATION');
     -- Use the Snowflake history in the console to see what was actually sent
  22. Snowflake data, in a ClickHouse near you:
     SELECT N_NATIONKEY, N_NAME, N_REGIONKEY, substring(N_COMMENT, 1, 25)
     FROM nation LIMIT 5

     ┌─N_NATIONKEY─┬─N_NAME────┬─N_REGIONKEY─┬─substring(N_COMMENT, 1, 25)─┐
     │           0 │ ALGERIA   │           0 │  haggle. carefully final    │
     │           1 │ ARGENTINA │           1 │ al foxes promise slyly ac   │
     │           2 │ BRAZIL    │           1 │ y alongside of the pendin   │
     │           3 │ CANADA    │           1 │ eas hang ironic, silent p   │
     │           4 │ EGYPT     │           4 │ y above the carefully unu   │
     └─────────────┴───────────┴─────────────┴─────────────────────────────┘
  23. Wrap-up
  24. Key Takeaways
     ● ClickHouse can access a wide range of data stores
     ● Stability of connectivity varies
       ○ MySQL is very stable
       ○ S3 is new; use 20.6
       ○ Data stores like Snowflake are experimental
     ● Try them out, post issues, and make them better
     ● ClickHouse polyglot capabilities improve constantly
       ○ MaterializeMySQL engine reads the MySQL binlog (20.8; see the sketch below)
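     A minimal sketch of enabling MaterializeMySQL as of 20.8; the connection details are placeholders, and the experimental-flag name is an assumption for this release:

     -- The engine is experimental in 20.8, so it must be switched on first (assumed setting name)
     SET allow_experimental_database_materialize_mysql = 1;

     -- Follows the MySQL binlog and keeps a local, continuously updated replica of repl
     CREATE DATABASE mysql_binlog_repl
     ENGINE = MaterializeMySQL('127.0.0.1:3306', 'repl', 'root', 'secret');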
  25. More Reading
     ● ClickHouse.tech:
       ○ Database engines
       ○ Table engines
       ○ Table functions
       ○ Dictionaries
     ● Altinity blog: https://altinity.com/blog/
  26. Thank you! We are hiring. Contacts: info@altinity.com. Visit us at: https://www.altinity.com. ClickHouse: https://ClickHouse.tech
