Paul Dix [InfluxData] | InfluxDays Opening Keynote | InfluxDays Virtual Experience NA 2020

Paul Dix
CTO & co-founder, InfluxData
@pauldix
North America Virtual Experience, 2020-11-10
The future of InfluxDB

InfluxDB 2.0 Open Source GA!

November 12, 2013

Introducing InfluxDB, an open source distributed time series database

What is time series data?

Stock trades and quotes

Analytics

Log Events

More Events
• Measurements
• Exceptions
• Page Views
• User actions
• Commits
• Deploys
• Things happening in time

Sensor data

Two kinds of time series data…

Regular time series
Samples at regular intervals
[Timeline diagram: ticks t0 to t7]

Irregular time series
Events whenever they come in
[Timeline diagram: ticks t0 to t7]

Things you want to ask questions about, visualize, or summarize over time.

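The two kinds are related. Summarizing over time often means turning irregular events into a regular series. Below is a minimal Python sketch of that step, using only the standard library, that counts events per fixed window. The name `to_regular` and the 10-second window are illustrative choices, not an InfluxDB API.

```python
from collections import Counter


def to_regular(event_times: list[int], window: int, start: int, end: int) -> list[tuple[int, int]]:
    """Summarize irregular event timestamps into a regular series.

    Each output sample is (window_start, event_count) for one fixed-width
    window in [start, end); empty windows are kept with a count of 0.
    """
    counts = Counter((t - start) // window for t in event_times if start <= t < end)
    n_windows = (end - start + window - 1) // window  # ceiling division
    return [(start + i * window, counts[i]) for i in range(n_windows)]


# Irregular series: events (timestamps in seconds) arrive whenever they happen.
events = [1, 3, 4, 12, 29, 31, 33]
# Regular series: one count every 10 seconds.
print(to_regular(events, window=10, start=0, end=40))
# → [(0, 3), (10, 1), (20, 1), (30, 2)]
```

Keeping empty windows (count 0) is deliberate: a regular series has exactly one sample per interval, even when nothing happened.
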
Where we are today

InfluxDB is great for metrics

InfluxDB is great for analytics*
*on lower cardinality data

InfluxDB open source lacks distributed features

It’s time to advance…

Requirements
• What cardinality?
• Analytics performance
• Separate compute from storage and tiered storage
• Operator-defined Replication & Partitioning
• Able to run without locally attached storage
• Bulk data import and export
• Subscriptions
• Federated by design
• Embeddable scripting
• Greater compatibility

Iterate and Refactor or Rebuild the Core?

How InfluxDB Organizes Data

Line Protocol
cpu,host=serverA,num=1,region=west idle=1.667,system=2342.2 1492214400000000000
• Measurement: cpu
• Tags: host=serverA,num=1,region=west
• Fields: idle=1.667,system=2342.2
• Timestamp: 1492214400000000000 (nanosecond epoch)

Line Protocol → Series
cpu,host=serverA,num=1,region=west idle=1.667,system=2342.2 1492214400000000000
becomes one series per field:
cpu,host=serverA,num=1,region=west#idle (1.667, 1492214400000000000)
cpu,host=serverA,num=1,region=west#system (2342.2, 1492214400000000000)

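The breakdown above can be sketched in Python. This is a minimal parser under simplifying assumptions, not the real InfluxDB parser:

- no escaped characters;
- no quoted string fields;
- every field value is a float;
- a timestamp is always present.

`Point`, `parse_line`, and `series` are hypothetical names. The sketch splits one line into its parts, then splits that into one series per field, keyed `measurement,tags#field` as on the slide.

```python
from dataclasses import dataclass


@dataclass
class Point:
    measurement: str
    tags: dict[str, str]
    fields: dict[str, float]
    timestamp: int  # nanoseconds since the Unix epoch


def parse_line(line: str) -> Point:
    """Parse one line of line protocol (simplified, see assumptions above)."""
    series_part, field_part, ts_part = line.split(" ")
    measurement, *tag_pairs = series_part.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {k: float(v) for k, v in (p.split("=", 1) for p in field_part.split(","))}
    return Point(measurement, tags, fields, int(ts_part))


def series(point: Point) -> list[tuple[str, float, int]]:
    """Split a point into one (series key, value, time) triple per field."""
    # Tags sorted by key so the same tag set always yields the same series key.
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(point.tags.items()))
    prefix = f"{point.measurement},{tag_str}" if tag_str else point.measurement
    return [(f"{prefix}#{name}", value, point.timestamp)
            for name, value in point.fields.items()]


p = parse_line("cpu,host=serverA,num=1,region=west idle=1.667,system=2342.2 1492214400000000000")
for key, value, ts in series(p):
    print(key, (value, ts))
# → cpu,host=serverA,num=1,region=west#idle (1.667, 1492214400000000000)
# → cpu,host=serverA,num=1,region=west#system (2342.2, 1492214400000000000)
```

Sorting tags by key is a design choice in this sketch. It makes a line written with its tags in a different order map to the same series.
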
Inverted Index
Series ID
1 - cpu,host=serverA,num=1,region=west#idle (1.667, 1492214400000000000)
2 - cpu,host=serverB,num=1,region=west#system (2342.2, 1492214400000000000)
Posting Lists
cpu - [1, 2]
host=serverA - [1]
host=serverB - [2]
num=1 - [1, 2]
region=west - [1, 2]

Every new tag value expands the index

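Here is a toy version of the index on the slide, in Python. It assumes the caller assigns series IDs in increasing order; the class and method names are hypothetical. A query with several tag predicates becomes an intersection of posting lists. The last lines show why every new tag value expands the index.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: each term (a measurement name or a tag key=value pair)
    maps to a posting list of series IDs."""

    def __init__(self) -> None:
        self.series: dict[int, str] = {}
        self.postings: defaultdict[str, list[int]] = defaultdict(list)

    def add_series(self, series_id: int, measurement: str,
                   tags: dict[str, str], field: str) -> None:
        tag_terms = [f"{k}={v}" for k, v in sorted(tags.items())]
        self.series[series_id] = f"{measurement},{','.join(tag_terms)}#{field}"
        # Every distinct term gets its own posting list, so each new tag value
        # adds a new key to the index.
        for term in [measurement, *tag_terms]:
            self.postings[term].append(series_id)

    def lookup(self, *terms: str) -> list[int]:
        """Series IDs matching ALL terms, found by intersecting posting lists."""
        if not terms:
            return []
        matches = [set(self.postings.get(t, ())) for t in terms]
        return sorted(set.intersection(*matches))


idx = InvertedIndex()
idx.add_series(1, "cpu", {"host": "serverA", "num": "1", "region": "west"}, "idle")
idx.add_series(2, "cpu", {"host": "serverB", "num": "1", "region": "west"}, "system")
print(idx.lookup("cpu", "region=west"))           # → [1, 2]
print(idx.lookup("host=serverA", "region=west"))  # → [1]

# A previously unseen tag value adds a brand-new posting list.
terms_before = len(idx.postings)
idx.add_series(3, "cpu", {"host": "serverC", "num": "1", "region": "west"}, "idle")
print(terms_before, len(idx.postings))            # → 5 6
```

With a high-cardinality tag such as a request or trace ID, almost every write introduces a new term. The index then grows roughly in step with the data itself.
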
Inverted Index & Time Series

mmap difficulties

Object Store Durability

Towards a new core

IOx: short for iron oxide, pronounced “eye-ox”

In-memory columnar database

No storage engine

Parquet + Object Store is huge

Not just object store
Object Store Abstraction:
• Local Disk
• S3
• GCP Cloud Storage
• In Memory
• Azure Blob Storage
• Minio
• Ceph

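One way to picture the abstraction is a single interface with interchangeable backends. The Python sketch below is hypothetical: `ObjectStore`, `InMemoryStore`, `LocalDiskStore`, and `list_keys` are invented names, not the IOx API. It implements only the two backends that need no cloud SDK. S3, GCP Cloud Storage, Azure Blob Storage, Minio, and Ceph would sit behind the same three methods.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ObjectStore(ABC):
    """Hypothetical minimal object store interface: flat keys -> bytes."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def list_keys(self, prefix: str = "") -> list[str]: ...


class InMemoryStore(ObjectStore):
    """Keeps objects in a dict; handy for tests."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]  # KeyError if the key is missing

    def list_keys(self, prefix: str = "") -> list[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


class LocalDiskStore(ObjectStore):
    """Maps each key to a file path under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()  # FileNotFoundError if missing

    def list_keys(self, prefix: str = "") -> list[str]:
        keys = (p.relative_to(self.root).as_posix()
                for p in self.root.rglob("*") if p.is_file())
        return sorted(k for k in keys if k.startswith(prefix))


# The same calling code works against either backend.
with tempfile.TemporaryDirectory() as tmp:
    for store in (InMemoryStore(), LocalDiskStore(Path(tmp))):
        store.put("cpu/west/block1.parquet", b"...")
        store.put("cpu/east/block1.parquet", b"...")
        print(type(store).__name__, store.list_keys("cpu/west/"))
```

Code written against `ObjectStore` never needs to know whether blocks live in memory, on local disk, or in a cloud bucket. This is one way to let compute run without locally attached storage.
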
How Data is Organized

Partition Key: region, 1h bucket (ex: west-2020-11-10-11:00)
Partitions: west-2020-11-10-11:00, east-2020-11-10-11:00, west-2020-11-10-12:00
[Diagram labels: Mutable Write Buffer; Immutable Blocks (block 1, block 2); Tables of data; Physical Layout (In-memory compressed Segment, Parquet file)]

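The slide's partition key combines a region tag with the timestamp truncated to the hour. Here is a Python sketch that computes it. It assumes UTC hour buckets and nanosecond timestamps (as in line protocol); `partition_key` and the `ns` helper are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timezone

NS_PER_HOUR = 3600 * 10**9


def partition_key(region: str, timestamp_ns: int) -> str:
    """Partition key = region tag + 1-hour time bucket, e.g. "west-2020-11-10-11:00".

    Assumption: buckets are aligned to UTC hours.
    """
    bucket_start_ns = timestamp_ns - timestamp_ns % NS_PER_HOUR
    bucket = datetime.fromtimestamp(bucket_start_ns // 10**9, tz=timezone.utc)
    return f"{region}-{bucket:%Y-%m-%d-%H:%M}"


def ns(*fields: int) -> int:
    """Helper for the example: UTC calendar fields -> nanoseconds since the epoch."""
    return int(datetime(*fields, tzinfo=timezone.utc).timestamp()) * 10**9


rows = [
    ("west", ns(2020, 11, 10, 11, 5)),
    ("east", ns(2020, 11, 10, 11, 59)),
    ("west", ns(2020, 11, 10, 12, 0)),
]
partitions = defaultdict(list)
for region, ts in rows:
    partitions[partition_key(region, ts)].append(ts)
print(sorted(partitions))
# → ['east-2020-11-10-11:00', 'west-2020-11-10-11:00', 'west-2020-11-10-12:00']
```

Every row in a partition shares the same key. A query restricted to one region and one hour therefore only needs to touch a single partition.
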
Mapping InfluxDB into Tables
cpu,host=serverA,num=1,region=west idle=1.667,system=2342.2 1492214400000000000

Table: cpu
host     num  region  idle   system  time
serverA  1    west    1.667  2342.2  1492214400000000000

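The mapping can be sketched as one table per measurement, with every tag key and field key becoming a column alongside `time`. In this Python sketch, rows that lack a column get `None`, an assumption standing in for a null. `to_tables` is a hypothetical name, and points are assumed to be already parsed into tuples.

```python
def to_tables(points: list[tuple[str, dict[str, str], dict[str, float], int]]) -> dict[str, dict[str, list]]:
    """Group points into one table per measurement, stored column by column.

    Each point is (measurement, tags, fields, timestamp_ns). Tag and field keys
    become columns, plus a "time" column; missing values are None.
    """
    tables: dict[str, dict[str, list]] = {}
    for measurement, tags, fields, ts in points:
        table = tables.setdefault(measurement, {})
        n_rows = len(table["time"]) if "time" in table else 0
        row = {**tags, **fields, "time": ts}
        for name in row:
            # A column seen for the first time is back-filled with nulls.
            table.setdefault(name, [None] * n_rows)
        for name, column in table.items():
            column.append(row.get(name))
    return tables


points = [
    ("cpu", {"host": "serverA", "num": "1", "region": "west"},
     {"idle": 1.667, "system": 2342.2}, 1492214400000000000),
    ("cpu", {"host": "serverB", "region": "east"}, {"idle": 3.5}, 1492214410000000000),
]
tables = to_tables(points)
for name, column in tables["cpu"].items():
    print(name, column)
# → host ['serverA', 'serverB']
# → num ['1', None]
# → region ['west', 'east']
# → idle [1.667, 3.5]
# → system [2342.2, None]
# → time [1492214400000000000, 1492214410000000000]
```

Storing each column as its own list is what makes the later encodings and scans column-at-a-time. For example, a query filtering on `region` can read only that column.
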
Real-World Compression
• 591 GB TSM across 483 files
• 97 GB compressed TSM with gzip (likely due to index size)
• Naive Parquet test:
  • 118 GB
  • 246,140 files

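Working through the slide's figures, assuming decimal units (1 GB = 10^9 bytes):

```latex
\begin{aligned}
\text{gzip on TSM:}\quad & \frac{591\ \text{GB}}{97\ \text{GB}} \approx 6.1\times \\
\text{naive Parquet vs.\ TSM:}\quad & \frac{591\ \text{GB}}{118\ \text{GB}} \approx 5.0\times \\
\text{mean TSM file:}\quad & \frac{591\ \text{GB}}{483} \approx 1.22\ \text{GB} \\
\text{mean Parquet file:}\quad & \frac{118\ \text{GB}}{246{,}140} \approx 0.48\ \text{MB}
\end{aligned}
```

The last line is plausibly the link to the next slide. The naive Parquet layout produced roughly a quarter of a million files, averaging under half a megabyte each.
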
Partitioning is key to performance

In-memory Perf Preview (tracing example)
• env - production or staging environment
• data_centre - the region within a cloud vendor
• cluster - a specific cluster, e.g., a k8s cluster
• user_id - an id associated with the user that issued a request that was traced
• request_id - an id associated with a single request that started a trace
• trace_id - a single id associated with all spans in the trace
• node_id - the id of a compute node that the trace execution ran across
• pod_id - the id of a container that the trace execution ran across
• span_id - a random id for every sample generated in the trace

Test data cardinalities (104,998,932 rows)
• env - 2
• data_centre - 20
• cluster - 200
• user_id - 200,000
• request_id - 2,000,000
• trace_id - 10,000,000
• node_id - 2,000
• pod_id - 20,000
• span_id - ∞ (a new one for each sample row)

Test data sizes (104,998,932 rows ~ 12.5 GB RAM)
• env column 301 B
• data_centre ~2.1 KB
• cluster ~19.7 KB
• user_id ~176 MB
• request_id ~816 MB
• trace_id ~1.6 GB
• node_id ~204 KB
• pod_id ~2 MB
• span_id ~9.2 GB
• duration ~840 MB
• time ~840 MB

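The two ends of this table behave very differently. The ~840 MB `time` column works out to 840,000,000 / 104,998,932 ≈ 8 bytes per row, consistent with one 64-bit value per row. By contrast, 301 B for the 2-value `env` column is far below one byte per row.

The slides do not say which encodings IOx uses. One common columnar approach that can shrink a clustered, low-cardinality column this far is dictionary encoding followed by run-length encoding. It is sketched below in Python; the function names are hypothetical.

```python
def dictionary_encode(values: list[str]) -> tuple[list[str], list[int]]:
    """Replace each string with an integer code into a list of distinct values."""
    dictionary: list[str] = []
    index: dict[str, int] = {}
    codes: list[int] = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def run_length_encode(codes: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive repeats into (code, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1] = (c, runs[-1][1] + 1)
        else:
            runs.append((c, 1))
    return runs


def decode(dictionary: list[str], runs: list[tuple[int, int]]) -> list[str]:
    """Expand (code, run_length) pairs back into the original strings."""
    return [dictionary[c] for c, n in runs for _ in range(n)]


# A sorted "env" column: one million rows, two distinct values.
env = ["production"] * 600_000 + ["staging"] * 400_000
dictionary, codes = dictionary_encode(env)
runs = run_length_encode(codes)
print(dictionary, runs)  # → ['production', 'staging'] [(0, 600000), (1, 400000)]
```

Run-length encoding only pays off when equal values are adjacent, for example when rows are sorted by low-cardinality columns. A column like `span_id`, with a new value on every row, gains little from either step. That is consistent with it being the largest column in the table above.
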
Find spans for a trace
SELECT * FROM "traces"
WHERE "trace_id" = '0000MjNg' AND
"time" >= '2020-10-30 15:12' AND
"time" < '2020-10-30 16:12';
Returned in: 84.666665ms ~ 1.1B rows/sec

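The slides report the latency but not the execution strategy. As a rough illustration of why a columnar layout suits this query, here is a Python sketch. It evaluates the WHERE clause on just the `trace_id` and `time` columns, then gathers the selected positions from every column for `SELECT *`. The column-dict layout and the names `to_ns` and `find_spans` are assumptions. A production engine would typically also prune by partition and scan compressed columns in vectorized form.

```python
from datetime import datetime, timezone


def to_ns(ts: str) -> int:
    """'YYYY-MM-DD HH:MM', interpreted as UTC -> nanoseconds since the epoch."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 10**9


def find_spans(table: dict[str, list], trace_id: str, start: str, end: str) -> list[dict]:
    """Evaluate the WHERE clause column by column, then materialize matching rows."""
    lo, hi = to_ns(start), to_ns(end)
    # Scan only the two predicate columns to find the matching row positions.
    selected = [
        i for i, (tid, t) in enumerate(zip(table["trace_id"], table["time"]))
        if tid == trace_id and lo <= t < hi
    ]
    # SELECT *: gather every column, but only at the selected positions.
    return [{name: col[i] for name, col in table.items()} for i in selected]


traces = {
    "trace_id": ["0000MjNg", "0000ZZzz", "0000MjNg", "0000MjNg"],
    "span_id": ["a1", "b2", "c3", "d4"],
    "time": [to_ns("2020-10-30 15:30"), to_ns("2020-10-30 15:31"),
             to_ns("2020-10-30 16:12"), to_ns("2020-10-30 15:12")],
}
for row in find_spans(traces, "0000MjNg", "2020-10-30 15:12", "2020-10-30 16:12"):
    print(row["span_id"])
# → a1
# → d4
```

The row at exactly 16:12 is excluded and the one at exactly 15:12 is included, matching the half-open range `>= ... AND < ...` in the query.
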
How is InfluxDB IOx distributed?

Flexible Replication Rules
• Synchronous & Asynchronous
• Push & Pull
• Request by request, batch, or bulk
• Partition to servers, groups of servers
• Total operator control via RESTful API

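The slides do not show what a rule looks like. The following is a purely hypothetical Python sketch, not the IOx API: the `ReplicationRule` fields and the `route` function are invented for illustration. It shows how an operator-defined rule set could map partition keys to groups of servers and mark each target as synchronous or asynchronous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationRule:
    """Hypothetical operator-defined rule: partition-key prefix -> server group."""
    partition_prefix: str
    servers: tuple[str, ...]
    synchronous: bool  # True: the write is acknowledged only after these servers have it


def route(partition_key: str, rules: list[ReplicationRule]) -> tuple[list[str], list[str]]:
    """Return (sync_targets, async_targets) for a partition key.

    Every rule whose prefix matches contributes its servers; a server named
    by both a synchronous and an asynchronous rule is treated as synchronous.
    """
    sync: list[str] = []
    async_: list[str] = []
    for rule in rules:
        if partition_key.startswith(rule.partition_prefix):
            (sync if rule.synchronous else async_).extend(rule.servers)
    async_ = [s for s in async_ if s not in sync]
    return sorted(set(sync)), sorted(set(async_))


rules = [
    ReplicationRule("west-", ("west-a", "west-b"), synchronous=True),
    ReplicationRule("", ("archive-1",), synchronous=False),  # matches every partition
]
print(route("west-2020-11-10-11:00", rules))  # → (['west-a', 'west-b'], ['archive-1'])
print(route("east-2020-11-10-11:00", rules))  # → ([], ['archive-1'])
```

In this sketch, rules are plain data. An operator could therefore create or replace them through an API without changing server code.
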
One Possible Configuration

Federated, not fully connected cluster

Dix’s maxim
“Your licensing strategy is your commercialization strategy, whether by accident or design”

Who coordinates this?

InfluxDB 2.x OSS Journey

InfluxDB Cloud Journey

InfluxDB Enterprise Journey

Introducing InfluxDB, an open source distributed time series database

Introducing InfluxDB IOx, an open source distributed time series database

Introducing InfluxDB IOx, an open source federated time series database

Introducing InfluxDB IOx, an open source distributed time series database
analytics database

Introducing InfluxDB IOx, an open source distributed time series database
columnar database

Introducing InfluxDB IOx, an open source distributed time series database
replication system

Introducing InfluxDB IOx, an open source distributed time series database
events processor

Introducing InfluxDB IOx, an open source distributed time series database
data lifecycle manager

Introducing InfluxDB IOx, an open source distributed time series database
edge processor and data store

Get Involved
• Star & watch the repo at github.com/influxdata/influxdb_iox
• Find the InfluxDB IOx topic on community.influxdata.com
• Join the #influxdb_iox channel in our community Slack
• Join us on the 2nd Wednesday of every month at 8:30 AM Pacific Time for a tech talk on InfluxDB IOx - influxdata.com/community-showcase/influxdb-tech-talks/
• We’re hiring for Rust, distributed systems, and columnar database expertise. Email recruiting@influxdata.com and CC me at paul@influxdata.com.
