A short introduction to Vertica

Some interesting technical aspects of the Vertica big data platform from HP.


  1. A short introduction to Vertica
     Tommi Siivola, Software Engineer
     RedHat Software Developer Meetup, 10.09.2014
  2. AGENDA
     - Quick orientation
     - Columns
     - Projections
     - Clustering
     - Hybrid storage
     - Special features
  3. Quick orientation to Vertica
     - Big data database product from HP
     - For handling terabytes/petabytes of data
     - Column-oriented
  4. Quick orientation to Vertica
     - What does that mean in practice?
       - Vertica is a relational database
       - Supports a subset of the ANSI SQL-99 standard
       - JDBC/ODBC drivers
       - A command line client (vsql)
  5. Quick orientation to Vertica
     - Runs on major Linux distros (RHEL, SUSE, Debian, Ubuntu)
     - Amazon AMI available for running Vertica in the cloud
     - Up to 1 TB of data and a cluster of 3 nodes without a license (the so-called "Community Edition" mode)
     - Larger setups require a license from HP
  6. Concepts: column-oriented
     - Vertica stores data as columns, instead of storing each row as a unit
       - Allows for efficient data compression
       - Can skip unwanted columns when querying
       - More efficient aggregate value calculations
  7. Concepts: column-oriented
     ROWS VS. COLUMNS
     (the slide shows the same ten records twice, once for each layout)

     2014-03-15  23.43  3
     2014-03-15  23.97  4
     2014-03-15  24.51  7
     2014-03-15  25.05  6
     2014-03-15  25.59  7
     2014-03-16  26.13  7
     2014-03-16  26.67  4
     2014-03-16  27.21  2
     2014-03-16  27.75  3
     2014-03-16  28.29  7
  8. Concepts: column-oriented
     RUN LENGTH ENCODING

     Encoded:
     2014-03-15 (5 times)  23.43  3
                           23.97  4
                           24.51  7
                           25.05  6
                           25.59  7
     2014-03-16 (5 times)  26.13  7
                           26.67  4
                           27.21  2
                           27.75  3
                           28.29  7

     Original:
     2014-03-15  23.43  3
     2014-03-15  23.97  4
     2014-03-15  24.51  7
     2014-03-15  25.05  6
     2014-03-15  25.59  7
     2014-03-16  26.13  7
     2014-03-16  26.67  4
     2014-03-16  27.21  2
     2014-03-16  27.75  3
     2014-03-16  28.29  7
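The run-length encoding shown on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Vertica's actual storage code; the function names `rle_encode` and `rle_decode` are made up for the example.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of consecutive equal values into (value, run_length)."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# The date column from the slide: ten values, but only two distinct runs.
dates = ["2014-03-15"] * 5 + ["2014-03-16"] * 5
encoded = rle_encode(dates)
print(encoded)  # [('2014-03-15', 5), ('2014-03-16', 5)]
```

The encoding only pays off when equal values sit next to each other, which is why it works well on a sorted column such as the date here and poorly on the value column, where every entry is different.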
  9. Concepts: column-oriented
     SKIP UNWANTED COLUMNS

     date        value  id
     2014-03-15  23.97  4
     2014-03-15  24.51  7
     2014-03-15  25.05  6
     2014-03-15  25.59  7
     2014-03-16  26.13  7
     2014-03-16  26.67  4
     2014-03-16  27.21  2
     2014-03-16  27.75  3
     2014-03-16  28.29  7

     SELECT value, id FROM table
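To see why a column layout lets a query skip unwanted columns, here is a minimal Python sketch. The `ColumnStore` class and its `columns_read` attribute are hypothetical names for this example only; a real column store reads files from disk rather than Python lists.

```python
class ColumnStore:
    """Toy column-oriented table: one Python list per column."""

    def __init__(self, names, rows):
        # Transpose the rows once so each column is stored contiguously.
        self.columns = {
            name: [row[i] for row in rows] for i, name in enumerate(names)
        }
        self.columns_read = []

    def select(self, *wanted):
        # Only the requested columns are touched; the others are never read.
        self.columns_read = list(wanted)
        return list(zip(*(self.columns[name] for name in wanted)))


ROWS = [
    ("2014-03-15", 23.43, 3),
    ("2014-03-15", 23.97, 4),
    ("2014-03-16", 26.13, 7),
]
store = ColumnStore(("date", "value", "id"), ROWS)

# Rough equivalent of: SELECT value, id FROM table
result = store.select("value", "id")

# An aggregate needs just one column as well.
id_total = sum(store.columns["id"])
```

In a row layout the same query would have to read every full record, including the date, before discarding it.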
  10. Concepts: projections
      - Data physically stored in projections
      - Projections similar to materialized views
        - Data optimized for querying during insert
      - Table has one or more projections
      - Projection contains one or more columns
      - Data can be duplicated in projections for query efficiency
  11. Concepts: projections
      ONE DATA, MANY PROJECTIONS

      Sorted by date:
      2014-03-15  23.43  3
      2014-03-15  23.97  4
      2014-03-15  24.51  7
      2014-03-15  25.05  6
      2014-03-15  25.59  7
      2014-03-16  26.13  7
      2014-03-16  26.67  4
      2014-03-16  27.21  2
      2014-03-16  27.75  3
      2014-03-16  28.29  7

      Sorted by id:
      2014-03-16  27.21  2
      2014-03-15  23.43  3
      2014-03-16  27.75  3
      2014-03-15  23.97  4
      2014-03-16  26.67  4
      2014-03-15  25.05  6
      2014-03-15  24.51  7
      2014-03-15  25.59  7
      2014-03-16  26.13  7
      2014-03-16  28.29  7
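The point of keeping the same data in several sort orders can be sketched in Python as follows. This is an assumption-laden illustration: `build_projection` and `lookup` are invented helper names, and the binary search stands in for whatever a real engine does with sorted storage.

```python
from bisect import bisect_left, bisect_right

ROWS = [
    ("2014-03-15", 23.43, 3), ("2014-03-15", 23.97, 4),
    ("2014-03-15", 24.51, 7), ("2014-03-15", 25.05, 6),
    ("2014-03-15", 25.59, 7), ("2014-03-16", 26.13, 7),
    ("2014-03-16", 26.67, 4), ("2014-03-16", 27.21, 2),
    ("2014-03-16", 27.75, 3), ("2014-03-16", 28.29, 7),
]


def build_projection(rows, key_index):
    """Store a full copy of the rows, sorted by one column."""
    ordered = sorted(rows, key=lambda row: row[key_index])
    keys = [row[key_index] for row in ordered]
    return keys, ordered


def lookup(projection, key):
    """Binary-search the sorted keys instead of scanning every row."""
    keys, ordered = projection
    return ordered[bisect_left(keys, key):bisect_right(keys, key)]


by_date = build_projection(ROWS, 0)  # answers "WHERE date = ..." quickly
by_id = build_projection(ROWS, 2)    # answers "WHERE id = ..." quickly

rows_for_id_7 = lookup(by_id, 7)
rows_for_march_16 = lookup(by_date, "2014-03-16")
```

The trade-off is the one the slides name: the data is duplicated, so inserts do more work and more space is used, in exchange for cheaper queries.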
  12. Concepts: clustering
      - Parallel processing
        - Data segments distributed across cluster nodes
        - Performance can be increased by adding hardware
      - Reliability (K-safety)
        - Tolerates nodes going offline
      - All nodes can respond to queries → queries can be load balanced between nodes
  13. Concepts: clustering
      SEGMENTATION
      - Node 1: SEGMENT1
      - Node 2: SEGMENT2
      - Node 3: SEGMENT3
      - Node 4: SEGMENT4
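One common way to spread rows across nodes is to hash a key column, which the following Python sketch illustrates. The slide does not say how rows are assigned to segments, so the hashing scheme, the choice of CRC32, and the names `segment_for` and `segment_rows` are all assumptions made for the example.

```python
import zlib

ROWS = [
    ("2014-03-15", 23.43, 3), ("2014-03-15", 23.97, 4),
    ("2014-03-15", 24.51, 7), ("2014-03-15", 25.05, 6),
    ("2014-03-16", 26.13, 7), ("2014-03-16", 27.21, 2),
]


def segment_for(key, node_count):
    """Map a key to a segment number with a deterministic hash (assumed scheme)."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def segment_rows(rows, key_index, node_count):
    """Split the rows into one list per node, based on the hashed key column."""
    segments = [[] for _ in range(node_count)]
    for row in rows:
        segments[segment_for(row[key_index], node_count)].append(row)
    return segments


# Four nodes as on the slide, segmented on the value column.
segments = segment_rows(ROWS, key_index=1, node_count=4)
```

Because every row lands in exactly one segment, each node can scan its own part independently, which is where the parallel speed-up comes from.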
  14. Concepts: clustering
      K-SAFETY
      - Node 1: SEGMENT1, SEGMENT2
      - Node 2: SEGMENT2, SEGMENT3
      - Node 3: SEGMENT3, SEGMENT4
      - Node 4: SEGMENT4, SEGMENT1
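The placement on this slide, where every segment lives on two neighbouring nodes, can be modelled in Python to check which node failures the cluster survives. The function names are hypothetical, and nodes and segments are numbered from 0 here instead of from 1 as on the slide.

```python
def place_segments(node_count):
    """Node i holds its own segment plus a second copy of the next segment."""
    return {
        node: {node, (node + 1) % node_count} for node in range(node_count)
    }


def available_segments(placement, down_nodes):
    """Return the segments still reachable when the given nodes are offline."""
    live = set()
    for node, held in placement.items():
        if node not in down_nodes:
            live |= held
    return live


placement = place_segments(4)
all_segments = set(range(4))

# Any single node can fail without losing data.
survives_single_failure = all(
    available_segments(placement, {node}) == all_segments for node in range(4)
)

# Two failures are survivable only if the nodes do not share a segment.
after_adjacent_failure = available_segments(placement, {0, 1})
after_opposite_failure = available_segments(placement, {0, 2})
```

With this layout, losing nodes 0 and 1 together makes segment 1 unavailable, because both of its copies lived on those two nodes.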
  15. Concepts: Hybrid storage
      - Read-optimized storage (ROS)
        - On disk
        - Heavily encoded & compressed
      - Write-optimized storage (WOS)
        - In memory
        - No encoding or compression
  16. Concepts: Hybrid storage
      - Inserted data is first aggregated in WOS
        - Inserting to WOS is faster, due to lack of compression and disk write overheads
      - Background job moves data in batches from WOS to ROS
        - Writing to ROS is more efficient in batches
        - Querying is more efficient from ROS
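The WOS-to-ROS flow can be sketched as a small Python class. It is a toy model under stated assumptions: `HybridStore` and `move_out` are names invented for the example, sorting stands in for the encoding and compression work, and the background job is replaced by an explicit method call.

```python
class HybridStore:
    """Toy model of write-optimized and read-optimized storage."""

    def __init__(self):
        self.wos = []  # in memory: rows appended as they arrive, unsorted
        self.ros = []  # stands in for disk: a list of sorted, immutable batches

    def insert(self, row):
        # Cheap: no sorting, no encoding, no disk write.
        self.wos.append(row)

    def move_out(self):
        """Move everything in WOS to ROS as one sorted batch."""
        if not self.wos:
            return
        self.ros.append(tuple(sorted(self.wos)))
        self.wos = []

    def query(self):
        # A query must see both the batched rows and the recent inserts.
        return [row for batch in self.ros for row in batch] + self.wos


store = HybridStore()
store.insert(("2014-03-16", 26.13, 7))
store.insert(("2014-03-15", 23.43, 3))
store.insert(("2014-03-15", 23.97, 4))
pending_before = len(store.wos)

store.move_out()
store.insert(("2014-03-16", 26.67, 4))
all_rows = store.query()
```

Doing the expensive sort once per batch, not once per inserted row, is the reason the slides say writing to ROS is more efficient in batches.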
  17. Vertica feature: Pattern matching
      - Example: Finding sequences in web site log data
      - Find all sequences where user enters the site, browses and finally makes a purchase
      - Difficult to express in SQL
      - Vertica has SQL extension for finding patterns

      PATTERNS IN DATA

      user  action
      1     enter
      1     browse
      1     browse
      1     purchase
      2     enter
      2     browse
      3     enter
      3     browse
      3     purchase
  18. Vertica feature: Pattern matching
      - Example: find sequences where user enters a site, browses and makes a purchase

      SELECT uid, sid, ts, refurl, pageurl, action,
             event_name(), pattern_id(), match_id()
      FROM clickstream_log
      MATCH (PARTITION BY uid, sid ORDER BY ts
             DEFINE
               Entry    AS refurl NOT ILIKE '%site.com%' AND pageurl ILIKE '%site.com%',
               Onsite   AS pageurl ILIKE '%site.com%' AND action = 'V',
               Purchase AS pageurl ILIKE '%site.com%' AND action = 'P'
             PATTERN P AS (Entry Onsite* Purchase)
             ROWS MATCH FIRST EVENT);
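The pattern (Entry Onsite* Purchase) behaves like a regular expression over events. The Python sketch below applies the same idea to the user/action table from slide 17. It is not how Vertica evaluates the MATCH clause; the one-letter event codes and the function name `find_purchase_paths` are assumptions made for the example.

```python
import re
from itertools import groupby

# The user/action rows from slide 17, already ordered per user.
EVENTS = [
    (1, "enter"), (1, "browse"), (1, "browse"), (1, "purchase"),
    (2, "enter"), (2, "browse"),
    (3, "enter"), (3, "browse"), (3, "purchase"),
]

# One letter per event type, so a session becomes a short string.
CODES = {"enter": "E", "browse": "B", "purchase": "P"}

# Entry, then any number of browse events, then a purchase.
PATTERN = re.compile(r"EB*P")


def find_purchase_paths(events):
    """Return, per user, each event sequence that matches the pattern."""
    matches = {}
    # groupby plays the role of PARTITION BY; it needs rows grouped by user.
    for user, group in groupby(events, key=lambda event: event[0]):
        actions = [action for _, action in group]
        encoded = "".join(CODES[action] for action in actions)
        found = [
            actions[m.start():m.end()] for m in PATTERN.finditer(encoded)
        ]
        if found:
            matches[user] = found
    return matches


paths = find_purchase_paths(EVENTS)
```

Users 1 and 3 match, while user 2 enters and browses but never purchases, so no sequence is reported for that user.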
  19. Extending Vertica
      - Custom SQL functions can be created with R, Java or C++
      - R: scalar and transform functions
      - Java: all of the above, plus load functions
      - C++: all of the above, plus aggregate and analytic functions
  20. Find out more
      - Vertica free downloads (requires registration): my.vertica.com
      - Vertica documentation (no registration): www.vertica.com/documentation
      - C-Store research project (Vertica predecessor): db.csail.mit.edu/projects/cstore/
  21. THANKS!
      Tommi Siivola, Software Engineer
      tommi.siivola@eficode.com
      +358 (0)50 371 9308
      eficode.fi
      "Automate or wither" and other posts on the Eficode blog: EFICODE.FI/BLOGI
