Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle som...
Agenda
1
2
3
4
5
AWS Athena
Google BigQuery
Test Drive
Summary
Q & A
2
DoIT International confidential │ Do not distribute
About me..
Vadim Solovey - CTO // DoiT International
DoIT International confidential │ Do not distribute
DoIT International confidential │ Do not distribute
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle som...
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle som...
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle som...
Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you.
Make the subtitle som...
Section Slide Template Option 2
Put your subtitle here. Feel free to pick from the handful of pretty Google colors availab...
DoIT International confidential │ Do not distribute
AWS Athena
• Serverless Analytical Columnar Database based on Facebook...
DoIT International confidential │ Do not distribute
Google BigQuery
• Serverless Analytical Columnar Database based on Goo...
DoIT International confidential │ Do not distribute
Summary
Feature  Product AWS Athena Google BigQuery
Data Formats *SV, ...
DoIT International confidential │ Do not distribute
How we tested?
• Dataset
• New York Yellow Taxi Public Dataset (https:...
DoIT International confidential │ Do not distribute
Tables & Formats
BigQuery
• trips_ext (500x CSV files, 490MB each) [24...
DoIT International confidential │ Do not distribute
Test Drive Summary
Query Type AWS Athens (GB/time) Google BigQuery (GB...
DoIT International confidential │ Do not distribute
What Athena does better than BigQuery?
Advantages:
• Can be faster tha...
DoIT International confidential │ Do not distribute
What BigQuery does better than Athena?
• It has native table support g...
Section Slide Template Option 2
Put your subtitle here. Feel free to pick from the handful of pretty Google colors availab...
DoIT International confidential │ Do not distribute
[1] Lookup Query
SELECT *
FROM trips_par
WHERE vendor_id = 'VTS'
LIMIT...
DoIT International confidential │ Do not distribute
[2] Lookup & Aggregation
SELECT max(passenger_count)
FROM trips_par
WH...
DoIT International confidential │ Do not distribute
[3] GROUP BY / ORDER BY Query
SELECT substr(string(pickup_datetime),1,...
DoIT International confidential │ Do not distribute
[4] ‘LIKE’ Functions Query
SELECT
count(*)
FROM
log_par
WHERE
UA LIKE ...
DoIT International confidential │ Do not distribute
[5] JSON Functions Query
SELECT
JSON_EXTRACT(Misc_Fields,'$.network.ed...
DoIT International confidential │ Do not distribute
[6] Regex Functions Query
SELECT *
FROM log_par
WHERE REGEXP_MATCH(req...
Prochain SlideShare
Chargement dans…5
×

AWS Athena vs. Google BigQuery for interactive SQL Queries

During the re:Invent 2016, AWS has released the Amazon Athena - an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

We took a look on AWS Athena and compared it to the Google BigQuery - another player of serverless interactive data analysis.

Would you like to know which one is the right tool for you? Join us for this meetup to learn AWS Athena and for the test drive of querying exactly the same dataset using AWS Athena and Google BigQuery to see where each one shines (or totally blows it).

  • Soyez le premier à commenter

AWS Athena vs. Google BigQuery for interactive SQL Queries

  1. 1. Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. Welcome! DoIT International Practicing multi-cloud since 2010.
  2. 2. Agenda 1 2 3 4 5 AWS Athena Google BigQuery Test Drive Summary Q & A 2
  3. 3. DoIT International confidential │ Do not distribute About me.. Vadim Solovey - CTO // DoiT International
  4. 4. DoIT International confidential │ Do not distribute
  5. 5. DoIT International confidential │ Do not distribute
  6. 6. Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. AWS Athena vs Google BigQuery for interactive SQL queries on large datasets (#20/16) Vadim Solovey - CTO // DoIT International Google Cloud Developer Expert | AWS Certified Solutions Architect
  7. 7. Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. Athena (/əˈθiːnə/; Greek: - the goddess of wisdom, craft, and war
  8. 8. Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. OR
  9. 9. Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. Will Athena slay BigQuery? Vadim Solovey - CTO // DoIT International Google Cloud Developer Expert | AWS Certified Solutions Architect
  10. 10. Section Slide Template Option 2 Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. your mileage may will vary Warning:
  11. 11. DoIT International confidential │ Do not distribute AWS Athena • Serverless Analytical Columnar Database based on Facebook’s Presto • Data: • External Tables (*SV, JSON, ORC, PARQUET files in S3 bucket) • Ingestion: • Just store files on S3 • Convert to columnar/compressed format using EMR • ANSI SQL 2011 • Priced at $5/TB of scanned data & standard S3 storage/ops costs • Cost Optimization -converting data into columnar format, partitioning, limit queried columns.
  12. 12. DoIT International confidential │ Do not distribute Google BigQuery • Serverless Analytical Columnar Database based on Google Dremel • Data: • Native Tables • External Tables (*SV, JSON, AVRO files stored in Google Cloud Storage bucket) • Ingestion: • File Imports • Streaming API (up to 100K records/sec per table) • Federated Tables (files in bucket, Bigtable table or Google Spreadsheet) • ANSI SQL 2011 • Priced at $5/TB of scanned data + storage + streaming (if used) • Cost Optimization - partitioning, limit queried columns, 24-hour cache, cold data.
  13. 13. DoIT International confidential │ Do not distribute Summary Feature Product AWS Athena Google BigQuery Data Formats *SV, JSON, PARQUET/z, ORC/z External (*SV, JSON, AVRO) / Native ANSI SQL Support Yes* Yes* DDL Support Only CREATE/ALTER/DROP CREATE/UPDATE/DELETE (w/ quotas) Underlying Technology FB Presto Google Dremel Caching No Yes Cold Data Pricing S3 Lifecycle Policy 50% discount after 90 days of inactivity User Defined Functions No Yes Data Partitioning On Any Key By DAY Pricing $5/TB (scanned) plus S3 ops $5/TB (scanned) less cached data
  14. 14. DoIT International confidential │ Do not distribute How we tested? • Dataset • New York Yellow Taxi Public Dataset (https://data.cityofnewyork.us) [130GB, 1.1B rows] • Akamai Log (30GB, 1B rows] • BigQuery [NY Taxi] • Import of data into native table • External table on top of 500x uncompressed CSV files in GCS bucket • Caching: off • AWS Athena [NY Taxi] • Copied 500x uncompressed CSV files from GCS to S3 bucket • Using EMR 5.2 (HIVE/PRESTO) converted the data into ORC/z and PARQUET/z formats
  15. 15. DoIT International confidential │ Do not distribute Tables & Formats BigQuery • trips_ext (500x CSV files, 490MB each) [245GB in total] • trips_nat (130GB total) AWS Athena • trips_csv (500x CSV files, 490MB each) • trips_par (4 files, 3.2GB each) • trips_parz (8 files, 1.7GB each) • trips_orc (8 files, 2GB each) • trips_orcz (8 files, 2.1GB each)
  16. 16. DoIT International confidential │ Do not distribute Test Drive Summary Query Type AWS Athens (GB/time) Google BigQuery (GB/time) t.diff % [1] LOOKUP 48MB (4.1s) 130GB (2.0s) - 51% [2] LOOKUP & AGGR 331MB (4.35s) 13.4GB (2.7s) - 48% [3] GROUP/ORDER BY 5.74GB (8.85s) 8.26GB (5.4s) - 27% [4] TEXT FUNCTIONS 606MB (11.3s) 13.6GB (2.4s) - 470% [5] JSON FUNCTIONS 29MB (17.8s) 63.9GB (8.9s) - 100% [6] REGEX FUNCTIONS (1.3s) 5.45GB (1.9s) + 31% [7] FEDERATED DATA 133GB (19.4s) 133GB (36.4s) +47%
  17. 17. DoIT International confidential │ Do not distribute What Athena does better than BigQuery? Advantages: • Can be faster than BigQuery, especially with federated/external tables • Ability to use regex to define a schema (query files without needing to change the format) • Can be faster and cheaper than BigQuery when using a partitioned/columnar format • Tables can be partitioned on any column Issues: • It’s not easy to convert data between formats • Doesn’t support DDL, i.e. no insert/update/delete • Randomly giving the HIVE_UNKNOWN_ERROR • No streaming support • Struggles with really large datasets
  18. 18. DoIT International confidential │ Do not distribute What BigQuery does better than Athena? • It has native table support giving it better performance and more features • It’s easy to manipulate data, insert/update records and write query results back to a table • Querying native tables is very fast • Easy to convert non-columnar formats into a native table for columnar queries • Supports UDFs, although they will be available in the future for Athena • Supports nested tables (nested and repeated fields) • Works well for petabyte scale queries
  19. 19. Section Slide Template Option 2 Put your subtitle here. Feel free to pick from the handful of pretty Google colors available to you. Make the subtitle something clever. People will think it’s neat. Questions?
  20. 20. DoIT International confidential │ Do not distribute [1] Lookup Query SELECT * FROM trips_par WHERE vendor_id = 'VTS' LIMIT 10 Back
  21. 21. DoIT International confidential │ Do not distribute [2] Lookup & Aggregation SELECT max(passenger_count) FROM trips_par WHERE vendor_id <> 'VTS' Back
  22. 22. DoIT International confidential │ Do not distribute [3] GROUP BY / ORDER BY Query SELECT substr(string(pickup_datetime),1,7) month, COUNT(*) trips FROM [doit-playground:playground.trips_nat] WHERE substr(string(pickup_datetime),1,4) = '2014' GROUP BY 1 ORDER BY 1 Back
  23. 23. DoIT International confidential │ Do not distribute [4] ‘LIKE’ Functions Query SELECT count(*) FROM log_par WHERE UA LIKE '%AppleWebKit%' OR Back
  24. 24. DoIT International confidential │ Do not distribute [5] JSON Functions Query SELECT JSON_EXTRACT(Misc_Fields,'$.network.edgeIP') AS edgeIP, COUNT(*) AS total FROM [doit-playground:playground.akamai_errors] GROUP BY edgeIP ORDER BY total DESC LIMIT 10 Back
  25. 25. DoIT International confidential │ Do not distribute [6] Regex Functions Query SELECT * FROM log_par WHERE REGEXP_MATCH(reqPath, r'msn.*-home') LIMIT 10 Back

×