Recent Changes and Challenges for Future Presto


  1. Recent Changes and Challenges for Future Presto
      Kai Sasaki, October 2018
      Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
  2. Bio: Kai Sasaki (佐々木 海)
        • Senior Software Engineer, Arm Treasure Data
        • Working for 3 years
        • Query Engine Team
        • Responsible for maintaining and improving Presto and Spark in TD
        • Favorite employee benefit: free lunch
        • Twitter: @Lewuathe
  4. Distributed SQL Engine in TD
  5. Presto Architecture
  6. Arm Treasure Data eCDP
  7. Stats
      Presto is used intensively for both ad hoc analysis and batch processing.
      Categories shown: Queries, Data Volume, Scalability
  8. Topic
        • Recent Presto Changes
        • Challenges for Presto Upgrade
  10. Cost Based Optimizer
      The cost-based optimizer was designed and developed mainly by developers at Starburst.
        • https://www.starburstdata.com/technical-blog/introduction-to-presto-cost-based-optimizer/
        • It considers CPU cost, memory cost, and network cost.
        • From 0.207, join reordering with cost estimation is supported.
  11. Reordering Join
      In a join, the large table should always precede the small table.
        • https://www.starburstdata.com/technical-blog/introduction-to-presto-cost-based-optimizer/
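The reason table order matters can be sketched with a toy hash join. This assumes the usual Presto hash join layout, where the right-hand table is the build side held in memory and the left-hand table is streamed as the probe side. The function `hash_join` and the sample tables are hypothetical and only illustrate the idea; they are not Presto's implementation.

```python
from collections import defaultdict


def hash_join(probe_rows, build_rows, key):
    """Toy inner hash join: index the build side in memory, stream the probe side.

    Returns the joined pairs and the number of rows that had to be kept in memory.
    """
    index = defaultdict(list)
    for row in build_rows:
        index[row[key]].append(row)
    joined = [
        (probe_row, build_row)
        for probe_row in probe_rows
        for build_row in index.get(probe_row[key], [])
    ]
    return joined, len(build_rows)


# Hypothetical data: a large fact table and a small dimension table.
orders = [{"nation_id": i % 3, "order_id": i} for i in range(1000)]
nations = [{"nation_id": i, "name": f"nation-{i}"} for i in range(3)]

# Large table first (probe), small table second (build).
large_first, in_memory_large_first = hash_join(orders, nations, "nation_id")
# Small table first (probe), large table second (build).
small_first, in_memory_small_first = hash_join(nations, orders, "nation_id")

print(len(large_first), in_memory_large_first)
print(len(small_first), in_memory_small_first)
```

Both orders produce the same number of joined rows, but the second keeps the whole large table in memory. Cost-based join reordering aims to pick the cheaper order without the query author having to arrange the tables by hand.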
  12. Distributed Join on Partitioned Key
      A distributed join is enabled automatically when the probe table is already partitioned by the join key.
        • There is no need to set the distributed_join session parameter explicitly.
        • This optimization can take advantage of user-defined partitioning.

          explain (type distributed)
          select t1.features, t1.class
          from iris_udp t1
          join iris_udp t2 on t1.class = t2.class;
  13. Distributed Join On Partitioning Key
      0.188:
          Fragment 2 [td-presto:[class(varchar)]/16]
              Output layout: [class_1, $hashvalue_15]
              Output partitioning: BROADCAST []
              - ScanProject[table = td-presto:td:iris_udp] => [class_1:varchar, $hashvalue_15:bigint]
                      $hashvalue_15 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("class_1"), 0))
                      LAYOUT: com.treasure_data.presto.connector.TDTableLayoutHandle@e686a6e9
                      class_1 := class(varchar)

      0.205:
          Fragment 2 [td-presto:[class(varchar)]/16]
              Output layout: [class_1, $hashvalue_15]
              Output partitioning: td-presto:[class(varchar)]/16 [class_1]
              Execution Flow: UNGROUPED_EXECUTION
              - ScanProject[table = td-presto:td:iris_udp] => [class_1:varchar, $hashvalue_15:bigint]
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
                      $hashvalue_15 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("class_1"), 0))
                      LAYOUT: com.treasure_data.presto.connector.TDTableLayoutHandle@ca9b5858
                      class_1 := class(varchar)
  14. Trace Token
      Each HTTP request sent by the scheduler and exchange HTTP clients carries a "trace token" (a unique ID) in its headers, which is logged in the HTTP request logs.
        • This makes it easier to trace a query.
        • The X-Presto-Trace-Token header appears to be used.
  16. Decimal Literal Behavior Change
      Decimal literals are now correctly interpreted as the decimal type.
        • Older versions interpreted them as the double type.
        • The original behavior can be restored with a parameter:
            – System wide: parse-decimal-literals-as-double=true
            – Session wide: parse_decimal_literals_as_double=true

      0.205:
          presto> select typeof(1.1);
              _col0
          --------------
           decimal(2,1)
          (1 row)

      0.188:
          presto> select typeof(1.1);
           _col0
          --------
           double
          (1 row)
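Why the literal type can change query results can be illustrated outside Presto. The sketch below is an analogy only: Python's `float` stands in for the double type and the standard `decimal.Decimal` stands in for the decimal type. The helper names are hypothetical.

```python
from decimal import Decimal


def add_as_double(a: str, b: str) -> float:
    """Add two literals after parsing them as binary floating point."""
    return float(a) + float(b)


def add_as_decimal(a: str, b: str) -> Decimal:
    """Add two literals after parsing them as exact decimals."""
    return Decimal(a) + Decimal(b)


double_sum = add_as_double("0.1", "0.2")
decimal_sum = add_as_decimal("0.1", "0.2")

# Binary floating point cannot represent 0.1 or 0.2 exactly, so the
# comparison with 0.3 fails; the exact decimal comparison succeeds.
print(double_sum == 0.3)
print(decimal_sum == Decimal("0.3"))
```

Queries that compare or aggregate literal values can therefore return different results after the upgrade, which is one reason the compatibility parameter exists.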
  17. Correlated Subquery Has a Strict Limitation
      A correlated subquery that contains LIMIT is not supported.
        • The LIMIT clause must be removed.

          select t1.features, t3.class
          from iris t1,
          lateral (
            select * from iris t2
            where t1.class = t2.class
            limit 3
          ) t3;

          Query 20181016_034209_00610_m74gc failed: line 1:52: Given correlated subquery is not supported
  18. deprecated.legacy-order-by
      The deprecated.legacy-order-by flag has been removed.
        • The output projection always takes higher precedence, per the ANSI SQL standard.

          select features, class as c   -- the output alias c takes higher precedence
          from iris c
          order by c.time;

          Query 20181015_044127_00895_m74gc failed: line 1:62: Expression c is not of type ROW
  19. deprecated.legacy-order-by
      The deprecated.legacy-order-by flag has been removed.
        • The output projection always takes higher precedence, per the ANSI SQL standard.

          select TD_TIME_FORMAT(time, 'yyyy-MM-dd') time,   -- the output column time takes higher precedence
                 features, class
          from iris
          order by TD_TIME_FORMAT(time, 'yyyy');

          Query 20181015_044515_00906_m74gc failed: line 1:98: Unexpected parameters (varchar, varchar(4)) for function td_time_format. Expected: td_time_format(bigint, varchar, varchar), td_time_format(bigint, varchar)
  20. deprecated.legacy-join-using
      The deprecated.legacy-join-using flag has been removed.
        • Only a single copy of the join column is passed to the output projection.

          select t1.class, t2.class
          from iris t1
          join iris t2 using(class);   -- only 'class' is projected

          Query 20181015_051138_01029_m74gc failed: line 1:8: Column 't1.class' cannot be resolved
  21. TD_INTERVAL
      Makes time index pushdown more approachable.
        • TD_INTERVAL provides an intuitive way to filter out unnecessary partitions.
  22. Functions
        • Geospatial functions: ST_EnvelopeAsPts, ST_GeometryN, ST_ConvexHull
        • Binary functions: to_big_endian_32, from_big_endian_32, hmac_shaXXX
        • Statistical functions: wilson_interval_lower, wilson_interval_upper
        • Misc: current_user added; log function removed
  24. Challenges
      A release that includes incompatible changes always requires special care.
        • Do not break existing queries.
        • Do not break silently, even when a break is unavoidable.
        • Detect as many problems as possible in advance.
        • Communication
        • Workarounds
  25. Presto Conductor
      An orchestration tool that makes it easy to operate multiple Presto clusters.
        • A server component built on top of Finagle:
            – Cluster Preparation
            – Query Simulation
            – Worker Process Refresh
            – Auto Scaling (under PoC)
        • Creating and destroying clusters is one of the more troublesome operations. Conductor lets us prepare a test cluster automatically.
        • Management of query simulation is delegated entirely to Conductor.
  26. Query Signature
      A construct that represents a group of customer queries.
        • Customer queries are categorized by signature and associated with their query metrics.

          SELECT
            l.orderkey,
            sum(l.extendedprice * (1 - l.discount)) AS revenue,
            o.shippriority
          FROM customer AS c, orders AS o, lineitem AS l
          WHERE c.custkey = o.custkey
            AND o.orderdate < 1539587956
          GROUP BY o.shippriority
          ORDER BY revenue DESC
          LIMIT 10

          Signature: L(O(G(S(J(J(T,T),T)))))
          Tables: customer->#,lineitem->#,orders->#
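How such a signature might be derived can be sketched as a recursive walk over a query plan tree. Everything here is an assumption made for illustration: the `PlanNode` type, the function name, and the reading of the letters as Limit, Order, Group, Select, Join, and Table are not confirmed by the slides, and the actual tool may work differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical plan node: an operator kind plus its child operators."""
    kind: str
    children: List["PlanNode"] = field(default_factory=list)


def signature(node: PlanNode) -> str:
    """Abbreviate each operator to its first letter and nest children in parentheses."""
    letter = node.kind[0].upper()
    if not node.children:
        return letter
    return f"{letter}({','.join(signature(child) for child in node.children)})"


def table() -> PlanNode:
    return PlanNode("table")


# Assumed plan shape for the three-table query on the slide.
joins = PlanNode("join", [PlanNode("join", [table(), table()]), table()])
plan = PlanNode(
    "limit",
    [PlanNode("order", [PlanNode("group", [PlanNode("select", [joins])])])],
)

query_signature = signature(plan)
print(query_signature)
```

Because the signature drops literals, column names, and table names, structurally similar queries from different customers collapse into one group, which keeps the set of test queries manageable.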
  27. Query Simulation
      Run test queries to ensure there are no result inconsistencies or performance regressions.
        • Record the checksum of the returned result and the elapsed time.
      Diagram: Conductor (1) runs the test queries against the 0.188 and 0.205 clusters and (2) logs the results: checksum, elapsed time, tracking id, account id, cluster.
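The comparison step can be sketched as follows. This is a minimal illustration under stated assumptions, not Conductor's actual logic: the checksum is assumed to be order-insensitive (rows are sorted before hashing, since results without ORDER BY have no guaranteed order), and the `RunResult` type, the function names, and the 1.5x slowdown tolerance are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Sequence


def result_checksum(rows: Iterable[Sequence]) -> str:
    """Order-insensitive checksum of a result set (assumed scheme)."""
    digest = hashlib.sha256()
    for line in sorted(repr(tuple(row)) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


@dataclass(frozen=True)
class RunResult:
    """One logged run of a test query on one cluster."""
    checksum: str
    elapsed_sec: float


def compare(old: RunResult, new: RunResult, slowdown_tolerance: float = 1.5) -> str:
    """Classify a pair of runs of the same query on the old and new versions."""
    if old.checksum != new.checksum:
        return "result_mismatch"
    if new.elapsed_sec > old.elapsed_sec * slowdown_tolerance:
        return "performance_regression"
    return "ok"


# Hypothetical results of the same query on two clusters.
rows_old = [(1, "setosa"), (2, "virginica")]
rows_new = [(2, "virginica"), (1, "setosa")]

run_old = RunResult(result_checksum(rows_old), elapsed_sec=10.0)
run_new = RunResult(result_checksum(rows_new), elapsed_sec=11.0)
print(compare(run_old, run_new))
```

Storing only a checksum and a duration per run keeps the log small while still catching the two failure modes the slide names: inconsistent results and slower execution.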
  28. Canary Deployment
      Manage each account's migration through three states: old, canary, new.
  29. Canary Deployment
      Change cluster routing by using a magic comment.

      Route to the 0.188 cluster:
          -- @TD engine_version: 0.188
          SELECT ...

      Route to the 0.205 cluster:
          -- @TD engine_version: 0.205
          SELECT ...
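A router could read the magic comment roughly as follows. This is a sketch under assumptions: the comment syntax is taken from the slide, while the regular expression, the function names, the cluster names, and the fallback to a default version are hypothetical.

```python
import re
from typing import Dict, Optional

# Matches a comment line such as "-- @TD engine_version: 0.205".
_ENGINE_VERSION = re.compile(
    r"^\s*--\s*@TD\s+engine_version:\s*(\S+)\s*$", re.MULTILINE
)


def engine_version(sql: str, default: Optional[str] = None) -> Optional[str]:
    """Return the version requested by the magic comment, or the default."""
    match = _ENGINE_VERSION.search(sql)
    return match.group(1) if match else default


def route(sql: str, clusters: Dict[str, str], default_version: str) -> str:
    """Pick the cluster for a query; unknown versions fall back to the default."""
    version = engine_version(sql, default_version)
    return clusters.get(version, clusters[default_version])


# Hypothetical cluster names keyed by engine version.
clusters = {"0.188": "presto-old", "0.205": "presto-new"}

pinned = "-- @TD engine_version: 0.205\nSELECT 1"
unpinned = "SELECT 1"

print(route(pinned, clusters, default_version="0.188"))
print(route(unpinned, clusters, default_version="0.188"))
```

Keeping the routing hint inside a SQL comment means the query text itself stays valid on either version, so a customer can try the new engine on a single query before the account is migrated.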
  30. Canary Deployment
      Why canary deployment? Pros and cons.
  31. Recap
        • Recent Presto Changes
        • Challenges for Presto Upgrade
  32. Thank You! Danke! Merci! 谢谢! Gracias! Kiitos!
