Cassandra Hadoop Best Practices by Jeremy Hanna

  1. Hadoop + Cassandra Best Practices (Thursday, June 6, 2013)
  7. Some Background
     • Hadoop support since early 2010
     • MapReduce/Pig works with any Hadoop 1.x distribution
     • Hive is a neatly integrated piece of DSE
     • Data locality just like with HDFS
     • Cassandra can handle ~200 CFs
  13. Setup
     • Analytics specific datacenter
     • Configure replication (KS/DC specific)
     • Isolated reads at CL.LOCAL_QUORUM
     • Writes will be replicated
     • Same best practices as with Hadoop alone
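
The replication and consistency bullets above can be sketched roughly as follows. This is a minimal illustration and is not taken from the slides: the keyspace name `events`, the datacenter names `Cassandra` and `Analytics`, and both helper functions are assumptions made for the example. It builds a CQL3 `CREATE KEYSPACE` statement that uses `NetworkTopologyStrategy` with a replica count per datacenter, and computes how many replicas in the local datacenter have to answer a `LOCAL_QUORUM` read.

```python
def keyspace_cql(keyspace, replicas_per_dc):
    """Build a CQL3 CREATE KEYSPACE statement with per-datacenter replica counts.

    replicas_per_dc maps a datacenter name to its replication factor.
    The datacenter names are hypothetical and must match whatever names
    the cluster's snitch actually reports.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for datacenter, replication_factor in replicas_per_dc.items():
        parts.append("'%s': %d" % (datacenter, replication_factor))
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (keyspace, ", ".join(parts))


def local_quorum(replication_factor):
    """Replicas in the local datacenter that must respond at LOCAL_QUORUM."""
    return replication_factor // 2 + 1


if __name__ == "__main__":
    # Three replicas for the real-time datacenter, one for analytics (assumed values).
    print(keyspace_cql("events", {"Cassandra": 3, "Analytics": 1}))
    print(local_quorum(3))
```

Because a `LOCAL_QUORUM` read only waits for replicas in the datacenter the client is connected to, analytics jobs pointed at the analytics datacenter do not add read load to the other one, while writes still reach every datacenter listed in the keyspace definition.
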
  16. Vanilla Hadoop
     • Co-locate task trackers and data nodes with Cassandra nodes (data locality)
     • Workload isolation with separate Cassandra datacenter configured
  21. Planning
     • MapReduce over full column family
     • Model data accordingly
     • Add more column families
     • Can use secondary index, but use caution
  25. Execution
     • Project and select early in your workflow
     • Store common intermediate datasets (in CFS/HDFS)
     • Bulk loader output format excels
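
One way to picture "project and select early" is the plain-Python sketch below. It is an in-memory simulation, not the Hadoop, Pig, or Cassandra API; the row layout, the column names, and the function names are all hypothetical. The point it illustrates is that rows are filtered and trimmed to the needed columns in the map stage, so later stages only ever see the smaller records.

```python
from collections import defaultdict


def map_stage(rows, wanted_columns, predicate):
    """Select matching rows and keep only the needed columns, as early as possible."""
    for row in rows:
        if predicate(row):
            yield {name: row[name] for name in wanted_columns}


def reduce_stage(records, key_column, value_column):
    """Sum value_column per key_column over the already trimmed records."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_column]] += record[value_column]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical wide rows; only two of the four columns are needed downstream.
    rows = [
        {"user": "a", "country": "FR", "bytes": 10, "agent": "x"},
        {"user": "b", "country": "US", "bytes": 5, "agent": "y"},
        {"user": "a", "country": "FR", "bytes": 7, "agent": "z"},
        {"user": "c", "country": "FR", "bytes": 1, "agent": "x"},
    ]
    trimmed = map_stage(rows, ["user", "bytes"], lambda row: row["country"] == "FR")
    print(reduce_stage(trimmed, "user", "bytes"))
```
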
  30. Use Cases
     • Typical Hadoop tasks
     • Validate data
     • Fix data
     • Bootstrap a new column family from existing data
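
The validation and bootstrap use cases above might look something like the following in miniature. This is only a sketch in plain Python over an in-memory dictionary, assuming a hypothetical `events` column family keyed by event ID; a real job would read and write through Hadoop input and output formats instead. It regroups existing rows under a new row key and reports the rows that fail a simple validity check (a missing key column).

```python
def bootstrap_by_column(existing_rows, new_key_column):
    """Regroup rows from an existing column family under a new row key.

    existing_rows maps an old row key to a dict of columns. Rows that lack
    new_key_column fail validation and are returned separately.
    """
    new_rows = {}
    skipped = []
    for row_key, columns in existing_rows.items():
        new_key = columns.get(new_key_column)
        if new_key is None:
            skipped.append(row_key)  # invalid row: candidate for a later fix-up pass
            continue
        # Each new row holds the old rows that share the same new key.
        new_rows.setdefault(new_key, {})[row_key] = columns
    return new_rows, skipped


if __name__ == "__main__":
    # Hypothetical source data keyed by event ID.
    events = {
        "e1": {"user": "alice", "action": "login"},
        "e2": {"user": "bob", "action": "view"},
        "e3": {"action": "view"},
        "e4": {"user": "alice", "action": "logout"},
    }
    by_user, invalid = bootstrap_by_column(events, "user")
    print(by_user)
    print(invalid)
```
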
  31. Thank you
     • Jeremy Hanna
     • @jeromatron (Twitter and IRC)
     • jeremy@datastax.com
     • Ping me if you have any questions
