Practical NoSQL: Accumulo's dirlist Example

Many individuals and organizations want to use NoSQL technology but often lack an understanding of how the underlying functional bits can be applied to their use case. The result is often a sharply increased desire to put the SQL back in NoSQL.

Since its initial commit, Apache Accumulo has provided a number of examples that help explain how some of these bits function and how they might be applied to a NoSQL-friendly use case. One very relatable example, dirlist, demonstrates how Accumulo can be used to emulate a filesystem.

In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
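
The table designs and the single-wildcard search mentioned above can be illustrated with a small in-memory sketch. It does not call Accumulo. The zero-padded three-digit depth prefix, the `f`/`r` index prefixes, the function names, and the `/opt/files/test/code` sample path are all assumptions made for illustration, not details confirmed by the slides.

```python
# In-memory sketch of dirlist-style row IDs and a single-wildcard name search.
# Assumed for illustration: 3-digit depth prefix, "f"/"r" index row prefixes.

def dir_row(path: str) -> str:
    """dirTable row ID: zero-padded depth followed by the path."""
    path = path.rstrip("/")
    return f"{path.count('/'):03d}{path}"


def index_rows(path: str) -> tuple:
    """Forward and reverse index rows for the last path component."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    return "f" + name, "r" + name[::-1]


def plan_query(term: str) -> tuple:
    """Turn a term with at most one '*' into (row_prefix, exact, suffix)."""
    if term.count("*") > 1:
        raise ValueError("only a single wildcard is supported")
    if "*" not in term:
        return "f" + term, True, ""           # exact term: forward index
    head, tail = term.split("*")
    if not head:
        return "r" + tail[::-1], False, ""    # leading wildcard: reverse index
    return "f" + head, False, tail            # trailing or middle wildcard


def search(paths, term: str) -> list:
    """Return the dirTable rows whose file or directory name matches term."""
    prefix, exact, suffix = plan_query(term)
    hits = set()
    for path in paths:
        for row in index_rows(path):
            if exact:
                matched = row == prefix
            else:
                # A middle wildcard scans by prefix, then filters on the suffix;
                # the length check stops prefix and suffix from overlapping.
                matched = (row.startswith(prefix) and row.endswith(suffix)
                           and len(row) >= len(prefix) + len(suffix))
            if matched:
                hits.add(dir_row(path))
    return sorted(hits)


SAMPLE = ["/opt/files/prod/done", "/opt/files/prod/.shh", "/opt/files/test/code"]
```

Storing each name both forward and reversed is what lets a leading wildcard become a prefix scan: `*one` is looked up as the prefix `reno` in the reversed entries. In a sorted key/value store a prefix scan is a contiguous range, so only the middle-wildcard case needs an additional filter.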

  1. Practical NoSQL: Accumulo's dirlist Example. May 21, 2019. John Highcock & Henry Sowell. © Cloudera, Inc. All rights reserved.
  2. Changing the paradigm: Moving to NoSQL • They want NoSQL… • Everyone understands RDBMS • Transition thinking about storing the same data in a NoSQL store
  3. Accumulo Examples: Since ACCUMULO-1
  4. Accumulo Examples: dirlist. Emulating Filesystem Characteristics
  5. Background for understanding the dirlist example • HDFS small file abuse • Accumulo manages lots of small things well • Scalability
  6. Background for understanding the dirlist example (cont’d): Accumulo K/V structure
  7. Setup • Create some sample files
  8. Setup (cont’d) • /opt/files/prod/done is executable • /opt/files/prod/.shh is hidden
  9. Setup (cont’d) • Compute MD5s for later reference
  10. Ingest and Accumulo Setup • Ingest the files/directories • Different authorization between prod and test • chunkSize arbitrarily low to force chunking (more on that later)
  11. Accumulo Setup (cont’d) • Created 3 tables • Setting auths for the user
  12. dirTable eye chart • Row for each node in the filetree • Entry for various FS characteristics: hidden, executable, md5 for files
  13. dirTable snippet • Example for /opt/files/prod/done • Bits from filesystem replicated to table entries
  14-19. dataTable (file content storage; slides 14 to 19 carry the same text) • done file example:
      row      cf:cq                  [vis]   value
      eb3d...  refs:377ff...\x00name  [prod]  /opt/files/prod/done
  20. dataTable • Chunked based on size
  21. ChunkCombiner • Configured for dataTable
  22. Index Table • Forward and reverse tokens in row • Provides lookup for dirTable
  23. Query • Leading / Middle / Trailing Wildcard • Exact Term
  24. Query Flexibility • Can pick arbitrary depths to start and stop scans of dirTable due to depth prefix
  25. FileCount • Count file and directory depths per node
  26. Filesystem App on Accumulo: Simple file viewer for navigating the dirTable and displaying dataTable content • Opened with root of /opt/files • Displays file/directory metadata in upper right frame • File content (as applicable) in lower right frame
  27. Filesystem App on Accumulo: Full Circle with “done”
  28. Filesystem App on Accumulo: Chunked file “code” stitched back together
  29. Filesystem App on Accumulo: Obligatory authorization example • Top viewer started with only “prod” authorization • Middle viewer with only “test” • Bottom with no authorizations
  30. THANK YOU
  31. Backup
  32. dirTable Query • Search by path • Directory Metadata
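
The size-based chunking of the dataTable and the later stitching back together of a chunked file can be sketched in a few lines. This is only an illustration under stated assumptions: the row is taken to be the MD5 of the file content (the slides compute MD5s "for later reference"), while the `~chunk` column family, the packed (chunk size, chunk number) qualifier, and the function names are hypothetical choices for this sketch. The `refs` entries and visibility labels are not modeled.

```python
import hashlib
import struct

# Sketch only: splits content into fixed-size chunks keyed by its MD5.
# "~chunk" and the packed qualifier layout are assumptions for illustration.

def chunk_entries(content: bytes, chunk_size: int) -> list:
    """Return (row, cf, cq, value) tuples, one per chunk of content."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    row = hashlib.md5(content).hexdigest()
    entries = []
    for number, start in enumerate(range(0, len(content), chunk_size)):
        # Big-endian packing keeps byte order equal to numeric order.
        qualifier = struct.pack(">ii", chunk_size, number)
        entries.append((row, "~chunk", qualifier,
                        content[start:start + chunk_size]))
    return entries


def reassemble(entries: list) -> bytes:
    """Stitch chunks back together in qualifier order and verify the MD5."""
    ordered = sorted(entries, key=lambda entry: entry[2])
    data = b"".join(entry[3] for entry in ordered)
    if ordered and hashlib.md5(data).hexdigest() != ordered[0][0]:
        raise ValueError("reassembled content does not match its MD5 row")
    return data
```

Setting `chunk_size` very low, as the ingest slide does, forces even small files to span several entries, which makes the chunking visible in a scan. Keying the row on the content hash means the reassembled bytes can be checked against the row itself.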
