Dapper Tool - A Bundle to Make your ECL Neater

Have you ever written a long project for a simple column rename and thought, this should be easier? What about nicely named output statements? Yeah they bother me too. Oh, and DEDUP(SORT(DISTINCT()))? There is a better way! Learn how Dapper can help!

  1. 2019 HPCC Systems® Community Day: Challenge Yourself – Challenge the Status Quo
      Dapper – A Bundle to Make Your ECL Neater
      Rob Mansfield, Senior Data Scientist, Proagrica
  2. Please ask questions!
  3. Who thinks ECL can be a little verbose?
  4. Engineers on big projects may need this level of control. But what about:
      • QAs
      • Analysts
      • Developers
      • Data Scientists
  5. For these people, ECL syntax is a bit of a trial!
      • Dedup
          DEDUP(SORT(DISTRIBUTE(x, HASH(y)), y, LOCAL), y, LOCAL);
      • One column transform
          PROJECT(x, TRANSFORM(RECORDOF(LEFT), SELF.y := LEFT.y+1; SELF := LEFT;));
      • Named output
          OUTPUT(x, NAMED('x'));
      • Write to CSV
          OUTPUT(x, , '~ROB::TEMP::x', CSV(HEADING(SINGLE), SEPARATOR(','), TERMINATOR('\n'), QUOTE('"')));
      • Grouped count
          [I ran out of space]
  6. How does this stuff work in other languages? Well, R is nice!

          library(dplyr)
          df <- read.csv('x')
          df <- select(df, col1, col2)
          df <- mutate(df, col3 = col1 + col2)
          df <- group_by(df, col3)
          df <- summarise(df, col5 = n())
          write.csv(df, file='output.csv')

  7. How does this stuff work in other languages? Well, R is nice!

          library(dplyr)
          df <- read.csv('x') %>%
            select(col1, col2) %>%
            mutate(col3 = col1 + col2) %>%
            group_by(col3) %>%
            summarise(col5 = n()) %>%
            write.csv(file='output.csv')

  8. SQL is also lovely, but can be hard to arrange into a single call

          SELECT COUNT(col2), col1
          FROM TABLE
          GROUP BY col1;

  9. …and Python is, as always, Python

          Beautiful is better than ugly.
          Explicit is better than implicit.
          Simple is better than complex.
          Complex is better than complicated.
          Flat is better than nested.
          Sparse is better than dense.
          Readability counts.
          …

  10. Enter Dapper…
  11. Let’s work through an example. I don’t know about you, but I’ve always wanted to know Jabba the Hutt’s Body Mass Index…
  12. Load Data

          IMPORT dapper.ExampleData;
          IMPORT dapper.TransformTools as tt;

  13. View Data

          //load data
          StarWars := ExampleData.starwars;
          // Look at the data
          tt.nrows(StarWars);
          tt.head(StarWars);

  14. [Slide content not captured in the transcript]
  15. Fill in some blanks

          //Fill blank species with unknown
          fillblankHome := tt.mutate(StarWars, species, IF(species = '', 'Unkn.', species));
          tt.head(fillblankHome);

  16. That’s right, we don’t need LEFT or SELF!!! What sorcery is this?!?!?
  17. Okay, we now need to make our BMI column!

          //make height meters
          heightMeters := tt.mutate(fillblankHome, height, height/100);
          //Create a BMI for each character
          bmi := tt.append(heightMeters, REAL, BMI, mass/(height^2));
          //Look at just the new column and name
          bmiSelect := tt.select(bmi, 'name, bmi');
          tt.head(bmiSelect);

  18. Let's work through an example: Sort!

          //Find the highest
          sortedBMI := tt.arrange(bmiSelect, '-bmi');
          tt.head(sortedBMI);

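The mutate, append, select and arrange steps above can be mirrored in a few lines of plain Python, which makes the arithmetic easy to check by hand. This is only a sketch of the calculation, not how Dapper implements it. The three rows and their numbers are assumed values in the style of the Star Wars example data (height in cm, mass in kg), and the helper names `add_bmi` and `arrange_desc` are hypothetical.

```python
# Illustrative rows only: assumed values (height in cm, mass in kg),
# not output taken from the Dapper example data.
characters = [
    {"name": "Luke Skywalker", "height": 172.0, "mass": 77.0},
    {"name": "Darth Vader", "height": 202.0, "mass": 136.0},
    {"name": "Jabba Desilijic Tiure", "height": 175.0, "mass": 1358.0},
]


def add_bmi(rows):
    """Return new rows holding only the name and a computed bmi field."""
    result = []
    for row in rows:
        height_m = row["height"] / 100      # centimetres to metres (the mutate step)
        bmi = row["mass"] / height_m ** 2   # BMI = mass / height^2 (the append step)
        result.append({"name": row["name"], "bmi": bmi})  # keep two columns (the select step)
    return result


def arrange_desc(rows, key):
    """Sort rows by the given key, highest first (the '-bmi' idea)."""
    return sorted(rows, key=lambda r: r[key], reverse=True)


ranked = arrange_desc(add_bmi(characters), "bmi")
for row in ranked:
    print(f"{row['name']}: {row['bmi']:.1f}")
```

With these assumed inputs the largest value belongs to Jabba: 1358 / 1.75² is roughly 443, which is the sort of outlier the sorted table is meant to surface.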
  19. Lovely, I feel that’s one of life’s great questions answered. I do of course have other questions on Star Wars…
  20. Has anyone else noticed the lack of diversity in the SW universe?

          //How many of each species are there?
          species := tt.countn(sortedBMI, 'species');
          sortedspecies := tt.arrange(species, '-n');
          tt.head(sortedspecies);

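For readers who do not know the grouped-count idiom, the count-then-sort step can be sketched in plain Python. This is an illustration of the idea only: the sample species values are made up, and `count_n` and `arrange_desc` are hypothetical helpers, not part of Dapper or hpycc.

```python
from collections import Counter

# Hypothetical sample of a species column; not the bundle's example data.
species_column = ["Human", "Droid", "Human", "Wookiee", "Droid", "Human", "Unkn."]


def count_n(values):
    """Group identical values and count them, one row per group with an 'n' field."""
    return [{"species": name, "n": n} for name, n in Counter(values).items()]


def arrange_desc(rows, key):
    """Sort rows by the given key, highest first (the '-n' idea)."""
    return sorted(rows, key=lambda r: r[key], reverse=True)


sorted_species = arrange_desc(count_n(species_column), "n")
for row in sorted_species:
    print(row["species"], row["n"])
```

The result has one row per distinct species with its count in `n`, most frequent first.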
  21. There are some pretty exciting eye colours though!

          //Finally let's look at unique eye colours:
          colourData := tt.select(StarWars, 'eye_color');
          uniqueColours := tt.distinct(colourData, 'eye_color'); //see arrangedistinct() for fancy sort/dedup
          tt.head(uniqueColours);

  22. Let's work through an example: Save

          //and save our results
          tt.to_csv(sortedBMI, 'ROB::TEMP::STARWARSCSV');
          tt.to_thor(sortedBMI, 'ROB::TEMP::STARWARS');

  23. Let’s do a quick side-by-side
  24. Dapper:

          IMPORT dapper.ExampleData;
          IMPORT dapper.TransformTools as tt;
          //load data
          StarWars := ExampleData.starwars;
          // Look at the data
          tt.nrows(StarWars);
          tt.head(StarWars);
          //Fill blank species with unknown
          fillblankHome := tt.mutate(StarWars, species, IF(species = '', 'Unkn.', species));
          tt.head(fillblankHome);
          //Create a BMI for each character
          bmi := tt.append(fillblankHome, REAL, BMI, mass/height^2);
          tt.head(bmi);
          //Find the highest
          sortedBMI := tt.arrange(bmi, '-bmi');
          tt.head(sortedBMI);
          //Jabba should probably go on a diet.

      Base ECL:

          IMPORT dapper.ExampleData;
          //load data
          StarWars := ExampleData.starwars;
          // Look at the data
          OUTPUT(COUNT(StarWars), NAMED('COUNTstarWars'));
          OUTPUT(StarWars, NAMED('starWars'));
          //Fill blank species with unknown
          //Create a BMI for each character
          fillblankHomeAndBMI := PROJECT(StarWars,
                                         TRANSFORM({RECORDOF(LEFT); REAL BMI;},
                                                   SELF.BMI := LEFT.mass / LEFT.Height^2;
                                                   SELF.species := IF(LEFT.species = '', 'Unkn.', LEFT.species);
                                                   SELF := LEFT;));
          OUTPUT(fillblankHomeAndBMI, NAMED('fillblankHomeAndBMI'));
          //Find the highest
          sortedBMI := SORT(fillblankHomeAndBMI, -bmi);
          OUTPUT(sortedBMI, NAMED('sortedBMI'));
          //Jabba should probably go on a diet.

  25. Dapper:

          //How many of each species are there?
          species := tt.countn(sortedBMI, 'species');
          sortedspecies := tt.arrange(species, '-n');
          tt.head(sortedspecies);
          //Finally let's look at eye colour:
          colourData := tt.select(StarWars, 'eye_color');
          uniqueColours := tt.distinct(colourData, 'eye_color'); //see arrangedistinct() for fancy sort/dedup
          tt.head(uniqueColours);
          //and save our results
          tt.to_csv(sortedBMI, 'ROB::TEMP::STARWARSCSV');

      Base ECL:

          //How many of each species are there?
          CountRec := RECORD
            STRING Species := sortedBMI.species;
            INTEGER n := COUNT(GROUP);
          END;
          species := TABLE(sortedBMI, CountRec, species);
          sortedspecies := SORT(species, -n);
          OUTPUT(sortedspecies, NAMED('sortedspecies'));
          //Finally let's look at unique eye colour:
          colourData := TABLE(sortedBMI, {eye_color});
          uniqueColours := DEDUP(SORT(DISTRIBUTE(colourData, HASH(eye_color)), eye_color, LOCAL), eye_color, LOCAL);
          OUTPUT(COUNT(uniqueColours), NAMED('COUNTuniqueColours'));
          OUTPUT(uniqueColours, NAMED('uniqueColours'));
          //and save our results
          OUTPUT(sortedBMI, , 'ROB::TEMP::STARWARSCSV', CSV(HEADING(SINGLE), SEPARATOR(','), TERMINATOR('\n'), QUOTE('"')));

  26. …and we still haven’t even scratched the surface…
  27. Interested? You can install from our GitHub:

          ecl bundle install https://github.com/OdinProAgrica/dapper.git

      There’s also a more in-depth walkthrough (and infographic) here: https://hpccsystems.com/blog/dapper-bundle
      Similar projects? Yes, yes we have! https://github.com/OdinProAgrica
  28. Bonus deck! We would like to introduce you to hpycc
  29. Hpycc is a Python package that builds on the ideas of Dapper. That is:
      • How can we make HPCC Systems more usable for the data scientist?
      • How can this translate to engineering and development?
  30. Things I find overly taxing
      • Spraying new data
      • Running scripts that I can customise easily
      • Getting the results of queries and files
      • ECL dev when I’m offsite
  31. What if you could run all this from a Python notebook? Now you can!
  32. For the purposes of this demo I’ve made a throwaway function
  33. I’m dev-ing locally so I’ll need HPCC Systems running… then create a connection to my server
  34. Let’s grab the raw Star Wars dataset…
  35. What if we have more than one output?
  36. [Slide content not captured in the transcript]
  37. [Slide content not captured in the transcript]
  38. [Slide content not captured in the transcript]
  39. [Slide content not captured in the transcript]
  40. Interested? You can install from PyPI:

          pip install hpycc

      There’s also more info on our GitHub: https://github.com/OdinProAgrica/hpycc
      Similar projects? Yes, yes we have! https://github.com/OdinProAgrica
  41. Watch this space for our most recent project: Wally!
  42. A little flavour of what we have already…
  43. Interested? You can install from our GitHub:

          pip install hpycc

      There’s also more info on our GitHub: https://github.com/OdinProAgrica/wally
      Similar projects? Yes, yes we have! https://github.com/OdinProAgrica
  44. Oh, and Dapper has some string tools!
  45. …we are also building stringtools as part of the Dapper bundle

          IMPORT dapper.stringtools as st;
          source := 'No1 e-xp-ec-t-s t809he [S]pammish ReQuIsiTion';
          target := 'nobody expects the spanish inquisition';

  46. …we are also building stringtools as part of the Dapper bundle

          source := 'No1 e-xp-ec-t-s t809he [S]pammish ReQuIsiTion';
          target := 'nobody expects the spanish inquisition';

  47. …we are also building stringtools as part of the Dapper bundle

          IMPORT STD;
          source := 'No1 e-xp-ec-t-s t809he [S]pammish ReQuIsiTion';
          target := 'nobody expects the spanish inquisition';
          one := TRIM(std.Str.ToLowerCase(source), LEFT, RIGHT);
          two := REGEXREPLACE('1', one, 'body');
          three := REGEXREPLACE('[^a-z ]', two, '');
          four := REGEXREPLACE('mm', three, 'n');
          five := REGEXREPLACE('req', four, 'inq');
          six := REGEXREPLACE('\\s+', five, ' ');
          six;

  48. …we are also building stringtools as part of the Dapper bundle

          IMPORT dapper.stringtools as st;
          source := 'No1 e-xp-ec-t-s t809he [S]pammish ReQuIsiTion';
          target := 'nobody expects the spanish inquisition';
          regexDS := DATASET([
              {'1'      , 'body'},
              {'[^a-z ]', ''    },
              {'mm'     , 'n'   },
              {'req'    , 'inq' },
              {'\\s+'   , ' '   }
            ], {STRING Regex; STRING Repl;});
          st.regexLoop(source, regexDS);
          target;

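The idea behind a table of regex/replacement pairs applied in order can be checked with a short Python sketch. This is not the Dapper implementation. `regex_loop` is a hypothetical helper, the input is lower-cased and trimmed first as in the hand-written ECL version, and the last pattern is assumed to be the whitespace class `\s+`.

```python
import re

source = "No1 e-xp-ec-t-s t809he [S]pammish ReQuIsiTion"
target = "nobody expects the spanish inquisition"

# Ordered (pattern, replacement) pairs mirroring the rows of the regex table.
# Assumption: the final pattern is meant to collapse runs of whitespace.
rules = [
    (r"1", "body"),       # "no1" becomes "nobody"
    (r"[^a-z ]", ""),     # drop everything except lower-case letters and spaces
    (r"mm", "n"),         # "spammish" becomes "spanish"
    (r"req", "inq"),      # "requisition" becomes "inquisition"
    (r"\s+", " "),        # collapse repeated whitespace into one space
]


def regex_loop(text, rules):
    """Apply each replacement in order, feeding every result into the next rule."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


cleaned = regex_loop(source.lower().strip(), rules)
print(cleaned)
print(cleaned == target)
```

Order matters here: the digit rule has to run before the rule that strips non-letters, otherwise the "1" would be deleted instead of being expanded to "body".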
  49. Questions?
      Rob Mansfield, Senior Data Scientist, Proagrica, RBI
      Rob.Mansfield@proagrica.com
  50. View this presentation on YouTube: https://www.youtube.com/watch?v=jOORZdOWnxk&list=PL-8MJMUpp8IKH5-d56az56t52YccleX5h&index=5&t=0s (20:46)
