DataPatterns - Profiling in ECL Watch

  1. 2019 HPCC Systems® Community Day: Challenge Yourself – Challenge the Status Quo. DataPatterns - Profiling in ECL Watch. Dan S. Camper, Thaumaturge, HPCC Systems Solutions Lab
  2. Topics
     • What is DataPatterns?
     • Improvements since last year
     • ECL Standard Library integration
     • ECL Watch integration
     • Differences between installations
  3. What is DataPatterns?
  4. DataPatterns – What is it?
     • ECL bundle that provides some basic data profiling and research tools to an ECL programmer
     • Today, it is primarily a data profiling tool
     • Numerous parameters for controlling analysis and output
       • Analyze all rows in a dataset or just a sample
       • Analyze all fields or only certain fields
       • Enable only specified profiling checks
       • Specify returned pattern counts
     • Creates a single dataset as a result
       • One record for each field analyzed
  5. Improvements
  6. Improvements Since Last Year
     • Profile()
       • Cardinality Breakdown
       • Improved UTF-8 handling
       • Support additional data types
         • Embedded child records
         • Child datasets
         • SET OF
       • Pretty results
     • BestRecordStructure()
       • Optional generated TRANSFORM()
     • Lots of bug fixes
  7. ECL Standard Library Integration
  8. DataPatterns Grows Up …
     • Portions of bundle integrated with ECL Standard Library
       • Profile()
       • BestRecordStructure()
     • As of HPCC Systems 7.4.0
  9. … And Gains A User Interface in ECL Watch
  10. Logical File’s Record Structure
  11. Executing DataPatterns.Profile()
  12. DataPatterns.Profile() In Progress
  13. DataPatterns.Profile() Workunit ECL
  14. DataPatterns.Profile() Raw Results
  15. DataPatterns.Profile() Report Results
  16. Differences Between Installations
  17. Differences Between Installations
     • ECL Bundle
       • Contains additional functions
         • ProfileFromPath()
         • BestRecordStructureFromPath()
       • Contains support for pretty report
       • Available from https://github.com/hpcc-systems/DataPatterns
     • ECL Standard Library
       • Does not support pretty report
       • Available with HPCC Systems 7.4.0 and later
     • ECL Watch
       • Supports only data profiling
       • Available with HPCC Systems 7.4.0 and later
  18.
  19. DataDetectors – What Is This Data?
     Bloom Filter Models
     • Person.FirstName
     • Person.LastName
     • Geo.USA.Address.StreetName
     • Geo.USA.Address.CityName
     • Geo.USA.Address.PostalCode
     • Geo.USA.PhoneAreaCode
     • Geo.CountryName
     • Geo.CountryCode
     • Identifier.USA.StockSymbol
     Heuristic Models
     • Calendar.Date
     • Calendar.Month
     • Calendar.Quarter
     • Calendar.Year
     • Calendar.YearMonth
     • Currency
     • Group.StockExchange
     • Geo.USA.Address.State
     • Geo.Longitude
     • Geo.Latitude
     • Geo.LatLon
     • Identifier.USA.PhoneNumber
     • Identifier.EmailAddress
     • Identifier.RecordID
     • Identifier.WebSiteURL
  20. DataDetectors Test – Raw Data
  21. DataDetectors Test – Data Examination Results
  22.
  23. Cloud IDE – ECL Programming in a browser-based IDE
  24. Cloud IDE
  25. Cloud IDE
  26. Fini. View this presentation on YouTube: https://www.youtube.com/watch?v=TtcrOcyf6gQ&list=PL-8MJMUpp8IKH5-d56az56t52YccleX5h&index=6&t=0s (13:19)
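
Slide 4 describes profiling that can run over all rows or a sample, over all fields or a chosen subset, with a cap on the number of patterns returned, and that produces one result record per field. The Python sketch below illustrates that idea only; it is not the ECL implementation of DataPatterns.Profile(). The function names, parameter names, output keys, and the character-pattern mapping (`A` for uppercase, `a` for lowercase, `9` for digits) are assumptions made for illustration.

```python
from collections import Counter
import random


def to_pattern(value):
    """Map each character to a class: A=uppercase, a=other letters, 9=digit.

    Any other character (spaces, punctuation) is kept unchanged.
    This mapping is an illustrative assumption.
    """
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def profile(rows, fields=None, sample_size=None, max_patterns=3, seed=0):
    """Return one summary record per analyzed field (hypothetical layout)."""
    rows = list(rows)
    # Optionally analyze a random sample instead of every row.
    if sample_size is not None and sample_size < len(rows):
        rows = random.Random(seed).sample(rows, sample_size)
    # Default to all fields found in the first row.
    if fields is None:
        fields = list(rows[0].keys()) if rows else []
    result = []
    for name in fields:
        values = [r.get(name) for r in rows]
        filled = [v for v in values if v not in (None, "")]
        patterns = Counter(to_pattern(v) for v in filled)
        result.append({
            "attribute": name,
            "rec_count": len(values),
            "fill_count": len(filled),
            "cardinality": len(set(filled)),
            "popular_patterns": patterns.most_common(max_patterns),
        })
    return result


if __name__ == "__main__":
    sample_rows = [
        {"name": "Dan", "zip": "30005"},
        {"name": "Ann", "zip": "30301"},
        {"name": "", "zip": "3030A"},
    ]
    for record in profile(sample_rows):
        print(record)
```

Restricting the analysis to certain fields or fewer patterns would look like `profile(sample_rows, fields=["zip"], max_patterns=1)`.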
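
Slide 19 groups the DataDetectors models into Bloom filter models (membership in a known vocabulary, such as first names) and heuristic models (rule-based checks, such as a calendar year). The Python sketch below shows one plausible shape for each kind and a simple score for a column of values. It is not the DataDetectors code: the class, the hashing scheme, the filter sizes, the 1800 to 2100 year range, and the `match_rate` helper are all assumptions made for illustration.

```python
import hashlib
import re


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Normalize so that "dan", "Dan " and "DAN" hash identically.
        data = item.strip().upper().encode("utf-8")
        for i in range(self.num_hashes):
            # Derive independent hashes by prefixing a one-byte salt.
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def looks_like_year(value):
    """Heuristic check: four ASCII digits in an assumed plausible range."""
    text = value.strip()
    if re.fullmatch(r"[0-9]{4}", text) is None:
        return False
    return 1800 <= int(text) <= 2100


def match_rate(values, predicate):
    """Fraction of non-blank values accepted by the given model."""
    non_blank = [v for v in values if v.strip()]
    if not non_blank:
        return 0.0
    return sum(1 for v in non_blank if predicate(v)) / len(non_blank)


if __name__ == "__main__":
    first_names = BloomFilter()
    for name in ("DAN", "ANN", "MARIA"):
        first_names.add(name)
    print(match_rate(["Dan", "ann", "Maria"], lambda v: v in first_names))
    print(match_rate(["2019", "1999", "abcd"], looks_like_year))
```

A Bloom filter suits large vocabularies because it stores only a bit array rather than the values themselves, at the cost of occasional false positives; the heuristic model needs no stored data at all.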