3. Capabilities
● Discovery
  ○ Search through metadata to find data set/operation of interest.
  ○ View schema, associated metadata etc. for a data set.
● Lineage
  ○ Given a data set, trace back to the original source.
  ○ Understand the impact of modifying a data set.
● Audit
  ○ Generate report of access to a data set in Hadoop.
  ○ Generate alert when a restricted data set is accessed.
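The two lineage capabilities amount to walking a data-flow graph in opposite directions: upstream to find original sources, downstream to find what a change would affect. The following is a minimal sketch of that idea, not the product's implementation. The edge list, the data set names, and the function names `original_sources` and `impacted` are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical data-flow edges (source -> target); names are illustrative only.
EDGES = [
    ("raw_logs", "cleaned_logs"),
    ("cleaned_logs", "daily_summary"),
    ("reference_data", "daily_summary"),
    ("daily_summary", "weekly_report"),
]


def build_indexes(edges):
    """Index edges in both directions for upstream and downstream walks."""
    upstream, downstream = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        upstream[dst].add(src)
        downstream[src].add(dst)
    return upstream, downstream


def _reachable(start, neighbours):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def original_sources(dataset, edges):
    """Trace back: ancestors of dataset that have no upstream of their own."""
    upstream, _ = build_indexes(edges)
    ancestors = _reachable(dataset, upstream)
    return {a for a in ancestors if not upstream[a]}


def impacted(dataset, edges):
    """Impact analysis: every data set downstream of dataset."""
    _, downstream = build_indexes(edges)
    return _reachable(dataset, downstream)
```

With the sample edges, `original_sources("weekly_report", EDGES)` walks upstream to the two root data sets, and `impacted("raw_logs", EDGES)` lists everything that would need review if `raw_logs` changed.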
13. Lineage (Hive Query)
INSERT OVERWRITE TABLE machine_vendors
SELECT
  upper(trim(regexp_extract(ms.dmidecode, "System Information\n\tManufacturer: ([^\n]+)", 1))) AS manufacturer,
  upper(trim(regexp_extract(ms.dmidecode, "System Information\n\tManufacturer: ([^\n]+)\n\tProduct Name: ([^\n]+)", 2))) AS product,
  ca.address_state,
  ca.customerKey,
  cm.clusterId,
  ms.machineName
FROM crm_accounts ca
JOIN cluster_metadata cm
  ON ca.customerKey = cm.customerKey
JOIN machine_stats ms
  ON cm.customerKey = ms.customerKey
  AND cm.clusterId = ms.clusterId
  AND cm.collectionTS = ms.collectionTS
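To make the lineage of this query concrete, the select list can be written down as a mapping from each output column to the source columns it reads. The sketch below is transcribed by hand from the query rather than produced by a parser. The output column names follow the SELECT aliases or source column names (the actual column names of `machine_vendors` are not shown on the slide), the helper names are hypothetical, and join-condition columns are deliberately left out.

```python
# Select-list lineage for the machine_vendors query, transcribed by hand.
# Assumption: output columns are named after the SELECT aliases / source columns.
TARGET_TABLE = "machine_vendors"

COLUMN_LINEAGE = {
    "manufacturer": ["machine_stats.dmidecode"],
    "product": ["machine_stats.dmidecode"],
    "address_state": ["crm_accounts.address_state"],
    "customerKey": ["crm_accounts.customerKey"],
    "clusterId": ["cluster_metadata.clusterId"],
    "machineName": ["machine_stats.machineName"],
}


def source_tables(lineage):
    """Table-level lineage: every table that feeds at least one output column."""
    return {src.split(".", 1)[0] for sources in lineage.values() for src in sources}


def columns_fed_by(source_column, lineage):
    """Impact view: output columns that read from the given source column."""
    return sorted(out for out, sources in lineage.items() if source_column in sources)
```

Here `source_tables(COLUMN_LINEAGE)` yields the three joined tables, and `columns_fed_by("machine_stats.dmidecode", COLUMN_LINEAGE)` shows that both `manufacturer` and `product` depend on the same raw `dmidecode` text.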
22. Model (Contd…)
● Relation
  ○ Unique Identity
  ○ Two sets of related elements
  ○ Relationship type (Parent Child Relation, Data Flow Relation, Control Flow Relation, Instance Of Relation, Alias Relation, Generic Relation)
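The three parts of a relation listed above map naturally onto a small record type. The following is a sketch under assumptions, not the actual schema: the class and field names (`Relation`, `identity`, `endpoint1`, `endpoint2`, `rel_type`) are hypothetical, and only the six relationship type names come from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class RelationType(Enum):
    """The six relationship types named on the slide."""
    PARENT_CHILD = "Parent Child Relation"
    DATA_FLOW = "Data Flow Relation"
    CONTROL_FLOW = "Control Flow Relation"
    INSTANCE_OF = "Instance Of Relation"
    ALIAS = "Alias Relation"
    GENERIC = "Generic Relation"


@dataclass(frozen=True)
class Relation:
    """A relation: a unique identity, two sets of related elements, and a type."""
    identity: str               # unique identity of this relation (hypothetical field name)
    endpoint1: FrozenSet[str]   # identities of the first set of related elements
    endpoint2: FrozenSet[str]   # identities of the second set of related elements
    rel_type: RelationType


# Example: a data-flow relation from three source tables to one target table.
flow = Relation(
    identity="rel-1",
    endpoint1=frozenset({"crm_accounts", "cluster_metadata", "machine_stats"}),
    endpoint2=frozenset({"machine_vendors"}),
    rel_type=RelationType.DATA_FLOW,
)
```

Using sets on both ends lets a single relation express many-to-one or many-to-many links, such as several input tables flowing into one output table.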