2. Introduction
I am a Data Management Specialist with the
Northrop Grumman Corporation working at the
Bureau of Land Management National Operations
Center (NOC) in Denver, Colorado.
The NOC harvests and maintains standardized
spatial data from the twelve administrative states
in the Bureau.
3. Today’s Agenda
Advantages of Data Standardization
FME & Data Quality
Geometry Validator Transformer
Building Geometry Checks
Tester Transformer
Building Attribute Checks
Interpreting The Error Table
Maintaining QC
4. Advantages of Data Standardization
Data standardization ensures common
understanding of content and quality.
Data standardization serves to resolve data
anomalies and data conflicts.
Data standardization optimizes resources needed
for organizational data calls.
5. FME & Data Quality
Quality Control (QC) over standardized data:
Improves data integrity.
Builds confidence in critical business decisions.
Builds trust in the organization.
FME helps to develop an effective QC plan.
6. FME & Data Quality
FME tools provide robust data management
capabilities:
Geometry Validator transformer
Tester transformer
7. Geometry Validator Transformer
As features are harvested from the states, they
undergo a geometry validation process:
FME attempts to repair any features with null,
degenerate/corrupt or self-intersecting geometries.
An error report is created showing each feature and
its status as passing, repaired, or not repaired.
8. Building Geometry Checks
Our organization uses a workflow similar to the
diagram below to validate feature geometry:
13. Tester Transformer
Our organization uses the Tester transformer to
evaluate the attribute integrity of harvested
features.
The Tester transformer ensures that attribute
values meet the established design constructs.
Attributes that violate design considerations are
noted in the error report.
21. Maintaining QC
Data is periodically harvested from the states.
Once the data satisfies QC measures, as
established by the business/program area, it may
be published for external customers.
22. Thank You!
Questions?
For more information:
Berk Bayer (bbayer@blm.gov)
Tom Chatfield (tchatfie@blm.gov)
Jeff Safran (jsafran@blm.gov)
Northrop Grumman Corporation (www.ngc.com)
Bureau of Land Management (www.blm.gov)
Editor's notes
Just a brief introduction about me and the organization. I am a Data Management Specialist with the Northrop Grumman Corporation working at the Bureau of Land Management National Operations Center in Denver, Colorado. The Bureau is one of the agencies in the Department of the Interior that help manage federal lands. A key function of my position is exercising quality control over data and helping our internal customers maintain and improve their data workflows. Our internal customers are primarily the twelve administrative states in the Bureau, which have been delegated management responsibility over areas of federal land. The state data is standardized through a data crosswalk scheme and replicated to the NOC, where it begins the QC process.
Here is an overview of the items that we will be covering today. We will discuss key advantages of data standardization, raise awareness of data quality, and show how our organization incorporates FME into the QC workflow. We will discuss the FME transformers that handle the data quality checks and how these checks are translated into an error report. We will also touch on the basic objectives of an effective QC plan.
The initiative driving data standardization is the fact that standardized data ensures a common understanding of content and quality across the states. Data standardization serves to resolve challenges such as data anomalies and data conflicts. It aims to better support national business directives by expediting the process by which decisions are made, while helping to maintain an organized data structure and making data themes easier to query. Data standardization also optimizes the resources needed to successfully complete national data calls.
Maintaining consistent and effective QC practices improves data integrity. Customers are able to effectively plan objectives and execute decisions when they are provided with clean data. Acknowledgment of quality data from our customers builds trust and partnership in the organization. FME helps users develop a QC plan that will ensure effective management of spatially enabled standardized data.
FME tools offer a wide range of capabilities that allow users to manage data effectively. The Geometry Validator transformer detects spatial issues in features and attempts to repair them. The Tester transformer evaluates features against the design constructs of a particular attribute field.
When features are replicated from the states, they undergo a geometry validation process in which a network of Geometry Validator transformers is employed to execute spatial QC checks. FME attempts to repair features that present null, corrupt, or self-intersecting geometries. Each feature is assigned a geometry validation status of passing, repaired, or unable to be repaired, and that status is written to an error report.
This is a conceptual diagram that captures how self-intersections are handled by FME. Features leaving each output port are assigned a status, and the outputs are written to the error report. Next, we will explore the properties of this diagram:
This diagram captures the properties of the Geometry Validator that allow us to check for self-intersecting features. The potential issue is selected with the corresponding checkbox, and the repair option is enabled below it.
In our example, the column in the error report that holds the status of the spatial checks is named GeometryError. Features without self-intersecting geometry, in other words features that pass the check, are given the status n/a, or not applicable.
Features that initially fail the self-intersection check and cannot be repaired by FME are assigned the status “Unable to repair” under the GeometryError column in the error table.
Features that initially fail the self-intersection check but are successfully repaired by FME are assigned the status “Repaired by FME” under the GeometryError column in the error report.
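The three GeometryError outcomes described above can be sketched in plain Python. This is only an illustration of the status logic, not FME's Geometry Validator: the brute-force crossing test, the function names, and the `toy_repair` helper are all assumptions made for the example, and only the three status strings come from the notes.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def _orient(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): 1, 0, or -1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True when two segments properly cross; shared endpoints do not count."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def self_intersects(ring: Ring) -> bool:
    """Brute-force check of every edge pair of an implicitly closed ring.

    Collinear overlaps are not detected; this is a simplification.
    """
    edges = list(zip(ring, ring[1:] + ring[:1]))
    return any(_cross(*edges[i], *edges[j])
               for i in range(len(edges))
               for j in range(i + 1, len(edges)))


def toy_repair(ring: Ring) -> Optional[Ring]:
    """Hypothetical stand-in for a repair: reorder vertices by angle
    around the centroid. Real repair logic is far more involved."""
    cx = sum(x for x, _ in ring) / len(ring)
    cy = sum(y for _, y in ring) / len(ring)
    return sorted(ring, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))


def geometry_error(ring: Ring,
                   repair: Optional[Callable[[Ring], Optional[Ring]]] = None) -> str:
    """Return the value to store in the GeometryError column."""
    if not self_intersects(ring):
        return "n/a"                      # feature passes the check
    fixed = repair(ring) if repair else None
    if fixed is not None and not self_intersects(fixed):
        return "Repaired by FME"
    return "Unable to repair"


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
bowtie = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (0.0, 2.0)]  # edges cross at (1, 1)
```

With these sample rings, `geometry_error(square)` yields "n/a", `geometry_error(bowtie)` yields "Unable to repair" because no repair function is supplied, and `geometry_error(bowtie, toy_repair)` yields "Repaired by FME".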
As the Geometry Validator helps us track spatial integrity, the Tester transformer helps us track attribute integrity. Each attribute in our database schema is intended to capture the design considerations realized for a particular business need. Tracking and maintaining this type of attribute integrity is central to the flow of the business process. Similar to Geometry errors, attributes that violate the established constructs are noted in the error report.
This conceptual diagram illustrates the checks for the State Allotment attribute, denoted by “ST underscore ALLOT”. This particular model consists of two Tester transformers that test the input features against the attribute design constructs. As with the spatial checks, the attribute status of each feature is recorded in the error report as null or does not exist, invalid, or passing. Let’s explore this model in more detail:
In the first tester transformer, we check to see whether a value for the “state allotment” attribute exists in the input features, using the operator “Attribute Exists”.
Any feature that does not contain a value for the State Allotment attribute is written to the error report under the ST_ALLOT_ERROR column and assigned a tracking status indicating that the value was “NULL” or “did not exist”.
If the first Tester transformer determines that a value exists, test clauses in the second Tester transformer reflect the business rules that govern this particular attribute field. In this particular case, the attribute value must not equal the value noted in the first clause AND the attribute must be populated as a concatenation of two other attribute field values in respective order, as noted.
Features that contain a value for the State Allotment attribute field yet fail the attribute design constructs are captured in the error report and assigned a tracking status of “ST_ALLOT is Invalid” under the ST_ALLOT_ERROR column.
Features whose State Allotment value satisfies the design constructs are assigned a passing tracking status under the ST_ALLOT_ERROR column in the error table.
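The two-stage attribute check described above can be approximated in Python as follows. This is a sketch under stated assumptions rather than the actual Tester configuration: the notes do not name the disallowed value or the two source fields, so `UNKNOWN`, `ADMIN_ST`, and `ALLOT_NO` are hypothetical placeholders, and the passing status string `n/a` is assumed by analogy with the GeometryError column. Treating an empty string as "no value" is also an assumption.

```python
from typing import Any, Dict, Tuple

# Status strings; only the "invalid" wording is quoted in the notes.
MISSING = "NULL or does not exist"
INVALID = "ST_ALLOT is Invalid"
PASSING = "n/a"  # assumed passing status


def st_allot_error(feature: Dict[str, Any],
                   disallowed: str = "UNKNOWN",            # hypothetical value
                   parts: Tuple[str, str] = ("ADMIN_ST", "ALLOT_NO"),  # hypothetical fields
                   ) -> str:
    """Return the value to store in the ST_ALLOT_ERROR column."""
    value = feature.get("ST_ALLOT")

    # First test: does a value exist at all?
    if value is None or value == "":
        return MISSING

    # Second test: both clauses must hold (logical AND).
    expected = "".join(str(feature.get(name, "")) for name in parts)
    if value != disallowed and value == expected:
        return PASSING
    return INVALID


good = {"ST_ALLOT": "CO01234", "ADMIN_ST": "CO", "ALLOT_NO": "01234"}
bad = {"ST_ALLOT": "CO99999", "ADMIN_ST": "CO", "ALLOT_NO": "01234"}
empty = {"ADMIN_ST": "CO", "ALLOT_NO": "01234"}
```

Here `good` passes both clauses, `bad` has a value that does not match the concatenation and is flagged as invalid, and `empty` is caught by the first test before the business rules are evaluated.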
The features are compiled from the state geodatabases (noted by the Unique Identifiers on the left) and are then automatically run through the spatial and attribute checks in the FME workbench. This is a conceptual automated error report that is generated and emailed to the states once the FME workbench translation is executed. The states then load this table and their associated dataset into GIS software and link the table to their dataset based on the feature GlobalID. The states then update their data according to the status of each feature in the error report. For instance, this particular feature has an ST_ALLOT value that is invalid, but it does not have any geometry errors.
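Linking the error report back to a state dataset by GlobalID amounts to a keyed join. The sketch below shows one way to do that with plain dictionaries; the column names GlobalID, GeometryError, and ST_ALLOT_ERROR come from the notes, while the function names, the sample records, and the assumption that "n/a" marks a passing status in both columns are illustrative only.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]
STATUS_COLUMNS = ("GeometryError", "ST_ALLOT_ERROR")


def join_error_report(features: List[Record], report: List[Record]) -> List[Record]:
    """Attach error-report statuses to each feature, matched on GlobalID."""
    by_id = {row["GlobalID"]: row for row in report}
    joined = []
    for feature in features:
        row = by_id.get(feature["GlobalID"], {})
        statuses = {col: row.get(col) for col in STATUS_COLUMNS}
        joined.append({**feature, **statuses})
    return joined


def needs_update(joined: List[Record], passing: str = "n/a") -> List[str]:
    """GlobalIDs of features with at least one reported, non-passing status."""
    return [rec["GlobalID"] for rec in joined
            if any(rec[col] not in (None, passing) for col in STATUS_COLUMNS)]


# Hypothetical sample data for illustration.
features = [
    {"GlobalID": "{A1}", "ST_ALLOT": "CO99999"},
    {"GlobalID": "{B2}", "ST_ALLOT": "CO01234"},
]
report = [
    {"GlobalID": "{A1}", "GeometryError": "n/a", "ST_ALLOT_ERROR": "ST_ALLOT is Invalid"},
    {"GlobalID": "{B2}", "GeometryError": "n/a", "ST_ALLOT_ERROR": "n/a"},
]

joined = join_error_report(features, report)
to_fix = needs_update(joined)
```

In this sample, the first feature mirrors the case in the notes: its ST_ALLOT value is invalid but its geometry is clean, so it is the only GlobalID returned in `to_fix`.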
Data is periodically harvested from the states to help sustain QC efforts. Once the internal data is determined to meet a satisfactory level of quality, it must pass a certification process by the business area before it becomes available to our external customers.