You can watch the replay for this webcast in the IDERA Resource Center: http://ow.ly/xbaO50A59Ah
Deoxyribonucleic acid (DNA) is the fundamental building block that specifies the structure and function of living things. The information in DNA is stored as a code made up of four chemical bases in which the sequencing determines unique characteristics, similar to the way in which letters of the alphabet appear in a certain order to form words and sentences.
Organizations can also be regarded as organic, with a need to adapt to changes in their environment. Every aspect of an organization also has a corresponding data representation, which can be regarded as its DNA. Without the correct tools and techniques, decoding that data structure can be extremely complex. Data modeling reveals that data in most organizations follows similar patterns. Once we recognize that, we can focus on the data characteristics that make each organization unique.
Establishing a data culture is vital to success, enabling a transformational breakthrough to translate data into knowledge and ultimately, strategic advantage. IDERA’s Ron Huizenga will explain how a business-driven data architecture enables you to leverage your data as a valuable strategic asset.
About Ron: Ron Huizenga is the Senior Product Manager of Enterprise Architecture and Modeling at IDERA. Ron has over 30 years of business and IT experience across many different industries including manufacturing, retail, healthcare, and transportation. His hands-on consulting experience with large-scale data development engagements provides practical, real-world insights to enterprise data architecture, business architecture, and governance initiatives.
Organizations are continually challenged by very complex data environments. Part of this is due to a proliferation of different technologies and data platforms, but additional challenges are posed by identifying, ingesting, and utilizing data that the organization itself neither creates nor owns. This type of data requires significant analysis, scrutiny, and processing before it can be combined with trusted organizational data sources to support informed analytics and decisions.
The difficulty of understanding and managing data resources is compounded further, since data stores are now likely to be a mix of cloud and on-premises deployments with widely varying levels of data quality.
Thus, we are typically dealing with varying combinations of:
Data origin: internal vs. external environment
Data store type: relational database (RDBMS) vs. NoSQL
Deployment: on-premises vs. cloud
In discussion, it is common to refer to "the data warehouse" or "the data lake," which can leave the impression that there is only one of each. However, in our complex ecosystem, we will typically have a myriad of raw data stores, document stores, OLTP relational databases, operational data stores, and data warehouses. Likewise, the data lake is not one physical data store. Rather, it is a concept, more commonly being referred to as the Logical Data Lake.
Following the flow of the diagram from left to right, the logical data lake begins once data is ingested and proceeds through storage of raw transient data, raw data analysis (data science), approved data stores, trusted data stores, the information refinery (including ETL), and refined data (including the data warehouse), which ultimately drives analytics and reporting. I have indicated a small subset of the typical data store technologies that could be used in specific areas to provide additional context; there are many more available. In addition, the depiction of a specific technology in a given area does not mean that the technology is limited to use in only that area. Several data store platforms have been used in multiple or all areas.
In the past, we have often referred to organizations as information factories. This is more relevant today than ever before. We can't simply trust the quality of data that we find in a particular data store, particularly if it is a raw data feed that has been ingested from outside sources such as social media sites, IoT sensor data, third-party sites, and other external sources. Continuing with the manufacturing analogy, those raw materials need to be inspected and processed before they can be incorporated into any downstream manufacturing processes. Once approved, that data can be refined and combined with our trusted data sources.
Data modeling is more important now than ever before. ER/Studio will allow you to map all the relevant data stores in the Multi-Hybrid Data Ecosystem and Logical Data Lake incorporating all sources, targets and data lineage. This will provide an integrated blueprint of physical deployment models, enterprise data dictionaries and enterprise models.
From DMBOK:
Data is the representation of facts as text, numbers, graphics, images, sound, or video. Technically, data is the plural form of the Latin word datum, meaning "a fact." However, people commonly use the term as a singular thing. Facts are captured, stored, and expressed as data.
Information is data in context. Without context, data is meaningless; we create meaningful information by interpreting the context around data.
This context includes:
The business meaning of data elements and related terms.
The format in which the data is presented.
The timeframe represented by the data.
The relevance of the data to a given usage.
Data is the raw material we interpret as data consumers to continually create information.
The official or widely accepted meanings of commonly used terms also represent a valuable enterprise resource, contributing to a shared understanding of meaningful information. Data definitions are just some of the many different kinds of "data about data" known as meta-data. Meta-data, including business data definitions, helps establish the context of data, and so managing meta-data contributes directly to improved information quality. Managing information assets includes the management of data and its meta-data.
Information contributes to knowledge. Knowledge is understanding, awareness, cognizance, and the recognition of a situation and familiarity with its complexity. Knowledge is information in perspective, integrated into a viewpoint based on the recognition and interpretation of patterns, such as trends, formed with other information and experience. It may also include assumptions and theories about causes. Knowledge may be explicit (what an enterprise or community accepts as true) or tacit (inside the heads of individuals). We gain in knowledge when we understand the significance of information.
Like data and information, knowledge is also an enterprise resource. Knowledge workers seek to gain expertise through the understanding of information, and then apply that expertise by making informed and aware decisions and actions. Knowledge workers may be staff experts, managers, or executives. A learning organization is one that proactively seeks to increase the collective knowledge and wisdom of its knowledge workers.
Naming standards are a mechanism to define, apply and enforce naming conventions to model objects. The naming of objects, particularly entities and attributes is extremely important in order to understand the business context and the real world objects that they represent. Naming standards typically comprise the following:
• List of common business terms to be used in naming
• Abbreviation for each term
• Template to specify order of terms (specific to object type)
• Case standards (upper, lower, first letter capitalized, etc.)
• Prefixes and suffixes
Both tools offer the capability to set up and apply naming standards templates, as well as the ability to upload terms and abbreviations from external sources such as Microsoft® Excel spreadsheets. The naming standards templates are quite similar. The following screen capture depicts one of the ER/Studio Naming Standards Template tabs, prior to entering any of the specifications.
The typical use case for naming standards is in creating physical object names from their logical counterparts:
• entity names → table names
• attribute names → column names
The manner in which this is done differs between ERwin and ER/Studio. The basis for this difference arises from the level of coupling between logical and physical models.
The auto naming standards feature allows us to bind a naming standards template to data model objects such as entities/tables and attributes/columns. The typical use case is to have the physical name change in place as we edit the logical name. Physical-to-logical mapping (the reverse direction) can also be applied if desired.
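To make the logical-to-physical transformation concrete, here is a minimal sketch of how a naming standards template might be applied. The glossary terms, abbreviations, and case rules below are hypothetical examples, not ER/Studio's actual template contents or API:

```python
# Hypothetical glossary of business terms and their approved abbreviations.
GLOSSARY = {
    "customer": "CUST",
    "number": "NBR",
    "identifier": "ID",
    "address": "ADDR",
}

def to_physical(logical_name: str, prefix: str = "", suffix: str = "") -> str:
    """Map a logical entity/attribute name to a physical table/column name:
    abbreviate each recognized business term, join with underscores,
    apply upper case, then add any configured prefix/suffix."""
    words = logical_name.lower().split()
    abbreviated = [GLOSSARY.get(word, word) for word in words]
    return prefix + "_".join(abbreviated).upper() + suffix

print(to_physical("Customer Number"))                  # CUST_NBR
print(to_physical("Customer Address", suffix="_TXT"))  # CUST_ADDR_TXT
```

A real template also carries the term order and object-type-specific rules listed above; the point is simply that the mapping is deterministic, so physical names stay consistent wherever the same business terms appear.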
Talk about all the different instances, different names. Then address requirement of a repository based solution, allowing those links to be formalized through universal mappings.
Universal Mappings are the ability to link “like” or related objects within the same model file or across separate model files. A typical use case is linking the representations of the same real life business object that exist in different models. For example, let’s assume we are dealing with the concept of employees. Employee data may exist in many different databases across the organization. Once we have reverse engineered those databases, universal mappings would be used to link the tables (or corresponding entities) together. This provides traceability in “where used” functionality to find all instances of the object.
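The idea behind universal mappings can be sketched as a simple registry that links every representation of a business concept back to one key. This is an illustrative data structure only (class and model-file names are invented), not how ER/Studio stores mappings internally:

```python
from collections import defaultdict

class UniversalMappings:
    """Toy registry linking model objects that represent the same
    real-world business concept across separate model files."""

    def __init__(self):
        self._by_concept = defaultdict(set)

    def link(self, concept: str, model: str, obj: str) -> None:
        # Record that `obj` in `model` represents `concept`.
        self._by_concept[concept].add((model, obj))

    def where_used(self, concept: str):
        """Return every (model, object) pair linked to the concept."""
        return sorted(self._by_concept[concept])

# Example: the Employee concept reverse engineered from three databases.
um = UniversalMappings()
um.link("Employee", "HR_OLTP.dm1", "EMPLOYEE")
um.link("Employee", "Payroll.dm1", "EMP_MASTER")
um.link("Employee", "DW.dm1", "DIM_EMPLOYEE")
print(um.where_used("Employee"))
```

The "where used" query is the payoff: one lookup finds every table or entity that carries employee data, which is exactly the traceability the mappings provide.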
This image depicts the Dictionary tab in ER/Studio, showing the attachments and Data Security Information tags. Beside it, in the diagram view, a table illustrates that these can also be shown on model diagrams.
It's critical to point out that when we say data modeling, we are talking about a lot more than simple ER diagrams.
Data can be described by the way that it is created, read, updated, deleted, and searched. This life cycle is called the CRUD cycle and is different for different data element types and companies. Lifecycle is extremely important, but often overlooked in less mature organizations.
For example, in the case of master data, how a customer is created depends largely upon a company's business rules, industry segment, and data systems. One company may have multiple customer-creation vectors, such as through the Internet, directly through account representatives, or through outlet stores. Another company may only allow customers to be created through direct contact over the phone with its call center. Further, how a customer element is created is certainly different from how a vendor element is created.
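The differing creation vectors above can be captured in a CRUD matrix: which systems may create, read, update, or delete a given data element. The system and element names below are hypothetical, chosen only to mirror the example:

```python
# Toy CRUD matrix: element -> system -> permitted operations.
CRUD_MATRIX = {
    "Customer": {
        "Web Portal":     "CR",    # customers self-register and view their data
        "CRM":            "CRUD",  # account reps own the full lifecycle
        "Data Warehouse": "R",     # analytics is read-only
    },
    "Vendor": {
        "Procurement":    "CRUD",  # vendors are created only through procurement
        "Data Warehouse": "R",
    },
}

def can(system: str, element: str, operation: str) -> bool:
    """Check whether a system is allowed an operation (C, R, U, or D)."""
    return operation in CRUD_MATRIX.get(element, {}).get(system, "")

print(can("Web Portal", "Customer", "C"))  # True
print(can("Web Portal", "Customer", "D"))  # False
```

Writing the matrix down is what makes lifecycle gaps visible: the Vendor row immediately shows that no customer-facing channel can create a vendor, which is a business rule, not an accident.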
In ER/Studio, metadata lineage is supported directly in the modeling tool through the “where used” and dependent-objects functionality. It is part of the metadata that can be published in Team Server.
Within ER/Studio, data lineage is the ability to document data extraction, transformation and load parameters, which is sometimes referred to as source and target mapping. Data lineage enables you to document the movement of data from point A to point B, and any intermediate steps in between. This movement is sometimes referred to as Extraction, Transformation and Load (ETL). A model produced in ER/Studio can represent any point along the way. Data Architects need the ability to specify the "source" or "target" of data, down to the column/attribute level. Along with the metadata that defines the source and target mapping are rules for how the data is manipulated along the way.
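A column-level source-to-target mapping is, at its core, a small metadata record: source column, target column, and the transformation rule applied in between. The sketch below shows that shape with invented column names; it is not ER/Studio's internal lineage format:

```python
from dataclasses import dataclass

@dataclass
class ColumnMapping:
    source: str          # fully qualified source column
    target: str          # fully qualified target column
    transformation: str  # rule applied during ETL

# Hypothetical lineage for a sales fact table.
lineage = [
    ColumnMapping("OLTP.ORDERS.ORDER_DT", "DW.FACT_SALES.ORDER_DATE",
                  "CAST to DATE"),
    ColumnMapping("OLTP.ORDERS.AMT", "DW.FACT_SALES.ORDER_AMOUNT",
                  "SUM grouped by ORDER_DATE"),
]

# Answering "where does DW.FACT_SALES.ORDER_DATE come from?" is a filter:
sources = [m.source for m in lineage
           if m.target == "DW.FACT_SALES.ORDER_DATE"]
print(sources)  # ['OLTP.ORDERS.ORDER_DT']
```

Because each record names both endpoints and the rule, the same metadata answers impact analysis in either direction: filter on source to see everything a feed drives, or on target to trace a report column back to its origin.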
Glossary hierarchies are used to group terms in a manner that aligns with the organizational structure of your business. Typically these areas have different stewards that are responsible for maintaining/updating the definitions as well as adding new business terms that are applicable.
Note the details of the entity. Not just modeling characteristics, but also all of the associated attachments: retention policies, master data classification, business value (whatever is needed – fully definable)
Security Properties – With Alerts.
Note the alert at the top of the page due to the bound security properties.
Can link to reference data in worksheets (Google, intranet, SharePoint, MDM repository, external sources)
Data policy
Increasingly complex regulations
Imperatives
Data security
Data Privacy
Data integrity
Create, discuss, update policies
Needs to become part of corporate data culture
Associate policies to data concepts and data elements for easy identification
Policies and rules need to be visible to data users, stewards
Alerting mechanisms
Collaborative stakeholder engagement for important policy decisions, clarification
Operationalize the data
Common & consistent reference data sets
Consistent data usage
Common understanding of how reference and master data is used, stored, connected
Master Data Management (MDM) classification
Reconcile data across operational systems for standardized reporting and analytics
Ensure consistency through enterprise data dictionaries