Global Data Management: Governance, Security and Usefulness in a Hybrid World

Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
0
GLOBAL DATA MANAGEMENT:
GOVERNANCE, SECURITY AND
USEFULNESS IN A HYBRID WORLD
Sponsored by
By Neil Raden
Hired Brains Re...
1
TABLE OF CONTENTS
GOAL OF GLOBAL DATA MANAGEMENT 1
A SHORT HISTORY OF SECURITY 1
THE SITUATION TODAY 2
“ALIEN” DISTANT D...
1
GOAL OF GLOBAL DATA MANAGEMENT
There is no question that there is a greater, aching desire by organizations to capture
d...
Publicité
Publicité
Prochain SlideShare
Keynote Dubai
Keynote Dubai
Chargement dans…3
×

Consultez-les par la suite

1 sur 12 Publicité

Global Data Management: Governance, Security and Usefulness in a Hybrid World

Télécharger pour lire hors ligne

With Global Data Management methodology and tools, all of your data can be accessed and used no matter where it is or where it is from: on-premises, private cloud, public cloud(s), hybrid cloud, open source, third-party data and any combination of these, with security, privacy and governance applied as if they were a single entity. Ingenious software products and the economics of computing make it economical to do this. Not free, but feasible.


Sponsored by
By Neil Raden, Hired Brains Research
May 2018
TABLE OF CONTENTS

- GOAL OF GLOBAL DATA MANAGEMENT
- A SHORT HISTORY OF SECURITY
- THE SITUATION TODAY
- "ALIEN" DISTANT DATA
- MULTI-JURISDICTIONAL ISSUES
- RISKS AND REWARDS OF A TRADE-OFF GOVERNANCE POLICY
- THE FABRIC
- THE GLOBAL DATA MANAGEMENT (GDM) PROGRAM
- METADATA
- LINEAGE
- GOVERNANCE
- SECURITY
- LIFECYCLE
- WHAT A GDM PERSON DOES (PERSONALLY AND THROUGH THE TEAM)
- OTHER KEY ROLES
- CONCLUSION
- ABOUT THE AUTHOR
GOAL OF GLOBAL DATA MANAGEMENT

There is no question that organizations have a greater, aching desire to capture data and draw insight from it for a multitude of improvements and innovations in operations, customer service, and even in completely new businesses¹. That effort has become more complicated with the emergence of hybrid, distributed computing and data architectures (big data, cloud variants, multi-clouds and IoT). To succeed, there is a need to address a broader data management philosophy incorporating collaboration, standardization, reuse, retention (of data and models) and, especially, security and governance. To illustrate this need, a short history of enterprise security and governance will help.

¹ We use the term "businesses" loosely, as these innovations also apply to government, non-profits, charities and NGOs, and any type of organization.

A SHORT HISTORY OF SECURITY

Before the cloud, before big data, and even into the present, security was implemented one application system at a time. If you were in the finance department, you might be granted access to post manual ledger entries through the accounting system. If you were in Human Resources, you might be granted access to view and/or modify an employee's records through an HR system. These grants were either embedded in the application logic based on your role, or applied externally. But the grants and restrictions were all administered through separate application systems, and their security schemes were not transferable from one application to another. As a result, the overall picture was fractured, inconsistent and difficult to administer. It was developed at a time when people in organizations had tightly constrained roles. Today, employees are expected to be agile, adaptable and able to handle multiple roles in the organization simultaneously.

Again, before the cloud, before big data, before data science, analysts did devise quantitative methods. In the early days of e-commerce, for example, websites already employed recommendation engines and dynamic decision making based on scoring and decision trees for next-best-offer or propensity models. They did this by getting access, usually one data source at a time, from IT. Data warehouses both aided and hindered their work: aided by integrating data from multiple sources and collapsing the security model to just one source, hindered by only providing aggregated data and a rigid design that couldn't adapt quickly (in fairness, any good data warehouse designer could enhance a schema, but provisioning new data was a slow process). The only thing that prevented the data warehouse from ingesting all of the data, internal and external, that analysts craved was scarcity: the data warehouse could only scale in terms of volume, throughput and demanding use at extreme cost.
What organizations crave seems to shift over decades. Fifty years ago, computers were employed for record-keeping. Reporting from these systems was limited to copious printing of records. The demand for actual reporting generated long backlogs of systems analysts and programmers creating a massive hairball of "interfaces" with no management. Early Business Intelligence (BI) emerged and shifted the burden to analysts, freeing IT to focus on new generations of application systems. Data access and security shifted to the data warehouse.

About ten years ago, Tom Davenport published his landmark book, "Competing on Analytics"², which put the term "analytics" in play. Suddenly, analytics rose to the top of enterprise computing. Predictive analytics, data science, machine learning and Artificial Intelligence became top of mind, but they needed a place to live.

² Davenport, T. H., & Harris, J. G. (2007). Competing on Analytics: The New Science of Winning. Boston, Mass.: Harvard Business School Press.

The process of analyzing data in organizations has for decades relied on tools designed for the individual. Spreadsheets, for example, proved to be the de facto modeling and reporting tool for thirty years or more, but they never adequately provided security, governance, or efficient creation and maintenance of metadata. Other tools for analysis and reporting, such as BI, provided their own solutions for metadata, collaboration, version control, etc., but they were point solutions, only useful for the product itself. (Unfortunately, the same can be said for some of the newer data science workbench products.) When Hadoop burst on the scene ten years ago, it too shared many of the same gaps. That's not an indictment of DIY (do-it-yourself) analytics or wider analytic practices based on self-service. Rather, it's a cautionary tale that in an enterprise, the most well-meaning and well-crafted analysis by individual contributors will always bog down in redundancy without adequate Data Management.

THE SITUATION TODAY

With Global Data Management methodology and tools, all of your data can be accessed and used no matter where it is or where it is from: on-premises, private cloud, public cloud(s), hybrid cloud, open source, third-party data and any combination of these, with security, privacy and governance applied as if they were a single entity. Ingenious software products and the economics of computing make it economical to do this. Not free, but feasible.

Large data platforms, such as Hadoop, by their nature contain many different types of data from many different sources. In past decades, IT organizations built business-oriented data models and massaged an often unruly collection of data in data warehouses (frankly, an approach that still has merit), but for today's technology, that approach is too slow and too limiting for the hastening digital transformation facing every industry.
While corporate IT designs for security and governance were conceived in an environment of highly controlled data management and computing, for both operational and analytical processes, those designs are counterproductive in a hybrid, distributed, complex and increasingly streaming, near-real-time world. Definitions of security and governance in this environment are quite different. For example:

- Old (and still prevalent) meaning of Security: To protect against loss and against malicious, innocent and/or inadvertent access to or distribution of data that can cause damage. To isolate various organizational entities from each other. To throttle activity by managing from scarcity.

- New meaning of Security: Ensuring that useful and important analysis will not be missed as a result of overly restrictive or misapplied restrictions, usually the result of a lack of shared understanding between data stewards and, for example, data scientists.

- Old meaning of Governance: A framework that provides a formal structure for organizations to produce measurable results toward achieving their strategies and ensures that IT investments support business objectives. The most commonly used frameworks are COBIT, ITIL, COSO, CMMI and FAIR.

- New meaning of Governance: Governance should be driven by a simple concept (though hard to practice): trade-offs. Given the complexity of the computing/data environment today, governance should aim toward a shared understanding of risk-reward for what's needed, evaluated and managed across the enterprise by intelligent agents that augment the work of data professionals and analytics practitioners. For example, it may be in the organization's interest to relax some access and use rules derived from simple assumptions to achieve more productive analytics from data scientists. Trade-offs are the opposite of rigidity.

"ALIEN" DISTANT DATA

The major issue is that enterprise data no longer exists solely in a data center or even a single cloud (or more than one, or combinations of both). Edge analytics for IoT, for example, capture, digest, curate and even pull data from other, different application platforms and live connections to partners, previously a snail-like process using obsolete methods like EDI or even batch ETL. Edge computing can be thought of as decentralized from on-premises networks, cellular networks, data center networks, or the cloud. All of these factors pose the risk of data originating in far-flung environments where the data structures and semantics are not well understood or documented³.
The risk of easily moving data from place to place, or the complexity of moving the logic to the data while everything is in motion, is too extreme for manual methods.

³ A trucking company may have more than twenty separate telematics providers in the cab, each with its own protocols, for applications that require the trucking company to absorb and react in near-real-time.

MULTI-JURISDICTIONAL ISSUES

Currently, organizations, at best, have governance programs for data and its use in their own jurisdictions. But even those organizations that primarily operate in a single jurisdiction may have exposure to regulatory requirements in many others. The 2018 phase-in of the European Union GDPR (General Data Protection Regulation) is one such instance. The solution is a Global Data Management scheme that operates as a single program in all jurisdictions.

RISKS AND REWARDS OF A TRADE-OFF GOVERNANCE POLICY

The cadence of technology innovation clearly surpasses most organizations' ability to implement each new or improved technique before the next one arrives. Governance and data management can never be a pure, complete process. It requires trade-offs: picking the issues that make the most sense, have the greatest centrality to the organization's strategy (or strategies) and provide the most protection against danger, while ensuring the organization can be as effective as possible. Governance and data management tools today are not designed for a trade-off approach. They are layered with rules and restrictions under a "better safe than sorry" mentality. Governance has to be a continuing conversation between IT and the rest of the organization. Modern governance approaches cannot work with "IT has the last word" in any discussion; that only leads to dysfunction and missed opportunities. It can't be done with the tools and methodologies of past decades.

THE FABRIC

The best way to describe the solution is as a data management "fabric" that metaphorically drapes over all of these environments and provides the management and governance services needed. A short description of its functions: the Fabric drapes over all the data resources. It is a completely different approach to enterprise data management. It allows an organization to finally derive more value from its data management initiatives than the cost of implementing them. Areas of the organization that previously were denied the insight that could have been provided by data the organization captured (somewhere) can leverage the latent value in distributed data stores, enabled by the capabilities the GDM provides. You can also think of the fabric as an underlying mechanism that orchestrates all of the functions of the GDM and allows for plugging in new capabilities in an open and seamless fashion.
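To make the idea of a pluggable fabric concrete, here is a minimal sketch in Python, assuming a hypothetical DataFabric registry and Capability interface; it illustrates the orchestration-plus-plugins idea only and does not depict any particular vendor's API.

```python
from abc import ABC, abstractmethod


class Capability(ABC):
    """A pluggable GDM capability (metadata, lineage, governance, ...). Hypothetical."""

    name: str

    @abstractmethod
    def apply(self, asset: dict) -> dict:
        """Apply this capability to a data asset descriptor."""


class DataFabric:
    """Hypothetical orchestration layer that 'drapes over' all data stores.

    New capabilities plug in here, without changing the stores themselves.
    """

    def __init__(self) -> None:
        self._capabilities: dict[str, Capability] = {}

    def register(self, capability: Capability) -> None:
        self._capabilities[capability.name] = capability

    def onboard(self, asset: dict) -> dict:
        # Run every registered capability over the asset descriptor,
        # regardless of whether the asset lives on-premises or in a cloud.
        for capability in self._capabilities.values():
            asset = capability.apply(asset)
        return asset


class TagPII(Capability):
    """Toy governance capability: flag columns that look like personal data."""

    name = "pii-tagging"

    def apply(self, asset: dict) -> dict:
        pii_like = {"email", "ssn", "phone", "dob"}
        asset["pii_columns"] = [c for c in asset.get("columns", []) if c in pii_like]
        return asset


if __name__ == "__main__":
    fabric = DataFabric()
    fabric.register(TagPII())
    print(fabric.onboard({"name": "customers", "location": "s3://lake/customers",
                          "columns": ["id", "email", "country"]}))
```

The design point is that new capabilities are registered against the fabric, not wired into each individual data store.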
THE GLOBAL DATA MANAGEMENT (GDM) PROGRAM

Metadata, lineage, governance, security and lifecycle are the components of the GDM. But just as important are the program, the people and the skills. The first step is to have an actual implementation of the "fabric." Hortonworks provides this through its DataPlane Service. The common foundation includes the ability to manage and govern data across distributed data lakes.

METADATA

Metadata has a wide variety of definitions and sub-classes, but for the needs of GDM, it powers both operation and understanding. By accelerating the time to value of your data investments, metadata democratizes accessibility and improves the understanding of data and processes across the organization. It rapidly improves the productivity of analysts and data scientists. While operational metadata is the bedrock for technical and operational aspects of uptime, performance, cost, etc., it is fundamental in lifting the productivity of analysts by addressing these six questions:

- What does the data mean (semantics)?
- Where does it come from (lineage)?
- Can I trust it (trust metrics)?
- Does its meaning vary by context (interpretation)?
- How do I find it?
- Who do I ask (data stewards, SMEs)?

Metadata is the key to governance and use. Metadata has to be developed for both consistency of use and understanding, as well as flexibility as the organization changes. The scope of the metadata catalogs is beyond the capabilities of data stewards to develop manually. The GDM must have intelligent software to:

- Capture and catalog metadata for new or modified data assets
- Allow data stewards to examine the machine-generated metadata and make adjustments as necessary
- Manage metadata repositories across instances to ensure they are consistent
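As a rough illustration of what a machine-assisted catalog entry might hold, the sketch below records an answer to each of the six questions plus a flag for steward review; the CatalogEntry structure and its field names are hypothetical, not the schema of any specific catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One hypothetical metadata record, covering the six questions an analyst asks."""
    asset_name: str                      # how do I find it?
    semantic_description: str            # what does the data mean?
    lineage: list[str]                   # where does it come from?
    trust_score: float                   # can I trust it? (0.0 - 1.0)
    context_notes: dict[str, str] = field(default_factory=dict)  # does meaning vary by context?
    steward: str = "unassigned"          # who do I ask?
    machine_generated: bool = True       # flagged until a steward reviews it


def steward_review(entry: CatalogEntry, corrected_description: str) -> CatalogEntry:
    """A data steward adjusts a machine-generated entry rather than authoring it from scratch."""
    entry.semantic_description = corrected_description
    entry.machine_generated = False
    return entry


if __name__ == "__main__":
    entry = CatalogEntry(
        asset_name="claims_2018_q1",
        semantic_description="auto-profiled: 14 columns, looks like insurance claims",
        lineage=["edge/telematics_feed", "lake/raw/claims"],
        trust_score=0.7,
        context_notes={"finance": "settled claims only", "actuarial": "all claims"},
    )
    print(steward_review(entry, "Q1 2018 auto insurance claims, settled and open"))
```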
LINEAGE

Lineage records where the data originated and how it has been manipulated, along with trust metrics (crowd-sourced). A lot of analytical data wrangling is still a manual process. One drawback is the issue of keeping track of provenance, i.e., what the source of the data is and whether it is still current. Data is rarely gathered just once. It can be reused for multiple versions of an analysis, or even continuously updated/refreshed as models are refreshed for continuous improvement. In addition, outcomes often need to be traced back to the original data sources for validation.

GOVERNANCE

Governance takes security and access to a new level. Security (grants and restrictions) is driven by context, not location. For example, as an analyst, you manage a corpus of work: data, models, presentations, notebooks. Access to the data you need is granted based on the components you use, no matter where in the world they are. Time-consuming requests to IT or data stewards are unnecessary, as access is driven by intelligent agents that understand your role.

The Hortonworks Data Steward Studio, which operates with the DataPlane Service, provides businesses the capability to develop trust in their data and comply with regulations by understanding data provenance, origin, lineage and impact. The GDM by its nature is too complicated for one or more data stewards to manage with current manual methods. The DSS provides them with the tools to secure, govern and provide the data for today's distributed, hybrid world.

A popular misconception about data scientists is that all of their work is one-off and ad hoc, grabbing data and massaging it until it yields answers. In fact, their work is much more formal than that. They have to assign business-friendly and intuitive names to data files that they create or download and then organize those files into directories according to a rational naming convention. When they refresh those files, they must version them and keep track of their differences. This is a complicated process. Data doesn't always reside in logical files. For example, clinical and scientific lab equipment can generate hundreds or thousands of data files that scientists must name and organize before running computational analyses on them.
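A minimal sketch of what context-driven, trade-off-based access might look like follows, assuming a hypothetical grant_access function and an in-memory audit log; a real intelligent agent would weigh far more signals (role, corpus of work, classification, jurisdiction) and would feed its decisions back into lineage and metadata.

```python
from datetime import datetime, timezone

# Simple in-memory audit trail; in practice decisions would feed the lineage/metadata catalog.
audit_log: list[dict] = []


def grant_access(user_role: str, asset: dict, corpus: set[str]) -> tuple[bool, str]:
    """Hypothetical context-driven grant: access follows the analyst's corpus of work
    and role, not the physical location of the data."""
    if asset["id"] in corpus:
        return True, "asset is already part of this user's corpus of work"
    if user_role == "data_scientist" and asset.get("classification") != "restricted":
        # Trade-off policy: relax location-based rules for productive analytics,
        # but record the decision for audit and later impact analysis.
        audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "role": user_role,
            "asset": asset["id"],
            "decision": "granted",
        })
        return True, "granted by role-based trade-off policy"
    return False, "escalate to the data steward"


if __name__ == "__main__":
    corpus = {"sales_model_features"}
    asset = {"id": "edge/sensor_stream_raw", "classification": "internal",
             "location": "public-cloud-eu-west"}
    print(grant_access("data_scientist", asset, corpus))
    print(audit_log)
```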
SECURITY

Previously, data management was highly driven by "silos," collections of domains in particular locations. Schemes for governance were highly localized. Access to a data warehouse could be broad for an analyst, but deeper analysis requiring access to other data sources was dependent on the data management in place at those sources. Where most data warehouses disappointed practitioners of advanced modeling and analysis (data scientists, for example, building machine learning models) was in providing access to raw data not otherwise needed in the data warehouse, including detail from source systems, sensor data streaming from the edge, and all manner of external data sources.

Existing data management and security programs typically allow access to the data sources used by an analyst and a cohort of others on a "normal" basis, but requests beyond that range fire an alert. The paradox is that a productive analyst should spend more time working "out of the box" than in it. Fractured data management and security programs thwart their efforts.

Your organization is likely composed of a mosaic of data stores (or will be soon): multi-cloud, IoT, data lakes, data warehouses, on-prem, hybrid cloud, at-rest and streaming. At-rest data can be catalogued and even updated/refreshed according to a governance scheme, but streaming data presents a more challenging problem, not one that can be solved manually, as the flow can change without notice. GDM should provide tools to deal with it, but governance policy is the map; software that implements the policy is the journey.

LIFECYCLE

Everything discussed so far only addresses a scheme of security and governance already in place. A GDM must be able to perform as a lifecycle process. That means putting in place a program and architecture capable of dynamically adjusting to changing business realities as well as the rapid cadence of new technology: integration of new data and features, adjusting governance policy and administration to changing conditions, and doing all of that on a consistent set of tools and metadata. A robust GDM program cannot be implemented as a "project"; it continues through a lifecycle. Hortonworks provides the tools to maintain your GDM through its Data Lifecycle Manager.
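One way to picture "governance policy is the map, software is the journey" is a declarative, versioned policy that tooling enforces uniformly while the policy itself is revised as conditions change; the LifecyclePolicy structure and thresholds below are hypothetical, a sketch rather than any product's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecyclePolicy:
    """A hypothetical versioned lifecycle/governance policy; revising it publishes a
    new version instead of hand-editing rules in each data store."""
    version: int
    hot_days: int        # keep in fast storage this long
    archive_days: int    # then move to cheaper archival storage
    retain_days: int     # expire (or anonymize) after this


def tier_for(age_days: int, policy: LifecyclePolicy) -> str:
    """Decide the storage tier for an at-rest asset of a given age under one policy version."""
    if age_days <= policy.hot_days:
        return "hot"
    if age_days <= policy.archive_days:
        return "archive"
    if age_days <= policy.retain_days:
        return "cold"
    return "expire"


if __name__ == "__main__":
    v1 = LifecyclePolicy(version=1, hot_days=30, archive_days=365, retain_days=7 * 365)
    # Business conditions change (e.g., a new regulation): publish v2, keep v1 for audit.
    v2 = LifecyclePolicy(version=2, hot_days=30, archive_days=180, retain_days=5 * 365)
    for age in (10, 200, 400, 3000):
        print(age, tier_for(age, v1), tier_for(age, v2))
```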
WHAT A GDM PERSON DOES (PERSONALLY AND THROUGH THE TEAM)

One thing to keep in mind is that the fortunes of an organization do not change simply by implementing technology. That's only the first step. The leader of the GDM initiative in the organization (often given the title Chief Data Officer, or CDO) needs, above all, to inspire confidence among the various stakeholders in the organization. Above and beyond any particular prior skill and experience in data management, it is paramount that the person in this role has the vision to motivate and encourage the organization. This requires someone with the gravitas, communication skills and political skill to navigate the currents of diverse backgrounds and requirements.

The GDM role is the keeper of the strategy, ensuring it doesn't flag, as the process is not without challenges. This encompasses all aspects of GDM: architecture, data catalogs, quality, lineage and metadata. To establish policies, measures, standards and requirements that fit the spirit of the initiative, this leader must dismantle obsolete security and governance methodologies that degrade the vision. Driving the selection process for the components ensures the program can scale economically from both implementation and TCO perspectives.

The GDM leader owns the initiative, no matter how influential various others in the organization are. Breaking down silos, fiefdoms and data czars is key to delivering data democratization in support of all services, analytics and data products. Inevitable change management requires careful and thorough communication to business owners and their designated data managers and stewards. The GDM leader is the point person with the C-suite on all matters relating to data for compliance, privacy and governance, and has responsibility for the initial creation of the control apparatus to ensure integrity in the program. At some point, it is wise for the GDM leader to delegate these roles and move on as the project becomes a program.
OTHER KEY ROLES

There are four key roles that you will need to establish and nurture. Many people in your organization can step up to these roles with training, but they will need to re-orient their practices for a global, elastic, governed process:

- Data scientists and data analysts to understand cross-source lineage, apply models across types of data, and gain access to data for deeper insight into both pre- and post-transaction analysis
- Data stewards to investigate lineage, improve quality and eliminate redundancies across data assets
- Data engineers to move, back up and restore data assets across environments and sources, while implementing an efficient data storage tiering policy
- Data architects to define security and governance policies that are automatically enforced to meet compliance requirements

CONCLUSION

No organization today is immune from the push for some form of digital transformation. The late Peter Drucker famously said, "The computer actually may have aggravated management's degenerative tendency to focus inward…"⁴ That was almost twenty years ago and is almost certainly not true today. However, it illustrates how information systems have changed, and how quickly. It is no longer sufficient to thresh through your internal record-keeping systems for insight, and it is very likely that you already do your analytics in multiple locations, on multiple platforms, on multiple clusters and with very different kinds of data. In addition, more of your staff are engaged in analytics as a result of better software tools, and more will continue to be. It is time to jettison the piecemeal approach to data management rooted in the mindset of twenty years ago. Global data management is not optional.

⁴ Peter F. Drucker (2009). The Effective Executive: The Definitive Guide to Getting the Right Things Done, p. 16, Harper Collins.
  12. 12. 10 management from the mindset of twenty years ago. Global data management is not optional. ABOUT THE AUTHOR Neil Raden, based in Santa Fe, NM, is an active industry analyst, consultant and widely published author and speaker and also the founder of Hired Brains Research. Hired Brains provides thought leadership, context and advisory consulting and implementation services in Information Management, Analytics/ Data Science, Machine Learning/AI and IoT for clients worldwide. Hired Brains also provides consulting, market research, product marketing and advisory services to the software industry. Neil is the co-author of Smart (Enough) Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions, Prentice-Hall. He welcomes your comments at nraden@hiredbrains.com.
