A preview of my talk about Data Strategies - what they are, how to implement one and what to do if you need to tame your Data Chaos. Includes tools and architecture examples!
2. What we'll do
Define Data Strategy
Identify organisational symptoms of no / an insufficient Data Strategy
Learn how to move towards data maturity
Understand Data Architecture Principles & Tooling
What we won't do
Bash BigQuery - we still love BigQuery
4. "It takes hours for me to put this report together"
"These reports are supposed to match but never do, it's just like that"
"This field is repeated in multiple places"
"Check with <ultimate data person here> - they know the data"
"Oh, I thought this data explained this but you're using it for that"
"The analytics are quite slow"
"Remember to refer to this extensive list of data quirks"
"We can't really trust the data"
5. Evidence in Data Systems
All your data is in a tool, not in files
Data Systems are slowing down (i.e. custom reporting, dashboards, ETLs)
Data flow is not documented
Little to no Load Testing
No schema management, not even manual checks
Data Scientists/Analysts/other folks using the data are spending more than half their time cleaning and normalising the data, or there are no Engineers at all
Hardly any access control - relying on trust
The same data is being maintained in different places by different teams
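As an illustration of the "not even manual checks" point, here is a minimal sketch of what a first manual schema check could look like. It is not from the talk: the `orders` columns, the `EXPECTED_SCHEMA` name and the `check_schema` helper are all assumptions made up for the example, and a real setup would more likely lean on the schema tooling of your warehouse or pipeline.

```python
from datetime import date

# Hypothetical expected schema for an illustrative "orders" table.
EXPECTED_SCHEMA = {
    "order_id": int,
    "customer_email": str,
    "order_date": date,
    "total": float,
}


def check_schema(rows, schema):
    """Return a list of human-readable problems; an empty list means the rows conform."""
    problems = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        extra = row.keys() - schema.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        # Type-check only the columns that are actually present.
        for column, expected_type in schema.items():
            if column in row and not isinstance(row[column], expected_type):
                problems.append(
                    f"row {i}: {column} is {type(row[column]).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return problems
```

Running a check like this at the point where data enters a system is one way to surface quirks early instead of adding them to the "extensive list of data quirks".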
7. What a Data Strategy is NOT
Having a data dumping ground
Re-implementing the same transformations all over the organisation
Putting up with painfully slow reports / queries & dashboards
Having no idea where the data is used
No clear data owners
Not being able to present your data in different formats easily
No planning, just dealing with issues and projects as they crop up
Keeping data to maybe use it one day
8. Understand:
How data supports your Business Strategy
Offensive: Things we want to achieve with data
Defensive: Things we have to do to our data
14. What not to do
Do not panic
Do not stop building things
15. Things to do first
Add timestamps and other audit & lineage metadata
Understand your org's data flow
Find data owners
Understand where access problems are - try to mitigate them with access controls you currently have
Start a Data Dictionary
Start storing important Raw data files
Consider a Data Guild
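To make the first item concrete, here is a small sketch of attaching timestamps and audit & lineage metadata to a record as it is loaded. It is an assumption-laden illustration, not the approach shown in the talk: the `with_audit_metadata` helper, the underscore-prefixed column names and the example source names are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def with_audit_metadata(record, source_system, source_file, raw_bytes, loaded_at=None):
    """Return a copy of `record` with audit and lineage columns attached."""
    # Default to the current UTC time; callers can pass a fixed value for backfills.
    loaded_at = loaded_at or datetime.now(timezone.utc)
    return {
        **record,
        "_loaded_at": loaded_at.isoformat(),
        "_source_system": source_system,
        "_source_file": source_file,
        # A hash of the stored raw file lets you trace a row back to the exact input.
        "_source_sha256": hashlib.sha256(raw_bytes).hexdigest(),
    }


# Usage with made-up values: the raw bytes would come from the stored raw file.
raw = b"order_id,total\n1,9.50\n"
row = with_audit_metadata(
    {"order_id": 1, "total": 9.5}, "webshop", "raw/orders/2024-06-01.csv", raw
)
```

Keeping the raw file alongside a hash of it ties in with "Start storing important Raw data files": the metadata only helps with lineage if the file it points to still exists.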