DBAs used to be relational-database centric, for instance managing Microsoft SQL Server or Oracle. In today's polyglot database environments their role has expanded, not just onto platforms other than SQL but also into legal governance, modelling techniques, architecture and more. They need a base knowledge of Kimball, Inmon, Data Vault, the CAP theorem, the Lambda architecture, Big Data, Data Science and so on.
2. Professional
◦ 29 years of Database experience – (6 on DB2, 1 on Oracle
and 23 on SQL Server)
◦ Freelance SQL Server and Data Platform specialist
◦ Fellow BCS, Masters in BI, PGCert in Data Science
◦ I also do F# (and the less relevant cousin C#)
Community
◦ Founder member of UK SQL User Group,
SQLServerFAQ.com, DataIdol.com, DDD, SQLBits and SQL
Relay
◦ Microsoft SQL Server MVP since 1997, and now a Data
Platform MVP
◦ Technical blog:
http://sqlblogcasts.com/blogs/tonyrogerson (legacy)
http://dataidol.com/tonyrogerson (General DP blog)
http://sqlserverfaq.com/tonyrogerson (MS DP blog)
3. Group discussion – I can only discuss from
what I’ve seen myself over the past few years
and recently while looking for work
4. What’s a Data Platform?
Define the traditional Database Administrator
◦ Logical and Physical Modelling
◦ Data Governance
◦ HADR
The importance of a play area
The expanding skillset
◦ Beyond Relational – alternative Databases
◦ Polyglot Database Environment
◦ The Distributed Database and understanding CAP
◦ Alternate architectures - LAMBDA
◦ ETL
◦ Business Intelligence, Data Science, Data Platform Engineer
◦ What else? Audience please….
6. Types
Structured
Un-structured
Semi-structured
Applications
Fat client, Web
Intranet, Mobile
Storage
Database Type
SQL
NoSQL
NewSQL
Business Intelligence
Standard Reporting from
standard process metrics
from the Data Warehouse/
Reporting database
Business Analytics
Investigative Reporting
over past data.
Management Science
Data Science
Investigative {Data
Analytics, Business
Analytics}
over structured, semi,
unstructured data for
possible patterns – use of
Machine Learning and
Pattern Matching
algorithms.
Data Creators,
Data Contributors,
Data Consumers
7. Business
Intelligence
SSRS, Crystal,
Business Objects,
PowerPivot, Excel,
QlikView, Tableau,
Reporting apps….
Types
Structured – Normal Form, JSON, XML
Un-structured – {developers think all data is like this }
Semi-structured – JSON, XML, Key/Value Pair
Applications
C#, F#, Java etc.
[Data sourcing]
Storage
Database Type
SQL – Oracle, DB2, Sybase, SQL Server, MySQL etc.
NoSQL – CouchDB, Raven, Cassandra, Hadoop, MongoDB, Neo4j
NewSQL – Postgres-XL, Postgres-XC, VoltDB, NuoDB
Business
Analytics
SAS, SPSS,
Statistica, MATLAB
etc..
Data Science
BI + BA + ‘R’, Python,
Machine Learning
packages, SQL, MapR,
Data Extraction, ML,
Visualisations, Story
Boarding
SQL, MapR, U-SQL..
Data Creators,
Data Contributors,
Data Consumers
8. SSIS
◦ pull RSS feed and store in SQL Server
◦ ODATA source example
Azure File Share
◦ Storing archive data
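As a rough illustration of the first demo idea (pull an RSS feed and store it in a database), here is a minimal Python sketch. It is not SSIS: it uses the standard library's XML parser and an in-memory SQLite database as a stand-in for SQL Server. The feed content, the `feed_item` table and its columns are all made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 payload; a real job would download this from a feed URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Return a list of (title, link) tuples, one per <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


def load_items(conn, items):
    """Insert the items; the primary key on link makes a re-run harmless."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feed_item (link TEXT PRIMARY KEY, title TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO feed_item (title, link) VALUES (?, ?)", items
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for SQL Server (assumption)
load_items(conn, parse_items(SAMPLE_RSS))
load_items(conn, parse_items(SAMPLE_RSS))  # second run adds no duplicates
row_count = conn.execute("SELECT COUNT(*) FROM feed_item").fetchone()[0]
```

Keying the table on the item link is one simple way to make the load repeatable, which matters for any scheduled feed pull whatever tool runs it.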
10. Data is an Asset – Security Guard
Data Custodian – Compliance, ???
Liaison between Business and Devs
Liaison between Business and Infrastructure
What else?
11. Custodian of the Business Taxonomy
◦ Data Dictionary
Logical / Physical
◦ Normal Form
◦ Logical Model (relationships) V Physical Model
(vendor-dependent schema)
Relational V Dimensional
◦ Entity Relationship modelling (tables and
relationships between)
◦ Dimensional Modelling (facts and dimensions) –
models to usability and performance
12. ICO Principles
Data Protection Laws – Security, Retention
Your responsibilities – vary within the Org
13. High Availability
◦ Understanding Latency
◦ Mirroring
◦ Availability Groups
◦ Log Shipping (?)
Disaster Recovery
◦ Practiced Procedures
◦ DR Resource misalignment
◦ Implementing contingency
◦ Dealing with Data corruption or Accidents (if I only
have AG’s – what’s the issue?)
14. Applying Database releases
◦ Which Databases? SQL / NoSQL etc.
Supportability (level of required knowledge)
Patching Servers
16. You protect the Integrity and Availability of
the “Database Platform”
Not limited to SQL Server
◦ NoSQL products
◦ Relational “SQL” products
◦ NewSQL
18. Align with your company
◦ Talk to developers, see what they are using, take a
lead with Data Technology – nurture their use of
Data.
◦ Data is an Asset, without data your company won’t
exist – make your company realise your importance
and you need to be right up there in the decision
making for technology direction
Align with the industry
◦ Job boards, trends
Be one (ok – a couple of) steps ahead!
19. You can’t play in live!
Decent laptop – 16GiB+ RAM, SSD / M.2 Flash
VirtualBox
◦ Multiple Windows Server, build a domain, build a
cluster etc.
◦ Multiple Linux
◦ Etc.
20. Beyond Relational – alternative
Databases
Polyglot Database Environment
The Distributed Database and
CAP
LAMBDA
ETL
MDM
Cloud
22. Business environment is “Polyglot”
Require understanding of
◦ NoSQL
◦ CAP Theorem
◦ LAMBDA (edge case)
◦ Big Data – what it really is
◦ CEP (is this a Database related tech?)
◦ ETL
◦ Data Science – what it really is
◦ BI
◦ Kimball, Inmon
◦ Data Vault
23. Really means – No NF
Key Value Stores (Riak, CouchDB)
Column (Cassandra)
Document (MongoDB)
Graph (Neo4j)
Object (Bit niche )
Ironically – most have a SQL like interface
now or in development!
24. Consistency
◦ All nodes show the same value
◦ Eventual Consistency
Availability
◦ Node will return data
Partition Tolerance
◦ Islands form when network fails – clients connect to
local nodes so when isolated you lose consistency.
You can only have two of the 3 and never all
three.
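The trade-off on this slide can be sketched with a toy two-node cluster. This is only an illustration under simplifying assumptions (two replicas, one flag standing in for a network partition, hypothetical `Replica` and `Cluster` classes); real distributed databases are far more nuanced. In "CP" mode the cluster refuses writes while partitioned, giving up availability; in "AP" mode it accepts them locally, giving up consistency.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two replicas that favour either consistency (CP) or availability (AP)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [Replica("A"), Replica("B")]
        self.partitioned = False  # True simulates a network failure between nodes

    def write(self, node_index, key, value):
        """Return True if the write was accepted."""
        if self.partitioned:
            if self.mode == "CP":
                return False  # refuse: stay consistent, lose availability
            self.nodes[node_index].data[key] = value  # accept on the local island only
            return True
        for node in self.nodes:  # healthy network: replicate to every node
            node.data[key] = value
        return True

    def is_consistent(self, key):
        """True when all nodes show the same value for the key."""
        return len({node.data.get(key) for node in self.nodes}) == 1


cp = Cluster("CP")
cp.write(0, "stock", 10)
cp.partitioned = True
cp_accepted = cp.write(0, "stock", 9)  # rejected while partitioned

ap = Cluster("AP")
ap.write(0, "stock", 10)
ap.partitioned = True
ap_accepted = ap.write(0, "stock", 9)  # accepted, but node B still holds 10
```

After the partition, the CP cluster still agrees on the old value, while the AP cluster has answered the client but its two nodes now disagree until they are reconciled.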
25. [Diagram: nodes numbered 1 to 6; three data centres (DatCtr A, DatCtr B, DatCtr C), each taking Insert, Update and Delete]
26. No – it’s not just Hadoop
Velocity, Variety, Volume
BD can be done in anything.
◦ Velocity – CEP, In-Memory, distributed computing
◦ Variety – varied types of data, structured / unstructured.
◦ Volume – size of the data
BD is not definitive – depends on your
budget, ability etc.
27. Processing a data stream in flight
Window over the stream and determine
trends
Read the stream rather than poll the database
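Windowing over a stream can be sketched in a few lines of Python. This is a minimal illustration, not a CEP engine: the readings are invented, and `windowed_averages` and `rising` are hypothetical helper names. Each event is processed as it arrives, and a fixed-size window of the latest readings is averaged to expose the trend.

```python
from collections import deque


def windowed_averages(stream, window_size):
    """Yield the mean of the last window_size readings as each event arrives."""
    window = deque(maxlen=window_size)  # old readings fall off automatically
    for value in stream:
        window.append(value)
        if len(window) == window_size:
            yield sum(window) / window_size


def rising(averages):
    """True when every window average is higher than the one before it."""
    avgs = list(averages)
    return all(later > earlier for earlier, later in zip(avgs, avgs[1:]))


readings = [10, 12, 11, 15, 18, 21]  # made-up sensor values
averages = list(windowed_averages(readings, window_size=3))
trend_is_up = rising(averages)
```

Because the generator consumes events one at a time, the same function works on an unbounded source, which is the point of reading the stream rather than polling a table.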
28. If you aren’t using Machine Learning / Data
Mining algorithms you aren’t doing Data Science
If you know what you are looking for – you
aren’t doing DS.
DS isn’t just R, you can do DS in numerous
tools, R has a large library of packages to use
against your data
DS is where you are looking for patterns in
your data and trying to understand them to
then formulate standard process flows to take
advantage.
29. Scale out – distributed – data processing
architecture
Batch, Speed, Service layers
For low latency, high updates
Robust
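The three layers can be sketched as follows. This is a toy, single-process illustration of the idea, assuming a page-view counting workload; the dataset, the `batch_view`, `SpeedLayer` and `serve` names are all hypothetical. The batch layer recomputes a view from the full master dataset, the speed layer keeps an incremental view of events that arrived since the last batch run, and the serving layer merges the two at query time.

```python
from collections import Counter

# Events already covered by the last batch run (made-up data).
master_dataset = [("page_a", 1), ("page_b", 1), ("page_a", 1)]
# Events that arrived after the batch run started.
recent_events = [("page_a", 1), ("page_c", 1)]


def batch_view(events):
    """Batch layer: recompute the whole view from the master dataset."""
    view = Counter()
    for key, count in events:
        view[key] += count
    return view


class SpeedLayer:
    """Speed layer: update a small real-time view one event at a time."""

    def __init__(self):
        self.view = Counter()

    def ingest(self, key, count):
        self.view[key] += count


def serve(batch, realtime, key):
    """Serving layer: answer a query by merging batch and real-time views."""
    return batch[key] + realtime[key]  # Counter returns 0 for unseen keys


batch = batch_view(master_dataset)
speed = SpeedLayer()
for key, count in recent_events:
    speed.ingest(key, count)

page_a_total = serve(batch, speed.view, "page_a")
```

When the next batch run absorbs the recent events, the speed layer's view can be discarded, which is where much of the architecture's robustness comes from.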
30. Kimball
◦ Dimensional modelling with star schema
◦ Dimensions and Facts
◦ Bottom up – data marts to EDW
◦ Aspires to Single Version of the Truth
Inmon
◦ Normal Form
◦ Can also use star schema
◦ Form the EDW and then use data marts
◦ Stronger approach to Single Version of the Truth
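A Kimball-style star schema can be sketched with one fact table and two dimensions. The example below uses Python's built-in SQLite purely for convenience; the table names, columns and figures are invented for illustration and do not come from any real warehouse.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
-- Dimensions describe the context of a measurement.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);

-- The fact table holds the measures plus a key to each dimension.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);

INSERT INTO dim_date VALUES (20150101, 2015), (20160101, 2016);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO fact_sales VALUES
    (20150101, 1, 10.0), (20160101, 1, 20.0),
    (20160101, 2, 5.0), (20160101, 2, 7.5);
""")

# A typical dimensional query: join facts to dimensions, then slice and sum.
sales_by_year_and_category = dw.execute("""
    SELECT d.calendar_year, p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.calendar_year, p.category
    ORDER BY d.calendar_year, p.category
""").fetchall()
```

Every analytical query follows the same shape (fact joined to dimensions, grouped by dimension attributes), which is why the slide describes dimensional modelling as aimed at usability and performance.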
31. Modelling method
Pull all your uncleansed data and store it in
one place
Buffer between Operational Databases and
the Conformed Data Warehouse
32. Are you really on the Cloud, or just on a managed,
remotely located server environment?
Real cloud has immediate elasticity, hides
infrastructure, and makes it easy to spin up new
resource near immediately.
Marketed cloud is really managed servers – no
immediate elasticity, servers are provisioned
and that takes time.
33. True cloud offers elasticity for Distributed
Database capabilities – proper scale out.
◦ Azure Elastic Database (Sharding)
◦ SQL 2016 Stretch Feature
Remember CAP? Yep – you need to understand
that.
On-Prem tends to be scale up, single box –
single database
Cloud – some of your tasks will disappear
because it’s done for you. But your role is a Data
Centric role and not Infrastructure Centric.
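The sharding idea mentioned on this slide can be sketched generically. The code below is not the Azure Elastic Database API; it is a plain hash-based router, assuming three hypothetical shard databases and a customer id as the sharding key, just to show how rows get spread across databases in a scale-out design.

```python
import hashlib

SHARDS = ["shard_0", "shard_1", "shard_2"]  # hypothetical database names


def shard_for(customer_id, shards=SHARDS):
    """Map a sharding key to one shard using a stable hash."""
    # hashlib gives the same answer in every process, unlike the built-in
    # hash() for strings, which is randomised per interpreter run.
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Route 100 made-up customers and record where each one lands.
placement = {}
for customer_id in range(1, 101):
    placement.setdefault(shard_for(customer_id), []).append(customer_id)
```

Simple modulo routing like this reshuffles most keys when a shard is added, which is one reason managed offerings tend to use an explicit shard map instead; a query that spans customers must also visit several shards, which is where the CAP considerations come back in.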
Editor's notes
20:00 – 21:00 Tony Rogerson - “SQL Server Data Platform specialist” who used to be known as “Database Administrator”
The year was 1995 and I was a SQL Developer/Database Administrator designing schema, writing and optimising SQL, managing log shipping and backups. The year is now 2016 and that relatively small skill set has exploded dramatically with ETL (SSIS plus some C#), MDM, Business Intelligence (Kimball, Inmon, Lambda, hybrid), Data Science (Statistics, Business Skills, R, F#, HDInsight, Hadoop), Cloud (AWS, Azure, third-party on/off prem), Data Governance (ICO principles/rules, Security, International DP rules).
In this session we will look at today’s SQL Server Data Platform specialists. You know who they are because, even though you are still called “DBA”, you are actually one of them!
We will cover introductions, with demos, to the following technology areas: ETL, BI, DS and Azure, with examples of using them within a Data Platform setting.