1) Migrating your on-prem #Enterprise #Data #Warehouse into the #Cloud? Here is what you need to learn (and unlearn) when designing a modern Cloud #DataWarehouse in #BigQuery!
2) Launching a #Startup? See how to supercharge your idea with #Firebase!
Watch the recording at https://youtu.be/zezhXNqD0rs. For more forward-looking talks on #Cloud #Architectures & #DataEngineering, join the http://ServerlessToronto.org User Group.
4. Introducing C2C
The Independent Google Cloud Community
We’re here to unite Google
Cloud customers across the
globe.
C2Cglobal.com
Connections
Customer-to-customer conversations,
events, forums, and other outlets to
connect with peers and experts.
Events and Education
Customer stories, presentations, blogs,
and points of view on hot topics, best
practices, and the latest Google Cloud
news.
Exclusive Access
Sessions and conversations with Google
Cloud experts and executives to learn
from the best and share your feedback
to help shape what’s next.
6. What You Can Expect:
Connect
● Community platform to share resources, discuss ideas, and provide advice on issues and ongoing projects
● Live Member Discussions to share experiences, discuss best practices, and find inspiration from other thought leaders
and experts
● Regional Connect Events for peer-to-peer sharing and network-building.
Learn
● On-demand videos, blogs, and resources to provide a launchpad of aggregated expertise from customers, partners and
GC.
● Cohort-based learning programs to build subject matter expertise and GCP literacy across the community.
Shape
● Best practices through the shared expertise of communities of practice.
● Trusted resources collections vetted by customers.
● Product feedback delivered with a unified customer voice to shape the future of cloud.
Join: c2cglobal.com
Questions: info@c2cglobal.com
Follow: @meetC2C
7. Agenda
☑ 4:00pm - 4:15pm Connect & Network
☑ 4:15pm - 5:00pm Dan Sullivan “How to Design a Modern Data Warehouse in
BigQuery, or Why I Needed to Forget Everything I Learned in Data Modeling
School”
☑ 5:00pm - 5:45pm Kudz Murefu “Small Teams, Big Things with Firebase &
GCP Serverless Services”
☑ 5:45pm - 6:00pm WIN cool PRIZES from our sponsors! Closing Comments &
Networking
All times are in GMT.
8. How to Design a Modern Data Warehouse in BigQuery
or
Why I Needed to Forget Everything I Learned in Data
Modeling School
Author of the official Google Cloud study guides for the
Professional Architect, Professional Data Engineer, and Associate Cloud Engineer
Dan Sullivan
PEAK6 Technologies
Cloud Architect and Data Scientist
https://www.dansullivanlearning.com/
11. Datastore Options
➤ Relational
➢ Highly structured and transactional
➢ Difficult to scale
➤ NoSQL
➢ Semi-structured, eventual consistency, scalable
➤ Analytical
➢ Structured, scalable, not transactional
12. Data Warehouse (early 2000s)
➤ Few servers
➤ Tightly coupled storage and
compute
➤ Scale vertically
➤ Built on same relational database
management systems used for
OLTP
13. BigQuery
➤ Serverless data warehouse
➤ Petabyte scale
➤ Uses SQL but is not a relational database
➤ Analytical database
➤ Other features
➢ BigQuery ML
➢ BigQuery BI Engine
➢ BigQuery GIS
16. Dremel
➤ Multi-tenant cluster
➤ SQL queries to execution trees
➢ Leaves are called slots; read data and perform computation
➢ Inner nodes perform aggregation
➤ Dynamically allocate slots to queries
➤ Maintains fairness
➤ A single user could get 1,000s of slots
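The execution-tree idea above can be illustrated with a toy sketch: leaf workers (standing in for slots) each scan one shard and return a partial aggregate, and an inner node merges the partials. The function names and sample data are hypothetical, and this is not how Dremel is implemented; it only shows the shape of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def leaf_scan(shard, predicate):
    """Leaf (slot): read one shard and compute a partial (sum, count)."""
    values = [v for v in shard if predicate(v)]
    return sum(values), len(values)


def inner_aggregate(partials):
    """Inner node: combine the partial results returned by the leaves."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count


def run_query(shards, predicate, slots=4):
    """Fan the scan out over `slots` workers, then merge the partials."""
    with ThreadPoolExecutor(max_workers=slots) as pool:
        partials = list(pool.map(lambda s: leaf_scan(s, predicate), shards))
    return inner_aggregate(partials)


shards = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
total, count = run_query(shards, lambda v: v % 2 == 1)
print(total, count)  # 25 5
```

Because each leaf only returns a small partial result, adding more shards and more slots spreads the scan without changing the merge step.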
18. Colossus
➤ Distributed storage system
➤ Handles replication and recovery
➤ No need to manage storage
https://en.wikipedia.org/wiki/Google_File_System#/media/File:GoogleFileSystemGFS.svg
19. Jupiter & Borg
➤ Jupiter
➢ Google networking switch
➢ Petabit scale
➢ Storage to compute communication
➢ No need for rack awareness
➤ Borg
➢ Predecessor of Kubernetes
➢ Manages mixers and slots
https://medium.com/@jerub/the-production-environment-at-google-8a1aaece3767
https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p183.pdf
20. Capacitor
➤ Columnar storage format
➤ Supports semi-structured data
➢ Nested structures
➢ Repeated fields
➤ No need to read parent column to produce a
nested structure attribute value
➤ Compression
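A rough way to picture columnar storage of nested and repeated data is to shred each record into one value list per leaf path. The sketch below is a simplification under stated assumptions: the record layout is hypothetical, and a plain row index stands in for the repetition and definition levels that a real columnar format would use.

```python
def to_columns(rows):
    """Shred rows into one value list per leaf path, e.g. 'customer.city'."""
    columns = {}

    def walk(value, path, row_idx):
        if isinstance(value, dict):        # nested structure
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key, row_idx)
        elif isinstance(value, list):      # repeated field
            for item in value:
                walk(item, path, row_idx)
        else:                              # leaf value
            columns.setdefault(path, []).append((row_idx, value))

    for i, row in enumerate(rows):
        walk(row, "", i)
    return columns


rows = [
    {"order_id": 1, "customer": {"name": "Ada", "city": "Toronto"},
     "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order_id": 2, "customer": {"name": "Lin", "city": "Ottawa"},
     "items": [{"sku": "A", "qty": 5}]},
]
cols = to_columns(rows)
# Reading one nested attribute touches only its own column.
print(cols["customer.city"])  # [(0, 'Toronto'), (1, 'Ottawa')]
print(cols["items.qty"])      # [(0, 2), (0, 1), (1, 5)]
```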
22. If you remember anything
from this talk ...
➤ Design for scanning in parallel
➤ Partition to minimize amount of data scanned
➤ Cluster to further reduce the amount of data scanned
➤ Joins may require shuffling data across slots so ...
➤ Denormalize using nested and repeated fields
24. Partitioned Tables
➤ Table is divided into segments called partitions
➤ Improves query performance
➤ Lowers cost by reducing amount of data scanned
25. Partition by Ingestion Time
➤ Loads data into daily, date-based partitions
➤ Automatically creates new partitions
➤ Uses ingestion time to determine partition
➤ Creates pseudo-column _PARTITIONTIME
➢ Date-based timestamp
➢ Used in queries to limit the number of partitions scanned
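As an illustration of filtering on the pseudo-column, the helper below builds a query string that restricts the scan to a window of daily partitions. The table name `my_project.my_dataset.events` and the helper itself are hypothetical; only `_PARTITIONTIME` comes from the slide.

```python
from datetime import date, timedelta


def ingestion_window_query(table, first_day, last_day):
    """Build a query that scans only partitions in [first_day, last_day]."""
    end_exclusive = last_day + timedelta(days=1)
    return (
        f"SELECT COUNT(*) AS row_count\n"
        f"FROM `{table}`\n"
        f"WHERE _PARTITIONTIME >= TIMESTAMP('{first_day.isoformat()}')\n"
        f"  AND _PARTITIONTIME < TIMESTAMP('{end_exclusive.isoformat()}')"
    )


# Hypothetical table name, used only for illustration.
query = ingestion_window_query(
    "my_project.my_dataset.events", date(2020, 11, 1), date(2020, 11, 7)
)
print(query)
```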
26. Date/Timestamp Partitioning
➤ Partition based on date or timestamp column
➤ Each partition holds one day of data
➤ No need for _PARTITIONTIME
➤ Special partitions
➢ __NULL__ when nulls in partition column
➢ __UNPARTITIONED__ when values in column outside allowed range
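The routing of rows to daily and special partitions can be sketched as a small function. The range bounds below are an assumption based on my reading of the BigQuery documentation and should be checked against the current limits; the function name is hypothetical, and the special partition names are written with the double underscores BigQuery uses.

```python
from datetime import date

# Assumed bounds of the allowed range; verify against the current BigQuery docs.
MIN_DAY = date(1960, 1, 1)
MAX_DAY = date(2159, 12, 31)


def partition_id(value):
    """Return the partition a row would land in for a DATE partition column."""
    if value is None:
        return "__NULL__"
    if value < MIN_DAY or value > MAX_DAY:
        return "__UNPARTITIONED__"
    return value.isoformat().replace("-", "")   # daily partition, YYYYMMDD


print(partition_id(date(2020, 11, 5)))    # 20201105
print(partition_id(None))                 # __NULL__
print(partition_id(date(1959, 12, 31)))   # __UNPARTITIONED__
```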
27. Integer Range Partition
➤ Partition column must be an integer type
➤ Partition column cannot be repeated
➤ Cannot use Legacy SQL to query partitioned tables
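Integer range partitioning is defined by a start, an end, and an interval. A minimal sketch of the bucketing, assuming the start is inclusive and the end is exclusive, and with a hypothetical function name:

```python
def integer_range_partition(value, start, end, interval):
    """Return the lower bound of the bucket that holds `value`.

    The range is [start, end). Nulls and out-of-range values go to the
    special partitions instead of a numbered bucket.
    """
    if value is None:
        return "__NULL__"
    if value < start or value >= end:
        return "__UNPARTITIONED__"
    return start + ((value - start) // interval) * interval


# Buckets of width 10 covering 0..99.
print(integer_range_partition(37, 0, 100, 10))    # 30
print(integer_range_partition(100, 0, 100, 10))   # __UNPARTITIONED__
print(integer_range_partition(None, 0, 100, 10))  # __NULL__
```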
28. Sharding vs. Partitioning
➤ Sharding
➢ Use separate table for each day
➢ [TABLE_NAME_PREFIX]_YYYYMMDD
➢ Use UNION in queries to scan multiple tables
➤ Partitioning is preferred over sharding
➢ Less metadata to maintain
➢ Less permission checking overhead
➢ Better performance
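To make the contrast concrete, the sketch below builds the query text for both designs over the same three days. Table and column names are hypothetical, and the shard suffix is assumed to be in YYYYMMDD form.

```python
from datetime import date, timedelta


def sharded_query(prefix, first_day, last_day):
    """One SELECT per daily shard table, glued together with UNION ALL."""
    selects = []
    day = first_day
    while day <= last_day:
        suffix = day.strftime("%Y%m%d")
        selects.append(f"SELECT * FROM `{prefix}_{suffix}`")
        day += timedelta(days=1)
    return "\nUNION ALL\n".join(selects)


def partitioned_query(table, column, first_day, last_day):
    """A single table; the filter on the partition column limits the scan."""
    return (
        f"SELECT * FROM `{table}`\n"
        f"WHERE {column} BETWEEN DATE '{first_day.isoformat()}' "
        f"AND DATE '{last_day.isoformat()}'"
    )


start, end = date(2020, 11, 1), date(2020, 11, 3)
print(sharded_query("my_dataset.events", start, end))
print(partitioned_query("my_dataset.events", "event_date", start, end))
```

The sharded form grows by one SELECT per day, while the partitioned form stays a single statement regardless of the window.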
29. Requiring Partition Filter
➤ require_partition_filter option
➤ Specified at table level (formerly at partition level)
➤ Requires a WHERE clause with the partition column
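One way to set this at table creation is through the DDL `OPTIONS` clause, where the option is spelled `require_partition_filter`. The helper below only assembles the statement text; the table name and the `user_id` and `amount` columns are hypothetical.

```python
def create_partitioned_table_ddl(table, partition_column):
    """Build DDL for a date-partitioned table that rejects unfiltered queries."""
    return (
        f"CREATE TABLE `{table}` (\n"
        f"  {partition_column} DATE,\n"
        f"  user_id STRING,\n"
        f"  amount NUMERIC\n"
        f")\n"
        f"PARTITION BY {partition_column}\n"
        f"OPTIONS (require_partition_filter = TRUE)"
    )


ddl = create_partitioned_table_ddl("my_dataset.sales", "sale_date")
print(ddl)
```

With this option set, a query against the table is expected to fail unless its WHERE clause filters on `sale_date`.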
31. Clustered Tables
➤ Data sorted based on values in one or more columns
➤ Can improve performance of aggregate queries
➤ Can reduce scanning when cluster columns used in WHERE clause
➤ Used with partitioned tables
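Why sorting reduces scanning can be shown with a toy model: rows are sorted by the cluster column and cut into blocks that record their minimum and maximum key, so a filter can skip blocks whose range cannot match. The block layout here is an illustrative assumption, not BigQuery's internal format.

```python
def build_blocks(rows, key, block_size):
    """Sort rows by the cluster key and cut them into blocks with min/max metadata."""
    ordered = sorted(rows, key=lambda r: r[key])
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        blocks.append({"min": chunk[0][key], "max": chunk[-1][key], "rows": chunk})
    return blocks


def scan_equal(blocks, key, wanted):
    """Return matching rows and the number of blocks that had to be read."""
    blocks_read, hits = 0, []
    for block in blocks:
        if block["min"] <= wanted <= block["max"]:   # otherwise the block is skipped
            blocks_read += 1
            hits.extend(r for r in block["rows"] if r[key] == wanted)
    return hits, blocks_read


countries = "CA US CA BR US DE CA FR".split()
rows = [{"country": c, "n": i} for i, c in enumerate(countries)]
blocks = build_blocks(rows, "country", block_size=2)
hits, blocks_read = scan_equal(blocks, "country", "US")
print(len(hits), "rows from", blocks_read, "of", len(blocks), "blocks")
# 2 rows from 1 of 4 blocks
```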
32. Automatic Reclustering
➤ As new data is added to a table, data may
be stored out of order
➤ BigQuery automatically re-clusters in the
background
36. One more time … if you remember
anything from this talk ...
➤ Design for scanning in parallel
➤ Partition to minimize amount of data scanned
➤ Cluster to further reduce the amount of data scanned
➤ Joins may require shuffling data across slots so ...
➤ Denormalize using nested and repeated fields to avoid needing joins
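The last point can be sketched with plain data structures: child rows that would normally live in a separate table are folded into a repeated field on the parent row, so an aggregate reads one record instead of joining two tables. The `orders` and `line_items` data are hypothetical.

```python
orders = [{"order_id": 1, "customer": "Ada"}, {"order_id": 2, "customer": "Lin"}]
line_items = [
    {"order_id": 1, "sku": "A", "qty": 2},
    {"order_id": 1, "sku": "B", "qty": 1},
    {"order_id": 2, "sku": "A", "qty": 5},
]


def denormalize(orders, line_items):
    """Fold child rows into a repeated `items` field on each parent row."""
    by_order = {}
    for item in line_items:
        by_order.setdefault(item["order_id"], []).append(
            {"sku": item["sku"], "qty": item["qty"]}
        )
    return [{**o, "items": by_order.get(o["order_id"], [])} for o in orders]


def total_qty(nested_orders):
    """Aggregate over the repeated field; no join is needed at query time."""
    return {o["order_id"]: sum(i["qty"] for i in o["items"]) for o in nested_orders}


nested = denormalize(orders, line_items)
print(total_qty(nested))  # {1: 3, 2: 5}
```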
37. Small Teams, Big Things
with Firebase & GCP Serverless Services
Kudz Murefu
Founder Strma Music
https://Strma.io
39. Birth of the Idea
➔ Strma is a streaming app for African music
➔ Our journey started in 2017, while I was a business student
➔ Mission was to create a simple way to deliver Afro-music over the web
➔ We launched on WordPress as a simple blog, and off we went!
40. Prevailing Challenges
➔ Heavy reliance on Plugins
➔ Very slow page loads
➔ Limited File storage for songs
➔ Expensive Hosting
The exodus from WordPress
41. What to use for my backend
➔ Database?
➔ Hosting?
➔ Backend Jobs?
42. A miracle from heaven
Firebase
Authentication
Realtime Database
Functions
Hosting
Storage
43. Realtime Database
➔ Simple NoSQL Database
➔ Can be accessed from the web or through your codebase
➔ Easily interact with the Database Tree
➔ No need to setup a server
45. Realtime Syncing
➔ Allows for real time updates with no extra configuration
➔ Changes are broadcast to all clients
➔ Just subscribe to the database with 3 lines of code
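The subscribe-and-broadcast pattern can be pictured with a tiny in-memory stand-in. This is not the Firebase SDK: the class, its methods, and the `now_playing` path are all hypothetical, and the sketch only shows how every subscribed client receives each change.

```python
class TinyRealtimeDB:
    """In-memory stand-in: stores values by path and notifies subscribers."""

    def __init__(self):
        self._data = {}
        self._listeners = {}

    def on_value(self, path, callback):
        """Subscribe to a path; the current value is delivered immediately."""
        self._listeners.setdefault(path, []).append(callback)
        callback(self._data.get(path))

    def set(self, path, value):
        """Write a value and broadcast it to every subscriber of the path."""
        self._data[path] = value
        for callback in self._listeners.get(path, []):
            callback(value)


db = TinyRealtimeDB()
seen_by_a, seen_by_b = [], []
db.on_value("now_playing", seen_by_a.append)
db.on_value("now_playing", seen_by_b.append)
db.set("now_playing", "track-42")
print(seen_by_a, seen_by_b)  # [None, 'track-42'] [None, 'track-42']
```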
47. Firebase Storage
➔ Built on top of Google Cloud Storage
➔ Same technology powering Spotify and Google Photos
➔ Robust uploads and downloads
➔ Use with drag & drop interface or using codebase
50. Firebase Hosting
➔ Easily deploy your website to a global CDN
➔ Comes with versioning and ability to rollback
➔ SSL certificates are built in
➔ Free tier (10 GB) or pay-as-you-go plan
51. Cloud Functions
➔ Easily trigger code to do some task through HTTP
➔ Code is simple and in JavaScript & TypeScript
➔ Use with Database to trigger when data changes
➔ Use with Storage on file upload
➔ Can schedule to run periodically
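The trigger model above boils down to registering small functions against event types and running them when an event fires. The sketch below mimics that idea in plain Python; it is not the Cloud Functions API, and the event names, decorator, and handlers are hypothetical.

```python
from collections import defaultdict

_handlers = defaultdict(list)


def on(event_type):
    """Register a function to run when `event_type` fires."""
    def register(fn):
        _handlers[event_type].append(fn)
        return fn
    return register


def emit(event_type, payload):
    """Run every handler registered for the event and collect the results."""
    return [fn(payload) for fn in _handlers[event_type]]


@on("storage.upload")
def tag_new_song(payload):
    return f"indexed {payload['name']}"


@on("database.write")
def count_play(payload):
    return payload["plays"] + 1


print(emit("storage.upload", {"name": "song.mp3"}))  # ['indexed song.mp3']
print(emit("database.write", {"plays": 9}))          # [10]
```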
53. Bringing it all together
➔ Firebase is an all-in-one solution
➔ Simple but robust enough to go from ZERO to HERO
➔ Lets you focus more on the business instead of infrastructure
Authentication
Realtime Database
Functions
Hosting
Storage
54. Growth, growth, growth...
➔ 5000 weekly users on the website, and growing
➔ Just launched our Android app
➔ We plan to grow the platform to 1 million+ users
➔ And our team is growing
55. From Firebase Hosting to Cloud Run
➔ Needed a way to gradually introduce updates
➔ Canary-like deployments
➔ e.g. Release a Beta feature to 15% of traffic
➔ Easily validate performance before releasing to 100% traffic.
➔ CI/CD for remote developers
[Diagram: "Before" (Firebase Hosting) vs. "Now" (Cloud Run) deployment flow, with Staging and Production]
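The idea of sending a fixed share of traffic to a new release can be sketched with hash-based bucketing. This is only an illustration of the concept, with a hypothetical `route` function and user IDs; a managed platform would normally handle the split itself rather than application code.

```python
import hashlib


def route(user_id, canary_percent=15):
    """Send a stable share of users (about canary_percent) to the new revision."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # 0..99
    return "canary" if bucket < canary_percent else "stable"


users = [f"user-{i}" for i in range(10_000)]
share = sum(route(u) == "canary" for u in users) / len(users)
print(f"{share:.1%} of users routed to the canary")
```

Hashing the user ID keeps each user on the same side of the split across requests, which makes it easier to compare the two revisions before raising the percentage to 100.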
60. Raffle time!
We have a lot of prizes from our amazing sponsors.
Let’s raffle them off!
Raffle Drawing
https://wheelofnames.com/
Prizes:
1. Dan Sullivan Google Cloud Associate Cloud Engineer Certification
Practice Exam ($50 value each) to all attendees.
2. C2C The Independent Google Cloud Community offers 5 hoodies.
3. O’Reilly 5 Books & 30 days full access to library ($50 value each).
4. ROI Training 4 On Demand Google Cloud Certification training:
ACE/PCE ($500 value each).
5. JetBrains offers 3 free annual Personal subscriptions ($249 value
each).
61. Uniting people from every corner of the Google Cloud
universe to connect, learn, and shape the future of the cloud.
Connect with Google Cloud Professionals on the C2C Community Platform
Your one-stop shop for engaging with other members, staying on top of upcoming events, browsing articles
and videos, and so much more. The structure and navigation reflect our three main community focuses:
connect, learn, and shape.
Connect: Join a group (we've got plenty for you to choose from) and start engaging in real time with other
members. New groups starting for Germany and the UK and Ireland!
Learn: Think of this section as a library for C2C content. Each of our top focus areas has a dedicated
collection of articles, videos, and content from our community and events.
Shape: Help shape the future of C2C by sharing your expertise and ideas, and by requesting topics
you want us to cover with our C2C events and content.
Join by Monday for a chance
to win a C2C hoodie!
Create your account at c2cglobal.com
Select C2C-Sponsored Event as your referral