5. [World map of ironSource offices with per-office employee counts: San Francisco and New York (United States), London (United Kingdom), Berlin (Germany), Kiev (Ukraine), Tel Aviv (Israel), Bangalore (India), Hong Kong, Beijing, Shanghai, and Shenzhen (China), Tokyo (Japan), Seoul (South Korea)]
ironSource Overview
ESTABLISHED
SEP. 2010
ACQUISITIONS TO DATE
8
EMPLOYEES
779
R&D EMPLOYEES
395
8. Our old data architecture
● 10 redshift clusters
● 5 RDS clusters
● 1000+ ETLs
● 1 Tableau
● Hard to scale
● Hard to maintain
● Hard to work with
● Limited data
● Expensive
13. Hive
Apache Hive is data warehouse software built on top
of Apache Hadoop to provide data query and analysis.
● One place to rule them all
● Hadoop Ecosystem
● Presto
● Spark
● Athena
Data Lake
14. Presto & Qubole
Qubole delivers a Self-Service Platform for Big Data
Analytics built on the Amazon Web Services, Microsoft,
and Google clouds.
Scalable Clusters
Qubole scales clusters up and down by examining
the execution plans of queries.
Spot Instances
Maintenance & Versions
Qubole takes care of new versions & 24/7 support
For Every Query
18. ● 10 redshift clusters
● 5 RDS clusters
● 1000+ ETLs
● 1 Tableau
● Hard to scale
● Hard to maintain
● Hard to work with
● Limited data
● Expensive
Our data architecture
● 1 redshift cluster
● 0 RDS clusters
● 300 ETLs
● 1 Tableau & 1 Re/dash
● Reduce costs by 50%
● Agile to the business
+
Our new data architecture
20. ● Replace 90% of our ETLs with ELTs
● Help our data science team by being clearer
on the logic, reducing their work time by 80%
● Keep raw data without any manipulation
● Reduce ML model deployment time by 50%
● No ETL time - no schedule
The New ETL
is ELT
Extract,
Load,
Transform.
22. Key notes to take home
Data-Lake - Keep all your raw data in one place.
It will save you costs and resources in the future, and help with research and ML models
Qubole - Enjoy the benefits of 3rd-party services and keep working on your business
Scale - Reach endless data with big clusters that scale per query
ELT - Move 90% of your ETLs to ELTs to reduce lag and costs
Agile - Promote your business with quick insights
Free to Learn - Take 10% of your time and learn!
Try and play with the data :)
Good morning everyone!!!
I am Or, and today I will show you how we use Presto and a data lake at PB scale.
Before we start, I want to tell you a bit about myself and my team. This picture was taken at last Purim's costume party nearby, at Hangar 11 (eleven).
For those who noticed, I hurt my knee 4 weeks ago skiing in Val-Morel, France.
So I had to sit in the sun, drink, and relax…
I am married, 35, and live in Tel Aviv. Coding has been my life since I was eleven…
I will show you a bit about ironSource.
Then I will take you on a journey through time, from 2016 (before Presto) until today (with Presto), and show what we are going to use in the future.
ironSource was created 10 years ago.
We are almost 800 employees & more than 50% of us are R&D.
Our headquarters & R&D center is located in Tel-Aviv & we have 9 more offices around the world.
ironSource has a few different business divisions:
Developer solutions - This division focuses on providing tools and technology to mobile app developers - specifically game developers.
We offer an SDK which essentially enables developers to run ads in their apps to make more money.
We are very strong with rewarded video - so if any of you are gamers, you may be familiar with the moment in a game when you run out of lives and you are offered a rewarded video to watch in order to continue playing. That’s an example of what we do
Enterprise solutions - Focusing on helping mobile device manufacturers and mobile carriers to engage with their customers.
Instead of having 20 different applications pre-installed on your device, users have the power to set up their device the way they want to, with the apps they really want and need.
Digital solutions - This is my division, we are focusing on the desktop world, (Mac, PC).
We help software developers with technologies that help monetize their software and distribute it to new users.
Let's have a look back at our -- AR-KI-TECH-TURE
We had 10 different Redshift clusters
One for BI
One for Researchers
One for R&D
One for Data science
One for Realtime data
One for Historic Data
One for DWH
One for QA
One for Critical ETLs
One for Backups
As you can imagine, it was really hard to work with.
We had 5 RDS clusters - Mainly for our Applications (Like OLAP)
We had more than 1000 ETLs...
We had 1 Tableau Server
And it was really hard.
Hard to scale - Redshift scales very slowly; resizing takes from a few hours to days...
Hard to maintain - We had to vacuum tables, delete old data, and move data from one Redshift cluster to another.
Hard to work with - From two aspects:
Not all the tables were on the same cluster.
30% of our clusters' power went just to inserting the data.
Limited data - We could not insert all the data into one cluster.
Very Expensive
This is what our data scientist looked like at the time.
Or even like that.
So we stopped and thought about where we wanted to be in the future.
First of all, we wanted lifetime data, which is very important to our business
Fast SQL - We wanted SQL that is fast enough for our dashboard usage
We wanted the ability to scale very fast
We focused on our data science team, as we knew we were going to grow that team and our ML models
Open source - we did not want to be tied to a certain company
So we started to create our data lake. We chose Parquet files, an open-source, column-oriented format.
We keep all of our data in S3, converting it from JSON into Parquet in near-real-time batch operations.
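The row-to-column idea behind that conversion can be sketched in a few lines. This is only an illustration of why a columnar layout like Parquet suits analytics; the real pipeline uses proper Parquet writers in batch jobs, and the field names here are invented:

```python
import json

def rows_to_columns(json_lines):
    """Turn newline-delimited JSON events (row layout) into a
    column-oriented dict of lists - the core idea behind Parquet:
    values of one field are stored contiguously."""
    columns = {}
    for line in json_lines:
        record = json.loads(line)
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns

# Hypothetical ad events, as they might arrive from the apps:
events = [
    '{"app": "game_a", "revenue": 0.02}',
    '{"app": "game_b", "revenue": 0.05}',
]
print(rows_to_columns(events))
# → {'app': ['game_a', 'game_b'], 'revenue': [0.02, 0.05]}
```

A query that only needs `revenue` can now read one list and skip the rest, which is why column-oriented files cut S3 scan volume so dramatically.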
Hive & Hive Metastore - we have one source of truth for our table definitions,
which works perfectly with any Hadoop-ecosystem engine
Such as:
Presto
Spark
Athena
And more.
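The "one source of truth" point can be sketched as a toy catalog: every engine resolves a table through the same definitions, so a table is declared once. All names below are invented for illustration; the real catalog is a Hive Metastore service, not a Python dict:

```python
# Toy model of a shared metastore: one table definition,
# consumed identically by any engine (Presto, Spark, Athena, ...).
METASTORE = {
    "events": {
        "location": "s3://data-lake/events/",   # hypothetical bucket
        "format": "parquet",
        "columns": {"app": "string", "revenue": "double"},
    }
}

def resolve(table, engine):
    """Any engine asks the same catalog, so it always gets the
    same file location and schema for a given table name."""
    meta = METASTORE[table]
    return f"{engine} reads {meta['format']} files at {meta['location']}"

print(resolve("events", "presto"))
print(resolve("events", "spark"))
# Both lines point at the same location - no per-engine table drift.
```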
Presto and Qubole.
We use Presto to query our data lake via Qubole.
Qubole is a self-service platform that lets us configure Presto clusters that scale easily and use spot instances; Qubole takes care of maintenance, new versions, and 24/7 support.
Once you have configured your cluster, it can grow by itself, from 3 to 50 nodes for example, within seconds…
And that happens for every query you run
Let's see an example of auto-scaling with Presto.
I ran around 50 different dashboards that use Presto and saved a Presto UI snapshot every few seconds.
As you can see, at the start there are 3 nodes and 4 queries.
As I ran the dashboards, the number of nodes increased along with the number of queries.
After all the queries finished, the cluster scaled back down to normal.
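The scale-with-load behaviour can be sketched as a simple sizing rule. Qubole's actual logic inspects query execution plans and is far more sophisticated; the floor of 3 nodes and ceiling of 50 just mirror the example above, and `queries_per_node` is a made-up capacity assumption:

```python
def target_nodes(running_queries, min_nodes=3, max_nodes=50, queries_per_node=2):
    """Choose a cluster size proportional to query load, clamped
    between a floor (keep the cluster warm) and a ceiling (cost cap).
    queries_per_node is an illustrative capacity assumption."""
    needed = -(-running_queries // queries_per_node)  # ceiling division
    return max(min_nodes, min(max_nodes, needed))

print(target_nodes(4))    # light load stays at the 3-node floor -> 3
print(target_nodes(60))   # heavier load scales out -> 30
print(target_nodes(500))  # never beyond the 50-node ceiling -> 50
```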
A bit about our volume.
We have around 70 thousand queries running via Presto every day.
We have 200 users
500 dashboards and counting
And half a petabyte scanned per day, just from S3, without Presto's caching.
Remember our data scientist?
Well, I think this is the best picture to explain how he feels.
Let's see how our -- AR-KI-TECH-TURE ---- looks today
We eliminated 9 of our Redshift clusters, keeping only one for Finance/DWH.
We eliminated all our RDS clusters - all the data is stored in the data lake.
We reduced our ETLs by 70%, as we no longer need to move data from one place to another.
We added a Re/Dash server alongside our Tableau server. Re/Dash is an open-source BI tool; we use Re/Dash for short-term solutions and Tableau for long-term solutions.
By adding the data lake, all of our problems disappeared!
In addition, we have reduced our costs by 50%!
And most important, we became much more agile for the business: instead of delivering the first insights for a new project in 2 to 8 weeks, we deliver them on the first day, or even in the first hour!
What we expect to use more in the future.
First of all: ELT.
The new ETL is ELT.
If you don't know what ELT is: Extract, Load, Transform.
It means you create your business logic in a big query (or view).
We are going to move around 90% of our ETLs to ELTs.
Why ELT?
1. Data science - We see that ELT reduces our data science team's work by 80%! The main reason is that they can create a dataset within minutes, by cloning the ELT of a specific business unit and adding more features.
2. Deployment - ELT helps data engineers deploy ML models, since all the raw data is in one place and the model was created on this data, not on aggregated data.
3. No lag - no scheduler - you become more real-time.
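The clone-and-extend point can be sketched concretely. In ETL, data is aggregated before loading, so the detail is gone; in ELT, raw events are loaded untouched and each business logic is just a transform over them, which can be copied and given one more dimension in minutes. All table and field names here are illustrative:

```python
# ELT: raw events are loaded exactly as received, no manipulation.
raw_events = [
    {"app": "game_a", "country": "US", "revenue": 0.02},
    {"app": "game_a", "country": "DE", "revenue": 0.05},
    {"app": "game_b", "country": "US", "revenue": 0.10},
]

def revenue_by_app(events):
    """The 'T' of ELT: an aggregation defined on top of raw data,
    like a big query or view."""
    totals = {}
    for e in events:
        totals[e["app"]] = round(totals.get(e["app"], 0) + e["revenue"], 2)
    return totals

def revenue_by_app_country(events):
    """A cloned transform with one extra feature (country) -
    no new pipeline, because the raw detail was never thrown away."""
    totals = {}
    for e in events:
        key = (e["app"], e["country"])
        totals[key] = round(totals.get(key, 0) + e["revenue"], 2)
    return totals

print(revenue_by_app(raw_events))
# → {'game_a': 0.07, 'game_b': 0.1}
```

Had the pipeline been a classic ETL that loaded only `revenue_by_app`, the per-country clone would have required re-extracting the source data.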
We are going to increase our usage of Presto connectors.
Kafka - we are going to move our alerting system (for business KPIs) from the data lake to Kafka, to ensure faster findings in real time!
ScyllaDB - push more of our insights into ScyllaDB for our ML models.
Elasticsearch - we use Elasticsearch via Kibana to monitor server logs and R&D logs; we see a strong need to be able to join those logs with business KPIs
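For reference, wiring a source like Kafka into Presto is mostly a matter of a catalog properties file; a minimal sketch might look like the following (the broker addresses and topic name are placeholders, not our real setup):

```properties
# etc/catalog/kafka.properties — illustrative values only
connector.name=kafka
kafka.nodes=broker1.example.com:9092,broker2.example.com:9092
kafka.table-names=business_kpis.alerts
kafka.hide-internal-columns=false
```

With a catalog like this in place, the same SQL (and the same joins against data-lake tables) works over the live topic.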
A few notes to take home
Data-Lake - Keep all your data in one place; it will save you time, effort & money.
Qubole - Use big data services like Qubole so you can focus on your business and not on maintenance
Scale - Presto scaling just works
ELT - Don't do ETLs; you don't need them anymore.
With Presto, you can be much more agile for your business
Free to Learn - As I always encourage my team to do, take 10% of your time to learn & play with the data.