This presentation is an overview guide to defining a process, or data pipeline, for loading data from Mixpanel into Amazon Redshift for further analysis.
We will see how to:
- access and extract data from Mixpanel through its API
- load that data into Redshift
This is not a complete solution: you will still need to write the code that fetches the data and make sure the process runs every time new data is generated.
10. Use Mixpanel’s Export API
https://mixpanel.com/docs/api-documentation/data-export-api
Access it with:
cURL
Postman
Apache HttpClient for Java
Spray-client for Scala
Hyper for Rust
Ruby rest-client
Python http-client
11. Use Mixpanel’s Export API
https://mixpanel.com/docs/api-documentation/data-export-api
Or use Mixpanel’s libraries / SDKs
Python
PHP
Ruby
JavaScript
12. Mixpanel API Resources
Annotations
annotations - list the annotations for a specified date
range.
create - create an annotation
update - update an annotation
delete - delete an annotation
Export
export - get a "raw dump" of tracked events over a time
period
13. Mixpanel API Resources
Events
events - get total, unique, or average data for a set of
events over a time period
top - get the top events from the last day
names - get the top event names for a time period
Event Properties
properties - get total, unique, or average data from a
single event property
top - get the top properties for an event
values - get the top values for a single event property
14. Mixpanel API Resources
Funnels
funnels - get data for a set of funnels over a time period
list - get a list of the names of all the funnels
Segmentation
segmentation - get data for an event, segmented and
filtered by properties over a time period
numeric - get numeric data, divided up into buckets for an
event segmented and filtered by properties over a time
period
sum - get the sum of a segment's values per time unit
average - get the average of a segment's values per time
unit
Segmentation Expressions - a detailed overview of what a
segmentation expression consists of
15. Mixpanel API Resources
Retention
retention - get data about how often people are coming back
(cohort analysis)
addiction - get data about how frequently people are
performing events
People Analytics
engage - get data from People Analytics
16. Mixpanel API Resources
Let’s assume that we want to export our raw data from Mixpanel.
We’ll need to execute requests to the export endpoint.
E.g. a request that gets us back raw events from Mixpanel.
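As a minimal sketch, such a request might look like this in Python. The endpoint URL follows Mixpanel’s Data Export API documentation; the API secret and the date range are placeholders:

```python
import base64
import urllib.parse
import urllib.request

# Placeholder: your project's API secret from the Mixpanel settings page.
API_SECRET = "YOUR_API_SECRET"

def build_export_request(from_date: str, to_date: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for raw tracked events."""
    params = urllib.parse.urlencode({"from_date": from_date, "to_date": to_date})
    url = "https://data.mixpanel.com/api/2.0/export?" + params
    # The export endpoint accepts HTTP Basic auth with the API secret as username.
    token = base64.b64encode(f"{API_SECRET}:".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})

req = build_export_request("2016-01-01", "2016-01-07")
# To actually execute it: urllib.request.urlopen(req)
# The "raw dump" comes back as one JSON object per line, one per event.
```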
18. Prepare Mixpanel Data for Amazon Redshift
• Follow Amazon Redshift Data Model
• Map into tables and columns
• Adhere to the datatypes that are supported by
Redshift*
• Have in mind the best practices that Amazon has
published regarding the design of a Redshift database.
Amazon Redshift is built around industry-standard SQL
with added functionality to manage very large datasets
and high performance analysis.
* As your data probably arrives in a representation like JSON, which supports a much smaller range of data types, you have to be careful about what you feed into Redshift and make sure you have mapped your types correctly.
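To make the mapping concrete, here is a hypothetical sketch: it flattens one raw Mixpanel event into a flat row of columns and assigns each value a Redshift-supported type. The type table is an illustrative assumption, not a complete mapping:

```python
import json

# Assumed mapping from Python/JSON types to Redshift column types.
# Redshift has no native JSON column, so nested data must be flattened.
REDSHIFT_TYPES = {
    str: "VARCHAR(65535)",
    int: "BIGINT",
    float: "DOUBLE PRECISION",
    bool: "BOOLEAN",
}

def flatten_event(raw: str) -> dict:
    """Turn a raw event {'event': ..., 'properties': {...}} into one flat row."""
    event = json.loads(raw)
    row = {"event": event["event"]}
    row.update(event.get("properties", {}))
    return row

def column_types(row: dict) -> dict:
    """Pick a Redshift type for each column based on the sample value."""
    return {col: REDSHIFT_TYPES[type(val)] for col, val in row.items()}

row = flatten_event('{"event": "signup", "properties": {"time": 1451606400, "plan": "free"}}')
```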
22. Amazon S3
2. Create a bucket
Execute an HTTP PUT against the Amazon S3 REST API endpoints. (Use cURL, Postman, or the libraries provided by Amazon.)*
* You can find more information by reading the API reference for the Bucket
operations on Amazon AWS documentation.
23. Amazon S3
3. Start sending your data to Amazon S3
Use the same AWS REST API
Use the endpoints for Object operations
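The bucket and object steps above can be sketched against a boto3-style S3 client (`create_bucket` and `put_object` are boto3’s real method names; the bucket name and key layout here are assumptions):

```python
# Sketch of slides 22-23, written against an injected boto3-style client
# so it can be exercised without AWS credentials. In real use you would
# pass boto3.client("s3") as the first argument.
def upload_events(s3, bucket: str, day: str, payload: bytes) -> str:
    """Create the bucket, then PUT one day's raw events as an object."""
    key = f"mixpanel/raw/{day}/events.json"              # assumed, date-partitioned layout
    s3.create_bucket(Bucket=bucket)                       # HTTP PUT, Bucket operations
    s3.put_object(Bucket=bucket, Key=key, Body=payload)   # HTTP PUT, Object operations
    return key
```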
25. Amazon Kinesis Firehose
Amazon Kinesis Firehose offers a real-time streaming approach to data importing.
1. Create a delivery stream
2. Add data to the stream
Use the same AWS REST API, or push by using a Kinesis Agent.
* Whenever you add new data to the stream, Kinesis takes care of adding it to S3 or Redshift. Going through S3 in this case is redundant if your goal is to move your data to Redshift.
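Step 2 can be sketched against a boto3-style Firehose client (`put_record` is boto3’s real method name; the delivery-stream name below is a placeholder):

```python
# Sketch of adding one raw event to a Firehose delivery stream. In real
# use you would pass boto3.client("firehose") as the first argument;
# the stream name is an assumed placeholder.
def push_event(firehose, event_json: str, stream: str = "mixpanel-to-redshift") -> None:
    # Records must be bytes; Firehose handles delivery to S3 or Redshift.
    firehose.put_record(
        DeliveryStreamName=stream,
        Record={"Data": event_json.encode()},
    )
```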
26. Load data into Redshift #1
INSERT
1. Connect to your Amazon Redshift instance with a client (JDBC or ODBC)
2. Perform an INSERT command for your data.
For more information, check the INSERT examples page in the Amazon Redshift documentation.
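A minimal sketch of step 2, building a parameterized INSERT for a row shaped like the flattened events (the table name is an assumption); the returned SQL and values would go to your client’s execute call:

```python
# Sketch: build an INSERT statement plus its parameter tuple for one row.
# The %s placeholder style matches common Python PostgreSQL drivers such
# as psycopg2, which also work against Redshift.
def build_insert(table: str, row: dict) -> tuple:
    cols = ", ".join(row)
    placeholders = ", ".join(["%s"] * len(row))
    sql = f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"
    return sql, tuple(row.values())

sql, values = build_insert("mixpanel_events", {"event": "signup", "time": 1451606400})
# With a live connection: cursor.execute(sql, values)
```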
27. Load data into Redshift #2
COPY
1. Connect to your Amazon Redshift instance with a client (JDBC or ODBC)
2. Perform a COPY command for your data.
For more examples of how to invoke a COPY command, check the COPY examples page in the Amazon Redshift documentation.
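A sketch of building such a COPY statement for the JSON files staged in S3 earlier; the bucket, key prefix, and IAM role ARN are all placeholders. The returned statement would be run over the same JDBC/ODBC connection:

```python
# Sketch: build a Redshift COPY statement that loads JSON objects from
# S3, letting Redshift map JSON fields to columns with JSON 'auto'.
def build_copy(table: str, bucket: str, prefix: str, iam_role: str) -> str:
    return (
        f"COPY {table} FROM 's3://{bucket}/{prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS JSON 'auto'"
    )

stmt = build_copy(
    "mixpanel_events",
    "my-bucket",                                    # placeholder bucket
    "mixpanel/raw/",                                # placeholder key prefix
    "arn:aws:iam::123456789012:role/redshift-copy", # placeholder role ARN
)
```

COPY loads the staged files in parallel, which is why it is preferred over row-by-row INSERTs for bulk loads.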