Shrilesh Kathe, 2017
1. Shrilesh Kathe
Advanced Analytics
Education
Bachelor of Engineering in Electronics and Telecommunication from
Ramrao Adik Institute of Technology, Mumbai University.
Technical skills
Apache Spark:
Good understanding of Spark's parallel processing architecture.
Distributing datasets across cluster nodes to leverage distributed
computing using Spark RDDs, DataFrames, etc.
Designing the lineage graph for optimal execution of transformations.
Comfortable developing driver programs in Python, Java, and Scala.
Building DStreams from live streaming data and 24/7 applications using
Spark Streaming.
Java:
In-depth knowledge of core Java programming.
Very good understanding of OOP concepts such as polymorphism and
inheritance, along with multithreading and exception handling.
Comfortable with the Eclipse IDE and the JDK toolkit.
Analytics skills:
Built linear regression models in R on various datasets for prediction
and inference.
Checked model accuracy in prediction by taking into account bias,
variance, and mean squared error (MSE).
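As a small worked example of the accuracy check mentioned above, MSE is the mean of squared residuals between observed and predicted values; lower MSE reflects a better balance of bias and variance. The data values below are made up for illustration.

```python
# Hypothetical example: computing mean squared error (MSE) for a
# fitted model's predictions. All numbers are made up for illustration.
def mse(y_true, y_pred):
    """Mean of squared residuals between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

observed = [3.0, 4.5, 6.0, 7.5]
predicted = [2.8, 4.7, 6.1, 7.2]

# Residuals are 0.2, -0.2, -0.1, 0.3, so the MSE works out to 0.045.
error = mse(observed, predicted)
```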
Applied the nPath function to weblog data to recognize various patterns
and visualized the results with a Sankey chart in Teradata AppCenter.
Certifications
IBM Certified Developer, Apache Spark 1.6.
Oracle Certified Professional, Java SE 6 Programmer.
2. Project:
Project Name : WEST BI NIKE CWA Implementation
Role : Offshore ETL developer
Client : Nike
Period : Oct 2016 to present
Industry : Web Analytics
Project Type : Development
Tools : Spark SQL, Python, Hive
Developed a generic ETL solution to capture web traffic data for
different report suites in a Spark environment.
Wrote PySpark applications to parse raw data, populate staging tables,
and store the refined data in partitioned Hive tables.
Migrated the existing CWA solution to Apache Spark using PySpark.
Developed dynamic SQL to facilitate XML parsing and created logical
functions in Python.
Maintained and updated project documentation.
Performed quality analysis and timely testing of the loaded data.
Handled process orchestration and general development and
troubleshooting on the solution.
Remote Terminal Unit for LPG Filling Plant, HPCL, Chembur (B.E. final-year
project)
The objective of the project was to connect the LPG refinery and the filling
plant at HPCL Chembur.
The Remote Terminal Unit interfaces with the plant setup, monitors fluid
flow, and records the volume, temperature, density, etc. of the fluid
flowing in the pipelines.
The RTU was developed using PLCs and SCADA. It interfaces various plant
components, such as servo gauges, ROVs, and MOVs, to a centralized
computer over an OFC (Optical Fiber Cable).
3. Training
Participated in TCS CodeVita 2015, an online coding competition, and
cleared the first round, which consisted of 5 questions requiring
algorithms for real-life problems; the coding was done in Java.
Attended a week-long training on Teradata Aster. Learned about the Aster
architecture of queen and worker nodes, and studied SQL-MR functions
such as nPath, sessionize, and nGram.
Completed 'Programming for Everybody', an online Python course offered
by the University of Michigan on Coursera, with Distinction.
Completed 'An Introduction to Interactive Programming in Python', an
online course on event-driven programming in Python offered by Rice
University. Developed interactive games such as Tic-Tac-Toe, Rock Paper
Scissors Lizard Spock, and Paddle.