Robin David
Mobile: +919952447654 | Email: robinhadoop@gmail.com
Objective:
To build a career with a leading hi-tech corporate organization alongside committed and
dedicated people, working as a key player in a challenging and creative environment; a
position in which my technical abilities, education and past work experience will be utilized
to the best benefit of the organization.
Experience:
Around 9 years of total experience in IT, with 3+ years of experience in Hadoop
Professional experience of 4 months with Polaris, as Senior Consultant, mainly focusing
on building a Reference Data Management (RDM) solution within Hadoop
Professional experience of 18 months with iGATE Global Solutions, as Technical Lead,
mainly focusing on data processing with Hadoop, developing the IV3 engine (iGATE's
proprietary Big Data platform) and Hadoop cluster administration
Professional experience of 5.7 years with Cindrel Info Tech, as Technology Specialist,
mainly focusing on ADS, LDAP, IIS, FTP, DW/BI and Hadoop
Professional experience of 11 months with Purple Info Tech, as Software Engineer,
mainly focusing on routers and switches
Achievements in Hadoop
• Built custom plug-ins with web user interfaces for:
o HDFS data encryption/decryption
o Column-based data masking on Hive
o Hive benchmarking
o Sqoop automation for data ingestion into HDFS
o HDD space issue alert automation on Hadoop cluster environments
• Integrated Revolution R with the Cloudera distribution
• Integrated Informatica 9.5 with HDFS for data processing on Apache and Cloudera
clusters
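The HDD space alert automation listed above can be sketched as a small shell check. This is an illustrative sketch, not the actual plug-in: the 80% threshold is an assumption, and a cluster-wide version would also parse `hdfs dfsadmin -report` rather than only local mounts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an HDD-space alert check for cluster nodes.
# Threshold and mounts are illustrative; a real deployment would run
# this from cron on each node and mail the output to the admin team.
THRESHOLD="${THRESHOLD:-80}"   # alert when usage exceeds this percent

check_usage() {
    # Reads "percent mountpoint" pairs on stdin and prints an alert
    # line for every mount above the threshold.
    while read -r pct mount; do
        if [ "${pct%\%}" -gt "$THRESHOLD" ]; then
            echo "ALERT: $mount at ${pct} (threshold ${THRESHOLD}%)"
        fi
    done
}

# Feed real disk usage from GNU df (header row stripped with tail).
df --output=pcent,target | tail -n +2 | check_usage
```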
Professional Summary
• Expertise in creating secure data lake environments in HDFS
• Experienced in pulling data from RDBMS and various staging areas into HDFS
using Sqoop, Flume and shell scripts
• Provided solutions and technical architecture to migrate existing DWH to the Hadoop
platform
• Used the data integration tool Pentaho for designing ETL jobs in the process of
building data lakes
• Experienced in building and processing data within DataStax Cassandra clusters
• Good experience handling data with advanced Hive features
• Experienced in installation, configuration and management of Pivotal, Cloudera,
Hortonworks and Apache Hadoop clusters
• Designed and built IV3, iGATE's proprietary Big Data platform
• Good experience in Hadoop Distributed File System (HDFS) management
• Experienced in shell scripting for various HDFS operations
• Installation, configuration and management of EMC tools (GemFire XD, Greenplum
DB and HAWQ)
• Configured Hadoop clusters on Amazon cloud
• Recovery of Hadoop clusters from NameNode or DataNode failures
• End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines
against very large data sets
• Kerberos implementation on Hadoop clusters
• Configuring HA on Hadoop clusters
• Integrated Splunk server with HDFS
• Installed Hadoop cluster monitoring tools (Pivotal Command Center, Ambari,
Cloudera Manager and Ganglia)
• Cluster health monitoring and fixing performance issues
• Data balancing on clusters and commissioning/decommissioning DataNodes in
existing Hadoop clusters
• Experienced in creating HDFS directory structures and setting access permissions
for groups and users as required for project-specific needs
• Understanding/analyzing specific jobs (projects) and their run processes
• Understanding/analyzing scripts, MapReduce code, and input/output files/data for
operations support
• Built an archiving platform in a Hadoop environment
• Good understanding of Hadoop services and quick problem-resolution skills
• Good experience in ADS, LDAP, DNS, DHCP, IIS, GPO, user administration,
patch maintenance, SSH, SUDO, configuring RPMs through YUM, FTP and NFS
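The HDFS directory and permission setup described above can be sketched as follows. The project path and group name are illustrative assumptions, and the script echoes commands in dry-run mode (the default here) rather than claiming this is the exact production script:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of project-specific HDFS directory/permission
# setup. Paths and the group name are illustrative; DRY_RUN=1 echoes
# the commands instead of executing them (hdfs must be on PATH otherwise).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"          # show what would be executed
    else
        "$@"
    fi
}

PROJECT=/data/projects/rdm   # illustrative project root
GROUP=rdm_users              # illustrative group

run hdfs dfs -mkdir -p "$PROJECT/raw" "$PROJECT/staging" "$PROJECT/curated"
run hdfs dfs -chown -R hdfs:"$GROUP" "$PROJECT"
run hdfs dfs -chmod -R 770 "$PROJECT"   # group read/write, no world access
```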
Technical Skills
Operating Systems: RHEL 5.x/6.x, CentOS 5.x/6.x, Ubuntu, Windows server and client family
Hardware: Dell, IBM and HP
Databases: MySQL, PostgreSQL, MSSQL and Oracle
Tools: Command Center, Check_MK, Ambari, Ganglia, Cloudera Manager and GitLab
Languages: Shell scripting and Core Java
Cloud Computing Framework: AWS
Hadoop Ecosystem: Hadoop, ZooKeeper, Pig, Hive, Sqoop, Flume, Hue and Spark
Certifications:
Microsoft Certified IT Professional (MCITP - 2008)
Microsoft Certified Professional (MCP-2003)
CCNA
Educational Qualifications:
Bachelor of Science (Computer)
St. Joseph’s College, Bharathidasan University - Trichy
Major Assignments:
Project 1
Market Reference Data Management is a pure-play reference data management solution built
using a big data platform and an RDBMS, intended for the securities market industry. As part of
this project, data is collected from different market data vendors such as Reuters and Interactive
Data for different asset classes such as equity, fixed income and derivatives. The entire
solution is built using Pentaho as the ETL tool, and the final tables are stored in Hive.
Different downstream applications access the data from Hive as and when required.
• Responsible for providing an architecture plan to implement the entire RDM solution
• Understanding the securities master data model
• Created an API for getting data from various sources
• Created Hive data models
• Designed ETL jobs with Pentaho for data cleansing, data identification and loading the
data into Hive tables
• Created Hive (HQL) scripts to process the data
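The kind of HQL processing script described above can be sketched like this. The table and column names (`security_master`, `isin`, and so on) are illustrative assumptions, not the project's actual data model:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of generating an HQL script that deduplicates a
# staging feed into a curated Hive table. Table/column names are
# illustrative; on the cluster it would run via: hive -f <file>
cat > load_security_master.hql <<'HQL'
-- Keep only the latest-loaded row per instrument and price date
INSERT OVERWRITE TABLE security_master
SELECT isin, issuer, asset_class, price_date, close_price
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY isin, price_date
                               ORDER BY load_ts DESC) AS rn
  FROM security_master_staging
) t
WHERE rn = 1;
HQL

echo "wrote load_security_master.hql"
```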
(12-2015) to (till date)
Project: Reference Data Management
Domain: Solution Building in Finance Domain
Environment: CDH 5.0
Role: Senior Consultant
Project 2
GE purchases parts from different vendors across the world in all of its business units. There
is no central repository to monitor the vendors across the business units, and purchased parts
were charged on different scales by a vendor, its subsidiaries and other vendors. To identify
purchase price differences and to build a master list of vendors, GE Software COE and
iGATE together designed a data lake on Pivotal Hadoop containing all PO (purchase order)
and invoice data imported from multiple SAP/ERP sources. The data in the data lake is
cleansed, integrated with DNB to build a master list of vendors, and analyzed to identify
anomalous behavior in POs.
Job Responsibilities:
• Monitored the POD and IPS Hadoop clusters; each environment has Sandbox,
Development and Production divisions
• Used the Check_MK tool for monitoring the Hadoop cluster environment
• Provided solutions across the Hadoop ecosystem and EMC tools
• Worked together with EMC support
• Shell scripting for various Hadoop operations
• User creation and quota allocation on the Hadoop cluster and GPDB environment
• Talend support
• Used the GitLab tool
• Provided solutions for performance issues in the Greenplum DB environment
• Brought failed segments back to active state in the GPDB environment
Project 3
The Retail Omni-channel Solution leverages cross-channel analytics (web, mobile and store)
along with Bluetooth LE technology in the store to deliver a superior customer experience.
Targeted messages and personalized promotions are delivered at the right time (the Moment
of Truth) to maximize sales conversions.
Job Responsibilities:
• Created Hive SQL scripts for data processing and data merging from multiple tables
(03-2015) to (09-2015)
Project: Data Lake Support
Client: GE - Software COE
Environment: Pivotal and EMC Tools
Role: Technical Lead
(01-2014) to (06-2014)
Project: Retail Omni-channel Solution
Client: Retail Giant in US
Environment: Cloudera Distribution CDH (4.x)
Role: Technical Lead
• Loading data into HDFS
• Exporting data from HDFS to an RDBMS (MySQL) using Sqoop
• Created a script for IROCS automation
Project 4
The principal motivation for IV3 is to provide a turnkey Big Data platform that abstracts the
complexities of technology implementation and frees up bandwidth to focus on creating
differentiated business value. IV3 is a software-based big data analytics platform designed to
work with enterprise-class Hadoop distributions, providing an open architecture and big-data-
specific software engineering processes. IV3 is power-packed with components and enablers
covering the life cycle of a Big Data implementation, from data ingestion, storage and
transformation to various analytical models. It aims to marshal the three Vs of Big Data
(Volume x Velocity x Variety) to deliver maximum business impact.
Job Responsibilities:
• Implemented data ingestion (RDBMS to HDFS) on the IV3 platform
• Tested IV3 tools on different Hadoop distributions
• Configured the automatic YARN memory calculator on the IV3 platform
• HDFS data encryption/decryption
• Column-based data masking on Hive
• Hive benchmarking
• Sqoop automation for data ingestion into HDFS
• Created an automation script for detecting HDD space issues on Hadoop cluster
environments
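The Sqoop ingestion automation above can be sketched as a shell wrapper over `sqoop import`. The JDBC URL, credentials file, table list and target paths are illustrative assumptions; DRY_RUN=1 (the default here) prints the commands instead of running them:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Sqoop ingestion automation. All connection
# details, table names and paths are illustrative; the sqoop flags
# used (--connect, --table, --target-dir, etc.) are standard.
DRY_RUN="${DRY_RUN:-1}"
JDBC_URL="jdbc:mysql://dbhost:3306/sales"   # illustrative source DB
TABLES="orders customers invoices"          # illustrative table list

ingest() {
    local table="$1"
    local cmd=(sqoop import
        --connect "$JDBC_URL"
        --username etl_user --password-file /user/etl/.dbpass
        --table "$table"
        --target-dir "/data/raw/sales/$table"
        --num-mappers 4)
    if [ "$DRY_RUN" = "1" ]; then
        echo "${cmd[*]}"      # print the command that would run
    else
        "${cmd[@]}"
    fi
}

for t in $TABLES; do
    ingest "$t"
done
```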
Project 5
(06-2014) to (09-2014)
Project: Predictive Fleet Maintenance
Client: Penske
Environment: Cloudera (CDH 4.6) - Hive
Role: Technical Lead
Penske's business requirement is about collecting data from a repository of untapped data
(vehicle diagnostics, maintenance and repair) that can potentially be leveraged to generate
economic value. Penske wants to create a future-ready Big Data platform to efficiently store,
process and analyze the data in consonance with their strategic initiatives. Penske engaged
iGATE to partner with them in this strategic initiative to tap the insights hidden in diagnosis,
maintenance and repair data. iGATE leveraged its state-of-the-art Big Data Engineering lab
to implement the data engineering and data science parts of this project.
(12-2013) to (09-2015)
Project: IV3 (Proprietary Big Data Platform)
Client: iGATE
Environment: CDH, HDP and Pivotal
Role: Technical Lead
Job Responsibilities:
• Understood the project scope, business requirements and current business processes
• Mapped business requirements to use cases
• Implemented use cases using Hive
Project 6
(01-2012) to (11-2013)
Project: WA-Insights and Analytics
Client: Watenmal Group - Global
Environment: Apache Hadoop - Hive, MapReduce, Sqoop
Role: Technology Specialist
WAIA is intended to support all retail business segments involved in the sale of goods and
supporting services. The WAIA retail store integrated data model addresses three major aspects
of a store business: (1) the physical flow and control of merchandise into, through and out of
the store; (2) the selling process, where the products and services offered for sale are
transformed into tender and sales revenue is recognized; and (3) the control and tracking of
tender from the point of sale where it is received through its deposit into a bank or other
depository.
Job Responsibilities:
• Understanding Hadoop's main components and architecture
• Data migration from RDBMS to HDFS using Sqoop
• Understanding the nuances of MapReduce programs and UDFs
• Data merging and optimization in Hive
• Adding new DataNodes to the existing Hadoop cluster
• Safely decommissioning failed DataNodes
• Monitoring the Hadoop cluster with the Ganglia tool
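Safe decommissioning of a DataNode, as listed above, follows a standard sequence: add the host to the HDFS exclude file, then refresh the NameNode. This sketch uses an illustrative exclude-file path and hostnames, and DRY_RUN=1 (the default here) echoes the admin command instead of executing it:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of safely decommissioning a DataNode. The exclude
# file path and hostnames are illustrative; `hdfs dfsadmin -refreshNodes`
# is the standard command that tells the NameNode to re-read the file.
DRY_RUN="${DRY_RUN:-1}"
EXCLUDE_FILE="${EXCLUDE_FILE:-./dfs.exclude}"   # illustrative path

decommission() {
    local host="$1"
    # Add the host to the exclude file once (idempotent)
    grep -qx "$host" "$EXCLUDE_FILE" 2>/dev/null || echo "$host" >> "$EXCLUDE_FILE"
    if [ "$DRY_RUN" = "1" ]; then
        echo "hdfs dfsadmin -refreshNodes"
    else
        hdfs dfsadmin -refreshNodes
    fi
    # The node then shows "Decommission in progress" in the NameNode UI
    # until its blocks are re-replicated; only then should it be stopped.
}

decommission dn04.example.com
```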
Project 7:
(05-2008) to (12-2011)
Project: EyeTprofit and Magniss
Client: SI and Arjun Chemicals
Environment: Windows - SQL Server, .NET Framework and LDAP
Role: Technology Specialist
EyeTprofit and Magniss enable businesses to easily analyze profitability, budget versus actual,
revenue, inventory, cash requirements and more, instantaneously, especially when the
information is spread across multiple applications. EyeTprofit and Magniss form a non-invasive
reporting system that facilitates information from multiple functions being culled out and
presented in an appropriate form to enable informed decisions.
Job Responsibilities:
• LDAP integration with the BI tool
• Administering Active Directory, DHCP and DNS servers
• Managing group policies
• Distributed File System (DFS) management
• Administering FTP and IIS servers
• Patch maintenance
• Administering file shares and disk quotas
• Providing share drive access to users
• Remote support
• Configuring virtual machines
Project 8:
(05-2007) to (04-2008)
Project: TRMS
Client: TTP
Environment: Windows - Storage Server, Router and Layer 3 Switches
Role: Software Engineer
TRMS is a state-of-the-art traffic management system, the first of its kind in India. It helps
regulate and enforce the law with the efficiency, expediency and accuracy of technology.
Job Responsibilities:
• Expertise in handling wireless communication
• Maintaining hand-held computer devices
• Expertise in handling the storage server
• Responsible for designing the network topology at client sites and implementing the
infrastructure with high security and a clear hierarchy
• Handled network-related issues at client sites, mainly debugging issues related to
network peripherals