
Rajeev Kumar, Apache Spark & Scala Developer








zingy.rajeev@gmail.com | +31645503407
Rajeev Kumar
Apache Spark and Scala Developer
Amsterdam, NL

Result-oriented ETL developer with 8+ years of experience in Apache Spark, Scala and Java, with a proven track record in software development using Cloudera Hadoop, Apache Spark, Scala, Java and ETL tools. Proficient in processing structured/unstructured data and deploying Apache Spark to analyse huge datasets, identify patterns and gain valuable insights. Demonstrated capability in project life cycle management of client-server and web applications. Adept at exporting and importing data using Hadoop clusters and applying in-depth knowledge of web servers using Java/J2EE. Highly skilled in data management and capacity planning for end-to-end data management and performance optimisation. Expertise in data analysis, data integration, data modelling and data warehousing, with hands-on experience in various ETL and reporting tools.

• Expertise in Teradata, DB2, SQL Server, SQL, PL/SQL and the Big Data domain
• Automated various processes through UNIX
• Domain expertise: Banking, Financial Services, Insurance and Health Care

PROFESSIONAL EXPERIENCE

Data Engineer | Oct '13 - Present
Infosys Ltd., Amsterdam, NL
Infosys Limited is an Indian multinational corporation that provides business consulting, information technology and outsourcing services.

• Expertise with the tools in the Hadoop ecosystem, including Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie and ZooKeeper.
• Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
• Strong experience in the analysis, design, development, testing and implementation of Business Intelligence solutions using data warehouse/data mart design, ETL, OLAP, BI and client/server applications.
• Designed and developed applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
• Strong experience writing applications using the Scala Swing core libraries.
• Knowledge of MongoDB NoSQL database data modelling, tuning and backup.
• Strong experience in dimensional modelling using star and snowflake schemas, identifying facts and dimensions, and physical and logical data modelling using ERwin and ER/Studio.
• Experience in manipulating/analysing large datasets and finding patterns and insights within structured and unstructured data.
• Used Scala libraries to process XML data stored in HDFS: loaded the data into Spark RDDs, performed in-memory computation and generated output responses.
• Converted MapReduce programs into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
• Developed Spark scripts using the REPL and the Scala Eclipse IDE.
• Used Sqoop to extract data from RDBMS databases.
• Moved log files generated by different data sources into HDFS for further processing through Flume.
• Performed data sanitisation using the Informatica Test Data Management tool and the CA Fast Data Masker tool.
• Expertise in working with relational databases such as Oracle 11g/10g/9i/8x, SQL Server 2008/2005, DB2 8.0/7.0, UDB, MS Access, Teradata, Hadoop and PostgreSQL.
• Strong experience in extracting, transforming and loading (ETL) data from various sources into data warehouses and data marts using Informatica PowerCenter (Repository Manager, Designer, Workflow Manager, Workflow Monitor, Metadata Manager), PowerExchange and PowerConnect as ETL tools on Oracle, DB2 and SQL Server databases.
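As a hedged illustration of the MapReduce-to-Spark conversion noted in the list above, here is a minimal Scala sketch; the HDFS paths and the tab-separated record layout are assumptions for illustration, not details taken from the resume:

    import org.apache.spark.sql.SparkSession

    object LogCountBySource {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("LogCountBySource").getOrCreate()

        // The MapReduce map phase becomes `map`, the shuffle + reduce phase
        // becomes `reduceByKey`, and intermediate data stays in memory.
        val lines = spark.sparkContext.textFile("hdfs:///data/logs/input") // hypothetical path

        val countsBySource = lines
          .map(line => (line.split('\t')(0), 1L)) // assumed: first tab-separated field is the source id
          .reduceByKey(_ + _)

        countsBySource.saveAsTextFile("hdfs:///data/logs/output") // hypothetical path
        spark.stop()
      }
    }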
• Created a tool for on-demand test data generation and test data mining for testing.
• Set up test data by copying data from the production environment to the test environment after data sanitisation.
• Used Jenkins to continuously improve the test automation code.
• Wrote automated load/unload job creation for data sanitisation on mainframe z/OS.
• Experience in all phases of data warehouse development, from requirements gathering through code development, unit testing and documentation.
• Extensive experience writing UNIX shell scripts and automating ETL processes with them.
• Proficient in integrating various data sources, including multiple relational databases (Oracle 11g/10g/9i, MS SQL Server, DB2, Teradata), VSAM files and flat files, into the staging area, ODS, data warehouse and data marts.
• Experience using automation scheduling tools such as Autosys and Control-M.
• Worked extensively with slowly changing dimensions.
• Hands-on experience across all stages of the Software Development Life Cycle (SDLC), including business requirement analysis, data mapping, build, unit testing, systems integration and user acceptance testing.
• Excellent interpersonal and communication skills; experienced in working with senior-level managers, business people and developers across multiple disciplines.

Analytics & Performance Enhancement
• Deployed Scala shell commands to develop Spark scripts in accordance with client requirements
• Enhanced performance of Spark applications by fixing accurate batch interval times and tuning memory
• Utilised Spark's in-memory computing via Scala to perform advanced procedures such as text analytics and processing

Leadership & Team Management
• Working with a team of 7 data scientists and software engineers to develop scalable distributed data solutions via Hadoop
• Coordinated with multiple internal departments to evaluate existing predictive models and enhance efficiency
• Liaised with stakeholders to perform data manipulation, transformation, hypothesis testing and predictive modelling

Data Management & Reporting
• Created Scala/SQL code to extract data from multiple databases and conceptualised ideas for advanced data analytics
• Utilised Spark MLlib libraries to design recommendation engines and implement data processing and retrieval initiatives (see the MLlib sketch after this experience entry)
• Deployed Scala to generate complex client-specific reports, render data analysis and create statistical models

Key Achievements
• Initiated automation and directed performance optimisation to boost traffic by 38% and advertising revenue by 16%
• Selected out of 3000+ employees to receive the Star Performer of the Year Award '16 for extraordinary performance

Sr. Software Engineer | Feb '10 - Oct '13
Syntel Ltd., Pune, IN
Syntel, Inc. is a U.S.-based multinational provider of integrated technology and business services.

• Hands-on experience with the Informatica PowerCenter tool for data integration.
• Supervised data file objects and sizing, monitored database usage/growth, and strategised and executed standby databases.
• Maintained logins, DB yield, table space development, user profiles/indexes, storage parameters, etc.

Key Achievements
• Analysed log files and conducted root cause analysis to diagnose and resolve 150+ issues
• Achieved a 78% reduction in data processing time by designing automated solutions
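A minimal sketch of the kind of Spark MLlib recommendation engine mentioned under Data Management & Reporting above; the ratings schema, column names and input path are illustrative assumptions:

    import org.apache.spark.ml.recommendation.ALS
    import org.apache.spark.sql.SparkSession

    object RecommenderSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("RecommenderSketch").getOrCreate()

        // Hypothetical ratings data with integer userId/itemId and a numeric rating.
        val ratings = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/ratings.csv")

        // Collaborative filtering via alternating least squares (ALS).
        val model = new ALS()
          .setUserCol("userId")
          .setItemCol("itemId")
          .setRatingCol("rating")
          .setRank(10)
          .setMaxIter(10)
          .fit(ratings)

        model.recommendForAllUsers(5).show(false) // top 5 items per user
        spark.stop()
      }
    }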
INTERNSHIPS

Summer Intern | Jan '09 - Jan '10
S Tel, Mumbai, IN
S Tel Private Limited was a GSM-based cellular operator in India.
• Worked as a Web Developer & Mobile Application Intern, developing web pages using scripting languages

EDUCATION

B.Tech - Electronics and Telecommunication | Apr '04 - Jul '08
G H Raisoni College of Engineering, Nagpur, IN
CGPA: 7 / 10

PROJECTS

PROJECT 1: ABN AMRO Bank Risk Data Aggregation project | Amsterdam, '18 - present
Client: ABN AMRO Bank
Brief: The ABN AMRO Bank Risk Data Aggregation project demonstrated the use of big data technologies to implement a complete, end-to-end data warehouse and business intelligence operation, with an Enterprise Data Hub (EDH) as the centralised, unified data source. It re-implements an existing Oracle/DB2-based data warehouse application on the Hadoop enterprise data hub. Apart from the EDH development, the project also demonstrates the replacement of the existing ETL and reporting operations with Hadoop ETL and reporting tools.
Environment: HDFS (for storage), Spark SQL (for transformation), Spark MLlib (for ML), Zeppelin (for visualization)
• Overcame challenges of storing & processing structured/semi-structured data via the Hadoop framework & Apache Spark
• Transferred data into HDFS & deployed Scala to analyse voting patterns across multiple sources and channels
• Converted semi-structured XML data into a structured format to enable further processing using Apache Spark
• Delivered the output into an RDBMS via Sqoop & achieved real-time processing of the website on a Python-based server
• Designed an Oozie workflow to automate the entire process of pulling the data from core tables to build the aggregate tables that help business users in decision making
• Validated the data between source and destination
• Involved in the development of an ingestion framework (ELF) to extract data from SQL Server and ingest the source data into the Hadoop data lake using Sqoop
• Data mining from JSON, XML and CSV files using Spark (see the sketch after Project 2 below)

PROJECT 2: On-Demand Test Environments and Data for Development and QA | Amsterdam, '17-'18
Client: ABN AMRO Bank
Brief: To reduce the bank's spend on test environment infrastructure, ABN AMRO started a project for on-demand creation of test environments using Docker Hub images. With a single click, database instances of DB2, Oracle or MySQL are created in Unix containers. The containers can be started to bring up the services and destroyed on an as-needed basis.
Environment: Windows, Docker Hub, Nexus repository, Docker images
• Pulled the Docker images from Docker Hub
• Started/stopped the Docker containers
• Built a Java Swing application to automate the Docker commands
• Exported the right data to containers for development and testing
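A minimal sketch of the JSON/XML/CSV data mining described in Project 1, assuming hypothetical paths and column names; the XML reader additionally assumes the third-party spark-xml package, since Spark has no built-in XML source:

    import org.apache.spark.sql.SparkSession

    object MultiFormatMining {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("MultiFormatMining").getOrCreate()

        // Built-in readers for JSON and CSV (paths are hypothetical).
        val events   = spark.read.json("hdfs:///raw/events.json")
        val accounts = spark.read.option("header", "true").csv("hdfs:///raw/accounts.csv")

        // XML via the spark-xml package; "record" is an assumed row tag.
        val positions = spark.read
          .format("com.databricks.spark.xml")
          .option("rowTag", "record")
          .load("hdfs:///raw/positions.xml")

        // Register the sources and mine them with plain Spark SQL.
        events.createOrReplaceTempView("events")
        accounts.createOrReplaceTempView("accounts")
        positions.createOrReplaceTempView("positions")

        spark.sql(
          """SELECT a.accountId, COUNT(*) AS eventCount
            |FROM accounts a JOIN events e ON a.accountId = e.accountId
            |GROUP BY a.accountId""".stripMargin).show()

        spark.stop()
      }
    }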
PROJECT 3: Teradata migration | Amsterdam, '17-'18
Client: ABN AMRO Bank
Brief: As part of this project, reports built in Teradata were migrated from Teradata to Hadoop, and a few heavy tables were moved into Hive. This included the migration of Teradata procedures to Hive queries, implementation and validation of the same on the Hadoop platform, and warranty support for the deployed reports.
Environment: HDFS (for storage), Spark SQL (for transformation), Sqoop, Mainframe, Oracle, DB2, Teradata
• Overcame challenges of storing & processing structured data via the Cloudera Hadoop framework & Apache Spark
• Understood the existing stored procedures in Teradata and re-engineered them in Hive and Oozie (a sketch follows after this project)
• Transferred data into HDFS & deployed Scala for consistency checks across DataFrames
• Involved in the development of an ingestion framework (ELF) to extract data from Teradata and ingest the source data into the Hadoop MiniBank using Sqoop
• Designed an Oozie workflow to automate the entire process of pulling the data from core tables
• Validated the data between source and destination

PROJECT 4: Data Mart Creation | RBS UK, '15-'16
Client: Royal Bank of Scotland
Brief: Designed a system to replay the real-time data mart creation of transactions in various up- and downstream systems.
Environment: Mainframe, Teradata, Informatica PowerCenter, Talend Data Integration tool
• Understood the business requirements based on functional specifications to design the ETL methodology in technical specifications
• Developed data conversion/quality/cleansing rules and executed data cleansing activities such as data consolidation, standardisation and matching with Trillium for the unstructured flat file data
• Responsible for development, support and maintenance of the ETL (Extract, Transform and Load) processes using Informatica PowerCenter 8.5
• Experience in integrating heterogeneous data sources like Oracle, DB2, SQL Server and flat files (fixed & delimited) into the staging area
• Designed and developed mappings using Source Qualifier, Expression, Lookup, Router, Aggregator, Filter, Sequence Generator, Stored Procedure, Update Strategy, Joiner and Rank transformations
• Managed the metadata associated with the ETL processes used to populate the data warehouse
• Implemented complex business rules in Informatica PowerCenter by creating reusable transformations and robust mapplets
• Worked with the functional team to make sure the required data was extracted and loaded, performed unit testing and fixed errors to meet the requirements
• Copied/exported/imported the mappings/sessions/worklets/workflows from the development to the test repository and promoted them to production
• Used session parameters, mapping variables/parameters, and created parameter files to allow flexible runs of workflows based on changing variable values
• Used the PMCMD command to automate PowerCenter sessions and workflows through UNIX

PROJECT 5: Integrated Data Warehouse (IDW) | Bengaluru, '15
Client: Gander Mountain
Brief: Built a Java Swing application to automate data mining from various data sources over JDBC URLs, based on various user test conditions; created a data pipeline for CI/CD pipeline integration.
Environment: Informatica PowerCenter, Oracle 11g, MySQL, DB2 v10
• Consolidated all data marts into an Integrated Data Warehouse (IDW) to reduce costs & increase efficiency
• Deployed Informatica PowerCenter 10.x as the ETL tool for implementation of the ETL processes in the project
• Overcame challenges w.r.t. multiple brand acquisitions, like silo functioning, data redundancy, multiple data marts, etc.
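A minimal sketch of the Teradata-procedure-to-Hive re-engineering described in Project 3; the schema, table and column names are invented for illustration:

    import org.apache.spark.sql.SparkSession

    object ProcedureToHive {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark SQL run the rewritten procedure logic
        // directly against warehouse tables.
        val spark = SparkSession.builder()
          .appName("ProcedureToHive")
          .enableHiveSupport()
          .getOrCreate()

        // A Teradata procedure that aggregated daily exposures can often be
        // flattened into a single INSERT OVERWRITE over a Hive table.
        spark.sql(
          """INSERT OVERWRITE TABLE risk.daily_exposure
            |SELECT counterparty_id,
            |       load_date,
            |       SUM(exposure_amount) AS total_exposure
            |FROM risk.raw_positions
            |GROUP BY counterparty_id, load_date""".stripMargin)

        spark.stop()
      }
    }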
• Replaced multiple reporting tools, resolved integration & delivery issues & created a common 'enterprise' data model
• Resolved issues w.r.t. data quality & departmental coordination to reduce the cost of maintaining multiple data marts
• Reduced the cost of training multiple teams & deployed a single tool for data analysis & systems integration
• Minimised data replication across multiple data marts & maintained a single true source for extracting data
• Resolved issues pertaining to multiple vendor database, ETL & reporting platforms
• Integrated the acquired business data into heterogeneous platforms to consolidate all the data marts into an IDW
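The source-to-destination validation mentioned in Projects 1 and 3 can be sketched as a row-count plus checksum comparison; the JDBC details, table and column names are assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object SourceTargetValidation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SourceTargetValidation")
          .enableHiveSupport()
          .getOrCreate()

        // Source side: the relational system, read over JDBC (all details hypothetical).
        val source = spark.read.format("jdbc")
          .option("url", "jdbc:db2://dbhost:50000/BANKDB")
          .option("dbtable", "SRC.TRANSACTIONS")
          .option("user", sys.env("DB_USER"))      // credentials supplied via environment
          .option("password", sys.env("DB_PASS"))
          .load()

        // Destination side: the ingested Hive table.
        val target = spark.table("datalake.transactions")

        // Same row count and same key checksum on both sides is a cheap first-pass check.
        def stats(side: String, df: org.apache.spark.sql.DataFrame) =
          df.agg(count(lit(1)).as("rows"), sum(hash(col("txn_id"))).as("checksum"))
            .withColumn("side", lit(side))

        stats("source", source).union(stats("target", target)).show()
        spark.stop()
      }
    }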
PROJECT 6: Android Application Development | Bangalore, '13
Client: R N Technologies
Brief: Built an app on the life and teachings of the great saint Swami Vivekananda.
Environment: Eclipse Android IDE, Android operating system
• Created an Android application based on the life and teachings of Swami Vivekananda

PROJECT 7: TDM AND UAT BATCH EXECUTION | Bangalore, '10-13
Client: HUMANA, USA
Environment: JCL, COBOL, DB2, CICS, FILE-AID, INFOPAC, VSAM, REXX, Easytrieve, INFORMATICA, ILM TDM tool, Java, TOAD
Brief: This project involved the development of new processes, enhancement of existing processes, and operations support using the ILM TDM tool. It covered data sub-setting, data masking and data mining for various Humana health care platforms (e.g. Claims, Authorization, CI-Medicare) based on the release cycle.
• Analysed business requirements and performed integration testing
• Designed and developed functional specifications and detailed designs for new processes/programs
• Performed coding, unit and integration testing, and supported user acceptance testing
• Designed and developed test cases, test scripts and test plans for baseline applications and documented the test results
• Wrote and executed Oracle and DB2 queries to extract data from tables
• Developed various audit reports in COBOL and COBOL-DB2 used by clients to make decisions critical to the smooth migration of legacy data to APS
• Wrote complex SQL to accomplish various business logic
• Executed batch jobs and tested the applications; developed a batch monitoring tool for batch cycle monitoring
• Developed new programs with COBOL, DB2, CICS and MQ Series and maintained the existing programs for volume testing
• Involved in abend solving and bug fixing, and provided technical support for the team
• Designed workflows in Informatica ETL and masked test data using the ILM tool

KEY SKILLS
• Data Processing • Big Data Analytics • Apache Spark Framework • Scala Programming • AWS • Installation, Configuration & Testing • Product Design & Development • Systems Architecture Support • Client Relationship Management • Dutch language (A1 level) • Project Management • Quality Assurance • Research, Reporting & Documentation • Leadership & Team Management • Strategy • Software Development • Service-Oriented Architecture

TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, Hive, Flume, MapReduce, Apache Spark, Oozie, Kafka, AWS
Spark Framework: Spark RDDs, Spark SQL, Spark Streaming, Spark MLlib, Apache Kafka & architecture
ETL Tools: Informatica PowerCenter, Talend Data Integration, CA Test Data Manager, Informatica Test Data Masking, Oracle Developer
Languages: Java, Scala, HTML, C, JavaScript, COBOL, JCL, REXX
Software: Apache, Nginx
Database/Server: Tomcat, MySQL, MongoDB, DB2, Oracle 12c

TRAINING & CERTIFICATIONS
• Apache Spark and Scala Certification Training | Edureka | '18
• Dutch language, A1 level
• Certified DB2 Admin | Java Developer - Professional | '16
