Personal Information
Company/Workplace
Bengaluru Area, India
Profession
Senior Software Developer at IBM Analytics
Industry
Technology / Software / Internet
About
I work at IBM-ISL in the Analytics group, where I design and develop solutions for problems involving large volumes of data. I am currently working on Spark and related technologies to build a next-generation analytics platform. I am a results-oriented engineer with 3 years of experience building products using Java and Big Data technologies such as Spark, Scala, Hadoop, Pig, Hive, HBase, Impala, Oozie, and Apache Solr.
Software Skills
Big Data Technologies : Spark, Scala, Hadoop, Map-Reduce, YARN, HDFS, Solr, Hive, Impala, Pig, Shark, CDH, Oozie, HBase, Phoenix, Zookeeper
Programming Languages : C, C++, Core Java
Middleware Technologies : Java, Spring Framework, JAXB, hibe...
Keywords
apache spark
spark
hadoop
data analytics
big data
big data analytics
scala
data science
mapreduce
data mining
machine learning
generating physical plan
catalyst optimizer
plan optimization & execution
rdd recap
comparison with pig and hive pipeline
dataframes operations
architecture of spark sql
extensions
data cleansing
dataframes
spark sql library
diagram for logical plan container
definition of a dataframes api
code generation
catalyst analyzer
dataframes features
big data university
streaming
streaming applications
twitter
opensource
spark streaming
fault tolerance
architecture
apache spark introduction
resilient distributed dataset
rdd basics
rdd deep dive
rdd
Presentations (6)
Likes (14)
Migrating to Spark 2.0 - Part 2
datamantra • 6 years ago
Migrating to spark 2.0
datamantra • 6 years ago
Running Zeppelin in Enterprise
DataWorks Summit • 6 years ago
Introduction to Kubernetes
rajdeep • 9 years ago
Getting Started with Alluxio + Spark + S3
Alluxio, Inc. • 7 years ago
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Sachin Aggarwal • 7 years ago
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Databricks • 8 years ago
Comparison of various streaming technologies
Sachin Aggarwal • 8 years ago
Interactive Analytics using Apache Spark
Sachin Aggarwal • 8 years ago
Apache Spark Streaming: Architecture and Fault Tolerance
Sachin Aggarwal • 8 years ago
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Sachin Aggarwal • 8 years ago
kafka
Ariel Moskovich • 8 years ago
Tuning and Debugging in Apache Spark
Patrick Wendell • 9 years ago
Hive tuning
Michael Zhang • 10 years ago