Sachin Aggarwal

184 followers

I work at IBM-ISL in the Analytics group, where I design and develop solutions for problems involving huge amounts of data. I am currently working on Spark and related technologies to build a next-generation analytics platform. I am a result-oriented engineer with 3 years of experience building products using Java and Big Data technologies such as Spark, Scala, Hadoop, Pig, Hive, HBase, Impala, Oozie and Apache Solr.

Software Skills
Big Data Technologies: Spark, Scala, Hadoop, MapReduce, YARN, HDFS, Solr, Hive, Impala, Pig, Shark, CDH, Oozie, HBase, Phoenix, Zookeeper
Programming Languages: C, C++, Core Java
Middleware Technologies: Java, Spring Framework, JAXB, hibe...

apache spark, spark, hadoop, data analytics, big data, big data analytics, scala, data science, mapreduce, data mining, machine learning, generating physical plan, catalyst optimizer, plan optimization & execution, rdd recap, comparison with pig and hive, pipeline, dataframes operations, architecture of spark sql, extensions, data cleansing, dataframes, spark sql, library, diagram for logical plan, container, definition of a dataframes api, code generation, catalyst analyzer, dataframes features, big data university, streaming, streaming applications, twitter, opensource, spark streaming, fault tolerance, architecture, apache spark introduction, resilient distributed dataset, rdd basics, rdd deep dive, rdd




Presentations

Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive

Apache Spark Streaming: Architecture and Fault Tolerance

Interactive Analytics using Apache Spark

Comparison of various streaming technologies

Data Science with Spark by Saeed Aghabozorgi

Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer

Likes

Migrating to Spark 2.0 - Part 2

Migrating to Spark 2.0

Running Zeppelin in Enterprise

Introduction to Kubernetes

Getting Started with Alluxio + Spark + S3

Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer

Taking Spark Streaming to the Next Level with Datasets and DataFrames

Comparison of various streaming technologies

Interactive Analytics using Apache Spark

Apache Spark Streaming: Architecture and Fault Tolerance

Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive

kafka

Tuning and Debugging in Apache Spark

Hive tuning