KARANJEET SINGH
2656 Ellendale Place, Apt #2, Los Angeles, CA 90007 | (213) 675-9583 | karanjes@usc.edu
GitHub: https://github.com/karanjeets
EDUCATION
M.S. Computer Science, University of Southern California, USA GPA: 3.63, Expected May 2017
BTech Information Technology, Punjab Technical University, India May 2011
TECHNICAL SKILLS
Programming Languages: Java, J2EE, Scala, Python
Data Engineering: (Apache) [Spark, Solr, Tika, Kafka], (Cloudera) [HDFS, MapReduce]
Natural Language Processing: Models (Naive Bayes, MEM, HMM), Chart Parsing, BLEU (MT)
Additional Knowledge: Linux (Certified), SQL Server, Spring-MVC, Hibernate, JAX-WS, Jersey, D3.js
OPEN SOURCE CONTRIBUTION
Recommender System for Social Services NGO Jan 2017 – Present
§ Leading a team of 5 and coordinating with the organization to define goals and manage expectations.
§ Created a Collaborative Filtering algorithm with 50% accuracy over 6,000 referrals; now improving it.
Sparkler [https://github.com/USCDataScience/sparkler] July 2016 – Present
§ Incubated an efficient and flexible web crawler that scales horizontally on Apache Spark.
§ Developed a fair crawl pipeline that outputs to Solr via a customized RDD, providing near real-time analytics.
§ Added support for an OSGi-based plugin system using Maven and Apache Felix.
Distributed Release Audit Tool (DRAT) Sep 2015 – Feb 2016
§ A distributed, parallelized wrapper around Apache RAT to audit for open source software licenses.
§ Scaled DRAT to run on all Apache projects using the Wrangler supercomputer, which has clusters of Hadoop nodes.
PROFESSIONAL EXPERIENCE
Research Intern, NASA Jet Propulsion Laboratory, USA June 2016 – Present
§ Developing an automated system that helps users explore and better understand a domain on the Web.
§ The system is capable of discovering 42% of human annotated documents (Ground Truth).
§ Created Spark jobs to extract information from 16 million documents in under 20 hours using a 2-node cluster.
Research Assistant, University of Southern California, USA Dec 2015 – May 2016
§ Developed dark web search applications for DARPA Memex using Java, Python, and Hadoop.
§ Indexed, visualized, and analyzed a pilot weapons dataset to unearth illegal activities in the US.
§ Used Apache Nutch and Solr for crawling and indexing the data respectively.
Software Engineer, Computer Sciences Corporation, India July 2011 – June 2015
§ Developed web applications for a U.S. payments technology firm using Spring and Hibernate.
§ Analyzed, coordinated, and estimated functional requirements, and prepared technical design documents.
§ Worked on JAX-WS to securely communicate with other internal applications.
§ Designed and developed the architecture for securely generating reports containing PII data.
§ Worked on SQL procedures to efficiently scan through millions of records.
COURSE PROJECTS
Political Inclination of Spanish Twitter Users (NLP) Mar 2016 – Apr 2016
§ Researched how to quantify the political inclination of Spanish tweets on Twitter.
§ Developed a text classification algorithm with 96% accuracy and introduced Hashtag Hijack Detection.
Crawling, Deduplication, Indexing, and Visualization of Weapons Dataset (IR) Sep 2015 – Nov 2015
§ Crawled weapon images and relevant content using Nutch, Tika, and Selenium.
§ Wrote deduplication algorithms for both exact and near duplicates to remove similar content and images.
§ Performed content extraction using the GeoTopic parser, TikaOCR, and NER.
§ Developed content- and link-based algorithms to derive relationships within the data, and indexed it in Solr.
§ Visualized data using Data Driven Documents (D3.js), FacetView, and Banana.
AWARDS AND ACHIEVEMENTS
§ Speaker at Spark Summit East 2017 for project Sparkler. Feb 2017
§ Speaker at ApacheCon Big Data Europe 2016 for project Sparkler. Nov 2016
§ Podcast interview by the supercomputing group at the University of Texas, Austin. Aug 2016
§ Speaker at ApacheCon North America 2016 for project DRAT. May 2016
§ Committer and Project Management Committee member of Apache Nutch. May 2016
§ USC Dean’s Master’s Fellowship Award, Recipient. Aug 2015
§ CSC India White Paper Contest, Among top 10 National Finalists. Nov 2012
§ CSC India Java Artifacts Contest, First position for the artifact “Barcode Over the Web (BOW)”. May 2012
§ Microsoft Imagine Cup India, Among top 4 National Finalists. May 2009
§ Dell Social Innovation Challenge at the University of Texas, Austin, Semi-Finalist. May 2009