
Hadoop / Spark Developer Resume


NJ

SUMMARY

  • Professional IT experience in the ingestion, storage, querying, processing, and analysis of Big Data using Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Scala, Flume, Zookeeper, Kafka & Impala
  • Specialized in writing complex scripts and User Defined Functions in Pig & Hive, and custom MapReduce jobs in Java
  • Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis
  • Proficient in importing and exporting data with Sqoop and coordinating cluster services with Zookeeper
  • Hands-on experience spinning up AWS EC2 instances, both EC2-Classic and EC2-VPC, using CloudFormation templates
  • Strong analytical, quantitative, problem solving and communication skills

TECHNICAL SKILLS

Hadoop/Big Data: Apache Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Flume, Zookeeper, Impala, Spark, Scala, Ambari, Kafka, YARN, HDFS, Ranger, Hortonworks & Cloudera distributions

NoSQL Databases: HBase, Cassandra, MongoDB

RDBMS: Oracle, MySQL, SQL Server, Teradata, DB2

Languages: C, C++, Objective-C, Java, Scala, R, Python, OpenGL, MIPS assembly, MATLAB, COBOL

Scripting Languages: Unix shell, Perl, JavaScript, Bash

Operating Systems: Windows, UNIX, Linux, Mac OS X and Mainframes

Tools: Tableau, Erwin Data Modeler, Weka, RapidMiner, Orange, Jenkins, Talend, Maven, GitHub, Informatica, Subversion, Excel, NetBeans IDE, Eclipse

TECHNICAL EXPERIENCE

Hadoop / Spark Developer

Confidential, NJ

Technology/Tools: Hadoop (Hortonworks), HDFS, MapReduce, Hive, Sqoop, Kafka, Scala, Spark, HBase, Talend, Oozie, Maven

Responsibilities:

  • Installed, configured, and troubleshot Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, Zookeeper, Kafka & Impala
  • Imported and exported data between HDFS and various RDBMS and NoSQL databases using Sqoop
  • Collected and aggregated large volumes of web log data using Apache Kafka and stored it in HDFS for analysis (see the streaming ingestion sketch after this list)
  • Programmed MapReduce jobs to analyze petabyte-scale datasets on a daily basis and derive data patterns
  • Created managed and external Hive tables and implemented static/dynamic partitioning and bucketing (a DDL sketch follows this list)
  • Developed complex queries and User Defined Functions to extend the core functionality of Pig & Hive for data analysis
  • Implemented a streaming process in Spark to pull data from an external REST API (sketched after this list)
  • Performed advanced procedures such as text analytics and processing using Spark's in-memory computing with Scala
  • Migrated complex MapReduce programs to Apache Spark RDD transformations (see the word-count example after this list)
  • Used Talend for connecting, cleansing and sharing cloud and on-premises data
  • Scheduled MapReduce jobs and Pig & Hive queries with Oozie workflows, and coordinated cluster services using Zookeeper
  • Migrated entire data centers to AWS using VPC, EC2, S3, EMR, RDS, Splice Machine and DynamoDB services
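
One way to express the Kafka-to-HDFS collection step above is a Spark Structured Streaming job in Scala. The broker addresses, topic name, and HDFS paths below are hypothetical placeholders, and the job assumes the spark-sql-kafka connector is on the classpath:

    import org.apache.spark.sql.SparkSession

    object WebLogIngest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("WebLogIngest").getOrCreate()

        // Subscribe to the (hypothetical) web-log topic on Kafka.
        val logs = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "weblogs")
          .load()

        // Kafka delivers the payload as bytes; cast it to text and land
        // it on HDFS as Parquet for downstream analysis.
        logs.selectExpr("CAST(value AS STRING) AS line")
          .writeStream
          .format("parquet")
          .option("path", "hdfs:///data/weblogs")
          .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
          .start()
          .awaitTermination()
      }
    }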
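
The partitioned and bucketed Hive layout mentioned above might look like the following, issued through a Hive-enabled SparkSession. The table, columns, paths, and the staging_web_events source are illustrative, not the actual schema:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("HiveDdl")
      .enableHiveSupport()
      .getOrCreate()

    // External table: data lives at a fixed HDFS location, so dropping
    // the table leaves the underlying files intact.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS web_events (
        user_id BIGINT,
        url     STRING,
        ts      TIMESTAMP
      )
      PARTITIONED BY (event_date STRING)      -- static or dynamic partitions
      CLUSTERED BY (user_id) INTO 32 BUCKETS  -- bucketing for joins/sampling
      STORED AS ORC
      LOCATION 'hdfs:///warehouse/web_events'
    """)

    // Dynamic partitioning: Hive routes each row by its event_date value.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT INTO TABLE web_events PARTITION (event_date)
      SELECT user_id, url, ts, CAST(to_date(ts) AS STRING) AS event_date
      FROM staging_web_events
    """)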
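
The REST API pull above could be as simple as polling that appends each JSON batch to HDFS. The endpoint and output path below are invented for illustration, the API is assumed to return a JSON array of records, and a production job would loop on an interval or use a custom streaming source rather than this single poll:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("RestPull").getOrCreate()
    import spark.implicits._

    // Fetch one batch of records from the (hypothetical) endpoint.
    val payload = scala.io.Source.fromURL("https://api.example.com/v1/events").mkString

    // Parse the JSON text into a DataFrame and append it to HDFS.
    val batch = spark.read.json(Seq(payload).toDS())
    batch.write.mode("append").parquet("hdfs:///data/api_events")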
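
To give a flavor of the MapReduce-to-RDD migrations above, the classic word count collapses a mapper and a reducer into a short chain of RDD transformations (paths are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("WordCount").getOrCreate()
    val sc = spark.sparkContext

    val counts = sc.textFile("hdfs:///data/input")
      .flatMap(_.split("\\s+"))  // map phase: one token per word
      .map(word => (word, 1))    // map phase: emit (word, 1) pairs
      .reduceByKey(_ + _)        // reduce phase: sum counts per key

    counts.saveAsTextFile("hdfs:///data/wordcounts")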

Big Data Research Assistant

Confidential

Technology/Tools: Hadoop, HDFS, MapReduce, Hive, Sqoop, Zookeeper, Spark, Scala, HBase, Python, Shell Scripting, Oozie

Responsibilities:

  • Implemented data summarization, segmentation, clustering, and predictive analysis using Apache Spark for research on "West Nile Virus Surveillance", determining correlations between weather conditions and virus-infected mosquitoes (a clustering sketch follows this list)
  • Installed and configured Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark and Zookeeper
  • Developed MapReduce jobs to consolidate data from various sources and derive data patterns
  • Imported and exported data between HDFS, HBase, and Hive using Sqoop
  • Performed data cleansing and resolved integrity problems using Pig, and used the Spark API over Hadoop to analyze data in Hive
  • Scheduled MapReduce jobs and Pig & Hive queries with Oozie workflows, and coordinated cluster services using Zookeeper
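
A minimal sketch of the clustering step above, using Spark's ML KMeans; the feature columns (temperature, precipitation, trap counts) are stand-ins for whatever the weather and mosquito-trap dataset actually carried:

    import org.apache.spark.ml.clustering.KMeans
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("WnvClustering").getOrCreate()

    // Hypothetical weather/trap observations already landed on HDFS.
    val obs = spark.read.parquet("hdfs:///research/wnv/observations")

    // KMeans expects a single vector column; assemble the numeric fields.
    val features = new VectorAssembler()
      .setInputCols(Array("temperature", "precipitation", "trap_count"))
      .setOutputCol("features")
      .transform(obs)

    // Partition observations into k clusters and inspect the centers.
    val model = new KMeans().setK(4).setSeed(42L).fit(features)
    model.clusterCenters.foreach(println)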

Hadoop / Spark Developer

Confidential

Technology/Tools: Hadoop (Hortonworks/Cloudera), HDFS, MapReduce, Hive, Sqoop, HBase, Spark, Scala, Kafka, Oozie

Responsibilities:

  • Developed and tested complex MapReduce jobs for aggregating identified and validated data
  • Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data
  • Developed equivalent Spark Scala code for existing SAS code to extract summary insights from Hive tables
  • Designed and executed Spark SQL queries on Hive data within the Spark context and ensured performance optimization (see the Hive-enabled session sketch after this list)
  • Integrated Amazon Redshift with Spark using Scala (a JDBC read sketch also follows)
  • Implemented partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries
  • Designed and implemented Pig UDFs for evaluating, filtering, loading and storing data
  • Imported data from Amazon S3 to Hive using Sqoop & Kafka and maintained multi-node Dev and Test Kafka clusters
  • Imported data from MySQL and MongoDB to HDFS and HBase using Sqoop
  • Extracted data from agent nodes into HDFS using Python scripts and ran UNIX shell commands via Python's subprocess module
  • Developed applications using Scrum and Agile methodologies
  • Executed hundreds of Sqoop queries, Pig scripts, and Hive queries using Oozie workflows and sub-workflows
  • Built Hadoop clusters on multiple EC2 instances and used Amazon Simple Storage Service (S3) to store and access data from the Hadoop clusters
  • Performed Hadoop updates, patches and version upgrades as required
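
Running SQL against existing Hive tables from Spark, as in the SAS-to-Scala rewrite above, mostly comes down to a Hive-enabled session. The warehouse.claims table and its columns are invented for illustration:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("HiveSummary")
      .enableHiveSupport()  // resolve tables through the Hive metastore
      .getOrCreate()

    // A summary-style aggregate over a (hypothetical) Hive table.
    val summary = spark.sql("""
      SELECT region,
             COUNT(*)    AS claims,
             AVG(amount) AS avg_amount
      FROM   warehouse.claims
      GROUP  BY region
    """)
    summary.show()

    // Caching helps when several summaries hit the same hot table.
    spark.catalog.cacheTable("warehouse.claims")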
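
One straightforward way to integrate Redshift with Spark is through Redshift's JDBC endpoint (the dedicated spark-redshift connector, which stages data through S3, is the higher-throughput alternative). The cluster URL, credentials, and table below are placeholders, and the Redshift JDBC driver is assumed to be on the classpath:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("RedshiftRead").getOrCreate()

    // Read a Redshift table over JDBC; adequate for modest volumes.
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:redshift://example-cluster.redshift.amazonaws.com:5439/dev")
      .option("dbtable", "public.orders")
      .option("user", "spark_user")
      .option("password", sys.env("REDSHIFT_PASSWORD"))  // from environment, not hard-coded
      .option("driver", "com.amazon.redshift.jdbc42.Driver")
      .load()

    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT COUNT(*) AS n FROM orders").show()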

Java Developer

Confidential

Technology/Tools: Java, Eclipse, JSP, jQuery, JavaScript, HTML, CSS, Spring, SOAP, XML, Maven, Jenkins, Struts MVC, WebLogic

Responsibilities:

  • Designed a high-performing, scalable, enterprise-grade Java application for United Health Group
  • Involved in the full life cycle of the software design process, including prototyping, design, interface implementation, testing and maintenance
  • Designed application screens using HTML, JSP, JavaScript and CSS
  • Developed dynamic, browser-compatible pages using HTML5, DHTML, CSS3, jQuery and JavaScript
  • Used the Spring Validation framework to implement server-side validations and used AngularJS to retrieve data from the server asynchronously via JSON objects
  • Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring
  • Implemented cross-cutting concerns such as logging and declarative transaction management using Spring AOP
  • Created JUnit test cases for unit testing and developed generic JavaScript functions for validations
  • Optimized SQL queries to improve web page loading times
  • Performed code, design, and technical specification reviews
