Hadoop / Spark Developer Resume
NJ
SUMMARY
- Professional IT experience in the ingestion, storage, querying, processing, and analysis of Big Data using Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark (Scala), Flume, ZooKeeper, Kafka, and Impala
- Specialized in writing complex scripts and User Defined Functions (UDFs) in Pig and Hive, and custom MapReduce jobs in Java
- Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis
- Proficient in importing and exporting data using Sqoop and coordinating distributed services using ZooKeeper
- Hands-on experience spinning up AWS instances (EC2-Classic and EC2-VPC) using CloudFormation templates
- Strong analytical, quantitative, problem solving and communication skills
TECHNICAL SKILLS
Hadoop/Big Data: Apache Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Flume, ZooKeeper, Impala, Spark, Scala, Ambari, Kafka, YARN, HDFS, Ranger, Hortonworks & Cloudera distributions
NoSQL Databases: HBase, Cassandra, MongoDB
RDBMS: Oracle, MySQL, SQL Server, Teradata, DB2
Languages: C, C++, Objective-C, Java, Scala, R, Python, OpenGL, MIPS, MATLAB, COBOL
Scripting Languages: Unix/Linux Bash shell scripting, Perl, JavaScript
Operating Systems: Windows, UNIX, Linux, Mac OS X and Mainframes
Tools: Tableau, Erwin Data Modeler, Weka, RapidMiner, Orange, Jenkins, Talend, Maven, GitHub, Informatica, Subversion, Excel, NetBeans IDE, Eclipse
TECHNICAL EXPERIENCE
Hadoop / Spark Developer
Confidential, NJ
Technology/Tools: Hadoop (Hortonworks), HDFS, MapReduce, Hive, Sqoop, Kafka, Scala, Spark, HBase, Talend, Oozie, Maven
Responsibilities:
- Installed, configured, and troubleshot Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, ZooKeeper, Kafka, and Impala
- Imported and exported data between various RDBMS/NoSQL databases and HDFS using Sqoop
- Collected and aggregated large amounts of web log data using Apache Kafka and stored the data in HDFS for analysis
- Programmed MapReduce jobs to analyze petabyte-scale datasets on a daily basis and derive data patterns
- Created managed and external Hive tables and implemented static/dynamic partitioning and bucketing (see the Hive DDL sketch after this list)
- Developed complex queries and User Defined Functions to extend the core functionality of Pig and Hive for data analysis
- Implemented a streaming process using Spark to pull data from an external REST API (a receiver sketch follows this list)
- Performed advanced procedures such as text analytics and processing using Spark's in-memory computing in Scala
- Migrated complex MapReduce programs to Apache Spark RDD transformations (see the RDD sketch after this list)
- Used Talend for connecting, cleansing and sharing cloud and on-premises data
- Scheduled workflows using Oozie for MapReduce jobs and Pig and Hive queries, and managed cluster coordination using ZooKeeper
- Migrated entire data centers to the cloud using VPC, EC2, S3, EMR, RDS, Splice Machine, and DynamoDB services
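A minimal sketch of the Hive table pattern referenced above (external vs. managed tables, dynamic partitioning, bucketing), expressed as HiveQL run through Spark with Hive support. All table names, columns, and HDFS paths are illustrative placeholders, not the client's actual schema:

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveTableSetup")
      .enableHiveSupport() // route DDL/DML through the Hive metastore
      .getOrCreate()

    // External table over raw data already landed in HDFS; dropping the
    // table leaves the underlying files in place.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
        |  user_id BIGINT, event_type STRING, event_ts STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        |LOCATION '/data/raw/events'""".stripMargin)

    // Managed table, partitioned by date and bucketed on the join key so
    // downstream joins can run map-side.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS events (
        |  user_id BIGINT, event_type STRING)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Dynamic partitioning: Hive derives the target partition per row.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE events PARTITION (event_date)
        |SELECT user_id, event_type, substr(event_ts, 1, 10) AS event_date
        |FROM raw_events""".stripMargin)

    spark.stop()
  }
}
```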
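The streaming pull from the external REST API could take the shape of a custom Spark Streaming receiver like the following sketch; the endpoint URL, polling interval, and output path are assumptions for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.receiver.Receiver
import scala.io.Source

// Custom receiver that polls a REST endpoint and hands each response
// body to Spark Streaming as a record.
class RestReceiver(url: String, pollMs: Long)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    // Poll on a background thread so onStart() returns immediately.
    new Thread("rest-poller") {
      override def run(): Unit = {
        while (!isStopped()) {
          val src = Source.fromURL(url) // simple HTTP GET
          try store(src.mkString) finally src.close()
          Thread.sleep(pollMs)
        }
      }
    }.start()
  }

  override def onStop(): Unit = () // the polling loop exits via isStopped()
}

object RestStreamJob {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("RestStreamJob"), Seconds(30))
    val stream = ssc.receiverStream(
      new RestReceiver("https://api.example.com/events", pollMs = 5000L))

    // Persist each non-empty batch to HDFS under a timestamped directory.
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"/data/stream/events/${System.currentTimeMillis()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```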
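And the MapReduce-to-Spark migration pattern, sketched as a simple key aggregation over RDDs (input paths and field layout are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogAggregation"))

    // Mapper equivalent: parse each line and emit (key, 1).
    val counts = sc.textFile("hdfs:///data/raw/weblogs")
      .map(_.split("\t"))
      .filter(_.length > 2)             // drop malformed records
      .map(fields => (fields(2), 1L))   // key on, e.g., the URL field
      .reduceByKey(_ + _)               // reducer equivalent

    counts.saveAsTextFile("hdfs:///data/derived/url_counts")
    sc.stop()
  }
}
```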
Big Data Research Assistant
Confidential
Technology/Tools: Hadoop, HDFS, MapReduce, Hive, Sqoop, Zookeeper, Spark, Scala, HBase, Python, Shell Scripting, Oozie
Responsibilities:
- Implemented data summarization, segmentation, clustering, and predictive analysis using Apache Spark for research on "West Nile Virus Surveillance" to determine correlations between weather and mosquitoes infected with the Zika virus (a clustering sketch follows this list)
- Installed and configured Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark, and ZooKeeper
- Developed MapReduce jobs for consolidating data from various sources and deriving data patterns
- Imported and exported data between HDFS and HBase/Hive using Sqoop
- Performed data cleansing and resolved integrity problems using Pig, and used the Spark API over Hadoop to analyze data in Hive
- Scheduled workflows using Oozie for MapReduce jobs and Pig and Hive queries, and managed cluster coordination using ZooKeeper
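A minimal sketch of the clustering step mentioned above, using Spark MLlib's KMeans over a Hive table of weather observations; the table name and feature columns are hypothetical stand-ins for the research dataset:

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object SurveillanceClustering {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SurveillanceClustering")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table with one row per trap site and week.
    val weather = spark.table("weather_observations")

    // Assemble numeric weather columns into a single feature vector.
    val features = new VectorAssembler()
      .setInputCols(Array("avg_temp", "precipitation", "humidity"))
      .setOutputCol("features")
      .transform(weather)

    // Segment sites into k clusters of similar weather conditions, to be
    // correlated with infected-mosquito trap counts downstream.
    val model = new KMeans().setK(5).setSeed(42L).fit(features)
    model.transform(features)  // adds a "prediction" (cluster id) column
      .drop("features")        // drop the vector column before persisting
      .write.mode("overwrite").saveAsTable("weather_clusters")

    spark.stop()
  }
}
```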
Hadoop / Spark Developer
Confidential
Technology/Tools: Hadoop (Hortonworks/Cloudera), HDFS, MapReduce, Hive, Sqoop, HBase, Spark, Scala, Kafka, Oozie
Responsibilities:
- Developed and tested complex MapReduce jobs for aggregating identified and validated data
- Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data
- Developed equivalent Spark Scala code for existing SAS code to extract summary insights from Hive tables
- Designed and executed Spark SQL queries on Hive data within the Spark context and optimized their performance (see the Spark SQL sketch after this list)
- Integrated Amazon Redshift with Spark using Scala (a JDBC-based sketch also follows this list)
- Implemented partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries
- Designed and implemented Pig UDFs for evaluating, filtering, loading and storing data
- Imported data from Amazon S3 into Hive using Sqoop and Kafka, and maintained multi-node dev and test Kafka clusters
- Imported data from MySQL and MongoDB to HDFS and HBase using Sqoop
- Extracted data from agent nodes into HDFS using Python scripts and executed UNIX shell commands via the Python subprocess module
- Developed applications following Scrum and Agile methodologies
- Executed hundreds of Sqoop queries, Pig scripts, and Hive queries using Oozie workflows and sub-workflows
- Built Hadoop clusters on multiple EC2 instances and used Amazon Simple Storage Service (S3) for storing and accessing data from Hadoop clusters
- Performed Hadoop updates, patches and version upgrades as required
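A sketch of the Spark SQL work above, showing two of the listed optimizations: partition pruning via a filter on the partition column, and a broadcast (map-side) join against a small dimension table. Table and column names are illustrative, not the client's actual schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object ClaimsSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClaimsSummary")
      .enableHiveSupport()
      .getOrCreate()

    // Filtering on the partition column lets Hive prune whole partitions.
    val claims = spark.table("claims").where("claim_date >= '2017-01-01'")

    // Broadcasting the small dimension table yields a map-side join,
    // avoiding a shuffle of the large fact table.
    val members = broadcast(spark.table("members"))

    claims.join(members, "member_id")
      .groupBy("plan_type")
      .count()
      .write.mode("overwrite").saveAsTable("claims_by_plan")

    spark.stop()
  }
}
```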
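The Redshift integration can be as simple as a JDBC read into a DataFrame; the cluster endpoint, table, and credentials below are placeholders, and the Amazon Redshift JDBC driver is assumed to be on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object RedshiftRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("RedshiftRead").getOrCreate()

    // Read a Redshift table over JDBC; all connection details are
    // illustrative placeholders.
    val sales = spark.read
      .format("jdbc")
      .option("url",
        "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev")
      .option("dbtable", "public.sales")
      .option("user", "spark_user")
      .option("password", sys.env("REDSHIFT_PASSWORD"))
      .option("driver", "com.amazon.redshift.jdbc42.Driver")
      .load()

    sales.show(10)
    spark.stop()
  }
}
```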
Java Developer
Confidential
Technology/Tools: Java, Eclipse, JSP, jQuery, JavaScript, HTML, CSS, Spring, SOAP, XML, Maven, Jenkins, Struts MVC, WebLogic
Responsibilities:
- Designed a high-performance, scalable, enterprise-grade Java application for United Health Group
- Involved in the full life cycle of software design process including prototyping, design, interface implementation, testing and maintenance
- Designed the screens of applications using HTML, JSP, JavaScript and CSS
- Developed dynamic and browser-compatible pages using HTML5, DHTML, CSS3, jQuery, and JavaScript
- Used the Spring Validation framework to implement server-side validations and used AngularJS to fetch data from the server asynchronously via JSON objects
- Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring
- Implemented cross-cutting concerns such as logging and declarative transaction management using Spring AOP (a sketch follows this list)
- Created JUnit test cases for unit testing and developed generic JavaScript functions for validations
- Optimized SQL queries to improve the loading times of web pages
- Performed reviews for Code, Design and Technical Specifications
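A sketch of the cross-cutting logging concern described above. The project itself was written in Java; Scala is used here purely for consistency with the other sketches, and the pointcut package is illustrative:

```scala
import org.aspectj.lang.ProceedingJoinPoint
import org.aspectj.lang.annotation.{Around, Aspect}
import org.slf4j.LoggerFactory
import org.springframework.stereotype.Component

@Aspect
@Component
class LoggingAspect {
  private val log = LoggerFactory.getLogger(classOf[LoggingAspect])

  // Wrap every service-layer method with entry/exit logging and timing.
  @Around("execution(* com.example.service..*(..))") // pointcut is illustrative
  def logTiming(jp: ProceedingJoinPoint): AnyRef = {
    val start = System.currentTimeMillis()
    log.info(s"Entering ${jp.getSignature.toShortString}")
    try jp.proceed() // run the intercepted method and return its result
    finally log.info(
      s"Exiting ${jp.getSignature.toShortString} after ${System.currentTimeMillis() - start} ms")
  }
}
```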