Hadoop / Spark Developer Resume
NJ
SUMMARY
- Professional IT experience in the ingestion, storage, querying, processing, and analysis of Big Data using Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark (Scala), Flume, ZooKeeper, Kafka, and Impala
- Specialized in writing complex scripts and User Defined Functions (UDFs) in Pig and Hive, and custom MapReduce jobs in Java
- Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis
- Proficient in importing and exporting data using Sqoop and coordinating distributed services using ZooKeeper
- Hands-on experience spinning up AWS instances (EC2-Classic and EC2-VPC) using CloudFormation templates
- Strong analytical, quantitative, problem solving and communication skills
TECHNICAL SKILLS
Hadoop/Big Data: Apache Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Flume, ZooKeeper, Impala, Spark, Scala, Ambari, Kafka, YARN, HDFS, Ranger, Hortonworks & Cloudera distributions
NoSQL Databases: HBase, Cassandra, MongoDB
RDBMS: Oracle, MySQL, SQL Server, Teradata, DB2
Languages: C, C++, Objective-C, Java, Scala, R, Python, OpenGL, MIPS, MATLAB, COBOL
Scripting Languages: Unix/Linux Bash shell scripting, Perl, JavaScript
Operating Systems: Windows, UNIX, Linux, Mac OS X and Mainframes
Tools: Tableau, Erwin Data Modeler, Weka, RapidMiner, Orange, Jenkins, Talend, Maven, GitHub, Informatica, Subversion, Excel, NetBeans IDE, Eclipse
TECHNICAL EXPERIENCE
Hadoop / Spark Developer
Confidential, NJ
Technology/Tools: Hadoop (Hortonworks), HDFS, MapReduce, Hive, Sqoop, Kafka, Scala, Spark, HBase, Talend, Oozie, Maven
Responsibilities:
- Installed, configured, and troubleshot Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, ZooKeeper, Kafka, and Impala
- Imported and exported data between various RDBMS/NoSQL databases and HDFS using Sqoop
- Collected and aggregated large amounts of web log data using Apache Kafka and stored the data in HDFS for analysis
- Programmed MapReduce jobs to analyze petabyte-scale datasets on a daily basis and derive data patterns
- Created managed and external Hive tables and implemented static/dynamic partitioning and bucketing (see the Hive DDL sketch after this list)
- Developed complex queries and User Defined Functions to extend the core functionality of Pig and Hive for data analysis
- Implemented a streaming process using Spark to pull data from an external REST API (a receiver sketch follows this list)
- Performed advanced procedures such as text analytics and processing using Spark's in-memory computing in Scala
- Migrated complex MapReduce programs to Apache Spark RDD transformations (see the RDD sketch after this list)
- Used Talend for connecting, cleansing and sharing cloud and on-premises data
- Scheduled workflows using Oozie for MapReduce jobs and Pig and Hive queries, and managed cluster coordination using ZooKeeper
- Migrated entire data centers to the cloud using VPC, EC2, S3, EMR, RDS, Splice Machine, and DynamoDB services
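A minimal sketch of the Hive table pattern referenced above (external vs. managed tables, dynamic partitioning, bucketing), expressed as HiveQL run through Spark with Hive support. All table names, columns, and HDFS paths are illustrative placeholders, not the client's actual schema:

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSetup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveTableSetup")
      .enableHiveSupport() // route DDL/DML through the Hive metastore
      .getOrCreate()

    // External table over raw data already landed in HDFS; dropping the
    // table leaves the underlying files in place.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
        |  user_id BIGINT, event_type STRING, event_ts STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        |LOCATION '/data/raw/events'""".stripMargin)

    // Managed table, partitioned by date and bucketed on the join key so
    // downstream joins can run map-side.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS events (
        |  user_id BIGINT, event_type STRING)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Dynamic partitioning: Hive derives the target partition per row.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE events PARTITION (event_date)
        |SELECT user_id, event_type, substr(event_ts, 1, 10) AS event_date
        |FROM raw_events""".stripMargin)

    spark.stop()
  }
}
```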
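The streaming pull from the external REST API could take the shape of a custom Spark Streaming receiver like the following sketch; the endpoint URL, polling interval, and output path are assumptions for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.receiver.Receiver
import scala.io.Source

// Custom receiver that polls a REST endpoint and hands each response
// body to Spark Streaming as a record.
class RestReceiver(url: String, pollMs: Long)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  override def onStart(): Unit = {
    // Poll on a background thread so onStart() returns immediately.
    new Thread("rest-poller") {
      override def run(): Unit = {
        while (!isStopped()) {
          val src = Source.fromURL(url) // simple HTTP GET
          try store(src.mkString) finally src.close()
          Thread.sleep(pollMs)
        }
      }
    }.start()
  }

  override def onStop(): Unit = () // the polling loop exits via isStopped()
}

object RestStreamJob {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("RestStreamJob"), Seconds(30))
    val stream = ssc.receiverStream(
      new RestReceiver("https://api.example.com/events", pollMs = 5000L))

    // Persist each non-empty batch to HDFS under a timestamped directory.
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"/data/stream/events/${System.currentTimeMillis()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```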
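And the MapReduce-to-Spark migration pattern, sketched as a simple key aggregation over RDDs (input paths and field layout are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogAggregation"))

    // Mapper equivalent: parse each line and emit (key, 1).
    val counts = sc.textFile("hdfs:///data/raw/weblogs")
      .map(_.split("\t"))
      .filter(_.length > 2)             // drop malformed records
      .map(fields => (fields(2), 1L))   // key on, e.g., the URL field
      .reduceByKey(_ + _)               // reducer equivalent

    counts.saveAsTextFile("hdfs:///data/derived/url_counts")
    sc.stop()
  }
}
```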
Big Data Research Assistant
Confidential
Technology/Tools: Hadoop, HDFS, MapReduce, Hive, Sqoop, Zookeeper, Spark, Scala, HBase, Python, Shell Scripting, Oozie
Responsibilities:
- Implemented data summarization, segmentation, clustering, and predictive analysis using Apache Spark for research on "West Nile Virus Surveillance" to determine correlations between weather and mosquitoes infected with the Zika virus (a clustering sketch follows this list)
- Installed and configured Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark, and ZooKeeper
- Developed MapReduce jobs for consolidating data from various sources and deriving data patterns
- Imported and exported data between HDFS and HBase/Hive using Sqoop
- Performed data cleansing and resolved integrity problems using Pig, and used the Spark API over Hadoop to analyze data in Hive
- Scheduled workflows using Oozie for MapReduce jobs and Pig and Hive queries, and managed cluster coordination using ZooKeeper
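A minimal sketch of the clustering step mentioned above, using Spark MLlib's KMeans over a Hive table of weather observations; the table name and feature columns are hypothetical stand-ins for the research dataset:

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object SurveillanceClustering {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SurveillanceClustering")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table with one row per trap site and week.
    val weather = spark.table("weather_observations")

    // Assemble numeric weather columns into a single feature vector.
    val features = new VectorAssembler()
      .setInputCols(Array("avg_temp", "precipitation", "humidity"))
      .setOutputCol("features")
      .transform(weather)

    // Segment sites into k clusters of similar weather conditions, to be
    // correlated with infected-mosquito trap counts downstream.
    val model = new KMeans().setK(5).setSeed(42L).fit(features)
    model.transform(features)  // adds a "prediction" (cluster id) column
      .drop("features")        // drop the vector column before persisting
      .write.mode("overwrite").saveAsTable("weather_clusters")

    spark.stop()
  }
}
```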
Hadoop / Spark Developer
Confidential
Technology/Tools: Hadoop (Hortonworks/Cloudera), HDFS, MapReduce, Hive, Sqoop, HBase, Spark, Scala, Kafka, Oozie
Responsibilities:
- Developed and tested complex MapReduce jobs for aggregating identified and validated data
- Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data
- Developed equivalent Spark Scala code for existing SAS code to extract summary insights from Hive tables
- Designed and executed Spark SQL queries on Hive data within the Spark context and optimized their performance (see the Spark SQL sketch after this list)
- Integrated Amazon Redshift with Spark using Scala (a JDBC-based sketch also follows this list)
- Implemented partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries
- Designed and implemented Pig UDFs for evaluating, filtering, loading and storing data
- Imported data from Amazon S3 into Hive using Sqoop and Kafka, and maintained multi-node dev and test Kafka clusters
- Imported data from MySQL and MongoDB to HDFS and HBase using Sqoop
- Extracted data from agent nodes into HDFS using Python scripts and executed UNIX shell commands via the Python subprocess module
- Developed applications following Scrum and Agile methodologies
- Executed hundreds of Sqoop queries, Pig scripts, and Hive queries using Oozie workflows and sub-workflows
- Built Hadoop clusters on multiple EC2 instances and used Amazon Simple Storage Service (S3) for storing and accessing data from Hadoop clusters
- Performed Hadoop updates, patches and version upgrades as required
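A sketch of the Spark SQL work above, showing two of the listed optimizations: partition pruning via a filter on the partition column, and a broadcast (map-side) join against a small dimension table. Table and column names are illustrative, not the client's actual schema:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object ClaimsSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClaimsSummary")
      .enableHiveSupport()
      .getOrCreate()

    // Filtering on the partition column lets Hive prune whole partitions.
    val claims = spark.table("claims").where("claim_date >= '2017-01-01'")

    // Broadcasting the small dimension table yields a map-side join,
    // avoiding a shuffle of the large fact table.
    val members = broadcast(spark.table("members"))

    claims.join(members, "member_id")
      .groupBy("plan_type")
      .count()
      .write.mode("overwrite").saveAsTable("claims_by_plan")

    spark.stop()
  }
}
```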
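The Redshift integration can be as simple as a JDBC read into a DataFrame; the cluster endpoint, table, and credentials below are placeholders, and the Amazon Redshift JDBC driver is assumed to be on the classpath:

```scala
import org.apache.spark.sql.SparkSession

object RedshiftRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("RedshiftRead").getOrCreate()

    // Read a Redshift table over JDBC; all connection details are
    // illustrative placeholders.
    val sales = spark.read
      .format("jdbc")
      .option("url",
        "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev")
      .option("dbtable", "public.sales")
      .option("user", "spark_user")
      .option("password", sys.env("REDSHIFT_PASSWORD"))
      .option("driver", "com.amazon.redshift.jdbc42.Driver")
      .load()

    sales.show(10)
    spark.stop()
  }
}
```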
Java Developer
Confidential
Technology/Tools: Java, Eclipse, JSP, jQuery, JavaScript, HTML, CSS, Spring, SOAP, XML, Maven, Jenkins, Struts MVC, WebLogic
Responsibilities:
- Designed a high-performance, scalable, enterprise-grade Java application for United Health Group
- Involved in the full life cycle of software design process including prototyping, design, interface implementation, testing and maintenance
- Designed the screens of applications using HTML, JSP, JavaScript and CSS
- Developed dynamic and browser-compatible pages using HTML5, DHTML, CSS3, jQuery, and JavaScript
- Used the Spring Validation framework to implement server-side validations and used AngularJS to fetch data from the server asynchronously via JSON objects
- Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring
- Implemented cross-cutting concerns such as logging and declarative transaction management using Spring AOP (a sketch follows this list)
- Created JUnit test cases for unit testing and developed generic JavaScript functions for validations
- Optimized SQL queries to improve the loading times of web pages
- Performed reviews for Code, Design and Technical Specifications
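A sketch of the cross-cutting logging concern described above. The project itself was written in Java; Scala is used here purely for consistency with the other sketches, and the pointcut package is illustrative:

```scala
import org.aspectj.lang.ProceedingJoinPoint
import org.aspectj.lang.annotation.{Around, Aspect}
import org.slf4j.LoggerFactory
import org.springframework.stereotype.Component

@Aspect
@Component
class LoggingAspect {
  private val log = LoggerFactory.getLogger(classOf[LoggingAspect])

  // Wrap every service-layer method with entry/exit logging and timing.
  @Around("execution(* com.example.service..*(..))") // pointcut is illustrative
  def logTiming(jp: ProceedingJoinPoint): AnyRef = {
    val start = System.currentTimeMillis()
    log.info(s"Entering ${jp.getSignature.toShortString}")
    try jp.proceed() // run the intercepted method and return its result
    finally log.info(
      s"Exiting ${jp.getSignature.toShortString} after ${System.currentTimeMillis() - start} ms")
  }
}
```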