- Around 5 years of professional IT experience in Big Data/Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Kafka, Impala and Flume for data storage and analysis
- Experience in installing, configuring, managing, supporting and monitoring Hadoop clusters using various distributions such as Apache Hadoop and Cloudera
- Strong experience across the complete project life cycle (design, development and testing) and with AWS services such as EC2 and S3
- Specialized in developing complex MapReduce jobs in Java and user-defined functions (UDFs) in Pig and Hive
- Experience in project planning, setting implementation standards and designing Hadoop-based applications
- Experience with job/workflow scheduling and coordination tools such as Oozie and ZooKeeper
- Proficient in fixing production issues and providing error-free solutions
- Good communication and teamwork skills, with strong analytical and problem-solving abilities
Big Data Ecosystem: HDFS, YARN, HBase, MapReduce, Hive, Pig, Sqoop, Oozie, ZooKeeper, Flume, Spark & Kafka.
Programming Languages: C, C++, Shell Scripting, Scala, Java, SQL, Python.
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans.
Version Control: Git, SVN.
Databases: Oracle 10g/9i/8i, MySQL, SQL Server, Teradata, DB2, Informix.
Web Technologies: HTML, XML, jQuery, PHP, CSS.
IDE Tools: MyEclipse, Eclipse, IntelliJ IDEA, NetBeans, WSAD.
NoSQL Databases: HBase, Cassandra, MongoDB.
Operating Systems: Windows variants, UNIX, LINUX.
Other Tools: SQL Developer, Maven, JUnit.
Confidential, Bridgewater, NJ
Hadoop / Spark Developer
- Designing, implementing and maintaining Spark applications to drive quality and consistency within the design and development phases, and analyzing the scope of the project
- Identifying the production and non-production application issues
- Handling data coming from different data sources and loading it from various file systems and databases into HDFS using Sqoop
- Transforming large sets of structured, semi-structured and unstructured data using Hive and Pig based on business requirements
- Created Hive internal and external tables and implemented partitioning and bucketing
- Developed Pig Latin and HiveQL scripts for data analysis and ETL, and extended their functionality by developing complex Pig and Hive user-defined functions (UDFs) in Java
- Developed Hive queries that invoke and run MapReduce jobs in the backend
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager
- Developed complex MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables
- Managed and scheduled jobs on the Hadoop cluster using Oozie workflows and the Oozie Coordinator engine
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Collected large volumes of data from different sources, aggregated the data using Apache Kafka and stored it in HDFS for analysis
- Supported formal testing, resolved test defects and executed unit test cases to ensure software quality
- Prepared deliverables for client review and approval
- Setting up Kerberos principals and testing HDFS, Hive, Pig, and MapReduce access for the new users
- Controlling the version of the project related components and documents and conducting knowledge sharing sessions and weekly status report meetings within the project
- Coordinated with onsite/offshore team members on a daily basis
Technology/Tools: Hadoop - Cloudera (CDH3/4), HDFS, MapReduce, Hive, Pig, Sqoop, Kafka, Scala, Spark, HBase, Talend, Oozie, Maven, Java, SQL, Oracle, UNIX, SQL Developer, PuTTY
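The MapReduce bullets above describe parsing raw data and aggregating it into partitioned staging tables. A minimal Hadoop Streaming-style sketch of that map/aggregate pattern, in stdlib-only Python; the comma-delimited (date, region, amount) record layout is a hypothetical example, not the project's actual schema:

```python
from collections import defaultdict

def mapper(lines):
    """Parse raw comma-delimited records and emit ((date, region), amount) pairs.
    The three-field layout here is a hypothetical example; malformed
    records are skipped, as a defensive MapReduce parser would."""
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 3:
            continue  # skip malformed records
        date, region, amount = fields
        yield (date, region), float(amount)

def reducer(pairs):
    """Sum amounts per (date, region) key, mirroring how refined data
    would be aggregated before loading into a partitioned table."""
    totals = defaultdict(float)
    for key, amount in pairs:
        totals[key] += amount
    return dict(totals)

raw = ["2016-01-01,NJ,10.5", "2016-01-01,NJ,4.5", "bad-record", "2016-01-02,NY,7.0"]
result = reducer(mapper(raw))
```

In a real job, Hadoop performs the shuffle between the map and reduce phases; this sketch collapses that into a single in-memory aggregation to show only the parsing and aggregation logic.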
Confidential, Waltham, MA
Hadoop / Spark Developer
- Worked on Big Data Hadoop cluster implementation and data integration in developing large-scale system software
- Developed MapReduce jobs to convert data files into Parquet format and used MRUnit to test the correctness of the MapReduce programs
- Designed and architected the batch implementation on the Hadoop environment
- Created Job Streams/Jobs in Talend Administration Center (TAC) to run the Hadoop jobs
- Worked on Data loading into Hive for Data Ingestion history and Data content summary
- Worked on Spark SQL; created DataFrames by loading data from Hive tables, prepared data and stored it in AWS S3
- Programmed MapReduce jobs to analyze petabyte-scale data sets on a daily basis and derive data patterns
- Imported and exported terabytes of data using Sqoop between relational database systems and HDFS
- Optimized MapReduce codes, Pig scripts, Hive queries and involved in performance tuning and analysis
- Used Spark for series of dependent jobs and for iterative algorithms.
- Developed a data pipeline using Kafka and Spark Streaming to store data into HDFS
- Used Apache Flume for streaming data from various sources into HDFS
- Implemented a streaming process using Spark to pull data from an external REST API
- Used Kafka for website activity tracking, stream processing and auto-scaling backend servers based on event throughput
- Extensively worked on Oozie and UNIX scripts for batch processing and scheduling workflows dynamically
- Troubleshot, debugged and resolved Talend issues while maintaining the health and performance of the ETL environment
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior
- Used Hive to partition and bucket data
- Converted the existing relational database model to the Hadoop ecosystem
Technology/Tools: Hadoop - Hortonworks, HDFS, MapReduce, Hive, Sqoop, Kafka, Scala, Spark, HBase, Talend, Oozie, Maven, Data Processing Layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, UNIX, MySQL, RDBMS, Ambari, Cron
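One bullet above covers converting data files to Parquet. The core benefit of Parquet's columnar layout can be illustrated with a stdlib-only sketch; real Parquet also adds per-column encoding, compression and metadata, and the (user_id, event, ts) schema here is a hypothetical example:

```python
def rows_to_columns(rows, schema):
    """Pivot row-oriented records into a column-oriented layout,
    the core idea behind Parquet: queries that touch a few columns
    can scan just those columns instead of whole rows."""
    return {name: [row[i] for row in rows] for i, name in enumerate(schema)}

schema = ["user_id", "event", "ts"]  # hypothetical schema
rows = [
    (1, "click", 1000),
    (2, "view", 1001),
    (1, "view", 1002),
]
columns = rows_to_columns(rows, schema)

# a scan now touches only one field's values:
clicks = sum(1 for e in columns["event"] if e == "click")
```

This is why analytical Hive queries over Parquet are cheaper than over row-oriented text files: the engine reads only the columns the query references.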
Confidential, Woonsocket, RI
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
- Created Kafka topics and partitions and wrote custom partitioner classes
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS
- Imported and exported data using Sqoop between HDFS and relational database systems
- Integrating bulk data into Cassandra using MapReduce programs
- Analyzed the security requirements for Hadoop and integrated with the Kerberos authentication and authorization infrastructure
- Processed data into HDFS, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems
- Involved in the POC implementation of migrating MapReduce programs to Spark transformations using Spark and Scala
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data
- Optimized Hive queries by tuning combinations of Hive parameters and developed user-defined functions (UDFs) to extend the core functionality of Pig and Hive as required
- Extracted data from Teradata into HDFS using Sqoop
- Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari
- Exported the result set from Hive to MySQL using Shell scripts
- Involved in writing custom Pig loaders and storage classes to work with a variety of data formats such as JSON, compressed CSV, etc.
- Documenting the procedures performed for the project development
Technology/Tools: Hadoop - Hortonworks/Cloudera, HDFS, MapReduce, Hive, Sqoop, Spark, HBase, Pig, Flume, Scala, Kafka, Oozie, ZooKeeper, Teradata, Windows 7, SSH Tectia client, Remedy and Jira ticketing tools
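A bullet above mentions writing custom Kafka partitioner classes. The essential logic, mapping a record key deterministically to a partition so that all records for a key preserve ordering, can be sketched in stdlib-only Python (Kafka's default partitioner uses murmur2; `crc32` here is a stand-in so the sketch needs no external library):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index, mimicking the contract
    of a Kafka custom partitioner: the same key always lands on the
    same partition, which preserves per-key ordering. crc32 is a
    stand-in hash; Kafka's default partitioner uses murmur2."""
    return zlib.crc32(key) % num_partitions

# the same key always maps to the same partition:
p1 = partition_for(b"customer-42", 6)
p2 = partition_for(b"customer-42", 6)
```

A real custom partitioner would implement Kafka's `Partitioner` interface in Java/Scala; the only project-specific part is the key-to-partition function shown here.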
- Involved in complete Big Data flow of the application data ingestion from upstream to HDFS, processing the data in HDFS and analyzing the data using several tools.
- Imported data in various formats such as JSON, Sequence, text, CSV, Avro and Parquet into the HDFS cluster with compression for optimization.
- Extracted data from agent nodes into HDFS using Python scripts and ran UNIX shell commands via the Python subprocess module.
- Ingested data from RDBMS sources like - Oracle, SQL Server and Teradata into HDFS using Sqoop.
- Imported data from Amazon S3 to HIVE using Sqoop & Kafka and maintained multi-node Dev and Test Kafka Clusters
- Configured Hive, wrote Hive UDFs and UDAFs, and created static and dynamic partitions with bucketing.
- Integrated Amazon Redshift with Spark using Scala.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Imported and exported data between HDFS and Hive using Sqoop and Kafka in both batch and streaming modes.
- Used Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that receives data from Kafka in near real time and persists it into HBase.
- Implemented performance analysis of Spark streaming and batch jobs by using Spark tuning parameters.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
- Designed and created analytical reports and automated dashboards to help users identify critical KPIs and facilitate strategic planning in the organization.
- Involved in Cluster maintenance, Cluster Monitoring and Troubleshooting.
- Built an automated build and deployment framework using GitHub and Maven.
- Created Data Pipelines as per the business requirements and scheduled it using Oozie Coordinators.
Technology/Tools: Scala, Hadoop, HDFS, Hive, Oozie, Sqoop, NiFi, Spark, Kafka, Elasticsearch, Shell Scripting, HBase, Python, GitHub, Tableau, Oracle, MySQL, Teradata and AWS
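The Hive UDF/UDAF work above can alternatively be done with a streaming TRANSFORM script: Hive streams tab-delimited rows to the script on stdin and reads transformed rows back from stdout. A minimal per-row sketch in Python; the two-column (user_id, email) layout and the normalization rule are hypothetical examples:

```python
def clean_record(line: str) -> str:
    """Normalize one tab-separated record the way a Hive TRANSFORM
    script would. Hive pipes each row to the script as tab-delimited
    text; the (user_id, email) layout here is a hypothetical example."""
    user_id, email = line.rstrip("\n").split("\t")
    return f"{user_id}\t{email.strip().lower()}"

# In production, Hive would invoke the script via
#   SELECT TRANSFORM(user_id, email) USING 'clean.py' ...
# streaming rows through stdin/stdout; here we call it directly:
cleaned = clean_record("7\t Foo@Example.COM ")
```

Whether to use a TRANSFORM script or a compiled Java UDF is mostly a packaging choice; the per-row logic stays the same.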