Sr. Hadoop, Spark Developer Resume
Menomonee, WI
SUMMARY:
- IT Professional with 8+ years of extensive experience in all phases of Software Development Life Cycle (SDLC) with expertise in data analysis, design, development, testing and deployment of software systems.
- 4+ years of hands-on experience with Big Data and Apache Hadoop ecosystem components such as MapReduce, HDFS, Sqoop, Flume, Spark, Spark Streaming, Pig, Hive, HBase, Oozie, Kafka, and ZooKeeper.
- Experience with different Hadoop distributions, including Cloudera (CDH4 and CDH5) and Hortonworks.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Resource Manager, and MapReduce.
- Expertise in installing, configuring, supporting, and managing Hadoop clusters and the underlying Big Data infrastructure.
- Experience with cloud computing infrastructure such as AWS EC2.
- Experience in partitioning, bucketing, join optimization, and query optimization in Hive, including automating Hive queries with dynamic partitioning.
- Good understanding of and experience with NoSQL databases such as Cassandra and MongoDB.
- Hands-on experience with Spark, Spark Streaming, and Scala, including creating and working with DataFrames in Spark using Scala.
- Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters across distributions and platforms such as Apache Spark, Cloudera, and the AWS service console.
- Experience in optimizing MapReduce jobs using combiners and partitioners to deliver efficient results.
- Proficient in ETL tools for data warehouse design, business intelligence, analytics, data mining, data mapping, data conversion, and data migration/transformation from source to target systems.
- Experienced in using Kafka as a distributed publisher-subscriber messaging system.
- Experience in continuous integration, automated deployment, and release management using Jenkins and UDeploy.
- Diverse experience working with a variety of databases such as Oracle, MySQL, IBM DB2, and Netezza.
- Experienced in using IDEs and Tools like Eclipse, NetBeans, GitHub, Jenkins, Maven and IntelliJ.
- Extensive experience in creating Tableau dashboards using stacked bars, bar graphs, and geographical maps.
- Good knowledge on various scripting languages like Linux/Unix shell scripting and Python.
- Experience in developing web applications using Core Java and J2EE technologies such as HTML, JSP, jQuery, JDBC, Hibernate, and custom tags, implementing client-side validations with JavaScript and server-side validations with the Struts and Spring validation frameworks.
- Versatile team player with excellent communication, project management, documentation, interpersonal skills with ability to adapt to rapidly changing environment and quickly learn new technologies.
TECHNICAL SKILLS:
Big Data: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, Spark, Spark Streaming, Storm, Kafka, YARN, Zookeeper, HBase, Impala, Cassandra
Hadoop Distributions: Cloudera, Hortonworks and MapR
Databases: SQL Server, MySQL, Oracle, Netezza
Languages: Java, C, HTML, Scala, SQL, PL/SQL, UNIX Shell Script, Python
JEE Technologies: JSP, JDBC
Frameworks: Struts (MVC), Spring, Hibernate
Build Tools: SBT, Maven and Gradle
IDEs: Eclipse, IntelliJ, NetBeans
CI Tools: Hudson/Jenkins
Cloud Solutions: AWS EMR, S3
Version Control/Configuration: Git, SVN, CVS
Defects Triage: JIRA and Bugzilla
Operating Systems: Windows, UNIX, Linux, Ubuntu, CentOS
Packages: MS Office Suite, MS Visio, MS Project Professional
File Formats: Avro, JSON, Parquet, Sequence, XML, CSV
Reporting Tools: Tableau
PROFESSIONAL EXPERIENCE:
Confidential, Menomonee, WI
Sr. Hadoop, Spark Developer
Roles & Responsibilities:
- Coordinated with business customers to gather business requirements, interacted with other technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Worked on analyzing a Hadoop 2.7.2 cluster and different Big Data analytic tools, including Pig 0.16.0, Hive 2.0, the HBase 1.1.2 database, and Sqoop 1.4.6.
- Implemented Spark 2.0 using Scala 2.11.8 and Spark SQL for faster processing of data (a DataFrame sketch follows this role's environment line).
- Implemented algorithms for real-time analysis in Spark.
- Used Spark for interactive queries, streaming data processing, and data integration.
- Installed and configured Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, Oozie, ZooKeeper, HBase, Flume, and Sqoop.
- Implemented multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Worked in a team supporting a 30-node cluster and expanded the cluster by adding nodes; the additional data nodes were configured through the Hadoop commissioning process.
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Used Impala for querying HDFS data.
- Developed and implemented two service endpoints (end to end) in Java using the Play framework, Akka, and Hazelcast.
- Used AWS services such as EC2 and S3 for small data sets.
- Ingested data from RDBMS sources, performed data transformations, and then exported the results to Cassandra.
- Developed Pig UDFs to pre-process the data for analysis.
- Used Apache Kafka to collect data from producers, which in turn push the data to brokers.
- Wrote robust, reusable Hive scripts and UDFs in Java.
- Worked on test-driven development (TDD) and acceptance testing using Behave.
- Implemented partitioning and bucketing in Hive for better organization of the data.
- Designed and built unit tests and executed operational queries on HBase.
- Built Apache Avro schemas for publishing messages to topics and enabled the relevant serialization formats for message publishing and consumption.
- Connected Tableau from the client end to AWS IP addresses and viewed the end results.
- Used Spark API over Hadoop YARN to perform analytics on data in Hive.
- Implemented a script to transmit information from Oracle to HBase using Sqoop.
- Installed Hadoop (MapReduce and HDFS) and developed multiple MapReduce jobs along with Pig and Hive scripts for data cleaning and pre-processing.
- Wrote MapReduce programs to convert text files into Avro format and load them into Hive tables.
- Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components such as Pig, Hive, and HBase.
- Performed real-time analysis on incoming data using Pig, Hive, and MapReduce.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded data into HBase using both bulk and non-bulk loads.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation, queries, and writing data back into the OLTP system directly or through Sqoop.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Developed a data pipeline using Kafka to store data in HDFS (a streaming sketch follows this role's environment line).
- Worked on migrating MapReduce programs to Spark transformations.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Involved in identifying and analyzing defects, questionable function errors, and inconsistencies in output.
Environment: Hadoop, HDFS, MapReduce, YARN, Spark, Pig, Hive, Sqoop, Flume, Kafka, HBase, Oozie, Scala, Java, SQL scripting, Linux shell scripting, Eclipse, AWS, Avro, Oracle, UNIX, Tableau
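For illustration, a minimal sketch of the Spark SQL/DataFrame style of work described above, written for Spark 2.x with Scala. The database, table, column names, and output path are hypothetical placeholders, not project specifics.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SalesRollup {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session, as used when Spark runs over YARN alongside Hive
    val spark = SparkSession.builder()
      .appName("SalesRollup")
      .enableHiveSupport()
      .getOrCreate()

    // Read an existing Hive table into a DataFrame (table name is hypothetical)
    val sales = spark.table("warehouse.sales")

    // Aggregate with the DataFrame API
    val byRegion = sales
      .groupBy("region")
      .agg(sum("amount").as("total_amount"), count("*").as("txn_count"))

    // Write the result back to HDFS as Parquet, partitioned by region
    byRegion.write
      .mode("overwrite")
      .partitionBy("region")
      .parquet("/data/output/sales_by_region")

    spark.stop()
  }
}
```

The same aggregation could equally be expressed through spark.sql(...) against the registered Hive table, which is the Spark SQL path referenced in the bullets above.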
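The Kafka-to-HDFS pipeline bullet could be realized in several ways; the sketch below assumes the DStream-based spark-streaming-kafka-0-10 integration, with the broker list, topic name, consumer group, and HDFS prefix as placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Consumer settings; broker list and group id are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "hdfs-ingest",
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from the (hypothetical) "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Persist each micro-batch of message values under a timestamped HDFS prefix
    stream.map(record => record.value)
      .saveAsTextFiles("hdfs:///data/raw/events/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Checkpointing and exactly-once offset management are omitted here for brevity.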
Confidential, Atlanta, GA
Sr. Hadoop Developer
Roles & Responsibilities:
- Worked extensively on Hadoop Components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN and Map Reduce programming.
- Developed Map-Reduce programs to clean and aggregate the data.
- Responsible for building scalable distributed data solutions using Hadoop and Spark.
- Worked hands-on with the ETL process using Java.
- Implemented Hive Ad-hoc queries to handle Member data from different data sources such as Epic and Centricity.
- Implemented Hive UDFs and did performance tuning for better results (a UDF sketch follows this role's environment line).
- Analyzed the data by performing Hive queries and running Pig Scripts.
- Involved in loading data from the UNIX file system to HDFS.
- Implemented optimized map joins to get data from different sources to perform cleaning operations before applying the algorithms.
- Used Sqoop to import and export data between the Netezza and Oracle databases and HDFS/Hive.
- Implemented a POC to introduce Spark transformations.
- Worked with the NoSQL databases HBase and MongoDB to create tables and store data.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, streamed data using Flume, and loaded it into HDFS.
- Worked on transforming data from MapReduce into HBase as bulk operations.
- Implemented CRUD operations on HBase data using the Thrift API to get real-time insights (a client-API sketch follows this role's environment line).
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, Impala, ZooKeeper, and Pig jobs, which run independently based on time and data availability.
- Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster for generating reports on nightly, weekly and monthly basis.
- Used Zookeeper to manage Hadoop clusters and Oozie to schedule job workflows.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in data ingestion into HDFS using Apache Sqoop from a variety of sources, using connectors such as JDBC and import parameters.
- Coordinated with Hadoop admins during deployments to production.
- Developed Pig Latin scripts to extract data from log files and store it in HDFS. Created User Defined Functions (UDFs) to pre-process data for analysis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Participated in design and implementation discussions for developing the Cloudera 5 Hadoop ecosystem.
- Used JIRA and Confluence to update tasks and maintain documentation.
- Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
- Created final reports of the analyzed data using Apache Hue and the Hive browser and generated graphs for study by the data analytics team.
- Used Sqoop to export the analyzed data to a relational database for analysis by the data analytics team.
Environment: Hadoop, Cloudera Hadoop, Map Reduce, Hive, Pig, Sqoop, Flume, HBase, Java, JSON, Spark, HDFS, YARN, Oozie Scheduler, Zookeeper, Mahout, Linux, UNIX, ETL, MySQL
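The Hive UDF bullet above does not include source, so the sketch below is a generic example, written in Scala to match the other sketches here (the original UDFs were in Java). The class name and behavior (normalizing a member identifier) are illustrative only.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Simple Hive UDF: normalizes a member identifier by trimming whitespace
// and upper-casing it. Hive locates the evaluate() method by reflection.
class NormalizeMemberId extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

After packaging the class into a jar, the function would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION statements (jar path and function name hypothetical).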
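The CRUD bullet above references the Thrift API; purely for illustration, the sketch below exercises the same create/read/delete operations through the standard HBase 1.x client API from Scala instead. Table, row key, column family, and qualifier names are placeholders.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object MemberHBaseCrud {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()            // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("members"))   // hypothetical table

    try {
      // Create / update a row
      val put = new Put(Bytes.toBytes("member-001"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"))
      table.put(put)

      // Read the row back
      val result = table.get(new Get(Bytes.toBytes("member-001")))
      val name = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("name")))
      println(s"member-001 name = $name")

      // Delete the row
      table.delete(new Delete(Bytes.toBytes("member-001")))
    } finally {
      table.close()
      connection.close()
    }
  }
}
```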
Confidential, Birmingham, AL
Sr. Hadoop Developer
Roles & Responsibilities:
- Designed, installed, and configured Flume, Hive, Pig, and Oozie on the Hadoop cluster.
- Designed workflow by scheduling Hive processes for Log file data, which is streamed into HDFS using Flume.
- Effectively used Sqoop to transfer data between databases and HDFS.
- Imported data from open data sources into Amazon S3 and other private clusters.
- Developed scripts to automate the creation of Sqoop Jobs for various workflows.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- Developed many queries in HiveQL and extracted the information required by the business.
- Developed scripts to automate the creation of Hive tables and partitions.
- Developed MapReduce programs to extract and transform the data sets and results were exported back to RDBMS using Sqoop.
- Developed MapReduce jobs for analyzing the data stored in HDFS by performing map-side and reduce-side joins (a map-side join sketch follows this role's environment line).
- Involved in implementing High Availability and automatic failover infrastructure to overcome the single point of failure for the NameNode.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Designed and developed the framework to log information for auditing and failure recovery.
- Designed and developed ETL workflows using Oozie for business requirements, including automating the extraction of data from a MySQL database into HDFS using Sqoop.
Environment: HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Flume
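One bullet above mentions map-side joins in MapReduce; the sketch below shows one common pattern, written in Scala against the Hadoop MapReduce API: a small lookup file is cached in memory during setup() and joined against each record in map(), so no reduce phase is needed. The file path and record layouts are assumptions.

```scala
import java.io.{BufferedReader, InputStreamReader}
import scala.collection.mutable
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Map-side join: the small lookup table is held in memory per mapper,
// so each fact record is enriched without a shuffle/reduce phase.
class MapSideJoinMapper extends Mapper[LongWritable, Text, Text, Text] {

  private val userById = mutable.Map.empty[String, String]

  override def setup(context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    // Hypothetical small dimension file: userId,userName per line
    val fs = FileSystem.get(context.getConfiguration)
    val reader = new BufferedReader(new InputStreamReader(fs.open(new Path("/data/lookup/users.csv"))))
    var line = reader.readLine()
    while (line != null) {
      val parts = line.split(",", 2)
      if (parts.length == 2) userById(parts(0)) = parts(1)
      line = reader.readLine()
    }
    reader.close()
  }

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    // Hypothetical fact record layout: orderId,userId,amount
    val fields = value.toString.split(",")
    if (fields.length >= 3) {
      val userName = userById.getOrElse(fields(1), "UNKNOWN")
      context.write(new Text(fields(0)), new Text(s"$userName,${fields(2)}"))
    }
  }
}
```

The corresponding driver would set job.setNumReduceTasks(0) so the joined records are written directly by the mappers.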
Confidential, Columbus, OH
Sr. ETL Data Stage Developer
Roles & Responsibilities:
- Designed jobs involving various cross-reference lookups and joins, as well as shared containers that can be reused in multiple jobs.
- Created job-level sequencers to include multiple jobs and a layer-level sequence that includes all job-level sequences.
- Involved in the designing of marts and dimensional and fact tables.
- Extensively used parallel stages such as Row Generator, Column Generator, Head, and Peek for development and debugging purposes.
- Worked on the ETL process using Python and Java.
- Migrated jobs from DataStage 7.5 to 8.1, developed new DataStage jobs using the DataStage/QualityStage Designer, and imported and exported repositories across projects.
- Used DataStage Designer for developing jobs and DataStage Director to view the log file for execution errors.
- Created DataStage parallel jobs to load fact and dimension tables.
- Wrote shell scripts to run DataStage jobs and PL/SQL blocks.
- Wrote SQL queries to check the data in the source system as well as in staging.
- Used Parallel Extender to split the data into subsets and utilized Lookup, Sort, Merge, and other stages to improve job performance.
- Worked with DataStage tools such as DataStage Designer and DataStage Director to develop jobs and view the logs for errors.
Environment: IBM WebSphere DataStage 8.1, DataStage 7.5.2, Python, UNIX Shell Scripting (Korn/KSH), SQL, Oracle 9i/10g, UNIX, Windows XP
Confidential, Kalamazoo, MI
Sr. Java Developer
Roles & Responsibilities:
- The application was developed in J2EE using an MVC based architecture.
- Implemented the MVC design using the Struts 1.3 framework, JSP custom tag libraries, and various in-house custom tag libraries for the presentation layer.
- Created tile definitions, Struts-Config files, validation files, and resource bundles for all modules using the Struts framework.
- Wrote prepared statements and called stored procedures using callable statements in MySQL.
- Executed SQL queries to perform CRUD operations on customer records.
- Gathered requirements and then developed complex workflows involving templates and OpenDeploy.
- Used Eclipse 6.0 as the IDE for application development and configured the Struts framework to implement MVC design patterns.
- Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer.
- Designed and developed GUI using JSP, HTML, DHTML and CSS. Worked with JMS for messaging interface.
- Used Hibernate for handling database transactions and persisting objects; deployed the entire project on the WebLogic application server.
- Part of the team involved in the design and coding of the data capture, presentation, and component templates.
- Developed and configured templates to capture and generate multilingual content; with this approach, the North US branch content is encoded in Big5.
- Used WebSphere as the application server for deployment.
- Used Web services for transmission of large blocks of XML data over HTTP.
Environment: Java/J2EE, Oracle 10g, SQL, PL/SQL, JSP, EJB, Struts, Hibernate, WebLogic 8.0, HTML, AJAX, Java Script, JDBC, XML, JMS, XSLT, UML, JUnit, Log4j, Eclipse 6.0