Sr. Hadoop Developer Resume
Franklin, TN
SUMMARY
- 8+ years of progressive experience in the IT industry with proven expertise in Analysis, Design, Development, Implementation and Testing of software applications using Big Data and Java-based technologies.
- 4+ years of experience in Hadoop environments, including MapReduce (MR1), YARN (MR2), Spark, HDFS, Hive, Impala, HBase, Cassandra, Pig, Zookeeper, Oozie, Kafka, Storm, Sqoop and Flume.
- Expertise in database design, creation and management of schemas, and writing Stored Procedures, Functions and DDL/DML SQL queries.
- Experienced in performing real-time analytics on NoSQL databases such as HBase, MongoDB and Cassandra.
- Experience in importing and exporting data between relational database management systems (RDBMS) and HDFS using Sqoop.
- Hands-on experience in writing ad-hoc queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
- Experience in extending Hive and Pig core functionality with custom User Defined Functions (UDF), User Defined Table-Generating Functions (UDTF) and User Defined Aggregating Functions (UDAF); a small Hive UDF sketch follows this summary.
- Developed Pig Latin scripts for data cleansing and transformation.
- Implemented batch-processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
- Worked on HBase to load and retrieve data for real-time processing using the REST API.
- Imported data from RDBMS to column families in Cassandra through a storage handler.
- Experienced with different file formats such as CSV, text, SequenceFile, XML, JSON and Avro.
- Hands-on experience with the MapR, Hortonworks (HDP 2.2 and 2.3) and Cloudera (CDH4 and CDH5) Hadoop distributions for executing the respective scripts.
- Worked on real-time and batch processing using Spark and Kafka.
- Well versed in developing complex MapReduce programs using Apache Hadoop for analyzing Big Data.
- Collected and aggregated large amounts of log data using Apache Flume and stored the data in HDFS for further analysis.
- Experience in installation, configuration, management, support and monitoring of Hadoop clusters using various distributions such as Apache Hadoop, Cloudera and Hortonworks.
- Experience in developing applications using Java, J2EE, JSP, MVC, Hibernate, JMS, JSF, EJB, XML, AJAX and web-based development tools.
- Experience in developing service components using JDBC.
- Experience working with popular frameworks such as Spring MVC and Hibernate.
- Implemented SOAP based web services.
- Experience working with build tools such as Maven, Ant and Gradle.
- Used curl scripts to test RESTful web services.
- Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Oracle.
- Experienced in both Waterfall and Agile Development (SCRUM) methodologies.
- Strong problem-solving and analytical skills, with the ability to make balanced and independent decisions.
- Excellent global exposure to various work cultures and client interaction with diverse teams.
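As a hedged illustration of the Hive UDF work noted in this summary, the sketch below shows a minimal custom UDF in Java. The class name and masking behavior are hypothetical, and it assumes the classic org.apache.hadoop.hive.ql.exec.UDF extension point.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that masks all but the last four characters of an identifier.
public class MaskIdUdf extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        String value = input.toString();
        if (value.length() <= 4) {
            return new Text(value);
        }
        return new Text("****" + value.substring(value.length() - 4));
    }
}
```

Packaged into a jar, a function like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.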
TECHNICAL SKILLS
Programming Language: C++, JAVA, Python, Scala
Hadoop/Big Data Stack: Hadoop, HDFS, MapReduce, Hive, Pig, Spark SQL, Spark Streaming, PySpark, Scala, Kafka, Storm, Zookeeper, HBase, Cassandra, Sqoop, Flume
Hadoop Distributions: MapR, Cloudera, Hortonworks, Apache
Query Languages: HiveQL, SQL, PL/SQL, Pig Latin
Web Technologies: Java, J2EE, Struts, Spring, JSP, Servlet, JDBC, JavaScript
Frameworks: MVC, Struts, Spring, Hibernate
IDEs: IntelliJ IDEA, Eclipse, NetBeans
Build Tools: Ant, Maven, Gradle
Databases: Oracle, MySQL, MS Access, DB2, Teradata
NoSQL: HBase, Cassandra, MongoDB
Operating Systems: Windows, Linux, Unix, CentOS
Scripting Languages: Shell scripting
Version Control Systems: SVN, Git, CVS
PROFESSIONAL EXPERIENCE
Confidential - Franklin, TN
Sr. Hadoop Developer
Responsibilities:
- Designed and deployed Hadoop clusters and various Big Data analytics tools, including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark and Kafka, on the MapR distribution.
- Loaded and transformed large sets of structured, semi-structured and unstructured data from relational databases into BDPaaS (Hadoop) using Sqoop imports.
- Improved the performance of existing Hadoop algorithms with Spark, using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Integrated the Confluent Kafka Connect API with Apache Spark and created data pipelines for real-time processing.
- Created partitions and buckets based on state to enable bucket-based Hive joins for further processing.
- Implemented a Spark preprocessing job using the DataFrame API to speed up processing and transform the data for upstream consumption; a sketch of this style of job follows this role's environment list.
- Implemented Spark RDD transformations and actions for business analysis, and worked with Spark accumulators and broadcast variables.
- Tuned Spark application performance by setting the right batch interval, the correct level of parallelism and appropriate memory settings.
- Implemented Spark jobs to map provisional claims, member and provider data into a common unified data model.
- Implemented intermediate functionality, such as counting events or records from Flume sinks and Kafka topics, by writing Spark programs in Java and Scala.
- Hands-on experience with production Hadoop applications, including development, monitoring, debugging and performance tuning.
- Ingested data from RDBMS sources, performed data transformations, then exported the transformed data to MySQL per business requirements; also used MySQL through Java services.
- Implemented message brokers with Apache Kafka and Confluent Kafka.
- Integrated Kafka with Flume in a sandbox environment using a Kafka source and Kafka sink.
- Developed UDFs in Java as needed for use in Pig and Hive queries, and followed best practices for preparing and maintaining Apache Hadoop in production.
- Supported development of the application architecture for both real-time and batch big data processing.
- Wrote Python scripts that run multiple Hive jobs, helping to incrementally refresh Hive tables used to generate Tableau reports for the business.
- Worked with both MapReduce 1 (JobTracker) and MapReduce 2 (YARN) setups.
- Prepared technical design documents, detailed design documents.
- Wrote CI/CD pipelines to automate and build jobs using Jenkins and Gradle.
- Used Git, Codehub and GitHub as code repositories.
- Knowledgeable about Mesosphere DC/OS and Docker container automation using container management software such as Mesos/Marathon, Docker Swarm and Kubernetes.
- Worked on a StreamSets POC to build data pipelines.
- Worked entirely within an Agile methodology.
Environment: MapR, Hadoop, Hive, Pig, Spark, Confluent Kafka, Sqoop, Tableau, MapReduce, YARN, Oozie, Java, Maven, JUnit, Agile methodologies, MySQL, Scala, Python, Unix, Git, IntelliJ.
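As a hedged sketch of the Spark preprocessing work described in this role, the Java example below cleans and repartitions a Hive-backed table with the DataFrame API. The table and column names (staging.provisional_claims, member_id, state_cd, curated.claims_unified) are illustrative assumptions, not the actual project schema.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.trim;
import static org.apache.spark.sql.functions.upper;

public class ClaimsPreprocessJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("claims-preprocess")
                .enableHiveSupport()
                .getOrCreate();

        // Read raw provisional claims from a hypothetical staging table.
        Dataset<Row> claims = spark.table("staging.provisional_claims");

        // Basic cleansing: drop rows without a member id and normalize the state code.
        Dataset<Row> cleaned = claims
                .filter(col("member_id").isNotNull())
                .withColumn("state_cd", upper(trim(col("state_cd"))));

        // Persist partitioned by state so downstream Hive joins can prune partitions.
        cleaned.write()
                .mode("overwrite")
                .partitionBy("state_cd")
                .saveAsTable("curated.claims_unified");

        spark.stop();
    }
}
```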
Confidential - Atlanta, GA
Sr. Hadoop Developer
Responsibilities:
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Designed workflows and coordinators in Oozie to automate and parallelize Hive and Pig jobs on an Apache Hadoop environment from Hortonworks (HDP 2.3).
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer data by date.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Developed Spark scripts by using Scala Shell commands as per the requirement.
- Imported real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Developed custom aggregate functions using Spark SQL and performed interactive querying at a POC level.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Re-implemented existing MapReduce jobs as Spark applications for better performance.
- Implemented Kafka Java producers, created custom partitions, configured brokers and implemented high-level consumers for the data platform; a producer sketch follows this role's environment list.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning HIVE tables and running the scripts in parallel to reduce run-time of the scripts.
- Worked with data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into serialized byte sequences.
- Used cluster co-ordination services through Zookeeper.
- Used OOZIE operational services for batch processing and scheduling workflows dynamically.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Extensively worked on creating end-to-end data pipeline orchestration using Oozie.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs and Scala.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
- Advanced knowledge of performance troubleshooting and tuning for Hadoop clusters.
- Designed and developed a Java API (Commerce API) that provides connectivity to Cassandra through Java services.
Environment: Hortonworks distribution (HDP 2.3), MapReduce, HDFS, Spark, Hive, Pig, HBase, SQL, Sqoop, Flume, Oozie, Apache Kafka, Storm, Zookeeper, Tez, J2EE, Eclipse, Cassandra, Tableau.
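As a hedged example of the Kafka Java producer work in this role, the snippet below sends a keyed record with the standard producer client. The broker list, topic name and payload are placeholders; keying by customer id is shown only to illustrate how records can be routed consistently to partitions.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CustomerEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // hypothetical brokers
        props.put("acks", "all");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by customer id keeps all events for one customer on the same partition.
            producer.send(new ProducerRecord<>("customer-events", "customer-42",
                    "{\"action\":\"login\"}"));
        }
    }
}
```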
Confidential - New York, NY
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis; a compact example follows this role's environment list.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Developed Pig scripts for analyzing large data sets in the HDFS.
- Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
- Designed and presented plan for POC on Impala.
- Experienced in migrating HiveQL queries to Impala to minimize query response time.
- Experience in using Zookeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
- Worked with Kafka and Storm on the Cloudera platform for real-time analysis.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Implemented daily cron jobs that automate parallel tasks of loading data into HDFS using Oozie coordinator jobs.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Implemented Hive generic UDFs to encapsulate business logic.
- Implemented test scripts to support test driven development and continuous integration.
- Responsible for creating Hive tables, loading the structured data resulted from MapReduce jobs into the tables and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
- Responsible for performing extensive data validation using Hive.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases for comparison with historical data.
- Used Pig UDFs written in Python and Java, and applied sampling to large data sets.
- Responsible for writing Python programs to monitor workflow jobs through Oozie.
- Used Pig as an ETL tool for transformations, event joins, filtering and some pre-aggregations.
- Processed and analyzed log data stored in HBase and then imported it into the Hive warehouse.
- Along with the infrastructure team, involved in the design and development of a Kafka- and Storm-based data pipeline.
- Involved in loading data from Teradata database into HDFS using Sqoop queries.
- Involved in submitting and tracking MapReduce jobs using the JobTracker.
- Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Cloudera Hadoop (CDH 4.3), MapReduce, MongoDB, HDFS, Pig, Hive, HBase, Sqoop, Flume, Oozie, Storm, Java, Python, Linux, Maven, Teradata, Zookeeper.
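As a compact, hedged illustration of the Java MapReduce log-analysis work in this role, the job below counts log lines per HTTP status code. The assumed log layout (whitespace-delimited, status code in the ninth field) is an illustration, not the actual production format.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class StatusCodeCount {

    public static class StatusMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text status = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\s+");
            if (fields.length > 8) {            // assumes a combined-log-format style layout
                status.set(fields[8]);
                context.write(status, ONE);
            }
        }
    }

    public static class StatusReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```

A driver class would configure the Job with these classes and the input/output paths; it is omitted here for brevity.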
Confidential - Jersey City, NJ
Hadoop Developer
Responsibilities:
- Developed data pipelines using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Designed and developed tables in HBase and stored aggregated data from Hive.
- Developed Pig UDFs for needed functionality not available out of the box in Apache Pig; a small Java example follows this role's environment list.
- Expertise in developing Hive DDL statements to create, alter and drop Hive tables.
- Experience in developing Hive UDFs for needed functionality not available out of the box in Apache Hive.
- Implemented HCatalog to access Hive table metadata from MapReduce and Pig code.
- Used Java MapReduce to compute various metrics that define user experience, revenue, etc.
- Responsible for developing data pipelines using Flume, Sqoop and Pig to extract data from web logs and store it in HDFS.
- Used Pig to do transformations, event joins, filter bot traffic and some pre-aggregations before storing the data onto HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used Eclipse and Ant to build the application.
- Performed unit testing using MRUnit.
- Used Sqoop for importing and exporting data into HDFS and Hive.
- Involved in writing MapReduce jobs.
- Used Sqoop and HDFS put/copyFromLocal to ingest data.
- Involved in processing ingested raw data using MapReduce, Apache Pig and Hive.
- Involved in developing Pig Scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Involved in pivoting HDFS data from rows to columns and columns to rows.
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop and HDFS get/copyToLocal.
- Involved in developing shell scripts to orchestrate the execution of all other scripts (Pig, Hive, MapReduce) and move data files within and outside of HDFS.
Environment: Hadoop, MapReduce (MRv1, MRv2/YARN), Hive, Pig, HCatalog, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java.
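As a hedged sketch of the Pig UDF work in this role, the Java class below extends EvalFunc, the standard extension point for Pig user-defined functions. The class name and upper-casing behavior are illustrative assumptions.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF that upper-cases a single chararray field.
public class UpperCaseUdf extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().toUpperCase();
    }
}
```

In Pig Latin the packaged jar would be registered with REGISTER and the function invoked inside a FOREACH ... GENERATE statement.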
Confidential - Salem, NC
Java Developer
Responsibilities:
- Participated in major phases of the software development cycle, including requirement gathering, analysis, design, development and unit testing, using Agile/Scrum methodologies.
- Interacted with the onsite team on change requests and understood requirement changes for implementation.
- Implemented the application with the Spring Framework to provide dependency injection and abstraction between the presentation and persistence layers.
- Developed a POC for Spring Batch for running batch jobs and documented how Spring Batch is useful for the current project.
- Developed multiple batch jobs using Spring Batch to import files of different formats such as XML, CSV, etc.
- Designed and developed the database for the application in DB2, and integrated the Spring Framework with JPA for database operations.
- Developed SQL and JPQL queries, triggers and views to interact with the database, and configured and scheduled batch jobs using cron expressions and the Quartz Scheduler; a small JPQL sketch follows this role's environment list.
- Developed JUnit test cases for Unit Testing and functional testing for modules and prepared Code Documentation for future reference and upgrades.
- Used Log4j for logging for warnings, errors etc. Involved in Defect fixing and maintenance.
- Used Maven 3.0, Spring 3.0 and Eclipse to develop the batch jobs.
- Involved in monitoring production deployments and load testing the applications for scalability while contributing to hands-on development.
- Actively involved in analyzing and fixing the root cause of the technical issues and defects during development.
- Involved in end-to-end backend development and deployment of the application; deployed the application on IBM WebSphere.
- Maintained high-quality RESTful services guided by best practices found in the Richardson Maturity Model; designed REST web services supporting both XML and JSON for handling AJAX requests.
- Used IBM RTC as the version control system.
- Involved in document preparation of the application.
- Involved in Unit & Integration Testing of the application.
Environment: Java/J2EE, JPA, Spring 3.0, Hibernate, JSON, XML, IBM WebSphere, Maven, Unix/Linux, JUnit 4, MySQL 5.1, Eclipse, RTC, Log4j, AJAX.
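As a hedged sketch of the JPQL query work in this role, the Java classes below show a simple repository method built on a typed JPQL query. The BatchRecord entity, its status field and the batchUnit persistence unit are hypothetical stand-ins for the project's actual domain model.

```java
import java.util.List;

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Id;
import javax.persistence.Persistence;
import javax.persistence.TypedQuery;

// Hypothetical entity; getters and setters omitted for brevity.
@Entity
class BatchRecord {
    @Id
    Long id;
    String status;
}

public class BatchRecordRepository {

    private final EntityManagerFactory emf =
            Persistence.createEntityManagerFactory("batchUnit"); // hypothetical persistence unit

    // Fetches records in a given status with a JPQL query and a named parameter.
    public List<BatchRecord> findByStatus(String status) {
        EntityManager em = emf.createEntityManager();
        try {
            TypedQuery<BatchRecord> query = em.createQuery(
                    "SELECT r FROM BatchRecord r WHERE r.status = :status", BatchRecord.class);
            query.setParameter("status", status);
            return query.getResultList();
        } finally {
            em.close();
        }
    }
}
```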
Confidential
Java Developer
Responsibilities:
- Designed the application model using Rational Rose, utilizing the Struts framework (Model-View-Controller) and J2EE design patterns.
- Designed class diagrams of modules using Rational Rose (UML).
- Designed and developed user interfaces using JSP and HTML.
- Developed Struts components, Servlets, JSPs, EJBs and other Java components to fulfill the requirements.
- Designed and implemented all the front-end components using the Struts framework; a small Action class sketch follows this role's environment list.
- Designed various applications using multi-threading concepts, mostly used to perform time consuming tasks in the background.
- Developed JSP and Servlet classes to generate dynamic HTML.
- Developed JSP pages using Struts custom tags.
- Developed the presentation layer, built using Servlets and JSPs with MVC architecture, on WebSphere Studio Application Developer.
- Designed and developed XML-processing components for dynamic menus in the application.
- Implemented the persistence layer using Entity Beans.
- Developed SQL queries efficiently for retrieving data from the database.
- Used Rational ClearCase for controlling different versions of the application code.
- Used the Business Delegate and Service Locator patterns to decouple clients from the business logic implementation and prevent code duplication.
- Involved in the integration testing and addressed the Integration issues of the different modules of the application.
- The application was deployed on IBM WebSphere Application Server 5.1, and the build process was controlled using Apache Jakarta Ant.
- Used Log4J for logging purposes.
Environment: Java/J2EE, JDBC, Servlets 2.4, JSP 2.0, EJB 2.0, Struts 1.1, Rational ClearCase, WebSphere 5.1, WSAD, UML, UNIX, JavaScript, Ant 1.6.1, XML, DB2 and Log4J
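As a hedged sketch of the Struts front-end work in this role, the class below is a minimal Struts 1.x Action. The request attribute and the "success" forward are hypothetical and would be wired up in struts-config.xml.

```java
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

// Hypothetical Action that places a message in request scope and forwards to a JSP.
public class WelcomeAction extends Action {
    @Override
    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request, HttpServletResponse response) {
        request.setAttribute("message", "Welcome");
        return mapping.findForward("success");
    }
}
```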