We provide IT Staff Augmentation Services!

Sr. Hadoop Developer Resume

5.00/5 (Submit Your Rating)

Los Angeles, CA

SUMMARY:

  • 7+ years of experience in IT industry, which includes hands on experience in Bigdata ecosystem related technologies.
  • Possesses 4+ years of rich Hadoop experience in design and development of Big Data applications, which involves Apache Hadoop Map/Reduce, HDFS, Hive, HBase, Pig, Oozie, Sqoop, Kafka, Flume and Spark.
  • Expertise in developing solutions around NOSQL databases like MongoDB and Cassandra.
  • Experience in installation, configuration, supporting and monitoring Hadoop clusters using Apache, Hortonworks Distribution (HDP2.X).
  • Experience with all flavor of Hadoop distributions, including Cloudera, Horton works.
  • Excellent understanding of Hadoop architecture Map Reduce MRv1 and Map Reduce MRv2 (YARN).
  • Developed multiple Map Reduce programs to process large volumes of semi/unstructured data files using different Map Reduce design patterns.
  • Hands on experience in working with Ecosystems consisting Hive, Pig, Sqoop, Map Reduce, Flume, and Oozie.
  • Worked extensively over semi - structured data (fixed length & delimited files), for data sanitation, report generation and standardization.
  • Expertise in different data loading techniques (Flume, Sqoop) onto HDFS.
  • Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
  • Experience in handling continuous streaming data using Flume and memory channels.
  • Expertise in Hadoop administration such as managing cluster, reviewing Hadoop log files.
  • Extensive Experience in Developing and maintaining Big Data streaming applications using Kafka, Storm, Spark and other Hadoop Components
  • Good Experience in writing complex SQL queries with databases like Oracle 10g, MySQL, and SQL Server
  • The concepts of Objects, Classes and their relationships and how to model them and good hands on experience on Spring 2.5 framework
  • Knowledge in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Expertise in using ETL tool Informatica to Extract, Transform and Load the data into ware house.
  • Hands on experience with Spark using Scala and Python.
  • Hands on experience in Spark architecture and its integrations like Spark SQL , Data Frames and Datasets API.
  • Made POC on Spark Real Time Streaming using Kafka into HDFS.
  • Hands-on experience with AWS (Amazon Web Services), using Elastic MapReduce (EMR), creating and Storing data in S3 buckets and creating Elastic Load Balancers (ELB) for Hadoop front end Web UI’s.
  • Extensive knowledge on creating Hadoop cluster on multiple EC2 instances in AWS and configuring them through Ambari and using IAM (Identity and Access Management) for creating groups, users.
  • Extensively worked with object oriented Analysis, Design and development of software using UML methodology.
  • Exposed into methodologies like scrum, agile and waterfall.
  • Good knowledge of Normalization, Fact Tables and Dimension Tables, also dealing with OLAP and OLTP systems.
  • Strong Experience in Unit Testing and System testing in Bigdata.
  • Experience working on Version control tools like SVN and GIT revision control systems such as GitHub and JIRA to track issues and crucible for code reviews.

TECHNICAL SKILLS:

Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig, Hive 2.3, Sqoop 1.4, Apache Impala 2.1, Oozie 4.3, Yarn, Apache Flume 1.8, Kafka 1.1, Zookeeper

Cloud Platform: Amazon AWS, EC2, EC3, MS Azure, Azure SQL Database, Azure Data Lake, Data Factory

Hadoop Distributions: Cloudera, Hortonworks, MapR

Programming Language: Java, Scala, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0, JSP, Servlets

Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS

Web Technologies: HTML, CSS, JavaScript, JQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX

Databases: Oracle 12c/11g, SQL

Operating Systems: Linux, Unix, Windows 10/8/7

IDE and Tools: Eclipse 4.7, NetBeans 8.2, IntelliJ, Maven

NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB

Web/Application Server: Apache Tomcat 9.0.7, JBoss, Web Logic, Web Sphere

SDLC Methodologies: Agile, Waterfall

Version Control: GIT, SVN, CVS

PROFESSIONAL EXPERIENCE:

Confidential, Los Angeles, CA

Sr. Hadoop Developer

Responsibilities:

  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Involved Low level design for MR, Hive, Impala, Shell scripts to process data.
  • Involved in complete Big Data flow of the application starting from data ingestion upstream to HDFS, processing the data in HDFS and analyzing the data.
  • Knowledge on handling Hive queries using Spark SQL that integrate with Spark environment implemented in Scala.
  • Used Spark Streaming API with Kafka to build live dashboards; Worked on Transformations & actions in RDD, Spark Streaming, Pair RDD Operations, Check-pointing, and SBT.
  • Implemented POC to migrate map reduce jobs into Spark RDD transformation using Scala IDE for Eclipse.
  • Creating Hive tables to import large data sets from various relational databases using Sqoop and export the analyzed data back for visualization and report generation by the BI team.
  • Installing and configuring Hive, Sqoop, Flume, Oozie on the Hadoop clusters.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed a process for the Batch ingestion of CSV Files, Sqoop from different sources and also generating views on the data source using Shell Scripting and Python.
  • Integrated a shell script to create Collections/morphine, Solr Indexes on top of table directories using Map Reduce Indexer Tool within Batch Ingestion Framework.
  • Implemented partitioning, dynamic partitions and buckets in HIVE.
  • Developed Hive Scripts to create the views and apply transformation logic in the Target Database.
  • Involved in the design of Data Mart and Data Lake to provide faster insight into the Data.
  • Involved in using Stream Sets Data Collector tool and created Data Flows for one of the streaming application.
  • Experienced in using Kafka as a data pipeline between JMS (Producer) and Spark Streaming Application (Consumer).
  • Involved in the development of Spark Streaming application for one of the data source using Scala, Spark by applying the transformations.
  • Developed a script in Scala to read all the Parquet Tables in a Database and parse them as Json files, another script to parse them as structured tables in Hive.
  • Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
  • Configured Zookeeper for Cluster co-ordination services.
  • Developed a unit test script to read a Parquet file for testing Pypark on the cluster.
  • Involved in exploration of new technologies like AWS, Apache Flink, and Apache NIFIetc which can increase the business value.

Environment: Hadoop, HDFS, Map Reduce, Hive, HBase, Zookeeper, Impala, Java(jdk1.6), Cloudera, Oracle, SQL Server, UNIX Shell Scripting, Flume, Oozie, Scala, Spark, Sqoop, Python, kafka, PySpark.

Confidential, Columbus, OH

Hadoop Developer

Responsibilities:

  • Developed real time data processing applications by using Scala and Python and implemented Apache Spark Streaming from various streaming sources like Kafka and JMS.
  • Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka as a data pipe-line system.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
  • Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
  • Involved in loading data from Linux file systems, servers, java web services using Kafka producers and partitions.
  • Applied Kafka custom encoders for custom input format to load data into Kafka Partitions.
  • Implement POC with Hadoop. Extract data with Spark into HDFS.
  • Used Spark SQL with Scala for creating data frames and performed transformations on data frames.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed code to read data stream from Kafka and send it to respective bolts through respective stream.
  • Worked on Spark streaming using Apache Kafka for real time data processing.
  • Developed Map Reduce jobs using Map Reduce Java API and HIVEQL.
  • Developed UDF, UDAF, UDTF functions and implemented it in HIVE Queries.
  • Developing Scripts and Batch Job to schedule a bundle (group of coordinators) which consists of various Hadoop Programs using Oozie.
  • Experienced in optimizing Hive queries, joins to handle different data sets.
  • Involved in ETL, Data Integration and Migration by writing pig scripts.
  • Integrated Hadoop with Solr and implement search algorithms.
  • Experience in Storm for handling realtime processing.
  • Hands on Experience working in Hortonworks distribution.
  • Worked hands on No-SQL databases like MongoDB for POC purpose in storing images and URIs.
  • Designed and implemented MongoDB and associated RESTful web service.
  • Involved in writing test cases and implement test classes using MRUnit and mocking frameworks.
  • Developed Sqoop scripts to extract the data from MYSQL and load into HDFS.
  • Experience in processing large volume of data and skills in parallel execution of process using Talend functionality.
  • Used Talend tool to create workflows for processing data from multiple source systems.

Environment: Map Reduce, HDFS, Sqoop, LINUX, Oozie, Hadoop, Pig, Hive, Solr, Spark Streaming, Kafka, Storm, Spark, Scala, Python, MongoDB, Hadoop Cluster, Amazon Web Services, Talend.

Confidential, Stamford, CT

ETL/Hadoop Developer

Responsibilities:

  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from Oracle database into HDFS using Sqoop.
  • Gathered business requirements, definition and design of the data sourcing and data flows, data quality analysis, working in conjunction with the data warehouse architect on the development of logical data models.
  • Create, develop, modify and maintain Database objects, PL/SQL packages, functions, stored procedures, triggers, views, and materialized views to extract data from different sources.
  • Extracted data from various location and load them into the oracle table using SQL*LOADER.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Developed the Pig Latin code for loading, filtering and storing the data.
  • Collecting and aggregating large amounts of log data using ApacheFlume and staging data in HDFS for further analysis
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to know user behavior.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Successfully managed Extraction, Transformation and Loading (ETL) process by pulling large volume of data from various data sources using BCP in staging database from MS Access and excel.
  • Was responsible for detecting errors in ETL Operation and rectify them.
  • Incorporated Error Redirection during ETL Load in SSIS Packages.
  • Implemented various types of SSIS Transformations in Packages including Aggregate, Fuzzy Lookup, Conditional Split, Row Count, Derived Column etc.
  • Implemented the Master Child Package Technique to manage big ETL Projects efficiently.
  • Involved in Unit testing and System Testing of ETL Process.

Environment: Hadoop, MapReduce, HDFS, Hive, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6, MS SQL Server 2005/08, SQL SSIS.

Confidential, Merrimack, NH

Hadoop Developer Admin

Responsibilities:

  • Responsible for Writing MapReduce jobs to perform operations like copying data on HDFS and defining job flows on EC2 server, load and transform large sets of structured, semi-structured and unstructured data. .
  • Developed a process for Sqooping data from multiple sources like SQL Server, Oracle and Teradata.
  • Responsible for creation of mapping document from source fields to destination fields mapping.
  • Developed a shell script to create staging, landing tables with the same schema like the source and generate the properties which are used by Oozie jobs.
  • Developed Oozie workflow’s for executing Sqoop and Hive actions.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
  • Performance optimizations on Spark/Scala. Diagnose and resolve performance issues.
  • Responsible for developing Python wrapper scripts which will extract specific date range using Sqoop by passing custom properties required for the workflow.
  • Developed scripts to run Oozie workflows, capture the logs of all jobs that run on cluster and create a metadata table which specifies the execution times of each job.
  • Developed Hive scripts for performing transformation logic and also loading the data from staging zone to final landing zone.
  • Worked on Parquet File format to get a better storage and performance for publish tables.
  • Involved in loading transactional data into HDFS using Flume for Fraud Analytics.
  • Developed Python utility to validate HDFS tables with source tables.
  • Designed and developed UDF’S to extend the functionality in both PIG and HIVE.
  • Import and Export of data using Sqoop between MySQL to HDFS on regular basis.
  • Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications..
  • Involved in using CA7 tool to setup dependencies at each level (Table Data, File and Time).
  • Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows.
  • Involved in developing Spark code using Scala and Spark-SQL for faster testing and processing of data and exploring of optimizing it using Spark Context, Spark-SQL, Pair RDD's, Spark YARN.
  • Migrating the needed data from Oracle, MySQL in to HDFS using Sqoop and importing various formats of flat files in to HDFS.

Environment: Hadoop, HDFS, Map Reduce, Hive, HBase, Kafka, Zookeeper, Oozie, Impala, Java(jdk1.6), Cloudera, Oracle, Teradata SQL Server, UNIX Shell Scripting, Flume, Scala, Spark, Sqoop, Python.

Confidential

Java Developer

Responsibilities:

  • Responsible for the analysis, documenting the requirements and architecting the application based on J2EE standards. Followed test-driven development (TDD) and participated in scrum status reports.
  • Provided full SDLC application.
  • Development services including design, integrate, test, and deploy enterprise mission-critical billing solutions.
  • Participated in designing of Use Case, Class Diagram and Sequence Diagram for various Engine components and used IBM Rational Rose for generating the UML notations.
  • Developing Ant, Maven and Shell Scripts to automatically compile, package, deploy and test J2EE applications to a variety of Web Sphere platforms.
  • Experience in developing Business Applications using JBoss, Web Sphere and Tomcat.
  • Perl scripting, shell scripting and PL/SQL programming to resolve business problems of various natures.
  • Client side validations and server side validations are done according to the business needs. Written test cases and done Unit testing and written executing Junit tests.
  • Written ANT Scripts for project build in LINUX environment.
  • Involved in Production implantation and post production support.

Environment: Spring MVC, J2EE, Java, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, My SQL Server 2008.

We'd love your feedback!