Hadoop Engineer Resume

TN

SUMMARY:

  • Around 7 years of experience as a professional software developer with technical expertise in all phases of the software development life cycle (SDLC), specializing in Big Data technologies like Spark and the Hadoop ecosystem.
  • 5 years of industry experience in Big Data analytics and data manipulation using the Hadoop ecosystem: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Oozie, Sqoop, AWS, NiFi and Zookeeper.
  • Hands-on expertise in row key and schema design with NoSQL databases like HBase.
  • Excellent programming skills at a higher level of abstraction using Scala and Python.
  • Hands-on experience in developing Spark applications using Spark libraries like Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
  • Strong experience in real-time data analytics using Spark Streaming, Kafka and NiFi.
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as the storage mechanism.
  • Created Hive tables to store structured data in HDFS and processed it using HiveQL.
  • Worked with GUI-based Hive interaction tools like Hue and Hive View for querying data.
  • Experienced in migrating from on-premises to AWS using AWS Data Pipeline and AWS Firehose.
  • Experience writing Python scripts to spin up EMR clusters, along with shell scripting.
  • Experience in writing complex SQL queries, PL/SQL, views, stored procedures and triggers.
  • Experience in OLTP and OLAP design, development, testing and support of data warehouses.
  • Experience working with OLAP, star schema and snowflake schema data warehousing.
  • Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets.
  • Hands-on experience with NoSQL databases like HBase for performing analytical operations.
  • Extensive knowledge of data serialization formats like Avro, SequenceFile, Parquet, JSON and ORC.
  • Hands-on experience in using various Hadoop distributions like Cloudera, Hortonworks and Amazon EMR.
  • In-depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce Programming Paradigm, High Availability and YARN architecture.
  • Hands-on experience with Spark Core, Spark SQL and the DataFrame/Dataset/RDD APIs.
  • Knowledge of job workflow scheduling and service coordination tools like Oozie and Zookeeper.
  • Good knowledge of Amazon Web Services (AWS) cloud services like EC2, S3 and EMR.
  • Experience in supporting data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud.
  • Knowledge of importing data into and exporting data from S3.
  • Used project management tools like JIRA for tracking issues and code-related bugs, GitHub for code reviews, and version control tools like CVS, Git and SVN.
  • Experienced in checking the status of clusters using Cloudera Manager and Ambari.
  • Ability to work with Onsite and Offshore Teams.
  • Experience in writing Shell Scripts in Unix/Linux.
  • Good experience with use-case development and methodologies like Agile and Waterfall.
  • Proven ability to manage all stages of project development. Strong Problem Solving and Analytical skills and ability to take balanced and independent decisions.

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Ambari, NiFi.

Cloud Environment: AWS, Google Cloud

Hadoop Distributions: Cloudera, Hortonworks

ETL: Talend

Languages: Python, Shell Scripting, Scala.

NoSQL Databases: MongoDB, HBase, DynamoDB.

Development / Build Tools: Eclipse, Git, IntelliJ and Log4j.

RDBMS: Oracle 10g, 11i, MS SQL Server, DB2

Testing: MRUnit Testing, Quality Center (QC)

Virtualization: VMware, Docker, AWS/EC2, Google Compute Engine, Vagrant.

Build Tools: Maven, Ant, Gradle

PROFESSIONAL EXPERIENCE:

Confidential, TN

Hadoop Engineer

Responsibilities:

  • Implemented Spark/Scala framework code using IntelliJ and UNIX scripting to implement the workflow for the jobs.
  • Involved in gathering business requirements, analyzing use cases and implementing them end to end.
  • Worked closely with the Architect; enhanced and optimized product Spark and Scala code to aggregate, group and run data mining tasks using the Spark framework.
  • Loaded the raw data into RDDs for validation and converted the RDDs into DataFrames for further processing.
  • Implemented the Spark SQL code logic to join multiple DataFrames to generate application-specific aggregated results.
  • Worked in Agile methodologies, used Rally scrum tool to track the user stories and team performance.
  • Worked extensively in Impala to analyze the processed data and to generate the end reports.
  • Worked with the Hive database through Beeline.
  • Imported data from AWS S3 into Spark RDDs. Performed transformations and actions on RDDs.
  • Worked with different file formats like Avro, Parquet and JSON.
  • Developed UDFs for processing complex data, making use of Eval, Load and Filter functions.
  • Implemented the Spark Scala code for Data Validation in Hive.
  • Worked with IT marketing analytics to assist with data-related technical issues and provide support.
  • Developed a data pipeline using Sqoop to ingest customer behavior data and purchase histories into HDFS for analysis.
  • Designed an appropriate partitioning/bucketing schema to allow faster data retrieval during analysis using Hive.
  • Developed Sqoop scripts to extract data from RDBMSs like MySQL and Teradata into HDFS and Hive.
  • Developed a Kafka cluster to load data from the data source into HDFS and Spark.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create data pipelines.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Worked with different teams to ensure data quality and availability.
  • Responsible for generating actionable insights from complex data to drive real business results for various application teams; worked extensively in Agile methodology projects.
  • Worked with the Scrum team to deliver agreed user stories on time for every sprint.
  • Worked on analyzing and resolving the production job failures in several scenarios.
  • Implemented UNIX scripts to define the use case workflow and to process the data files and automate the jobs.

Environment: CDH-5.10, Spark, HDFS, Hive, Apache Kafka, Flume, Sqoop, Scala, Shell scripting, Linux, Jenkins, IntelliJ, Maven, Git, Oozie, AWS, Talend and Agile Methodology.

Confidential, Jersey, NJ

Hadoop Engineer

Responsibilities:

  • Worked on migrating MapReduce programs into Spark transformations using Spark and Python.
  • Developed Spark jobs using Python along with YARN/MRv2 for interactive and batch analysis.
  • Queried data using Spark SQL with Spark engine for faster data set processing.
  • Extensively used Elastic Load Balancing mechanism with Auto Scaling feature to scale the capacity of EC2 instances across multiple availability zones in a region to distribute incoming high traffic for the application with zero downtime.
  • Created partitioned Hive tables and worked on them using HiveQL.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Used the DataFrame and Dataset APIs for performing analysis on Hive tables.
  • Monitored the Hadoop cluster using Cloudera Manager, interacted with Cloudera support, logged issues in the Cloudera portal and fixed them as per the recommendations.
  • Responsible for Cloudera Hadoop upgrades and patches, installation of ecosystem products through Cloudera Manager, and Cloudera Manager upgrades.
  • Used Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice-versa.
  • Worked with continuous integration tools like Jenkins and automated the building of JAR files at the end of the day.
  • Developed Unix shell scripts to load a large number of files into HDFS from the Linux file system.
  • Monitored workload and job performance and carried out capacity planning using Cloudera Manager.
  • Used Impala connectivity from the user interface (UI) and queried the results using Impala SQL.
  • Used Zookeeper to coordinate the servers in clusters and to maintain data consistency.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager and the Web UI.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Managed and scheduled several jobs to run over a certain period on the Hadoop cluster using Oozie.
  • Supported the setup of the QA environment and implemented scripts with Pig, Hive and Sqoop.
  • Followed Agile methodology for the entire project and supported testing teams.

Environment: Hadoop, HDFS, Hive, MapReduce, Impala, Sqoop, SQL, Talend, Python, PySpark, YARN, Pig, Oozie, Linux-Ubuntu, AWS, Tableau, Maven, Jenkins, Cloudera, JUnit, Agile methodology.

Confidential, Detroit, MI

Big Data Hadoop Consultant

Responsibilities:

  • Migrated and transformed large sets of structured, semi-structured and unstructured raw data from HBase through Sqoop and placed them in HDFS for processing.
  • Wrote Flume configuration files for importing streaming log data into HBase.
  • Loaded data into HBase using both bulk load and non-bulk load.
  • Wrote a Java program to retrieve data from HDFS and provide it to REST services.
  • Used Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice-versa.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Created HBase tables and HBase sinks and loaded data into them to perform analytics using Tableau.
  • Installed, configured and maintained Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Created multiple Hive tables, ran Hive queries on that data, and implemented partitioning, dynamic partitioning and buckets in Hive for efficient data access.
  • Created, dropped and altered tables at run time without blocking, using HBase and Hive.
  • Responsible for running batch processes using Pig Latin Scripts and developing Pig UDFs for data manipulation according to the business requirements.
  • Developed optimal strategies for distributing the web log data over the cluster, and for importing and exporting stored web log data into HDFS and Hive using Sqoop.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager and the Web UI.
  • Implemented partitioning and bucketing in Hive.
  • Managed and scheduled several jobs to run over a certain period on the Hadoop cluster using Oozie.
  • Used Maven for building JAR files of MapReduce programs and deployed them to the cluster.
  • Involved in final reporting of data using Tableau for testing, connecting to the corresponding Hive tables through the Hive ODBC connector.
  • Configured Flume to extract the data from the web server output files to load into HDFS.
  • Performed cluster tasks like adding and removing nodes without affecting running jobs.
  • Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
  • Helped in design of Scalable Big Data Clusters and solutions and involved in defect meetings.
  • Followed Agile Methodology for the entire project and supported testing teams.

Environment: Apache Hadoop, MapReduce, HDFS, HBase, CentOS 6.4, Unix, REST Web Services, Hive, Oozie, JSON, Eclipse, Qlik Sense, Jenkins, Maven, Sqoop.

Confidential

Hadoop Developer

Responsibilities:

  • Involved in requirement analysis, design, coding and implementation.
  • Used Sqoop to extract data from various sources like Teradata, Oracle and SQL Server.
  • Worked on logging framework to log application level data into Hadoop for future analysis.
  • Responsible for data integrity checks and duplicate checks to make sure the data is not corrupted.
  • Wrote MapReduce, Spark, Hive and Pig jobs for processing and analyzing data.
  • Developed Python scripts for auto-generation of HQL queries to reduce manual effort.
  • Wrote UDFs and UDAs for extended functionality in Pig and Hive.
  • Worked on data quality check framework for reporting to business.
  • Scheduled workflows through Oozie and AutoSys schedulers.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
  • Experience with NoSQL databases like HBase, as well as other ecosystem tools like Zookeeper, Oozie, Impala, Storm, AWS Redshift, etc.
  • Installed Hadoop, MapReduce, HDFS and AWS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Worked with Scrum teams to achieve enhanced levels of Agile transformation.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Worked on real-time streaming of the data using Spark with Kafka.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Built a real-time pipeline for streaming data using Kafka and Spark Streaming.
  • Executed parameterized Pig, Hive, Impala and UNIX batches in production.
  • Managed Big Data in Hive and Impala (tables, partitioning, ETL, etc.).
  • Wrote Python scripts for internal testing that read data from a file and push it into a Kafka queue, which is in turn consumed by the Storm application.
  • Worked on Kafka and Kafka mirroring to ensure that the data is replicated without any loss.
  • Developed Kafka producers and consumers, as well as Spark and Hadoop MapReduce jobs.
  • Installed and configured Hive and wrote Hive UDFs.
  • Used Impala to obtain statistical information about the operational data.

Environment: CDH 5.4.5, MapReduce, Spark, Avro, Parquet, Hive, Java (JDK 1.7), Python, Teradata, SQL Server, Oozie, AutoSys.

Confidential

Software Developer

Responsibilities:

  • Involved in the Design, Development and Support phases of Software Development Life Cycle (SDLC).
  • Reviewed the functional, design, source code and test specifications.
  • Involved in complete front-end development using JavaScript and CSS.
  • Developed web components using JSP, Servlets and JDBC.
  • Designed tables and indexes.
  • Designed, implemented, tested and deployed Enterprise Java Beans, both Session and Entity, using WebLogic as the Application Server.
  • Developed stored procedures, packages and database triggers to enforce data integrity. Performed data analysis and created Crystal Reports for user requirements.
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Implemented Backend, Configuration DAO, and XML generation modules of DIS.
  • Used JDBC for database access.
  • Used Spring Framework for developing the application and used JDBC to map to Oracle database.
  • Used Data Transfer Object (DTO) design patterns.
  • Involved in unit testing and rigorous integration testing of the whole application.
  • Wrote and executed test scripts using JUnit.

Environment: JSP, XML, Spring Framework, Eclipse (IDE), Microservices, JavaScript, Struts, Tiles, Ant, PL/SQL, Windows, UNIX, JasperReports.