
Hadoop Engineer Resume


TN

SUMMARY:

  • Around 7 years of experience as a professional software developer with technical expertise in all phases of the Software Development Life Cycle (SDLC), specializing in Big Data technologies such as the Hadoop and Spark ecosystems.
  • 5+ years of industry experience in Big Data analytics and data manipulation using the Hadoop ecosystem: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Oozie, Sqoop, AWS, NiFi and ZooKeeper.
  • Hands-on expertise in designing row keys and schemas for NoSQL databases like HBase.
  • Extensively worked on Spark with Scala on clusters for computational analytics; built advanced analytical applications on top of Hadoop using Spark with Hive and SQL (a minimal sketch appears after this list).
  • Excellent Programming skills at a higher level of abstraction using Scala and Python.
  • Hands-on experience in developing Spark applications using libraries such as Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
  • Strong experience on real time data analytics using Spark Streaming, Kafka and NiFi.
  • Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Created Hive tables to store structured data into HDFS and processed it using HiveQL.
  • Worked on GUI Based Hive Interaction tools like Hue, Hive View for querying data.
  • Experienced in migrating from on-premise systems to AWS using AWS Data Pipeline and Amazon Kinesis Firehose.
  • Experience writing Python scripts to spin up EMR clusters, along with shell scripting.
  • Experience in writing complex SQL queries, PL/SQL, views, stored procedures and triggers.
  • Experience in OLTP and OLAP design, development, testing and support of Data warehouses.
  • Experience working with OLAP and star and snowflake schema data warehousing.
  • Good experience in optimizing MapReduce algorithms using Mappers, Reducers, combiners and partitioners to deliver the best results for the large datasets.
  • Hands on experience with NoSQL Databases like HBase for performing analytical operations.
  • Competent with configuration and automation tools such as Chef, Puppet and Ansible. Configured and administered CI tools like Jenkins, Hudson and Bamboo for automated builds.
  • Extensive knowledge on data serialization techniques like Avro, Sequence Files, Parquet, JSON and ORC.
  • Hands-on experience using various Hadoop distributions: Cloudera, Hortonworks and Amazon EMR.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce Programming Paradigm, High Availability and YARN architecture.
  • Hands on experience with Spark Core, Spark SQL and Data Frames/Data Sets/RDD API.
  • Experience with job workflow scheduling and service coordination tools like Oozie and ZooKeeper.
  • Good knowledge on Amazon Web Services (AWS) cloud services like EC2, S3, and EMR.
  • Experience in supporting data analysis projects using Elastic Map Reduce on the Amazon Web Services (AWS) cloud.
  • Experience exporting data from and importing data into S3.
  • Used project management tools like JIRA for tracking issues and code-related bugs, and GitHub for code reviews; worked with version control tools such as CVS, Git and SVN.
  • Experienced in checking cluster status using Cloudera Manager, Ambari, Ganglia and Nagios.
  • Ability to work with Onsite and Offshore Teams.
  • Experience in writing Shell Scripts in Unix/Linux.
  • Good experience with use-case development, with methodologies like Agile and Waterfall.
  • Good understanding of all aspects of Unit, Regression, Agile, White & Black-box testing.
  • Proven ability to manage all stages of project development; strong problem-solving and analytical skills and the ability to make balanced, independent decisions.
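
A minimal sketch of the Spark-with-Hive pattern referenced above. The database, table and column names (sales_db.transactions, customer_id, event_ts, amount) and the output path are illustrative assumptions, not details from any actual engagement:

    import org.apache.spark.sql.SparkSession

    object HiveAggregationSketch {
      def main(args: Array[String]): Unit = {
        // Spark session with Hive support so spark.sql() can see existing Hive tables
        val spark = SparkSession.builder()
          .appName("HiveAggregationSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Aggregation pushed through Spark SQL on top of a Hive table (names are placeholders)
        val dailyTotals = spark.sql(
          """SELECT customer_id, to_date(event_ts) AS event_date, SUM(amount) AS total_amount
            |FROM sales_db.transactions
            |GROUP BY customer_id, to_date(event_ts)""".stripMargin)

        // Persist the aggregated result back to HDFS as Parquet
        dailyTotals.write.mode("overwrite").parquet("/data/aggregates/daily_totals")

        spark.stop()
      }
    }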

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Ambari, NiFi.

Cloud Environment: AWS, Google Cloud

Hadoop Distributions: Cloudera, Hortonworks

ETL: Talend

Languages: Python, Shell Scripting, Scala.

NoSQL Databases: MongoDB, HBase, DynamoDB.

Development / Build Tools: Eclipse, Git, IntelliJ and log4J.

RDBMS: Oracle 10g, 11i, MS SQL Server, DB2

Testing: MRUnit Testing, Quality Center (QC)

Virtualization: VMWare, Docker, AWS/EC2, Google Compute Engine, Vagrant.

Build Tools: Maven, Ant, Gradle

PROFESSIONAL EXPERIENCE:

Confidential, TN

Hadoop Engineer

Responsibilities:

  • Implemented Scala framework code using IntelliJ, and UNIX scripts to define the workflow for the jobs.
  • Involved in gathering business requirements, analysing the use cases and implementing them end to end.
  • Worked closely with the Architect; enhanced and optimized product Spark and Scala code to aggregate, group and run data mining tasks using Spark framework.
  • Loaded the raw data into RDDs and validated the data.
  • Converted the validated RDDs into DataFrames for further processing.
  • Implemented Spark SQL logic to join multiple DataFrames and generate application-specific aggregated results (see the sketch after this list).
  • Fine-tuned jobs for better performance in the production cluster.
  • Worked entirely in Agile, using the Rally scrum tool to track user stories and team performance.
  • Worked extensively with Impala through Hue to analyse the processed data and generate the end reports.
  • Experienced working with Hive databases through Beeline.
  • Worked with AWS cloud services such as EC2, S3 and VPC.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Used Avro, Parquet and JSON file formats and developed UDFs in Hive.
  • Worked with Log4j framework for logging debug, info & error data.
  • Used Amazon DynamoDB to gather and track the event-based metrics.
  • Implemented the Spark Scala code for Data Validation in Hive.
  • Developed data pipeline using Sqoop to ingest customer behavioural data and purchase histories into HDFS for analysis.
  • Designed appropriate partitioning/bucketing schema to allow faster data retrieval during analysis using HIVE.
  • Developed Sqoop and Kafka Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
  • Wrote several MapReduce jobs using the Java API and used Jenkins for continuous integration.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Worked with different teams to ensure data quality and availability.
  • Responsible for generating actionable insights from complex data to drive real business results for various applications teams and worked in Agile Methodology projects extensively.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
  • Worked on analysing and resolving the production job failures in several scenarios.
  • Implemented UNIX scripts to define the use case workflow and to process the data files and automate the jobs.
  • Knowledge of implementing JILs (AutoSys job definitions) to automate jobs in the production cluster.
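
A hedged sketch of the load-validate-join flow described in the bullets above. The S3 path, record layout, case class, reference dataset and column names are assumptions made purely for illustration:

    import org.apache.spark.sql.SparkSession

    object ValidateAndJoinSketch {
      // Illustrative record layout for the raw CSV input
      case class Order(orderId: String, customerId: String, amount: Double)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ValidateAndJoinSketch").getOrCreate()
        import spark.implicits._

        // Load raw text into an RDD and keep only well-formed records
        val rawRdd = spark.sparkContext.textFile("s3a://example-bucket/raw/orders/")
        val validated = rawRdd
          .map(_.split(","))
          .filter(f => f.length == 3 && f(0).nonEmpty && f(2).matches("""\d+(\.\d+)?"""))

        // Convert the validated RDD into a DataFrame for SQL-style processing
        val ordersDf = validated.map(f => Order(f(0), f(1), f(2).toDouble)).toDF()
        ordersDf.createOrReplaceTempView("orders")

        // Join against a reference dataset to produce application-specific aggregates
        spark.read.parquet("/data/reference/customers").createOrReplaceTempView("customers")
        val perRegion = spark.sql(
          """SELECT c.region, SUM(o.amount) AS total_amount
            |FROM orders o JOIN customers c ON o.customerId = c.customerId
            |GROUP BY c.region""".stripMargin)

        perRegion.show()
        spark.stop()
      }
    }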

Environment: Spark, Spark Streaming, Spark SQL, Redshift, Python, DynamoDB, HDFS, Hive, Pig, Apache Kafka, Sqoop, Scala, shell scripting, Linux, Jenkins, Eclipse, Git, Oozie, Talend, SOAP, Nagios, Agile methodology.

Confidential, NJ

Hadoop Engineer

Responsibilities:

  • Worked on migrating MapReduce programs, initially done in Python, into Spark transformations using Spark and Scala.
  • Developed Spark jobs using Scala on top of Yarn/MRv2 for interactive and Batch Analysis.
  • Experienced in querying data using Spark SQL on top of Spark engine for faster data sets processing.
  • Extensive use of Elastic Load Balancing mechanism with Auto Scaling feature to scale the capacity of EC2 Instances across multiple availability zones in a region to distribute incoming high traffic for the application with zero downtime.
  • Created Partitioned Hive tables and worked on them using HiveQL.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Used the DataFrame and Dataset APIs to perform analysis on Hive tables (a sketch follows this list).
  • Experience monitoring the Hadoop cluster using Cloudera Manager, interacting with Cloudera support, logging issues in the Cloudera portal and fixing them per the recommendations.
  • Experience with Cloudera Hadoop upgrades and patches and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
  • Used Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
  • Worked with the continuous integration tool Jenkins and automated end-of-day JAR builds.
  • Developed Unix shell scripts to load large number of files into HDFS from Linux File System.
  • Experience setting up the application stack and debugging Logstash to send Apache logs to AWS Elasticsearch.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Used Impala connectivity from the user interface (UI) and queried the results using Impala SQL.
  • Used Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
  • Worked in Agile development environment having KANBAN methodology. Actively involved in daily Scrum and other design related meetings.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager and the web UI.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Managed and scheduled several jobs to run over time on the Hadoop cluster using Oozie.
  • Supported setting up and updating the QA environment for implementing scripts with Pig, Hive and Sqoop.
  • Followed Agile Methodology for entire project and supported testing teams.
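
A minimal sketch of the partitioned-Hive-table pattern mentioned above, expressed through Spark's Hive support. The database, table, columns and partition value (analytics.page_views, load_date) are illustrative assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.desc

    object PartitionedHiveSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("PartitionedHiveSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned Hive table; partitioning by load_date limits scans to relevant directories
        spark.sql(
          """CREATE TABLE IF NOT EXISTS analytics.page_views (
            |  user_id STRING,
            |  url STRING,
            |  duration_ms BIGINT)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // DataFrame/Dataset-style analysis on the same table; the filter prunes partitions
        val views = spark.table("analytics.page_views")
        val topUrls = views
          .filter(views("load_date") === "2018-01-15")   // illustrative partition value
          .groupBy("url")
          .count()
          .orderBy(desc("count"))
          .limit(10)

        topUrls.show()
        spark.stop()
      }
    }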

Environment: Hadoop, HDFS, Hive, MapReduce, AWS EC2, Impala, Sqoop, Spark, SQL, Talend, Python, PySpark, YARN, Pig, Oozie, Linux (Ubuntu), Scala, Tableau, Maven, Jenkins, Cloudera, JUnit, Agile methodology.

Confidential

Big Data Hadoop Consultant

Responsibilities:

  • Experienced in migrating and transforming large sets of structured, semi-structured and unstructured raw data from HBase through Sqoop and placing it in HDFS for processing.
  • Written multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Loaded data into HBase using both bulk load and non-bulk load (a non-bulk-load sketch appears after this list).
  • Wrote Java programs to retrieve data from HDFS and provide it to REST services.
  • Used Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
  • Implemented partitioning, bucketing in Hive for better organization of the data.
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Created HBase tables, HBase sinks and loaded data into them to perform analytics using Tableau.
  • Installed, configured and maintained Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Created multiple Hive tables and ran Hive queries on that data; implemented partitioning, dynamic partitioning and bucketing in Hive for efficient data access.
  • Experience creating, dropping and altering tables at runtime without blocking, using HBase and Hive.
  • Experienced in running batch processes using Pig Latin Scripts and developed Pig UDFs for data manipulation according to Business Requirements.
  • Hands on experience in Developing optimal strategies for distributing the web log data over the cluster, importing and exporting of stored web log data into HDFS and Hive using Sqoop.
  • Continuously monitored and managed the Hadoop cluster using Cloudera manager and Web UI.
  • Implemented Partitioning and bucketing in Hive.
  • Managed and scheduled several jobs to run over a time on Hadoop cluster using Oozie.
  • Used MAVEN for building jar files of MapReduce programs and deployed to cluster.
  • Involved in final reporting data using Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector.
  • Configured Flume to extract data from web server output files and load it into HDFS.
  • Performed cluster tasks like adding and removing nodes without any effect on running jobs.
  • Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
  • Helped in design of Scalable Big Data Clusters and solutions and involved in defect meetings.
  • Followed Agile Methodology for entire project and supported testing teams.
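
A hedged sketch of the non-bulk HBase load mentioned above, written in Scala against the standard HBase client API. The table name, column family, column qualifiers and row-key layout are assumptions for illustration only:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    object HBasePutSketch {
      def main(args: Array[String]): Unit = {
        // Cluster settings are picked up from hbase-site.xml on the classpath
        val conf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)
        try {
          // Assumes a pre-created table 'web_logs' with column family 'cf'
          val table = connection.getTable(TableName.valueOf("web_logs"))

          // Example row key: reversed domain plus timestamp to spread writes across regions
          val put = new Put(Bytes.toBytes("com.example.www|20150115093000"))
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("status"), Bytes.toBytes("200"))
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("bytes"), Bytes.toBytes("5120"))
          table.put(put)

          table.close()
        } finally {
          connection.close()
        }
      }
    }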

Environment: Apache Hadoop, MapReduce, HDFS, HBase, CentOS 6.4, Unix, REST web Services, Hive, Pig, Oozie, JSON, Eclipse, QlikView, Qlik Sense, Jenkins, Maven, Sqoop.

Confidential

Hadoop Developer

Responsibilities:

  • Involved in requirement analysis, design, coding and implementation.
  • Used Sqoop to ingest data from various sources such as Teradata, Oracle and SQL Server.
  • Worked on logging framework to log application level data into Hadoop for future analysis.
  • Responsible for data integrity checks, duplicate checks to make sure the data is not corrupted.
  • Wrote MapReduce, Spark, Hive and Pig jobs for processing and analyzing data.
  • Developed Python scripts to auto-generate HQL queries and reduce manual effort.
  • Wrote UDFs and UDAFs for extended functionality in Pig and Hive.
  • Worked on a data quality check framework for reporting to the business.
  • Scheduled workflows through Oozie and AutoSys schedulers. Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Experience with NoSQL databases like HBase as well as other ecosystems like Zookeeper, Oozie, Impala, Storm, AWS Redshift etc.
  • Installed Hadoop, MapReduce and HDFS on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Worked with scrum teams to achieve enhanced levels of Agile transformation
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Streamed data in real time using Spark with Kafka.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Built a real-time pipeline for streaming data using Kafka and Spark Streaming (see the sketch after this list).
  • Executed parameterized Pig, Hive, Impala and UNIX batches in production.
  • Big Data management in Hive and Impala (Table, Partitioning, ETL, etc.).
  • Wrote Python scripts for internal testing that read data from a file and push it into a Kafka queue, which in turn is consumed by the Storm application.
  • Worked on Kafka, Kafka-Mirroring to ensure that the data is replicated without any loss.
  • Developed Kafka producer and consumers, Spark and Hadoop MapReduce jobs.
  • Installed and configured Hive and written Hive UDFs.
  • Used Impala to determine statistical information about operational data.
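
A minimal sketch of the Kafka-to-Spark-Streaming pipeline referenced above, using the classic DStream API and the spark-streaming-kafka (0.8) direct connector. The broker list, topic name and message layout are placeholders, not details of the actual production pipeline:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaStreamingSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaStreamingSketch")
        val ssc = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches

        // Placeholder brokers and topic
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val topics = Set("click-events")

        // Direct (receiver-less) Kafka stream; each record is a (key, value) pair
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        // Simple per-batch aggregation: count events by the first comma-separated field
        stream.map(_._2)
          .map(line => (line.split(",")(0), 1L))
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }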

Environment: CDH 5.4.5, MapReduce, Spark, Avro, Parquet, Hive, Java (JDK 1.7), Python, Teradata, SQL Server, Oozie, AutoSys.

Confidential

Software Developer

Responsibilities:

  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC)
  • Reviewed the functional, design, source code and test specifications
  • Involved in developing the complete front end using JavaScript and CSS
  • Author for Functional, Design and Test Specifications.
  • Developed web components using JSP, Servlets and JDBC
  • Designed tables and indexes
  • Designed, Implemented, Tested and Deployed Enterprise Java Beans both Session and Entity using WebLogic as Application Server
  • Developed stored procedures, packages and database triggers to enforce data integrity. Performed data analysis and created crystal reports for user requirements
  • Implemented the presentation layer with HTML, XHTML and JavaScript
  • Implemented Backend, Configuration DAO, XML generation modules of DIS
  • Analyzed, designed and developed the component
  • Used JDBC for database access
  • Used Spring Framework for developing the application and used JDBC to map to Oracle database.
  • Used Data Transfer Object (DTO) design patterns
  • Unit testing and rigorous integration testing of the whole application
  • Written and executed the Test Scripts using JUNIT

Environment: JSP, XML, Spring Framework, Hibernate, Eclipse (IDE), Microservices, JavaScript, Struts, Tiles, Ant, PL/SQL, Windows, UNIX, SOAP, Jasper Reports.
