
Hadoop Developer Resume


Peoria, IL

SUMMARY

  • Overall 6 years of IT experience across a variety of industries, including hands-on experience as a Hadoop developer.
  • Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Flume, Spark, HBase, YARN, Oozie, Kafka, and ZooKeeper.
  • Excellent knowledge of Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Strong experience writing applications using Python, Scala, and MySQL.
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
  • Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase.
  • Experienced in writing complex MapReduce programs that work with different file formats, including Text, SequenceFile, XML, Parquet, and Avro.
  • Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experience in migrating data between HDFS and relational database systems using Sqoop.
  • Extensive experience importing and exporting data using ingestion platforms like Flume.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Excellent Java development skills using J2SE, J2EE, and web services.
  • Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
  • Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Good knowledge of cloud computing with Amazon Web Services offerings such as EC2 and S3, which provide fast, efficient processing and storage for big data.
  • Experience developing data pipelines that use Kafka to store data into HDFS.
  • Strong experience and knowledge of real-time data analytics using Spark Streaming, Kafka, and Flume.
  • Developed multiple Kafka producers and consumers per the software requirement specifications (see the sketch after this list).
  • Experienced in using the Spark application master to monitor Spark jobs and capture their logs.
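
As an illustration of the producer/consumer work described above, here is a minimal sketch using the kafka-python client; the broker address, topic name, and payload shape are assumptions for the example, not details from the original projects.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Producer: serialize dict payloads as JSON and publish to a topic.
    producer = KafkaProducer(
        bootstrap_servers="broker1:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"))
    producer.send("events", {"user": "u123", "action": "login"})
    producer.flush()

    # Consumer: read the topic from the beginning and deserialize.
    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="broker1:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")))
    for message in consumer:
        print(message.value)  # e.g. {'user': 'u123', 'action': 'login'}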

TECHNICAL SKILLS

Big Data/Hadoop: HDFS, MapReduce, ZooKeeper, Hive, Pig, Sqoop, Flume, Oozie, Spark, HBase, and Apache Kafka

Cloud Computing: Amazon Web Services.

Languages: Java/J2EE, Python, Scala

Database: Oracle (SQL & PL/SQL), MySQL, HBase.

IDE: Eclipse

XML Related and Others: XML, DTD, XSD, XSLT, JAXB, JAXP, CSS, AJAX, JavaScript.

PROFESSIONAL EXPERIENCE

Confidential, Peoria, IL

Hadoop Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
  • Managed the fully distributed Hadoop cluster as an additional responsibility; trained to take over Hadoop administrator duties, including cluster management, upgrades, and installation of Hadoop ecosystem tools.
  • Developed Spark applications using Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small-data-set processing and storage, and maintained the Hadoop cluster on AWS EMR.
  • Developed highly optimized Spark applications to perform data cleansing, validation, transformation and summarization activities
  • Created Sqoop scripts to import/export data between RDBMS and the S3 data store.
  • Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
  • Installed and configured ZooKeeper to coordinate and monitor cluster resources.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on proofs of concept with Apache Spark, using Scala to introduce Spark into the project.
  • Consumed data from Kafka using Apache Spark.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Involved in loading data from the Linux file system into HDFS.
  • Imported and exported data between HDFS and Hive using Sqoop.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive (see the first sketch after this list).
  • Provided daily production support, monitoring and troubleshooting Hadoop/Hive jobs.
  • Created HBase tables to load large sets of semi-structured data coming from various sources.
  • Extended Hive and Pig core functionality with custom User-Defined Functions (UDFs), User-Defined Table-Generating Functions (UDTFs), and User-Defined Aggregate Functions (UDAFs) written in Python.
  • Scheduled multiple Hive and Pig jobs through the Oozie workflow engine.
  • Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
  • Loaded data files from various external sources into staging areas in MySQL databases.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis and meet business requirements.
  • Good experience handling data manipulation using Python scripts.
  • Involved in developing, building, testing, and deploying to the Hadoop cluster in distributed mode.
  • Created Linux shell Scripts to automate the daily ingestion of IVR data
  • Created HBase tables to store various data formats of incoming data from different portfolios.
  • Configured Spark Streaming to receive ongoing data from Kafka and store it in HDFS (see the second sketch after this list).
  • Used various Spark transformations and actions to cleanse the input data.
  • Optimized HiveQL and Pig scripts by using Spark as the execution engine.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
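
First, a minimal sketch of the Hive partitioning work described in this list, expressed through PySpark's Hive support; the table, column, and staging names are illustrative assumptions. Bucketing would additionally be declared in the DDL with CLUSTERED BY ... INTO n BUCKETS.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Dynamic partition inserts must be switched on explicitly.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    # Target table partitioned by call date (names are illustrative).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS calls_by_day (
            caller_id STRING,
            duration  INT
        )
        PARTITIONED BY (call_date STRING)
        STORED AS PARQUET
    """)

    # Hive routes each row to its call_date partition at insert time.
    spark.sql("""
        INSERT OVERWRITE TABLE calls_by_day PARTITION (call_date)
        SELECT caller_id, duration, call_date
        FROM staging_calls
    """)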
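Second, a sketch of the Kafka-to-HDFS streaming path, using the DStream-era KafkaUtils API (available in PySpark through Spark 2.4); the broker list, topic, and output path are assumptions.

    from pyspark import SparkContext
    from pyspark.sql import SparkSession
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-to-hdfs-sketch")
    ssc = StreamingContext(sc, 60)  # one micro-batch per minute

    # Direct stream: Kafka offsets are tracked by Spark itself.
    stream = KafkaUtils.createDirectStream(
        ssc, ["ivr-events"], {"metadata.broker.list": "broker1:9092"})

    def save_batch(rdd):
        if rdd.isEmpty():
            return
        spark = SparkSession.builder.getOrCreate()
        # Messages arrive as (key, value) pairs; keep the value payload.
        df = spark.createDataFrame(rdd.map(lambda kv: (kv[1],)), ["raw_event"])
        df.write.mode("append").parquet("hdfs:///data/ivr/raw")

    stream.foreachRDD(save_batch)
    ssc.start()
    ssc.awaitTermination()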

Environment: Hadoop, HDFS, Pig, Apache Hive, Sqoop, Apache Spark, Kafka, shell scripting, HBase, Python, ZooKeeper, MySQL.

Confidential

Hadoop Developer

Responsibilities:

  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Configured Sqoop Jobs to import data from RDBMS into HDFS using Oozie workflows.
  • Involved in creating Hive internal and external tables, loading data, and writing Hive queries that run internally as MapReduce jobs (see the sketch after this list).
  • Created batch analysis job prototypes using Hadoop, Pig, Oozie and Hive.
  • Assisted with data capacity planning and node forecasting.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Analyzed system failures, identified root causes, and recommended corrective actions.
  • Responsible for technical ecosystem (software, data, interfaces, integration) and strategic roadmaps.
  • Responsible for designing and developing modern, cross-browser compatible user interfaces.
  • Researching and evaluation of new tools and technologies to solve business problems.
  • Translated business concepts into technical implementations to drive alignment and decision making.
  • Worked on a geographically dispersed team, embracing Agile and DevOps strategies and driving their adoption to enable greater technology and business value.
  • Used programming tools and techniques effectively and efficiently.
  • Mentored others while continually developing new skills.
  • Stayed open to new ideas and technologies, with a strong desire to learn.
  • Applied Agile development methodologies and tools to iterate quickly on product changes, developing user stories and working through the backlog (XP, Continuous Integration, and JIRA).
  • Engaged subject matter experts and translated business goals into actionable solutions.
  • Worked effectively with business and technical teams.
  • Identified and drove an aligned technical direction with all stakeholders.
  • Met deadlines, goals, and objectives.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
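
To illustrate the internal (managed) versus external Hive table distinction mentioned in this list, here is a minimal sketch via PySpark's Hive support; the table names, columns, and HDFS location are illustrative assumptions.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-table-types-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Managed (internal) table: Hive owns metadata AND data, so
    # DROP TABLE also deletes the files under the warehouse directory.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS orders_managed (
            order_id STRING,
            amount   DOUBLE
        ) STORED AS ORC
    """)

    # External table: Hive owns only the metadata; DROP TABLE leaves
    # the HDFS files intact, which suits Sqoop-landed, shared data.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext (
            order_id STRING,
            amount   DOUBLE
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION 'hdfs:///data/landing/orders'
    """)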

Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Oozie, Java, SQL, Agile, and JIRA.

Confidential, MA

Hadoop/Spark Developer

Responsibilities:

  • Gathered and analyzed requirements from the business users and translated the business requirements into conceptual and detailed technical design
  • Bulk-imported data from various data sources into Hadoop 2.5.2 and transformed it in flexible ways using Apache NiFi 0.2.1, Kafka 2.0.x, Flume 1.6.0, and Storm 0.9.x
  • Developed MapReduce programs to extract and transform data sets, with the resulting data loaded into Cassandra, and vice versa, using Kafka 2.0.x
  • Used Spark API 1.4.x over Cloudera Hadoop YARN 2.5.2 to perform analytics on data in Hive
  • Explored Spark 1.4.x, improving the performance and optimization of existing algorithms in Hadoop 2.5.2 using SparkContext, SparkSQL, and DataFrames (see the sketch after this list)
  • Implemented Batch processing of data sources using Apache Spark 1.4.x
  • Developed analytical components using Spark 1.4.x, Scala 2.10.x and Spark Stream
  • Imported data from different sources such as HDFS and HBase 0.94.27 into Spark RDDs
  • Developed Spark scripts using Scala shell commands as per requirements
  • Developed numerous MapReduce jobs in Scala 2.10.x for data cleansing, and analyzed data in Impala 2.1.0
  • Loaded and extracted the data using Sqoop 1.4.6 from Oracle 12.1.0.1 into HDFS
  • Worked on implementation and maintenance of Cloudera Hadoop cluster
  • Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines
  • Produced quality technical documentation for operating and maintaining Hadoop clusters, complex configuration management, and architecture changes
  • Used Jira 6.4 for project tracking, Bug tracking and Project Management
  • Participated in Scrum calls, grooming sessions, and demo meetings
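
The resume names Scala 2.10.x for this role; for consistency with the other sketches, here is a hedged PySpark equivalent of querying Hive through SparkSQL and aggregating with the DataFrame API (the table and column names are assumptions).

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-sparksql-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read a Hive table through SparkSQL ...
    calls = spark.sql("SELECT region, duration FROM telemetry.calls")

    # ... and aggregate with the DataFrame API.
    summary = (calls
               .groupBy("region")
               .agg(F.avg("duration").alias("avg_duration"),
                    F.count("*").alias("n_calls")))
    summary.show()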

Environment: Hadoop 2.5.2, HDFS, Spark 1.4.x, MapReduce, Impala 2.1.0, Sqoop 1.4.6, Nifi 0.2.1, Kafka 2.0.x, Flume 1.6.0, Storm 0.9.x, HBase 0.94.27, Scala 2.10.x, Cloudera CDH 4.7.1, Oracle 12.1.0.1, Scrum, JIRA 6.4

Confidential

Hadoop Engineer

Responsibilities:

  • Analyzed large data sets and derived customer usage patterns by developing new MapReduce programs.
  • Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
  • Created combiners, partitioners, and distributed cache to improve the performance of MapReduce jobs.
  • Developed Shell Script to perform data profiling on the ingested data with the help of HIVE Bucketing.
  • Debugged and optimized Hive scripts and implemented deduplication logic in Hive using a rank function (UDF) (see the sketch after this list).
  • Experienced in writing Hive validation scripts used in a validation framework for daily analysis, with results presented to business users as graphs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig and Hive.
  • Imported all the customer specific personal data to Hadoop using Sqoop component from various relational databases like Netezza and Oracle.
  • Used Impala to read, write and query the Hadoop data in HDFS and HBase.
  • Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
  • Developed test scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
  • Streamed log data using Flume and performed data analytics using Hive.
  • Extracted the data from RDBMS (Oracle, MySQL & Teradata) to HDFS using Sqoop.
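
A sketch of the deduplication-by-rank logic described above, run here through PySpark's Hive support; the table and column names are illustrative assumptions, and each customer_id keeps only its most recent row.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-dedup-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Rank rows per business key, newest first, and keep rank 1 only
    # (the target table customers_clean is assumed to exist).
    spark.sql("""
        INSERT OVERWRITE TABLE customers_clean
        SELECT customer_id, name, updated_at
        FROM (
            SELECT customer_id, name, updated_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id
                       ORDER BY updated_at DESC) AS rn
            FROM customers_raw
        ) ranked
        WHERE rn = 1
    """)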

Environment: Hadoop, MapReduce, HDFS, Pig, HiveQL, HBase, ZooKeeper, Oozie, Flume, Impala, Cloudera, MySQL, UNIX shell scripting, Tableau, Python, Spark.
