
Hadoop Developer Resume


Deerfield, IL

SUMMARY

  • Overall 6 years of IT experience across a variety of industries, including hands-on experience as a Hadoop developer.
  • Expertise with tools in the Hadoop ecosystem including Pig, Hive, HDFS, MapReduce, Sqoop, Flume, Spark, HBase, YARN, Oozie, and ZooKeeper.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle (a brief sketch of such a comparison follows this summary).
  • Strong experience in writing applications using Python, Scala, and MySQL.
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
  • Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases like HBase.
  • Experienced in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, XML, Parquet, and Avro.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experience in migrating data using Sqoop from HDFS to relational database systems and vice versa.
  • Extensive experience importing and exporting data using data ingestion tools such as Flume.
  • Very good experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
  • Excellent Java development skills using J2EE, J2SE, and web services.
  • Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
  • Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Worked in large and small teams on systems requirements, design, and development.
  • Prepared standard coding guidelines and analysis and testing documentation.
  • Experience working with Hadoop in standalone, pseudo-distributed, and fully distributed modes.
  • Good knowledge of cloud computing with Amazon Web Services such as EC2 and S3, which provide fast and efficient processing of big data.
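
The following is a minimal sketch of the kind of job used for the Spark-versus-Hive comparison mentioned above. It is shown in PySpark for brevity (the project work used Scala), and the database, table, and column names are hypothetical.

```python
# Minimal sketch of a Spark SQL job used to compare Spark against an equivalent
# Hive query. Database, table, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("spark-vs-hive-comparison")
         .enableHiveSupport()          # query the same tables the Hive job reads
         .getOrCreate())

# The same aggregation is run in Hive; comparing wall-clock times for the two
# engines gives the performance comparison.
result = spark.sql("""
    SELECT region, COUNT(*) AS order_cnt, SUM(amount) AS revenue
    FROM sales.orders
    GROUP BY region
""")
result.show()
```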

TECHNICAL SKILLS

Big Data/Hadoop: HDFS, MapReduce, ZooKeeper, Hive, Pig, Sqoop, Flume, Oozie, Spark, HBase, and Apache Kafka

Cloud Computing: Amazon Web Services.

Programming Languages: Java/J2EE, Python, Scala, MySQL

Databases: Oracle (SQL & PL/SQL), MySQL, HBase.

IDE: Eclipse

XML Related and Others: XML, DTD, XSD, XSLT, JAXB, JAXP, CSS, AJAX, JavaScript.

PROFESSIONAL EXPERIENCE

Confidential, Deerfield, IL

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
  • Managed a fully distributed Hadoop cluster as an additional assigned responsibility.
  • Trained to take over the responsibilities of a Hadoop administrator, including managing the cluster, performing upgrades, and installing tools in the Hadoop ecosystem.
  • Worked on installing and configuring ZooKeeper to coordinate and monitor cluster resources.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on POCs with Apache Spark using Scala to introduce Spark into the project.
  • Consumed data from Kafka using Apache Spark (see the sketch after this list).
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Involved in loading data from the Linux file system into HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
  • Worked on creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) written in Python (an example streaming script appears at the end of this section).
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Responsible for loading data files from various external sources such as MySQL into a staging area in MySQL databases.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Actively involved in code review and bug fixing for improving the performance.
  • Good experience in handling data manipulation using Python scripts.
  • Involved in developing, building, testing, and deploying to the Hadoop cluster in distributed mode.
  • Created Linux shell scripts to automate the daily ingestion of IVR data.
  • Created HBase tables to store various data formats of incoming data from different portfolios.
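
A minimal sketch of the Kafka consumption mentioned above, using Spark Structured Streaming; the broker address, topic, and HDFS paths are placeholders, and the job assumes the spark-sql-kafka connector is available at submit time.

```python
# Sketch of consuming Kafka messages with Spark Structured Streaming and landing
# them in HDFS. Broker, topic, and paths are placeholders; requires the
# spark-sql-kafka connector package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
       .option("subscribe", "ivr_events")                   # placeholder topic
       .load())

# Kafka delivers key/value as binary; cast the value to a readable string.
events = raw.select(col("value").cast("string").alias("payload"))

query = (events.writeStream
         .format("parquet")
         .option("path", "/data/landing/ivr")               # placeholder HDFS dir
         .option("checkpointLocation", "/checkpoints/ivr")
         .start())
query.awaitTermination()
```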

Environment: Hadoop, HDFS, Pig, Apache Hive, Sqoop, Apache Spark, Shell Scripting, HBase, Python, Zookeeper, MySQL.
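
Below is a minimal example of the kind of Python script used to extend Hive through its streaming TRANSFORM mechanism, as referenced in the UDF bullet above. The column layout and the bucketing rule are hypothetical.

```python
#!/usr/bin/env python
# Minimal Hive TRANSFORM (streaming) script. Hive pipes each input row to stdin
# as tab-separated fields and reads tab-separated rows back from stdout.
# The columns and the duration rule here are hypothetical.
import sys

for line in sys.stdin:
    call_id, duration = line.rstrip("\n").split("\t")
    # Derive a simple duration bucket as the computed output column.
    bucket = "long" if int(duration) > 300 else "short"
    print("\t".join([call_id, bucket]))

# Invoked from Hive roughly as:
#   ADD FILE duration_bucket.py;
#   SELECT TRANSFORM(call_id, duration)
#   USING 'python duration_bucket.py' AS (call_id, duration_bucket)
#   FROM calls;
```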

Confidential, TX

Hadoop Developer

Responsibilities:

  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Configured Sqoop Jobs to import data from RDBMS into HDFS using Oozie workflows.
  • Involved in creating Hive internal and external tables, loading data, and writing Hive queries, which run internally as MapReduce jobs (a sketch follows this list).
  • Created batch analysis job prototypes using Hadoop, Pig, Oozie and Hive.
  • Assisted with data capacity planning and node forecasting.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Documented system processes and procedures for future reference.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Performed CRUD operations in HBase (a sketch appears at the end of this section).
  • Developed Hive queries to process the data.
  • Monitored and tuned Hadoop cluster performance, screened cluster job performance for capacity planning, monitored cluster connectivity and security, and managed and reviewed Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
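
A minimal sketch of the internal versus external Hive table pattern referenced above. It is issued here through Spark's Hive support so the example is self-contained; in the project the same HiveQL ran directly in Hive, and the database names, paths, and columns are hypothetical.

```python
# Sketch of creating a Hive external table over raw HDFS files and a managed
# table built from it. Names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-tables")
         .enableHiveSupport()
         .getOrCreate())

# External table: Hive tracks only metadata; dropping it leaves the HDFS files intact.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders_raw (
        order_id STRING, amount DOUBLE, order_date STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/landing/orders'
""")

# Managed (internal) table: Hive owns both metadata and data.
spark.sql("""
    CREATE TABLE IF NOT EXISTS curated.orders
    STORED AS PARQUET
    AS SELECT order_id, amount, order_date FROM staging.orders_raw
""")
```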

Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, Linux, Cluster Management
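
The HBase CRUD work noted above could look roughly like the following from Python. The original work does not name a client library, so this assumes the happybase package and a running HBase Thrift gateway; the table, column family, and row keys are hypothetical.

```python
# Rough sketch of HBase CRUD from Python. Assumes the happybase client and an
# HBase Thrift gateway; table, column family, and row keys are hypothetical.
import happybase

connection = happybase.Connection("hbase-thrift-host")   # placeholder host
table = connection.table("customer_profiles")

# Create / update: a put writes (or overwrites) cells for a row key.
table.put(b"cust#1001", {b"info:name": b"Jane Doe", b"info:segment": b"retail"})

# Read: fetch one row, then scan a key range.
row = table.row(b"cust#1001")
for key, data in table.scan(row_prefix=b"cust#"):
    print(key, data)

# Delete: remove the whole row (pass columns= to drop specific cells instead).
table.delete(b"cust#1001")
```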

Confidential

Hadoop Engineer

Responsibilities:

  • Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
  • Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
  • Worked on creating combiners, partitioners, and distributed cache to improve the performance of MapReduce jobs.
  • Developed shell scripts to perform data profiling on the ingested data with the help of Hive bucketing.
  • Responsible for debugging and optimizing Hive scripts and implementing de-duplication logic in Hive using a rank key function (UDF); a sketch follows this list.
  • Experienced in writing Hive validation scripts used in a validation framework (for daily analysis through graphs presented to business users).
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed workflow in Oozie to automate the tasks of loading data into HDFS and pre - processing with Pig and Hive.
  • Imported all the customer specific personal data to Hadoop using Sqoop component from various relational databases like Netezza and Oracle.
  • Used Impala to read, write and query the Hadoop data in HDFS and HBase.
  • Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
  • Developed testing scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
  • Experience in streaming log data using Flume and data analytics using Hive.
  • Extracted data from RDBMS (Oracle, MySQL & Teradata) into HDFS using Sqoop.
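
A minimal sketch of the Hive de-duplication logic mentioned above, using a ranking window function to keep the latest record per key. It is issued through Spark here so the example is self-contained; the table, key, and timestamp columns are hypothetical.

```python
# Sketch of de-duplication with a ranking window function: keep the newest
# record per customer_id and drop older duplicates. Names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-dedup")
         .enableHiveSupport()
         .getOrCreate())

deduped = spark.sql("""
    SELECT customer_id, name, updated_at
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY updated_at DESC) AS rn
        FROM staging.customers t
    ) ranked
    WHERE rn = 1          -- rn > 1 marks the older duplicates
""")
deduped.write.mode("overwrite").saveAsTable("curated.customers_deduped")
```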

Environment: Hadoop, MapReduce, HDFS, Pig, HiveQL, HBase, Zookeeper, Oozie, Flume, Impala, Cloudera, MySQL, UNIX Shell Scripting, Tableau, Python, Spark.
