
Hadoop Developer Resume


IL

PROFESSIONAL SUMMARY

  • 7+ years of total IT experience, including 5+ years of experience in Hadoop and Big Data.
  • Experience in Apache Hadoop ecosystem components like HDFS, Map Reduce, Pig, Hive, Impala, HBase, SQOOP, Flume, and Oozie.
  • Improved the performance of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (a minimal DataFrame sketch follows this list).
  • Wrote multiple MapReduce programs in Python for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
  • Experience working across the Hadoop ecosystem, including installing and configuring the Hortonworks (HDP) distribution and the Cloudera distribution (CDH3 and CDH4).
  • Experience with the NoSQL databases HBase, MongoDB, and Cassandra.
  • Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
  • Extensive experience with SQL, PL/SQL, and database concepts.
  • Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
  • Extracted and processed streaming log data from various sources and integrated it into HDFS using Flume.
  • Experience in writing Pig and Hive scripts and extending core functionality by writing custom UDFs.
  • Extensive experience with Agile Development, Object Modeling using UML.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers. Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
  • In-depth knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
  • Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.
  • Experienced with the build tools Maven and ANT and the logging tool Log4j.
  • Experience working with the Eclipse and NetBeans IDEs.
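A minimal sketch of the kind of Spark SQL / DataFrame aggregation referenced above, written in Scala; the HDFS paths and column names are hypothetical and stand in for the actual project data.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object EventAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("EventAggregation").getOrCreate()

        // Read semi-structured JSON logs from HDFS into a DataFrame (hypothetical path)
        val events = spark.read.json("hdfs:///data/raw/events/")

        // Aggregate event counts per customer per day using Spark SQL functions
        val daily = events
          .groupBy(col("customer_id"), to_date(col("event_ts")).as("event_date"))
          .agg(count("*").as("event_count"))

        // Persist the curated result back to HDFS
        daily.write.mode("overwrite").parquet("hdfs:///data/curated/daily_events/")
        spark.stop()
      }
    }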

TECHNICAL SKILLS

Big Data/Hadoop: Hadoop 2.7/2.5, HDFS 1.2.4, MapReduce, Hive, Pig, Sqoop, Oozie, Hue

NoSQL Databases: HBase, MongoDB 3.2, Cassandra

Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, Unix Shell Scripting, Scala

IDEs and Tools: Eclipse 4.6, NetBeans 8.2

Databases: Oracle 12c/11g, MySQL, SQL Server 2016/2014

Operating Systems: Windows 8/7, UNIX/Linux, and Mac OS

Other Tools: Maven, ANT, WSDL, SOAP, REST.

Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, UML, Design Patterns (Core Java and J2EE)

PROFESSIONAL EXPERIENCE

Confidential, IL

Hadoop Developer

Responsibilities:

  • The objective of this project was to build a data lake as a cloud-based solution on HDFS using Apache Spark.
  • The platform covered analytical solutions, billing solutions, product building, notifications, and paper-to-digital conversion.
  • Helped with team management and played an important part in building and growing the team.
  • Developed Spark applications using Scala and Spark-SQL/Streaming for faster processing of data.
  • Created Hive external tables to stage data, then moved the data from staging into the main tables.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Developed scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation and queries, writing data back into the RDBMS through Sqoop.
  • Developed Spark code using Spark-SQL/Streaming for faster processing of data.
  • Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output response.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Performed file system management and monitoring of Hadoop log files.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Continuously tuned Hive queries and UDFs for faster execution by employing partitioning and bucketing.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive (a minimal sketch follows this list).
  • Used Flume and Sqoop to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
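A minimal sketch, under the Spark 2.x with Hive support setup described above, of moving staged data into a dynamically partitioned Hive table; the database, table, and column names are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, to_date}

    object StageToMain {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("StageToMain")
          .enableHiveSupport()          // required to read and write Hive tables
          .getOrCreate()

        // Allow dynamic partitioning for the insert into the main table
        spark.sql("SET hive.exec.dynamic.partition = true")
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

        // Pull staged data from the data lake (a Hive external table over HDFS)
        val staged = spark.table("staging.transactions")    // hypothetical table

        // Basic massaging: drop malformed rows and derive the partition column
        val cleaned = staged
          .filter(col("amount").isNotNull)
          .withColumn("txn_date", to_date(col("txn_ts")))

        // Insert into the partitioned main table; txn_date drives the partitions
        cleaned.createOrReplaceTempView("cleaned_txns")
        spark.sql("""
          INSERT OVERWRITE TABLE warehouse.transactions PARTITION (txn_date)
          SELECT customer_id, amount, txn_ts, txn_date FROM cleaned_txns
        """)
        spark.stop()
      }
    }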

Environment: Pig, Sqoop, Kafka, Oozie, Cloudera, AWS, Apache Hadoop, HDFS, Hive, MapReduce, MySQL, Eclipse, PL/SQL, Git.

Confidential, Bentonville, Arkansas

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, and Sqoop.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application.
  • Responsible for the design and development of Spark SQL scripts based on functional specifications.
  • Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
  • Developed simple to complex MapReduce-style jobs in Spark using Scala and Java (a minimal Scala sketch follows this list).
  • Developed data pipeline using Flume, Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Worked on importing and exporting data between HDFS and Oracle using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Wrote Hive and Pig scripts per requirements and automated the workflow using shell scripts.
  • Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
  • Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
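A minimal sketch of a simple MapReduce-style job expressed with Spark RDDs in Scala, as referenced above; the input path and the assumption that the third field holds the shipment destination are illustrative only.

    import org.apache.spark.sql.SparkSession

    object ShipmentCounts {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ShipmentCounts").getOrCreate()
        val sc = spark.sparkContext

        // Map phase: parse each comma-separated line and emit (destination, 1)
        val pairs = sc.textFile("hdfs:///data/logistics/shipments.csv")
          .map(_.split(","))
          .filter(_.length >= 3)              // skip malformed rows
          .map(fields => (fields(2), 1))      // field 2 = destination (assumed layout)

        // Reduce phase: sum the counts per destination
        val counts = pairs.reduceByKey(_ + _)

        counts.saveAsTextFile("hdfs:///data/logistics/shipment_counts")
        spark.stop()
      }
    }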

Environment: Apache Hadoop, Hive, ZooKeeper, MapReduce, Sqoop, Crunch API, Pig, HCatalog, UNIX, Java, Oracle, SQL Server, MySQL, Oozie, Python.

Confidential, Atlanta, GA

Hadoop Developer/Admin

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase, and Sqoop.
  • Worked with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a High Availability cluster and integrating Hive with existing applications.
  • Designed and developed a flattened view (merged and flattened dataset) by de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream consumers.
  • Worked on NoSQL (HBase) to support enterprise production, loading data into HBase using Impala and Sqoop.
  • Ran multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Worked on AWS provisioning EC2 Infrastructure and deploying applications in Elastic load balancing.
  • Handled importing of data from various data sources, performed transformations using Hive, PIG, and loaded data into HDFS.
  • Created tables in HBase to store variable data formats of PII data coming from different portfolios.
  • Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
  • Worked on moving data between HDFS and relational database systems using Sqoop, including ongoing maintenance and troubleshooting.
  • Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Worked on importing data from HDFS to a MySQL database and vice versa using Sqoop.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Performance tuning of Hive queries, MapReduce programs for different applications.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala.
  • Integrated Kafka with Spark Streaming for high-throughput, reliable processing (a minimal sketch follows this list).
  • Worked on Apache Flume to collect and aggregate large amounts of log data and stored it on HDFS for further analysis.
  • Worked on tuning Hive and Pig scripts to improve performance and resolved performance issues in both.
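A minimal sketch of the Kafka and Spark Streaming integration mentioned above, using the spark-streaming-kafka-0-10 connector; the broker addresses, topic name, consumer group, and output path are hypothetical.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    object ClickstreamConsumer {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ClickstreamConsumer")
        val ssc = new StreamingContext(conf, Seconds(10))   // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092,broker2:9092",  // hypothetical brokers
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "clickstream-consumers",
          "auto.offset.reset" -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

        // Count records per micro-batch and persist the counts to HDFS
        stream.map(_.value)
              .countByValue()
              .saveAsTextFiles("hdfs:///data/streams/clickstream_counts")

        ssc.start()
        ssc.awaitTermination()
      }
    }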

Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Oracle 12c, Flume, Oozie, HBase, Impala, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera, Bitbucket.

Confidential, Columbus, OH

Big Data Engineer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, and Sqoop.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application.
  • Responsible for the design and development of Spark SQL scripts based on functional specifications.
  • Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (a minimal producer sketch follows this list).
  • Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
  • Developed simple to complex MapReduce-style jobs in Spark using Scala and Java.
  • Successfully managed the Extraction, Transformation, and Loading (ETL) process, pulling large volumes of data from various sources, including MS Access and Excel, into a staging database using BCP.
  • Was responsible for detecting errors in the ETL operation and rectifying them.
  • Incorporated error redirection during ETL loads in SSIS packages.
  • Implemented various types of SSIS transformations in packages, including Aggregate, Fuzzy Lookup, Conditional Split, Row Count, and Derived Column.
  • Implemented the master-child package technique to manage large ETL projects efficiently.
  • Involved in unit testing and system testing of the ETL process.
  • Worked on importing and exporting data between HDFS and Oracle using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Wrote Hive and Pig scripts per requirements and automated the workflow using shell scripts.
  • Automated jobs for extracting data from different data sources such as MySQL and pushing the result sets into the Hadoop Distributed File System using the Oozie workflow scheduler.
  • Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
  • Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
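A minimal sketch of a Kafka producer, written in Scala against the standard org.apache.kafka.clients API as mentioned above; the broker address, topic, and record contents are hypothetical.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object OrderEventProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")    // hypothetical broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("acks", "all")                           // wait for full acknowledgement

        val producer = new KafkaProducer[String, String](props)
        try {
          // Publish one illustrative event; a real job would stream records from a source
          producer.send(new ProducerRecord[String, String]("order-events", "order-123", "shipped"))
          producer.flush()
        } finally {
          producer.close()
        }
      }
    }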

Environment: Apache Hadoop, Hive, ZooKeeper, MapReduce, ETL, Sqoop, Crunch API, Pig, HCatalog, UNIX, Java, Oracle, SQL Server, MySQL, Oozie, Python.

Confidential, Folsom, CA

Big Data Engineer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, and Sqoop.
  • Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application.
  • Responsible for the design and development of Spark SQL scripts based on functional specifications.
  • Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop.
  • Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (a minimal sketch follows this list).
  • Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
  • Developed simple to complex MapReduce-style jobs in Spark using Scala and Java.
  • Developed data pipeline using Flume, Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Worked on importing and exporting data between HDFS and Oracle using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Wrote Hive and Pig scripts per requirements and automated the workflow using shell scripts.
  • Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
  • Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
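A minimal sketch of creating and querying a Hive table, shown here through the Spark SQL entry point used elsewhere on this resume; the same HiveQL also runs from the Hive CLI or Beeline, where it executes as MapReduce jobs. The database, table, and column names are hypothetical.

    import org.apache.spark.sql.SparkSession

    object CargoReport {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CargoReport")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned Hive table over data already landed in HDFS by the ingest pipeline
        spark.sql("""
          CREATE TABLE IF NOT EXISTS logistics.cargo (
            cargo_id STRING, origin STRING, weight_kg DOUBLE)
          PARTITIONED BY (load_date STRING)
          STORED AS PARQUET
        """)

        // HiveQL reporting query: total weight shipped per origin for one load date
        val report = spark.sql("""
          SELECT origin, SUM(weight_kg) AS total_weight
          FROM logistics.cargo
          WHERE load_date = '2017-01-01'
          GROUP BY origin
        """)
        report.show()
        spark.stop()
      }
    }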

Environment: Apache Hadoop, Hive, ZooKeeper, MapReduce, Sqoop, Crunch API, Pig, HCatalog, UNIX, Java, Oracle, SQL Server, MySQL, Oozie, Python.

Confidential

Software Developer

Responsibilities:

  • Developed using new Java 1.5 features: annotations, generics, the enhanced for loop, and enums.
  • Used Struts and Hibernate to implement IoC, AOP, and ORM in the back-end tiers.
  • Designed the system per changing requirements using the Struts MVC architecture, JSP, and DHTML.
  • Designed the application using J2EE patterns.
  • Designed REST APIs that allow sophisticated, effective, and low-cost application integrations.
  • Developed the presentation layer using Struts Framework.
  • Wrote Java utility classes common for all of the applications.
  • Analyzed and fine-tuned RDBMS/SQL queries to improve the application's performance with the database.
  • Deployed the JAR files in the web container on IBM WebSphere Server 5.x.
  • Designed and developed the screens in HTML with client-side validations in JavaScript.
  • Developed the server side scripts using JMS, JSP and Java Beans.
  • Adding and modifying Hibernate configuration code and Java/SQL statements depending upon the specific database access requirements.
  • Designed database tables, views, and indexes, and created triggers for optimized data access.
  • Created XML-based configuration and property files for the application and developed parsers using JAXP, SAX, and DOM technologies.
  • Developed web services using the Apache Axis tool.

Environment: Java 1.5, Struts MVC, JSP, Hibernate 3.0, JUnit, UML, XML, CSS, HTML, Oracle 9i, Eclipse, JavaScript, WebSphere 5.x, Rational Rose, ANT.
