Hadoop Developer Resume
PROFESSIONAL SUMMARY
- 7+ years of total IT experience, including 5+ years of experience in Hadoop and Big Data.
- Experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Oozie.
- Improved the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (a representative sketch follows this summary).
- Wrote multiple MapReduce programs in Python for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Experience working in the Hadoop ecosystem, with extensive experience installing and configuring the Hortonworks (HDP) and Cloudera (CDH3 and CDH4) distributions.
- Experience with NoSQL databases: HBase, MongoDB, and Cassandra.
- Good understanding of Hadoop architecture and hands-on experience with components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
- Extensive experience with SQL, PL/SQL and database concepts
- Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
- Extracted and processed streaming log data from various sources and integrated it into HDFS using Flume.
- Experience in writing Pig and Hive scripts and extending core functionality by writing custom UDFs.
- Extensive experience with Agile development and object modeling using UML.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers. Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
- Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
- Experienced with the build tools Maven and Ant and the logging tool Log4j.
- Experience working with the Eclipse and NetBeans IDEs.
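A minimal sketch of the kind of Spark-based rewrite summarized above, showing the same aggregation expressed with pair RDDs and with DataFrames/Spark SQL; the HDFS path, tab-separated layout, and column names are illustrative assumptions rather than details from any specific project.

```scala
import org.apache.spark.sql.SparkSession

object SalesAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sales-aggregation")
      .getOrCreate()

    // Pair RDD version: parse tab-separated (customer_id, amount) records and reduce by key in memory.
    val totalsRdd = spark.sparkContext
      .textFile("hdfs:///data/sales")              // illustrative path
      .map(_.split("\t"))
      .map(fields => (fields(0), fields(1).toDouble))
      .reduceByKey(_ + _)
    totalsRdd.take(5).foreach(println)

    // DataFrame / Spark SQL version of the same aggregation.
    val sales = spark.read
      .option("sep", "\t")
      .option("inferSchema", "true")
      .csv("hdfs:///data/sales")                   // same illustrative path
      .toDF("customer_id", "amount")
    sales.groupBy("customer_id")
      .sum("amount")
      .show(5)

    spark.stop()
  }
}
```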
TECHNICAL SKILLS
Big Data/Hadoop: Hadoop 2.7/2.5, HDFS 1.2.4, MapReduce, Hive, Pig, Sqoop, Oozie, Hue
NoSQL Databases: HBase, MongoDB 3.2, Cassandra
Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, Unix Shell Scripting, Scala
IDE and Tools: Eclipse 4.6, Netbeans 8.2
Databases: Oracle 12c/11g, MySQL, SQL Server 2016/2014
Operating Systems: Windows8/7, UNIX/Linux and Mac OS.
Other Tools: Maven, ANT, WSDL, SOAP, REST.
Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, UML, Design Patterns (Core Java and J2EE)
PROFESSIONAL EXPERIENCE
Confidential, IL
Hadoop Developer
Responsibilities:
- The objective of this project was to build a data lake as a cloud-based solution on HDFS using Apache Spark.
- Scope included analytical solutions, billing solutions, product building, notifications, and paper-to-digital conversion.
- Helped with team management and played an important part in team building and acquisition.
- Developed Spark applications using Scala and Spark-SQL/Streaming for faster processing of data.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Developed scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output response.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Performed File system management and monitoring on Hadoop log files.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Continuously tuned Hive queries and UDFs for faster execution by employing partitioning and bucketing.
- Implemented partitioning, dynamic partitions, and bucketing in Hive (see the sketch following this list).
- Used Flume and Sqoop to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
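A minimal sketch of the staging-to-main Hive load with dynamic partitioning described above, run through Spark SQL with Hive support; the stg_events and events table names and columns are hypothetical, and bucketing would be added separately in the Hive DDL via CLUSTERED BY.

```scala
import org.apache.spark.sql.SparkSession

object StageToMain {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-stage-to-main")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partitions when inserting from the external staging table.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Main table partitioned by date (bucketing, via CLUSTERED BY, would live in the Hive DDL).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS events (
        |  customer_id BIGINT,
        |  amount      DOUBLE)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC""".stripMargin)

    // Move data from the external staging table into the partitioned main table.
    spark.sql(
      """INSERT OVERWRITE TABLE events PARTITION (event_date)
        |SELECT customer_id, amount, event_date FROM stg_events""".stripMargin)

    spark.stop()
  }
}
```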
Environment: Pig, Sqoop, Kafka, Oozie, Cloudera, AWS, Apache Hadoop, HDFS, Hive, Map Reduce, MySQL, Eclipse, PL/SQL, GIT.
Confidential, Bentonville, Arkansas
Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and SQOOP.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team (see the Spark SQL sketch following this list).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
- Developed simple to complex MapReduce-style jobs using Scala and Java in Spark.
- Developed data pipeline using Flume, Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Worked on importing data from HDFS to an Oracle database and vice versa using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Wrote Hive and Pig scripts per requirements and automated the workflow using shell scripts.
- Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
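A minimal sketch of a Spark SQL script of the kind described above: aggregating a Hive warehouse table and exporting the result to an RDBMS for the BI team; the database, table, column, and credential names are placeholders.

```scala
import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

object LogisticsReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("logistics-report")
      .enableHiveSupport()
      .getOrCreate()

    // Aggregate shipment data from a Hive warehouse table (names are placeholders).
    val report = spark.sql(
      """SELECT route, COUNT(*) AS shipments, AVG(transit_days) AS avg_transit
        |FROM logistics.shipments
        |GROUP BY route""".stripMargin)

    // Export the result to the reporting RDBMS for BI visualization.
    val props = new Properties()
    props.setProperty("user", "report_user")          // placeholder credentials
    props.setProperty("password", "report_password")
    report.write
      .mode(SaveMode.Overwrite)
      .jdbc("jdbc:mysql://reporting-db:3306/bi", "route_summary", props)

    spark.stop()
  }
}
```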
Environment: Apache Hadoop, Hive, Zookeeper, MapReduce, Sqoop, Crunch API, Pig, HCatalog, UNIX, Java, Oracle, SQL Server, MySQL, Oozie, Python.
Confidential, Atlanta, GA
Hadoop Developer/Admin
Responsibilities:
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Worked in AWS EC2, configuring the servers for Auto scaling and Elastic load balancing.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating HIVE with existing applications.
- Designed and developed a flattened view (merged and flattened dataset) de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream systems.
- Worked on NoSQL (HBase) to support enterprise production and loaded data into HBase using Impala and Sqoop.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Worked on AWS provisioning EC2 Infrastructure and deploying applications in Elastic load balancing.
- Handled importing of data from various data sources, performed transformations using Hive, PIG, and loaded data into HDFS.
- Created tables in HBase to store variable data formats of PII data coming from different portfolios.
- Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
- Worked on moving data with Sqoop between HDFS and relational database systems and vice versa, including ongoing maintenance and troubleshooting.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Worked on importing data from HDFS to MYSQL database and vice-versa using SQOOP.
- Implemented Map Reduce jobs in HIVE by querying the available data.
- Configured Hive Meta store with MySQL, which stores the metadata for Hive tables.
- Performance tuning of Hive queries, MapReduce programs for different applications.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
- Integrated Kafka with Spark Streaming for high throughput and reliability (see the streaming sketch following this list).
- Worked on Apache Flume for collecting and aggregating large amounts of log data and stored it on HDFS for further analysis.
- Worked on tuning Hive and Pig scripts to improve performance and solved performance issues in both.
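A minimal sketch of the Kafka-Spark Streaming integration mentioned above, consuming a web-log topic and landing each micro-batch on HDFS; the broker address, topic name, consumer group, and output path are assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object WebLogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("weblog-stream")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Kafka consumer configuration (broker and group id are placeholders).
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "weblog-consumers",
      "auto.offset.reset" -> "latest")

    // Direct stream from the assumed web-log topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Keep non-empty records and write each micro-batch to HDFS for later analysis.
    stream.map(_.value)
      .filter(_.nonEmpty)
      .saveAsTextFiles("hdfs:///data/raw/weblogs/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```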
Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Oracle 12c, Flume, Oozie, HBase, Impala, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera, Bitbucket.
Confidential, Columbus, OH
Big Data Engineer
Responsibilities:
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and SQOOP.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Developed Kafka producers and consumers, HBase clients, Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (a producer sketch follows this list).
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Developed Simple to complex Map/reduce Jobs using Scala and Java in Spark.
- Successfully managed the Extraction, Transformation and Loading (ETL) process by pulling large volumes of data from various data sources, including MS Access and Excel, into a staging database using BCP.
- Was responsible for detecting errors in the ETL operation and rectifying them.
- Incorporated Error Redirection during ETL Load in SSIS Packages.
- Implemented various types of SSIS Transformations in Packages including Aggregate, Fuzzy Lookup, Conditional Split, Row Count, Derived Column etc.
- Implemented the Master Child Package Technique to manage big ETL Projects efficiently.
- Involved in Unit testing and System Testing of ETL Process.
- Worked on importing data from HDFS to an Oracle database and vice versa using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Wrote Hive and Pig scripts per requirements and automated the workflow using shell scripts.
- Automated the jobs that extract data from different data sources such as MySQL and push the result sets to the Hadoop Distributed File System, using the Oozie workflow scheduler.
- Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
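A minimal sketch of a Kafka producer of the kind mentioned above, using the kafka-clients API from Scala; the broker address, topic name, and stdin record source are illustrative assumptions.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object OrderEventProducer {
  def main(args: Array[String]): Unit = {
    // Producer configuration; broker address and topic name are placeholders.
    val props = new Properties()
    props.put("bootstrap.servers", "kafka-broker:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all")   // wait for full acknowledgement for reliability

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one record per event read from stdin (illustrative source).
      scala.io.Source.stdin.getLines().foreach { line =>
        producer.send(new ProducerRecord[String, String]("order-events", line))
      }
    } finally {
      producer.close()
    }
  }
}
```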
Environment: Apache Hadoop, Hive, Zookeeper, Map Reduce, ETL, Sqoop, crunch API, Pig, HCatalog, UNIX, Java, Oracle, SQL Server, MYSQL, Oozie, Python.
Confidential, Folsom, CA
Big Data Engineer
Responsibilities:
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and SQOOP.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Developed Kafka producers and consumers, HBase clients, Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (an HBase client sketch follows this list).
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
- Developed simple to complex MapReduce-style jobs using Scala and Java in Spark.
- Developed data pipeline using Flume, Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Worked on importing data from HDFS to an Oracle database and vice versa using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Wrote Hive and Pig scripts per requirements and automated the workflow using shell scripts.
- Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
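A minimal sketch of an HBase client of the kind mentioned above, writing and reading one row through the HBase client API from Scala; the table name, column family, and row key are assumptions.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object CustomerStore {
  def main(args: Array[String]): Unit = {
    // Cluster settings are read from hbase-site.xml on the classpath.
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf("customers"))  // placeholder table name
    try {
      // Write one row: row key = customer id, one column family "info".
      val put = new Put(Bytes.toBytes("cust-1001"))
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"))
      table.put(put)

      // Read the row back and print the stored name.
      val result = table.get(new Get(Bytes.toBytes("cust-1001")))
      val name = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")))
      println(s"name = $name")
    } finally {
      table.close()
      connection.close()
    }
  }
}
```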
Environment: Apache Hadoop, Hive, Zookeeper, MapReduce, Sqoop, Crunch API, Pig, HCatalog, UNIX, Java, Oracle, SQL Server, MySQL, Oozie, Python.
Confidential
Software Developer
Responsibilities:
- Developed using new Java 1.5 features: annotations, generics, the enhanced for loop, and enums.
- Used Struts and Hibernate for implementing IoC, AOP, and ORM in the back-end tiers.
- Designed the system per changing requirements using the Struts MVC architecture, JSP, and DHTML.
- Designed the application using J2EE patterns.
- Designed REST APIs that allow sophisticated, effective, and low-cost application integrations.
- Developed the presentation layer using Struts Framework.
- Wrote Java utility classes common for all of the applications.
- Analyzed and fine-tuned RDBMS/SQL queries to improve the application's performance against the database.
- Deployed the jar files in the Web Container on the IBM Web Sphere Server 5.x.
- Designed and developed the screens in HTML with client side validations in JavaScript.
- Developed the server side scripts using JMS, JSP and Java Beans.
- Adding and modifying Hibernate configuration code and Java/SQL statements depending upon the specific database access requirements.
- Designed database tables, views, and indexes and created triggers for optimized data access.
- Created XML-based configuration and property files for the application and developed parsers using JAXP, SAX, and DOM technologies.
- Developed Web Services using Apache AXIS tool.
Environment: Java 1.5, Struts MVC, JSP, Hibernate 3.0, JUnit, UML, XML, CSS, HTML, Oracle 9i, Eclipse, JavaScript, Web Sphere 5.x, Rational Rose, ANT.