Hadoop Developer Resume
PROFESSIONAL SUMMARY
- 7+ years of total IT experience, including 5+ years of experience in Hadoop and Big Data.
- Experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Oozie.
- Improved the performance of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (a representative sketch follows this summary).
- Wrote multiple MapReduce programs in Python for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Experience working in the Hadoop ecosystem, with extensive experience installing and configuring the Hortonworks (HDP) and Cloudera (CDH3 and CDH4) distributions.
- Experience with NoSQL databases: HBase, MongoDB, and Cassandra.
- Good understanding of Hadoop architecture and hands-on experience with components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce programming.
- Extensive experience with SQL, PL/SQL and database concepts
- Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
- Extracted and processed streaming log data from various sources and integrated it into HDFS using Flume.
- Experience in writing Pig and Hive scripts and extending core functionality by writing custom UDFs.
- Extensive experience with Agile development and object modeling using UML.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers. Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
- Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data.
- Experienced with the build tools Maven and Ant and the logging tool Log4j.
- Experience working with the Eclipse and NetBeans IDEs.
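A minimal sketch of the kind of Spark-based rewrite summarized above, showing the same aggregation expressed with pair RDDs and with DataFrames/Spark SQL; the HDFS path, tab-separated layout, and column names are illustrative assumptions rather than details from any specific project.

```scala
import org.apache.spark.sql.SparkSession

object SalesAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sales-aggregation")
      .getOrCreate()

    // Pair RDD version: parse tab-separated (customer_id, amount) records and reduce by key in memory.
    val totalsRdd = spark.sparkContext
      .textFile("hdfs:///data/sales")              // illustrative path
      .map(_.split("\t"))
      .map(fields => (fields(0), fields(1).toDouble))
      .reduceByKey(_ + _)
    totalsRdd.take(5).foreach(println)

    // DataFrame / Spark SQL version of the same aggregation.
    val sales = spark.read
      .option("sep", "\t")
      .option("inferSchema", "true")
      .csv("hdfs:///data/sales")                   // same illustrative path
      .toDF("customer_id", "amount")
    sales.groupBy("customer_id")
      .sum("amount")
      .show(5)

    spark.stop()
  }
}
```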
TECHNICAL SKILLS
Big Data/Hadoop: Hadoop 2.7/2.5, HDFS 1.2.4, MapReduce, Hive, Pig, Sqoop, Oozie, Hue
NoSQL Databases: HBase, MongoDB 3.2, Cassandra
Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, Unix Shell Scripting, Scala
IDE and Tools: Eclipse 4.6, Netbeans 8.2
Databases: Oracle 12c/11g, MySQL, SQL Server 2016/2014
Operating Systems: Windows8/7, UNIX/Linux and Mac OS.
Other Tools: Maven, ANT, WSDL, SOAP, REST.
Methodologies: Software Development Lifecycle (SDLC), Waterfall, Agile, UML, Design Patterns (Core Java and J2EE)
PROFESSIONAL EXPERIENCE
Confidential, IL
Hadoop Developer
Responsibilities:
- The objective of this project was to build a data lake as a cloud-based solution on HDFS using Apache Spark.
- Scope included analytical solutions, billing solutions, product building, notifications, and paper-to-digital conversion.
- Helped with team management and played an important part in team building and acquisition.
- Developed Spark applications using Scala and Spark-SQL/Streaming for faster processing of data.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Developed scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output response.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Performed File system management and monitoring on Hadoop log files.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Continuously tuned Hive queries and UDFs for faster execution by employing partitioning and bucketing.
- Implemented partitioning, dynamic partitions, and bucketing in Hive (see the sketch following this list).
- Used Flume and Sqoop to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
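A minimal sketch of the staging-to-main Hive load with dynamic partitioning described above, run through Spark SQL with Hive support; the stg_events and events table names and columns are hypothetical, and bucketing would be added separately in the Hive DDL via CLUSTERED BY.

```scala
import org.apache.spark.sql.SparkSession

object StageToMain {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-stage-to-main")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partitions when inserting from the external staging table.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Main table partitioned by date (bucketing, via CLUSTERED BY, would live in the Hive DDL).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS events (
        |  customer_id BIGINT,
        |  amount      DOUBLE)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC""".stripMargin)

    // Move data from the external staging table into the partitioned main table.
    spark.sql(
      """INSERT OVERWRITE TABLE events PARTITION (event_date)
        |SELECT customer_id, amount, event_date FROM stg_events""".stripMargin)

    spark.stop()
  }
}
```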
Environment: Pig, Sqoop, Kafka, Oozie, Cloudera, AWS, Apache Hadoop, HDFS, Hive, Map Reduce, MySQL, Eclipse, PL/SQL, GIT.
Confidential, Bentonville, Arkansas
Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and SQOOP.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team (see the Spark SQL sketch following this list).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
- Developed simple to complex MapReduce-style jobs using Scala and Java in Spark.
- Developed data pipeline using Flume, Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Worked on importing data from HDFS to an Oracle database and vice versa using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Wrote Hive and Pig scripts per requirements and automated the workflow using shell scripts.
- Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
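A minimal sketch of a Spark SQL script of the kind described above: aggregating a Hive warehouse table and exporting the result to an RDBMS for the BI team; the database, table, column, and credential names are placeholders.

```scala
import java.util.Properties
import org.apache.spark.sql.{SaveMode, SparkSession}

object LogisticsReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("logistics-report")
      .enableHiveSupport()
      .getOrCreate()

    // Aggregate shipment data from a Hive warehouse table (names are placeholders).
    val report = spark.sql(
      """SELECT route, COUNT(*) AS shipments, AVG(transit_days) AS avg_transit
        |FROM logistics.shipments
        |GROUP BY route""".stripMargin)

    // Export the result to the reporting RDBMS for BI visualization.
    val props = new Properties()
    props.setProperty("user", "report_user")          // placeholder credentials
    props.setProperty("password", "report_password")
    report.write
      .mode(SaveMode.Overwrite)
      .jdbc("jdbc:mysql://reporting-db:3306/bi", "route_summary", props)

    spark.stop()
  }
}
```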
Environment: Apache Hadoop, Hive, Zookeeper, MapReduce, Sqoop, Crunch API, Pig, HCatalog, UNIX, Java, Oracle, SQL Server, MySQL, Oozie, Python.
Confidential, Atlanta, GA
Hadoop Developer/Admin
Responsibilities:
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
- Worked in AWS EC2, configuring the servers for Auto scaling and Elastic load balancing.
- Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating HIVE with existing applications.
- Designed and developed a flattened view (merged and flattened dataset) de-normalizing several datasets in Hive/HDFS, consisting of key attributes consumed by the business and other downstream systems.
- Worked on NoSQL (HBase) to support enterprise production and loaded data into HBase using Impala and Sqoop.
- Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Worked on AWS provisioning EC2 Infrastructure and deploying applications in Elastic load balancing.
- Handled importing of data from various data sources, performed transformations using Hive, PIG, and loaded data into HDFS.
- Created tables in HBase to store variable data formats of PII data coming from different portfolios.
- Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
- Worked on moving data with Sqoop between HDFS and relational database systems and vice versa, including ongoing maintenance and troubleshooting.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Worked on importing data from HDFS to MYSQL database and vice-versa using SQOOP.
- Implemented Map Reduce jobs in HIVE by querying the available data.
- Configured Hive Meta store with MySQL, which stores the metadata for Hive tables.
- Performance tuning of Hive queries, MapReduce programs for different applications.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming.
- Integrated Kafka with Spark Streaming for high throughput and reliability (see the streaming sketch following this list).
- Worked on Apache Flume for collecting and aggregating large amounts of log data and stored it on HDFS for further analysis.
- Worked on tuning Hive and Pig scripts to improve performance and solved performance issues in both.
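A minimal sketch of the Kafka-Spark Streaming integration mentioned above, consuming a web-log topic and landing each micro-batch on HDFS; the broker address, topic name, consumer group, and output path are assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object WebLogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("weblog-stream")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Kafka consumer configuration (broker and group id are placeholders).
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-broker:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "weblog-consumers",
      "auto.offset.reset" -> "latest")

    // Direct stream from the assumed web-log topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("weblogs"), kafkaParams))

    // Keep non-empty records and write each micro-batch to HDFS for later analysis.
    stream.map(_.value)
      .filter(_.nonEmpty)
      .saveAsTextFiles("hdfs:///data/raw/weblogs/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```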
Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Oracle 12c, Flume, Oozie, HBase, Impala, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera, Bitbucket.
Confidential, Columbus, OH
Big Data Engineer
Responsibilities:
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and SQOOP.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Developed Kafka producers and consumers, HBase clients, Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (a producer sketch follows this list).
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Developed Simple to complex Map/reduce Jobs using Scala and Java in Spark.
- Successfully managed the Extraction, Transformation and Loading (ETL) process by pulling large volumes of data from various data sources, including MS Access and Excel, into a staging database using BCP.
- Was responsible for detecting errors in the ETL operation and rectifying them.
- Incorporated Error Redirection during ETL Load in SSIS Packages.
- Implemented various types of SSIS Transformations in Packages including Aggregate, Fuzzy Lookup, Conditional Split, Row Count, Derived Column etc.
- Implemented the Master Child Package Technique to manage big ETL Projects efficiently.
- Involved in Unit testing and System Testing of ETL Process.
- Worked on importing data from HDFS to an Oracle database and vice versa using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Wrote Hive and Pig scripts per requirements and automated the workflow using shell scripts.
- Automated the jobs that extract data from different data sources such as MySQL and push the result sets to the Hadoop Distributed File System, using the Oozie workflow scheduler.
- Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
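A minimal sketch of a Kafka producer of the kind mentioned above, using the kafka-clients API from Scala; the broker address, topic name, and stdin record source are illustrative assumptions.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object OrderEventProducer {
  def main(args: Array[String]): Unit = {
    // Producer configuration; broker address and topic name are placeholders.
    val props = new Properties()
    props.put("bootstrap.servers", "kafka-broker:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all")   // wait for full acknowledgement for reliability

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one record per event read from stdin (illustrative source).
      scala.io.Source.stdin.getLines().foreach { line =>
        producer.send(new ProducerRecord[String, String]("order-events", line))
      }
    } finally {
      producer.close()
    }
  }
}
```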
Environment: Apache Hadoop, Hive, Zookeeper, Map Reduce, ETL, Sqoop, crunch API, Pig, HCatalog, UNIX, Java, Oracle, SQL Server, MYSQL, Oozie, Python.
Confidential, Folsom, CA
Big Data Engineer
Responsibilities:
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase database and SQOOP.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive and Impala for a logistics application.
- Responsible for the design and development of Spark SQL scripts based on functional specifications.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Developed Kafka producers and consumers, HBase clients, Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (an HBase client sketch follows this list).
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
- Developed simple to complex MapReduce-style jobs using Scala and Java in Spark.
- Developed data pipeline using Flume, Sqoop to ingest cargo data and customer histories into HDFS for analysis.
- Worked on importing data from HDFS to an Oracle database and vice versa using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Wrote Hive and Pig scripts per requirements and automated the workflow using shell scripts.
- Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
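A minimal sketch of an HBase client of the kind mentioned above, writing and reading one row through the HBase client API from Scala; the table name, column family, and row key are assumptions.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object CustomerStore {
  def main(args: Array[String]): Unit = {
    // Cluster settings are read from hbase-site.xml on the classpath.
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf("customers"))  // placeholder table name
    try {
      // Write one row: row key = customer id, one column family "info".
      val put = new Put(Bytes.toBytes("cust-1001"))
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Jane Doe"))
      table.put(put)

      // Read the row back and print the stored name.
      val result = table.get(new Get(Bytes.toBytes("cust-1001")))
      val name = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")))
      println(s"name = $name")
    } finally {
      table.close()
      connection.close()
    }
  }
}
```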
Environment: Apache Hadoop, Hive, Zookeeper, MapReduce, Sqoop, Crunch API, Pig, HCatalog, UNIX, Java, Oracle, SQL Server, MySQL, Oozie, Python.
Confidential
Software Developer
Responsibilities:
- Developed using new Java 1.5 features: annotations, generics, the enhanced for loop, and enums.
- Used Struts and Hibernate for implementing IoC, AOP, and ORM in the back-end tiers.
- Designed the system per changing requirements using the Struts MVC architecture, JSP, and DHTML.
- Designed the application using J2EE patterns.
- Designed REST APIs that allow sophisticated, effective, and low-cost application integrations.
- Developed the presentation layer using Struts Framework.
- Wrote Java utility classes common for all of the applications.
- Analyzed and fine-tuned RDBMS/SQL queries to improve the application's performance against the database.
- Deployed the jar files in the Web Container on the IBM Web Sphere Server 5.x.
- Designed and developed the screens in HTML with client side validations in JavaScript.
- Developed the server side scripts using JMS, JSP and Java Beans.
- Adding and modifying Hibernate configuration code and Java/SQL statements depending upon the specific database access requirements.
- Designed database tables, views, and indexes and created triggers for optimized data access.
- Created XML-based configuration and property files for the application and developed parsers using JAXP, SAX, and DOM technologies.
- Developed Web Services using Apache AXIS tool.
Environment: Java 1.5, Struts MVC, JSP, Hibernate 3.0, JUnit, UML, XML, CSS, HTML, Oracle 9i, Eclipse, JavaScript, Web Sphere 5.x, Rational Rose, ANT.