Hadoop Developer Resume
Detroit, MI
SUMMARY:
- Around 5 years of professional experience in Requirements Analysis, Design, and Development, working with Big Data technologies such as HDFS, MapReduce, Pig, Hive, Sqoop, Spark, Storm, Kafka, and Flume.
- HDP Certified Developer and AWS Certified Developer (Associate).
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Excellent knowledge of Hadoop components such as HDFS, JobTracker, NameNode, and DataNode.
- Good knowledge of designing and developing POCs in Spark using Scala to compare performance between Spark, Hive, and SQL (a brief sketch follows this summary).
- Experience with Hadoop distributions such as Cloudera and Hortonworks, and with the Hue web interface.
- Experience in writing Pig scripts to transform raw data from several data sources into the required data format.
- Experience in integrating Pig with Hive and HBase using HCatalog.
- Knowledge of handling different file formats such as Parquet, Avro, and RCFile using SerDes in Hive.
- Experience in importing and exporting data using Sqoop from HDFS/Hive to Relational Database Systems and vice versa.
- Experience in using the Oozie workflow scheduler and the ZooKeeper coordination service for job scheduling and monitoring.
- Experience in collecting, aggregating, and moving huge chunks of data, including streaming data, from different sources using Flume and Kafka.
- Exposure to administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig.
- Good knowledge of writing user-defined functions (UDFs) and using them in Hive and Pig.
- Good knowledge of writing MapReduce code for specific functionality.
- Experience in working with UNIX/Linux environments and writing shell scripts.
- Experience in working with traditional databases such as Oracle and MySQL.
- Good understanding of ETL processes and Data warehousing.
- Experience in writing SQL scripts using different joins and subqueries.
- Good Knowledge on query optimization in SQL.
- Good knowledge of reporting and data integration tools such as Tableau and Talend.
- Good Knowledge on integrating Talend with Hadoop.
- Good Knowledge on integrating Tableau with relational databases.
- Good knowledge of developing applications using the Waterfall model and Agile methodology.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills; highly motivated, with the ability to work independently or as an integral part of a team.
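For illustration, below is a minimal sketch of the kind of Spark-vs-Hive comparison POC mentioned above, assuming a Hive-enabled SparkSession; the database, table, and column names (sales.orders, region, amount) are hypothetical. The same HiveQL can be run through beeline or the Hive CLI and the wall-clock times compared.

```scala
import org.apache.spark.sql.SparkSession

object SparkVsHivePoc {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so spark.sql() can read existing Hive tables
    val spark = SparkSession.builder()
      .appName("SparkVsHivePoc")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical aggregate; run the identical query in Hive to compare timings
    val query =
      """SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
        |FROM sales.orders
        |GROUP BY region""".stripMargin

    val start = System.nanoTime()
    val rowCount = spark.sql(query).count()   // count() forces full execution of the aggregation
    val elapsedSec = (System.nanoTime() - start) / 1e9
    println(f"Spark SQL returned $rowCount rows in $elapsedSec%.2f s")

    spark.stop()
  }
}
```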
TECHNICAL SKILLS:
Hadoop Core Technologies: HDFS, YARN, MapReduce
Hadoop Ecosystem: Pig, Hive, Sqoop, Spark, Spark SQL, Spark Streaming, Flume, HBase, ZooKeeper, Oozie, Kafka
Hadoop Distributions: Hortonworks (HDP 2.2/2.4), Cloudera (CDH 4/5), Hue
Databases: Oracle 11g/12c, MySQL, Microsoft Access
NoSQL Databases: HBase, MongoDB, Cassandra
Languages: Scala, Core Java, Python, C, UNIX Shell Scripting
IDEs: IntelliJ IDEA, Eclipse, NetBeans
Reporting and ETL Tools: Tableau, Talend, Microsoft Excel
PROFESSIONAL EXPERIENCE:
Confidential, Detroit, MI
Hadoop Developer
Responsibilities:
- Analyzed the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
- Worked on performance analysis and improvements for Hive and Pig scripts.
- Involved in Optimization of Hive Queries.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data from UNIX systems into Hive tables.
- Created different types of tables (internal and external) in Hive based on requirements and loaded data for analysis.
- Created partitioned and bucketed tables in Hive based on requirements.
- Extensively developed Hive Queries using multiple joins and tuned them for faster performance.
- Involved in Data Ingestion to HDFS from various data sources.
- Extracted the needed data from the server into HDFS and bulk-loaded the cleaned data into HBase and Cassandra.
- Integrated bulk data into the Cassandra file system using MapReduce programs.
- Maintained System integrity of all sub-components like Pig, Hive, Spark, Kafka, HBase and Cassandra.
- Explored new technologies such as AWS, Apache Flink, and Apache NiFi that could increase business value.
- Maintained and optimized AWS infrastructure (EC2, S3) for users and systems.
- Implemented Spark batch jobs on AWS instances, using Amazon Simple Storage Service (S3) for input and output data.
- Extensively used Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases.
- Created appropriate data flows using Apache NiFi.
- Created custom processors for different data flows in Apache NiFi.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this role's Environment line).
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Automated Sqoop, Hive, and Pig jobs using Oozie scheduling.
- Loaded streaming data onto HDFS using Flume and Kafka.
- Helped business team by installing and configuring Hadoop ecosystem (Hive) components along with Hadoop admin.
- Created and maintained technical documentation for executing Hive queries and Pig Scripts.
- Involved in requirement gathering and converting the requirement into technical specifications.
- Used Tableau to generate reports in tabular format and graphical format for monthly analysis.
Environment: Hadoop 1.2.1, YARN, Hive 0.13.0, Cassandra, Pig 0.12.1, Sqoop 1.4.4, CDH5, Flume, Spark SQL, Spark Streaming, Kafka, Apache NiFi, Scala, AWS, Oozie 3.3.0, Oracle 12c, SQL, UNIX, Java, Tableau, Agile methodology.
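As referenced in the responsibilities above, a minimal sketch of converting a HiveQL aggregate into Spark RDD transformations with Scala; the table and column names (web.click_events, user_id) are hypothetical stand-ins for the actual project data.

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkRdd {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSparkRdd")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL (illustrative):
    //   SELECT user_id, COUNT(*) AS clicks FROM web.click_events GROUP BY user_id;
    // Spark equivalent: read the Hive table, drop to an RDD, and express the
    // GROUP BY / COUNT as map + reduceByKey.
    val clicksPerUser = spark.table("web.click_events")
      .select("user_id")
      .rdd
      .map(row => (row.getString(0), 1L))
      .reduceByKey(_ + _)

    clicksPerUser.take(10).foreach { case (user, clicks) => println(s"$user\t$clicks") }

    spark.stop()
  }
}
```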
Confidential, Detroit, MI
Hadoop Developer
Responsibilities:
- Loaded large sets of structured, unstructured, and semi-structured data from different sources, such as relational databases (Oracle) using Sqoop and streaming data using Flume, onto the Hadoop Distributed File System.
- Developed Sqoop Jobs for loading data.
- Used Git for version control.
- Loaded text files from UNIX Systems to HDFS using Hadoop Commands.
- Transformed data moved onto HDFS into a single file using Pig scripts and Python.
- Created Hive tables (both internal and external) using partitioning, dynamic partitioning, and bucketing based on application requirements (see the sketch after this role's Environment line).
- Developed Hive queries for data analysis and managed the data in Hive tables.
- Gained hands-on experience with data warehouse star-schema modeling, snowflake modeling, and fact and dimension tables.
- Implemented a prototype to integrate PDF documents into a web application using GitHub.
- Worked with the team lead in requirement gathering.
- Bulk-loaded cleaned data into HBase and accessed it using Hive for analysis.
- Reviewed Hadoop Log files.
- Automated Hive, Sqoop, Pig and Flume scripts using Oozie.
- Integrated Talend with Hadoop to develop reports for further analysis.
Environment: Hadoop 0.20.2, Hive 0.2.0, Pig 0.11.1, Sqoop, Flume, Oracle 10g, YARN, Oozie, UNIX, Python, SQL, HBase, HDP 2.2, Git, Talend.
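As referenced in the responsibilities above, a minimal sketch of creating a partitioned Hive table and loading it with dynamic partitioning over Hive JDBC from Scala; the HiveServer2 host, credentials, schema, and table names are hypothetical, and a bucketed table would follow the same pattern with a CLUSTERED BY ... INTO n BUCKETS clause.

```scala
import java.sql.DriverManager

object PartitionedHiveLoad {
  def main(args: Array[String]): Unit = {
    // Hypothetical HiveServer2 endpoint; the hive-jdbc driver must be on the classpath
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/retail", "etl_user", "")
    val stmt = conn.createStatement()

    // External table partitioned by load date (illustrative schema)
    stmt.execute(
      """CREATE EXTERNAL TABLE IF NOT EXISTS retail.orders_part (
        |  order_id BIGINT,
        |  customer_id BIGINT,
        |  amount DOUBLE
        |)
        |PARTITIONED BY (load_dt STRING)
        |STORED AS ORC
        |LOCATION '/data/retail/orders_part'""".stripMargin)

    // Enable dynamic partitioning, then load from a staging table so each
    // distinct load_dt value lands in its own partition
    stmt.execute("SET hive.exec.dynamic.partition=true")
    stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
    stmt.execute(
      """INSERT OVERWRITE TABLE retail.orders_part PARTITION (load_dt)
        |SELECT order_id, customer_id, amount, load_dt
        |FROM retail.orders_staging""".stripMargin)

    stmt.close()
    conn.close()
  }
}
```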
Confidential, Princeton, NJ
Data Analyst/Hadoop Developer
Responsibilities:
- Loaded data from flat files onto tables in Oracle Database using SQL*Loader.
- Cleaned data before loading, which included removing unnecessary columns identified through univariate and bivariate analysis.
- Created and managed SQL tables on Oracle Database.
- Developed SQL Queries using multiple joins and subqueries, and tuned them for better performance.
- Loaded existing transaction-level data from a relational database onto the Hadoop Distributed File System using Sqoop.
- Created tables in Hive and loaded the data into tables using Hive Queries.
- Analyzed data using SQL queries and Hive Queries for future predictions.
- Compared query-processing speeds between SQL and Hive (see the sketch after this role's Environment line).
- Generated graphical reports using Tableau and Microsoft Excel.
- Used waterfall model for SDLC (Software Development Life Cycle).
Environment: Hadoop, Hive, Sqoop, Oracle 10g, UNIX, SQL, HDP, Tableau, Microsoft Office.
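As referenced in the responsibilities above, a minimal sketch of comparing query-processing speed between Oracle SQL and Hive by running the same aggregate over JDBC and timing it; the connection URLs, credentials, and the transactions table are hypothetical, and the Oracle and Hive JDBC drivers are assumed to be on the classpath.

```scala
import java.sql.DriverManager

object SqlVsHiveTiming {
  // Run a query over JDBC, drain the result set, and return elapsed seconds
  def timeQuery(url: String, user: String, pass: String, query: String): Double = {
    val conn = DriverManager.getConnection(url, user, pass)
    try {
      val start = System.nanoTime()
      val rs = conn.createStatement().executeQuery(query)
      while (rs.next()) ()                 // drain rows to force full execution
      (System.nanoTime() - start) / 1e9
    } finally conn.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical connection details and table name; adjust to the environment
    val query = "SELECT account_id, COUNT(*) FROM transactions GROUP BY account_id"

    Class.forName("oracle.jdbc.OracleDriver")
    val oracleSec = timeQuery("jdbc:oracle:thin:@//dbhost:1521/ORCL", "analyst", "secret", query)

    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val hiveSec = timeQuery("jdbc:hive2://hiveserver:10000/default", "analyst", "", query)

    println(f"Oracle: $oracleSec%.2f s, Hive: $hiveSec%.2f s")
  }
}
```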