Sr. Hadoop Developer Resume
Princeton, NJ
SUMMARY
- Around 8 years of overall experience as a developer and data analyst.
- 6+ years of experience with Hadoop and its ecosystem, including Hive, Spark, Scala, Spark SQL, Pig, Sqoop, HBase, Flume, and ZooKeeper.
- HDP Certified Developer (Hortonworks).
- Hands-on experience with YARN and Hadoop ecosystem tools such as Pig and Hive for data analysis and Sqoop for data ingestion.
- Expertise in working with Hive, including table creation and distributing data across tables through partitioning and bucketing.
- Well-versed in Spark components such as Spark SQL, MLlib, and Spark Streaming.
- Used Scala and Python to perform RDD transformations in Apache Spark.
- Experience in integrating Hive queries into the Spark environment using Spark SQL.
- Expertise in transforming and analyzing data with Pig scripts, using operations such as sort, join, and filter.
- Experience in importing data from relational databases (Oracle, MySQL) into Hive and HBase and exporting it back.
- Experience with cloud platforms such as AWS and Azure.
- Hands-on experience using big data ETL tools such as Talend to extract data from external repositories and internal sources.
- Experience in using Python for data transformation.
- Hands-on experience with shell scripting and UNIX.
- Expertise in writing ad hoc queries using Hive Query Language.
- Experience in data warehouse technologies, database concepts, and SQL.
- Experience in database design, RDBMS, and data warehousing; knowledge of writing complex SQL queries involving multiple tables with inner and outer joins.
- Experience in working with the different Hadoop modes: standalone, pseudo-distributed, and fully distributed.
- Excellent understanding of Hadoop architecture and hands-on experience in writing MapReduce programs.
- Experience in integrating Pig with Hive and HBase using HCatalog.
- Good knowledge of big data, Hadoop architecture, and core Hadoop (HDFS, MapReduce).
- Good knowledge of Hadoop daemons, including the YARN ResourceManager and NodeManager and the HDFS NameNode and DataNode.
- Good knowledge of programming languages such as Core Java, C, and Python.
- Good knowledge of IDEs such as Eclipse, NetBeans, and IntelliJ IDEA.
- Good knowledge of Hue, an open-source web interface for analyzing data with Apache Hadoop, as well as Ambari (Hortonworks) and Cloudera Distribution Including Apache Hadoop (CDH).
- Good knowledge of importing streaming data into HDFS using Flume.
- Good knowledge of writing custom UDFs for Pig and Hive to generate required results.
- Good knowledge of ETL tools such as Talend.
- Good knowledge of integrating Talend with Hadoop.
- Experience in using Tableau.
- Knowledge of setting up workflows with the Apache Oozie workflow engine and of managing and scheduling Hadoop jobs with the Oozie Coordinator.
- Certified as an AWS Associate Developer.
- Knowledge of developing applications using Scrum and Agile methodologies.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and results-oriented, with problem-solving and leadership skills.
TECHNICAL SKILLS
- Hadoop 1.2.1
- Hive 0.13.0
- Spark
- Spark SQL
- Pig 0.12.1
- Sqoop 1.4.4
- YARN
- CDH5
- Oozie 3.3.0
- Flume 1.4.0
- HBase 0.94.11
- Core Java
- Kafka
- UNIX
- Data warehouse
- Oracle 11g
- MySQL
- Azure
- Talend 5.5
- SQL
- Microsoft Office
- C
- Python
PROFESSIONAL EXPERIENCE
Sr. Hadoop Developer
Confidential - Princeton, NJ
Responsibilities:
- Loaded existing transaction-level data into the Hadoop Distributed File System (HDFS).
- Cleaned the data before loading, including removing unnecessary columns identified through univariate and bivariate analysis.
- Cleaned the data in SQL, loaded it into a new table, and moved that table into HDFS using Sqoop.
- Analyzed data using Hive to generate the required reports.
- Used Tableau to produce reports.
- Followed the waterfall approach.
- Worked on structured data.
Environment: Hadoop, Hive, UNIX, Sqoop, MySQL, SQL, Microsoft Office, Tableau
Hadoop Developer
Confidential - New York, NY
Responsibilities:
- Imported and exported data between RDBMSs in different countries and Hadoop using Sqoop.
- Preprocessed data using Pig scripts so that it could be used for data analysis.
- Moved data onto HDFS from the local file system and vice versa.
- Made data available to business analysts by storing it in the Hive warehouse, where they could retrieve the required information using Hive.
- Worked with the ETL team and the web team to explain approach and direction.
- Used Oozie to automate the process of loading data into HDFS.
- Used YARN to manage resources.
- Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that receives data from Kafka in near real time and persists it to Cassandra.
- Loaded data into Hive tables, providing SQL-like access to the data.
- Generated ad hoc reports on data in Hive tables using ad hoc queries.
- Managed and reviewed Hadoop log files.
- Created data pipelines in the cloud using Azure Data Factory.
- Provided analytical reports to end users using Tableau integrated with Hadoop.
- Transformed structured, semi-structured, and unstructured data and loaded it into HBase.
Environment: Hadoop 0.20.2, Hive 0.2.0, Spark, Pig 0.11.1, Sqoop, Flume, Oracle 10g, Oozie, YARN, Microsoft Office 2010, Azure Data Factory, Azure Storage, UNIX, Microsoft Outlook 2010.
Sr. Hadoop Developer
Confidential - Indianapolis, IN
Responsibilities:
- Loaded large sets of structured, semi-structured, and unstructured data from different sources onto the Hadoop Distributed File System.
- Migrated existing Hive processes to Spark to significantly improve performance.
- Transformed data moved onto HDFS into a single file using Pig scripts and Python.
- Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
- Created Hive tables (both internal and external) using partitioning, dynamic partitioning, and bucketing based on application requirements.
- Hands-on experience with data warehouse star schema modeling, snowflake modeling, and fact and dimension tables.
- Created and managed databases like AICLAIMS.db in Hive.
- Loaded the transformed data into Hive tables like CD CLAIMDETAILS for analysis.
- Loaded data from different relational databases (Oracle and MySQL) into HDFS and vice versa using Sqoop.
- Developed ETL objects for data warehousing in a local database and migrated them into the QA environment for testing.
- Involved in requirement gathering.
- Implemented a POC for launching HDInsight on Azure.
- Developed various MapReduce applications on Hadoop.
- Participated in conversion of requirements to technical specifications.
- Bulk loaded cleaned data into HBase, where it is accessed by Hive for further analysis.
- Used Kafka along with HBase and Hive.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Wrote MapReduce programs and troubleshot issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Loaded data from the Linux file system to the Hadoop Distributed File System and vice versa.
- Reviewed Hadoop log files.
- Used YARN for cluster resource management.
- Managed and automated data loading using Oozie.
- Used Talend to integrate data from relational databases such as Oracle and MySQL.
- Used Talend to generate analytical reports for end users based on the requirements.
Environment: Hadoop 1.2.1, Hive 0.13.0, Spark, Spark SQL, Pig 0.12.1, Sqoop 1.4.4, YARN, CDH5, Oozie 3.3.0, Flume 1.4.0, HBase 0.94.11, Core Java, Kafka, UNIX, Data warehouse, Oracle 11g, MySQL, Azure, Talend 5.5, SQL, Microsoft Office.
SQL Developer
Confidential
Responsibilities:
- Developed business requirements based on client requirements.
- Developed SQL scripts based on the requirements.
- Created tables and views based on the requirements.
- Used different types of joins as required.
- Connected SQL databases with Tableau.
- Used Tableau to generate reports for the end user.
- Used Tableau to fully automate reporting, eliminating manual intervention.
- Followed the waterfall approach.
Environment: SQL, MySQL, Microsoft Office, Tableau.
