Hadoop Developer Resume
Columbus, OH
SUMMARY:
- 10 years of total software development experience with the Hadoop ecosystem, Big Data and data science analytical platforms, and enterprise-level cloud-based computing and applications.
- More than 6 years of experience working in Agile methodology.
- Around 3 years of experience in design and implementation of Big Data applications using the Hadoop stack: Spark, Hive, Pig, Oozie, Sqoop, Flume, HBase and NoSQL databases.
- Hands-on experience in writing complex Hive queries, building NiFi data flows, and data modeling.
- Experience creating batch-style distributed computing applications using Apache Spark and Flume.
- Hands-on experience with Spark SQL and with the Hadoop architecture and its various frameworks and components.
- Experience and in-depth understanding of analyzing data using HiveQL and Pig.
- Worked extensively with Hive DDL and the Hive Query Language (HQL).
- Good hands-on experience with Pivotal's SQL-on-Hadoop query engine, HAWQ.
- In-depth understanding of NoSQL databases such as HBase.
- Proficient knowledge and hands-on experience in writing shell scripts in Linux.
- Adequate knowledge and working experience in Agile & Waterfall methodologies.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Have a fairly good understanding of Kafka.
- Experienced in job workflow scheduling and monitoring tools like Oozie and ESP.
- Experience using various Hadoop distributions (Cloudera, Hortonworks, etc.) to fully implement and leverage new Hadoop features.
- Experienced in requirement analysis, application development, application migration and maintenance using Software Development Lifecycle (SDLC).
- Experience in ETL tools like Informatica Power Center (Repository Manager, Mapping Designer, Workflow Manager and Workflow Monitor).
- Hands-on experience in reporting tools such as MicroStrategy and Tableau.
- Hands-on experience working with schedulers like ESP, DAC, AutoSys, Control-M and SOS-Berlin.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Oozie, ZooKeeper
ETL Tools: Informatica 9.6, 9.5.1, 9.1, 8.6
Business Intelligence Tools: R Studio, Tableau 9.1, MicroStrategy, MS Excel - Analytical Solver
Databases & Tools: Teradata 13, Netezza, Oracle 11g
Scheduler: Database Administration Console 10 (DAC), AutoSys.
Project Planning & Tracking: HP ALM, JIRA
Content management: Confluence
Release management: TFS, Tortoise SVN
Programming: R, Python, Spark, SQL
Database & Skills: Oracle 11g, Netezza
Operating Systems: Windows 2000, NT, XP, UNIX.
PROFESSIONAL EXPERIENCE:
Confidential - Columbus, OH
Hadoop Developer
Responsibilities:
- Spearheaded the POC and migration of the data warehouse from Informatica 9.1 to Hadoop using the Hadoop tools HDFS, Hive, Pig and NiFi.
- Hands-on experience in data ingestion using NiFi.
- Created reusable NiFi templates to load the data from different source systems into the Raw layer.
- Experienced in loading and transforming large sets of structured and semi-structured data ingested through Sqoop and placed in HDFS for further processing.
- Designed appropriate partitioning/bucketing schemas to allow faster data retrieval during analysis using Hive (a DDL sketch follows this entry).
- Involved in processing the data in the Hive tables using high-performance, low-latency HQL queries.
- Transferred the analyzed data from HDFS to relational databases using Sqoop, enabling the BI team to visualize the analytics.
- Developed custom aggregate functions using Spark SQL and performed interactive querying (an aggregate UDF sketch follows this entry).
- Managed and scheduled jobs on the Hadoop cluster using Airflow DAGs (a DAG sketch follows this entry).
- Involved in creating Hive tables, loading data and running Hive queries on that data.
- Extensive working knowledge of partitioned-table performance tuning and compression-related properties in Hive.
- Worked with the Data Engineering Platform team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
Environment: Informatica, Teradata, Tableau, MicroStrategy 10.7, CDH 5.4.5, Hive 1.2.1, HBase 1.1.2, Flume 1.5.2, MapReduce, Sqoop 1.4.6, Spark, NiFi (standalone cluster), Nagios, Shell Script, Oozie 4.2.0, ZooKeeper 3.4.6.
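A minimal sketch of the partitioned Hive DDL referenced above, assuming hypothetical database, table and column names and a Spark 2.x session with Hive support (on the older CDH/Spark stack listed above, HiveContext plays the same role); bucketing would be added with a CLUSTERED BY clause in the native Hive DDL.

    from pyspark.sql import SparkSession

    # Hypothetical names; the real schema is not part of this resume.
    spark = SparkSession.builder.appName("hive-partition-sketch").enableHiveSupport().getOrCreate()

    # Partition on the load date so analytical queries can prune whole partitions.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS raw_db.shipments (
            shipment_id STRING,
            customer_id STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS ORC
    """)

    # A query that filters on the partition column scans only the matching partition.
    spark.sql("""
        SELECT customer_id, SUM(amount) AS total_amount
        FROM raw_db.shipments
        WHERE load_date = '2017-01-01'
        GROUP BY customer_id
    """).show()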
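A minimal sketch of a custom aggregate for Spark, with hypothetical table and column names, assuming a Spark release with pandas grouped-aggregate UDF support (2.4+), which is newer than the CDH stack listed above; the original aggregate logic is not reproduced here.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType

    spark = SparkSession.builder.appName("agg-udf-sketch").enableHiveSupport().getOrCreate()

    # Hypothetical weighted-average aggregate over placeholder columns.
    @pandas_udf("double", PandasUDFType.GROUPED_AGG)
    def weighted_avg(values, weights):
        return (values * weights).sum() / weights.sum()

    df = spark.table("raw_db.shipments_scored")  # hypothetical Hive table
    (df.groupBy("customer_id")
       .agg(weighted_avg(df["score"], df["weight"]).alias("weighted_score"))
       .show())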
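A minimal Airflow DAG sketch for the kind of job scheduling mentioned above, with hypothetical task names, paths and commands; it chains a daily HDFS ingestion step and a Hive transformation step.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    default_args = {
        "owner": "hadoop",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }

    # Hypothetical DAG: land raw files in HDFS, then run a Hive transformation script.
    dag = DAG(
        dag_id="daily_raw_load",
        default_args=default_args,
        start_date=datetime(2017, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    )

    ingest = BashOperator(
        task_id="ingest_raw",
        bash_command="hdfs dfs -put -f /data/incoming/*.csv /raw/incoming/",
        dag=dag,
    )

    transform = BashOperator(
        task_id="hive_transform",
        bash_command="hive -f /etl/scripts/transform_raw.hql",
        dag=dag,
    )

    ingest >> transform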
Confidential
Hadoop Developer
Responsibilities:
- Working on a project involving migration of data from mainframes to an HDFS data lake and creating reports by performing transformations on the data landed in the Hadoop data lake.
- Built a Python script to extract the data from the HAWQ tables and generate a .dat file for the downstream application.
- Built a generic framework in Python to parse fixed-length raw data; it takes a JSON layout describing the fixed positions of the fields and loads the data into HAWQ tables (a parser sketch follows this entry).
- Built a generic framework in Python that transforms two or more data sets in HDFS.
- Built generic Sqoop/HAWQ frameworks in Python to load data from SQL Server to HDFS and from HDFS to HAWQ (a Sqoop wrapper sketch follows this entry).
- Performed extensive data validation, using HAWQ partitions for efficient data access.
- Built a generic framework in Python that allows us to update the data in HAWQ tables.
- Coordinated all testing phases and worked closely with the performance testing team to create a baseline for the new application.
- Created automated workflows in CA-ESP that schedule daily data-loading and transformation jobs.
- Developed functions using PL/Python for various use cases (a PL/Python sketch follows this entry).
- Prepared technical design documents and production support documents.
- Wrote Python scripts to create automated workflows.
- Technology Platforms: PHD 2.0, HAWQ 1.2, Sqoop 1.4, Python 2.6, SQL
Environment: Informatica, Netezza, Hadoop HDP 2.1, Oracle, SQL Server, ZooKeeper 3.4.6, Oozie 4.1.0, MapReduce, YARN 2.6.1, HDFS, Sqoop 1.4.6, Hive 1.2.1, Pig 0.15.0.
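A minimal sketch of the JSON-layout-driven fixed-length parser described above, with hypothetical field names and file paths; the HAWQ load step of the real framework is not reproduced here.

    import json

    def load_layout(layout_path):
        """Read a JSON layout listing each field's name, start position and length."""
        with open(layout_path) as f:
            return json.load(f)  # e.g. [{"name": "acct_id", "start": 0, "length": 10}, ...]

    def parse_fixed_width(data_path, layout):
        """Yield one dict per record, slicing each field out of the fixed-width line."""
        with open(data_path) as f:
            for line in f:
                yield dict(
                    (field["name"], line[field["start"]:field["start"] + field["length"]].strip())
                    for field in layout
                )

    if __name__ == "__main__":
        # Hypothetical paths; the real layouts described the mainframe extracts.
        layout = load_layout("layouts/accounts.json")
        for record in parse_fixed_width("data/accounts.dat", layout):
            print(record)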
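A minimal sketch of a Python-driven Sqoop load of the kind mentioned above, assuming hypothetical connection details, table and paths; it simply builds and runs a sqoop import command from SQL Server into HDFS.

    import subprocess

    def sqoop_import(jdbc_url, username, password_file, table, target_dir, num_mappers=4):
        """Build and run a sqoop import from an RDBMS table into an HDFS directory."""
        cmd = [
            "sqoop", "import",
            "--connect", jdbc_url,
            "--username", username,
            "--password-file", password_file,
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", str(num_mappers),
        ]
        subprocess.check_call(cmd)

    if __name__ == "__main__":
        # Hypothetical SQL Server source and HDFS target directory.
        sqoop_import(
            jdbc_url="jdbc:sqlserver://dbhost:1433;databaseName=sales",
            username="etl_user",
            password_file="/user/etl_user/.sqoop.pwd",
            table="orders",
            target_dir="/raw/sales/orders",
        )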
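A minimal PL/Python sketch of the kind of function mentioned above, assuming hypothetical connection details and function logic, and assuming PL/Python is enabled in the HAWQ database (HAWQ speaks the PostgreSQL wire protocol, so psycopg2 is used here as the client).

    import psycopg2

    # Hypothetical connection parameters.
    conn = psycopg2.connect(host="hawq-master", port=5432, dbname="analytics", user="etl_user")
    cur = conn.cursor()

    # Hypothetical PL/Python function that normalizes account codes.
    cur.execute("""
        CREATE OR REPLACE FUNCTION clean_code(code text) RETURNS text AS $$
            return code.strip().upper() if code is not None else None
        $$ LANGUAGE plpythonu;
    """)
    conn.commit()

    cur.execute("SELECT clean_code(' ab123 ')")
    print(cur.fetchone()[0])  # prints: AB123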
Maersk Line - Charlotte, NC
ETL Lead
Responsibilities:
- Analyzed content and quality of databases, recommended data management procedures, and developed extraction/ETL processes.
- Acted as offshore coordinator, led the offshore team by providing them mapping documents, and acted as the point of contact for the onsite team.
- Documented user requirements, translated requirements into system solutions, and developed the implementation plan and schedule.
- Responsible for migrating the Informatica code from one environment to another by creating XML files using Informatica Repository Manager.
- Developed Informatica mappings to load the data into dimension and fact tables.
- Analyzed the business requirements and functional specifications.
- Extracted data from Oracle databases and spreadsheets, staged it in a single place, and applied business logic to load it into the central Oracle database (warehouse).
- Used Informatica Power Center for extraction, transformation and load (ETL) of data in the data warehouse.
- Extensively used transformations like Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator and Stored Procedure.
- Developed complex mappings in Informatica to load the data from various sources.
- Implemented performance tuning logic on targets, sources, mappings, sessions to provide maximum efficiency and performance.
- Parameterized the mappings and increased the re-usability.
- Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
- Created procedures to truncate data in the target before the session run.
- Extensively used Toad utility for executing SQL scripts and worked on SQL for enhancing the performance of the conversion mapping.
- Created the ETL exception reports and validation reports after the data is loaded into the warehouse database.
- Wrote documentation to describe program development, logic, coding, testing, changes and corrections.
- Created test cases for the mappings developed and then created the integration testing document.
- Followed Informatica recommendations, methodologies and best practices.
Environment: Informatica 9.6, Oracle, OGG, Teradata, HP ALM, Teradata GCFR Framework, Teradata BI Temporal Framework, IBM WS MQ, SAP-BO-XI
FICO - San Jose, CA
ETL Developer
Responsibilities :
- Analyzed the business requirements and functional specifications.
- Extracted data from Oracle databases and spreadsheets, staged it in a single place, and applied business logic to load it into the central Oracle database (warehouse).
- Used Informatica Power Center for extraction, transformation and load (ETL) of data in the data warehouse.
- Extensively used transformations like Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator and Stored Procedure.
- Developed complex mappings in Informatica to load the data from various sources.
- Implemented performance tuning logic on targets, sources, mappings, sessions to provide maximum efficiency and performance.
- Parameterized the mappings and increased the re-usability.
- Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
- Created test cases for the mappings developed and then created the integration testing document.
Environment: Traid Importer, X-Book, Query center, Informatica 9.1, SQL Developer, Oracle 11g, MySQL
Confidential
ETL Developer
Responsibilities:
- Analyzed the business requirements and functional specifications.
- Extracted data from Oracle databases and spreadsheets, staged it in a single place, and applied business logic to load it into the central Oracle database (warehouse).
- Used Informatica Power Center for extraction, transformation and load (ETL) of data in the data warehouse.
- Extensively used transformations like Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy, Sequence Generator and Stored Procedure.
- Developed complex mappings in Informatica to load the data from various sources.
- Implemented performance tuning logic on targets, sources, mappings, sessions to provide maximum efficiency and performance.
- Parameterized the mappings and increased the re-usability.
- Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
- Created test cases for the mappings developed and then created the integration testing document.
Environment: Informatica 9.1, SQL Developer, Oracle 11g, SQL Server 2005, Netezza, Sybase, UNIX
Confidential
ETL Developer
Responsibilities:
- Developed Informatica mappings to load the data into dimension and fact tables.
- Analyzed the business requirements and functional specifications.
- Extracted data from Oracle databases and spreadsheets, staged it in a single place, and applied business logic to load it into the central Oracle database (warehouse).
- Used Informatica Power Center for extraction, transformation and load (ETL) of data in the data warehouse.
- Extensively used transformations like Router, Aggregator, Normalizer, Joiner, Expression, Lookup, Update Strategy and Sequence Generator.
- Parameterized the mappings and increased the re-usability.
- Used Informatica Power Center Workflow manager to create sessions, workflows and batches to run with the logic embedded in the mappings.
Environment: Informatica 9.1, 8.6, SQL Developer, Oracle 11g, Netezza, QC, DAC, UNIX.
Confidential
ETL Developer
Responsibilities:
- Used Informatica to populate data into the staging area, warehouse and operational data store.
- Created transformations and mappings including Expression, Aggregator, Filter, Router, Joiner and Lookup.
- Experience with Slowly Changing Dimensions.
- Parameterized the mappings and increased the re-usability.
- Wrote documentation to describe program development, logic, coding, testing, changes and corrections.
- Created test cases for the mappings developed and then created the integration testing document. Followed Informatica recommendations, methodologies and best practices.
- Extensively used Toad utility for executing SQL scripts and worked on SQL for enhancing the performance of the conversion mapping.
Environment: Informatica 8.6, Toad, Oracle 10g.