Senior Hadoop Developer Resume
Tampa, Florida
SUMMARY
- 8 years of extensive professional IT experience, including 2.5 years of Hadoop/Big Data experience, capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- 2.5 years of experience in Big Data Analytics using HDFS, HIVE, FLUME, SQOOP, GPLOADER, HBase, HUE, Linux and Python automation scripting, and Informatica BDE.
- Over 2 years of Data warehousing and ETL experience using Informatica Power Center 9.6.1/9.5.1/9.1.0/8.6.1/8.1/7.1, Cloud Integration and IDQ as an Analyst.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop (see the sketch at the end of this summary).
- Responsible for performing extensive data validation using HIVE Dynamic Partitioning and Bucketing.
- Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Java into Pig Latin and HQL (HiveQL).
- Experience in streaming data to HDFS using Flume.
- Experienced in defining workflows with Oozie.
- Expertise in writing ETL Jobs for analyzing data using Pig.
- Expertise in Linux shell scripting (ksh, sh), Python scripting, DOS scripting, job scheduling in CRON, Control-M and IBM Workload Manager.
- Experience with FTP/SFTP/SCP for transferring files between various systems.
- Extensive database experience using Oracle 11g/10g/9i, Teradata and MS SQL Server.
- Responsible for all activities related to the development, implementation, administration and support of ETL processes for large data warehouses using Power Center.
- Experience using Informatica command-line utilities such as pmcmd and pmrep.
- Extensively worked on Informatica Designer components: Source Analyzer, Target Designer, Transformation Developer, Mapping Designer and Mapplet Designer.
- Extensively worked on Informatica Power Center Transformations such as Source Qualifier, Lookup, Filter, Expression, Router, Joiner, Update Strategy, Rank, Aggregator, Stored Procedure, Transaction Control, Java, SQL, Sorter, and Sequence Generator.
- Proficient in Data warehouse design based on Ralph Kimball and Bill Inmon methodologies.
- Extensively designed and developed Slowly Changing Dimension (SCD) Type 1, 2 and 3 mappings.
- Good experience in Informatica and SQL Performance Tuning.
- Strong hands-on experience using Teradata standalone loader utilities (FastExport, FastLoad, MultiLoad, TPump and TPT) and BTEQ scripts.
- Maintained outstanding relationships with Business Analysts and business users to identify information needs as per the business requirements.
- Followed waterfall and Agile methodologies with the Scrum process.
- Excellent written and verbal communication skills and analytical skills, with the ability to perform independently as well as in a team.
- Proven ability in defining goals, coordinating teams and achieving results.
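A minimal sketch of the Sqoop import and Hive dynamic-partition load pattern referenced above; the connection string, schema, table, column and path names are illustrative placeholders, not details from any specific engagement.

```sh
#!/bin/sh
# Sketch: pull a relational table into HDFS with Sqoop, then load it into a
# dynamically partitioned Hive table. All names and connection details are
# hypothetical.

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user --password-file /user/etl/.ora_pwd \
  --table SALES.ORDERS \
  --target-dir /data/raw/orders \
  --fields-terminated-by '\t' -m 4

hive -e "
  SET hive.exec.dynamic.partition=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;
  INSERT OVERWRITE TABLE orders_part PARTITION (order_date)
  SELECT order_id, customer_id, amount, order_date
  FROM orders_staging;
"
```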
TECHNICAL SKILLS
Hadoop/Ecosystem: Cloudera CDH5, HDFS, HIVE, SQOOP, HUE, Flume, PIG, HBASE, Spark, HDFS File system commands.
LANGUAGES: SQL, PL/SQL, Java, C, C++
Database: Oracle, DB2, Teradata and MS SQL Server
Methodologies: Agile, waterfall.
BI TOOLS: Informatica 7.x, 8.x and 9.x, Informatica BDE
Operating Systems: Windows Server 2003/2000/XP, Windows 7/8, UNIX, Linux 6.5/6.7
Tools: MS Office, TOAD, SQL Developer, SQL Assistant, SharePoint, AutoSys, WIT, JIRA.
PROFESSIONAL EXPERIENCE
Confidential, Tampa, Florida
Senior Hadoop Developer
Responsibilities:
- Hands-on experience working on the Hadoop ecosystem using HDFS, HIVE, Sqoop, Flume, HBase, HUE, Spark, Storm and Linux/Python data movement and data validation scripts.
- Created Hive tables, loaded data and wrote Hive queries.
- Hands-on experience in writing complex queries in HiveQL and Greenplum.
- Built and maintained standard operational procedures for all needed Greenplum implementations.
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Created the Hive tables required as managed or external tables, defined with appropriate static and dynamic partitions for efficiency.
- Implemented Partitioning, Bucketing in Hive for better organization of the data.
- Developed Sqoop commands to pull data from Teradata and Oracle and export it into Greenplum.
- Created Linux shell and Python scripts for file validation, data movement and file archival (see the sketch following this role).
- Gathered requirements from users and through analysis of current systems.
- Prepared impact analysis documents and high-level designs for the requirements.
- Created probabilistic models for the classification of data.
- Coordinated deployment across SIT, pre-production and production environments.
- Worked with the business to prioritize and implement change requests.
Environment: Cloudera CDH5, HDFS, HIVE, HUE, SQOOP, Flume, PIG, HBase, Oozie workflows, AutoSys, JIRA, Informatica Big Data Edition, Teradata, Oracle, WLM and Linux.
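A minimal sketch of the kind of file validation and archival shell script mentioned in this role; the directory paths, the .ctl record-count convention and the HDFS landing directory are assumptions for illustration.

```sh
#!/bin/sh
# Sketch: validate inbound files, push them to HDFS and archive the originals.
# Paths and the control-file convention are hypothetical.

LANDING=/data/inbound
ARCHIVE=/data/archive
HDFS_DIR=/user/etl/landing

for f in "$LANDING"/*.dat; do
  [ -e "$f" ] || continue                      # no files to process
  if [ ! -s "$f" ]; then                       # reject zero-byte files
    echo "ERROR: empty file $f" >&2
    continue
  fi
  expected=$(cat "${f%.dat}.ctl" 2>/dev/null)  # expected row count from control file
  actual=$(wc -l < "$f")
  if [ -n "$expected" ] && [ "$expected" -ne "$actual" ]; then
    echo "ERROR: record count mismatch for $f (ctl=$expected, data=$actual)" >&2
    continue
  fi
  hdfs dfs -put -f "$f" "$HDFS_DIR"/ &&        # land in HDFS, then archive locally
    gzip -c "$f" > "$ARCHIVE/$(basename "$f").$(date +%Y%m%d).gz" &&
    rm -f "$f"
done
```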
Confidential, Texas
Sr. ETL Lead Developer
Responsibilities:
- Conducted interviews with various business users to identify and capture business requirements.
- Proposed and documented solutions to ensure all required objectives were met.
- Interpreted measurement definitions, performed data decomposition and performed gap analysis.
- Performed volumetric analysis, database sizing and data profiling activities.
- Prepared data flow models and high-level technical design documents.
- Worked with the data modeler in developing star schemas and snowflake schemas.
- Prepared coding standards documents in line with the enterprise architecture.
- Worked with the project manager to prepare and revise the work-hour estimates.
- Led and mentored a team of six offshore resources during the development and test phases.
- Assist the Data Stewards in updating Data Dictionary / Metadata for the implemented changes in Data Warehouse.
- Creation of complex Informatica mappings to meet functional and performance objectives.
- Extensively used push down optimization techniques to improve the performance.
- Designed anomaly handling logic to ensure bad records are reported and corrected.
- Created numerous materialized views for handling data replication in an effective manner.
- Created shell scripts for managing and scheduling the Informatica workflows (see the sketch following this role).
- Developed mappings to load into staging tables and then to dimensions and facts.
- Responsible for code migration across different environments during the lifecycle.
- Worked closely with the DBA on creating indexes, partitions, etc. to avoid performance bottlenecks.
Environment: Informatica Power center 8.6, Golden Gate, Teradata, Oracle, Control-M and UNIX.
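A minimal sketch of a pmcmd wrapper of the sort referenced above for running Informatica workflows under the scheduler; the integration service, domain, folder, workflow name and password file are placeholders.

```sh
#!/bin/ksh
# Sketch: start an Informatica workflow via pmcmd and propagate its exit code
# to the scheduler. Service, domain, folder and workflow names are hypothetical.

INFA_USER=etl_ops
INFA_PWD_FILE=/home/etl/.infa_pwd   # assumed password file readable only by the batch user

pmcmd startworkflow \
  -sv IS_DW_PROD -d Domain_DW \
  -u "$INFA_USER" -p "$(cat "$INFA_PWD_FILE")" \
  -f DW_LOAD -wait wf_load_sales_fact

rc=$?
if [ "$rc" -ne 0 ]; then
  echo "Workflow wf_load_sales_fact failed with return code $rc" >&2
fi
exit "$rc"
```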
Confidential, Richmond
ETL Developer
Responsibilities:
- Converted business requirements into technical design documents.
- Profiled various source systems and validated the mapping document.
- Created numerous ETL mappings using Informatica to load data into data warehouse system.
- Implemented Dynamic Lookup Transformation for change data capture.
- Designed reusable modules wherein data quality checks, such as numeric and date checks, are performed prior to load.
- Created Informatica workflows using various tasks like command, decision, email, control and File watcher.
- Created shell scripts to send email notifications for reconciliation reporting (see the sketch following this role).
- Used Debugger to test and determine the logical errors in the mappings.
- Implemented complex slowly changing dimensions like SCD2 and hybrid versions.
- Involved in performance tuning at the source, target, mapping, session and system levels.
- Involved in code reviews to ensure the compliance of coding standards.
- Worked closely with SIT and UAT testing teams for data validations.
- Involved in project releases, configuration management and migration activities.
- Created Oracle Stored Procedures, Packages to implement complex logics.
- Created Oracle Triggers to populate audit columns for tracking any DML operations.
- Created job schedules to perform initial loads to the new warehouse platform.
- Involved in base lining the code for migration to higher environments.
Environment: Informatica Power center 8.6, SQL Server, Oracle, AutoSys and Linux.
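A minimal sketch of a reconciliation notification script like the one described above; the table names, connection aliases, the DB_PWD environment variable and the recipient address are assumptions.

```sh
#!/bin/sh
# Sketch: compare staging and warehouse row counts and email on mismatch.
# Connection details, table names and the distribution list are hypothetical;
# DB_PWD is assumed to be exported by the calling job.

SRC_CNT=$(sqlplus -s "etl/${DB_PWD}@SRCDB" <<'EOF'
SET HEADING OFF FEEDBACK OFF
SELECT COUNT(*) FROM stg_orders;
EOF
)
TGT_CNT=$(sqlplus -s "etl/${DB_PWD}@DWHDB" <<'EOF'
SET HEADING OFF FEEDBACK OFF
SELECT COUNT(*) FROM dw_orders;
EOF
)

SRC_CNT=$(echo $SRC_CNT)   # trim whitespace returned by sqlplus
TGT_CNT=$(echo $TGT_CNT)

if [ "$SRC_CNT" != "$TGT_CNT" ]; then
  echo "Source count=$SRC_CNT, target count=$TGT_CNT" |
    mailx -s "Reconciliation mismatch: orders load" dw-support@example.com
fi
```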
Confidential
Support Analyst
Responsibilities:
- Perform deep-level troubleshooting on escalation from Level 1 and Level 2.
- Perform root cause analysis on recurring issues.
- Document job run logs, dependencies and their schedules.
- Perform SQL score carding, optimization and performance tuning.
- Perform monthly and quarterly maintenance releases.
- Find and fix data issues to support other vendor requests.
- Create job aids and solution scripts.
- Attend the production handover calls to validate deliverables.
- Responsible for implementing changes and enhancements to the applications.
- Assist data governance and compliance teams.
Environment: Informatica Power center 7.1, AbInitio, Oracle, CRON, Windows and Linux.
Confidential
ETL Analyst
Responsibilities:
- Extensively used ETL to load data from flat files, XML and Oracle sources into Oracle 8i.
- Involved in designing the data model for the data warehouse.
- Involved in requirement gathering and business analysis.
- Developed data mappings between source systems and warehouse components using Mapping Designer.
- Worked extensively on different types of transformations like source qualifier, expression, filter, aggregator, rank, update strategy, lookup, stored procedure, sequence generator, joiner, XML.
- Involved in performance tuning of the Informatica mappings, stored procedures and the SQL queries inside the Source Qualifier.
- Involved in performance tuning of the database and Informatica; improved performance by identifying and rectifying performance bottlenecks.
- Used Server Manager to schedule sessions and batches.
- Involved in creating Business Objects Universes and appropriate reports.
- Wrote PL/SQL packages and stored procedures to implement business rules and validations (see the sketch following this section).
Environment: Informatica 7.1.3, Oracle 10g, UNIX, Windows NT 4.0, UNIX Shell Programming, PL/SQL, TOAD (Quest Software)
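A minimal sketch of invoking one of the PL/SQL validation procedures mentioned above from a batch shell script; the package, procedure, parameter and connection details are illustrative assumptions.

```sh
#!/bin/sh
# Sketch: run a hypothetical PL/SQL validation procedure via sqlplus and fail
# the batch step if it raises an error. ORA_PWD is assumed to be exported by
# the calling job; package and procedure names are placeholders.

sqlplus -s "etl_user/${ORA_PWD}@DWH" <<'EOF'
WHENEVER SQLERROR EXIT FAILURE
BEGIN
  -- hypothetical package applying business-rule validations to a staged batch
  pkg_order_validation.validate_staged_orders(p_batch_id => 1001);
  COMMIT;
END;
/
EXIT SUCCESS
EOF
```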