
Lead Hadoop Developer Resume


TX

SUMMARY

  • Over 9 years of experience in Information Technology with a strong background in data warehousing, including three-plus years as a Lead Hadoop Developer.
  • Extensively worked with large databases in production environments.
  • Well versed in Big Data solution planning, design, development and POCs.
  • Experience in installation, configuration, support and management of a Hadoop Cluster.
  • Excellent understanding of Hadoop architecture and core components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce (including YARN) programming paradigm.
  • Experience working with the Hadoop stack (MapReduce, HDFS, Pig, Hive, Impala, Sqoop, Flume, Tez, Storm, Oozie, HBase, Apache Kafka and Spark) and installing and configuring it using Ambari/Hue.
  • Extensive experience in ETL processes covering data sourcing, mapping, transformation, conversion and loading.
  • Experience with NoSQL column-oriented databases such as HBase and Apache Cassandra, and their integration with Hadoop clusters.
  • Experienced in writing custom MapReduce programs and UDFs in Java to extend Hive and Pig core functionality (see the sketch after this summary).
  • Worked with Sqoop to move (import/export) data between relational databases and Hadoop, and used Flume to collect web data and load it into HDFS.
  • Experience working with Hadoop clusters using the Cloudera and Hortonworks distributions.
  • Knowledge of data modeling and strong grounding in data warehousing concepts, including dimensional star schema and snowflake schema methodologies.
  • Experience in dimensional modeling (facts, dimensions, business measures, grain of the fact, etc.), entity-relationship modeling, and in-depth knowledge of complex relational, star and snowflake schema data models.
  • Comprehensive knowledge and working experience with relational databases MySQL, Oracle, Teradata and DB2.
  • Strong statistical, mathematical and predictive modeling skills and experience.
  • Extensive experience implementing data cleanup procedures, transformations, scripts and stored procedures, and executing test plans to load data successfully into targets.
  • Expertise in developing SQL and PL/SQL code (procedures, functions and packages) to implement database business logic in Oracle.
  • Knowledge of UNIX shell scripting and Perl scripting.
  • Strong knowledge of the Software Development Life Cycle (SDLC), including requirements analysis, design, development, testing and implementation, and provided end-user support.
  • Experience with Agile methodology and with project and bug tracking using JIRA. Working experience designing and implementing complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie and ZooKeeper.
  • Experience working with senior management to identify strategic goals, objectives, scope, budgets and processes for internal and external development, and to manage risk analysis, mitigation plans, status reports and client presentations.
  • Flexible, enthusiastic and project-oriented team player with excellent communication and leadership skills, able to develop creative solutions for challenging client requirements.
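
As an illustration of the custom Hive UDFs in Java mentioned above, the sketch below shows a minimal UDF written against the classic org.apache.hadoop.hive.ql.exec.UDF API. The class name and the normalization rule are hypothetical examples chosen for illustration, not code from any engagement listed on this resume.

    // Minimal sketch of a Hive UDF, assuming the classic (pre-GenericUDF) API.
    // The normalization rule and class name are hypothetical.
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class NormalizeState extends UDF {

        // Hive calls evaluate() once per row; returning null propagates SQL NULLs.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            String state = input.toString().trim().toUpperCase();
            if ("TEXAS".equals(state)) {
                state = "TX";
            }
            return new Text(state);
        }
    }

After packaging the class into a jar, it would be registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION and then called like any built-in function; a Pig EvalFunc UDF follows the same general pattern.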

TECHNICAL SKILLS

Hadoop Ecosystem: Hadoop, HDFS, Flume, Sqoop, YARN, Pig, Hive, MapReduce (MRv1 classic and MRv2/YARN), Oozie, Apache Spark, Kafka, Storm, Tez.

Programming: Java, C, C++, HTML, Perl, shell scripting, Python, SQL, PL/SQL.

Databases: Oracle 7.x/8/9i/10g, Teradata V2R6, MS SQL Server 2000/2005, IBM DB2 8.x.

Reporting Tools: Crystal Reports 6.5, Cognos 8.0, Reportnet, OBIEE 11g, Tableau.

IDE & Build Tools: Maven, Eclipse.

Operating Systems: Red Hat Linux (6.x/5.x/4.x), Microsoft Windows XP/7, IBM AIX, HP-UX.

Web Servers: Java Web Server 2.0, Netscape Enterprise Server, WebLogic 6.0.

Migration/Replication: Sqoop, Hive, DataPump, GoldenGate, BCP, SQL Loader.

Data Warehousing: Informatica PowerCenter 9.0/8.x/7.x (Source Analyzer, Warehouse Designer, Transformation Developer, Mapplet Designer, Mapping Designer, Repository Manager, Workflow Manager, Workflow Monitor and Informatica Server), DataStage, AutoSys, SQL*Plus, TOAD.

PROFESSIONAL EXPERIENCE

Confidential - TX

Lead Hadoop Developer

Responsibilities:

  • Worked with lines of business and stakeholders to gather requirements and to implement and test a Big Data based analytical solution.
  • Worked with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
  • Worked with data delivery teams to set up new Hadoop users, including creating Linux accounts, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
  • Performed cluster maintenance, including the addition and removal of nodes.
  • Tuned the performance of Hadoop clusters and Hadoop MapReduce routines.
  • Monitored Hadoop scripts that take input from HDFS and load the data into Hive.
  • Developed Java MapReduce programs to transform log data into a structured format (see the sketch after this list).
  • Wrote Apache Pig scripts to process HDFS data and extended Hive functionality by writing custom UDFs.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Additionally took on the role of Hadoop administrator, which included managing the cluster, Hadoop ecosystem upgrades, Cloudera Manager upgrades and installation of tools that use the Hadoop ecosystem.
  • Created, updated and maintained ETL technical documentation.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Migrated data from the production Oracle database into the Hadoop cluster in the lab environment using Sqoop.
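
As a concrete illustration of the Java MapReduce log transformation described above, the sketch below is a minimal map-only job that parses raw log lines into tab-separated structured records. The assumed "timestamp ip status url" line layout and the class name are hypothetical, not the actual production code.

    // Minimal map-only MapReduce sketch: raw log lines in, structured TSV out.
    // Assumes a hypothetical "timestamp ip status url" line layout.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LogStructurer {

        public static class ParseMapper
                extends Mapper<LongWritable, Text, NullWritable, Text> {

            private final Text out = new Text();

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                String[] parts = line.toString().trim().split("\\s+");
                if (parts.length < 4) {
                    // Count and skip malformed lines rather than failing the job.
                    context.getCounter("logs", "malformed").increment(1);
                    return;
                }
                out.set(parts[0] + "\t" + parts[1] + "\t" + parts[2] + "\t" + parts[3]);
                context.write(NullWritable.get(), out);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "log-structurer");
            job.setJarByClass(LogStructurer.class);
            job.setMapperClass(ParseMapper.class);
            job.setNumReduceTasks(0); // pure per-line transform, no reduce phase
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The job's HDFS output directory can then be exposed to Hive (for example as an external table) so the structured records feed the Hive loading and analysis described above.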

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Java, Flume, Cloudera, Oozie, Apache Spark, MySQL, UNIX, Oracle.

Confidential - Charlotte, NC

Lead Hadoop Developer

Responsibilities:

  • Implemented best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
  • Effectively used Sqoop to transfer data between databases (RDBMS) and HDFS.
  • Designed workflows by scheduling Hive processes for log file data streamed into HDFS using Flume.
  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Developed Pig Latin scripts to extract data from mainframe output files and load it into HDFS.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Wrote Hive queries for analysis and reporting across the company's different business streams.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
  • Used Avro to serialize data, applied transformations and standardizations, and loaded the results into HBase for further processing (see the sketch after this list).
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
  • Documented all requirements, code and implementation methodologies for review and analysis purposes.
  • Used Sqoop extensively to ingest data from various source systems into HDFS.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Created Hive tables and worked on them using HiveQL.
  • Installed the cluster and worked on commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slots configuration.
  • Assisted in loading large sets of data (structured, semi-structured and unstructured).
  • Installed and configured Flume, Sqoop, Pig, Hive, and HBase on Hadoop clusters.
  • Managed Hadoop clusters, including adding and removing cluster nodes for maintenance and capacity needs.
  • Wrote JUnit test cases for unit testing of classes, documented unit testing, and logged and resolved defects in the rollout phase.
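
The Avro and HBase bullets above describe serialized records being standardized and loaded into HBase; the sketch below shows that general pattern in Java, assuming Avro GenericRecord data files and the HBase 1.x client API. The input file, table name, column family and field names are hypothetical placeholders.

    // Minimal sketch: read Avro records, apply a simple standardization,
    // and write them into an HBase table. Table, column family and field
    // names are hypothetical; assumes the HBase 1.x client API.
    import java.io.File;

    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AvroToHBaseLoader {

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("customer_events"));
                 DataFileReader<GenericRecord> reader = new DataFileReader<>(
                         new File(args[0]), new GenericDatumReader<GenericRecord>())) {

                for (GenericRecord record : reader) {
                    // Standardize the row key, then store each field under column family "d".
                    String rowKey = record.get("customer_id").toString().trim().toUpperCase();
                    Put put = new Put(Bytes.toBytes(rowKey));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event_type"),
                            Bytes.toBytes(record.get("event_type").toString()));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("event_ts"),
                            Bytes.toBytes(record.get("event_ts").toString()));
                    table.put(put);
                }
            }
        }
    }

Writing one Put per record keeps the sketch simple; for volume loads the puts would normally be batched or pushed through a BufferedMutator, and very large loads would typically use MapReduce bulk loading instead.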

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Java, Flume, Cloudera, Oozie, MySQL, UNIX, Teradata, Oracle 11g, HP Quality Center and Application Lifecycle Management.

Confidential - Scranton, PA

Senior Informatica Developer

Responsibilities:

  • Extensively involved in almost all phases of the project life cycle, from requirements gathering through testing and implementation.
  • Created mappings to move data from various source systems into the data warehouse.
  • Created transformations for loading data into targets, including Source Qualifier, Joiner, Update Strategy, Connected and Unconnected Lookup, Rank, Expression, Router, Filter, Aggregator and Sequence Generator transformations.
  • Led an offshore team in developing mappings and mapplets to extract claims, policies and customer data from flat files and load it into the data warehouse.
  • Used Lookup transformations to access data from tables that are not mapping sources, and used Unconnected Lookups to improve performance.
  • Created reusable transformations and Mapplets to use in multiple mappings.
  • Created, scheduled and monitored sessions and batches on the Informatica server using Informatica Workflow manager.
  • Used parallel processing capabilities, Pipeline-Partitioning and Target Table partitioning.
  • Implemented Slowly Changing Dimensions (SCDs, Both Type I & II).
  • Modified the existing Mappings and created new Mappings as per the requirement.
  • Implemented performance tuning logic on Targets, Sources, mappings, sessions to provide maximum efficiency and performance.
  • Defined Target Load Order Plan for loading data into Target Tables.
  • Used the Debugger in the Informatica Designer tool to test and fix errors in mappings.
  • Created, updated and maintained ETL technical documentation.
  • Monitored workflows using the Workflow Monitor.
  • Performed Unit testing of mappings and sessions.
  • Used DataStage as an ETL tool to extract data from source systems and load it into the IBM DB2 database.
  • Knowledge of the Repository Manager and the Repository Server Administration Console.
  • Developed and customized Cognos Impromptu reports, querying different database tables as per requirements. Also built multi-dimensional cubes using Cognos Transformer.

Environment: Informatica PowerCenter 9.0, IBM WebSphere DataStage 8.1, Omniplus 5.95, Cognos ReportNet, Cognos Impromptu/PowerPlay 7.1, Oracle 9i, Windows 2003 Server, TOAD, Erwin 4.1, HP Quality Center and Application Lifecycle Management, Linux.

Confidential - Charlottesville, VA

ETL Developer / Data Warehousing Developer

Responsibilities:

  • Interacted with users to capture, analyze business needs and user’s strategic information requirements.
  • Involved in data design and modeling by specifying the physical infrastructure, system study, design, and development by applying Ralph Kimball methodology of dimensional modeling and using ERwin.
  • Performed source data assessment, validation and identified the quality and consistency of the source data.
  • Developed new mappings and enhanced existing mappings to meet the Business requirements.
  • Worked with various transformations such as Source Qualifier, Aggregator, Update Strategy, Filter, Router, Expression, Look-up, Sequence Generator and Joiner.
  • Built Informatica Mappings to automate Business Process and reduce manual intervention.
  • Worked extensively on Informatica Designer to create Mapplets and Reusable transformations.
  • Extensive experience with Data Profiling.
  • Defined the Target Load Order Plan for loading targets when control table logic is used.
  • Configured the sessions using Workflow manager to have Multiple Partitions on Source data and to improve performance.
  • Used the Debugger in the Informatica Designer tool to test and fix errors in mappings.
  • Hands-on experience in performance tuning and identifying bottlenecks at various levels of the SDLC for optimal performance.
  • Handled data exceptions in Informatica using Reject Loader utility.
  • Hands-on experience in UNIX shell scripting used at various stages of Informatica development.
  • Scheduled the Jobs using Maestro Scheduler tool.
  • Prepared Unit test cases and Integration Testing documents.
  • Implemented Unit testing, System Integrated testing and User Acceptance testing to test and deploy the Informatica jobs.
  • Created List Reports, Interactive Dashboards, Drill Through reports and Ad hoc reports using OBIEE.

Environment: Informatica 8.1/7.1, OBIEE, HP-UX, Oracle 10g, Windows XP, TOAD 8.0.6, Reflection (Maestro Tool) 14, Rational ClearCase, Erwin 4.1.

Confidential - TX

ETL/BI Reports Developer

Responsibilities:

  • Assisted in creating fact and dimension table implementation in Star Schema model based on requirements.
  • Involved in gathering Business requirements and translating into technical specifications for reporting purposes.
  • Extensively used Informatica Designer to develop various mappings to extract, cleanse, transform, integrate and load data into Oracle tables.
  • Developed Informatica Mappings using Lookup, Aggregator, Router, Stored Procedure, Union and Sequence Generator Transformations.
  • Involved in performance tuning of Informatica mappings using various components like Parameter files, Variables and Dynamic Cache. Also used round robin, hash auto key and key range partitioning.
  • Migrated mappings and workflows from development server to test server to perform integration and system testing.
  • Automated execution of workflows and sessions associated with the mappings using Batch Files.
  • Implemented Data Warehouse techniques like Star-Schema and Snowflake schema based on the environment.
  • Automated various reports like Asset Summary level, Equity Details, Account history, Performance summary, Gain/Loss trade detail, Profit/Loss details and summary level reports for Client Users.
  • Developed PL/SQL Procedures, Packages and Functions for the input of data into Crystal Reports as per business logic.
  • Involved in monitoring BO server and system usage, BO security design and implementation, the BO Central Management Console and BO InfoView.

Environment: Informatica Power Center 8.6, DataStage 7.5, Oracle 9i/10g, MS Access, SQL Server 2005, Crystal Reports 10, Business Objects XI, Eclipse, Java, UNIX, PVCS.

Confidential, Manhattan, New York

Technical Specialist

Responsibilities:

  • Designed and developed UNIX shell scripts as part of the ETL process to automate the process of loading, pulling the data for testing ETL loads.
  • Wrote several shell scripts in UNIX Korn shell for file transfers, data archiving, error log creation and log file cleanup.
  • Developed and tested UNIX shell scripts as part of the ETL process to automate loading and pulling the data.
  • Wrote several complex PL/SQL statements for various business scenarios.
  • Loaded data from the operational data store (ODS) into data warehouse tables, writing and executing foreign-key validation programs to validate the star schema relationships between fact tables and dimension/lookup tables.
  • Wrote triggers enforcing integrity constraints, stored procedures for complex mappings, and cursors for data extraction.
  • Worked extensively with mappings using expressions, aggregators, filters, lookups and procedures to develop and feed the data mart.
  • Performed data parsing, text processing and database connectivity using Perl.
  • Developed UNIX shell scripts to automate repetitive database processes.
  • Tested several ETL routines and procedures.
  • Identified the primary key (logical/physical) and implemented update or insert logic.
  • Deleted target data before processing based on the logical or physical primary key.
  • Designed and executed test cases on the application as per company standards.
  • Prevented multiple runs by flagging processed dates.
  • Wrote Teradata MLOAD, FLOAD, BTEQ, TPUMP and FEXPORT scripts and CASE statements.
  • Tuned database and SQL statements and schemas for optimal performance.
  • Wrote SQL queries for cross-verification of data.

Environment: Oracle 7.0, SQL*Plus, SQL, TestDirector, SQL Server 2000, T-SQL, PL/SQL, Visual Basic 6.0, Windows 95, XML, XSLT, XSD, UNIX, Korn shell scripting, Perl, MVS, JCL, ISPF, VSAM files, OS/390.
