
Hadoop / Big Data Developer Resume

Des Moines, IA

SUMMARY:

  • Around 6 years of experience in information technology and data warehousing, covering design, mapping, extraction, migration, data conversion, data validation and ETL development
  • Experienced with major Hadoop ecosystem components such as Sqoop, Spark, Hive, Kafka and HDFS
  • Strong experience in dimensional modeling using Star and Snowflake schemas and in identifying facts and dimensions
  • Expertise in understanding and building the various phases of the data warehouse life cycle and data mart concepts
  • Experienced in writing and understanding MapReduce programs that work with different file formats such as Text, SequenceFile, XML, Parquet and Avro
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows
  • Very good understanding of the entire process flow in Cloud and AWS environments; successfully involved in the entire SDLC for Hadoop / Big Data projects and their software applications, applying Ralph Kimball and Bill Inmon methodologies to solve business problems
  • Strong data warehousing ETL experience using Informatica PowerCenter client tools - Designer, Source Analyzer, Target Designer, Transformation Developer, Mapping and Mapplet Designer, along with the Workflow Manager and Workflow Monitor tools
  • Experience and familiarity working with Waterfall, Agile and Scrum methodologies
  • Expert in handling various source systems such as flat files, DB2, MS SQL Server, Excel, Oracle, CSV files, Teradata, XML files, VSAM and COBOL files
  • Extensively worked on a custom migration project that virtually eliminated the complexity and workload associated with promoting a Markit EDM project through multiple environments
  • Excellent communication, documentation and presentation skills using Microsoft PowerPoint, Outlook and Visio
  • Extensively used Teradata features and utilities such as BTEQ, FastLoad, MultiLoad, TPT and SQL Assistant, along with DDL and DML commands
  • Designed and developed slowly changing dimension (Type I, II & III) methodologies and change data capture (CDC) solutions that capture and analyze changes from daily feeds to maintain history tables, as shown in the sketch after this list
  • Above all, team spirit, a zeal for exploration and a willingness to quickly learn and adapt to new technologies
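
As a brief illustration of the SCD Type II / CDC pattern referenced above, here is a minimal SQL sketch; the customer_dim history table, the cust_stg daily feed and their columns are hypothetical names used only for illustration.

    -- Expire the current version of any customer whose tracked attribute changed
    UPDATE customer_dim
       SET eff_end_dt   = CURRENT_DATE,
           current_flag = 'N'
     WHERE current_flag = 'Y'
       AND EXISTS (SELECT 1
                     FROM cust_stg s
                    WHERE s.cust_id = customer_dim.cust_id
                      AND s.address <> customer_dim.address);

    -- Insert the new version with an open-ended effective date range
    INSERT INTO customer_dim (cust_id, address, eff_start_dt, eff_end_dt, current_flag)
    SELECT s.cust_id, s.address, CURRENT_DATE, DATE '9999-12-31', 'Y'
      FROM cust_stg s
      LEFT JOIN customer_dim d
        ON d.cust_id = s.cust_id
       AND d.current_flag = 'Y'
     WHERE d.cust_id IS NULL
        OR s.address <> d.address;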

TECHNICAL SKILLS:

Operating Systems: Windows XP, 7, 8.1, Unix/Linux, Mac OS

Databases & Tools: Teradata 13.0, Big Data, Oracle 11g/10g, Microsoft SQL Server, SQL*Plus, SQL Developer, Toad, Netezza, MySQL, HiveQL, Sqoop, Kafka

Languages: C, C++, Java, XML, SQL, PL/SQL, Perl

Software Tools: MS Excel, Word, PowerPoint, Visio, Outlook, Eclipse, NetBeans

Data Warehousing/ETL: Informatica PowerCenter 9.x/8.x (Workflow Manager, Workflow Monitor, Source Analyzer, PowerCenter Designer, Transformation Developer, Mapplet Designer, Mapping Designer, Repository Manager), Data Mart, OLAP, OLTP, PowerConnect, Markit EDM, Data Porter, Data Constructor, Rule Builder, etc.

Application Tools: ServiceNow, CA Workbench, SVN version control, GitHub

Scheduling Tools: Informatica Workflow Manager, Autosys, ESP CA7, Control-M, Oozie

PROFESSIONAL EXPERIENCE:

Confidential, Des Moines, IA

Hadoop / Big Data developer

Responsibilities:

  • Worked directly with the Big Data Architect, actively creating the foundation for the Enterprise Analytics initiative in a Hadoop-based data lake
  • Involved in setting up the environment and configuring AWS cluster-node management
  • Used Kafka and Sqoop as data ingestion tools for loading unstructured and structured data, respectively, into the Hadoop cluster
  • Responsible for writing Hive queries in Hive Query Language (HiveQL) for analyzing data in the Hive warehouse (see the HiveQL sketch after this list)
  • Automated all the jobs that extract data from different data sources such as MySQL and UNIX and push the result sets to the Hadoop Distributed File System installed on AWS, using the Oozie workflow scheduler
  • Involved in managing and reviewing the Hadoop log files
  • Experienced in applying different optimization techniques to Hive joins
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs
  • Created HBase tables to store data arriving in various formats from different portfolios
  • Used Avro and Parquet file formats for data serialization
  • As a developer, responsibilities also included data cleansing, data quality tracking and process balancing checkpoints
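
A minimal HiveQL sketch of the kind of analysis work described above (an external Parquet table plus an aggregate query with a broadcast join); the claims_raw and policy_dim tables, their columns and the HDFS path are hypothetical names used only for illustration.

    -- Illustrative external table stored as Parquet (names are assumed)
    CREATE EXTERNAL TABLE IF NOT EXISTS claims_raw (
        claim_id     STRING,
        policy_id    STRING,
        claim_amount DOUBLE
    )
    PARTITIONED BY (load_dt STRING)
    STORED AS PARQUET
    LOCATION '/data/lake/claims_raw';

    -- Let Hive convert the join to a map-side (broadcast) join when the dimension table is small
    SET hive.auto.convert.join=true;

    -- Aggregate claim amounts per line of business for one load date (partition pruning on load_dt)
    SELECT p.line_of_business,
           COUNT(*)            AS claim_count,
           SUM(c.claim_amount) AS total_claim_amount
      FROM claims_raw c
      JOIN policy_dim p
        ON c.policy_id = p.policy_id
     WHERE c.load_dt = '2016-01-01'
     GROUP BY p.line_of_business;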

Environment: Hadoop 2.x, AWS, YARN, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Oozie, HBase, Teradata.

Confidential

ETL / BI Developer

Responsibilities:

  • Involved in understanding and evaluating the business requirements
  • Worked with data modelers on end-to-end analysis of the EDW to prepare the business rules
  • Developed complex SQL queries to extract the latest policy and claim information for all lines of business from Foundation tables (see the sketch after this list)
  • Increased the performance of the data loads by incorporating temporary tables and decreasing lookup cache sizes
  • Created complex Informatica mappings to extract data from Teradata, transform it per business requirements and generate XML files of a specified size
  • Designed and developed complex ETL mappings using Source Qualifier, Joiner, Update Strategy, connected and unconnected Lookup, Rank, Expression, XML Parser, XML Generator, Router, Filter, Aggregator and Sequence Generator transformations
  • Extensively worked on the XML Generator transformation with the business-provided XSD definition
  • Implemented the Transaction Control transformation to split the large XML files into smaller 5 MB files as per the requirement
  • Wrote Perl and UNIX shell scripts for file manipulation, reconciliation and balancing processes, and to schedule daily ESP jobs
  • Managed change control implementation and coordinated daily and monthly releases and reruns
  • Involved in setting up the FTP process to send XML files to the AuSuM systems
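
A hedged sketch of the "latest record" extraction pattern mentioned above, using a window function; the claim_fact foundation table and its columns are hypothetical names used only for illustration.

    -- Keep only the most recent row per policy/claim based on the update timestamp
    SELECT policy_nbr,
           claim_nbr,
           claim_status,
           updated_ts
      FROM (SELECT f.policy_nbr,
                   f.claim_nbr,
                   f.claim_status,
                   f.updated_ts,
                   ROW_NUMBER() OVER (PARTITION BY f.policy_nbr, f.claim_nbr
                                          ORDER BY f.updated_ts DESC) AS rn
              FROM claim_fact f) latest
     WHERE rn = 1;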

Environment: Informatica 9.6, Oracle 11g, Flat Files, Teradata 13.0, XML Files, HP-UX 10.2, PL/SQL, ServiceNow, CA WA Workstation ESP Edition

Confidential

Data warehouse Developer

Responsibilities:

  • Involved in understanding the JMS queue functionality and successfully implemented the XML Source Qualifier transformation to read data from JMS into Informatica
  • Extracted data from various source systems such as XML, fixed-width and delimited flat files containing commercial property and auto information, and loaded it into the Teradata RDBMS data warehouse
  • Extensively worked on Mapping Variables, Mapping Parameters, Workflow Variables, Informatica PowerExchange and Session Parameters
  • Wrote, tested and implemented Teradata FastLoad, MultiLoad and BTEQ scripts, DML and DDL (see the Teradata sketch after this list)
  • Extensively optimized all the Informatica sources, targets, mappings and sessions by finding the bottlenecks in different areas and debugged some existing mappings using the Debugger to test and fix the mappings
  • Extensively worked with Teradata utilities such as BTEQ, FastExport, FastLoad and MultiLoad to export and load data to/from different source systems, including flat files, VSAM and COBOL files
  • Effectively interpreted session error logs and used the Debugger to test mappings and fix bugs in DEV, following change procedures and validation
  • Involved in writing UNIX shell scripts to automate the batch jobs
  • Worked with developers to troubleshoot and optimize SQL queries in Teradata and resolve mapping / workflow / session logic and performance issues in the Development, Test and Production repositories
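
A small Teradata SQL sketch of the kind of upsert such a BTEQ script might execute; the stg_policy staging table, the policy_tgt target and their columns are hypothetical names used only for illustration.

    -- Upsert staged policy rows into the target table (as run from a BTEQ script)
    MERGE INTO policy_tgt t
    USING stg_policy s
       ON (t.policy_id = s.policy_id)
    WHEN MATCHED THEN
        UPDATE SET policy_status = s.policy_status,
                   updated_ts    = CURRENT_TIMESTAMP
    WHEN NOT MATCHED THEN
        INSERT (policy_id, policy_status, updated_ts)
        VALUES (s.policy_id, s.policy_status, CURRENT_TIMESTAMP);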

Environment: Informatica 9.6, Oracle 11g, Flat Files, Teradata 13.0, XML Files, HP-UX 10.2, Altova XMLSpy, ServiceNow, CA WA Workstation ESP Edition

Confidential, Stamford, CT

ETL Developer

Responsibilities:

  • Evaluated business requirements, technical specifications, source repositories and physical data models for ETL mapping and process flow
  • Used Markit EDM as the ETL tool, along with stored procedures, to pull data from source systems and files, then cleanse, transform and load the data into databases (see the T-SQL sketch after this list)
  • Worked with various databases such as MS SQL Server and Oracle.
  • Worked on various components of Markit EDM tool such as Data Porter, Constructor, Inspector, Rule Builder, Matcher and Solution.
  • Created workflows and user interfaces to manage all entity masters, process monitoring, exception handling, reference data and exports
  • Worked on consolidating reporting fields for trade data coming from multiple channels
  • Involved in migrating the application executables to the UAT and PROD environments, and supported UAT testing by making the necessary fixes
  • Expertise in debugging and performance tuning of targets, sources, mappings and sessions; managed scheduling of tasks to run at any time without operator intervention
  • Worked with clients in analyzing the portfolio asset management reporting data
  • Used AppWorx to schedule and run jobs in Markit EDM, and Control-M for Informatica daily and monthly loads
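
A minimal T-SQL sketch of the kind of cleanse-and-load stored procedure mentioned above; the dbo.stg_trades and dbo.trades tables, the procedure name and the columns are hypothetical and used only for illustration.

    -- Hypothetical procedure: trims and standardizes staged trade rows before loading
    CREATE PROCEDURE dbo.usp_load_trades
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.trades (trade_id, account_cd, trade_amt, trade_dt)
        SELECT LTRIM(RTRIM(s.trade_id)),
               UPPER(LTRIM(RTRIM(s.account_cd))),
               s.trade_amt,
               s.trade_dt
          FROM dbo.stg_trades s
         WHERE s.trade_amt IS NOT NULL;   -- basic data quality check on the required amount
    END;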

Environment: Markit EDM 10.2.2.1, MS SQL Server, Oracle 11g, Informatica 9.5.1, flat files, XML files

Confidential, CA

Informatica Developer

Responsibilities:

  • Converted the data mart from logical design to physical design; defined data types, constraints and indexes; generated the schema in the database; created automated scripts; and defined storage parameters for the objects in the DB2 database
  • Created complex mappings using Unconnected Lookup, Sorter, Aggregator, dynamic Lookup and Router transformations to populate target tables efficiently
  • Tuned the performance of Informatica mappings using parameter files, variables and dynamic caches
  • Performance tuning using round-robin, hash auto-key and key-range partitioning
  • Designed and developed Oracle PL/SQL scripts for data import/export (see the PL/SQL sketch after this list)
  • Managed change control implementation and coordinated daily and monthly releases and reruns
  • Involved in designing, implementing and modifying Ruby code in a Linux environment
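
An illustrative Oracle PL/SQL sketch of a bulk export/archive step like the scripts described above; the src_orders and ord_archive tables and their columns are hypothetical names used only for illustration.

    -- Copy older rows in batches from a source table into an export/archive table
    DECLARE
        CURSOR c_src IS
            SELECT order_id, customer_id, order_total
              FROM src_orders
             WHERE order_dt < ADD_MONTHS(SYSDATE, -12);
        TYPE t_rows IS TABLE OF c_src%ROWTYPE;
        v_rows t_rows;
    BEGIN
        OPEN c_src;
        LOOP
            FETCH c_src BULK COLLECT INTO v_rows LIMIT 1000;  -- fetch in 1000-row batches
            EXIT WHEN v_rows.COUNT = 0;
            FORALL i IN 1 .. v_rows.COUNT
                INSERT INTO ord_archive VALUES v_rows(i);
            COMMIT;                                           -- commit each batch
        END LOOP;
        CLOSE c_src;
    END;
    /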

Environment: Informatica Power Center 9.1, Oracle 10g, MS SQL Server, Excel, Flat files, SQL Developer.

Confidential

Informatica Developer

Responsibilities:

  • Responsible for using Informatica parameters and variables to effectively develop mappings and the change capture process
  • Extracted data from SQL Server and Oracle into the data warehouse
  • Used SQL*Loader to load data, depending on the volume of source data
  • Responsible for designing, developing and implementing the ETL process, which included the incremental load strategy, full load and disaster recovery (see the sketch after this list)
  • Developed automated scripts for migration of the ETL processes from one environment to the other.
  • Developed complex transformations such as connected/unconnected Lookups and Type 1 and Type 2 slowly changing dimensions
  • Responsible for developing Mapplets to implement reusable logic
  • Responsible for troubleshooting the Informatica tasks and performance related issues and data issues using the session logs, debugger and performance indicators provided by Informatica
  • Coordinated with system operators to schedule the batch jobs in Control-M
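
A hedged SQL sketch of the incremental-load idea referenced above, loading only rows changed since the last successful run; the src_customers, dw_customers and etl_batch_control objects and the mapping name are hypothetical and used only for illustration.

    -- Delta load: pull only rows modified after the last recorded load timestamp
    INSERT INTO dw_customers (cust_id, cust_name, last_updated)
    SELECT s.cust_id,
           s.cust_name,
           s.last_updated
      FROM src_customers s
     WHERE s.last_updated > (SELECT MAX(b.last_load_ts)
                               FROM etl_batch_control b
                              WHERE b.mapping_name = 'm_load_customers');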

Environment: Informatica 8.5.1, 7.x, Oracle 10g, Erwin 4.0, SQL, PL/SQL, Netezza, TOAD, Windows NT/2000, Control-M
