Big Data Consultant/ Data Engineer Resume
SUMMARY:
- Overall 11+ years of IT experience in Analysis, Design, Data Modelling, Development, Maintenance, Testing and Documentation.
- Proficient in various big data technologies (Hadoop, Apache NiFi, Hive Query Language, HBase NoSQL database, Sqoop, Spark, Scala, Oozie, Pig, etc.), as well as Oracle Database and UNIX shell scripting.
- Good knowledge of Python, Spark and Scala.
- Implemented Enterprise Data Lakes using Apache NiFi.
- Good knowledge of Apache Kafka and Apache Phoenix.
- Implemented Big Data solutions using the Hortonworks Sandbox and the AWS platform.
- Very strong in ETL - Informatica PowerCenter, Data Validation Option (DVO) for PowerCenter, Database Design and Programming.
- Good experience developing Perl scripts, UNIX shell scripts and Windows batch scripts.
- Experienced in SAP-Business Objects, Informatica Data Quality, Informatica MDM, Informatica Power Exchange, Data Migration and Data Warehousing.
- Strong background in the Financial (Credit Risk, Recovery), Banking and Insurance domains.
- Experience understanding business requirements and translating them into detailed designs and technical specifications.
- Developed several complex custom mappings in Informatica using a variety of PowerCenter transformations, Mapping Parameters, Mapping Variables, Mapplets and Parameter Files in the Mapping Designer.
- Experience in Informatica Metadata and Repository Management; directly responsible for the extraction, transformation and loading of data from multiple sources into the data warehouse.
- Experience with Informatica IDQ 9.6.1 for analysis, data cleansing, data matching, data conversion, exception handling, reporting, initial data profiling and matching/removing duplicate data. Implemented Address Doctor validation using IDQ.
- Experience with Visual SourceSafe (VSS), ClearCase and TFS for version control and for promoting code to higher environments.
- Extensive experience in production support, covering the disciplines involved in supporting IT systems/applications currently used by end users. Involved in performance tuning.
- Involved in Informatica upgrades from 8.x to 9.x and 9.x to 10.x.
- Strong experience working with different RDBMS like Oracle, SQL Server and DB2, and with different file formats like flat files, COBOL VSAM files and XML files, both as source and target.
- Checked data for inconsistencies using Data Validation Option (DVO), an ETL testing tool that comes with PowerCenter and enables testing and validation of data.
- Created the Unit Test Case document and performed various testing like Unit Testing (UT) and System Integration Testing (SIT).
- Created deployment plans and used ClearCase as the versioning tool.
- Hands-on experience in all aspects of the Software Development Life Cycle (SDLC) and Agile methodology.
- Excellent analytical, problem solving, and communication skills with ability to interact with individuals at all levels.
TECHNICAL SKILLS:
Big Data Technologies: Hadoop Framework (HDFS, MapReduce implementation), YARN, Spark, Apache NiFi, Scala, Kafka, Pig, Hive, HBase, Sqoop, Storm, Impala.
ETL: Informatica PowerCenter 8.x & 9.x, Informatica Data Quality (IDQ), Informatica MDM, Informatica DVO, Power Exchange Express CDC
Databases: Oracle v8-10g/11g, SQL Server 2008 and 2012, Teradata, Netezza, DB2
Languages: Java, Scala, Python, PL/SQL, SQL, UNIX Shell Scripting, Batch Scripting, C, C++, Microsoft .NET Framework 4.0, Ant, NAnt scripts.
Methodologies: SDLC, Agile
Tools/Applications: SAP PowerDesigner, Erwin, ClearCase, Remedy, JIRA, Toad, Zena, Autosys, Tidal, Control-M, SQL*Loader, DataFlux, VersionOne, FileZilla, PuTTY, WinSCP, Ambari
Reporting Tools: SAP Business Objects
Environment: Windows 95/98/NT/2000/XP, UNIX, Linux
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Consultant/ Data Engineer
Responsibilities:
- Implemented Python scripts for streaming data.
- Implemented a service using the Spring Boot framework.
- Implemented the service by calling the rules engine service and a MariaDB database service hosted on Pivotal Cloud Foundry (PCF) on the AWS platform, and finally streaming the request and response messages through Kafka into Hive tables (see the streaming sketch at the end of this project entry).
- Performed load testing using JMeter.
- Implemented unit testing and integration testing.
- Implemented the Kafka-to-Hive streaming process flow and batch loading of data into MariaDB using Apache NiFi.
- Implemented batch and real-time ingestion and integration.
Environment: HDFS, Java, Hive, Python scripting, PuTTY, SecureCRT, Spring Boot, MariaDB, Apache Kafka, AWS Cloud, Hortonworks Sandbox, Spark, Linux and Windows operating systems, etc.
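A minimal sketch of the Kafka-to-Hive streaming flow referenced in this project, using PySpark Structured Streaming. The broker address, topic, message schema and Hive table names are assumptions for illustration, not details taken from the project.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Assumed names: broker, topic, schema and Hive table are illustrative only.
spark = (SparkSession.builder
         .appName("kafka-to-hive-streaming")
         .enableHiveSupport()
         .getOrCreate())

message_schema = StructType([
    StructField("request_id", StringType()),
    StructField("payload", StringType()),
    StructField("event_time", TimestampType()),
])

# Read request/response messages from Kafka and parse the JSON value column.
messages = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")
            .option("subscribe", "service-request-response")
            .load()
            .select(from_json(col("value").cast("string"), message_schema).alias("msg"))
            .select("msg.*"))

# Append each micro-batch into a pre-created Hive table.
query = (messages.writeStream
         .outputMode("append")
         .option("checkpointLocation", "/tmp/checkpoints/kafka_to_hive")
         .foreachBatch(lambda batch_df, batch_id:
                       batch_df.write.mode("append").insertInto("stream_db.request_response"))
         .start())

query.awaitTermination()
```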
Confidential
Big Data Consultant/ Data Engineer
Responsibilities:
- Analyzed the user requirements and implemented the use cases using Apache Nifi.
- Implemented data quality checks using HiveQL (see the data-quality sketch at the end of this project entry).
- Read the source data from the ETL grid and processed it using Pig scripts and Hive, with HDFS as the underlying storage.
- Converted the raw data into a Hive-readable format.
- Implemented unit testing and integration testing.
- Implemented end-to-end data flows using Apache NiFi.
- Implemented custom NiFi processors in Java for cataloguing HDFS metadata into a database.
- Implemented Python and shell scripts for remote script execution and integration between different systems.
- Implemented Batch Processing.
Environment: HDFS, Java, Pig, Hive, Shell scripting, Perl scripting, Python scripting, PuTTY, SecureCRT, Hortonworks Sandbox, Unix, Linux and Windows operating systems, etc.
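A minimal sketch of one HiveQL data-quality check of the kind referenced in this project, driven from a Python script through the beeline CLI. The JDBC URL, database, table and column names are hypothetical.

```python
import subprocess

# Hypothetical HiveServer2 JDBC URL and staging table/column names.
HIVE_JDBC_URL = "jdbc:hive2://hiveserver:10000/default"

CHECKS = {
    "null_customer_ids": "SELECT COUNT(*) FROM staging.customers WHERE customer_id IS NULL",
    "duplicate_customer_ids": (
        "SELECT COUNT(*) FROM (SELECT customer_id FROM staging.customers "
        "GROUP BY customer_id HAVING COUNT(*) > 1) t"
    ),
}

def run_check(name, query):
    """Run a HiveQL check through beeline and return the offending row count."""
    result = subprocess.run(
        ["beeline", "-u", HIVE_JDBC_URL, "--silent=true", "--outputformat=tsv2", "-e", query],
        capture_output=True, text=True, check=True,
    )
    # With tsv2 output, the last non-empty line holds the count value.
    return int(result.stdout.strip().splitlines()[-1])

if __name__ == "__main__":
    for name, query in CHECKS.items():
        count = run_check(name, query)
        status = "OK" if count == 0 else f"FAILED ({count} rows)"
        print(f"{name}: {status}")
```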
Confidential
Big Data Developer
Responsibilities:
- Analyzed the user requirements and implemented use cases in Pig and Hive.
- Implemented data validations using Linux shell scripting.
- Designed process flow and integration diagrams.
- Completed unit testing and integration testing.
- Created workflow jobs using Oozie.
- Implemented SQL queries for reporting.
- Used HBase as the storage mechanism for the processed data (see the HBase sketch at the end of this project entry).
Environment: Hadoop Framework, Java, Pig, Hive, Shell scripting, PuTTY, WinSCP, Hortonworks Sandbox, Oozie, Linux and Windows operating systems, etc.
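A minimal sketch of storing processed records in HBase, as referenced above, using the happybase client over the HBase Thrift server. The host, table name, column family and row layout are assumptions for illustration, not details from the project.

```python
import happybase

# Hypothetical Thrift server host and pre-created table with column family 'd'.
connection = happybase.Connection("hbase-master", port=9090)
table = connection.table("processed_events")

def store_event(event):
    """Persist one processed record, keyed by its event id."""
    row_key = event["event_id"].encode("utf-8")
    table.put(row_key, {
        b"d:source": event["source"].encode("utf-8"),
        b"d:status": event["status"].encode("utf-8"),
        b"d:processed_at": event["processed_at"].encode("utf-8"),
    })

store_event({
    "event_id": "evt-0001",
    "source": "pig-pipeline",
    "status": "PROCESSED",
    "processed_at": "2016-05-01T10:15:00",
})
connection.close()
```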
Confidential
Informatica Designer & Technical Lead
Roles & Responsibilities:
- Drive key initiatives/projects to completion and ensure deliverables meet business needs.
- Involved in gathering business requirements and attended technical review meetings to understand the data warehouse model.
- Responsible for end-to-end ETL and Database design, modelling and Informatica design.
- Developed technical specifications of the ETL process flow.
- Interact with upstream and downstream application teams.
- Provide on-going feedback and status of projects to IT and business areas, identifying any variances to plan and proposing solutions and alternatives for the variances.
- Provided technical leadership to the team in order to produce system designs for the framework and code that are scalable, robust, reusable and flexible.
Environment: Informatica 9.5.1, PL/SQL, SQL Server 2008/2012, IBM Netezza, Informatica Data Quality (IDQ), PowerCenter DVO, Zena, Toad, Remedy, HP Quality Center.
Confidential
Sr. ETL Developer / Project Lead
Roles & Responsibilities:
- Designed Source to Target Mapping Specification Documents.
- Extensive experience developing complex mappings using varied transformations like Source Qualifier, Connected and Unconnected Lookups, Router, Filter, Sorter, Expression, Aggregator, Joiner, Union, Update Strategy, Sequence Generator and Java Transformation.
- Maintained Type II Slowly Changing Dimensions using Lookup and Update Strategy transformations.
- Extensive use of SQL and PL/SQL in developing complex stored procedures and functions.
- Coordination with Admin Team for deploying code in all upper environments.
- Created mappings for various sources like Oracle, SQL Server and flat files to load and integrate data into the warehouse.
- Developed various mappings, Reusable transformations and validated the ETL logic coded into mappings.
- Implemented lookups and different transformations in the mappings.
- Implemented SCD (Slowly Changing Dimensions) Type I and Type II for data loads (see the SCD Type II sketch at the end of this project entry).
- Created dimension and fact tables for the data mart, and created the Informatica mappings, sessions and workflows for loading the fact and dimension tables into the data mart presentation layer.
- Involved in Development Integration Testing, System Testing, User Acceptance Testing, End-to-End Testing, and Performance Testing.
- Involved in writing UNIX shell scripts to run and schedule batch jobs.
- Provided production support by monitoring the processes running daily.
- Automated build, unit test and deployment to streamline a repeatable development and test process, regardless of technology.
- Built and automated unit test cases using PowerCenter Data Validation Option (DVO).
- Worked closely with reporting team and helped them whenever they had any ETL issues.
- Involved in the Informatica upgrade from 8.x to 9.x.
- Prepared ETL mapping documents for every mapping and data migration documents for smooth transfer of the project from the development environment to testing and then to production.
- Involved in Optimizing the Performance by eliminating Target, Source, Mapping, and Session bottlenecks.
Environment: Informatica PowerCenter 8.x and 9.x, Oracle 11g, TOAD, Jira, Remedy, PL/SQL, Flat files, Sybase, UNIX Shell Scripting and SAP Business Objects.
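A minimal sketch of the SCD Type II load logic referenced above, expressed as SQL run from Python via cx_Oracle; in the project itself this logic was implemented in Informatica mappings. The connection string, dimension and staging table names, and tracked columns are hypothetical.

```python
import cx_Oracle

# Hypothetical connection string and table names for illustration only.
conn = cx_Oracle.connect("etl_user/password@//dbhost:1521/ORCLPDB")
cur = conn.cursor()

# Step 1: expire the current dimension rows whose tracked attributes changed in staging.
cur.execute("""
    UPDATE dim_customer d
       SET d.current_flag = 'N',
           d.effective_end_date = SYSDATE
     WHERE d.current_flag = 'Y'
       AND EXISTS (
             SELECT 1 FROM stg_customer s
              WHERE s.customer_id = d.customer_id
                AND (s.address <> d.address OR s.segment <> d.segment))
""")

# Step 2: insert a fresh current-version row for new and changed customers.
cur.execute("""
    INSERT INTO dim_customer
        (customer_id, address, segment, effective_start_date,
         effective_end_date, current_flag)
    SELECT s.customer_id, s.address, s.segment, SYSDATE, NULL, 'Y'
      FROM stg_customer s
     WHERE NOT EXISTS (
             SELECT 1 FROM dim_customer d
              WHERE d.customer_id = s.customer_id
                AND d.current_flag = 'Y')
""")

conn.commit()
cur.close()
conn.close()
```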
Confidential
Informatica Developer
Roles & Responsibilities:
- Analysis of Business & Technical Requirements.
- Worked closely with the Project Manager and Data Architect. Assisted Data Architect in design by doing source data analysis, rectifying the requirement documents, creating source to target mappings.
- Coordination with on-site team to understand the requirements.
- Preparation of estimates, timelines of the deliverables and the project execution plan.
- Prepare and validate product architecture and design model.
- Coordinating with the support and development team by explaining them the business functionality and assisting them in coding and testing the application programs.
- Involved in performance tuning and optimization of Informatica mappings and sessions using features like partitions and data/index cache to manage very large volume of data.
- Designed and developed the logic for handling slowly changing dimension table loads by flagging records with the Update Strategy transformation to populate the desired target records.
- Implemented source- and target-based partitioning for existing workflows in production to improve performance and reduce running time.
- Created complex workflows with multiple sessions and worklets, with consecutive or concurrent sessions.
- Used Timer, Event Raise, Event Wait, Decisions, and Email tasks in Informatica Workflow Manager.
- Used Informatica reusability at various levels of development.
- Resolved issues for team members through proper channels in a timely manner.
- Documented ETL test plans, test cases, test scripts and validations based on design specifications for unit testing, system testing, expected results, preparing test data and loading for testing, error handling and analysis.
- 24/7 Credit Risk Application support
Environment: Informatica PowerCenter 8.x and 9.x, Oracle 11g, Jira, Remedy, TOAD, PL/SQL, Flat files, Sybase, UNIX Shell Scripting and SAP Business Objects.
Confidential
Informatica Developer
Roles & Responsibilities:
- Involved in gathering requirements from key credit risk business stakeholders.
- Prepared technical and functional designs in collaboration with teams and business users.
- Maintained program documentation, operational procedures and user guidelines as per client requirements.
- Implemented Slowly Changing Dimension methodology for Historical data.
- Coordinated with technical team members and conducted system integration testing.
- Developed data warehouse solutions and applications by translating business and functional needs.
- Developed Informatica mappings and mapplets and tuned them for optimum performance, dependencies and batch design.
- Responsible for continuous integration and overnight builds across ETL and database technologies, working towards zero broken builds.
- Automated build, unit test and deployment to streamline a repeatable development and test process, regardless of technology.
- Provide on-going feedback and status of projects to IT and business areas, identifying any variances to plan and proposing solutions and alternatives for the variances.
- Worked with XML sources, used the XML Parser transformation.
- Responsible for the end-to-end ETL batch process in all lower environments (Test & Development), thereby ensuring environment availability for the testing team.
- Analysis of Data related queries from Business key stakeholders.
- 24/7 Credit Risk Application support
Environment: Informatica PowerCenter 8.x and 9.x, Oracle 11g, Jira, Remedy, TOAD, PL/SQL, Flat files, Sybase, UNIX Shell Scripting and SAP Business Objects.
Confidential
Informatica Developer
Roles & Responsibilities:
- Involved in developing mappings using ETL tool-Informatica for data flow from Source system to Target.
- Worked on Joiner, Filter, Router, Expression, Lookup and Aggregation transformations.
- Created solution documents for developed mappings.
- Involved in requirement gathering, mapping the solutions as part of solution workshop, and testing the solutions.
- Created mappings using transformations like Source Qualifier, Joiner, Aggregator, Expression, Filter, Router, Lookup, Update Strategy, and Sequence Generator.
- Documented ETL test plans, test cases, test scripts and validations based on design specifications for unit testing, system testing, expected results, preparing test data and loading for testing, error handling and analysis.
- L3 support: monitored ETL jobs, sent notification mails after job completion, and provided fixes for issues raised by users.
- Single point of contact for database-related issues.
- Used Import and Export Utilities to maintain the data effectively.
Environment: Informatica PowerCenter 8.x, Oracle 10g, Jira, Remedy, TOAD, PL/SQL, Flat files, Sybase, UNIX Shell Scripting.