- 12 years of professional IT experience, including 4+ years in Big Data Hadoop architecture, design, development and ecosystem analytics.
- Experience with major components of the Hadoop ecosystem, including Spark, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, HBase, Kafka and HCatalog.
- Experience with Hadoop distributions such as Cloudera and Hortonworks.
- Experience in Spark with Python and exposure to Java, Scala, Flume and AWS.
- ETL expert with 8+ years of data warehousing experience using Informatica PowerCenter 8.x/9.x, DataStage, Teradata and Mainframe.
- Solid understanding of the Hadoop Distributed File System (HDFS).
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Expertise in data Extraction, Transformation and Loading (ETL) between homogeneous and heterogeneous systems.
- Extensively used transformations for data loading in multiple projects (Data Conversion, Conditional Split, Aggregate, Lookup, Join, Merge and Sort).
- Experience in importing and exporting data into HDFS and Hive using Sqoop.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Good knowledge of data warehousing concepts such as Star Schema, Snowflake Schema, SCD types, fact and dimension tables, and physical and logical data modeling.
- Extensively worked with various components of the Informatica PowerCenter - PowerCenter Designer, Repository Manager, Workflow Manager, and Workflow Monitor to create mappings for the extraction of data from various source systems.
- Extensively worked on Informatica Power Center Transformations such as Source Qualifier, Lookup, Filter, Expression, Router, Joiner, Update Strategy, Rank, Aggregator, Stored Procedure, Sorter, Sequence Generator, Normalizer, Union, and XML Source Qualifier.
- Experience in identifying Bottlenecks in ETL Processes and Performance tuning of the production applications using Database Tuning, Partitioning, Index Usage, Aggregate Tables, Session partitioning, Load strategies, commit intervals and transformation tuning.
- Implemented Slowly Changing Dimension methodology for accessing the full history of accounts and transaction information.
- Created UNIX shell scripts to access data and move data from Production to Development.
- Good knowledge of UNIX shell scripting. Developed UNIX scripts and scheduled ETL loads.
- Coordinated with managers and the development team to prepare project plans and ensure they are accurate and on schedule.
- Excellent Interpersonal Skills with the ability to work independently and with the Team.
- Core domain skills include Healthcare, Banking, Retail, Term Life, and P&C Insurance.
- Experience in creating High Level Design and Detailed Design in the Design phase.
- Involved in unit testing and system testing to verify that data loads into the target are accurate.
- Hands-on proficiency in Informatica, DataStage, Teradata, Mainframe, SQL Assistant, Control M, Endeavor, MS SQL Server, SQL Developer, DB2Studio, HP ALM Quality Center 11.5 and testing, using Agile and Scrum methodologies.
- Assigned work and provided technical oversight to onshore and offshore developers, including onsite-offshore coordination.
- Solid knowledge of Project Management Process, Risk Management and CMMI L5 metrics.
- Loaded data from various data sources and legacy systems into Teradata production and development warehouses using BTEQ, FastExport, MultiLoad, FastLoad and Informatica.
- Worked on Performance Tuning, identifying and resolving performance bottlenecks in various levels like sources, targets, mappings and sessions.
Hadoop Eco System: HDFS, Hive, Pig, Sqoop, HBase, Spark, Scala, Python, ZooKeeper, MapReduce, Flume, Oozie, Java, Kafka
Databases: Oracle 10g/11g, SQL Server 2008/2010, MySQL 5.0/4.1, Teradata, DB2, SQL Assistant, Sybase
Reporting Tools: OBIEE, Tableau
Data Modeling: Physical Modeling, Logical Modeling, Relational Modeling, Dimensional Modeling (Star Schema, Snowflake, Fact, Dimensions), Erwin, Control M, CA-7
ETL Tools: Informatica PowerCenter 9.6/9.1/8.6, Informatica Cloud Services (ICS), DataStage 8.5, SSIS, SSAS, PL/SQL
Operating System: UNIX, Windows 2000/XP/2008
Other Tools: HP ALM Quality Center, SoapUI
Confidential, Pleasanton, CA
Senior Big Data Analyst/Developer
- Analyzed the existing system processes.
- Prepared design documents for the specified models.
- Implemented data preparation, scoring and trend analysis.
- Developed a common export framework to transfer data to different target systems (COM, EXE).
- Built an in-house Comparator tool using MapReduce to validate output data for the Data Science and Engineering teams.
- Led quality assurance reviews, code inspections, and walkthroughs of developers' code.
- Acted as the technical interface to the development team for external groups.
- Provide the for team members and cross team members.
- Prepared a validation script to check source and target data after ingestion.
- Implemented scoring logic using Python and Hive scripts.
- Created and configured coordinators, workflows and bundles in Oozie.
- Deployed JAR files to an EC2 instance after development.
- Worked on a 62-node physical cluster running Hadoop 1.x and a 31-node cluster running Hadoop 2.x (YARN).
- Worked on a 10-node AWS cluster for the Dev and QA environments.
- Involved in setting up AWS Identity and Access Management (IAM) roles.
- Involved in the network setup of the physical cluster with the administrator.
Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spark, Python, AWS, Cloudera, Zookeeper and Hbase, DB2, Teradata, Linux
Confidential, Bloomington, IL
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Deployed and analyzed large volumes of data using Hive as well as HBase.
- Supported data analysts in running Pig and Hive queries.
- Used Hive and Python at various stages of the project lifecycle.
- Created business intelligence dashboards in Tableau for data reconciliation and verification.
- Re-designed and developed a critical ingestion pipeline to process over 200 TB of data.
- Imported and exported data between MySQL/Oracle and Hive using Sqoop.
- Imported and exported data between MySQL/Oracle and HDFS.
- Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
- Specified the cluster size, resource pool allocation and Hadoop distribution by writing the specifications in JSON format.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it. Exported result sets from Hive to MySQL using shell scripts.
- Created models and customized data analysis tools in Python and MATLAB.
- Delivered data analysis projects using Hadoop-based tools and the Python data science stack. Developed new data analyses and visualizations in Python.
- Handled importing of data from various data sources, performed transformations, and imported and exported data into HDFS and Hive using Sqoop.
- Gained good experience with NoSQL databases.
- Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
- Involved in creating, partitioning and bucketing tables.
- Good understanding of and related experience with Hadoop stack internals, Hive, Pig and MapReduce.
- Interacted with the Business users to identify the process metrics and various key dimensions and measures. Involved in the complete life cycle of the project.
- Created Mapplets, reusable transformations and used them in different mappings. Created Workflows and used various tasks like Email, Event-wait and Event-raise, Timer, Scheduler, Control, Decision, Session in the workflow manager.
- Made use of Post-Session success and Post-Session failure commands in the Session task to execute scripts needed for clean up and update purposes.
- Prepared ETL mapping Documents for every mapping and Data Migration document for smooth transfer of project from development to testing environment and then to production environment.
- Worked with the reporting team to help them understand the user requirements for the reports and the measures on them.
- Migrated repository objects, services and scripts from development environment to production environment. Extensive experience in troubleshooting and solving migration issues and production issues.
- Actively involved in production support. Implemented fixes/solutions to issues/tickets raised by user community.
Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spark, Python, AWS, Cloudera, Zookeeper and Hbase, Informatica PowerCenter, DB2, Teradata, UNIX, Tableau
Confidential, Portland, ME
- Used the pmcmd command to start, stop and abort workflows and sessions from UNIX.
- Developed complex SQL queries for data analysis and data extraction, drawing on knowledge of joins and the database landscape.
- Implemented parallelism in loads by partitioning workflows using Pipeline, Round-Robin, Hash, Key Range and Pass-through partitions.
- Validated Informatica mappings for source compatibility after version changes at the source. Troubleshot long-running sessions and fixed the issues.
- Implemented daily and weekly audit processes for the Claims subject area to ensure the data warehouse matches the source systems for critical reporting metrics.
- Developed shell scripts for daily and weekly loads and scheduled them on UNIX.
- Involved in writing SQL scripts, stored procedures and functions and debugging them. Used different tasks such as session, command, decision, email tasks.
- Proficient in understanding business processes / requirements and translating them into technical requirements, creating conceptual design, development and implementation of software application and integrating new enhancements into existing systems.
- Converted data integration models and other design specifications to Informatica source code. Updated numerous BTEQ/SQL scripts, making appropriate DDL changes, and completed unit and system tests.
- Developed and maintained Informatica and Teradata objects. Involved in peer-to-peer reviews. Used SQL Developer to query the source/target tables to validate the SQL and Lookup overrides.
Environment: Informatica PowerCenter 9.1 (Repository Manager, Designer Tools like Source Designer, Warehouse Designer, Mapping and Mapplet Designer, Workflow Manager, Workflow Monitor), DB2, SQL, Teradata, SQL Server, UNIX, Mainframe, Control M, Windows XP
- Understood the complete requirements and business rules as given in the mapping specification document.
- Used DataStage for extraction, transformation and loading of data to the target.
- Extracted data using Dataset and loaded it into DB2 relational database tables.
- Created parallel jobs using various stages including ODBC Connector, Filter, Remove Duplicates, Sort, Transformer, Funnel, Lookup, Join, Pivot, Aggregator and Sequential File.
- Created sample data for initial testing of the jobs. Used Quality Center for test and defect tracking, and fixed any defects found during system testing.
- Prepared extensive unit test cases based on the business logic and validation rules given in the mapping document. Wrote queries to test the functionality of the code during testing.
Environment: DataStage 8.5, DB2, SQL Server, Oracle and UNIX
- Attended to calls related to problems/abends in production jobs.
- Communicated with business partners to understand problems and clarify their queries.
- Analyzed and resolved issues within the stipulated time; where required, made temporary or urgent fixes and moved them to production. Analyzed data and production problems.
Environment: Mainframe, Teradata, MF-COBOL, SQL Assistant, Control M, Easytrieve, SAS, SYBASE