- 8+ years of experience in the design, development, and implementation of data warehouses as an ETL developer across the Telecom, Banking, Finance, Healthcare, and Retail domains, covering customer and supplier data.
- Around 2 years of experience with Big Data ecosystems.
- Experience with the Software Development Life Cycle (SDLC) phases of Analysis, Design, Implementation, Testing, and Maintenance, and sound knowledge of SDLC methodologies including Waterfall and Agile.
- Experience with HDFS, MapReduce, Pig, Hive, AWS, ZooKeeper, Oozie, Hue, and Sqoop.
- Experience loading data into HDFS using Hive and Sqoop.
- Explored general data analytics on distributed computing clusters such as Hadoop using Apache Spark SQL and Scala.
- Expertise in transferring large datasets between Hadoop and RDBMS using Sqoop.
- Expertise in scheduling and monitoring Hadoop workflows using Oozie and ZooKeeper.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems such as Teradata, Oracle, Netezza, and SQL Server.
- Loaded data from multiple data sources into HDFS.
- Knowledge of in-memory processing of data sources using Apache Spark.
- Experience in using Sequence files, ORC, Parquet and Avro file formats and compression techniques.
- Experience with Production, Pre-prod, Quality Assurance (QA), System Integration Testing (SIT), and User Acceptance Testing (UAT) environments.
- Extensive knowledge of SOAP and REST web services.
- Exposure to Amazon EC2, Amazon S3, AWS Athena, AWS Lambda, and AWS EMR.
- Work experience with cloud infrastructure such as Amazon Web Services (AWS) and Microsoft Azure.
- Experience with Object-Oriented Analysis and Design (OOAD) methodologies.
- Experience in developing Shell scripts and Python Scripts for system management.
- Experience as an ETL developer using Informatica Data Quality, Informatica PowerCenter, and Informatica PowerExchange (Source Analyzer, Warehouse Designer, Mapping Designer, Mapplet Designer, Transformation Developer), Repository Manager, Workflow Manager, and Workflow Monitor.
- Experience with various Informatica PowerCenter transformations such as Source Qualifier, Aggregator, Router, Joiner, Rank, Sequence Generator, Transaction Control, Lookup, and Normalizer, and implemented performance tuning for transformations.
- Exposure to AWK scripts; worked with Autosys, Crontab, and Control-M to schedule jobs.
- Experience with data warehouse concepts and data modeling using normalization, business process analysis, re-engineering, dimensional data modeling, and physical and logical data modeling.
- Expert in writing BTEQ, FastLoad, and MultiLoad scripts according to business demand with the given transformation or business logic, with good exposure to FastExport, TPump, and TPT (Teradata Parallel Transporter).
- Implemented SCD Type 1/2/3/4/5/6, incremental-load, and CDC logic according to business requirements.
- Experience in Query Optimization and Performance Tuning on SQL Queries.
- Good understanding of database and data warehousing concepts (OLTP and OLAP).
- Experience using NoSQL databases such as MongoDB, DynamoDB, Cassandra, and HBase to store semi-structured and unstructured data.
- Experience with data serialization formats such as XML, JSON, and Protocol Buffers.
- Very strong knowledge of relational databases (RDBMS) and data modeling, and in building data warehouses and data marts using Star and Snowflake schemas.
- Expert in creating SQL queries, PL/SQL packages, functions, stored procedures, triggers, cursors, materialized views, and customized views.
- Used Git and SVN for version control; participated in defect management and bug reporting using JIRA and HP Quality Center.
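As a sketch of the Type 2 SCD logic listed above (illustrative only: column names such as `cust_id` and `city` are hypothetical, and in practice this logic ran inside the ETL tool or database, not in plain Python):

```python
from datetime import date

def apply_scd2(dimension, incoming, key, tracked, load_date):
    """Type 2 SCD: expire the current version and insert a new one
    whenever a tracked attribute changes; insert new keys outright."""
    current = {r[key]: r for r in dimension if r["current"]}
    for row in incoming:
        old = current.get(row[key])
        if old is None:
            # Unseen business key: first version becomes current.
            dimension.append({**row, "eff_date": load_date,
                              "end_date": None, "current": True})
        elif any(old[c] != row[c] for c in tracked):
            # Tracked attribute changed: close the old version and
            # open a new current version effective from load_date.
            old["end_date"] = load_date
            old["current"] = False
            dimension.append({**row, "eff_date": load_date,
                              "end_date": None, "current": True})
    return dimension
```

Type 1 would instead overwrite the attribute in place, and Type 3 would keep the prior value in a dedicated "previous" column.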
Amazon Web Services: Amazon EC2, Amazon S3, AWS Athena and AWS EMR
Big Data Ecosystem: MapReduce, HDFS, YARN, HBase, ZooKeeper, Hive, Pig, Sqoop, MongoDB, Flume, Oozie, Scala, Apache Spark
SDLC Methodologies: Agile, waterfall model
Programming Languages: Java, SQL, MySQL, T-SQL, Python, Scala
Web Services: REST, SOAP
Databases: Oracle, Teradata, SQL Server, Netezza, DB2, Cassandra, MongoDB and DynamoDB
IDE: Eclipse, NetBeans, IntelliJ
Reporting tools: Tableau, MicroStrategy, OBIEE, Cognos, SSRS, PowerBI
Middleware tools: Informatica, IDQ, SSIS, ODI
Job Control and Other Tools: Informatica Scheduler, PuTTY, SQL*Plus, TOAD, PL/SQL Developer, Autosys, Tidal, WinSCP, Crontab, Control-M, Oozie
Data Modelling: ERwin 4.x/3.x, MS Visio, Ralph Kimball Methodology, Bill Inmon Methodology, Star Schema, Snowflake Schema, Physical and Logical Modeling, Dimensional Data Modeling, Fact and Dimension Tables.
Confidential, San Jose, CA
Senior Developer Lead
Environment: Informatica Power Center 10.1, Teradata V14, Oracle 11g, SQL Server, Hadoop, HDFS, MapReduce, Hive, Sqoop
- Analyzed the existing system's Informatica business logic and identified the source and target systems to convert that business logic to HDFS using Hive and Sqoop.
- Worked on SequenceFile, ORC, Parquet, and Avro file formats and columnar compression.
- Worked on incremental data loading using Sqoop.
- Worked with upstream and downstream file-system data, loading snapshot data using Hive and relational data using Sqoop.
- Created Unix shell scripts to store cutoff dates in Hive tables to drive the next sequence of data loads.
- Worked on query optimization in Hive, improving performance using partitions and buckets in Hive tables.
- Worked on delta migration using Sqoop incremental updates.
- Worked on a Python framework to run Hive queries, Unix scripts, and Sqoop scripts.
- Worked with the off-shore team to coordinate work and the migration process.
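The cutoff-date pattern above can be sketched as follows: each run selects only rows modified after the stored cutoff and persists a new cutoff for the next run (field names here are hypothetical; in the job itself the cutoff lived in a Hive table and the extraction ran through Sqoop):

```python
def incremental_batch(rows, last_cutoff):
    """Return (rows newer than the cutoff, new cutoff to persist).

    If nothing is newer, the cutoff is carried forward unchanged so
    the next run re-checks from the same point.
    """
    batch = [r for r in rows if r["modified_ts"] > last_cutoff]
    new_cutoff = max((r["modified_ts"] for r in batch), default=last_cutoff)
    return batch, new_cutoff
```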
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, HBase, ZooKeeper, IntelliJ, Maven, SQL Server, MySQL, Oracle, Linux, Scala.
- Worked with different file formats such as Sequence files, XML files, and flat files using MapReduce programs.
- Extracted millions of records of raw XML from HBase using Spark and Scala.
- Created Hive tables and views on top of the HDFS folders.
- Involved in implementing the data preparation solution, responsible for data transformation as well as handling user stories.
- Developed and tested data ingestion, preparation, and dispatch jobs.
- Worked on migrating existing Oracle and SQL Server data and reporting fields to Hadoop.
- Created Hive external tables on top of HBase that were used for feed generation.
- Worked on migrating an existing feed from Hive to Spark: to reduce feed latency, the existing HQL was transformed to run using Spark SQL and HiveContext.
- Generated data cubes using Hive, Pig, and Java MapReduce on a provisioned Hadoop cluster in AWS.
- Wrote Pig Latin scripts to perform transformations (ETL) per the use-case requirements.
- Created dispatcher jobs using Sqoop export to dispatch the data into Teradata target tables.
- Created Hive target tables to hold the data after all the Pig ETL operations, using HQL.
- Created HQL scripts to validate the data once transformations were done per the use case.
- Responsible for code reviews, finding bugs, and bug fixing to improve performance.
- Created Hive scripts to load the historical data and partition the data.
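Hive partitioning, as used for the historical loads above, lays data out in `col=value` directories so that a predicate on a partition column scans only the matching directories; a small plain-Python sketch of that layout and pruning (directory and column names are illustrative, not from the actual project):

```python
def partition_dirs(rows, cols):
    # Group rows into Hive-style partition directories ("col=value/...").
    dirs = {}
    for r in rows:
        key = "/".join(f"{c}={r[c]}" for c in cols)
        dirs.setdefault(key, []).append(r)
    return dirs

def prune(dirs, col, value):
    # A predicate on a partition column touches only matching directories;
    # everything else is skipped without being read.
    return {k: v for k, v in dirs.items() if f"{col}={value}" in k.split("/")}
```

Bucketing further splits each partition by a hash of a column, which helps joins and sampling rather than predicate pruning.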
Environment: Hadoop CDH 3.0, Map Reduce, Hive, Pig, Sqoop, HBase, Java, Oozie, Linux, Unix, Spark SQL
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from SQL into HDFS using Sqoop.
- Worked with major Hadoop core components such as Map and Reduce operations.
- Involved in collecting and aggregating large data sets of log data using Hive and loading the data into HDFS for further analysis.
- Developed Hive tables and applied Hive scripts to analyze the structured data present in HDFS.
- Used the Oozie workflow engine to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Involved in loading data from UNIX file system to HDFS.
- Created views over HBase table and used SQL queries to retrieve alerts and meta data.
- Worked with loading and transforming large sets of structured, semi structured and unstructured data.
- Worked on user-defined functions in Hive to load data from HDFS and run aggregation functions across multiple rows, with more than three replications.
- Worked with Spark using Scala and Spark SQL for faster testing and processing of data.
- Worked with HBase NoSQL database.
- Converted RDDs to DataFrames to improve performance and optimization, using in-memory processing with SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Created different UDFs to analyze partitioned, bucketed data and compute various metrics for dashboard reporting, and stored them in different summary tables.
- Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster
- Created stored procedures, triggers and functions to operate on report data in MySQL.
- Developed backend code in Java to interact with the database using JDBC.
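The pair-RDD aggregation mentioned above follows Spark's `reduceByKey` pattern; a plain-Python stand-in showing the shape of that computation (illustrative only, not the Spark API itself):

```python
def reduce_by_key(pairs, fn):
    """Fold all values sharing a key with a combining function,
    the same contract as Spark's pair-RDD reduceByKey."""
    acc = {}
    for k, v in pairs:
        # Combine into the running value for this key, or seed it.
        acc[k] = fn(acc[k], v) if k in acc else v
    return acc
```

In Spark the same fold runs per-partition first and the partials are merged across the cluster, which is why the combining function must be associative.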
Environment: Hortonworks Hadoop 2.0, Map Reduce, Hive, Pig, Sqoop, HBase, Java, Oozie, Linux, Unix
- Involved in architecture design, development and implementation of Hadoop deployment, backup and recovery systems.
- Extensively Used Sqoop to import/export data between RDBMS and hive tables, incremental imports and created Sqoop jobs for last saved value.
- Established custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Installed and configured Hive and wrote Hive UDF to successfully implement business requirements.
- Involved in creating Hive tables, loading data into tables, and writing Hive queries that run as MapReduce jobs.
- Experienced with different kinds of compression techniques to save storage and optimize data transfer to Hive tables.
- Involved in writing test cases, implementing unit test cases.
- Installed the Oozie workflow engine to run multiple Hive and Sqoop jobs that run independently based on time and data availability.
- Uploaded data to Hadoop Hive and combined new tables with existing databases.
- Developed various Big Data workflows using custom MapReduce, Hive, Sqoop.
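The custom MapReduce programs above follow the classic map/shuffle/reduce shape; a minimal single-process sketch of that flow, using word count as the stand-in job (the real jobs were Java programs built with Maven and run on the cluster):

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit one (word, 1) pair per token.
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group every value under its key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    # Reducer: fold each key's values into a single count.
    return {k: sum(vs) for k, vs in groups.items()}
```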
Environment: Informatica Power Center 10.1, Informatica IDQ 10.1, Teradata V14, Oracle 12c, SQL Developer, Teradata Sql Assistant, UNIX Shell Scripting, Flat Files, XML Files, Agile Methodology
- Worked closely with Business Analysts and interacted with report users to understand and document requirements, then translated them to technical specifications.
- Parsed high-level design specifications into simple ETL coding and mapping standards, and designed and customized data models for a data warehouse supporting data from multiple sources in real time.
- Involved in the full development lifecycle from requirements gathering through development and support using Informatica Power Center, Repository Manager, Designer, Server Manager, Workflow Manager, and Workflow Monitor. Extensively worked with large Databases in Production environments.
- Worked on data migration of Informatica Mappings, Sessions, and Workflows to Testing, Pre-live and Production Environments.
- Used Informatica power center ETL tool to extract the data from different source systems to target systems.
- Extracted flat-file source data using the Informatica Designer tool into the staging database, then implemented business logic and rules to load data into target tables.
- Developed mappings using different transformations like source qualifier, Expression, filter, Aggregate, Update Strategy, Router, Sequence Generator, and Joiner.
- Created UNIX scripts to execute in Command tasks and configured Email tasks to send error reports to users.
- Involved in writing shell scripts to schedule, automate the ETL jobs.
- Involved in writing PL/SQL stored procedures, functions for extracting as well as writing data.
- Performed stored package, procedure, function, and trigger development in support of an enterprise-level, multi-user data aggregation, analysis, and reporting system.
- Created Tasks, Workflows, Worklets and Sessions using Workflow Manager Tool and Monitored Workflows using Informatica Workflow Monitor.
- Involved in error handling using session logs and workflow logs.
- Used Unix Commands and shell scripts to schedule the jobs.
- Monitored production issues and inquiries and provided efficient resolutions and answers to these requests.
- Performed incremental aggregation to load incremental data into Aggregate tables.
- Preparation of test data for regression testing and Validating the target data with the source data.
- Involved in building the ETL architecture and Source to Target mapping to load data into Data warehouse.
- Created mapping documents to outline data flow from sources to targets.
- Extracted the data from the flat files and other RDBMS databases into staging area and populated into Data warehouse.
- Maintained stored definitions, transformation rules and targets definitions using Informatica repository Manager.
- Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure, and Union to develop the mappings in the Informatica Designer.
- Developed mapping parameters and variables to support SQL override and Created mapplets to use them in different mappings.
- Used Type 1 SCD and Type 2 SCD mappings to update slowly Changing Dimension Tables.
- Modified existing mappings for enhancements of new business requirements and Used Debugger to test the mappings and fixed the bugs.
- Wrote UNIX shell Scripts & pmcmd commands for FTP of files from remote server and backup of repository and folder.
- Involved in Performance tuning at source, target, mappings, sessions, and system levels.
- Prepared migration document (Technical Installation Plan) to move the mappings from development to testing and then to production repositories.
- Prepared the DDL statements for maintenance tables, Indexes and Constraints.
- Worked on Key Generator, Labeler, Standardizer, Address Validator, and Consolidation transformations in IDQ.
- Developed MLOAD, FLOAD, and BTEQ scripts to load staging tables and then Dimensions and Facts.
- Used Fast Export utility to extract large volumes of data at high speed from Teradata RDBMS.
- Created PL/SQL stored procedures, functions and packages for moving the data from Upstream to staging area
- Used SVN version control tool for version controlling and movement of code to upper environments like SIT, UAT, Pre-production and Production.
- Worked on unit testing, prepared unit test documents, and uploaded them to JIRA for review.
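Incremental aggregation, used above to load the aggregate tables, merges only the new delta into the existing aggregate instead of recomputing over full history; a minimal sketch with hypothetical keys and measures (in practice this ran inside Informatica sessions):

```python
def incremental_aggregate(agg, delta):
    """Merge a batch of (key, amount) delta rows into the running
    aggregate; untouched keys keep their prior totals."""
    out = dict(agg)  # leave the stored aggregate unmodified
    for key, amount in delta:
        out[key] = out.get(key, 0) + amount
    return out
```

This only works for additive measures (sums, counts); averages or distinct counts need extra state carried alongside the aggregate.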
Environment: Informatica Power Center 8.1.1, Oracle 11g, SQL Server 2008/2005, UNIX, Flat Files, PL/SQL
- Designed and developed complex mappings using Lookup, Expression, Update Strategy, Sequence Generator, Aggregator, Router, Stored Procedure, and other transformations to implement complex logic while coding a mapping.
- Worked with Informatica power center Designer, Workflow Manager, Workflow Monitor and Repository Manager.
- Developed and maintained ETL (Extract, Transformation and Loading) mappings to extract the data from multiple source systems like Oracle, SQL server and Flat files and loaded into Oracle.
- Developed Informatica Workflows and sessions associated with the mappings using Workflow Manager.
- Experience in Performance tuning of Informatica (sources, mappings, targets and sessions) and tuning the SQL queries.
- Loaded data into the interface tables from multiple data sources such as SQL Server and text files.
- Involved in debugging Informatica mappings, testing of Stored Procedures and Functions, Performance and Unit testing of Informatica Sessions, Batches and Target Data.
- Developed Mapplets, Reusable Transformations, Source and Target definitions, mappings using Informatica.
- Involved in Performance Tuning of mappings in Informatica.
- Good understanding of source to target data mapping and Business rules associated with the ETL processes.
- Extensively used the advanced features of PL/SQL like Records, Tables, Object types and Dynamic SQL.
- Extensively involved in writing complex business logic into packages and procedures.
- Responsible for unit testing of ETL mappings, bug fixing and helping the testing team to execute the test cases.
- Migrated workflows to production using pmcmd, and imported and exported the repositories.
Environment: Oracle 11g/12c, PL/SQL (Packages, Stored Procedures, Functions), Data Modelling, Reports 10g, Oracle Workflows, BI Publisher, Forms 10g, SQL*Loader, TOAD and Unix
- Analyzed business needs, created and developed new centralized back office system for specific area including mortgage and commercial lending system.
- Worked with the Business analyst for gathering requirements.
- Identified the key facts and dimensions necessary to support the business requirements and developed the logical dimensional models.
- Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
- Conducted design discussions and meetings to come out with the appropriate data mart using Bill Inmon methodology.
- Worked closely with the QA Team Lead in the creation, preparation, and implementation of quality assurance reviews and the development and manual execution of test plans.
- Responsible for Data mapping testing by writing SQL Queries
- Coordinated with the front-end design team to provide them with the necessary stored procedures and packages and the necessary insight into the data.
- Worked on SQL*Loader to load data from flat files obtained from various facilities every day.
- Created and modified several Unix Scripts according to the changing needs of the project and client requirements.
- Wrote Unix shell scripts to process the files on a daily basis: renaming the file, extracting the date from the file, unzipping the file, and removing junk characters from the file before loading into the base tables.
- Generated server-side PL/SQL scripts for data manipulation and validation and materialized views for remote instances.
- Developed PL/SQL packages and master tables for automatic creation of primary keys.
- Involved in data loading using PL/SQL and SQL*Loader calling Unix scripts to download and manipulate files.
- Performed SQL and PL/SQL tuning and Application tuning using various tools like Explain Plan, SQL*TRACE, TKPROF and AUTOTRACE.
- Extensively involved in using hints to direct the optimizer to choose an optimum query execution plan.
- Used Bulk Collections for better performance and easy retrieval of data, by reducing context switching between SQL and PL/SQL engines.
- Created PL/SQL scripts to extract the data from the operational database into simple flat text files using the UTL_FILE package.
- Created database objects like tables, views, materialized views, procedures, and packages using Oracle tools like TOAD, PL/SQL Developer, and SQL*Plus.
- Created records, tables, collections (nested tables and arrays) for improving Query performance by reducing context switching.
- Used Pragma Autonomous Transaction to avoid the mutating table problem in database triggers.
- Extensively used the advanced features of PL/SQL like Records, Tables, Object types and Dynamic SQL.
- Handled errors using Exception Handling extensively for the ease of debugging and displaying the error messages in the application.
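The BULK COLLECT pattern above cuts context switches by fetching rows in fixed-size batches rather than one at a time; a Python generator sketch of the same batching idea (an analogy to PL/SQL's `BULK COLLECT ... LIMIT`, not PL/SQL itself):

```python
def fetch_in_chunks(cursor_rows, limit):
    """Yield rows in batches of at most `limit`, the way BULK COLLECT
    with LIMIT drains a cursor: one engine round trip per batch
    instead of one per row."""
    batch = []
    for row in cursor_rows:
        batch.append(row)
        if len(batch) == limit:
            yield batch
            batch = []
    if batch:
        # Final partial batch, if the row count isn't a multiple of limit.
        yield batch
```

The LIMIT clause matters for the same reason the `limit` parameter does here: it bounds memory per batch while still amortizing the per-fetch overhead.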