
ETL Architect Resume


SUMMARY:

  • Over nine years of experience in data analysis and data engineering with large sets of structured and unstructured data, covering data acquisition, data validation, data modeling, and data visualization. Adept in Python, PySpark (Apache Spark), and Scala, as well as Big Data technologies such as Hadoop, Hive, and Pig.
  • Experienced in data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, machine learning, and advanced data processing.
  • Excellent knowledge of dimensional and relational data modeling, ER/Studio, Erwin, Star Schema/Snowflake modeling, fact and dimension tables, and conceptual, logical, and physical data models.
  • Extensively used SQL, NumPy, Pandas, Scikit-learn, Spark, and Hive for data analysis and model building.
  • More than 3 years of experience in Hadoop/Spark development using PySpark and 5 years of experience in Oracle database design, data modeling, and ETL development using Informatica PowerCenter.
  • Hands-on expertise in Hadoop, MapReduce, YARN, Hive, Pig, Sqoop, Kafka, Spark (PySpark/Scala), and Spark Streaming.
  • Expertise in normalization to 3NF and de-normalization techniques for optimum performance in relational and dimensional database environments.
  • Hands-on experience with AWS, including EMR and RDS. Strong knowledge of Hadoop, Hive, and Hive's analytical functions.
  • Migrated code from Spark 1.6 to Spark 2.4 to make it compatible with AWS.
  • Captured data from existing databases that provide SQL interfaces using Sqoop and Spark.
  • Experience in database design using stored procedures, functions, and triggers, and strong experience in writing complex queries for Oracle.
  • Effectively made use of table functions, indexes, table partitioning, collections, analytical functions, materialized views, and query rewrite.
  • Strong experience in data warehouse concepts and the Informatica ETL tool.
  • Hands-on experience with IDEs such as Eclipse, Visual Studio, and IntelliJ.
  • Excellent problem-solving skills, high analytical skills, good communication and interpersonal skills.
  • Exposure to performance tuning for Oracle RDBMS using Explain Plan, hints, etc.
  • Good understanding of Unix Shell scripting.
  • Have good knowledge on NoSQL databases like HBase.
  • Ability to adapt to evolving technology and a strong sense of responsibility.
  • Involved in software development life cycle phases following Agile methodology, including estimating project timelines.
  • Ability to quickly master new concepts and applications.
  • Extensively used ETL methodology to design projects for the extraction, integration, transformation, and loading of data.
  • Experience in working with data warehouses and data marts using Informatica PowerCenter (Designer, Repository Manager, Workflow Manager, and Workflow Monitor).
  • Understanding and working knowledge of CDC (Change Data Capture).
  • Experience in workflow optimization through partitioning techniques, multi-instance runs, multi-threaded runs, and session configuration changes.
  • Experienced in extracting data from DB2, Oracle, XML, SAP R/3, and SQL Server source systems.
  • Developed Slowly Changing Dimension (SCD) Type 1, 2, and 3 loads for populating dimension and fact tables; a minimal Type 2 sketch follows this list.
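
A minimal PySpark sketch of the SCD Type 2 pattern referenced above. The table names (dw.customer_dim, stg.customer_src), the tracked attribute, and the flag/date columns are hypothetical placeholders; surrogate keys and brand-new source keys are omitted for brevity. This illustrates the idea only, not the project code.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").enableHiveSupport().getOrCreate()

dim = spark.table("dw.customer_dim")    # existing dimension: current and history rows
src = spark.table("stg.customer_src")   # today's snapshot; assumed to carry the same business columns

# Current dimension rows whose tracked attribute differs from the new snapshot.
current = dim.filter(F.col("is_current") == 1)
changed = (src.alias("s")
           .join(current.alias("d"),
                 F.col("s.customer_id") == F.col("d.customer_id"))
           .filter(F.col("s.address") != F.col("d.address"))
           .select("s.*"))
changed_keys = changed.select("customer_id")

# Expire the old current version of every changed key.
expired = (current.join(changed_keys, "customer_id")
           .withColumn("is_current", F.lit(0))
           .withColumn("end_date", F.current_date()))

# Open a new current version for every changed key.
new_rows = (changed
            .withColumn("is_current", F.lit(1))
            .withColumn("start_date", F.current_date())
            .withColumn("end_date", F.lit(None).cast("date")))

# Keep history rows and current rows of unchanged keys untouched.
untouched = dim.join(changed_keys.withColumn("is_current", F.lit(1)),
                     ["customer_id", "is_current"], "left_anti")

result = untouched.unionByName(expired).unionByName(new_rows)
result.write.mode("overwrite").saveAsTable("dw.customer_dim_scd2")
```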

TECHNICAL SKILLS:

Programming Languages: Python, PySpark, Unix shell scripting

ETL Software: Hadoop (Spark), Hive, Pig, PySpark, Scala, Informatica

Databases: Oracle (SQL, PL/SQL)

Scripting: Unix/Linux shell scripting, R scripting

Concepts: Data warehousing, regression algorithms

Hardware: Worked in SMP environments; knowledge of clustered environments

Operating Systems: Unix (Solaris) & Linux (Red Hat)

Domain: Human Capital Management, Finance

Learning: Data science algorithms

PROFESSIONAL EXPERIENCE:

Confidential

ETL Architect

Responsibilities:

  • Designed the data model for Hive tables, applying bucketing and partitioning, and created external and internal Hive tables with partitions (see the table-layout sketch after this list).
  • Imported large industry-level datasets from different product sources within Confidential and loaded them into HDFS using a Data Extractor utility built on Spark instead of Sqoop (a JDBC extraction sketch also follows this list).
  • Used parallel processing, staging techniques, and Hive partitioning to optimize data storage; wrote and tuned Hive SQL queries for best performance.
  • Created a generic framework for building cubes.
  • Created a Spark SQL wrapper utility to run Hive scripts through Spark and to acquire data from Oracle.
  • Loaded benchmark data into the Oracle data warehouse using Spark.
  • Implemented a partition-swap process in Oracle to load data without downtime.
  • Extensively involved in data extraction, transformation, and loading from various sources into the cluster using Spark and Hive.
  • Extensively used EMR to load, test, and deploy Spark code on AWS.
  • Performed data cleansing and transformation using UDFs in Spark.
  • Created automation with Unix shell scripts and PL/SQL procedures for cleansing and ETL load execution.
  • Created pipelines to import and export data to and from the data warehouse.
  • Worked on performance improvements by optimizing caching and in-memory processing with bucketing and hashing.
  • Used Unix scripting to run the pipeline as multiple instances in a loop, four threads at a time.
  • Built a mechanism with shell scripts and PL/SQL for dynamic parameter file generation to automate the load and audit the status of each pipeline run in a control table.
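
An illustrative sketch of the Hive table layout described above: an external table over raw files plus a managed table that is partitioned and bucketed. Database names, table names, columns, and HDFS paths are placeholders, not the actual project schema.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive_layout_sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table over files already landed in HDFS; dropping it leaves the
# underlying data in place.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.employee_raw (
        employee_id  BIGINT,
        product_code STRING,
        hours_worked DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/staging/employee_raw'
""")

# Register a newly landed partition with the metastore.
spark.sql("""
    ALTER TABLE staging.employee_raw
    ADD IF NOT EXISTS PARTITION (load_date='2020-01-01')
""")

# Managed (internal) table: partitioned by load date and bucketed on the join
# key to speed up joins and sampling on that key.
(spark.table("staging.employee_raw")
    .write
    .mode("overwrite")
    .partitionBy("load_date")
    .bucketBy(32, "employee_id")
    .sortBy("employee_id")
    .saveAsTable("analytics.employee_fact"))
```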
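
A sketch of the "Spark instead of Sqoop" extraction idea: reading an Oracle table over JDBC in parallel and landing it in HDFS. The connection URL, credentials, table, partitioning bounds, and output path are placeholders, and the Oracle JDBC driver is assumed to be on the Spark classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc_extract_sketch").getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("dbtable", "HR.EMPLOYEES")
      .option("user", "etl_user")
      .option("password", "****")
      .option("fetchsize", 10000)
      # Split the read into parallel tasks on a numeric column.
      .option("partitionColumn", "EMPLOYEE_ID")
      .option("lowerBound", 1)
      .option("upperBound", 1000000)
      .option("numPartitions", 16)
      .load())

# Land the extract as partitioned Parquet in HDFS for downstream Hive/Spark use.
(df.write
   .mode("overwrite")
   .partitionBy("DEPARTMENT_ID")
   .parquet("hdfs:///data/raw/employees"))
```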

Confidential

Hadoop and Spark Developer

Responsibilities:

  • Extracted fact and dimension data from the OLAP system and stored it in Hive tables.
  • Performed feature engineering to create datasets for ML/AI models.
  • Performed exploratory data analysis to understand the characteristics of the data.
  • Created an end-to-end ETL pipeline from data extraction to data export.
  • Exported data to S3 to enable data scientists to work in AWS SageMaker (see the export sketch after this list).
  • Used an AWS Glue Crawler to create Athena tables on top of S3.
  • Created a CloudFormation template to spawn EMR clusters.
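
A minimal sketch of the S3 export step described above: writing a Hive table out as partitioned Parquet on S3 so it can be crawled by Glue, queried from Athena, or read from SageMaker. The bucket, prefixes, table names, and crawler name are placeholders.

```python
import boto3
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("s3_export_sketch")
         .enableHiveSupport()
         .getOrCreate())

features = spark.table("analytics.model_features")

# Partitioned Parquet layout keeps Athena scans cheap and lets SageMaker jobs
# read only the snapshots they need.
(features.write
    .mode("overwrite")
    .partitionBy("snapshot_date")
    .parquet("s3://example-datalake/exports/model_features/"))

# Optionally kick off an existing Glue crawler so Athena sees the new data.
glue = boto3.client("glue")
glue.start_crawler(Name="model_features_crawler")  # crawler name is a placeholder
```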

Confidential

Informatica, Unix, PL/SQL Developer

Responsibilities:

  • Created various mappings to separate the common logic for each product.
  • Parameterized all required input variables in the mappings.
  • Created separate workflows for each product, such as Vantage, WFN, and EV5.
  • Created a generic parameter file for all shell scripts and mappings.
  • Updated the common procedures.
  • Implemented an automated zero-down mechanism for one of the products.
  • Implemented separate quarterly and monthly summary fact population.
  • Implemented validation of the set of CSV files for each client.

Confidential

Informatica Developer

Responsibilities:

  • Extensively involved in Data Extraction, Transformation and Loading from Source to target systems using Informatica.
  • Performed data cleansing and transformation using Informatica.
  • Worked extensively with Designer tools like Source Analyzer, Transformation Developer, Mapping and Mapplet Designers.
  • Created data mappings to extract data from source XML and CSV files, transform it using Filter, Update Strategy, Aggregator, Expression, and Joiner transformations, and load it into the data warehouse.
  • Created data mappings to extract data from different source systems (BW, SQL Server, SAP R/3), transform it using Filter, Update Strategy, Aggregator, Expression, and Joiner transformations, and load it into the data warehouse.
  • Implemented Slowly Changing Dimension Type 2 methodology for accessing the full history of accounts and transaction information.
  • Used the Update Strategy transformation to apply Type 2 updates to target dimension tables, inserting the new record and expiring the old record so that changes can be tracked over time.
  • Developed various mapplets that were then included into the mappings.
  • Created a command task that executes a shell script to pull data over SFTP from the remote source server using password-less (key-based) authentication.
  • Built a mechanism with shell scripts and PL/SQL for dynamic parameter file generation to automate the load and audit the status of each workflow run in a control table; a Python sketch of the idea follows this list.
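
A rough Python sketch of the dynamic parameter file and control-table audit idea in the bullet above (the actual implementation described in this resume used shell scripts and PL/SQL). The connection details, folder/workflow/session names, parameter names, and control-table layout are all placeholders.

```python
import datetime
import cx_Oracle  # assumes the Oracle client driver is installed

WORKFLOW = "wf_load_sales"
RUN_DATE = datetime.date.today()

conn = cx_Oracle.connect("etl_user", "****", "dbhost:1521/ORCLPDB")
cur = conn.cursor()

# Read the previous high-water mark for this workflow from the control table.
cur.execute(
    "SELECT MAX(load_end_ts) FROM etl_control WHERE workflow_name = :wf",
    wf=WORKFLOW,
)
last_loaded = cur.fetchone()[0] or datetime.datetime(1900, 1, 1)

# Emit an Informatica-style parameter file for the next workflow run.
with open(f"/opt/etl/param/{WORKFLOW}.prm", "w") as prm:
    prm.write(f"[DW_FOLDER.WF:{WORKFLOW}.ST:s_m_load_sales]\n")
    prm.write(f"$$LAST_LOAD_TS={last_loaded:%Y-%m-%d %H:%M:%S}\n")
    prm.write(f"$$RUN_DATE={RUN_DATE:%Y-%m-%d}\n")

# Record the new run in the control table for auditing.
cur.execute(
    """INSERT INTO etl_control (workflow_name, run_date, status)
       VALUES (:wf, :run_date, 'STARTED')""",
    wf=WORKFLOW, run_date=RUN_DATE,
)
conn.commit()
cur.close()
conn.close()
```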

Confidential

Informatica Developer

Responsibilities:

  • Worked with the Informatica Designer components: Source Analyzer, Warehouse Designer, Transformation Developer, Mapping Designer, and Mapplet Designer.
  • Involved in the development of Informatica mappings and tuned them for better performance; the mappings used most of the transformations, including Source Qualifier, Expression, Filter, Aggregator, Lookup, Update Strategy, Joiner, Sequence Generator, Sorter, Rank, and Router.
  • Used unconnected Lookups in combination with Update Strategy to implement a Type 2 warehouse.
  • Implemented lookup tables at the staging level for faster lookup response times.
  • Extensively used Informatica's Workflow Manager and Workflow Monitor to load data from MS SQL Server and Oracle OLTP sources into the target Oracle 9i data warehouse.

Confidential

Informatica Developer

Responsibilities:

  • Worked as an ETL Mapping developer.
  • Extensively used various transformations like Aggregator, Lookup, Expression, Router and Filter Transformations.
  • Developed various worklets that were then included in the workflows.
  • Tuned and tested the mappings for better performance, using different logic to achieve maximum efficiency.
  • Wrote shell scripts for SFTP transfers and for automating control-table update logic.

Confidential

Informatica Developer

Responsibilities:

  • Worked on the design and development of workflows that extract, transform, and populate the fact tables.
  • Worked with XML files, flat files, and the Joiner, Aggregator, Update Strategy, Rank, Router, Lookup, Stored Procedure, Sequence Generator, Filter, Sorter, and Source Qualifier transformations.
  • Created shell scripts to run daily jobs and pull files from remote locations for data loads (see the SFTP sketch after this list).
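
A small Python sketch of the remote file pull described in the last bullet (the original was a shell script using sftp). The host, credentials, paths, and file pattern are placeholders; key-based, password-less login is assumed.

```python
import os
import paramiko

REMOTE_HOST = "files.example.com"
REMOTE_DIR = "/outbound/daily"
LOCAL_DIR = "/data/inbound"

ssh = paramiko.SSHClient()
ssh.load_system_host_keys()
ssh.connect(REMOTE_HOST, username="etl_user", key_filename="/home/etl/.ssh/id_rsa")

sftp = ssh.open_sftp()
for name in sftp.listdir(REMOTE_DIR):
    if name.endswith(".csv"):
        # Pull each daily extract down for the load job.
        sftp.get(f"{REMOTE_DIR}/{name}", os.path.join(LOCAL_DIR, name))
sftp.close()
ssh.close()
```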

Confidential

Informatica Developer

Responsibilities:

  • Worked as an ETL Mapping developer.
  • Developed SQL logic, mappings, and workflows; performed unit and SIT testing and production deployment.
  • Worked on a control table with update and insert logic for auditing purposes.
  • Developed shell scripts to update the control table and generate dynamic parameter files.
