
Data Engineer/Solution Architect Resume

Tampa, Florida


  • Extensive professional experience in Information Technology with strong background in architecture definition of large distributed systems, technical consulting, technology implementation in Cloud, Big Data, Hadoop Ecosystem, Data Warehouse, Enterprise Information Management, Data Management and Application integration.
  • Extensive experience in Data Warehouse/Big Data/Hadoop/Data integration, Data migration and Operational data store, BI Reporting projects with a deep focus in design, development and deployment of BI and data solutions.
  • Excellent understanding of Hadoop Architecture and underlying Hadoop framework.
  • Hands-on experience implementing Big Data analytics platforms on the cloud with the AWS/Cloudera/Hortonworks/Hadoop ecosystem.
  • Hands-on experience with Hadoop distributions (AWS, Cloudera, Hortonworks), Hadoop architecture and the technology stack: Hive, HBase, MapReduce, YARN, Sqoop, HDFS, Oozie, Ambari, Zookeeper, Kafka, Hue, Flume, Apache Impala, Apache Spark, security (Kerberos, HDFS encryption), and CA Workstation ESP Edition.
  • Extensive experience in analyzing, designing and developing ETL processes for Data Integration/Data Warehousing/Decision Support System (DSS) projects using Informatica PowerCenter 10.x/9.x (on-premises) and cloud-based Informatica Cloud (ICS, ICRT).
  • Certified AWS Solution Architect - Professional with hands on experience in AWS services like EC2, EBS, S3, CloudWatch, Route53, Auto Scaling, CloudFormation, CodeCommit, CodeBuild, CodeDeploy, CodePipeline, Redshift, Data Pipeline, AWS Glue, Kinesis.
  • Built and configured a virtual data center in AWS to support Enterprise Data Warehouse hosting, including Virtual Private Cloud (VPC), public and private subnets, Security Groups, Route Tables and Elastic Load Balancer.
  • Strong understanding of NoSQL databases such as HBase, including their implementation, performance tuning and data modeling.
  • Strong experience in working with relational databases like Oracle/SQL Server/MySQL and proficiency in writing complex SQL queries.
  • Highly motivated, with strong analytical and problem-solving skills; able to work in a team interacting with business users, business analysts, IT leads and developers to analyze business requirements and translate them into functional and technical design specifications.
  • A good team player with excellent communication, presentation and interpersonal skills, and the ability to prioritize and coordinate work across different geographic locations.


Hadoop/ Big Data: Cloudera, Hortonworks, Hive, HBase, Map Reduce, YARN, Sqoop, HDFS, Oozie, Ambari, Zookeeper, Hue, Apache Impala, Apache Spark, Drill, Phoenix, CA WA Workstation ESP Edition

ETL/Data Warehouse/ Modeling Tools: Informatica Power Center 10.x/9.x/8.x, Informatica Cloud(ICS/IICS), Informatica Cloud Real-Time(ICRT/CAI), Erwin Data Modeler 9.x

AWS: EC2, RDS, S3, Glacier, SQS, SNS, CloudFormation, CloudWatch, VPC, IAM, Route53, EBS, EMR, Glue, Data Pipeline, Redshift, Kinesis, Lambda

Operating System: Unix, Linux, Windows, CentOS

Web Servers: Apache HTTP Server, Apache Tomcat

RDBMS/NoSQL DB: Oracle 12c/11g/9i, Microsoft SQL Server, Teradata, MySQL, HBase, MongoDB, DynamoDB


Confidential, Tampa, Florida

Data Engineer/Solution Architect


  • Data Engineer/Solution Architect responsible for Modern Data Architecture and Hadoop/Big Data/BI requirements: defining the strategy, technical architecture, implementation plan, management and delivery of Big Data applications and solutions.
  • Implementing a Data Lake analytics platform leveraging AWS cloud and Hadoop technologies to provide a centralized data repository.
  • Providing technical thought leadership on Big Data strategy, adoption, architecture and design, as well as data engineering and modeling.
  • Working with product owners, business SMEs, and data ingestion and reporting architects to identify requirements and consolidate an enterprise data model consistent with business processes.
  • Developing and maintaining the technical roadmap for the Enterprise Big Data Platform across its different platform capabilities.
  • Ingesting a wide variety of structured, unstructured and semi-structured data into the Big Data ecosystem through both batch processing and real-time streaming.

Environment: Big Data, Teradata, Hortonworks, Hive, HBase, MapReduce, YARN, Sqoop, HDFS, Oozie, CA Workstation ESP, Apache Spark, PySpark, AWS Glue, EMR, Kinesis, Redshift, Athena, S3, EC2, VPC, DynamoDB.


Data Engineer


  • Collaborated with internal/BPO/implementation partners' Architects/BAs to understand the requirements and architect a data flow system.
  • Created Hive schema using performance techniques like partitioning and bucketing.
  • Optimized Hive scripts to use HDFS efficiently by using various compression mechanisms.
  • Developed various data flow scripts using Pig Latin to load data into HDFS, Hive and HBase.
  • Extracted data from Hive, then processed and loaded it into HBase/MongoDB using Apache Spark.
  • Developed Spark code using Scala and Spark-SQL for faster processing of data.
  • Developed Oozie workflow jobs to execute Hive, Sqoop and Pig actions.
  • Wrote extensive Hive queries to transform the data for use by downstream models.
  • Performed Importing and exporting data into HDFS, Hive and HBase using Sqoop.
  • Built the ETL architecture and source to target mapping to load data into Staging, ODS, ODM and Data warehouse using Informatica PowerCenter 10.x/9.x.
  • Integrated Salesforce Service Cloud with the implementation partner's SOAP-based web services, exposing them as RESTful services via Informatica Cloud Real Time (ICRT), and with the enterprise data warehouse through the ICRT OData Connector.
  • Involved in dimensional modeling (star schema) of the data warehouse and used Erwin to design the business process, dimensions and measured facts.
  • Redesigned the existing Informatica ETL mappings & workflows using Spark SQL.
  • Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
  • Worked as a fully contributing team member, under broad guidance with independent planning and execution responsibilities.
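The partitioning and bucketing techniques mentioned above follow a standard Hive pattern: partitioning groups rows into separate directories by a column's value, while bucketing hashes a key into a fixed number of files within each partition. A minimal pure-Python sketch of that layout logic (table and column names such as `event_date` and `user_id` are hypothetical examples, and `hash()` stands in for Hive's deterministic bucketing hash):

```python
# Sketch of Hive-style partitioning and bucketing layout.
# Column names (event_date, user_id) are hypothetical.

NUM_BUCKETS = 4

def partition_key(row):
    # Partitioning: one directory per distinct value (e.g. dt=2020-01-01/).
    return row["event_date"]

def bucket_id(row):
    # Bucketing: a deterministic hash of the key modulo the bucket count,
    # so the same key always lands in the same bucket file.
    return hash(row["user_id"]) % NUM_BUCKETS

rows = [
    {"event_date": "2020-01-01", "user_id": 101},
    {"event_date": "2020-01-01", "user_id": 202},
    {"event_date": "2020-01-02", "user_id": 101},
]

# Build the partition -> bucket -> rows layout.
layout = {}
for row in rows:
    layout.setdefault(partition_key(row), {}) \
          .setdefault(bucket_id(row), []) \
          .append(row)
```

Partition pruning then lets a query touch only the directories it needs, and bucketing keeps file sizes even and enables bucketed joins.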

Environment: Big Data, Informatica PowerCenter, Informatica Cloud (ICS, ICRT), OData, Oracle, Sybase, MS-SQL Server, Cloudera, Hive, HBase, Map Reduce, YARN, Sqoop, HDFS, Oozie, Zookeeper, Apache Impala, Apache Spark, MongoDB, AWS, EC2, EBS, VPC.

Confidential, White Plains, New York

Informatica/Cloud Solution Consultant


  • Collaborated with users and Business Analysts to understand the overall system data flow, gathered and documented requirements, and translated the Functional Specification into a Technical Specification for Informatica mappings.
  • Involved in analyzing different modules of the Facets system and EDI interfaces to understand the source system and source data.
  • Coordinated with the Data Analyst, Data Modeler and Data Warehouse teams to analyze source data from various source systems (Oracle, DB2, flat files), designing logical and physical dimensional data models and creating a star schema using Erwin Data Modeler 7.3.
  • Designed and developed complex Informatica mappings, Slowly Changing Dimension (Type-I, Type-II and Type III) mappings to Extract, Transform and Load data from multiple sources into data warehouse and other databases using different transformations like Source Qualifier, Lookup, Expression, Aggregate, Update Strategy, Sequence Generator, Joiner, Filter, Rank and Router, Stored Procedure, XML, SQL transformations etc.
  • Extensively worked in the performance tuning of Informatica PowerCenter at the Target Level, Source level, Mapping Level, Session Level, and System Level.
  • Designed and developed SQL queries. Created Cursors, functions, stored procedures, packages, Triggers, views, materialized views using PL/SQL Programming and extensively worked on Oracle Performance Tuning.
  • Responsible for Informatica Cloud implementation installing secure agent and provisioning user access management.
  • Heavily worked with Informatica Cloud DSS tasks to orchestrate the data integration task with different vendors.
  • Responsible for setting up and managing Amazon Redshift clusters, including launching clusters with the required node configuration and running data analysis queries.
  • Worked with Informatica Cloud Real Time (ICRT) creating service connector for third party SOAP web service and processes facilitating real time integration.
  • Performed error checking, testing and debugging of various ETL objects and mappings using Informatica session logs to identify and rectify coding errors in process data-flow.
  • Tested the data and data integrity among various sources and targets.
  • Planned and coordinated testing across multiple teams, tracked and reported status, created test case and test cycle plan, troubleshoot data issues, validated result sets, recommended and implemented process improvements.
  • Responsible for the PowerCenter upgrade from 9.1 to 9.5; migrated ETL code from Informatica 9.1 to 9.5 and integrated and managed the workload of PowerExchange CDC.
  • Used Repository Manager to migrate workflows, mapping and shared object between development, testing and production environments.
  • Coordinated with onsite and offshore ETL teams to meet the scheduled project deadlines.
  • Provided 24/7 on call support in Testing and Production environment.
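The Slowly Changing Dimension Type-II mappings described above follow a standard pattern: when a tracked attribute changes, the current dimension row is end-dated and a new current row is inserted, preserving history. A minimal pure-Python sketch of that logic (column names such as `cust_id`, `city`, `eff_date`, `end_date` and `current` are hypothetical):

```python
# Sketch of SCD Type-II versioning: expire the current row on a
# change and insert a new current row, keeping full history.
# Column names are hypothetical examples.

HIGH_DATE = "9999-12-31"

def scd2_apply(dimension, incoming, load_date):
    for rec in incoming:
        current = next(
            (r for r in dimension
             if r["cust_id"] == rec["cust_id"] and r["current"]),
            None,
        )
        if current is None:
            # New key: insert as the current version.
            dimension.append({**rec, "eff_date": load_date,
                              "end_date": HIGH_DATE, "current": True})
        elif current["city"] != rec["city"]:
            # Changed attribute: expire the old version, insert the new one.
            current["end_date"] = load_date
            current["current"] = False
            dimension.append({**rec, "eff_date": load_date,
                              "end_date": HIGH_DATE, "current": True})
        # Unchanged rows are left as-is.
    return dimension
```

In PowerCenter the same decision is typically made with a Lookup on the dimension plus an Update Strategy transformation routing rows to insert or update.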

Environment: Informatica PowerCenter, Informatica Cloud, AWS, Redshift, Oracle, SQL Navigator, ERwin Data Modeler, PL/SQL, MS-SQL Server.

Confidential, Tampa, Florida

ETL Consultant


  • Gathered business scope and technical requirements from users and translated business logic into ETL Specifications.
  • Extensively worked on Business Analysis. Designed, Developed, Tested and Implemented Informatica transformations and Workflows for extracting data from multiple legacy systems.
  • Actively involved in interaction with Management to identify key Dimensions and Measures for business performance.
  • Responsible for Data Modeling and developing procedures to populate business rules using mappings.
  • Developed extraction mappings to load data from various source systems such as flat files and ODBC sources into targets, using transformations including Source Qualifier, Lookup (connected and unconnected), Expression, Aggregate, Update Strategy, Sequence Generator, Joiner, Filter, Rank and Router.
  • Designed and developed reusable transformations and mapplets to use in multiple mappings.
  • Generated different types of reports, such as Chart, Master/Detail and Cross Tab.
  • Implemented performance tuning logic on targets, sources, mappings, sessions to achieve maximum efficiency and performance.
  • Developed Data loading stored procedures, functions using PL/SQL from Source systems into operational data storage.
  • Implemented Variables and Parameters in the mappings.
  • Involved in designing ER diagrams, logical model (relationship, cardinality, attributes and candidate keys) and physical database (capacity planning, object creation and aggregation strategies) as per business requirements.
  • Implemented session partitions, dynamic cache memory, and index cache to gain optimal performance of Informatica server.
  • Optimized SQL queries for better performance.
  • Extensively used Informatica debugger to validate mappings and to gain troubleshooting information about data and error conditions.
  • Worked with different Informatica tuning issues and fine-tuned the transformations to make them more efficient in terms of performance.
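A connected Lookup transformation of the kind used in the mappings above essentially enriches each source row with a value from a reference table, falling back to a default when no match is found. A minimal pure-Python analogue (the lookup table and field names `dept_id`/`dept_name` are hypothetical examples):

```python
# Pure-Python analogue of a connected Lookup transformation:
# each source row is enriched from a reference (lookup) table,
# with a default value when no match exists.
# Field names (dept_id, dept_name) are hypothetical.

lookup_table = {10: "Sales", 20: "Engineering"}

def lookup_transform(rows, default="UNKNOWN"):
    out = []
    for row in rows:
        enriched = dict(row)
        # Connected lookup: the return port feeds the mapping directly.
        enriched["dept_name"] = lookup_table.get(row["dept_id"], default)
        out.append(enriched)
    return out
```

An unconnected Lookup differs only in being called on demand (via an expression) rather than sitting in the row pipeline; caching the reference table, as sketched here with a dict, is what the Informatica lookup cache settings tune.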

Environment: Informatica PowerCenter 9.1/8.5 (Repository Manager, Designer, Workflow Manager and Workflow Monitor), Oracle 10g, MS SQL Server 2005, DB2, Mainframe, PL/SQL, SQL.
