
Lead Data Engineer Resume


SUMMARY:

  • 14 years of IT experience in software analysis, design, development, testing, implementation, and production support of various client/server and web-based applications. Strong hands-on experience in Big Data (Hive, Pig), Kubernetes, and ETL tools (Informatica PowerCenter and Informatica Cloud with Salesforce for customer data), as well as data modeling, data warehousing (Inmon and Kimball methodologies), and ETL data integration. Excellent interpersonal and mentoring skills. Well versed in reliability theory, with hands-on experience performing reliability engineering for complex systems.
  • Expert level in Informatica administration and Informatica Cloud Services
  • Building ETL data pipelines on Hadoop/Teradata using Pig, Hive, and custom UDFs
  • Tracking data at minute, hourly, and daily granularity
  • Supporting the tracking-data pipeline Confidential .com --> Kafka --> Gobblin --> Hadoop (computation layer) and Teradata (see the consumer sketch after this list)
  • Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy
  • Hands-on experience in data modeling and dimensional modeling using Kimball methodologies
  • Exposure to the full lifecycle (SDLC) of data warehouse projects, including dimensional data modeling
  • Well versed in data warehousing concepts such as star schema and snowflake schema. Proven technical and business documentation experience with large proposals. Strong time management and project coordination skills across simultaneous projects.
  • Comprehensive knowledge of system engineering, reliability life-cycle management, and reliability modeling
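To illustrate the Kafka-based tracking pipeline referenced above, here is a minimal consumer sketch in Python. It assumes the kafka-python client, a hypothetical tracking-events topic, placeholder brokers, and a local landing directory standing in for the Gobblin/HDFS layer; it illustrates only the ingestion step, not the actual production pipeline.

```python
# Minimal sketch (not production code): consume tracking events from Kafka and
# land them into hour-partitioned files. Topic name, brokers, and the local
# landing directory are placeholders for the real Gobblin/HDFS setup.
import json
import os
from datetime import datetime, timezone

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "tracking-events",                      # hypothetical topic name
    bootstrap_servers=["localhost:9092"],   # placeholder brokers
    group_id="tracking-landing",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    # Partition output by event hour; minute/hourly/daily rollups build on this.
    event_hour = datetime.now(timezone.utc).strftime("%Y-%m-%d_%H")
    out_dir = os.path.join("landing", f"dt={event_hour}")
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "events.jsonl"), "a", encoding="utf-8") as fh:
        fh.write(json.dumps(message.value) + "\n")
```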

PROFESSIONAL EXPERIENCE:

Confidential

Lead Data Engineer

  • Implement Continuous Integration and Continuous Delivery (CI/CD) processes using GitLab along with Python and shell scripts to automate routine jobs, including synchronizing installers, configuration modules, packages, and requirements for the applications.
  • Hands-on with the Django framework using PyCharm and with Airflow workflow management.
  • Write AWS Lambda code in Python to process nested JSON files: converting, comparing, sorting, etc. (a Lambda sketch follows this list).
  • Construct AWS data pipelines using VPC, EC2, S3, Auto Scaling groups (ASG), EBS, Snowflake, IAM, CloudFormation, Route 53, CloudWatch, CloudFront, and CloudTrail.
  • Create and configure Elastic Load Balancers and Auto Scaling groups to distribute traffic and provide a cost-efficient, fault-tolerant, and highly available environment.
  • Manage metadata alongside the data for visibility into where data came from and its lineage, so data for customer projects can be found quickly and efficiently, using an AWS data lake together with services such as AWS Lambda and AWS Glue.
  • Develop software systems using scientific analysis and algorithms such as sessionization to generate statistics for the website analyzer.
  • Develop RESTful and SOAP APIs using Swagger, and test the mobile app and the customer product details app using Postman.
  • Hands-on with the Redshift database (ETL data pipelines from AWS Aurora MySQL to Redshift).
  • Define DISTKEY and SORTKEY on primary columns for better query performance and tuning.
  • Assign user-level and group-level permissions on Redshift schemas for security.
  • Restore PostgreSQL data into Redshift (except for certain data types).
  • Set up AWS DMS replication instances between different databases.
  • Design and construct AWS data pipelines using various AWS resources, including AWS API Gateway to receive responses from AWS Lambda; the Lambda function retrieves data from Snowflake and converts the response into JSON format, with Snowflake, DynamoDB, AWS Lambda, and AWS S3 as the backing services.
  • Develop and implement complex databases and data marts for the current production state on both traditional RDBMS and the Hadoop ecosystem alongside legacy applications, and design solutions to incorporate new processes and implementations into the existing environment.
  • Build end-to-end ETL pipelines from AWS S3 to the DynamoDB key-value store and the Snowflake data warehouse for analytical queries, specifically for cloud data.
  • Convert data into different formats to meet user/business requirements by streaming data pipelines from various sources, including Snowflake, DynamoDB, and unstructured data.
  • Document modifications and enhancements made to the applications, systems, and databases as required by the project.
  • Document all changes implemented across systems and components using Confluence and Atlassian Jira, including technical changes, infrastructure changes, and business-process changes; post-release documentation also covers known issues from production implementation and deferred defects.
  • Perform Informatica Cloud Services and Informatica PowerCenter administration, define ETL strategies, and build Informatica mappings. Set up the Secure Agent and connect different applications and their data connectors for processing different kinds of data, including unstructured (logs, clickstreams, shares, likes, topics, etc.), semi-structured (XML, JSON), and structured (RDBMS).
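As a companion to the nested-JSON Lambda work mentioned above, here is a minimal Python handler sketch. The bucket/key names and event shape are hypothetical, and the downstream DynamoDB/Snowflake loads are omitted; it only shows the flatten-and-sort step.

```python
# Minimal sketch: an AWS Lambda handler that reads a nested JSON object from S3,
# flattens it into dotted keys, and returns the keys in sorted order.
# Bucket, key, and event fields are placeholders, not the real pipeline inputs.
import json

import boto3

s3 = boto3.client("s3")


def flatten(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts into dotted keys."""
    items = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items


def lambda_handler(event, context):
    # Assume the triggering event carries the S3 location (hypothetical shape).
    bucket = event.get("bucket", "example-raw-bucket")
    key = event.get("key", "incoming/payload.json")

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    nested = json.loads(body)

    flat = flatten(nested)
    # Sorted keys keep downstream comparisons deterministic.
    return {"statusCode": 200, "body": json.dumps(flat, sort_keys=True)}
```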

Lead Data Engineer

Confidential

  • Hands-on experience developing ETLs using Informatica Cloud Services (ICS), third-party data connectors (e.g., Salesforce, Zuora, Oracle EBS), and change data capture.
  • Export/import data between Teradata and Hive/HDFS using Sqoop and the Hortonworks Connector for Teradata.
  • Open to learning and using new systems; involved in Python programming.
  • Kafka producer and consumer API configuration, upgrades and rolling upgrades, topic-level configs, Kafka Connect configs, Streams configs, consumer rebalancing, operations, replication, message delivery semantics, end-to-end batch compression, etc. (see the producer sketch after this list).
  • Hands-on experience with Informatica PowerCenter and PowerExchange, integrating with different applications and relational databases.
  • AWS CI/CD data pipelines and an AWS data lake using EC2, AWS Glue, and AWS Lambda.
  • Hands-on with the different API endpoint types in AWS API Gateway: edge-optimized, regional, and private.
  • Configured connection controls at different levels, such as API key, method level, and account level.
  • AWS API Gateway protection strategies such as resource policies, IAM authorization, Lambda authorizers, and Cognito authentication.
  • Building ETL data pipelines on Hadoop/Teradata using Pig, Hive, and UDFs.
  • Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
  • Expertise in implementing a DevOps culture through CI/CD tools such as code repositories, CodeDeploy, CodePipeline, and GitHub.
  • File types on Hadoop: Avro (schema evolution), ORC (good performance for ad-hoc analytics queries), and Confidential's DALI storage (Presto database).
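To make the producer-side configuration items above concrete, here is a minimal kafka-python producer sketch touching batching, end-to-end compression, and delivery semantics. Broker addresses, the topic name, and the tuning values are placeholders, not the actual cluster settings.

```python
# Minimal sketch of Kafka producer configuration: acks-based delivery guarantees,
# retries, gzip batch compression, and batching knobs. Values are illustrative.
import json

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],   # placeholder brokers
    acks="all",                             # strongest producer-side delivery guarantee
    retries=5,                              # retry transient send failures
    compression_type="gzip",                # end-to-end batch compression
    linger_ms=50,                           # wait briefly to build larger batches
    batch_size=64 * 1024,                   # max bytes per partition batch
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Send a few demo events and flush so nothing is left in the batch buffer.
for i in range(10):
    producer.send("example-topic", {"event_id": i, "source": "demo"})
producer.flush()
```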

Sr. Consultant

Confidential

  • Developed various ETL flows using Informatica PowerCenter and PowerExchange: Salesforce to Oracle (change data capture), Oracle to Salesforce using external IDs, ODS to DB2, etc. (a Salesforce upsert sketch follows this list).
  • As an SFDC consultant, used Informatica for cross-application integration requirements, mapping complex data structures and setting up understandable, reliable business rules.
  • Gathered functional requirements from operational and business users and translated them into technical requirements and specifications.
  • Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
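The Oracle-to-Salesforce flows above were built in Informatica; as an illustration of the external-ID upsert pattern only, here is a minimal sketch using the simple-salesforce Python library. The credentials, the Contact object, the External_Id__c field, and the sample rows are all hypothetical.

```python
# Minimal sketch (not the original Informatica mappings): upsert Oracle-sourced
# rows into Salesforce keyed on an external ID. All names/credentials are placeholders.
from simple_salesforce import Salesforce  # pip install simple-salesforce

sf = Salesforce(
    username="user@example.com",       # placeholder credentials
    password="password",
    security_token="security-token",
)

# Rows as they might arrive from an Oracle ODS extract (illustrative only).
rows = [
    {"External_Id__c": "ORA-1001", "LastName": "Doe", "Email": "doe@example.com"},
    {"External_Id__c": "ORA-1002", "LastName": "Ray", "Email": "ray@example.com"},
]

for row in rows:
    ext_id = row.pop("External_Id__c")
    # Upsert keyed on the external ID: insert if new, update if it already exists.
    sf.Contact.upsert(f"External_Id__c/{ext_id}", row)
```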

Senior Engineer

Confidential

  • Participated in business requirements and design calls with clients and worked with the architecture/RAD team to shape the design of ETL task lists.
  • Created a continuous integration and continuous delivery (CI/CD) pipeline on AWS that automates steps in the software delivery process.
  • Hands-on experience developing ETLs using Informatica Cloud Services (ICS).
  • Hands-on experience with Informatica PowerCenter and PowerExchange, integrating with different applications and relational databases.
  • Created an AWS CodePipeline pipeline that builds, tests, and deploys code every time there is a code change, based on the release process models.
  • Created a pipeline that uses AWS CodeDeploy to deploy applications from an Amazon S3 bucket and an AWS CodeCommit repository to Amazon EC2 instances running Amazon Linux (a deployment sketch follows this list).
  • Data analysis and maintenance of production data
  • Worked on Auto Scaling, CloudWatch (monitoring), AWS Elastic Beanstalk (app deployments), AWS S3 (storage), and AWS EBS (persistent disk storage).
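As a companion to the CodeDeploy pipeline described above, here is a minimal boto3 sketch that kicks off a deployment of an S3-hosted application bundle to an EC2 deployment group. The application, deployment group, bucket, and key names are placeholders, not the actual release setup.

```python
# Minimal sketch: start a CodeDeploy deployment of a zipped application revision
# stored in S3 onto EC2 instances. All resource names below are hypothetical.
import boto3

codedeploy = boto3.client("codedeploy", region_name="us-east-1")

response = codedeploy.create_deployment(
    applicationName="example-app",                 # hypothetical CodeDeploy application
    deploymentGroupName="example-app-ec2-group",   # hypothetical EC2 deployment group
    revision={
        "revisionType": "S3",
        "s3Location": {
            "bucket": "example-release-bucket",
            "key": "builds/example-app-1.0.0.zip",
            "bundleType": "zip",
        },
    },
    description="Illustrative deployment triggered from a release script",
)

print("Started deployment:", response["deploymentId"])
```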

Senior Engineer

Confidential

  • Produced the High-Level Design (HLD), comprising data modeling, design, and data validation/anomaly checks.
  • Experience with performance tuning for Oracle RDBMS using EXPLAIN PLAN and optimizer hints (see the sketch after this list).
  • Logical/physical design of the databases.
  • Resolved issues such as loops, fan traps, and chasm traps by creating aliases and contexts as required to remove the cyclic dependencies, and tested them to ensure correctness.
  • Tuned queries to improve the performance of existing reports for various functional areas.
  • Tested aggregate awareness to ensure queries pull the correct level of aggregation.
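To show the EXPLAIN PLAN mechanics behind the tuning work above, here is a minimal Python sketch using the cx_Oracle driver. The connection details, schema, and query are hypothetical; a hint (e.g. an INDEX hint) would be added in the SELECT text when testing alternatives.

```python
# Minimal sketch (placeholder connection details): capture an execution plan with
# EXPLAIN PLAN and read it back via DBMS_XPLAN.DISPLAY. Table/column names are
# illustrative (HR-style sample schema), not the actual production objects.
import cx_Oracle  # pip install cx_Oracle

conn = cx_Oracle.connect(user="scott", password="tiger", dsn="dbhost:1521/ORCLPDB1")
cur = conn.cursor()

# Generate the plan for a hypothetical query under review (no rows are fetched).
cur.execute("""
    EXPLAIN PLAN FOR
    SELECT e.employee_id, e.last_name
    FROM employees e
    WHERE e.department_id = 50
""")

# Read the formatted plan back from the plan table.
cur.execute("SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")
for (line,) in cur:
    print(line)

cur.close()
conn.close()
```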

Senior Engineer

Confidential

  • Developed a framework for ETL workflows (a minimal sketch follows this list).
  • Created the necessary RDBMS model and table structures for sourcing customer profile information based on geography, nature, asset classes, and credit ratings.
  • Data analysis and maintenance of production data
  • Designed ETL strategies for load balancing and exception handling, and designed processes that can handle high data volumes. Set up new servers and laid out a migration plan.
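In the spirit of the ETL workflow framework and exception-handling strategies described above, here is a minimal Python sketch of a step runner with retries. The extract/transform/load functions are stubs standing in for the real logic; the original implementation is not reproduced here.

```python
# Minimal sketch of an ETL step runner with retries and exception handling.
# Step bodies are stubs; delays and retry counts are illustrative defaults.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")


def run_step(name, func, retries=3, delay_seconds=5):
    """Run one ETL step, retrying transient failures before giving up."""
    for attempt in range(1, retries + 1):
        try:
            log.info("Starting step %s (attempt %d)", name, attempt)
            return func()
        except Exception:
            log.exception("Step %s failed on attempt %d", name, attempt)
            if attempt == retries:
                raise
            time.sleep(delay_seconds)


# Stub steps standing in for real extract/transform/load logic.
def extract():
    return [{"customer_id": 1, "region": "EMEA", "rating": "AA"}]


def transform(rows):
    return [{**row, "region": row["region"].lower()} for row in rows]


def load(rows):
    log.info("Loading %d rows", len(rows))


if __name__ == "__main__":
    data = run_step("extract", extract)
    data = run_step("transform", lambda: transform(data))
    run_step("load", lambda: load(data))
```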

Environment: RHEL 4/5/6, CentOS, Informatica, CDC, Oracle, SQL Server, Teradata (FastLoad, MultiLoad, BTEQ), XML, SQL/PLSQL, T-SQL, TDD, Erwin, shell scripting, Informatica Cloud Services, AWS API Gateway, AWS Glue, AWS Lambda, Big Data, HDFS, Pig, Hive, Sqoop, Agile, AWS, Jenkins, Git, Docker, Kubernetes, Hadoop, Puppet, Jira, Python.
