
Cloud Data Engineer Resume


Atlanta, GA

SUMMARY

  • 11+ years of experience in design, development, and integration on Big Data, cloud services, and Informatica in the Insurance, Banking, Retail, and Healthcare industries.
  • 5 years of experience as a Cloud Data Engineer in the Big Data/Hadoop ecosystem (HDFS, Hive, Spark, Databricks, Kafka, YARN) on AWS and Azure cloud services and cloud relational databases.
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, transforming the data on EMR/HDInsight to derive customer usage patterns.
  • Hands-on with Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
  • Expertise in processing semi-structured data (CSV, Parquet, XML, and JSON) in Hive/Spark using Python (see the sketch after this list).
  • Experienced in using the Hadoop ecosystem to implement batch and real-time ingestion from multiple source systems, such as web services, and loading the data into S3/RDS.
  • Good understanding of Hadoop and YARN architecture along with the Hadoop daemons (JobTracker, TaskTracker, NameNode, DataNode, Resource/Cluster Manager) and Kafka (distributed stream processing).
  • Migrated legacy Informatica ETL logic into Python, Spark, and Databricks and loaded the resulting datasets into cloud storage (S3/Blob) and relational databases.
  • Migrated SQL Server databases into a multi-cluster Snowflake environment, set up data sharing for multiple applications, and created Snowflake virtual warehouses based on data volume and job workload.
  • Good understanding of the MongoDB NoSQL database; stored front-end application event messages in MongoDB.
  • Informatica development, architecture, and administration with Informatica PowerCenter, Data Quality, BDM, and Informatica Intelligent Cloud Services (IICS), both on-premise and on Azure and Amazon Web Services.
  • Data profiling, analysis, cleansing, address validation, fuzzy matching/merging, data conversion, and exception handling with Informatica Data Quality 10.1 (IDQ).
  • Hands-on with various AWS services, including EC2, EMR clusters, Redshift, Databricks, S3 buckets, and Kinesis, across IaaS/PaaS/SaaS models.
  • Hands-on with various Azure services, including Azure Virtual Machines, Blob Storage, Data Lake, Data Factory, Azure SQL, PostgreSQL, and HDInsight.
  • Virtualized servers using Docker for test/dev environments and automated configuration using Docker containers.
  • Configuration management and source code repository management using tools such as Git and TFS.
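
A minimal PySpark sketch of the semi-structured processing described above; the S3 paths, column names, and event-count aggregation are illustrative assumptions, not taken from any specific project:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("usage-patterns").getOrCreate()

    # Read semi-structured sources into DataFrames (CSV and JSON shown here).
    csv_df = spark.read.option("header", "true").csv("s3://example-bucket/raw/events_csv/")
    json_df = spark.read.json("s3://example-bucket/raw/events_json/")

    # Align on a common schema, union, and aggregate per-customer usage patterns.
    cols = ["customer_id", "event_type", "event_ts"]
    events = csv_df.select(*cols).unionByName(json_df.select(*cols))
    usage = (events.groupBy("customer_id")
                   .agg(F.count("*").alias("event_count"),
                        F.max("event_ts").alias("last_event_ts")))

    # Persist the curated result for downstream analytics.
    usage.write.mode("overwrite").parquet("s3://example-bucket/curated/usage_patterns/")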

TECHNICAL SKILLS

Big Data: Hadoop HDFS, Hive, Spark 2.x, Python 3.x, Kafka, CDH 5.x, Databricks, StreamSets

Cloud Services: AWS - EC2, S3, EMR, Kinesis, AMI, CloudWatch, Docker, Redshift, VPC, IaaS/PaaS. Azure - Azure VM, Blob Storage, Data Lake, Data Factory, HDInsight, AD, Docker.

ETL: Informatica 10.x/9.x/8.x, PowerCenter, Data Quality, IICS, BDM, MDM.

Database: SQL Server 2016/2012, Azure SQL, Snowflake, MongoDB, RDS, Oracle 11g/10g

OLAP Tools: OBIEE 11g/10g, Business Objects, Tableau

Operating Systems/Tools: Red Hat 7/6, Amazon Linux, Ubuntu, Windows Server 2016/2012, Jupyter Notebook

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Cloud Data Engineer

Responsibilities:

  • Analyzed, designed, and built modern, scalable distributed data solutions using Hadoop and AWS cloud services.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Python/Spark SQL.
  • Loaded data into Spark RDDs and used in-memory computation to generate output, storing the resulting datasets in HDFS, Amazon S3, and relational databases.
  • Migrated legacy Informatica batch/real-time ETL logic into Hadoop using Python, SparkContext, Spark SQL, DataFrames, and pair RDDs in Databricks.
  • Handled large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and effective, efficient joins and transformations during the ingestion process itself.
  • Tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
  • Implemented near-real-time data processing using StreamSets and the Spark/Databricks framework.
  • Implemented best practices to secure and manage data in AWS S3 buckets and used a custom Spark framework to load data from S3 into Redshift (a representative sketch follows this list).
  • Hands-on with Amazon EC2, S3, Redshift, EMR, RDS, ELB, CloudFormation, and other services in the AWS family.
  • Stored Spark datasets in Snowflake relational databases for analytics reporting.
  • Migrated SQL Server databases into a multi-cluster Snowflake environment, set up data sharing for multiple applications, and created Snowflake virtual warehouses based on data volume and job workload.
  • Developed Apache Spark jobs using Python in the test environment for faster data processing and used Spark SQL for querying.
  • Used Hadoop/Spark Docker containers to validate data loads in test/dev environments.
  • Managed jobs using fair scheduling and developed job-processing scripts using Oozie workflows.
  • Performed an Informatica Intelligent Cloud Services (IICS) pilot project on Amazon cloud services.
  • Prepared data pipeline and data migration documents for a smooth transfer of the project from development to testing and then to production.
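
A simplified sketch of the S3-to-Redshift load pattern noted above; the bucket, table, credentials, and JDBC details are placeholder assumptions, and the production job ran inside a larger custom Spark framework with additional validation and security controls:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("s3-to-redshift").getOrCreate()

    # Read curated Parquet data from S3 and stamp it with a load date.
    claims = spark.read.parquet("s3://example-bucket/curated/claims/")
    claims = claims.withColumn("load_dt", F.current_date())

    # Write to Redshift over JDBC (assumes the Redshift JDBC driver is on the classpath).
    (claims.write
        .format("jdbc")
        .option("url", "jdbc:redshift://example-cluster:5439/dev")
        .option("dbtable", "analytics.claims_stage")
        .option("user", "etl_user")
        .option("password", "****")
        .option("driver", "com.amazon.redshift.jdbc42.Driver")
        .mode("append")
        .save())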

Environment: Amazon Elastic MapReduce (EMR), Spark, Hive, Python, Kafka, RDS, Informatica, SQL Server 2016, Snowflake, MongoDB, S3, Redshift, Docker, Kubernetes.

Confidential, Redmond, DC

Data Engineer/ ETL Developer

Responsibilities:

  • Created pipelines in Python using datasets/pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Data Lake, and Azure SQL Data Warehouse.
  • Built Spark applications using Spark SQL in Databricks for data extraction and transformation from multiple file formats, transforming the data on HDInsight.
  • Developed ETL code for XML, CSV, TXT, and JSON sources and loaded the data into relational tables using pandas and NumPy in Python (a representative sketch follows this list).
  • Developed Apache Spark jobs using Python for faster data processing and used Spark SQL for querying.
  • Wrote Spark RDD transformations, actions, DataFrames, and case classes for the required input data and performed data transformations using Spark Core.
  • Used Spark SQL queries and DataFrames to import data from data sources, perform transformations and read/write operations, and save the results to an output directory in Blob Storage.
  • Designed and implemented streaming solutions using Kafka or Azure Stream Analytics.
  • Built Power BI reports using output source files from Blob Storage.
  • Migrated on-premise data (SQL Server/MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).
  • Designed matching plans, helped determine the best matching algorithm, configured identity matching, and analyzed matching scores using Informatica Data Quality (IDQ).
  • Performed data profiling, standardization, address validation, matching, and merging for data quality.
  • Created data governance business rules for real-time validation at the front-end application level.
  • Designed and developed complex Informatica mappings using Lookup, Expression, Update Strategy, Sequence Generator, Aggregator, Router, Stored Procedure, and other transformations.
  • Upgraded Informatica PowerCenter and Data Quality from 9.5 to 9.6 HF2 and from 9.6 to 10.1 for various applications.
  • Set up and configured the PowerCenter domain, grid, and services; applied hotfixes (HF) and emergency bug fixes (EBF) across Informatica products; and provided day-to-day production support.
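
A minimal pandas/NumPy sketch of the flat-file-to-relational loads described above; the file names, columns, and Azure SQL connection string are illustrative assumptions:

    import numpy as np
    import pandas as pd
    from sqlalchemy import create_engine

    # Extract: read CSV and line-delimited JSON sources into DataFrames.
    csv_df = pd.read_csv("members.csv")
    json_df = pd.read_json("members.json", lines=True)

    # Transform: combine sources, null out invalid ages, drop rows without a key.
    df = pd.concat([csv_df, json_df], ignore_index=True)
    df["age"] = np.where(df["age"].between(0, 120), df["age"], np.nan)
    df = df.dropna(subset=["member_id"])

    # Load: append to a relational staging table (Azure SQL via pyodbc assumed).
    engine = create_engine(
        "mssql+pyodbc://etl_user:****@example-server/dw?driver=ODBC+Driver+17+for+SQL+Server"
    )
    df.to_sql("member_stage", engine, if_exists="append", index=False)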

Environment: Azure HDInsight, Spark, Hive, Python, Kafka, RDS, Informatica Data Quality/PowerCenter, SQL Server 2016, MongoDB, Blob Storage, Data Lake, Data Factory, Docker.

Confidential, FL

Sr ETL Developer

Responsibilities:

  • Responsible for developing, supporting, and maintaining ETL (Extract, Transform, and Load) processes using the Informatica integration suite.
  • Developed complex mappings in Informatica to load data from various sources using transformations such as Source Qualifier, Lookup, Expression, and Update Strategy.
  • Worked with Informatica Data Quality 10.0 (IDQ) for analysis, data cleansing, fuzzy data matching, data conversion, and exception handling.
  • Designed and developed transformation rules (business rules) to generate consolidated (fact/summary) data using the Informatica ETL tool.
  • Deployed reusable transformation objects such as mapplets to avoid duplication of metadata and reduce development time.
  • Extracted, cleansed, aggregated, transformed, and validated data to ensure accuracy and consistency.
  • Applied advanced Informatica techniques (dynamic caching, memory management, parallel processing) to increase performance throughput.
  • Optimized and tuned mappings and sessions in Informatica by identifying and eliminating bottlenecks and by applying memory management and parallel threading.
  • Developed Informatica workflows and sessions associated with the mappings using Workflow Manager; developed reusable mapplets, mappings, and source/target definitions.
  • Prepared all DB scripts and Informatica objects for implementation in the production environment.

Environment: Informatica 9.6/9.1, PowerCenter, Data Quality, Amazon Cloud, Oracle, SQL Server 2016/2014, Windows Server 2016/2012.

Confidential, WI

Informatica Developer/ Administrator

Responsibilities:

  • Developed ETL programs using Informatica to implement the business requirements.
  • Hands-on in all phases of the SDLC: requirements gathering, design, development, testing, production rollout, user training, and production support.
  • Modified Informatica mappings, transformations, sessions, and workflows in Informatica PowerCenter Designer/Manager whenever clients requested changes.
  • Created workflows and sessions using Informatica Workflow Manager and monitored workflow runs and statistics in Informatica Workflow Monitor.
  • Defined mapping parameters, variables, and session parameters according to requirements and performance considerations.
  • Created various tasks such as Event-Wait, Event-Raise, and Email.
  • Created shell scripts to automate worklets, batch processes, and session scheduling using pmcmd (a representative sketch follows this list).
  • Designed and implemented the Informatica 8.x/9.x platform and continued to support the existing Informatica 8.x platform.
  • Upgraded Informatica from 8.x to 9.x, set up Informatica PowerCenter disaster recovery, installed Informatica hotfixes and EBFs (emergency bug fixes) on servers, and applied Windows/Linux security patches on a monthly basis.
  • Configured Active Directory LDAP on the Admin Console to authenticate and authorize developers and business users.
  • Configured Informatica Data Quality (IDQ) components such as the Model Repository Service, Data Integration Service (DIS), Content Management Service, web services, and Business Glossary.
  • Performed system-level health checks (CPU and memory utilization, number of parallel loads/sessions running on each node) and provided capacity-planning recommendations (disk space, memory, CPU, etc.).
  • Created deployment groups and scripts for migrating code from lower to higher environments.
  • Worked extensively on automation scripts (auto-restart of services, disk space monitoring, log directory cleanup) and on scripts using Informatica command-line utilities.
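
An illustrative Python wrapper for the pmcmd automation mentioned above (the original scripts were shell-based); the service, domain, folder, credentials, and workflow names are placeholders:

    import subprocess

    def start_workflow(workflow, folder="DWH_LOADS", service="INT_SVC",
                       domain="DOM_PROD", user="etl_user", password="****"):
        """Start an Informatica workflow with pmcmd and wait for completion."""
        cmd = [
            "pmcmd", "startworkflow",
            "-sv", service, "-d", domain,
            "-u", user, "-p", password,
            "-f", folder, "-wait", workflow,
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"{workflow} failed: {result.stderr.strip()}")
        return result.stdout

    if __name__ == "__main__":
        print(start_workflow("wf_daily_claims_load"))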

Environment: Informatica 9.1/8.6 PowerCenter, PowerExchange, Metadata Manager, Oracle, SQL Server 2014/2012, DB2, Red Hat 5.6.
