
Cloud Data Engineer Resume

Atlanta, GA


  • 11+ years of experience in design, development, and integration on Big Data, cloud services, and Informatica across the Insurance, Banking, Retail, and Healthcare industries.
  • 5 years of experience as a Cloud Data Engineer in the Hadoop big data ecosystem (HDFS, Hive, Spark, Databricks, Kafka, YARN) on AWS and Azure cloud services and cloud relational databases.
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, transforming the data on EMR/HDInsight into customer usage patterns.
  • Hands-on with Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
  • Expertise in processing semi-structured data (CSV, Parquet, XML, and JSON) in Hive/Spark using Python.
  • Experienced in using the Hadoop ecosystem to implement batch-mode and real-time ingestion from multiple source systems such as web services, loading the data into S3/RDS.
  • Good understanding of Hadoop and YARN architecture along with the various Hadoop daemons such as JobTracker, TaskTracker, NameNode, DataNode, Resource/Cluster Manager, and Kafka (distributed stream processing).
  • Migrated legacy Informatica ETL logic into Python, Spark, and Databricks and loaded the resulting datasets into cloud storage (S3/Blob) and relational databases.
  • Migrated a SQL Server database into a multi-cluster Snowflake environment, set up data sharing across multiple applications, and created Snowflake virtual warehouses sized by data volume and job load.
  • Good understanding of the MongoDB NoSQL database; stored front-end application event messages in MongoDB.
  • Informatica development, architecture, and administration with Informatica PowerCenter, Data Quality, Informatica BDM, and Informatica Intelligent Cloud Services (IICS), both on-premises and on Azure and AWS.
  • Data profiling, analysis, cleansing, address validation, fuzzy matching/merging, data conversion, and exception handling in Informatica Data Quality 10.1 (IDQ).
  • Hands-on with various AWS services including EC2, EMR clusters, Redshift, Databricks, S3 buckets, AWS Kinesis, and IaaS/PaaS/SaaS models.
  • Hands-on with various Azure services including Azure Virtual Machines, Blob Storage, Data Lake, Data Factory, Azure SQL, PostgreSQL, and HDInsight.
  • Virtualized servers with Docker for test/dev environments and automated configuration using Docker containers.
  • Configuration management and source-code repository management using tools such as Git and TFS.
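The semi-structured processing described above (CSV/JSON in Python) typically starts by flattening nested JSON events into tabular rows before loading. A minimal standard-library sketch, with illustrative field names:

```python
import csv
import io
import json

def flatten(record, prefix=""):
    """Flatten a nested JSON object into a single-level dict with
    dot-separated keys, e.g. {"user": {"id": 1}} -> {"user.id": 1}."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

def json_lines_to_csv(lines):
    """Convert JSON-lines event data into CSV text ready for staging."""
    rows = [flatten(json.loads(line)) for line in lines]
    fieldnames = sorted({key for row in rows for key in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Hypothetical clickstream events as they might arrive from a web service.
events = [
    '{"event": "click", "user": {"id": 7, "region": "GA"}}',
    '{"event": "view", "user": {"id": 9, "region": "FL"}}',
]
print(json_lines_to_csv(events))
```

In a real pipeline the resulting CSV would be staged to S3 or Blob Storage for downstream Hive/Spark processing.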


Big Data: Hadoop HDFS, Hive, Spark 2.x, Python 3.x, Kafka, CDH 5.x, Databricks, StreamSets

Cloud Services: AWS - EC2, S3, EMR, Kinesis, AMI, CloudWatch, Docker, Redshift, VPC, iPaaS. Azure - Azure VM, Blob Storage, Data Lake, Data Factory, HDInsight, Active Directory, Docker.

ETL: Informatica 10.x/9.x/8.x, PowerCenter, Data Quality, IICS, BDM, MDM.

Database: SQL Server 2016/2012, Azure SQL, Snowflake, MongoDB, RDS, Oracle 11g/10g

OLAP Tools: OBIEE 11g/10g, Business Objects, Tableau

Operating Systems/Tools: Red Hat 7/6, Amazon Linux, Ubuntu, Windows Server 2016/2012, Jupyter Notebook


Confidential, Atlanta, GA

Cloud Data Engineer


  • Analyze, design, and build modern, scalable distributed data solutions using Hadoop and AWS cloud services.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data, analyzing them with Hive queries and Spark SQL in Python.
  • Loaded data into Spark RDDs and used in-memory computation to generate output, storing the resulting datasets in HDFS, Amazon S3, and relational databases.
  • Migrated legacy Informatica batch and real-time ETL logic into Hadoop using Python, SparkContext, Spark SQL, DataFrames, and pair RDDs in Databricks.
  • Experienced in handling large datasets using partitioning, Spark in-memory capabilities, broadcast variables, and efficient joins and transformations during the ingestion process itself.
  • Tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
  • Implemented near-real-time data processing using StreamSets and the Spark/Databricks framework.
  • Implemented best practices to secure and manage data in AWS S3 buckets and used a custom Spark framework to load data from S3 into Redshift.
  • Hands-on with Amazon EC2, S3, Redshift, EMR, RDS, ELB, CloudFormation, and other services in the AWS family.
  • Stored Spark datasets in Snowflake relational databases for analytics reporting.
  • Migrated a SQL Server database into a multi-cluster Snowflake environment, set up data sharing across multiple applications, and created Snowflake virtual warehouses sized by data volume and job load.
  • Developed Apache Spark jobs in Python in the test environment for faster data processing and used Spark SQL for querying.
  • Used Hadoop/Spark Docker containers to validate data loads in test/dev environments.
  • Managed jobs with the Fair Scheduler and developed job-processing scripts using Oozie workflows.
  • Ran an Informatica Intelligent Cloud Services (IICS) pilot project on AWS.
  • Prepared data-pipeline and data-migration documents for smooth handover from development to testing and then to production.
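The S3-to-Redshift load mentioned above commonly follows the pattern of staging files in S3 and then issuing a Redshift COPY. A minimal sketch of building that statement; the table, bucket, and IAM role names are hypothetical:

```python
def build_redshift_copy(table, s3_prefix, iam_role, file_format="PARQUET"):
    """Build a Redshift COPY statement for data staged in S3.
    All identifiers passed in here are illustrative, not real resources."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS {file_format};"
    )

sql = build_redshift_copy(
    table="analytics.usage_patterns",                        # hypothetical table
    s3_prefix="s3://example-bucket/curated/usage/",          # hypothetical bucket
    iam_role="arn:aws:iam::123456789012:role/RedshiftCopy",  # hypothetical role
)
print(sql)
# The statement would then be executed against Redshift via a DB driver.
```

Using COPY from staged Parquet keeps the load parallel across Redshift slices, which is why a Spark job typically writes to S3 first rather than inserting row by row.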

Environment: Amazon Elastic MapReduce, Spark, Hive, Python, Kafka, RDS, Informatica, SQL Server 2016, Snowflake, MongoDB, S3, Redshift, Docker, Kubernetes.

Confidential, Redmond, DC

Data Engineer/ ETL Developer


  • Created pipelines in Python using datasets/pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Data Lake, and Azure SQL Data Warehouse.
  • Built Spark applications using Spark SQL in Databricks for data extraction and transformation from multiple file formats, transforming the data on HDInsight.
  • Developed ETL code for XML, CSV, TXT, and JSON sources, loading the data into relational tables using pandas and NumPy in Python.
  • Developed Apache Spark jobs in Python for faster data processing and used Spark SQL for querying.
  • Experience writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data, performing transformations with Spark Core.
  • Ran Spark SQL queries and DataFrame operations to import data from sources, perform transformations and read/write operations, and save results to an output directory in Blob Storage.
  • Designed and implemented streaming solutions using Kafka and Azure Stream Analytics.
  • Built Power BI reports from output files in Blob Storage.
  • Migrated on-premises data (SQL Server/MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).
  • Designed matching plans, helped determine the best matching algorithm, configured identity matching, and analyzed match scores using Informatica Data Quality (IDQ).
  • Performed data profiling, standardization, address validation, and matching/merging for data quality.
  • Created data governance business rules to validate data at the front-end application level in real time.
  • Designed and developed complex Informatica mappings using Lookup, Expression, Update Strategy, Sequence Generator, Aggregator, Router, Stored Procedure, and other transformations.
  • Upgraded Informatica PowerCenter and Data Quality from 9.5 to 9.6 HF2 and from 9.6 to 10.1 for various applications.
  • Set up and configured the PowerCenter domain, grid, and services; applied hotfixes and EBFs across Informatica products; and provided day-to-day production support.
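The pandas/NumPy ETL described above (JSON and CSV sources into relational tables) usually flattens nested records and casts types before the load step. A minimal sketch with made-up field names; the `to_sql` target is hypothetical:

```python
import pandas as pd

# Hypothetical raw order payloads as they might arrive from a JSON source.
raw = [
    {"id": 1, "amount": "12.50", "customer": {"name": "A", "state": "WA"}},
    {"id": 2, "amount": "7.25",  "customer": {"name": "B", "state": "GA"}},
]

# Flatten nested JSON into tabular columns, then cast types to match
# the target relational table's schema.
df = pd.json_normalize(raw)
df["amount"] = pd.to_numeric(df["amount"])
df = df.rename(columns={"customer.name": "customer_name",
                        "customer.state": "customer_state"})

# Load step (SQLAlchemy engine and table name are hypothetical):
# df.to_sql("orders", engine, if_exists="append", index=False)
print(df)
```

Doing the type casts in pandas before the load surfaces bad records early, instead of failing inside the database insert.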

Environment: Azure HDInsight, Spark, Hive, Python, Kafka, RDS, Informatica Data Quality/PowerCenter, SQL Server 2016, MongoDB, Blob Storage, Data Lake, Data Factory, Docker.

Confidential, FL

Sr ETL Developer


  • Responsible for developing, supporting, and maintaining ETL (Extract, Transform, Load) processes using the Informatica integration suite.
  • Extensive experience developing complex Informatica mappings to load data from various sources using transformations such as Source Qualifier, Lookup, Expression, and Update Strategy.
  • Worked with Informatica Data Quality 10.0 (IDQ) on analysis, data cleansing, fuzzy matching, data conversion, and exception handling.
  • Designed and developed transformation rules (business rules) to generate consolidated (fact/summary) data using the Informatica ETL tool.
  • Deployed reusable transformation objects such as mapplets to avoid duplicating metadata and reduce development time.
  • Extracted, cleansed, aggregated, transformed, and validated data to ensure accuracy and consistency.
  • Experience with advanced Informatica techniques such as dynamic caching, memory management, and parallel processing to increase throughput.
  • Extensively involved in optimizing and tuning Informatica mappings and sessions by identifying and eliminating bottlenecks, managing memory, and parallelizing threads.
  • Developed Informatica workflows and sessions for the mappings using Workflow Manager; developed reusable mapplets, mappings, and source/target definitions.
  • Prepared all DB scripts and Informatica objects for implementation in the production environment.

Environment: Informatica 9.6/9.1, PowerCenter, Data Quality, Amazon Cloud, Oracle, SQL Server 2016/2014, Windows Server 2016/2014.

Confidential, WI

Informatica Developer/ Administrator


  • Developed ETL programs using Informatica to implement business requirements.
  • Hands-on in all phases of the SDLC: requirements gathering, design, development, testing, production deployment, user training, and production support.
  • Modified Informatica mappings, transformations, sessions, and workflows in PowerCenter Designer/Workflow Manager whenever changes were requested by clients.
  • Responsible for creating workflows and sessions in Informatica Workflow Manager and monitoring workflow runs and statistics in Workflow Monitor.
  • Responsible for defining mapping parameters/variables and session parameters according to requirements and performance needs.
  • Created tasks such as Event-Wait, Event-Raise, and Email.
  • Created shell scripts to automate worklets, batch processes, and session scheduling using pmcmd.
  • Responsible for design and implementation of the Informatica 9.x platform while continuing to support the existing 8.x platform.
  • Upgraded Informatica from 8.x to 9.x, set up PowerCenter disaster recovery, installed Informatica hotfixes and EBFs (emergency bug fixes) on servers, and applied Windows/Linux security patches monthly.
  • Configured Active Directory LDAP in the Admin Console to authenticate and authorize developers and business users.
  • Configured Informatica Data Quality (IDQ) components such as the Model Repository Service, Data Integration Service (DIS), Content Management Service, web services, and Business Glossary.
  • Performed system-level health checks (CPU, memory utilization, number of parallel sessions per node) and provided capacity-planning recommendations (disk space, memory, CPU).
  • Created deployment groups and scripts to migrate code from lower to higher environments.
  • Worked extensively on automation scripts (auto-restart of services, disk-space monitoring, log-directory cleanup) using Informatica command-line utilities.
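The pmcmd automation described above typically wraps `pmcmd startworkflow` calls in scripts. A minimal sketch that builds the argument list; the service, domain, folder, and workflow names are illustrative:

```python
def pmcmd_startworkflow(service, domain, user, password_env,
                        folder, workflow, wait=True):
    """Build the argument list for a pmcmd startworkflow call.
    Uses -pv so the password is read from an environment variable
    instead of appearing on the command line."""
    cmd = ["pmcmd", "startworkflow",
           "-sv", service, "-d", domain,
           "-u", user, "-pv", password_env,
           "-f", folder]
    if wait:
        cmd.append("-wait")   # block until the workflow completes
    cmd.append(workflow)
    return cmd

cmd = pmcmd_startworkflow("IS_DEV", "Domain_Dev", "etl_user",
                          "PM_PASSWORD", "SALES", "wf_daily_load")
print(" ".join(cmd))
# In a real scheduler script: subprocess.run(cmd, check=True)
```

Passing `-wait` makes the return code reflect workflow success, which lets a scheduler chain dependent jobs on the exit status.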

Environment: Informatica 9.1/8.6 PowerCenter, PowerExchange, Metadata Manager, Oracle, SQL Server 2014/2012, DB2, Red Hat 5.6.
