
Sr. Data Engineer (AWS/GCP) Resume


Phoenix, AZ

SUMMARY

  • A cloud-enthusiastic team player with around 7 years of experience in the IT industry as a Data Engineer, with proven expertise in software development on cloud computing platforms such as Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP).
  • Extensively worked on AWS Cloud services such as EC2, VPC, IAM, RDS, ELB, EMR, ECS, Auto Scaling, S3, CloudFront, Glacier, Elastic Beanstalk, Lambda, ElastiCache, Route 53, OpsWorks, CloudWatch, CloudFormation, Redshift, DynamoDB, SNS, SQS, SES, Kinesis Firehose, and Cognito.
  • Experience migrating existing AWS infrastructure to a serverless architecture using AWS Lambda, Kinesis, API Gateway, Route 53, and S3 buckets.
  • Experience designing Terraform configurations and deploying them with Cloud Deployment Manager to spin up resources such as cloud virtual networks and Compute Engine instances in public and private subnets, along with autoscalers, in Google Cloud Platform.
  • Experience in designing, architecting, and implementing scalable cloud-based web applications using AWS and GCP.
  • Set up GCP firewall rules to allow or deny traffic to and from VM instances based on specified configuration, and used GCP Cloud CDN (content delivery network) to deliver content from GCP cache locations, drastically improving user experience and reducing latency.
  • Experience with various Azure services such as Compute (Web Roles, Worker Roles), Azure Websites, Caching, SQL Azure, NoSQL, Storage, Network Services, Azure Active Directory, API Management, Scheduling, Auto Scaling, and PowerShell automation.
  • Experience migrating on-premises environments to Microsoft Azure and building an Azure disaster recovery environment and Azure backups from scratch using PowerShell scripts.
  • Experience working with the ELK architecture and its components Elasticsearch, Logstash, and Kibana; handled installation, administration, and configuration of the ELK stack on AWS.
  • Expertise in deploying Ansible playbooks in AWS environments using Terraform, as well as creating Ansible roles in YAML. Used Ansible to configure and maintain Tomcat servers.
  • Excellent knowledge of Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and the MapReduce programming paradigm.
  • Configured various performance metrics using AWS CloudWatch and CloudTrail (a minimal custom-metric sketch follows this list).
  • Experience in analyzing, designing, developing, and implementing operational database systems (OLTP) and data warehouse systems (OLAP).
  • Expertise in the full ETL life cycle (extraction, transformation, and loading) using Informatica PowerCenter and Apache Airflow.
  • Proficiency with AWS developer tools like AWS CLI, CloudFormation templates, and workflows.
  • Experience in automating code deployment, support, and administrative tasks across multiple cloud providers such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
  • Good experience in ETL along with SQL, PL/SQL, and NoSQL (MongoDB); developed simple and complex stored procedures in IBM Netezza; expertise in developing UNIX shell scripts.
  • Good understanding of relational database management systems such as Oracle, IBM Netezza, DB2, and SQL Server; worked on data integration using Informatica for the extraction, transformation, and loading of data from various database source systems.
  • Project planning, development, risk and dependency management, communication management at all levels of the organization, requirements/scope development and management, and project integration management and implementation/deployment.
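
A minimal sketch of publishing a custom performance metric to CloudWatch with Boto3, as referenced above; the namespace, metric name, dimension, and value are illustrative assumptions rather than details from a specific project.

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    # Publish a hypothetical custom metric for a daily ETL job's processed-row count.
    cloudwatch.put_metric_data(
        Namespace="CustomETL",  # assumed namespace
        MetricData=[{
            "MetricName": "RowsProcessed",  # illustrative metric name
            "Dimensions": [{"Name": "Pipeline", "Value": "daily_load"}],
            "Value": 125000,
            "Unit": "Count",
        }],
    )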

TECHNICAL SKILLS

Databases: Oracle, MS SQL Server, Amazon RDS

Data Warehouse/Big Data: AWS Redshift, Cloudera, Spark, PySpark, Hadoop

ETL Tools: Informatica PowerCenter, SSIS, IICS

Query Tools: Aginity, SQL Developer, SQL Navigator, SQL*Plus, PL/SQL Developer, data modeling

Cloud Ecosystem: Amazon Web Services (S3, EC2, RDS), Azure, GCP

Hadoop Ecosystem: Cloudera, Hortonworks, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka

Operating Systems: UNIX, Windows 95/98/NT/2000/XP/10, MS-DOS

Programming/Scripting Languages: UNIX shell, SQL, PL/SQL, Python, C, C++, Java, Scala

PROFESSIONAL EXPERIENCE

Sr. Data Engineer (AWS/GCP)

Confidential, Phoenix, AZ

Responsibilities:

  • Designed and set up an enterprise data lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.
  • Responsible for maintaining quality reference data in the source by performing operations such as cleansing and transformation and by ensuring integrity in a relational environment, working closely with stakeholders and the solution architect.
  • Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.
  • Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators (a minimal DAG sketch follows this list).
  • Experience with GCP Dataproc, GCS, Cloud Functions, and BigQuery.
  • Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts from enterprise data in BigQuery.
  • Worked with Google Data Catalog and other Google Cloud APIs for monitoring, query, and billing-related analysis of BigQuery usage.
  • Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access for new users.
  • Performed end-to-end architecture and implementation assessments of various AWS services such as Amazon EMR, Redshift, and S3.
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
  • Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output response.
  • Created Lambda functions with Boto3 to deregister unused AMIs in all application regions to reduce the cost of EC2 resources (see the sketch after this list).
  • Imported and exported databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS) packages.
  • Coded Teradata BTEQ scripts to load and transform data and to fix defects such as SCD Type 2 date chaining and duplicate cleanup.
  • Developed a reusable framework, to be leveraged for future migrations, that automates ETL from RDBMS systems to the data lake using Spark Data Sources and Hive data objects.
  • Conducted data blending and preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server.
  • Developed Kibana dashboards based on Logstash data and integrated different source and target systems into Elasticsearch for near-real-time log analysis and end-to-end transaction monitoring.
  • Implemented AWS Step Functions to automate and orchestrate Amazon SageMaker tasks such as publishing data to S3, training the ML model, and deploying it for prediction.
  • Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with the tasks running on Amazon SageMaker.
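
A minimal sketch of the Airflow DAG structure behind the GCP ETL pipelines mentioned above, assuming Airflow 2.x; the DAG name, schedule, and extract/load callables are placeholders, and in practice provider-specific operators (Dataproc, BigQuery) would replace the PythonOperator stubs.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_from_source(**context):
        # Placeholder: pull the day's data from the upstream system.
        pass


    def load_to_bigquery(**context):
        # Placeholder: stage the extracted files and load them into BigQuery.
        pass


    with DAG(
        dag_id="daily_etl_example",  # hypothetical DAG name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_from_source)
        load = PythonOperator(task_id="load", python_callable=load_to_bigquery)

        extract >> load  # run the load only after a successful extract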
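
A minimal sketch of the Boto3 Lambda handler for deregistering unused AMIs, as referenced above; the 90-day cutoff and single-region scope are assumptions, and a production version would also confirm that no instance or launch template still references each image.

    from datetime import datetime, timedelta, timezone

    import boto3


    def lambda_handler(event, context):
        """Deregister AMIs owned by this account that are older than a cutoff.

        The 90-day cutoff and single-region scope are illustrative assumptions.
        """
        ec2 = boto3.client("ec2")
        cutoff = datetime.now(timezone.utc) - timedelta(days=90)

        for image in ec2.describe_images(Owners=["self"])["Images"]:
            created = datetime.fromisoformat(image["CreationDate"].replace("Z", "+00:00"))
            if created < cutoff:
                ec2.deregister_image(ImageId=image["ImageId"])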

Environment: AWS EMR, S3, RDS, Redshift, Lambda, Boto3, DynamoDB, Amazon SageMaker, Dataflow, BigQuery, Apache Spark, HBase, Apache Kafka, Hive, Sqoop, MapReduce, Apache Pig, Python, SSRS, Tableau.

Azure Data Engineer

Confidential, Phoenix, AZ

Responsibilities:

  • Met with business/user groups to understand the business process, gather requirements, and analyze, design, develop, and implement solutions according to client requirements.
  • Designed and developed Azure Data Factory (ADF) pipelines extensively for ingesting data from relational and non-relational source systems to meet business functional requirements.
  • Designed and developed event-driven architectures using blob triggers and Data Factory.
  • Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Automated jobs in ADF using different triggers such as event, schedule, and tumbling window triggers.
  • Created and provisioned different Databricks clusters, notebooks, and jobs, and configured autoscaling.
  • Ingested a huge volume and variety of data from disparate source systems into Azure Data Lake Storage Gen2 using Azure Data Factory V2.
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations (a minimal sketch follows this list).
  • Performed data flow transformation using the data flow activity.
  • Implemented Azure and self-hosted integration runtimes in ADF.
  • Developed streaming pipelines using Apache Spark with Python.
  • Created and provisioned multiple Databricks clusters needed for batch and continuous streaming data processing, and installed the required libraries on the clusters.
  • Improved performance by optimizing compute time to process the streaming data and saved the company cost by optimizing cluster run time.
  • Performed ongoing monitoring, automation, and refinement of data engineering solutions.
  • Designed and developed a new solution to process near-real-time (NRT) data using Azure Stream Analytics, Azure Event Hubs, and Service Bus queues.
  • Created a linked service to land the data from an SFTP location into Azure Data Lake.
  • Extensively used SQL Server Import and Export Data tool.
  • Worked with complex SQL views, stored procedures, triggers, and packages in large databases across various servers.
  • Experience working with both agile and waterfall methods in a fast-paced manner.
  • Generated alerts on the daily metrics of events for the product team.
  • Extensively used SQL queries to verify and validate database updates.
  • Suggested fixes to complex issues through thorough analysis of the root cause and impact of each defect.
  • Provided 24/7 on-call production support for various applications, resolved night-time production job failures, and attended conference calls with business operations and system managers to resolve issues.
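
A minimal sketch of the table-to-table PySpark pattern used in the Databricks jobs above; the database, table, and column names are hypothetical, and on Databricks the SparkSession and metastore are supplied by the cluster.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("table_to_table").getOrCreate()

    # Read a hypothetical source table registered in the metastore.
    orders = spark.table("raw.orders")

    # Aggregate order amounts per customer per day.
    daily_totals = (
        orders
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("order_date", "customer_id")
        .agg(F.sum("amount").alias("daily_amount"))
    )

    # Write the result to a hypothetical curated table.
    daily_totals.write.mode("overwrite").saveAsTable("curated.daily_order_totals")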

Environment: Azure Data Factory (ADF v2), Azure SQL Database, Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Windows Remote Desktop, UNIX shell scripting, Azure PowerShell, Databricks, Python, ADLS Gen2, Azure Cosmos DB, Azure Event Hubs, Azure Machine Learning.

Data Engineer

Confidential, Atlanta, GA

Responsibilities:

  • Responsible for the execution of big data analytics, predictive analytics and machine learning initiatives.
  • Implemented a proof of concept deploying the product to an AWS S3 bucket and Snowflake.
  • Utilized AWS services with a focus on big data architecture, analytics, enterprise data warehousing, and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful, valuable information for better decision-making.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation, queries, and writing back to the S3 bucket.
  • Experience in data cleansing and data mining.
  • Wrote, compiled, and executed programs as necessary using Apache Spark in Scala to perform ETL jobs with ingested data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation, and used the Spark engine and Spark SQL for data analysis, providing results to data scientists for further analysis.
  • Prepared scripts to automate the ingestion process using Python and Scala as needed from various sources such as APIs, AWS S3, Teradata, and Snowflake.
  • Designed and developed Spark workflows using Scala to pull data from AWS S3 buckets and Snowflake and apply transformations to it.
  • Implemented Spark RDD transformations to map business analysis and applied actions on top of the transformations.
  • Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.
  • Created Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load the data into AWS S3, DynamoDB, and Snowflake.
  • Implemented AWS Lambda functions to run scripts in response to events in an Amazon DynamoDB table or S3 bucket, or to HTTP requests through Amazon API Gateway (see the sketch after this list).
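
A minimal sketch of the S3-event-driven Lambda pattern referenced above, assuming the function is subscribed to an S3 object-created notification; the DynamoDB audit table name and item layout are illustrative assumptions.

    from urllib.parse import unquote_plus

    import boto3

    s3 = boto3.client("s3")
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("ingest_audit")  # hypothetical audit table


    def lambda_handler(event, context):
        """Record each object landing in the bucket into a DynamoDB audit table."""
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
            head = s3.head_object(Bucket=bucket, Key=key)
            table.put_item(Item={
                "object_key": key,
                "bucket": bucket,
                "size_bytes": head["ContentLength"],
            })
        return {"statusCode": 200}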

Environment: Agile Scrum, MapReduce, Pig, Spark, Scala, Hive, Kafka, Python, Airflow, Parquet, Codecloud, AWS.

Big Data Engineer

Confidential, Middleton, WI

Responsibilities:

  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster, and set up and benchmarked Hadoop clusters for internal use.
  • Developed and implemented data acquisition jobs using Scala, implemented with Sqoop, Hive, and Pig, optimizing MR jobs to use HDFS efficiently through various compression mechanisms with the help of Oozie workflows.
  • In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
  • In the data exploration stage, used Hive to get important insights about the processed data in HDFS (a PySpark equivalent of this kind of query appears after this list).
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce to load data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Used UDFs to implement business logic in Hadoop, using Hive to read, write, and query Hadoop data in HBase.
  • Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster, working with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Developed data pipelines using Sqoop, Pig and Hive to ingest customer member data, clinical, biometrics, lab and claims data into HDFS to perform data analytics.
  • Experience designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Used the Oozie workflow engine to run multiple Hive and Pig scripts, with Kafka for real-time processing of data, navigating data sets in HDFS storage by loading log file data directly into HDFS.
  • Worked with different Oozie actions to design workflows, including Sqoop, Pig, Hive, shell, and Java actions.
  • Analyzed substantial data sets to determine the optimal way to aggregate and report on them.
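
The exploration queries described above were written against Hive; the sketch below shows an equivalent query expressed in PySpark with Hive support enabled, with the database, table, and column names assumed for illustration.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive_exploration")
        .enableHiveSupport()  # lets Spark query tables in the Hive metastore
        .getOrCreate()
    )

    # Hypothetical exploration query over processed claims data in HDFS/Hive.
    spark.sql("""
        SELECT claim_type, COUNT(*) AS claim_count
        FROM clinical.claims
        GROUP BY claim_type
        ORDER BY claim_count DESC
    """).show(20)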

Environment: MapReduce, Hive, Pig, MySQL, Cloudera Manager, Sqoop, Oozie, NoSQL, Eclipse.

Data Warehouse ETL Developer

Confidential, St. Louis, MO

Responsibilities:

  • Responsible for preparing the technical specifications from the business requirements.
  • Analyzed the requirements and worked out solutions; developed and maintained detailed project documentation.
  • Used Informatica and generated flat files to load data from Oracle to Teradata, with BTEQ/FastLoad scripts for incremental loads. Used stage, work, and DW table concepts to load data and applied star schema concepts. Created UDFs in the Java transformation to complete certain tasks.
  • Designed, developed, and implemented the ECTL process for the team for existing tables in Oracle, and wrote BTEQ scripts to support the project.
  • Wrote stored procedures in PL/SQL and UNIX shell scripts for automated execution of jobs.
  • Used a version control system (ClearCase) to manage code in different code streams.
  • Performed data-oriented tasks on master data projects, especially Customer/Party, such as standardizing, cleansing, merging, de-duplicating, and determining survivorship rules.
  • Responsible for the design, development, testing, and documentation of Informatica mappings, PL/SQL, transformations, and jobs based on PayPal standards.
  • Initiated, defined, managed the implementation of, and enforced DW data QA processes, and interacted with the QA and data quality teams.
  • Identified opportunities for process optimization, process redesign, and development of new processes.
  • Anticipated and resolved data integration issues across applications and analyzed data sources to highlight data quality issues; performed performance analysis for Teradata scripts.
  • Migrated SAS code to Teradata BTEQ scripts to compute scores, taking into account various parameters such as login details and transaction amount, and worked with marketing data for various reports.

Environment: Oracle 9i, Informatica PowerCenter 8.1, PL/SQL, Teradata V2R6, Teradata SQL Assistant, FastLoad, BTEQ scripts, SAS code, ClearCase, Perl scripts, XML sources.
