Sr Data Engineer Resume

St Louis, MO

SUMMARY

  • Over 8 years of experience as a Data Engineer and Analyst, designing and developing solutions with big data and scalable technologies across various industries.
  • Experience in designing, building, and implementing complete Hadoop ecosystems comprising HDFS, Hive, Pig, Sqoop, Oozie, HBase, and MongoDB.
  • Experience in building Jupyter notebooks that perform various operations in PySpark.
  • Expert in writing shell and Python scripts.
  • Worked with different Hadoop distributions such as Hortonworks, Cloudera, and MapR.
  • Experience working in various cloud environments such as AWS, Azure, and GCP.
  • Worked on migration projects from on-premises clusters to Azure HDInsight and Azure Databricks.
  • Experienced in batch and streaming ingestion tools like Sqoop, Spark Streaming and Kafka.
  • Experience in GCP tools like BigQuery, Pub/Sub, Dataproc, and Datalab.
  • Worked with different tools and utilities like Eclipse, IntelliJ, SBT, and Maven.
  • Experience in using different optimized file formats like Avro, ORC, Parquet, and SequenceFile.
  • Experience using Python, including Boto3, to supplement automation provided by Ansible and Terraform for tasks such as encrypting Elastic Beanstalk volumes and scheduling Lambda functions for routine AWS tasks.
  • Implemented monitoring using Elasticsearch and used AWS Lambda to run code without managing servers.
  • Experience developing pipelines using Hive to retrieve data from Hadoop clusters and Oracle databases, and using ETL to transform the data.
  • Implemented Spark, Hive, and MapReduce jobs on an EMR cluster to process data for recommendation engines and behavioral insights.
  • Deployed and maintained production environments using AWS EC2 instances and ECS with Docker.
  • Experience in loading data to Azure Data Lake, Azure SQL Data Warehouse, and Azure SQL Database, and building data pipelines using Azure Databricks and Azure Data Factory.
  • Experience with Azure services like HDInsight, Active Directory, Storage Explorer, Stream Analytics.
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop) and NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Expertise in building CI/CD on AWS using AWS CodeCommit, CodeBuild, CodeDeploy, and CodePipeline, and experience using AWS CloudFormation, API Gateway, and AWS Lambda to automate and secure infrastructure on AWS.
  • Automated infrastructure provisioning for Kafka clusters with Terraform, creating multiple EC2 instances sized per cluster component and attaching ephemeral or EBS volumes according to instance type across availability zones and multiple AWS regions.
  • Expertise in the Amazon Web Services (AWS) cloud platform, including VPC, DynamoDB, Route 53, Elastic Container Service (ECS), Security Groups, CloudWatch, EC2, S3, Kinesis, Redshift, IAM, CloudFormation, ELB, CloudFront, and Elastic Beanstalk.
  • Experience working with Snowflake for various ETL transformations and analysis.
  • Experience with migration projects, such as moving on-premises clusters to cloud platforms like GCP.
  • Implemented various ETL pipelines in GCP using BigQuery, Datalab, and Dataproc clusters.
  • Experience working with various automation tools like Airflow and Oozie.

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, Hive, YARN, Spark, MapReduce, HBase, Zookeeper, Airflow, StreamSets, Oozie, Sqoop, Flume, Pig, Snowflake.

Cloud Services: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP)

AWS services: EC2, S3, EMR, Glue, RDS, Elasticsearch, SQS, EBS, Lambda, Athena, Kinesis, ECS, DynamoDB, QuickSight, Redshift.

Azure services: Azure Data Factory, Databricks, Azure Active Directory, Blob Storage, Data Lake, SQL Database, SQL Data Warehouse.

Databases: MySQL, Oracle, Teradata, MS SQL, DynamoDB.

NoSQL Databases: HBase, DynamoDB, Cassandra, MongoDB

ETL/BI Tools: Snowflake, Informatica, Tableau, Power BI, Qlik

Hadoop Distribution: Hortonworks, Cloudera and MapR

Scripting Languages: Python, PySpark, Scala Spark, SQL, PowerShell Scripting.

Operating systems: Linux (Ubuntu, CentOS, Red Hat), Windows (XP/7/8/10)

Version Control: Git, SVN, Bitbucket

Methodologies: Agile, Waterfall

IDE Tools: Eclipse, PyCharm, Jupyter Notebook, Anaconda

PROFESSIONAL EXPERIENCE

Confidential, St. Louis, MO

Sr Data Engineer

Responsibilities:

  • Developed solutions for batch processing with Azure Databricks, ADF and Azure Event.
  • Used ADF as the orchestration tool for integrating data from upstream to downstream systems.
  • Utilized Informatica PowerCenter to perform data extraction, transformation and loading and modified mappings according to specific business requirements.
  • Designed collections, performed CRUD operations, and utilized aggregation pipelines in MongoDB.
  • Worked on snowflake schema data modeling, source-to-target mappings, interface matrices, and design elements.
  • Utilized Informatica PowerCenter across all phases of the data flow, analyzing source data (Oracle, SQL Server, flat files) before extraction and transformation.
  • Developed scripts to load data from HDFS into Hive and ingested data into the data warehouse using various data loading techniques.
  • Scheduled jobs using crontab, Rundeck, and Control-M.
  • Analyzed SQL scripts and designed solutions to implement them in PySpark.
  • Implemented ETL pipelines to load JSON and XML data into Hive.
  • Built Dockerfiles and container-based deployments on Kubernetes.
  • Developed data processing tasks in PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target destinations.
  • Wrote complex SnowSQL and Python queries in Snowflake.
  • Performed data quality issue analysis using SnowSQL for various business requests.
  • Involved in processing large datasets of different forms including structured, semi-structured and unstructured data.

Environment: Hadoop 3.1, Hive 3.1.2, Informatica PowerCenter 10.x/9.x, IDQ, Sqoop 1.4.7, MySQL, Snowflake, MongoDB, SQL, Python 3.9, XML, JSON, Agile, PySpark, AWS, S3, AWS Lambda, Glue, Spark SQL, SQL scripts.

Confidential, Massachusetts

Sr. Data/Cloud Engineer

Responsibilities:

  • Participated in the code migration of a quality monitoring tool from Amazon EC2 to AWS Lambda and built logical datasets to administer quality monitoring on Snowflake warehouses.
  • Implemented and set up AWS Shield, AWS Config, Amazon Macie, and Amazon Inspector for security and protection of sensitive data.
  • Automated cloud infrastructure, application configuration, and deployment using Terraform.
  • Created and managed access to AWS services for IAM user accounts and role-based users.
  • Designed Tableau dashboards to show operational metrics.
  • Integrated AWS services including EC2, S3, Network Protocol, Transit VPC, VPC Peering, VPC Endpoints, and VPC PrivateLink.
  • Used Amazon Elastic Compute Cloud (EC2) infrastructure for tasks and Simple Storage Service (S3) as the storage mechanism.
  • Evaluated Snowflake design considerations for any change in the application.
  • Worked on Spark Databricks clusters on AWS, estimating cluster size and monitoring the clusters.
  • Experienced with Spark Streaming and AWS Kinesis for real-time data processing.
  • Configured S3, AWS Glue, and EC2 using Python Boto3.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Worked on Kibana dashboards based on Logstash data and integrated several source and target systems into Elasticsearch for near real-time log analysis of end-to-end transaction monitoring.
  • Integrated Apache Airflow with AWS to monitor multi-stage machine learning processes with Amazon SageMaker jobs.
  • Worked extensively with S3 buckets in AWS and moved data from HDFS to Amazon Simple Storage Service (S3).
  • Worked actively with client stakeholders to gather business requirements and document them for the project plan.
  • Designed AWS Lambda functions in Python to trigger shell scripts that ingest data into MongoDB and export data from MongoDB to consumers.
  • Implemented AWS Lambda functions to drive real-time monitoring dashboards from system logs.
  • Added support for Amazon S3 and RDS to host static/media files and the database in AWS.
  • Used S3 buckets to store files, ingested them into Snowflake tables using Snowpipe, and ran deltas using data pipelines.

Environment: AWS, PySpark, Spark Streaming, EMR, S3, RDS, Redshift, Lambda, Boto3, DynamoDB, Amazon SageMaker, Apache Spark, HBase, Apache Hive, MapReduce, Snowflake, Pig, Python, NumPy, Pandas, SSRS, Tableau.

Confidential, Framingham, MA

Sr Data/Cloud Engineer

Responsibilities:

  • Designed, built, tested, and maintained end-to-end data pipelines, data integration, ETL processes, and data management delivery within Azure Cloud using Azure Data Factory, Azure Data Lake Storage, and Azure Databricks.
  • Ingested and migrated data, applied transformation logic, and ran continuous data quality checks on data ingested from various sources.
  • Defined the lifecycle from analysis to production, with a focus on data validation, defining logic and performing transformations according to business requirements, and creating end-to-end ETL data pipelines.
  • Processed semi-structured and unstructured data in Azure Databricks into Bronze-Silver-Gold zones using PySpark.
  • Using Azure Data Factory (V2), created ingestion pipelines from different sources into Azure Data Lake Storage.
  • Created and maintained Azure resources using a combination of Windows PowerShell and Azure Resource Manager (ARM) templates for unit testing during the DevOps process.
  • Moved data from Azure Blob Storage to Azure Data Lake Storage using Azure Data Factory pipelines.
  • Configured and developed Azure Databricks notebooks using PySpark and Spark SQL for data transformation, aggregation, and extraction from multiple file formats for data analysis.
  • Involved in application design and data architecture using cloud and big data solutions on Azure.
  • Led the migration of legacy systems to a Microsoft Azure cloud-based solution, redesigning legacy application solutions with minimal changes to run on the cloud platform.
  • Maintained code repositories for Databricks notebooks and Data Factory pipelines in GitHub.
  • Created and developed stored procedures, joins, and triggers to handle complex business rules within the Azure environment.
  • Wrote complex SQL statements using CTEs, correlated subqueries, and joins.

Environment: Azure Data Factory, Azure Data Lake Storage (ADLS), Azure Databricks, Power BI, PySpark, Python, Spark SQL, PowerShell scripting, ETL, Git, Kanban, Jira.

Confidential

Data Analyst

Responsibilities:

  • Worked on SQL queries in dimensional data warehouses and relational data warehouses.
  • Performed Data Analysis and Data Profiling using Complex SQL queries on various systems.
  • Worked with the RDD architecture, implementing Spark operations on RDDs and optimizing transformations and actions.
  • Implemented programs in Spark using Python (PySpark) packages for performance tuning, optimization, and data quality validations.
  • Developed Kafka producers and consumers for streaming millions of records per second.
  • Implemented a distributed messaging queue using Apache Kafka to integrate with Cassandra.
  • Worked on Tableau to build customized interactive reports, worksheets, and dashboards.
  • Used ETL to develop jobs for extracting, cleaning, transforming, and loading the data from various sources.
  • Developed various modules in Spring applications to build REST APIs.
  • Performed in-depth analysis of data and prepared daily reports using SQL, MS Excel, and SharePoint.
  • Created multiple SQL scripts, stored procedures, and functions to extract and process data from data sources and transform it into the required state.
  • Developed different kinds of custom filters and handled predefined filters on HBase data using the API.

Environment: PySpark, Python, Spark SQL, Confluence, PowerShell scripting, ETL, Git, Kanban, Jira.
