Senior Data Engineer Resume

Dallas

SUMMARY

  • Dedicated Senior Data Engineer with 8+ years of IT industry experience working with Azure and AWS across a range of tools, technologies, and databases.
  • Experienced in designing and building ETL pipelines and visualizations using Azure, AWS, and open-source frameworks.
  • Excellent knowledge of data validation, data analysis, data cleansing, and data verification.
  • Experienced in AWS services such as S3, RDS, EC2, IAM, Glue, Redshift, Lambda, Athena, AWS Kinesis, and CloudWatch.
  • Experienced in working on EMR clusters to modify Python scripts and shell scripts.
  • Experienced in using Azure services such as Azure Data Lake Storage, Azure SQL Database, Azure Log Analytics, Azure Stream Analytics, Azure Triggers, and HDInsight.
  • Experienced in creating Spark applications within Databricks to extract, transform, and aggregate data from various file formats in order to analyze customer usage patterns.
  • Experienced in working with Hadoop technologies such as HDFS, NoSQL, and Spark.
  • Proficient in programming with Python, Scala, and Bash.
  • Experienced in developing Spark applications using Spark components such as Spark Streaming, RDD transformations, Spark SQL, and Spark MLlib.
  • Implemented Continuous Integration & Deployment (CI/CD) with Jenkins and Azure DevOps.
  • Experienced in real-time data streaming using Apache Kafka.
  • Experienced in databases such as MySQL, Oracle, DynamoDB, Spreadsheets, and MongoDB.
  • Hands-on experience working on Teradata databases and Snowflake.
  • Experienced in using monitoring tools such as Splunk and CloudWatch.
  • Experienced in visualization tools such as Tableau, Power BI, and QuickSight.
  • Hands-on experience with Agile and Waterfall methodologies.

TECHNICAL SKILLS

AWS: S3, IAM, EC2, Glue, Redshift, Lambda, Athena, RDS, AWS Kinesis, CloudWatch.

AZURE: Azure Data Lake Storage, HDInsight, Azure SQL Database, Azure Log Analytics, Azure Stream Analytics, Databricks, Azure Triggers.

DATABASES: MySQL, Oracle, DynamoDB, Snowflake, Apache PySpark, SQL, MongoDB.

OTHER TECHNOLOGIES: Pandas, Python, Scala, Shell Scripting, GCP Cloud Storage, BigQuery.

ETL/BI: Tableau, Power BI, QuickSight.

SOFTWARE METHODOLOGY: Agile, Waterfall.

PROFESSIONAL EXPERIENCE

Confidential, Dallas

Senior Data Engineer

Responsibilities:

  • Designed an ETL architecture for data transfer from the source server to the Data Warehouse.
  • Developed an ETL process in AWS Glue to migrate customer data from external data stores such as S3 into AWS Redshift.
  • Built pipelines to copy data from multiple sources to a destination in AWS Redshift.
  • Developed Python code using modules that manipulate data in formats such as Excel, CSV, JSON, Avro, and Parquet.
  • Developed PySpark scripts to perform advanced data processing and transformation tasks.
  • Developed Spark application scripts using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
  • Developed the PySpark code for ETL in AWS Glue jobs and EMR.
  • Performed real-time event processing of data from multiple servers in the organization using Kafka.
  • Developed Airflow workflows to schedule batch and real-time data movement from source to target.
  • Worked through different phases of the software development life cycle using Agile methodology.
  • Developed and implemented a CI/CD pipeline involving Bitbucket and Jenkins for complete automation from commit to deployment.
  • Worked on ETL migration services by developing and deploying Lambda functions for generating serverless data pipelines.
  • Used Glue to crawl JSON data stored in Amazon S3 buckets.
  • Implemented Spark in EMR for data processing in AWS Data Lake.
  • Involved in designing and developing Spark workflows using Scala to pull data from the AWS S3 bucket.
  • Built a cloud data warehouse on Snowflake for batch processing and streaming (with Snowpipe).
  • Used Athena to transform and clean the data before it was loaded into data warehouses.
  • Experienced in creating logical and physical data models, including database queries, tables, schema, indexes, and constraints, as per business needs.
  • Experienced in improving the performance of dashboards and visualizing customer data using Amazon QuickSight.
  • Used CloudWatch to monitor and alert on production and corporate servers and storage.

Environment: Python, PySpark, Kafka, Scala, Glue, S3, Redshift, EMR, Airflow, Snowflake, Amazon QuickSight, Agile, Bitbucket, Jenkins, Lambda, Athena, CloudWatch.

Confidential, Chicago

Data Engineer

Responsibilities:

  • Migrated customer data through ETL from Azure Data Lake Storage to Azure SQL Database using Azure HDInsight.
  • Used Azure Data Factory to transfer data from Hadoop to Azure Data Lake Storage or other Azure data stores.
  • Used Apache PySpark to load the data into Azure SQL Database.
  • Involved in executing Spark jobs and SQL queries by creating a cluster and using notebooks in Databricks.
  • Used HDInsight for the extract, transform, and load process from Hadoop to Azure.
  • Designed and implemented streaming solutions using Azure Stream Analytics.
  • Involved in developing and maintaining multiple Power BI visualizations and dashboards per requirements.
  • Developed Azure DevOps data pipelines for CI/CD.
  • Used Snowflake for analyzing and visualizing data in Power BI.
  • Used Azure Log Analytics to monitor and troubleshoot the process of migration.
  • Involved in using Azure Triggers to monitor the automation of the migration process.
  • Extensive knowledge of working with various data formats: CSV, JSON, XML, and tabular (relational/non-relational datasets).

Environment: PySpark, Hadoop, Azure Data Lake Storage, Azure SQL Database, HDInsight, Azure Data Factory, Databricks, Azure Stream Analytics, Azure DevOps, Azure Log Analytics, Azure Triggers, Snowflake, Power BI.

Confidential

Data Engineer

Responsibilities:

  • Primarily responsible for converting the manual report system to a fully automated CI/CD data pipeline that ingests data from different marketing platforms into an AWS S3 data lake.
  • Deployed the project through Jenkins using the Git version control system, and used Jenkins as the continuous integration tool for deployment.
  • Performed data cleaning using pandas and other Python packages.
  • Configured EC2 instances and IAM roles, and created an S3 data pipe using the Boto API to load data from internal data sources.
  • Developed Spark SQL scripts and designed solutions implemented in PySpark.
  • Worked in the Linux environment for the development of applications.
  • Involved in developing Pig scripts to transform the raw data into consumable data as specified by the business users.
  • Worked on NoSQL databases such as MongoDB.
  • Involved in setting up databases in AWS using RDS, with S3 buckets for storage, and configuring instance backups to S3 buckets.
  • Used an AWS Kinesis streaming application for real-time processing.
  • Designed serverless application CI/CD using the AWS Serverless Application Model and AWS Lambda.
  • Used visualization tools such as Tableau to get quick business insights into data.
  • Used Splunk to create dashboards, search queries, and reports for multiple applications.

Environment: Jenkins, Python, SQL, PySpark, EC2 instances, IAM roles, crontab, Linux, Pig scripts, MongoDB, RDS, AWS CodePipeline, AWS Kinesis, AWS Lambda, Tableau, Splunk.

Confidential

SQL Server Database Administrator

Responsibilities:

  • Developed views, stored procedures, and functions for databases.
  • Developed SQL queries for building and testing ETL processes.
  • Used various data modeling techniques to develop the database and was involved in the complete software development life cycle of the system.
  • Created constraints and triggers to maintain data integrity.
  • Managed and configured all SQL health checks and monitoring using various methods and tools.
  • Used left, right, and inner joins to connect dynamic and static datasets.
  • Regularly monitored database performance, connections, and logs.

Environment: MySQL Workbench, SQL queries, stored procedures, views, functions, constraints, triggers, joins, connections, logs.
