Azure Data Engineer Resume

Plano, TX

SUMMARY

  • 9+ years of IT experience, with a focus on the Azure cloud. Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Azure Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Domain expertise in e-commerce and the financial sector. Skilled in working with cross-functional teams on problem solving, project development, and process improvement in dynamic environments; meeting deadlines; providing business insights; and designing, developing, and implementing data-driven solutions with key performance indicators to solve complex business problems.
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Performed data engineering functions with sound decision making and technical problem solving: data extraction, transformation, loading, and integration in support of enterprise data infrastructures (data warehousing, operational data stores, and master data management).
  • Good understanding of Spark Architecture including Spark Core, Spark SQL, Data Frames, Spark Streaming, Driver Node, Worker Node, Stages, Executors and Tasks.
  • Experience with MS SQL Server Integration Services (SSIS), T-SQL skills, stored procedures, triggers.
  • Good experience designing and creating data ingestion pipelines using technologies such as Apache Airflow; experienced in building real-time processing with Spark Streaming as a data pipeline system.
  • Worked on building ETL pipelines using AWS Glue and Python to create Dashboards in Tableau.
  • Highly skilled at Python, using SQL, NumPy, Pandas, and Spark for data analysis and model building, and at deploying and operating highly available, scalable, and fault-tolerant systems on Amazon Web Services (AWS).
  • Worked in a mixed DevOps role spanning Azure architecture/system engineering, network operations, and data engineering.
  • Good experience developing web applications, RESTful web services, and APIs using Python (Flask, Django); good knowledge of SOAP and REST web services.
  • Built and executed analytics and reporting across platforms to identify and analyze trends, patterns, and shifts in user behavior, both independently and in collaboration with others.
  • Architected several Airflow DAGs (Directed Acyclic Graphs) for automating ETL pipelines and data infrastructure (a minimal DAG sketch follows this summary).
  • Proficient in business analysis and Hive Query Language; experienced in Hive performance optimization using static partitioning, dynamic partitioning, bucketing, and parallel execution in ETL.
  • Experience using the Azure Cosmos DB database service in a commercial project, working with a non-relational database that stores documents in JSON format and is designed for high throughput.
  • Strong programming capability using Python along with Hadoop framework utilizing Hadoop Ecosystem projects (HDFS, Spark etc.)
  • Developed Python scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the RDBMS through Sqoop.
  • Experience working with version control systems such as Git, using source code management client tools such as Git Bash, GitHub, Git GUI, and other command-line applications.
  • Designed and engineered on-premises-to-cloud CI/CD Docker pipelines (integration and deployment) with ECS, Glue, Lambda, ELK, Firehose, and Kinesis streams.
  • Excellent communication, team-building, leadership, interpersonal, and analytical skills; dedicated, collaborative, consistent, and self-motivated, with a strong ability to perform both within a team and as an individual contributor.
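
The following is a minimal sketch of the kind of Airflow DAG used to automate ETL pipelines, as referenced above; the DAG id, schedule, and task callables are hypothetical placeholders rather than actual project code.

```python
# Hypothetical illustration only: a minimal ETL DAG. The DAG id, schedule, and task
# callables are placeholders, not actual project code.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders(**context):
    """Pull the daily extract from the source system (placeholder)."""


def transform_orders(**context):
    """Apply cleansing and business rules to the extract (placeholder)."""


def load_orders(**context):
    """Load the transformed data into the warehouse (placeholder)."""


with DAG(
    dag_id="orders_etl",                       # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",                # run once per day
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_orders)
    transform = PythonOperator(task_id="transform", python_callable=transform_orders)
    load = PythonOperator(task_id="load", python_callable=load_orders)

    extract >> transform >> load               # linear dependency chain
```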

TECHNICAL SKILLS

Cloud Infrastructure/Cloud Computing: Azure (Azure Databricks), AWS (Redshift)

Big Data/Hadoop Tools: Spark, Apache Hive, BigQuery, HDFS, Hadoop

Containers and Orchestration: Docker, Kubernetes, AWS ECS and AWS EKS

Programming Languages: Python, R, Scala, Java, HTML, CSS, Bash

Data Modeling Tools: Toad, Erwin, Star-Schema Modeling, Snowflake-Schema Modeling

Machine Learning Algorithms: Linear Regression, Deep Learning, Decision Trees, Random Forest, NLP

Python Libraries: Spark MLlib, Scikit-learn, Pandas, NumPy, MLflow

Database Tools: SQL, T-SQL, NoSQL, PostgreSQL, DynamoDB, Snowflake, MongoDB, HBase, PL/SQL, ERP

Reporting and Visualization Tools: Power BI, Tableau and SSRS

Frameworks: Django, Flask

PROFESSIONAL EXPERIENCE

Confidential, Plano, TX

Azure Data Engineer

Responsibilities:

  • Analyze, design, and build Modern data solutions using Azure PaaS service to support visualization of data.
  • Understand current Production state of application and determine the impact of new implementation on existing business processes.
  • Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Used Liquibase for database migration and JDBC for database connections.
  • Data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processing of the data in Azure Databricks.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back to those sources.
  • Worked with data transfer from on-premises SQL servers to cloud databases (Azure Synapse Analytics & Azure SQL DB) using Azure Data Factory.
  • Loaded CSV files and JSON files from Azure Data Lake Storage to Synapse using Azure Data Factory.
  • Loaded the historical Data from SQL Managed Instance to Azure Synapse using Azure Data Factory.
  • Implemented a one-time data migration of multi-state-level data from SQL Server to Snowflake using Python and SnowSQL (a Snowflake loading sketch follows this list).
  • Completed online data transfer from AWS S3 to Azure Blob by using Azure Data Factory.
  • Used Azure Migrate to begin migrating AWS EC2 instances to Azure.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a PySpark sketch follows this list).
  • Responsible for estimating the cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Developed ETL pipelines into and out of the data warehouse and built major regulatory and financial reports using advanced SQL queries in Snowflake.
  • Migration of on-premises data (SQL Server / MongoDB) to Azure Data Lake Store (ADLS) using Azure Data Factory (ADF V1/V2).
  • Exposure to Azure Active Directory compatibility; extensive experience in deployment, migration, patching, and troubleshooting of Windows 2008 and 2012 R2 domain controllers in Active Directory.
  • Experienced in performance tuning of Spark Applications for setting right Batch Interval time, correct level of Parallelism and memory tuning.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements (a UDF appears in the PySpark sketch after this list).
  • Developed JSON definitions for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity.
  • Hands-on experience developing SQL scripts for automation purposes.
  • Created Build and Release for multiple projects (modules) in production environment using Visual Studio Team Services (VSTS).
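
As an illustration of the Databricks work described above (multi-format extraction, a small PySpark UDF, and aggregation), the sketch below assumes a Spark session plus hypothetical ADLS paths and columns (region, customer_id, duration_sec); it is not the actual project code.

```python
# Minimal PySpark sketch, assuming a Databricks/Spark session; the ADLS paths and the
# columns region, customer_id, and duration_sec are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Read the same logical data arriving in different file formats (placeholder paths).
csv_df = spark.read.option("header", True).csv("abfss://raw@account.dfs.core.windows.net/usage/csv/")
json_df = spark.read.json("abfss://raw@account.dfs.core.windows.net/usage/json/")
events = csv_df.unionByName(json_df, allowMissingColumns=True)  # Spark 3.1+

# Small PySpark UDF for a business-specific normalization rule (illustrative only).
@F.udf(returnType=StringType())
def normalize_region(region):
    return (region or "UNKNOWN").strip().upper()

usage_by_region = (
    events
    .withColumn("region", normalize_region(F.col("region")))
    .groupBy("region", "customer_id")
    .agg(F.count("*").alias("events"), F.sum("duration_sec").alias("total_duration_sec"))
)

# Persist the aggregate for downstream reporting (Delta is a common target on Databricks).
usage_by_region.write.format("delta").mode("overwrite").save(
    "abfss://curated@account.dfs.core.windows.net/usage_by_region/"
)
```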
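
The one-time SQL Server to Snowflake migration mentioned above could look roughly like the following Python sketch; the connection strings, credentials, table names, and the use of write_pandas are assumptions for illustration only.

```python
# Hedged sketch of a one-time SQL Server -> Snowflake load with Python; connection
# strings, credentials, and table names are placeholders, not project specifics.
import pandas as pd
import pyodbc
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Pull the source data from SQL Server (hypothetical table).
sql_conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=sales;Trusted_Connection=yes;"
)
df = pd.read_sql("SELECT * FROM dbo.state_level_metrics", sql_conn)

# Push the frame into Snowflake; write_pandas stages the data and runs COPY INTO internally.
sf_conn = snowflake.connector.connect(
    account="myaccount", user="loader", password="***",
    warehouse="LOAD_WH", database="ANALYTICS", schema="PUBLIC",
)
write_pandas(sf_conn, df, table_name="STATE_LEVEL_METRICS", auto_create_table=True)
```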

Environment: Azure Data Factory, Azure Databricks, ADLS Gen2, Python, Scala, PySpark, Hive, Spark Streaming, Synapse, MS Azure, Azure SQL Database, Azure Functions, Snowflake, Blob Storage, SQL Server, Azure Cosmos DB, Azure Event Hub, Azure Logic Apps.

Confidential, Charlotte, NC

Data Engineer

Responsibilities:

  • Designed the architecture of a multi-cloud (GCP, Azure, and AWS) organization at the enterprise level; thorough understanding of and hands-on experience with the software development life cycle, from ideation to implementation.
  • Migrated SQL databases to GCP services such as Google Analytics, Google BigQuery, Data Studio, and Cloud Data Fusion; controlled and granted database access; and migrated on-premises databases to GCP using Dataflow and Google Cloud Storage (GCS).
  • Developed Spark applications using Spark SQL in Databricks for ETL data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage, consumption patterns, and behavior.
  • Skilled in dimensional modeling (star schema, snowflake schema), data infrastructure, Google Data Fusion, transactional modeling, slowly changing dimensions (SCD), and ETL.
  • Experience in data architecture and in engineering a data lake leveraging GCP services such as Pub/Sub, Google Cloud Storage (GCS), Cloud Functions, Cloud SQL, BigQuery, and Bigtable.
  • Used Dataflow to store Excel and Parquet files and retrieved data using the Blob API for distributed systems.
  • Worked with Google BigQuery, Hadoop, PySpark, Spark SQL, and Hive to load and transform data, and completed a technical design review.
  • Used BigQuery as a source and pulled data to analyze results; also retrieved data from Cloud SQL via ETL.
  • Used REST APIs with Python to ingest data from other sites into BigQuery (a loading sketch follows this list). Strong understanding of GCP components such as Data Fusion, Dataflow, and BigQuery, and of AWS components such as EC2 and S3.
  • Migrated SQL databases to BigQuery, Databricks, the data warehouse, and OpenShift; controlled and granted database access; migrated on-premises databases to Azure Data Lake Store using Azure Data Factory; and used the CLI with ACS (Azure Cognitive Services).
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data with AI to uncover insights into customer usage, consumption patterns, and behavior.
  • Skilled in dimensional modeling (star schema, snowflake schema), transactional modeling, ACS, and slowly changing dimensions (SCD).
  • Collaborated with and directed data analysts and data scientists on products that require reporting, practicing data modeling and Azure ML models to ensure that datasets are in place and are used consistently internally and externally with Apache Hive.
  • Deployed, managed, and operated scalable, highly available, and fault-tolerant systems on GCP, with data governance for care-delivery ETL.
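
A hedged sketch of the REST-API-to-BigQuery ingestion mentioned above, using the google-cloud-bigquery client; the endpoint, project, dataset, and table names are placeholders, and the example assumes the endpoint returns a JSON array of records.

```python
# Illustrative only: loading records pulled from a REST API into BigQuery with the
# google-cloud-bigquery client. The endpoint and target table are placeholders.
import requests
from google.cloud import bigquery

client = bigquery.Client()                         # uses Application Default Credentials
table_id = "my-project.analytics.api_events"       # hypothetical target table

rows = requests.get("https://api.example.com/events", timeout=30).json()

job_config = bigquery.LoadJobConfig(
    autodetect=True,                               # infer the schema from the records
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
load_job = client.load_table_from_json(rows, table_id, job_config=job_config)
load_job.result()                                  # block until the load job finishes
```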

Environment: PL/SQL, Python, Docker, PySpark, Microsoft Word/Excel, Flask, AWS S3, AWS Redshift, Snowflake, AWS RDS, DynamoDB, Athena, Lambda, MongoDB, Hive, Sqoop, Tableau.

Confidential, Eden Prairie MN

Data Engineer

Responsibilities:

  • Involved in Requirement gathering, business Analysis, Design and Development, testing, and implementation of business rules.
  • Developed Spark programs using Scala to compare the performance of Spark with Hive and SparkSQL.
  • Developed a Spark Streaming application to consume JSON messages from Kafka and perform transformations (a streaming sketch follows this list).
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
  • Involved in developing a MapReduce framework that filters bad and unnecessary records.
  • Ingested data from an RDBMS, performed data transformations, and then exported the transformed data to Cassandra per the business requirement.
  • Responsible for migrating the code base to Amazon EMR and evaluated Amazon ecosystem components such as Redshift.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Developed Python scripts to clean the raw data.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Used AWS services like EC2 and S3 for small data sets processing and storage
  • Implemented Nifi flow topologies to perform cleansing operations before moving data into HDFS.
  • Worked on different file formats (ORCFILE, Parquet, Avro) and different Compression Codecs (GZIP, SNAPPY, LZO).
  • Created applications that monitor consumer lag within Apache Kafka clusters.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop, and built analytics on Hive tables using HiveContext in Spark jobs.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS.
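
For the Kafka consumption work noted above, the sketch below shows one way to consume JSON messages with Spark Structured Streaming (the original application may have used DStreams); the broker address, topic, schema, and output paths are assumptions, and the job needs the spark-sql-kafka package.

```python
# Sketch of consuming JSON messages from Kafka with Spark Structured Streaming; broker,
# topic, schema, and output paths are assumptions for illustration.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-json-stream").getOrCreate()

# Expected shape of each JSON message (hypothetical fields).
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers bytes in the value column; cast to string and parse the JSON payload.
parsed = (
    raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
       .select("e.*")
       .filter(F.col("event_type").isNotNull())
)

query = (
    parsed.writeStream.format("parquet")
    .option("path", "/data/streams/events")               # placeholder output path
    .option("checkpointLocation", "/data/checkpoints/events")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```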

Environment: Hadoop, Hive, MapReduce, Sqoop, Kafka, Spark, YARN, Pig, PySpark, Cassandra, Oozie, NiFi, Solr, Shell Scripting, HBase, Scala, AWS, Maven, Java, JUnit, Agile methodologies, Hortonworks, SOAP, Python, Teradata, MySQL.

Confidential

Data Analyst

Responsibilities:

  • Created a database in Microsoft Access starting from a blank database; created tables, entered the dataset manually, and defined data types; and produced an ER diagram and basic SQL queries against that database.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration in Agile, coding in Linux, and data architecture.
  • Worked on SQL Server Integration Services (SSIS) to integrate and analyze data from multiple heterogeneous information sources.
  • Used SQL and SQL Server to write simple and complex queries, such as finding distinct and unique values in a data set.
  • Wrote application code to do SQL queries in MySQL to organize useful information based on the business requirements.
  • Created Complex ETL Packages using SSIS to extract data from staging tables to partitioned tables with incremental load.
  • Collaborated with database engineers to implement the ETL process; wrote and optimized SQL queries to perform data extraction and merging from the SQL Server database.
  • Performed incremental loads with several Data Flow and Control Flow tasks in SSIS (an illustrative incremental-load sketch follows this list).
  • Gathered, analyzed, and translated business requirements; communicated with other departments to collect client business requirements and assess available data; and optimized SQL queries.
  • Involved in writing complex SQL Queries, Stored Procedures, Triggers, Views, Cursors, Joins, Constraints, DDL, DML and User Defined Functions to implement the business logic and created clustered and non-clustered indexes.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Performed data mapping, created class diagrams and ER diagrams, and used SQL queries to filter data.
  • Extracted data from existing data source and performed ad-hoc queries by using SQL, UNIX, EDI and .NET.
  • Collaborated with ETL, BI, and DBA teams while working on SQL Server and Teradata to analyze and provide solutions to data issues and other challenges while implementing the OLAP model for performance metrics.
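
The incremental loads above were implemented in SSIS; purely to illustrate the same high-water-mark pattern in code, here is a small Python/pyodbc sketch with hypothetical server, table, and column names.

```python
# Illustration of the high-water-mark incremental-load pattern in Python/pyodbc (the actual
# loads were built in SSIS); server, database, table, and column names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=dw;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Find how far the partitioned target table has already been loaded.
cursor.execute("SELECT COALESCE(MAX(LoadDate), '1900-01-01') FROM dbo.FactSales")
watermark = cursor.fetchone()[0]

# Pull only the rows added to staging since the last load.
cursor.execute(
    "SELECT SaleId, SaleDate, Amount, LoadDate FROM stg.Sales WHERE LoadDate > ?",
    watermark,
)
new_rows = cursor.fetchall()

# Insert the delta into the target (the equivalent of an SSIS Data Flow task).
cursor.executemany(
    "INSERT INTO dbo.FactSales (SaleId, SaleDate, Amount, LoadDate) VALUES (?, ?, ?, ?)",
    [tuple(r) for r in new_rows],
)
conn.commit()
```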

Environment: SQL, ETL, MySQL, SSIS, Linux, DDL, HDFS, UNIX, SQL Server, DML, Sqoop, Dataflow, DBA.
