
Sr. Azure Data Engineer Resume


Denver, CO

SUMMARY

  • 8 years of IT experience with expertise in Hadoop ecosystem components such as HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Spark, Spark SQL, Spark Streaming, and Hive for scalability, distributed computing, and high-performance computing.
  • Experience working with real-time stream processing technologies such as Spark Structured Streaming and Kafka. Worked with the Hadoop ecosystem and real-time analytics tools including PySpark/Spark/Hive/Hadoop CLI/MapReduce/Storm/Kafka/Lambda Architecture/MongoDB.
  • Developed and designed an API (Restful Web Service) for chatbot integration.
  • Familiarity and hands-on experience with the AWS ecosystem: Redshift, Aurora, Redis, EMR, SNS, SQS, DocDB, CloudWatch, Step Functions, etc.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Experienced Azure Data Engineer working with ADF, Databricks, PySpark, and Azure Synapse Analytics.
  • Expertise with Spark Core and Spark SQL for data processing. Used Apache Spark 2.0, exploring DataFrames, RDDs, Spark SQL, and Spark Streaming.
  • Strong hands-on experience with PySpark, using Python scripting with Spark libraries for data analysis.
  • Experience implementing Continuous Integration using Jenkins and Git. Managed GitHub repositories and permissions, including branching, merging, and tagging.
  • Experience with containerized automation, using Git commands inside containers to push, pull, and execute Python scripts.
  • Strong analytical and programming skills in Microservices, Core Java, the Collections Framework, error/exception handling, and multi-threading.
  • Strong experience implementing the presentation layer using JSP, HTML, CSS, JavaScript, AngularJS, Bootstrap, JSTL, jQuery, and XML, and the data access and business logic layers using ORM, JDBC, and MongoTemplate.
  • Explored the Elasticsearch (ELK) stack for search implementation and data indexing; analyzed the performance of different NoSQL databases.
  • Experience using business intelligence tools like Tableau, Power BI, and TIBCO Spotfire.
  • Worked on chatbot development for providing relevant product information to the customers.
  • Expertise includes Data modeling, DB Design, Data cleansing, Data Validation, Data mapping identification & documentation, Data Extraction and Load Process from multiple Data sources, Data verification, Data Analysis, Transformation, Integration, Data import, Data export and use of multiple ETL tools.
  • Generated cubes using SQL Server Analysis Services (SSAS); identified data sources and data source views, and deployed the created cubes to the server.
  • Proficient in multiple RDBMS systems such as DB2, Microsoft SQL Server, Oracle, Snowflake, MySQL, and Teradata.
  • Developed multiple SSAS objects such as calculated attributes, KPIs, perspectives, partitions, translations, and actions.
  • Experience in both on-prem and cloud solutions. Extensive experience with the Microsoft Azure ecosystem such as ADF v2, Azure SQL Server, Azure SQL IaaS/SaaS/PaaS, Azure SQL DW, and Azure Blob.
  • Experienced in workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Very Good understanding of SQL, ETL and Data Warehousing Technologies
  • Extensive experience in Systems Development Life Cycle (SDLC) from Design, Development, and Implementation.
  • Involved in converting SQL queries into Spark transformations using Spark RDDs and PySpark (see the sketch following this summary).
  • Extremely diligent, strong team player with an ability to take on new roles.
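
A minimal sketch of the SQL-to-Spark conversion mentioned above, assuming a hypothetical orders table already registered in the Spark catalog; table and column names are placeholders, not details from the roles below:

```python
# Hypothetical example: rewriting a SQL aggregation as equivalent PySpark
# DataFrame transformations. The "orders" table and its columns are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sql-to-spark").getOrCreate()

# SQL form
sql_result = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    WHERE order_status = 'COMPLETE'
    GROUP BY customer_id
""")

# Equivalent DataFrame transformations
df_result = (
    spark.table("orders")
         .filter(F.col("order_status") == "COMPLETE")
         .groupBy("customer_id")
         .agg(F.sum("amount").alias("total_amount"))
)
```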

TECHNICAL SKILLS

ETL Tools: Azure Data Factory, Azure Synapse Analytics, AWS Glue, SQL Server Analysis Services, Informatica Power Center, Informatica DT Studio & Informatica Power Exchange

Cloud Technologies: AWS, Microsoft Azure

Containers / Orchestration: Docker, Kubernetes, AWS ECS and AWS EKS

Big Data/Hadoop Tools: Spark, Hive, BigQuery, Hadoop

Methodologies: Star Schema Modeling, Snow-Flake Schema Modeling, Logical and Physical Data Modeling, Erwin

Databases: MS SQL Server 2008/2014/2016, Oracle 12.1/10g/9i, Teradata V2R6/V2R5, DB2

Languages / Scripts: T-SQL, PL/SQL, PostgreSQL, PowerShell, UNIX Shell Script, C, Java, HTML

Tools: Toad, SQL*Loader, Visual Studio, SSMS, Google Data Studio, PVCS, QC, Erwin, MS Visio, SSRS, Power BI, D2K (Oracle Forms and Reports), Autosys, GitHub.

Reporting and Visualization Tools: Power BI, Data Studio, Business Objects XI R1/R2/R3, 6.x, Xcelsius 2008, Dashboards, Tableau, TIBCO Spotfire

Operating Systems: Windows, UNIX, Linux

PROFESSIONAL EXPERIENCE

Confidential, Denver, CO

Sr. Azure Data Engineer

Responsibilities:

  • Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats for analyzing & transforming the data
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL) and processed the data in Azure Databricks.
  • Worked on data pre-processing and cleaning the data to perform feature engineering and performed data imputation techniques for the missing values in the dataset using Python.
  • Designed and developed a real-time stream processing application using Spark, Kafka, and Hive to perform streaming ETL (see the sketch after this list).
  • Worked on performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and memory tuning.
  • Created Partitioned and Bucketed Hive tables in Parquet File Formats with compression and then loaded data into Parquet hive tables.
  • Developed Data pipeline in Azure Data Factory to load and transform text, fault, failure, and part attribute information from a variety of text sources as needed for part identification models.
  • Migrated the pipelines from Azure Data Factory to Azure Synapse Analytics
  • Used Databricks notebooks to develop, test, and analyze Spark jobs before scheduling customized Spark jobs.
  • Developed REST API endpoint for AzureML Flow Model Serving
  • Architected and implemented medium- to large-scale BI solutions on Azure using Azure data platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, Databricks).
  • Enhanced the functionality of existing ADF pipeline by adding new logic to transform the data.
  • Worked on estimating cluster size, monitoring, and troubleshooting of Azure Databricks clusters.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
  • Implemented POC to migrate Map Reduce jobs into Spark RDD transformations using Python.
  • Created Power BI dashboards designed for large data volumes sourced from SQL Server.
  • Involved in Cluster maintenance, Cluster Monitoring and Troubleshooting, Managing, and reviewing data backups and log files.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
  • Involved in migrating large amounts of data from OLTP to OLAP using ETL packages.
  • Created ETL processes using SSIS to transfer data from heterogeneous data sources into data warehouse systems.
  • Created and maintained databases, tables, stored procedures, indexes, database check constraints, and business rules using T-SQL.
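
A minimal sketch of the streaming ETL pattern described above (Kafka into Spark Structured Streaming, landing as partitioned Parquet for Hive). The broker, topic, schema, and paths are hypothetical placeholders, not details from this engagement:

```python
# Hypothetical streaming ETL: read JSON events from Kafka, parse them with a
# fixed schema, and append them to a date-partitioned Parquet location.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = (SparkSession.builder
         .appName("kafka-streaming-etl")
         .enableHiveSupport()
         .getOrCreate())

event_schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
            .option("subscribe", "sensor-events")                # placeholder topic
            .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(F.from_json("json", event_schema).alias("e"))
             .select("e.*")
             .withColumn("event_date", F.to_date("event_time")))

query = (parsed.writeStream
               .format("parquet")
               .option("path", "/data/warehouse/sensor_events")      # placeholder path
               .option("checkpointLocation", "/checkpoints/sensor")  # placeholder path
               .partitionBy("event_date")
               .outputMode("append")
               .trigger(processingTime="1 minute")
               .start())

query.awaitTermination()
```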

Environment: Azure, HDFS, MapReduce, Hive, Sqoop, Pig, Oozie Scheduler, Shell Scripts, Oracle, Azure Data Factory, Synapse Analytics, HBase, Power BI, Cassandra, Cloudera, JavaScript, JSP, Kafka, Spark, Scala, ETL, Python.

Confidential, Virginia Beach, VA

Sr. Data Engineer

Responsibilities:

  • Involved in creating specifications for ETL processes, finalized requirements, and prepared specification documents
  • Migrated data from on-premises SQL Database to Azure Synapse Analytics using Azure Data Factory, designed optimized database architecture
  • Created Azure Data Factory for copying data from Azure BLOB storage to SQL Server
  • Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight/Databricks.
  • Worked with Microsoft on-premises data platforms, specifically SQL Server, SSIS, SSRS, and SSAS.
  • Create Reusable ADF pipelines to call REST APIs and consume Kafka Events.
  • Developed and configured build and release (CI/CD) processes using Azure DevOps, and managed application code in Azure Git with the required security standards for .NET and Java applications.
  • Migrated the ETL logic, previously running in SSIS and MS Access, to Azure Data Factory pipelines without any change in business logic.
  • Developed high-performance data ingestion pipelines from multiple sources using Azure Data Factory and Azure Databricks.
  • Extensively worked on creating pipelines in Azure Cloud ADF v2 using different activities such as Move & Transform, Copy, Filter, ForEach, Databricks, etc.
  • Developed dynamic Data Factory pipelines using parameters and triggered them as desired using events such as file availability on Blob Storage, on a schedule, or via Logic Apps (see the sketch after this list).
  • Wrote SQL queries to support the ETL team with a system migration, including DDL and DML code to map and migrate data from the source to the new destination server in Azure SQL DB.
  • Developed Power BI and SSRS reports and created SSAS database cubes to facilitate self-service BI.
  • Utilized PolyBase and T-SQL queries to efficiently import large amounts of data from Azure Data Lake Store into Azure SQL Data Warehouse.
  • Created Azure Runbooks to scale Azure Analysis Services and Azure SQL Data Warehouse up and down.
  • Upgrading Azure SQL Data warehouse Gen1 to Azure SQL Data warehouse Gen2
  • Designed and developed Azure SQL Data Warehouse fact and dimension tables, using different distributions (Hash, Replicated, and Round-Robin) when creating them.
  • Created Azure Data Factory Pipeline to load data from on-premises SQL Server to Azure Data Lake store.
  • Utilized Azure's ETL service, Azure Data Factory (ADF), to ingest data from legacy disparate data stores into Azure Data Lake Storage.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra and Redshift, implementing massive data lake pipelines.
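
Parameterized pipelines like the ones above can also be started and monitored programmatically. A minimal sketch using the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, pipeline, and parameter names are placeholders:

```python
# Hypothetical example: trigger a parameterized ADF pipeline run and poll its status.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")  # placeholder

run = adf_client.pipelines.create_run(
    resource_group_name="rg-data-platform",      # placeholder
    factory_name="adf-ingestion",                # placeholder
    pipeline_name="pl_copy_blob_to_sql",         # placeholder
    parameters={"sourceContainer": "landing", "targetTable": "stg.Orders"},
)

status = adf_client.pipeline_runs.get("rg-data-platform", "adf-ingestion", run.run_id)
print(status.status)  # e.g. InProgress, Succeeded, Failed
```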

Environment: Azure Data Factory, Selenium, ER Studio, Teradata 13.1, Oracle, Python, Tableau, Hadoop, Spark, Scala, Hive, SQL Server, SSIS, SSRS, SSAS, Kafka, Redshift

Confidential, Dallas, TX

Sr. AWS Data Engineer

Responsibilities:

  • Worked with AWS Elastic Kubernetes Service to manage containerized applications using its nodes, ConfigMaps, selector services and deployed application containers as Pods.
  • Performed end-to-end architecture and implementation assessments of various AWS services such as Amazon EMR, Git, and S3.
  • Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.
  • Set up and established secure network communications on the cluster and tested HDFS, Hive, Pig, and MapReduce cluster access for new users.
  • Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name Node recovery, capacity planning, and slots configuration.
  • Responsible for developing a data pipeline using Sqoop to extract data from RDBMS and store it in HDFS.
  • Configured, monitored, and automated Amazon Web Services, and was involved in deploying the content cloud platform on Amazon Web Services using EC2, S3, and EBS.
  • Developed reusable framework to be leveraged for future migrations that automates ETL from RDBMS systems to the Data Lake utilizing Spark Data Sources and Hive data objects
  • Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with the tasks running on Amazon SageMaker
  • Implemented machine learning algorithms in Python to predict the quantity a user might want to order for a specific item, enabling automatic suggestions, using Kinesis Firehose and an S3 data lake.
  • Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon S3 and Amazon DynamoDB.
  • Used Spark SQL's Scala and Python interfaces, which automatically convert case classes into schema RDDs.
  • Imported data from different sources such as HDFS and HBase into Spark RDDs and performed computations using PySpark to generate the output response.
  • Created Lambda functions with Boto3 to deregister unused AMIs in all application regions to reduce the cost of EC2 resources (see the sketch after this list).
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Implemented AWS Step Functions to automate and orchestrate the Amazon SageMaker related tasks such as publishing data to S3, training ML model and deploying it for prediction
  • Developed data pipeline using Kafka, Sqoop, and Spark Streaming to ingest customer behavioral data and financial histories into HDFS, HBase and Cassandra for analysis.
  • Developed different process workflows using Apache NiFi to extract, transform, and load raw data into HDFS and then process it into Hive tables.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Ran Hadoop streaming jobs to process terabytes of XML data and utilized cluster coordination services through ZooKeeper.
  • Worked with HBase, MongoDB, and Cassandra NoSQL databases.
  • Developed real-time data pipelines using Kafka and Spark Streaming.
  • Worked on the core and Spark SQL modules of Spark extensively.
  • Developed Kafka producers and consumers, HBase clients, Spark, Shark, Streams, and Hadoop MapReduce jobs, along with components on HDFS and Hive.
  • Created Hive External tables and loaded the data into tables and query data using HQL.
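
A minimal sketch of the AMI-cleanup Lambda mentioned above, using Boto3. The region list and the "in-use" tag convention are assumptions for illustration, not details from the engagement:

```python
# Hypothetical Lambda: deregister AMIs tagged as unused across application regions
# and delete their backing EBS snapshots to reduce EC2 storage cost.
import boto3

REGIONS = ["us-east-1", "us-west-2"]  # placeholder application regions


def lambda_handler(event, context):
    for region in REGIONS:
        ec2 = boto3.client("ec2", region_name=region)
        images = ec2.describe_images(
            Owners=["self"],
            Filters=[{"Name": "tag:in-use", "Values": ["false"]}],  # assumed tag convention
        )["Images"]

        for image in images:
            ec2.deregister_image(ImageId=image["ImageId"])
            # Clean up the EBS snapshots that backed the AMI
            for mapping in image.get("BlockDeviceMappings", []):
                snapshot_id = mapping.get("Ebs", {}).get("SnapshotId")
                if snapshot_id:
                    ec2.delete_snapshot(SnapshotId=snapshot_id)
```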

Environment: AWS, Hadoop Cluster, HDFS, Hive, Sqoop, OLAP, data modeling, Linux, HBase, Shell Scripting, Apache Spark, PySpark.

Confidential, New York, NY

Data Engineer

Responsibilities:

  • Involved in loading and transforming large sets of structured, semi structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Worked on extracting files from MySQL through Sqoop, placing them in HDFS, and processing them.
  • Ingested data from Teradata using Sqoop into HDFS and worked with highly unstructured and semi structured data.
  • Built and maintained Docker container clusters managed by Kubernetes on GCP using Linux, Bash, Git, and Docker. Utilized Kubernetes and Docker for the runtime environment of the CI/CD system to build, test, and deploy.
  • Developed Oozie workflows to perform daily, weekly, and monthly incremental loads into hive tables.
  • Migrated complex MapReduce programs into Spark RDD transformations using PySpark (see the sketch after this list).
  • Loaded the data into Spark RDD and did in memory data Computation to generate the Output response.
  • Ingested data in mini-batches and performed RDD transformations on those mini-batches of data.
  • Used Oozie workflow engine to run multiple jobs which run independently.
  • Worked on Kafka while dealing with raw data, by transforming into new Kafka topics for further consumption.
  • Involved in creating Hive Tables, loading with data, and writing Hive queries which will invoke and run MapReduce jobs in the backend.
  • Wrote MapReduce (Hadoop) programs to convert text files into Avro format and loaded them into Hive (Hadoop) tables.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
  • Developing design documents considering all possible approaches and identifying the best of them.
  • Import the data from different sources like HDFS/HBase into Spark RDD.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, and Python.
  • Created a POC to orchestrate the migration from the existing Teradata platform to Hadoop infrastructure to increase the efficiency of the data used for analytics and decision making.
  • Utilized Informatica Power Center ETL tool to extract the data from heterogeneous sources and load them into the target systems.
  • Implemented the Slowly Changing Dimensions to capture the updated master data and load into the target Teradata system according to the business logic.
  • Created mapping variables and data flow logic from source to target systems.
  • Participated in daily status calls with internal team and weekly calls with client and updated the status report.
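
A minimal sketch of re-expressing a classic MapReduce aggregation as Spark RDD transformations in PySpark, as mentioned above; the HDFS paths and record layout are placeholders:

```python
# Hypothetical example: count events per user, the RDD equivalent of a
# map (emit key, 1) / reduce (sum) MapReduce job.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mapreduce-to-rdd").getOrCreate()
sc = spark.sparkContext

# Each input line: "user_id,event_type,timestamp" (placeholder layout)
lines = sc.textFile("hdfs:///data/raw/events/*")

counts = (lines
          .map(lambda line: line.split(","))
          .filter(lambda fields: len(fields) == 3)
          .map(lambda fields: (fields[0], 1))     # mapper: emit (user_id, 1)
          .reduceByKey(lambda a, b: a + b))       # reducer: sum counts per user

counts.saveAsTextFile("hdfs:///data/processed/event_counts")
```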

Environment: Hadoop, MapReduce, HDFS, HiveQL, Pig, Java, Spark, Kafka, Python, Informatica PowerCenter, Maven, Sqoop, Zookeeper.

Confidential

Data Analyst

Responsibilities:

  • Participated in testing of procedures and data, utilizing PL/SQL to ensure the integrity and quality of data in the data warehouse.
  • Worked to ensure high levels of Data consistency between diverse source systems including flat files, XML and SQL Database.
  • Developed and ran ad-hoc data queries across multiple database types to identify systems of record, data inconsistencies, and data quality issues (see the sketch after this list).
  • Developed complex SQL statements to extract data, and packaged/encrypted data for delivery to customers.
  • Provided business intelligence analysis to decision-makers using an interactive OLAP tool
  • Created T/SQL statements (select, insert, update, delete) and stored procedures.
  • Defined Data requirements and elements used in XML transactions.
  • Performed Tableau administration using Tableau admin commands.
  • Involved in defining the source to target Data mappings, business rules and Data definitions.
  • Ensured the compliance of the extracts to the Data Quality Center initiatives
  • Performed metrics reporting, data mining, and trend analysis in a helpdesk environment using Access.
  • Worked on SQL Server Integration Services (SSIS) to integrate and analyze data from multiple heterogeneous information sources.
  • Built reports and report models using SSRS to enable end user report builder usage.
  • Created Excel charts and pivot tables for the Ad-hoc Data pull.
  • Created columnstore indexes on dimension and fact tables in the OLTP database to enhance read operations.
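
A minimal sketch of the kind of ad-hoc data-quality query described above, run from Python with pyodbc against SQL Server; the connection string, table, and column names are placeholders:

```python
# Hypothetical example: ad-hoc T-SQL check for duplicate or null keys,
# executed via pyodbc. Connection details and object names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sql-prod.example.com;DATABASE=SalesDW;Trusted_Connection=yes;"
)
cursor = conn.cursor()

cursor.execute("""
    SELECT CustomerKey, COUNT(*) AS row_count
    FROM dbo.DimCustomer
    GROUP BY CustomerKey
    HAVING COUNT(*) > 1 OR CustomerKey IS NULL
""")

# Report any keys that violate uniqueness or completeness expectations
for customer_key, row_count in cursor.fetchall():
    print(customer_key, row_count)

conn.close()
```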

Environment: SQL, PL/SQL, T/SQL, XML, Tableau, OLAP, SSIS, SSRS, Excel, OLTP.
