
Sr. Azure Data Engineer Resume


Tampa, FL

SUMMARY

  • 8+ years of experience in IT, including analysis, design, and development of Big Data solutions as well as design and development of web applications.
  • Solid work experience in Big Data analytics with hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Pig, Flume, Cassandra, Kafka, and Spark.
  • Performed system and regression ETL testing with each release, ensuring all projects complete regression testing in pre-production, as applicable, before deploying to production.
  • Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, MapReduce concepts, and the HDFS framework.
  • Played a key role in setting up a 50-node Hadoop cluster utilizing Apache Spark, working closely with the Hadoop administration team.
  • Good experience with agile methodology.
  • Well versed with big data on AWS cloud services such as EC2, S3, Glue, DynamoDB, and Redshift.
  • Good understanding of Azure Big Data technologies like Azure Data Lake Analytics, Azure Data Lake Store, and Azure Data Factory; created a POC for moving data from flat files and SQL Server using U-SQL jobs.
  • Deployed VMs, storage, networks, and resource groups through the Azure Portal.
  • Created storage pools and disk striping for Azure Virtual Machines; backed up, configured, and restored Azure Virtual Machines using Azure Backup.
  • Developed a decision tree classifier using the Apache Spark distribution and the Scala programming language to provide insight into the type of lens to be prescribed to an individual (a minimal sketch follows this list).
  • Designed the platform architecture, covering platform utilities and patterns for deploying new services and new applications.
  • Experience in dealing with log files to extract data and copy it into HDFS using Flume.
  • Developed Hadoop test classes using MRUnit for checking input and output.
  • Experience in integrating Hive and HBase for effective operations.
  • Wrote code to create single-threaded, multi-threaded, or user-interface event-driven applications, both stand-alone and those that access servers or services.
  • Good experience with object-oriented design (OOP) concepts. Experienced with various types of testing, i.e., Functional Testing and UI Testing. Trained new hires on the Java programming language, automation principles, and tools.
  • Hands-on knowledge of writing code in Scala.
  • Good experience in using data modelling techniques to derive results from SQL and PL/SQL queries.
  • Good working knowledge on Spring Framework.
  • Experience working with different databases, such as Oracle, SQL Server, MySQL and writing stored procedures, functions, joins, and triggers for different Data Models.
  • Expertise in implementing Service Oriented Architectures (SOA) with XML based Web Services (SOAP/REST).
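
A minimal sketch of the decision tree classifier mentioned above, built with Spark MLlib in Scala. The input file, column names (age, astigmatism, tear_rate, lens_type), and split ratios are hypothetical placeholders, not the original production code.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object LensClassifier {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("LensClassifier").getOrCreate()

    // Hypothetical input: patient attributes plus the prescribed lens type as the label.
    val data = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("lenses.csv")

    // Index the string label and assemble the numeric feature columns into a vector.
    val labelIndexer = new StringIndexer().setInputCol("lens_type").setOutputCol("label")
    val assembler = new VectorAssembler()
      .setInputCols(Array("age", "astigmatism", "tear_rate"))
      .setOutputCol("features")
    val dt = new DecisionTreeClassifier().setLabelCol("label").setFeaturesCol("features")

    // Train on 80% of the data and sanity-check predictions on the remaining 20%.
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42)
    val model = new Pipeline().setStages(Array(labelIndexer, assembler, dt)).fit(train)
    model.transform(test).select("features", "label", "prediction").show(10)

    spark.stop()
  }
}
```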

PROFESSIONAL EXPERIENCE

Sr. Azure Data Engineer

Confidential, Tampa, FL

Responsibilities:

  • Build scalable and reliable ETL systems to pull large and complex data together from different systems efficiently.
  • Experienced in designing and deployment of Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Kafka, Spark with Cloudera distribution.
  • Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL) and processed the data in Azure Databricks.
  • Worked on developing a data pipeline to ingest Hive tables and file feeds and generate insights into Cassandra DB.
  • Worked on the AI Processor piece of SOI. Well versed in building CI/CD pipelines with Jenkins, using a tech stack including GitLab, Jenkins, Helm, and Kubernetes.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and the write-back tool, in both directions.
  • Worked on installing, configuring, and monitoring Apache Airflow for running both batch and streaming workflows.
  • Wrote Scala and Python notebooks for Azure Databricks transformation tasks.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked on Spark Structured Streaming to develop a live streaming data pipeline with Kafka as the source and insights written to Cassandra DB; the data was fed in JSON/XML format and then stored in Cassandra DB (see the sketch at the end of this list).
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Ingested data into one or more Azure cloud services (Azure Data Lake, Azure Storage, Azure DW) and processed data in Azure Databricks as part of cloud migration.
  • Used Spark-Streaming APIs to perform necessary transformations.
  • Worked with Spark to consume data from Kafka and convert it to a common format using Scala.
  • Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames, and Spark SQL APIs.
  • Wrote new Spark jobs in Scala to analyze customer data and sales history.
  • Developed scripts, automated data management end to end, and synchronized data across all the clusters.
  • Used Scala functional programming concepts to develop business logic.
  • Designed and implemented an Apache Spark application on Cloudera.
  • Responsible for Ingestion of Data from Blob to Kusto and maintaining the PPE and PROD pipelines.
  • Expertise in creating HDInsight cluster and Storage Account with End-to-End environment for running the jobs.
  • Developed JSON scripts for deploying the pipeline in Azure Data Factory (ADF) that processes the data using the Cosmos activity.
  • Hands-on experience on developing PowerShell Scripts for automation purpose.
  • Created Build and Release for multiple projects (modules) in production environment using Visual Studio Team Services (VSTS).
  • Experience using the ScalaTest FunSuite framework for developing unit test cases and integration testing.
  • Hands-on experience working with Spark SQL queries and DataFrames: importing data from data sources, performing transformations, performing read/write operations, and saving the results to an output directory in HDFS.
  • Involved in running the Cosmos Scripts in Visual Studio 2017/2015 for checking the diagnostics.
  • Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks.
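
A minimal sketch of the Kafka-to-Cassandra structured streaming pipeline described in this list, assuming Spark 3.x with the DataStax Spark Cassandra Connector on the classpath. The broker address, topic, JSON schema, keyspace, and table names are hypothetical placeholders.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object KafkaToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("KafkaToCassandra").getOrCreate()

    // Hypothetical schema of the JSON events arriving on the Kafka topic.
    val schema = new StructType()
      .add("event_id", StringType)
      .add("customer_id", StringType)
      .add("event_time", TimestampType)
      .add("payload", StringType)

    // Read the raw Kafka stream and parse the JSON value column.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "customer-events")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    // Write each micro-batch to Cassandra through the Spark Cassandra Connector.
    val writeToCassandra: (DataFrame, Long) => Unit = (batch, _) =>
      batch.write
        .format("org.apache.spark.sql.cassandra")
        .option("keyspace", "insights")
        .option("table", "customer_events")
        .mode("append")
        .save()

    val query = events.writeStream
      .foreachBatch(writeToCassandra)
      .option("checkpointLocation", "/tmp/checkpoints/kafka-to-cassandra")
      .start()

    query.awaitTermination()
  }
}
```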

Environment: Hadoop, HDFS, Hive, Spark, Cloudera, Kafka, PySpark, Scala, Pig, Cassandra, Agile methods, MySQL

Sr. Big Data Engineer

Confidential, Negaunee, MI

Responsibilities:

  • Experienced in development using Cloudera distribution system.
  • Hands-on experience in Azure Cloud Services, Azure Synapse Analytics, SQL Azure, Data Factory, Azure Databricks services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
  • Worked on creating tabular models on Azure analysis services for meeting business reporting requirements.
  • Good experience working with Azure Blob and Data Lake Storage and loading data into Azure Synapse Analytics (SQL DW).
  • As a Hadoop Developer, was responsible for managing the data pipelines and the data lake.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Performed Hive test queries on local sample files and HDFS files.
  • Used Spark Streaming to divide streaming data into batches as an input to spark engine for batch processing.
  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Spark, and Sqoop.
  • Developed Spark Applications by using Scala and Implemented Apache Spark data processing project to handle data from various RDBMS and Streaming sources.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala, and Python.
  • Developed ETL processes using Spark, Scala, Hive, and HBase (a minimal Spark/Hive sketch follows this list).
  • Setting up Clusters and jobs for Azure Databricks.
  • Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
  • Worked on the NoSQL databases HBase and MongoDB.
  • Experienced in Installation, Configuration, and Administration of Informatica Data Quality and Informatica Data Analyst.
  • Expertise in address data cleansing using Informatica Address Doctor to find deliverable personal and business addresses.
  • Analyzed Data Profiling Results and Performed Various Transformations.
  • Wrote Python scripts to parse JSON documents and load the data into a database.
  • Used Python APIs for extracting daily data from multiple vendors.
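
A minimal sketch of the Spark/Scala/Hive ETL pattern referenced in this list, with Hive support enabled on the SparkSession. The raw.orders source table, the curated.daily_revenue target table, and their columns are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, to_date}

object SalesEtl {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark SQL read and write Hive tables directly.
    val spark = SparkSession.builder
      .appName("SalesEtl")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical source table; the equivalent of a HiveQL GROUP BY
    // rewritten as DataFrame transformations.
    val orders = spark.table("raw.orders")

    val dailyRevenue = orders
      .filter(col("status") === "COMPLETED")
      .withColumn("order_date", to_date(col("order_ts")))
      .groupBy(col("order_date"), col("region"))
      .agg(sum(col("amount")).as("revenue"))

    // Persist the aggregate back to a curated Hive table, partitioned by date.
    dailyRevenue.write
      .mode("overwrite")
      .partitionBy("order_date")
      .saveAsTable("curated.daily_revenue")

    spark.stop()
  }
}
```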

Environment: Hadoop, Azure, Azure Data Lake, Scala, ETL, Hive, Python, Maven, MySQL, Spark, Informatica Tool 10.0, IDQ Informatica Developer Tool 9.6.1 HF3.

Sr. Data Engineer

Confidential, Dearborn, MI

Responsibilities:

  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Experienced in development using Cloudera distribution system.
  • Experienced data scientist with over 1 year of experience in data extraction, data modelling, data wrangling, statistical modeling, data mining, machine learning, and data visualization.
  • Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints and created logical and physical models using Erwin.
  • Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy, and consistency.
  • Defined and deployed monitoring, metrics, and logging systems on AWS.
  • Analyzed existing application programs and tuned SQL queries using the execution plan, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance.
  • Implementing and Managing ETL solutions and automating operational processes.
  • Optimizing and tuning the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
  • Worked on publishing interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.
  • Worked on big data with AWS cloud services such as EC2, S3, EMR, and DynamoDB.
  • Managed security groups on AWS, focusing on high-availability, fault-tolerance, and auto scaling using Terraform templates.
  • Implemented Continuous Integration and Continuous Deployment with AWS Lambda and AWS CodePipeline.
  • Developed code to handle exceptions and push failed records into the exception Kafka topic (see the sketch at the end of this list).
  • Was responsible for ETL and data validation using SQL Server Integration Services.
  • Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS).
  • Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology to get the job done.
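
A minimal sketch of the exception-handling pattern referenced in this list: records that fail processing are routed to a dedicated Kafka exception topic for later inspection and replay. The broker address, topic name, payload layout, and the process helper are hypothetical placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ExceptionRouter {
  // Hypothetical broker and serializer configuration.
  private val props = new Properties()
  props.put("bootstrap.servers", "broker1:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  private val producer = new KafkaProducer[String, String](props)

  // Hypothetical processing wrapper: any failure sends the raw record to the
  // exception topic instead of failing the whole job. The payload format is
  // deliberately simplistic for illustration.
  def process(recordId: String, rawJson: String)(handle: String => Unit): Unit = {
    try {
      handle(rawJson)
    } catch {
      case e: Exception =>
        val payload = s"""{"recordId":"$recordId","error":"${e.getMessage}","raw":$rawJson}"""
        producer.send(new ProducerRecord[String, String]("etl.exceptions", recordId, payload))
    }
  }

  def close(): Unit = producer.close()
}
```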

Environment: SQL Server, Kafka, Python, MapReduce, Oracle 11g, AWS, Redshift, ETL, EC2, S3, Informatica, RDS, NoSQL, Terraform, PostgreSQL.

Data Engineer

Confidential, Scottsdale, AZ

Responsibilities:

  • Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.
  • Architect & implement medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
  • Designed, set up, maintained, and administered Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.
  • Performed ETL operations using SSIS and loaded the data into Secure DB.
  • Good hands-on experience with Data Vault concepts and data models; well versed in understanding and implementing data warehousing concepts and Data Vault.
  • Designed, reviewed, and created primary objects such as views and indexes based on logical design models, user requirements, and physical constraints.
  • Worked with stored procedures for data set results for use in Reporting Services to reduce report complexity and to optimize the run time. Exported reports into various formats (PDF, Excel) and resolved formatting issues.
  • Worked on Sqoop to import and export data from database to HDFS and Data Lake on AWS.
  • Designed packages to extract data from SQL DB and flat files and load it into an Oracle database.
  • Performed data migrations from on-prem to Azure Data Factory and Azure Data Lake.
  • Responsible for defining the cloud network architecture using Azure Virtual Networks, VPN, and ExpressRoute to establish connectivity between on-premises and cloud.
  • Extensively used SQL queries to check the storage and accuracy of data in database tables and utilized SQL for querying the SQL database.

Environment: Azure Data Factory, Spark (Python/Scala), Hive, Jenkins, Kafka, Spark Streaming, Docker Containers, PostgreSQL, RabbitMQ, Celery, Flask, ELK Stack, MS-Azure, Azure SQL Database, Azure Functions Apps, Azure Data Lake, Blob Storage, SQL Server

Data Engineer

Confidential

Responsibilities:

  • Design & implement migration strategies for traditional systems on Azure (Lift and shift/Azure Migrate, other third-party tools.
  • Engage with business users to gather requirements, design visualizations and trained to use self-service BI tools.
  • Used various sources to pull data into Power BI such as SQL Server, Excel, Oracle, SQL Azure etc.
  • Propose architectures considering cost/spend in Azure and develop recommendations to right-size data infrastructure.
  • Designed easy-to-follow visualizations using Tableau software and published dashboards on web and desktop platforms.
  • Worked with business owners on perfecting the process, increasing the overall efficiency of the systems.
  • Responsible for building scalable distributed data solutions using Azure Data Lake, Azure Databricks, and Azure HDInsight.

Environment: MS Azure, MSBI, SQL Server, SQL Azure, Tableau, Azure Databricks, Data Lake.
