Azure Big Data Engineer Resume
Waltham, MA
SUMMARY
- 12+ years of IT experience as a Senior Azure Big Data Engineer working with Azure Cloud and Hadoop.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Technical expertise spans data warehousing, Enterprise Data Platform (EDP), and Big Data.
- Experienced in managing Azure Data Lake Storage (ADLS) and Databricks Delta Lake, with an understanding of how to integrate them with other Azure services.
- Advocate for and facilitate the adoption of appropriate DevOps practices across multiple development environments.
- Experience with PySpark and Spark SQL for big data transformation and processing in Azure Databricks.
- Migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
- Experience using Agile methodologies, including Extreme Programming, Scrum, and Test-Driven Development (TDD).
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to output directories in HDFS.
- Hands-on experience creating Delta Lake tables and applying partitions for faster querying (see the sketch following this section).
- Experience with Azure storage solutions such as Azure Blob Storage, Azure Data Lake Storage Gen2, and Azure SQL Database.
- Good understanding of HDFS design, daemons, and HDFS High Availability (HA).
- Hands-on experience with data ingestion tools (Kafka, Flume) and workflow management tools (Oozie).
- Development-level experience in Microsoft Azure, providing data movement and scheduling functionality for cloud-based technologies such as Azure Blob Storage and Azure SQL Database.
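A minimal PySpark sketch of the partitioned Delta Lake table creation described above; it assumes a Databricks runtime with Delta Lake available, and the storage path, table, and column names are illustrative only.

```python
# Minimal sketch: create a partitioned Delta Lake table in Azure Databricks.
# Assumes a Databricks runtime with Delta Lake and configured storage access;
# the ADLS path, table name, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("delta-partition-sketch").getOrCreate()

# Read raw CSV files landed in the data lake (illustrative path).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("abfss://raw@examplestorage.dfs.core.windows.net/usage/"))

# Derive a date column to partition on.
curated = raw.withColumn("event_date", F.to_date("event_timestamp"))

# Write a Delta table partitioned by event_date for faster date-bounded queries.
(curated.write
 .format("delta")
 .mode("overwrite")
 .partitionBy("event_date")
 .saveAsTable("analytics.usage_events"))
```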
TECHNICAL SKILLS
Programming Languages: Scala, Python, SQL, R.
Big Data Ecosystems: Spark, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Zookeeper
Database: MySQL, Teradata, PostgreSQL, Oracle.
NoSQL Databases: Apache HBase, MongoDB, Cassandra
Streaming Frameworks: Kafka, Flume.
GUI Design & UML Modeling: Microsoft Visio, PlantUML
SDLC Methodologies: Agile, Scrum, Waterfall, Kanban
Reporting Tools: Tableau, Excel
Project Management Tools: MS Project, MS SharePoint
Business Analytics Tools: ClearQuest, RequisitePro, MS SharePoint, Rational Rose, MS Office
IDE: IntelliJ, Jupyter, Anaconda, Databricks, PuTTY, DBeaver, RStudio, Visual Studio.
PROFESSIONAL EXPERIENCE
Confidential, Waltham, MA
Azure Big Data Engineer
Responsibilities:
- Participated in the design, implementation, and support of a data warehouse and analytics platform utilizing Azure cloud technology.
- Designed and implemented data load processes from disparate data sources into Azure Data Lake and, subsequently, Azure SQL Data Warehouse.
- Worked on creating pipelines for ingestion, transformation and custom requirements using Azure Data Factory, Azure Synapse Analytics and Azure Databricks.
- Developed complex Hive queries to extract data from various sources (data lake) and store it in HDFS.
- Helped in creation of end-to-end data pipelines from external and internal sources to the data warehouse.
- Created automation jobs using Azure Data Factory to normalize and prepare data to be consumed by analytics and the business.
- Used DevOps and automation to support the implementation of an enterprise configuration management capability based on open-source components.
- Analyzed and tuned application performance by optimizing queries, indexes, schemas, stored procedures, and triggers.
- Built ETL pipelines using Azure Synapse Analytics, Azure Data Factory, and distributed computing frameworks such as Databricks Spark.
- Designed and built reusable data extraction, transformation, and loading processes by creating Azure Synapse pipelines.
- Designed and developed real-time and batch Spark-based data pipelines.
- Designed, developed, optimized, and maintained data architecture and pipelines that adhere to ETL principles and business goals.
- Solved complex data problems to deliver insights that help the business achieve its goals.
- Worked on querying Parquet files stored in the data lake, as well as CSV files stored in an external data store (see the PySpark sketch following this section).
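A hedged PySpark sketch of the Parquet and CSV querying noted in the last bullet; the storage accounts, containers, schemas, and join keys are assumptions for illustration.

```python
# Sketch: query Parquet files in the data lake and CSV files in an external store.
# Paths, column names, and join keys are hypothetical; assumes storage access is configured.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-query-sketch").getOrCreate()

# Parquet files stored in Azure Data Lake Storage Gen2 (illustrative path).
orders = spark.read.parquet("abfss://curated@examplestorage.dfs.core.windows.net/orders/")

# CSV files stored in an external data store, e.g. Blob Storage (illustrative path).
customers = (spark.read
             .option("header", "true")
             .csv("wasbs://external@examplestorage.blob.core.windows.net/customers/"))

# Join and aggregate with Spark SQL.
orders.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")
spark.sql("""
    SELECT c.region, COUNT(*) AS order_count
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.region
""").show()
```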
Confidential, Troy, MI
Big Data Engineer / Hadoop Developer
Responsibilities:
- Designed and developed Hadoop-based Big Data analytic solutions and engaged clients in technical discussions.
- Worked on multiple Azure platforms, including Azure Data Factory, Azure Data Lake, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, and HDInsight.
- Worked on the creation and implementation of custom Hadoop applications in the Azure environment.
- Created ADF pipelines to load data from on-premises sources into an Azure SQL Server database and Azure Data Lake Storage.
- Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster for working application teams.
- Used Azure Data Lake Analytics and HDInsight/Databricks to generate ad hoc analyses.
- Developed custom ETL solutions, batch processing, and real-time data ingestion pipelines to move data in and out of Hadoop using PySpark and shell scripting (see the sketch following this section).
- Worked on all aspects of data mining, data collection, data cleaning, model development, data validation, and data visualization.
- Translated requirements and data into a usable database schema by creating or recreating ad hoc queries, scripts, and macros and updating existing queries.
- Worked on building data pipelines using Azure Data Factory, Azure Databricks, loading data to Azure Data Lake.
- Ingested enterprise data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HBase tables.
- Responsible for estimating the cluster size, monitoring, and troubleshooting of the Hadoop cluster.
- Supported the development of performance dashboards that encompass key metrics to be reviewed with senior leadership and sales management
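A short PySpark sketch of the batch ETL flow described above for moving data in and out of HDFS; the HDFS paths, schema, and cleansing rules are hypothetical.

```python
# Sketch: batch ETL moving data in and out of HDFS with PySpark.
# HDFS paths, column names, and cleansing rules are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hdfs-batch-etl-sketch").getOrCreate()

# Read raw delimited files landed in HDFS (illustrative path).
raw = (spark.read
       .option("header", "true")
       .option("delimiter", "|")
       .csv("hdfs:///data/raw/transactions/"))

# Basic cleansing: drop incomplete rows and standardize the amount type.
cleaned = (raw.dropna(subset=["transaction_id", "amount"])
              .withColumn("amount", F.col("amount").cast("double")))

# Write curated output back to HDFS as Parquet for downstream Hive/HBase loads.
cleaned.write.mode("overwrite").parquet("hdfs:///data/curated/transactions/")
```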
Confidential, Dearborn, MI
Data Engineer / Kafka Developer
Responsibilities:
- Participated in Agile ceremonies and provided status updates to the team and product owner.
- Used Databricks utilities (widgets) to pass parameters at run time from ADF to Databricks (see the sketch following this section).
- Integrated data storage options with Spark, notably with Azure Data Lake Storage and Blob storage.
- Worked on creating Spark clusters in both the HDInsight and Azure Databricks environments.
- Created an Oozie workflow to automate the process of loading data into HDFS and Hive.
- Created tables using NoSQL databases like HBase to load massive volumes of semi-structured data from sources.
- Created and provisioned numerous Databricks clusters needed for batch and continuous streaming data processing, and installed the required libraries on the clusters.
- Worked on creating tabular models in Azure Analysis Services to meet business reporting requirements.
- Developed UNIX shell scripts to create reports from Hive data.
- Developed internal and external tables, and used Hive DDLs to create, alter and drop tables.
- Set up the QA environment and updated configurations for implementing Pig scripts.
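A minimal sketch of the ADF-to-Databricks parameter passing mentioned above, assuming an ADF Notebook activity supplies base parameters; widget names, paths, and the target table are illustrative.

```python
# Sketch: receive run-time parameters from an ADF Notebook activity via Databricks widgets.
# Widget names, default values, paths, and table names are hypothetical.
# `dbutils` and `spark` are provided by the Databricks notebook environment.
dbutils.widgets.text("run_date", "2023-01-01")
dbutils.widgets.text("source_path", "abfss://raw@examplestorage.dfs.core.windows.net/events/")

run_date = dbutils.widgets.get("run_date")
source_path = dbutils.widgets.get("source_path")

# Scope the load to the parameters supplied for this pipeline run.
events = spark.read.parquet(source_path).where(f"event_date = '{run_date}'")
events.write.mode("append").saveAsTable("staging.daily_events")
```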
Confidential, Sugar Land, TX
Hadoop Developer / Data Analyst
Responsibilities:
- Performed data cleaning, filtering and transformation to develop new data insights.
- Used database query languages and technologies (Oracle, SQL, Python) to retrieve data.
- Gathered business needs for data insights and analysis, created supporting visualizations, and prepared data together with data engineers and architects.
- Provided platform and infrastructure support, including cluster administration and tuning.
- Acted as liaison between Treasury lines of business, the Technology team, and other Data Management analysts to communicate control status and escalate issues according to the defined process.
- Created and managed big data pipelines using Pig, MapReduce, and Hive (see the sketch following this section).
- Installed and configured Hadoop components on multiple clusters.
- Worked collaboratively with users and application teams to optimize query and cluster performance
- Involved in capacity management, cluster setup, structure planning, scaling, and administration.
- Developed and executed a data cleansing approach, with adherence and enhancements to data governance policies.
- Drove and participated in design, development and implementation of future Treasury data controls
- Collaborated with PMO leads to provide updates on project timelines, deliverables, and obstacles/challenges.
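A hedged sketch of the Hive-based pipeline work above, expressed in PySpark with Hive support rather than Pig or MapReduce; the database, table, and column names are assumptions for illustration.

```python
# Sketch: cleanse and aggregate a Hive-managed table for reporting.
# Expressed in PySpark with Hive support rather than Pig/MapReduce;
# database, table, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-pipeline-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Pull from a Hive table, filter out incomplete records, and aggregate.
summary = spark.sql("""
    SELECT business_line, COUNT(*) AS record_count, SUM(exposure) AS total_exposure
    FROM treasury.positions
    WHERE exposure IS NOT NULL
    GROUP BY business_line
""")

# Persist the cleansed summary back to Hive for reporting dashboards.
summary.write.mode("overwrite").saveAsTable("treasury.positions_summary")
```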