Azure Big Data Engineer Resume
Waltham, MA
SUMMARY
- 12+ years of IT experience as a Senior Azure Big Data Engineer working with Azure Cloud and Hadoop.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Technical expertise spans data warehousing, Enterprise Data Platform (EDP), and Big Data.
- Experienced in managing Azure Data Lake Storage (ADLS) and Databricks Delta Lake, with an understanding of how to integrate them with other Azure services.
- Advocate for and facilitate the adoption of appropriate DevOps practices across multiple development environments.
- Experience with PySpark and Spark SQL for big data transformation and processing in Azure Databricks.
- Migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
- Experience using Agile methodologies, including Extreme Programming, Scrum, and Test-Driven Development (TDD).
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to output directories in HDFS.
- Hands-on experience creating Delta Lake tables and applying partitions for faster querying (see the sketch following this section).
- Experience with Azure storage solutions such as Azure Blob Storage, Azure Data Lake Storage Gen2, and Azure SQL Database.
- Good understanding of HDFS design, daemons, and HDFS High Availability (HA).
- Hands-on experience with data ingestion tools (Kafka, Flume) and workflow management tools (Oozie).
- Development-level experience in Microsoft Azure, providing data movement and scheduling functionality for cloud-based technologies such as Azure Blob Storage and Azure SQL Database.
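A minimal PySpark sketch of the partitioned Delta Lake table creation described above; it assumes a Databricks runtime with Delta Lake available, and the storage path, table, and column names are illustrative only.

```python
# Minimal sketch: create a partitioned Delta Lake table in Azure Databricks.
# Assumes a Databricks runtime with Delta Lake and configured storage access;
# the ADLS path, table name, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("delta-partition-sketch").getOrCreate()

# Read raw CSV files landed in the data lake (illustrative path).
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("abfss://raw@examplestorage.dfs.core.windows.net/usage/"))

# Derive a date column to partition on.
curated = raw.withColumn("event_date", F.to_date("event_timestamp"))

# Write a Delta table partitioned by event_date for faster date-bounded queries.
(curated.write
 .format("delta")
 .mode("overwrite")
 .partitionBy("event_date")
 .saveAsTable("analytics.usage_events"))
```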
TECHNICAL SKILLS
Programming Languages: Scala, Python, SQL, R.
Big Data Ecosystems: Spark, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Zookeeper
Database: MySQL, Teradata, PostgreSQL, Oracle.
NoSQL Databases: Apache HBase, MongoDB, Cassandra
Streaming Frameworks: Kafka, Flume.
GUI Design & UML Modeling: Microsoft Visio, PlantUML
SDLC Methodologies: Agile, Scrum, Waterfall, Kanban
Reporting Tools: Tableau, Excel
Project Management Tools: MS Project, MS SharePoint
Business Analytics Tools: ClearQuest, RequisitePro, MS SharePoint, Rational Rose, MS Office
IDE: IntelliJ, Jupyter, Anaconda, Databricks, PuTTY, DBeaver, RStudio, Visual Studio.
PROFESSIONAL EXPERIENCE
Confidential, Waltham, MA
Azure Big Data Engineer
Responsibilities:
- Participated in the design, implementation, and support of a data warehouse and analytics platform utilizing Azure cloud technology.
- Designed and implemented data load processes from disparate data sources into Azure Data Lake and, subsequently, Azure SQL Data Warehouse.
- Worked on creating pipelines for ingestion, transformation and custom requirements using Azure Data Factory, Azure Synapse Analytics and Azure Databricks.
- Developed complex Hive queries to extract data from various sources (data lake) and store it in HDFS.
- Helped in creation of end-to-end data pipelines from external and internal sources to the data warehouse.
- Created automation jobs using Azure Data Factory to normalize and prepare data to be consumed by analytics and the business.
- Used DevOps and automation to support the implementation of an enterprise configuration management capability based on open-source components.
- Analyzed and tuned application performance by optimizing queries, indexes, schemas, stored procedures, and triggers.
- Built ETL pipelines using Azure Synapse Analytics, Azure Data Factory, and distributed computing frameworks such as Databricks Spark.
- Designed and built reusable data extraction, transformation, and loading processes by creating Azure Synapse pipelines.
- Designed and developed real-time and batch Spark-based data pipelines.
- Designed, developed, optimized, and maintained data architecture and pipelines that adhere to ETL principles and business goals.
- Solved complex data problems to deliver insights that help the business achieve its goals.
- Worked on querying Parquet files stored in the data lake, as well as CSV files stored in an external data store (see the PySpark sketch following this section).
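A hedged PySpark sketch of the Parquet and CSV querying noted in the last bullet; the storage accounts, containers, schemas, and join keys are assumptions for illustration.

```python
# Sketch: query Parquet files in the data lake and CSV files in an external store.
# Paths, column names, and join keys are hypothetical; assumes storage access is configured.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-query-sketch").getOrCreate()

# Parquet files stored in Azure Data Lake Storage Gen2 (illustrative path).
orders = spark.read.parquet("abfss://curated@examplestorage.dfs.core.windows.net/orders/")

# CSV files stored in an external data store, e.g. Blob Storage (illustrative path).
customers = (spark.read
             .option("header", "true")
             .csv("wasbs://external@examplestorage.blob.core.windows.net/customers/"))

# Join and aggregate with Spark SQL.
orders.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")
spark.sql("""
    SELECT c.region, COUNT(*) AS order_count
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.region
""").show()
```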
Confidential, Troy, MI
Big Data Engineer / Hadoop Developer
Responsibilities:
- Designed and developed Hadoop-based Big Data analytic solutions and engaged clients in technical discussions.
- Worked on multiple Azure platforms, including Azure Data Factory, Azure Data Lake, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, and HDInsight.
- Worked on the creation and implementation of custom Hadoop applications in the Azure environment.
- Created ADF pipelines to load data from on-premises sources into an Azure SQL Server database and Azure Data Lake Storage.
- Used Cloudera Manager for continuous monitoring and management of the Hadoop cluster for working application teams.
- Used Azure Data Lake Analytics and HDInsight/Databricks to generate ad hoc analyses.
- Developed custom ETL solutions, batch processing, and real-time data ingestion pipelines to move data in and out of Hadoop using PySpark and shell scripting (see the sketch following this section).
- Worked on all aspects of data mining, data collection, data cleaning, model development, data validation, and data visualization.
- Translated requirements and data into a usable database schema by creating or recreating ad hoc queries, scripts, and macros and updating existing queries.
- Worked on building data pipelines using Azure Data Factory, Azure Databricks, loading data to Azure Data Lake.
- Ingested enterprise data from different sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HBase tables.
- Responsible for estimating the cluster size, monitoring, and troubleshooting of the Hadoop cluster.
- Supported the development of performance dashboards that encompass key metrics to be reviewed with senior leadership and sales management
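A short PySpark sketch of the batch ETL flow described above for moving data in and out of HDFS; the HDFS paths, schema, and cleansing rules are hypothetical.

```python
# Sketch: batch ETL moving data in and out of HDFS with PySpark.
# HDFS paths, column names, and cleansing rules are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hdfs-batch-etl-sketch").getOrCreate()

# Read raw delimited files landed in HDFS (illustrative path).
raw = (spark.read
       .option("header", "true")
       .option("delimiter", "|")
       .csv("hdfs:///data/raw/transactions/"))

# Basic cleansing: drop incomplete rows and standardize the amount type.
cleaned = (raw.dropna(subset=["transaction_id", "amount"])
              .withColumn("amount", F.col("amount").cast("double")))

# Write curated output back to HDFS as Parquet for downstream Hive/HBase loads.
cleaned.write.mode("overwrite").parquet("hdfs:///data/curated/transactions/")
```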
Confidential, Dearborn, MI
Data Engineer / Kafka Developer
Responsibilities:
- Participated in Agile ceremonies and provided status updates to the team and product owner.
- Used Databricks utilities (widgets) to pass parameters at run time from ADF to Databricks (see the sketch following this section).
- Integrated data storage options with Spark, notably with Azure Data Lake Storage and Blob storage.
- Worked on creating Spark clusters in both the HDInsight and Azure Databricks environments.
- Created an Oozie workflow to automate the process of loading data into HDFS and Hive.
- Created tables using NoSQL databases like HBase to load massive volumes of semi-structured data from sources.
- Created and provisioned numerous Databricks clusters needed for batch and continuous streaming data processing, and installed the required libraries on the clusters.
- Worked on creating tabular models in Azure Analysis Services to meet business reporting requirements.
- Developed UNIX shell scripts to create reports from Hive data.
- Developed internal and external tables, and used Hive DDLs to create, alter and drop tables.
- Set up the QA environment and updated configurations for implementing Pig scripts.
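A minimal sketch of the ADF-to-Databricks parameter passing mentioned above, assuming an ADF Notebook activity supplies base parameters; widget names, paths, and the target table are illustrative.

```python
# Sketch: receive run-time parameters from an ADF Notebook activity via Databricks widgets.
# Widget names, default values, paths, and table names are hypothetical.
# `dbutils` and `spark` are provided by the Databricks notebook environment.
dbutils.widgets.text("run_date", "2023-01-01")
dbutils.widgets.text("source_path", "abfss://raw@examplestorage.dfs.core.windows.net/events/")

run_date = dbutils.widgets.get("run_date")
source_path = dbutils.widgets.get("source_path")

# Scope the load to the parameters supplied for this pipeline run.
events = spark.read.parquet(source_path).where(f"event_date = '{run_date}'")
events.write.mode("append").saveAsTable("staging.daily_events")
```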
Confidential, Sugar Land, TX
Hadoop Developer / Data Analyst
Responsibilities:
- Performed data cleaning, filtering and transformation to develop new data insights.
- Used database query languages and technologies (Oracle, SQL, Python) to retrieve data.
- Gathered business needs for data insights and analysis, created supporting visualizations, and prepared data together with data engineers and architects.
- Provided platform and infrastructure support, including cluster administration and tuning.
- Acted as liaison between Treasury lines of business, the Technology team, and other Data Management analysts to communicate control status and escalate issues according to the defined process.
- Created and managed big data pipelines using Pig, MapReduce, and Hive (see the sketch following this section).
- Installed and configured Hadoop components on multiple clusters.
- Worked collaboratively with users and application teams to optimize query and cluster performance
- Involved in capacity management, cluster setup, structure planning, scaling, and administration.
- Developed and executed a data cleansing approach, with adherence and enhancements to data governance policies.
- Drove and participated in design, development and implementation of future Treasury data controls
- Collaborated with PMO leads to provide updates on project timelines, deliverables, and obstacles/challenges.
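A hedged sketch of the Hive-based pipeline work above, expressed in PySpark with Hive support rather than Pig or MapReduce; the database, table, and column names are assumptions for illustration.

```python
# Sketch: cleanse and aggregate a Hive-managed table for reporting.
# Expressed in PySpark with Hive support rather than Pig/MapReduce;
# database, table, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-pipeline-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Pull from a Hive table, filter out incomplete records, and aggregate.
summary = spark.sql("""
    SELECT business_line, COUNT(*) AS record_count, SUM(exposure) AS total_exposure
    FROM treasury.positions
    WHERE exposure IS NOT NULL
    GROUP BY business_line
""")

# Persist the cleansed summary back to Hive for reporting dashboards.
summary.write.mode("overwrite").saveAsTable("treasury.positions_summary")
```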