Sr. Big Data Engineer (Azure) Resume
VA
SUMMARY
- Around 7 years of IT experience in project development, implementation, deployment, and maintenance using the big data Hadoop ecosystem and cloud technologies across various sectors, with programming expertise in multiple languages including Scala, Java, and PySpark.
- Worked on Azure Data Factory (ADF), Azure SQL Server, Azure Data Lake Store, and Azure Blob Storage accounts.
- Created Azure Data Factory pipelines to pull data from Blob Storage into Azure SQL Data Warehouse, Azure SQL Database, and Azure Data Lake Storage as full/incremental loads, scheduled to match the frequency at which data arrives in the corresponding tables/folders.
- Automated the entire process above through ADF components such as pipelines, datasets, connection strings, and schedule triggers.
- Experience using the various activities available in ADF pipelines, such as Copy Data, Execute Pipeline, Execute Stored Procedure, Get Metadata, Lookup, Web (for sending email), ForEach iterator, and If Condition.
- Expert at writing dynamic content expressions in ADF, which allow dynamic values to be passed to parameters at the pipeline, dataset, and activity level.
- Experience in ETL processes; worked extensively on data extraction, transformation, and loading from different data sources using ADF.
- Expertise in creating and maintaining database objects such as tables, stored procedures, indexes, functions, views, user-defined data types and functions, and constraints.
- Experienced in configuring different environments (Dev, Stage, UAT, and Production).
- Experience in providing logging and error handling for ADF through event handlers.
- Good knowledge of data warehousing concepts and data marts.
- Good experience writing MS SQL queries using various joins, subqueries, and stored procedures with parameters.
- Good knowledge of Azure Analysis Services cube refresh (Process Full, Process Default, Process Clear).
- Expertise in Azure Data Lake Gen2 (raw and processed data zones).
- Experience making REST API calls to ingest data into Azure Data Lake Gen2 (see the sketch after this list).
- Good knowledge of ADF Data Flows (simple transformations without coding).
- Good knowledge of the process of deploying ADF resources from the development environment to the production environment.
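A minimal sketch of the REST-API-to-ADLS-Gen2 ingestion pattern referenced above, not the actual project code: the API URL, storage account, container, and folder layout are illustrative placeholders, and it assumes the `requests` and `azure-storage-file-datalake` packages are available.

```python
# Sketch: pull JSON from a REST API and land it in the raw zone of ADLS Gen2.
import json
from datetime import date

import requests
from azure.storage.filedatalake import DataLakeServiceClient

API_URL = "https://example.com/api/v1/orders"            # hypothetical source API
ACCOUNT_URL = "https://mydatalake.dfs.core.windows.net"  # hypothetical ADLS Gen2 account
CONTAINER = "raw"                                        # raw zone container


def ingest_to_raw(storage_key: str) -> None:
    # Call the source REST API and fail fast on HTTP errors.
    response = requests.get(API_URL, timeout=60)
    response.raise_for_status()
    payload = response.json()

    # Write the payload as a dated JSON file into the raw zone.
    service = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=storage_key)
    file_system = service.get_file_system_client(CONTAINER)
    target_path = f"orders/{date.today():%Y/%m/%d}/orders.json"
    file_client = file_system.get_file_client(target_path)
    file_client.upload_data(json.dumps(payload), overwrite=True)
```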
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, YARN, Hive, Pig, Sqoop, HBase, Kafka, Oozie, Impala, Spark, Spark SQL (DataFrames and Datasets).
Cloud (Azure): Azure Data Factory, Azure Databricks, Azure Data Lake
Databases: Oracle, Teradata, Netezza and SQL - Server
Languages: Java, Scala, Python, SQL, Shell Scripting.
Operating systems: UNIX/Linux, Windows and Mac OS.
Tools: Maven, SBT, Jenkins, IntelliJ, Eclipse, GIT.
PROFESSIONAL EXPERIENCE
Confidential - VA
Sr. Big Data Engineer (Azure)
Responsibilities:
- Extracted data from various sources such as Oracle, manual files, and Blob Storage.
- Created tables on top of the data using Azure Databricks.
- Performed basic transformation and joins on the temporary table data.
- Created pipelines to move data from an on-premises DB2 database to Azure SQL Server using Azure Data Factory (Copy activity).
- Created notifications and alerts in Azure.
- Wrote SQL queries and performed performance tuning.
- Developed data pipelines using Data Factory to ingest data from source systems into Azure Data Lake Gen1.
- Developed Spark SQL scripts in Databricks and ran those notebooks from ADF pipelines (see the sketch below).
- Developed Apache Spark applications using Python and implemented an Apache Spark data processing project to handle data from various data lakes.
- Created Azure data workflows to ingest data from various sources into Azure.
- Used Spark SQL to read data from external sources and processed the data with the PySpark computation framework.
- Created and deployed reports and dashboards in the Power BI service.
Environment: Azure Data Factory, Azure Databricks, Azure Data Lake, PySpark, Spark SQL
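A minimal sketch of the kind of Databricks Spark SQL notebook logic described in this role. Paths, table names, and columns are illustrative placeholders, and it assumes the cluster already has access to the storage account.

```python
# Sketch: register temp views over raw extracts and run a basic transform/join in Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("basic-transform").getOrCreate()

# Read raw extracts (e.g. landed from Oracle or Blob Storage) and expose them to SQL.
orders = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/orders/")
customers = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/customers/")
orders.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")

# Basic transformation and join on the temporary views.
daily_totals = spark.sql("""
    SELECT c.customer_id,
           c.customer_name,
           CAST(o.order_ts AS DATE) AS order_date,
           SUM(o.amount)            AS total_amount
    FROM   orders o
    JOIN   customers c ON o.customer_id = c.customer_id
    GROUP  BY c.customer_id, c.customer_name, CAST(o.order_ts AS DATE)
""")

# Persist the result to the processed zone for downstream reporting.
daily_totals.write.mode("overwrite").parquet(
    "abfss://processed@mydatalake.dfs.core.windows.net/daily_totals/"
)
```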
Confidential, CO
Azure Data Engineer
Responsibilities:
- Designed and developed ADF pipelines to load incremental data from Data Lake Gen2.
- Created JSON Files and Databricks Notebooks as input to ADF pipelines.
- Developed a one-time notebook to load historical data.
- Developed Spark SQL code to transform data from Parquet to Delta format (see the sketch below).
- Developed a common framework to prepare data for the machine learning models.
- Designed and performance-tuned Hive tables and queries at the storage, file format, and query levels.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for dashboard reporting.
- Extracted data from different source systems, then transformed, cleansed, validated, and loaded it into the destinations.
- Loaded data from Azure Data Lake into Snowflake using ADF.
- Used Snowflake as the primary data store and exported data as CSV files delivered via email using ADF and Power BI.
- Extracted data from ADLS Gen2, transformed it, and loaded it into Azure SQL DB.
- Created linked services for the various source systems and target destinations.
- Created datasets for the various source systems and target destinations.
- Parameterized the datasets and linked services using parameters and variables.
- Implemented optimization techniques and best practices in Azure Data Factory.
- Created data flows in Azure Data Factory to implement business rules as transformations.
- Used the ForEach activity to iterate over multiple loads per the business requirements.
- Worked on Azure Databricks using PySpark to analyze the data.
Environment: Spark (Scala), PySpark, Hive, Azure (ADF, Data Lake Gen2, Databricks), Snowflake
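A minimal sketch of the Parquet-to-Delta conversion mentioned in this role, assuming a Databricks runtime where Delta Lake is available. The lake paths and table name are illustrative placeholders, and a real incremental load would add merge/upsert logic rather than a plain append.

```python
# Sketch: convert an incremental Parquet extract into a Delta table in the processed zone.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-to-delta").getOrCreate()

# Read the incremental Parquet extract from the raw zone (placeholder path).
source_path = "abfss://raw@mydatalake.dfs.core.windows.net/sales/2021/01/"
incoming = spark.read.parquet(source_path)

# Land it in Delta format; a simple append keeps the sketch short.
target_path = "abfss://processed@mydatalake.dfs.core.windows.net/sales_delta/"
incoming.write.format("delta").mode("append").save(target_path)

# Register the Delta location as a table so downstream Spark SQL and ML prep can query it.
spark.sql(f"CREATE TABLE IF NOT EXISTS sales_delta USING DELTA LOCATION '{target_path}'")
```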
Confidential
Azure Data Engineer
Responsibilities:
- Created pipelines to move data from an on-premises DB2 database to Azure SQL Server using Azure Data Factory (Copy activity).
- Created notifications and alerts in Azure.
- Wrote SQL queries and performed performance tuning.
- Developed data pipelines using Data Factory to ingest data from source systems into Azure Data Lake.
- Developed Spark SQL scripts in Databricks and ran those notebooks from ADF pipelines.
- Developed Apache Spark applications using Python and implemented an Apache Spark data processing project to handle data from various data lakes.
- Created Azure data workflows to ingest data from various sources into Azure.
- Used Spark SQL to read data from external sources and processed the data with the PySpark computation framework (see the sketch below).
- Created and deployed reports and dashboards in the Power BI service.
Environment: Hive, Sqoop, Spark, Python, Scala, Linux, Impala, SQL Server
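A minimal sketch of reading an external source with Spark SQL in this environment. Treating SQL Server as the JDBC source is an assumption based on the environment line; the connection details, table, and columns are placeholders, and the SQL Server JDBC driver is assumed to be on the cluster classpath.

```python
# Sketch: read an external SQL Server table over JDBC and process it with Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("external-source-read").getOrCreate()

claims = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://sqlhost:1433;databaseName=claims_db")
    .option("dbtable", "dbo.claims")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)
claims.createOrReplaceTempView("claims")

# Filter and project with Spark SQL before landing the result in the lake.
open_claims = spark.sql(
    "SELECT claim_id, status, amount FROM claims WHERE status = 'OPEN'"
)
open_claims.write.mode("overwrite").parquet("/data/processed/open_claims/")
```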
Confidential - San Jose, CA
Research Graduate Assistant
Responsibilities:
- Handled the installation and configuration of a Hadoop cluster.
- Responsible for analyzing and cleansing raw data by performing Hive queries.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Extracted data from RDBMS into HDFS using Sqoop, and vice versa.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive (see the sketch below).
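A minimal sketch of Hive partitioning with dynamic partition inserts, issued here through a Hive-enabled Spark session rather than the Hive CLI. Table names, columns, and sample rows are illustrative, and the bucketed variant is noted only in a comment.

```python
# Sketch: create a partitioned Hive table and load it with dynamic partitioning.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("hive-partitioning-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Small stand-in for data that was actually ingested from an RDBMS with Sqoop.
staging = spark.createDataFrame(
    [("u1", "/home", 3, "2016-01-01"), ("u2", "/cart", 1, "2016-01-02")],
    ["user_id", "url", "hits", "log_date"],
)
staging.createOrReplaceTempView("staging_web_logs")

# Partitioned Hive table; a bucketed variant would add
# "CLUSTERED BY (user_id) INTO 16 BUCKETS" to the DDL.
spark.sql("""
    CREATE TABLE IF NOT EXISTS web_logs (user_id STRING, url STRING, hits INT)
    PARTITIONED BY (log_date STRING)
    STORED AS ORC
""")

# Dynamic partitioning: the log_date partition value comes from the data itself.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT INTO TABLE web_logs PARTITION (log_date)
    SELECT user_id, url, hits, log_date FROM staging_web_logs
""")

# Example analysis over a single partition.
spark.sql(
    "SELECT url, SUM(hits) FROM web_logs WHERE log_date = '2016-01-01' GROUP BY url"
).show()
```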
Confidential
Database Developer
Responsibilities:
- Provided support for all operations focusing on integration of new products and services.
- Liaised with management and development to improve overall access to information.
- Recommended resolutions ensuring optimal performance.
- Assisted in development and execution of data service disaster recovery plans.
- Wrote test cases in JUnit for unit testing of classes.
- Designed and implemented stored procedures, views, and other application database code objects.
- Maintained SQL scripts, indexes, and complex queries for analysis and extraction.
- Performed quality testing and assurance for SQL servers.
- Worked with stakeholders, developers, and production teams across units to identify business needs and solution options.
- Ensured best practice application to maintain security and integrity of data.
Environment: Java, JSP, Servlets, JDBC, JavaScript, MySQL.