Senior Data Analyst Resume
New York, NY
SUMMARY
- Senior Data Analyst with 9 years of experience building on-premises and cloud-based data solutions using cutting-edge tools such as Azure Data Factory, Databricks, Spark, SQL, and Python
- Adept at developing ETL and ELT processes in Databricks using PySpark, Spark SQL, and Scala
- Proficient in building and orchestrating ETL pipelines using Azure Data Factory and SQL Server Integration Services (SSIS)
- Experienced in implementing Lakehouse architecture on Azure using Azure Data Lake, Delta Lake, Delta tables, and Databricks
- In-depth understanding and use of Databricks optimization techniques such as partitioning, bucketing, Adaptive Query Execution, DAG analysis, dynamic file pruning, data skipping, OPTIMIZE with bin-packing, and Z-Order clustering on Delta tables (see the sketch after this list)
- Well versed in data warehousing and normalization techniques for architecting data models, including star schemas, snowflake schemas, and slowly changing dimensions
- Experience in provisioning access to Azure Data Lake from Databricks using secret scopes and mounting the storage account into Databricks (see the sketch after this list)
- Experienced in large data migration projects from on-premises databases to Azure SQL Database and Azure SQL Data Warehouse.
- Experienced in designing and building data models for OLTP and OLAP applications for a variety of business domains
- Expert in writing efficient SQL queries and optimizing poor performing queries and stored procedures.
- Solid experience in working as a data analyst to manage, document, analyze, and resolve data related issues.
- Pragmatic and proactive in performance tuning of OLTP and data warehouse (OLAP) systems for fast and efficient data loading and retrieval
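The sketch below illustrates the access and optimization patterns referenced above: provisioning Azure Data Lake access from Databricks through a secret scope, mounting the storage account, and running OPTIMIZE with Z-Order clustering on a Delta table. It is a minimal, hypothetical example assuming a Databricks notebook where `spark` and `dbutils` are predefined; the secret scope, storage account, container, tenant, and table names are placeholders, not values from any actual project.

```python
# Minimal sketch, assuming a Databricks notebook where `spark` and `dbutils`
# are predefined. All scope, account, container, tenant, and table names are
# hypothetical placeholders.

# Service-principal credentials are read from a secret scope rather than
# hard-coded in the notebook.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id":
        dbutils.secrets.get(scope="example-scope", key="sp-client-id"),
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="example-scope", key="sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount an ADLS Gen2 container so notebooks can reference it via /mnt/raw.
dbutils.fs.mount(
    source="abfss://raw@examplestorageacct.dfs.core.windows.net/",
    mount_point="/mnt/raw",
    extra_configs=configs,
)

# Compact small files and co-locate commonly filtered columns on a Delta table.
spark.sql("OPTIMIZE sales.orders ZORDER BY (customer_id, order_date)")
```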
TECHNICAL SKILLS
- Azure Data Factory
- SSIS
- Databricks
- PySpark
- Azure Data Lake
- Delta Lake
- SQL Server
- Oracle
- Spark-SQL
- Python
- Scala
- Azure Key Vault
- Logic Apps
PROFESSIONAL EXPERIENCE
Confidential, New York NY
Senior Data Analyst
Responsibilities:
- Design and develop data models, data structures, and ETL jobs for data acquisition and manipulation.
- Develop a deep understanding of the data sources; implement data standards, maintain data quality, and support master data management.
- Worked with various file formats such as CSV, JSON, Parquet, and Snappy-compressed Parquet.
- Used Python and PySpark in Synapse notebooks to efficiently process large volumes of data and load them into the Delta Lake silver layer
- Utilized Delta caching for faster data processing using Databricks and PySpark.
- Improved poorly performing PySpark code using coalesce, repartition, query hints, broadcast joins, and Delta caching (see the tuning sketch after this list).
- Optimized Delta tables using the OPTIMIZE command and Z-Order clustering.
- Transformed batch data from several tables containing tens of thousands of records from SQL Server, MySQL, PostgreSQL, and CSV datasets into DataFrames using PySpark
- Performed ETL operations in Azure Databricks by connecting to different relational database source systems using JDBC connectors (see the JDBC sketch after this list).
- Developed Python scripts to perform file validations in Databricks and automated the process using ADF (see the validation sketch after this list).
- Developed an automated process in Azure that ingests data daily from a web service and loads it into Azure Data Lake Storage Gen2.
- Used Logic Apps to take decision-based actions within the workflow.
- Developed custom alerts using Azure Data Factory, Azure SQL Database, and Logic Apps.
- Developed Databricks ETL pipelines using notebooks, Spark DataFrames, Spark SQL, and Python scripting.
- Developed complex SQL queries using stored procedures, common table expressions (CTEs), and temporary tables to support Power BI reports.
- Worked with the enterprise data modeling team on the creation of logical models.
- Development experience in Microsoft Azure, providing data movement and scheduling functionality for cloud-based technologies such as Azure Blob Storage and Azure SQL Database.
- Developed JSON definitions for deploying the data-processing pipelines in Azure Data Factory (ADF).
- Independently managed ETL process development, from development to delivery.
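Tuning sketch: a minimal, hypothetical illustration of the PySpark performance techniques mentioned above (broadcast joins, repartitioning, and Delta caching). Table names, column names, and partition counts are assumptions for illustration only; it presumes a Databricks session where `spark` is predefined.

```python
from pyspark.sql.functions import broadcast

# Hypothetical tables: a large fact table and a small dimension table.
orders = spark.table("silver.orders")
customers = spark.table("silver.customers")

# Broadcasting the small dimension avoids shuffling the large fact table.
enriched = orders.join(broadcast(customers), "customer_id", "left")

# Repartition on the write key to control shuffle parallelism and file counts;
# coalesce() would be the cheaper choice when only reducing partitions.
enriched = enriched.repartition(64, "order_date")

# Warm the Databricks disk (Delta) cache for data that is scanned repeatedly.
spark.sql("CACHE SELECT * FROM silver.orders WHERE order_date >= '2023-01-01'")

enriched.write.format("delta").mode("overwrite").saveAsTable("gold.orders_enriched")
```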
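JDBC sketch: a minimal example of reading a relational source into a DataFrame from Databricks over JDBC, as in the ETL bullet above. The host, database, table, secret names, and partition bounds are hypothetical placeholders.

```python
# Hypothetical SQL Server source read over JDBC with a partitioned extract.
jdbc_url = "jdbc:sqlserver://example-host:1433;databaseName=SalesDB"

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.Orders")
    .option("user", dbutils.secrets.get(scope="example-scope", key="sql-user"))
    .option("password", dbutils.secrets.get(scope="example-scope", key="sql-password"))
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    # Split the read across executors on a numeric key.
    .option("partitionColumn", "OrderID")
    .option("lowerBound", 1)
    .option("upperBound", 1000000)
    .option("numPartitions", 8)
    .load()
)
```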
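Validation sketch: one possible shape for a file-validation notebook that an ADF pipeline could trigger. The landing path, expected columns, and specific checks are hypothetical; a real validation would follow the actual file contract, and raising an exception is simply one way to surface a failure back to the calling ADF activity.

```python
# Hypothetical landing path and expected columns; checks are illustrative only.
path = "/mnt/raw/vendor_feed/2023-01-01/"
expected_columns = {"id", "amount", "order_date"}

df = spark.read.option("header", True).csv(path)

errors = []
if df.limit(1).count() == 0:
    errors.append("file is empty")

missing = expected_columns - set(df.columns)
if missing:
    errors.append(f"missing columns: {sorted(missing)}")

if "id" in df.columns and df.filter("id IS NULL").count() > 0:
    errors.append("null values in key column 'id'")

# Raising makes the notebook activity fail, which the ADF pipeline can act on.
if errors:
    raise ValueError("; ".join(errors))
```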
Confidential, Tampa FL
Data Analyst
Responsibilities:
- Develop proofs of concept (POCs) using Azure cloud services such as Azure Data Factory (ADF), Logic Apps, Databricks, Azure SQL DB, Power BI, and Azure Data Lake Storage (ADLS) to validate proposed solutions and gather feedback from stakeholders.
- Ingested data from the on-premises SQL Server database into the raw zone in Azure Data Lake using Azure Data Factory.
- Used Parquet file formats on Azure Blobs to store raw data.
- Monitored SQL scripts and modified them for improved performance using Spark SQL in PySpark
- Developed Azure Databricks notebooks to apply business transformations and perform data cleansing. Used Spark, Python, and Delta Lake to read data from Parquet files in ADLS into DataFrames and write them to the destination ADLS location. Partitioned the data and handled schema drift using a metadata JSON file before writing to the destination location in ADLS (see the notebook sketch after this list).
- Implemented an ETL platform using Azure Data Factory, Databricks, Azure Data Lake, and Azure SQL.
- Created external tables in Azure SQL Database for data visualization and reporting purposes.
- Improved performance of existing views and stored procedures; identified deadlocks and missing indexes and altered indexes where necessary
- Created T-SQL stored procedures, user-defined functions, indexes, views, constraints, and triggers, and wrote complex queries using joins, aggregate functions, subqueries, derived tables, and CTEs
- Implemented SCD Type 1 and SCD Type 2 logic on Delta tables using Azure Databricks (see the merge sketch after this list)
- Worked in an Agile environment with daily standups, weekly sprints, and backlog grooming sessions, defining test scenarios and strategies.
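Notebook sketch: a minimal, hypothetical version of the pattern described above, reading Parquet from ADLS into a DataFrame, applying light cleansing, and writing partitioned Delta output while tolerating schema drift via a metadata JSON file. Paths, column names, and the metadata layout are assumptions for illustration.

```python
import json
from pyspark.sql import functions as F

# Hypothetical metadata file listing expected columns (name -> type) and the partition key.
meta = json.loads(dbutils.fs.head("/mnt/config/customer_feed.json"))

df = spark.read.parquet("/mnt/raw/customer_feed/")

# Handle schema drift: add declared columns that are missing as nulls and
# keep only the declared columns, so source changes do not break the load.
for col_name, col_type in meta["columns"].items():
    if col_name not in df.columns:
        df = df.withColumn(col_name, F.lit(None).cast(col_type))
df = df.select(*meta["columns"].keys())

# Light cleansing, then a partitioned Delta write with schema merge enabled.
df = df.dropDuplicates(["customer_id"]).withColumn("load_date", F.current_date())
(
    df.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .partitionBy(meta["partition_column"])
    .save("/mnt/curated/customer_feed/")
)
```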
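Merge sketch: a minimal illustration of SCD Type 2 handling on a Delta table with the Delta Lake MERGE API. Table names, keys, and tracked columns are hypothetical. This fragment only closes out changed current rows and inserts brand-new keys; a complete Type 2 flow would also insert the new version of each changed row (often by staging updates with a null merge key), and a Type 1 variant would simply overwrite attributes with whenMatchedUpdateAll().

```python
from delta.tables import DeltaTable

updates = spark.table("silver.customer_updates")          # hypothetical staged changes
target = DeltaTable.forName(spark, "gold.dim_customer")   # hypothetical dimension table

(
    target.alias("t")
    .merge(
        updates.alias("s"),
        "t.customer_id = s.customer_id AND t.is_current = true",
    )
    # Close out the current row when a tracked attribute has changed.
    .whenMatchedUpdate(
        condition="t.address <> s.address OR t.segment <> s.segment",
        set={"is_current": "false", "end_date": "current_date()"},
    )
    # Insert rows for customer keys not seen before.
    .whenNotMatchedInsert(
        values={
            "customer_id": "s.customer_id",
            "address": "s.address",
            "segment": "s.segment",
            "is_current": "true",
            "start_date": "current_date()",
            "end_date": "null",
        }
    )
    .execute()
)
```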
Confidential, Albania
SQL Server Developer/Data Analyst
Responsibilities:
- Created complex ETL packages using SSIS to move data from staging tables to base tables, performing sanity checks on data integrity and removing bad data.
- Designed and developed SSIS packages to extract data from various data sources such as Access databases, Excel spreadsheets, and flat files into SQL Server for further data analysis and reporting, using transformations such as Data Conversion, Conditional Split, Merge, Union All, and Lookup, along with the Send Mail task.
- Analyzed source system data, profiled data, and created metadata definitions; defined data attributes for data flow and lineage and created source-to-target mappings.
- Used Power BI to pull data from OLTP and OLAP systems, flat files, and Excel files to build dashboards and reports for business users as required
- Documented technical ETL specifications for the data warehouse; performed periodic T-SQL code reviews and test plans to ensure high performance, data quality, and integrity.
- Used T-SQL to develop and implement procedures and functions. Reviewed and interpreted ongoing business report requirements.
- Created database objects such as stored procedures, functions, views, indexes, and triggers based on business requirements documents and user requests
- Collaborated with team members in the development environment to promote T-SQL code, SSIS packages, reports, and all other database objects into source control using Team Foundation Server (TFS) and GitHub.
- Designed cubes to satisfy senior management requirements, deployed them to different environments, created KPIs and dashboard reports, and wrote extensive MDX queries for report generation
- Performed performance tuning of stored procedures and SQL queries using SQL Server Profiler and the Index Tuning Wizard.
- Generated functional and technical documentation including as-built documents, dashboard design specifications, operating manuals, release notes, and security requirements for reports generated and maintained.