Azure Data Engineer Resume
Eagan, MN
SUMMARY
- 8 years of experience in Data Engineering and Data Warehousing, with a solid understanding of all phases of the SDLC, STLC, and data lifecycle across domains such as pharmaceuticals and technology. Proficient in Azure and AWS technologies, data warehousing solutions, ETL tools, and data ingestion, cleansing, staging, loading, and transformation.
- Gather and analyze client requirements, build data pipelines, perform testing, and carry out data analysis as needed.
- Extensive experience using the Spark framework within the big data ecosystem.
- Hands-on experience with Kafka, PySpark, and Sqoop data ingestion techniques.
- Good understanding of Hadoop cluster administration functions.
- Implemented performance tuning techniques in the Spark framework.
- Implemented UDFs in Hive.
- Extensive hands-on experience using Hive in the big data ecosystem.
- Knowledge of MongoDB.
- Create jobs to schedule data pipelines per requirements, perform post-production monitoring of jobs, and handle any defects.
- Implemented AWS Lambda to ingest data from various source systems.
- Implemented pipelines using AWS EMR.
- Hands-on experience working with AWS Redshift.
- Experience in Azure Databricks, Azure Data Factory (ADF), Data Lake, Delta Lake, Synapse, Azure Stream Analytics, Azure Event Hubs, and Cosmos DB.
- Architectural knowledge of big data and related concepts. Strong understanding of Azure cloud databases such as Azure SQL Database, SQL Managed Instance, SQL Elastic Pool, and SQL Server.
- Expertise in data warehouse/data mart, OLTP, and OLAP implementations, covering project scoping, analysis, requirements gathering, data modeling, effort estimation, ETL/ELT design, development, system testing, implementation, and production support.
- Conversant with configuration management tools such as Git. Good understanding of data mining, data preprocessing, machine learning, and analysis of large data sets using Python, T-SQL, Power BI, and Tableau.
- A self-starter with a track record of effectively interpreting assigned tasks, delivering on time with a high level of accuracy, and proactively solving problems.
- Performed duties as an individual contributor and can manage multiple projects and tasks simultaneously. Excellent interpersonal, communication, and analytical skills.
- Good knowledge of data marts, OLAP, and dimensional data modeling with the Ralph Kimball methodology (star schema and snowflake modeling for fact and dimension tables) using Analysis Services. Strong analytical and problem-solving skills and the ability to follow projects through from inception to completion.
- Ability to work effectively in cross-functional team environments, with excellent communication and interpersonal skills.
- Involved in converting Hive/SQL queries into Spark transformations using PySpark DataFrames and Scala (a brief sketch of this pattern follows this summary).
- Good experience implementing and orchestrating data pipelines using Oozie and Airflow.
- Experience working with Software Development Life Cycle (SDLC) and Software Testing Life Cycle (STLC) models, including Waterfall and Agile/Scrum.
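As a hedged illustration of the Hive/SQL-to-PySpark conversion noted above, the sketch below rewrites a hypothetical HiveQL aggregation as equivalent DataFrame transformations; the table and column names are placeholders, not taken from any actual project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive_to_dataframe")
         .enableHiveSupport()
         .getOrCreate())

# Original HiveQL (illustrative):
#   SELECT region, SUM(amount) AS total_sales
#   FROM sales WHERE order_date >= '2020-01-01'
#   GROUP BY region

# The same query expressed as PySpark DataFrame transformations
sales_df = spark.table("sales")  # hypothetical Hive table
totals_df = (
    sales_df
    .filter(F.col("order_date") >= "2020-01-01")
    .groupBy("region")
    .agg(F.sum("amount").alias("total_sales"))
)
totals_df.show()
```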
PROFESSIONAL EXPERIENCE
Azure Data Engineer
Confidential, Eagan, MN
Responsibilities:
- Involved in planning and developing roadmaps and deliverables to advance the migration of existing on-premises SSIS systems/applications to the Azure cloud.
- Drove Azure architecture decisions; architected and implemented ETL and data movement solutions using Azure Data Factory (ADF) and SSIS.
- Designed and implemented database solutions in Azure Synapse (SQL Data Warehouse) and Azure SQL.
- Implemented ad-hoc analysis solutions using Azure Data Lake Analytics and HDInsight.
- Created and ran SSIS packages on the ADF v2 Azure-SSIS Integration Runtime as part of ETL and data movement solutions.
- Configured the Azure SQL firewall as a security mechanism.
- Created Azure SQL databases and performed monitoring and restores of Azure SQL databases.
- Performed migration of Microsoft SQL Server to Azure SQL Database.
- Managed Azure Data Lake Storage (ADLS) and Data Lake Analytics and integrated them with other Azure services.
- Pulled data into Power BI from various sources such as SQL Server, Excel, and Azure SQL.
- Created Power BI visualizations and dashboards per the requirements.
- Utilized Cosmos DB (NoSQL, the AWS equivalent of DynamoDB) for data entry, manipulation, and transformation, with scaling and partitioning for growth.
- Supported the Azure cloud migration, using the Azure platform to meet data needs and deliver PaaS solutions built on storage and Spark ecosystems.
- Used Python-based Spark (PySpark) for data ingestion (illustrated in the sketch after this section).
- Configured AWS cloud architecture: set up IAM user and group policies and S3 bucket storage, handled backup/migration of the company website to AWS with web hosting on EC2, and provided training/support for AWS certification preparation.
- Developed Azure Data Factory pipelines for data flow orchestration across Azure Data Lake zones.
- Configured and managed the company AWS cloud environment using services such as IAM and S3; assisted the Alliance Lead in getting the company Amazon Partner Network (APN) certified.
- Developed the company website (and joint-venture partnership websites) in line with business development and marketing plans; planned the migration from WordPress to Drupal. Executed PowerShell scripts to move data from lower to higher environments.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, PySpark, Kafka, ADF, YARN, Scala, SQL, Git, Azure, AWS.
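A minimal, hypothetical sketch of the Python-based Spark ingestion referenced in the role above: reading raw CSV files from an ADLS Gen2 zone and landing them as Parquet in a curated zone. The storage account, container names, paths, and partition column are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls_ingest").getOrCreate()

# Hypothetical ADLS Gen2 paths (abfss://<container>@<account>.dfs.core.windows.net/<path>)
raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/orders/"
curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/orders/"

# Ingest raw CSVs, infer the schema, and land the data as Parquet
orders_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(raw_path)
)

(orders_df
 .write
 .mode("overwrite")
 .partitionBy("order_date")  # assumes the raw files carry an order_date column
 .parquet(curated_path))
```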
Data Engineer
Confidential, Fort Lauderdale, FL
Responsibilities:
- Built a reporting data warehouse from the ERP system using the Order Management, Invoice, and Service Contracts modules.
- Extensive work in Informatica PowerCenter.
- Acted as SME for Data Warehouse related processes.
- Performed data analysis for building the reporting data mart.
- Worked with reporting developers to oversee the implementation of report/universe designs.
- Tuned the performance of Informatica mappings and sessions, eliminating bottlenecks to make the process more efficient.
- Worked on complex SQL queries and PL/SQL procedures and converted them into ETL tasks.
- Worked with PowerShell and UNIX scripts for file transfers, emailing, and other file-related tasks.
- Worked on deployments from Dev to UAT, and then to Prod.
- Worked with Informatica Cloud for data integration between Salesforce, RightNow, Eloqua, and web services applications.
- Implemented proofs of concept for SOAP and REST APIs.
- Built web services mappings and exposed them as SOAP WSDLs.
- Worked with reporting developers to oversee the implementation of report/dashboard designs in Tableau.
- Assisted users in creating/modifying worksheets and data visualization dashboards in Tableau.
- Tuned and applied optimization techniques to improve report/dashboard performance.
- Assisted report developers with writing the required logic to achieve desired goals.
- Met with end users to gather and analyze requirements.
- Worked with business users to identify root causes of data gaps and develop corrective actions accordingly.
- Created ad hoc Oracle data reports to present and discuss data issues with the business.
- Performed gap analysis after reviewing requirements.
- Identified data issues within DWH dimension and fact tables, such as missing keys and joins.
- Wrote SQL queries to identify and validate data inconsistencies in data warehouse against source system.
- Developed ETL pipelines in Azure Databricks, into and out of the data warehouse, using a combination of Python and SQL (see the sketch after this section).
- Exposure to Jupyter Notebooks in the Azure Databricks environment.
- Built, deployed, and monitored batch and near-real-time data pipelines to load structured and unstructured data into Azure Data Lake Storage.
- Used Python-based Spark for data ingestion.
- Coordinating and providing technical details to reporting developers.
Environment: Informatica Power Center 9.5/9.1, Informatica Cloud, Oracle 10g/11g, SQL Server 2005, Tableau 9.1, Salesforce, RightNow, Eloqua, Web Methods, PowerShell, Unix, PySpark, Azure
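A brief, hypothetical sketch of the Databricks ETL pattern described in the role above, combining a Spark SQL extraction step with PySpark transformations before writing the result back for reporting; the table names and columns are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("databricks_etl_sketch").getOrCreate()

# SQL step: pull recent invoice rows from a (hypothetical) warehouse table
invoices_df = spark.sql("""
    SELECT customer_id, invoice_date, amount
    FROM warehouse.invoices
    WHERE invoice_date >= date_sub(current_date(), 30)
""")

# Python step: aggregate and add a derived column
summary_df = (
    invoices_df
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"),
         F.count("*").alias("invoice_count"))
    .withColumn("avg_invoice", F.col("total_amount") / F.col("invoice_count"))
)

# Write the result back as a managed table for reporting
summary_df.write.mode("overwrite").saveAsTable("warehouse.invoice_summary")
```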
Data Engineer
Confidential, Negaunee, MI
Responsibilities:
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Built models using statistical techniques such as Bayesian HMMs and machine learning classification models such as XGBoost, SVM, and Random Forest.
- Responsible for developing, supporting, and maintaining the ETL (Extract, Transform, and Load) processes using Oracle and Informatica PowerCenter.
- Used the K-Means clustering technique to identify outliers and classify unlabeled data.
- Developed and maintained big data pipelines that transfer and process several terabytes of data using Apache Spark, Python, Apache Kafka, Hive, and Impala.
- Involved in designing and developing enhancements of CSG using AWS APIs.
- Evaluated models using cross-validation, log loss, and ROC curves, and used AUC for feature selection.
- Analyzed patterns by calculating autocorrelation with different time lags.
- Hands-on experience with tools such as AI Platform Notebooks, Cloud Functions, Datastore, and Cloud SQL.
- Implemented scripts that load Google BigQuery data and run queries to export data.
- Analyzed clickstream data from Google Analytics with BigQuery.
- Used Python to dynamically author and schedule workflows within Cloud Composer (see the sketch after this section).
- Created DAG instances for groups of infrastructure components.
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Created risk scores for customer segments using regression and classification techniques.
- Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
- Developed Scala scripts using DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Designed and created reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
- Created various types of data visualizations using Python and Tableau.
- Successfully migrated many data domains from the Oracle platform to the data lake.
- Architected and designed a future-state data warehouse solution in the data lake.
- Used Google IAM (Identity and Access Management) to create organizational structures, hundreds of workgroups, and many projects. Worked to implement security policy across the entire organization, with built-in auditing to ease compliance processes.
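A hedged sketch of the kind of Cloud Composer workflow mentioned above: a small Airflow DAG authored in Python (Airflow 2.x import paths assumed). The DAG name, schedule, and task callable are hypothetical, not taken from the actual project.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path


def export_bigquery_extract(**context):
    # Placeholder for the actual export logic (e.g., a BigQuery query plus GCS export)
    print("running daily extract for", context["ds"])


with DAG(
    dag_id="daily_bigquery_export",   # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    export_task = PythonOperator(
        task_id="export_bigquery_extract",
        python_callable=export_bigquery_extract,
    )
```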
Data Warehouse Developer
Confidential
Responsibilities:
- Worked in various projects and managed teams.
- Involved in designing the specifications, design documents, data modeling, and the design of the data warehouse.
- Analyzed existing database schemas and designed star schema models to support users' reporting needs and requirements.
- Involved in creating the logical model using normalization and abstraction techniques.
- Implemented Slowly Changing Dimensions (SCDs, both Type 1 and Type 2); see the sketch after this section.
- Cleansed the source data, extracted and transformed data with business rules, and built reusable mappings, known as 'Mapplets', using Informatica Designer.
- Used parallel processing capabilities along with session partitioning and target table partitioning utilities.
- Extensively worked on tuning (on both the database and Informatica side), thereby improving load times.
- Used Informatica Repository Manager to manage all repositories (development, test, and validation) and was also involved in migrating folders from one repository to another.
- Verified and tested the code, migrated it into the Integration and UAT environments, and was involved in designing the specs with the business users.
- Used PL/SQL scripts and Informatica jobs.
- Wrote Unix shell scripts to move data from all source systems to the data warehousing system. The data was standardized to store various business units in tables.
- Wrote User Acceptance Test cases and trained users on new products and features.
- Tested the target data against source system tables by writing QA procedures.
Environment: Informatica Power Center 7.1.2/8.1, Power Exchange, MQ-Series, Eagle Pace, Autosys, Perl, HP UNIX, Oracle 9i/10g, Sybase, SQL, PL/SQL, SQL * Loader, SQL Navigator, Erwin.
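The SCD Type 2 work in this role was implemented in Informatica mappings; purely as a conceptual illustration, the hypothetical PySpark sketch below shows the same Type 2 idea of expiring changed rows and inserting new current versions. All table and column names are invented.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scd_type2_sketch").getOrCreate()

dim = spark.table("dim_customer").alias("dim")   # current dimension rows (hypothetical)
src = spark.table("stg_customer").alias("src")   # incoming source snapshot (hypothetical)

# Find active dimension rows whose tracked attribute changed in the source
changed = (
    dim.filter(F.col("dim.is_current") == 1)
       .join(src, F.col("dim.customer_id") == F.col("src.customer_id"))
       .filter(F.col("dim.address") != F.col("src.address"))
)

# Type 2 step 1: expire the current version of each changed row
expired = (changed.select("dim.*")
                  .withColumn("is_current", F.lit(0))
                  .withColumn("end_date", F.current_date()))

# Type 2 step 2: build a new current version carrying the changed attribute
new_rows = (changed.select("src.*")
                   .withColumn("is_current", F.lit(1))
                   .withColumn("start_date", F.current_date())
                   .withColumn("end_date", F.lit(None).cast("date")))

# The expired and new rows would then be merged back into dim_customer
```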
Systems Engineer/ Hadoop Developer
Confidential
Responsibilities:
- Responsible for design and development of Big Data applications using Cloudera/Hortonworks Hadoop.
- Coordinated with business customers to gather business requirements.
- Migrated large amounts of data from different databases (i.e., Netezza, Oracle, SQL Server) to Hadoop.
- Importing and exporting data into HDFS from Teradata and vice versa using Sqoop.
- Responsible for managing the data coming from different sources.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, HBase, Spark, and Sqoop.
- Developed Apache Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
- Migrated HiveQL queries on structured data into Spark SQL to improve performance (see the sketch after this section).
- Analyzed data using the Hadoop components Hive and Pig and created tables in Hive for the end users.
- Involved in writing Hive queries and Pig scripts for data analysis to meet business requirements.
- Designed and created the data models and implemented the data pipelines that provide timely access to large datasets in the Hadoop ecosystem.
- Worked on building, testing, and deploying code to Hadoop clusters on development and production servers using the Cloudera CDH4 Hadoop distribution.
- Created pipelines to import data from S3, APIs, and other vendor applications using Pig, Hive, bash scripts, and Oozie workflows.
- Integrated the Hadoop ecosystem with Power BI to create data visualization applications for various teams.
- Optimized MapReduce and Hive jobs to use HDFS efficiently by applying Gzip, LZO, Snappy, and ORC compression techniques.
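As a hedged illustration of moving HiveQL workloads to Spark SQL and writing compressed ORC output, the PySpark sketch below runs a Spark SQL query over a hypothetical Hive table and stores the result as Snappy-compressed ORC; the table name and output path are placeholders.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hiveql_to_sparksql")
         .enableHiveSupport()
         .getOrCreate())

# Query previously run in HiveQL, now executed through Spark SQL
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM web.clickstream          -- hypothetical Hive table
    GROUP BY event_date
""")

# Store the result as Snappy-compressed ORC for efficient HDFS usage
(daily_counts.write
    .mode("overwrite")
    .option("compression", "snappy")
    .orc("/data/curated/daily_event_counts"))
```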