Sr. Azure Data Engineer Resume
Cincinnati, OH
SUMMARY
- 7+ years of IT experience in Data Engineering, analysis, modeling, development, and project management
- Demonstrated expert-level technical capability in Azure batch and interactive solutions, Azure Machine Learning solutions, and operationalizing end-to-end Azure cloud analytics solutions.
- Expert in OpenShift/Kubernetes and the complete end-to-end product life cycle, including containerizing several software stacks spanning big data technologies.
- Data warehousing experience in Business Intelligence technologies and databases, with extensive knowledge of data analysis, T-SQL queries, ETL & ELT processes, Reporting Services (SSRS, Power BI), and Analysis Services using SQL Server SSIS, SSRS, SSAS, and SQL Server Agent.
- Excellent knowledge of Python collections and multithreading.
- Skilled in Python with proven expertise in adopting new tools and technical developments.
- Worked with several Python packages such as NumPy, SciPy, pandas, and PyTables.
- Hands-on experience with the Hadoop ecosystem, including Spark, Kafka, HBase, Pig, Impala, Sqoop, Oozie, Flume, Mahout, Storm, and Talend big data technologies.
- Expertise in setting up Hadoop security, data encryption, and authorization using Kerberos, TLS/SSL, and Apache Sentry respectively.
- Worked on cloud administration tasks such as creating affinity groups, storage accounts, Site-to-Site VPN, ExpressRoute, and DNS servers.
- Worked on several prototype OpenShift projects involving clustered container orchestration and management.
- Experience with different cloud-based storage systems such as S3, Azure Blob Storage, and Azure Data Lake Storage Gen1 & Gen2.
- Architect & implement medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Extensively used performance-optimization features such as bulk binds, BULK COLLECT, REF CURSOR, and dynamic SQL.
- Created a Splunk Enterprise Security app to identify and address emerging security threats through continuous monitoring, alerting, and analytics.
- Actively participated in all phases of the project life cycle, including data acquisition, data cleaning and pre-processing, feature engineering, exploratory data analysis, model building, testing and validation, data visualization, and final presentation to the client.
- In-depth knowledge of Snowflake database, schema, and table structures.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and PySpark (see the sketch at the end of this section).
- Experience creating Teradata SQL scripts using OLAP functions such as RANK() OVER to improve query performance when pulling data from large tables.
- Implemented ad hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
- Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, and data manipulation.
- Strong experience in interacting with stakeholders/customers, gathering requirements through interviews, workshops, and existing system documentation or procedures, defining business processes, identifying and analyzing risks using appropriate templates and analysis tools.
- Experience using TFS (now Azure DevOps) for development and collaboration.
- Experience with web services, the SoapUI tool, XML, and validating request and response XML for SOAP and RESTful web service calls.
- Extensive experience in setting up CI/CD pipelines using Jenkins, Maven, Nexus, GitHub, Chef, and Terraform.
- Experience managing Azure Data Lake (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure services; knowledge of U-SQL and how it can be used for data transformation as part of a cloud data integration strategy.
- Designed and developed continuous deployment pipelines (CI/CD) in Azure DevOps.
- Expert in installing and using Splunk apps for UNIX and Linux.
- Good understanding of relational database design and data warehouse/OLAP concepts and methodologies.
- Performed logical & physical data modeling, delivering normalized, de-normalized & dimensional schemas.
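To illustrate the Hive/SQL-to-Spark conversion mentioned above, here is a minimal PySpark sketch; the table and column names (sales, region, amount, order_date) are hypothetical placeholders and a Hive metastore is assumed to be configured.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical table and column names; a Hive metastore is assumed to be available.
spark = (SparkSession.builder
         .appName("hive-to-spark-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Original Hive query, for reference:
#   SELECT region, SUM(amount) AS total_amount
#   FROM sales
#   WHERE order_date >= '2020-01-01'
#   GROUP BY region

# Equivalent DataFrame transformation:
totals = (spark.table("sales")
          .filter(F.col("order_date") >= "2020-01-01")
          .groupBy("region")
          .agg(F.sum("amount").alias("total_amount")))

totals.show()
```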
TECHNICAL SKILLS
Big Data Technologies: AWS EMR, S3, EC2-Fleet, Spark 2.2/2.0/1.6, Hortonworks HDP, Hadoop, MapReduce, Pig, Hive, Apache Spark, Spark SQL, PySpark, Kafka, NoSQL, Elastic MapReduce (EMR), Hue, YARN, NiFi, Impala, Sqoop, Solr, Oozie
ETL TOOLS: SQL*Loader, Ascential DataStage
DATA MODELING: Erwin 4.0, Power Designer, Microsoft Visio 2003, ER Studio
DATABASES: Oracle 12c/11g/10g, MS SQL Server, SQL Azure, MS Access, Teradata
OPERATING SYSTEMS: Windows, UNIX, Sun Solaris, AIX, HP
PROGRAMMING LANGUAGES: SQL, PL/SQL, Visual Basic, Java/J2EE, C, C++, UNIX Shell Scripting, XML
Azure Cloud: Azure Stream Analytics, Azure SQL Database, Azure Data Lake, Azure Databricks, Azure Data Factory (ADF), Azure SQL Data Warehouse, Azure Service Bus, Azure Analysis Services (AAS), Azure Blob Storage, Azure Data Explorer (Kusto), Azure Search, Azure App Service, ADLS Gen2
PROFESSIONAL EXPERIENCE
Confidential, Cincinnati, OH
Sr. Azure Data Engineer
Responsibilities:
- Design and implement end-to-end data solutions (storage, integration, processing, and visualization) in Azure.
- Developed Python programs to read data from various Teradata tables, manipulate it, and consolidate it into a single CSV file.
- Architect & implement medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Built, deployed, configured, and managed Splunk Cloud instances in a distributed environment spread across different application environments belonging to multiple lines of business.
- Ingested a huge volume and variety of data from disparate source systems into Azure Data Lake Gen2 using Azure Data Factory V2.
- Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
- Developed data marts in Snowflake cloud data warehouse.
- Collaborated with data teams to showcase the project's KPIs using big data systems such as Azure Data Lake, Scope, and Azure Data Explorer (Kusto).
- Transformed data by running a Python activity in Azure Databricks (see the sketch after this list).
- Experience configuring and managing Azure AD Connect, Azure AD Connect Health, and Microsoft Azure Active Directory.
- Experience working with Azure Blob and Data Lake storage and loading data into Azure Synapse Analytics (SQL DW).
- Designed VNets and subscriptions to conform to Azure network limits.
- Experience syncing objects (users, groups, workstations) from Active Directory to Azure Active Directory.
- Architected and implemented ETL and data movement solutions using Azure Data Factory and SSIS; created and ran SSIS packages on the ADF V2 Azure-SSIS IR.
- Worked with Terraform templates to automate Azure IaaS virtual machines using Terraform modules and deployed virtual machine scale sets in the production environment.
- Developed ETL data workflows and code for new data integrations and maintained/enhanced existing workflows and code.
- Analyzed the SQL scripts and redesigned them using PySpark SQL for faster performance.
- Built pipelines to move hashed and un-hashed data from Azure Blob Storage to the Data Lake.
- Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment.
- Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
- Designed and implemented data integration modules for Extract/Transform/Load (ETL) functions.
- Heavily involved in testing Snowflake to understand the best possible way to use cloud resources.
- Utilized SQL*Loader to perform bulk data loads into database tables from external data files.
- Automated provisioning and repetitive tasks using Terraform and Python, Docker containers, and service orchestration.
- Implemented different exporters to collect metrics from OpenShift components and servers.
- Created Databricks notebooks using SQL and Python and automated notebooks using jobs.
- Built/maintained Docker container clusters managed by Kubernetes (Linux, Bash, Git, Docker); utilized Kubernetes and Docker as the runtime environment of the CI/CD system to build, test, and deploy.
- Provided regular support and guidance to Splunk project teams on complex solutions and issue resolution.
- Release pipelines use Azure AD Application Registration service principal in the Azure DevOps service connections for authentication to Azure.
- Worked on data profiling and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Added assertions to validate the XML in SOAP and RESTful web services.
- Performed analysis on implementing Spark using Scala and wrote sample Spark programs using PySpark.
- Used SQL queries to test various reports and ETL load jobs in development, QA, and production.
- Worked in an Agile environment (using Rally).
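A minimal sketch of the kind of Python transformation run in Azure Databricks described above; the storage account, container, and column names are placeholders, and the `spark` session plus ADLS Gen2 credentials are assumed to be provided by the Databricks cluster.

```python
from pyspark.sql import functions as F

# Placeholder paths: container and storage-account names are illustrative only,
# and authentication to ADLS Gen2 is assumed to be configured on the cluster.
raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/sales/"
curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/sales_clean/"

# Read raw CSV files landed by the ingestion pipeline.
raw_df = spark.read.option("header", "true").csv(raw_path)

clean_df = (raw_df
            .dropDuplicates()
            .filter(F.col("order_id").isNotNull())   # drop rows missing the key
            .withColumn("load_date", F.current_date()))

# Write curated output as Parquet, partitioned by load date, for downstream use.
clean_df.write.mode("overwrite").partitionBy("load_date").parquet(curated_path)
```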
Environment: Azure Databricks, Azure Synapse, Azure Data Lake, Azure Stream Analytics, Azure Data Factory (ADF), Azure Active Directory, ADLS Gen 2, SQL Server 2017/2016/2012, Oracle 12c/11g/10g/9i, MS-Office, XML, Hive, HDFS, Flume, Snowflake, Terraform, Kubernetes, OpenShift, CI/CD, Python, R, Java, Scala, Kerberos, Jira
Confidential, Indianapolis, IN
Azure Data Engineer
Responsibilities:
- Performed data analysis and data profiling using complex SQL on various source systems including Oracle and Teradata.
- Migrated on-prem ETLs from MS SQL Server to the Azure cloud using Azure Data Factory and Databricks.
- Developed new scripts to gather network and storage inventory data and ingest it into Splunk.
- Worked on the OpenShift platform managing Docker containers and Kubernetes clusters, and created Kubernetes clusters using Ansible playbooks (launch-instan, deploy-docker.yml, deploy-kubernetes.yml) on Exoscale.
- Involved in creating PySpark DataFrames in Azure Databricks to read data from the Data Lake and using the Spark SQL context for transformations.
- Designed and configured Azure Virtual Networks (VNets), subnets, Azure network settings, DHCP address blocks, DNS settings, security policies and routing.
- Built Data Studio workflow orchestrators with custom reducers written in Azure Data Explorer (Kusto).
- Used Kusto Explorer for log analytics and better query response, and created alerts using Kusto Query Language.
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Used Spark to process the data before ingesting it into HBase; both batch and real-time Spark jobs were created using Scala.
- Created and configured dashboards and graphs based on Prometheus metrics to show the current status of OpenShift components, containers, pods, quotas, etc.
- Developed fact and dimension tables using star and/or snowflake schemas.
- Wrote templates for Azure infrastructure as code using Terraform to build staging and production environments. Integrated Azure Log Analytics with Azure VMs to monitor log files, store them, and track metrics, and used Terraform to manage different infrastructure resources across cloud, VMware, and Docker containers.
- Imported customer data into Python using the pandas library and performed various data analyses, finding patterns that informed key decisions for the company (see the sketch after this list).
- Dealt with data ambiguity and leveraged lazy evaluation in PySpark for code optimization.
- Developed Stored Procedures, Views and Complex Queries on Kusto and SQL Server.
- Created pipelines to move data from on-premise servers to Azure Data Lake.
- Extensively designed Data mapping and filtering, consolidation, cleansing, Integration, ETL, and customization of data mart.
- Worked extensively on the migration of different data products from Oracle to Azure.
- Wrote automation scripts in PowerShell that make API calls to Azure DevOps and find users who have not accessed Azure DevOps for more than 90 days (cost optimization project).
- Good experience with Agile methodologies, Scrum stories, and sprints in a Python-based environment, along with data analytics and Excel data extracts.
- Developed SQL Queries to fetch complex data from different tables in remote databases using joins, database links and Bulk collects.
- Involved in troubleshooting failed U-SQL jobs executed through Azure Data Lake Analytics.
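A minimal pandas sketch of the kind of customer data analysis referenced above; the file name and columns (signup_date, segment) are hypothetical placeholders.

```python
import pandas as pd

# Illustrative file and column names only.
customers = pd.read_csv("customer_data.csv", parse_dates=["signup_date"])

# Quick profiling: missing values and summary statistics per column.
print(customers.isna().sum())
print(customers.describe(include="all"))

# Example pattern search: monthly sign-ups broken down by customer segment.
monthly_signups = (customers
                   .assign(signup_month=customers["signup_date"].dt.to_period("M"))
                   .groupby(["signup_month", "segment"])
                   .size()
                   .unstack(fill_value=0))
print(monthly_signups)
```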
Environment: SQL/Server, Oracle 9i/10g/11g, MS-Office, Azure Data Lake, Azure SQL, Azure Synapse, Azure Data Factory (ADF), CI/CD, Terraform, Databricks, Teradata, ER Studio, XML, Kubernetes, OpenShift, AWS, PySpark, Hive, HDFS, Flume, Sqoop, R connector, Splunk, Python, Java, Scala, Jira
Confidential, Deerfield, Illinois
Data Engineer
Responsibilities:
- Experienced in developing business reports by writing complex SQL queries using views and volatile tables.
- Experienced in Automating and Scheduling the Teradata SQL Scripts in UNIX using Korn Shell scripting.
- Wrote several Teradata SQL Queries using Teradata SQL Assistant for Ad Hoc Data Pull request.
- Worked on Offshore-Onshore Model.
- Interacted with the client and end users to understand the requirements and design the high- and low-level documentation.
- Involved in loading data from the Linux file system, servers, and Java web services using Kafka producers and partitions.
- Analysis of functional and non-functional categorized data elements for data profiling and mapping from source to target data environment. Developed working documents to support findings and assign specific tasks.
- Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
- Created PySpark DataFrames to bring data from DB2 to Amazon S3 (see the sketch after this list).
- Experience in working with Splunk authentication and permissions and having significant experience in supporting large scale Splunk deployments.
- Designed and prototyped accurate and scalable prediction algorithms using R/RStudio.
- Analyzed different types of data to derive insights about relationships between locations and statistical measurements, and qualitatively assessed the data using R/RStudio.
- Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
- Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
- Deployed and tested (CI/CD) our developed code using Visual Studio Team Services (VSTS).
- Experienced in using pandas, NumPy, SciPy, and scikit-learn to develop various machine learning algorithms.
- Wrote a Python program that uses Splunk for input and delivers the mined information in the needed format.
- Implement data mapping, data validation, and data modeling and develop SQL queries for ad-hoc reports.
- Worked with data migration team on designing the data migration process.
- Provided UAT support and production deployment.
- Created side-by-side bars, scatter plots, stacked bars, heat maps, filled maps, and symbol maps according to deliverable specifications.
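A minimal sketch of moving data from DB2 to Amazon S3 with PySpark over JDBC, as mentioned above; the connection details, table name, and bucket are placeholders, and the IBM DB2 JDBC driver jar and S3 credentials are assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession

# Placeholder connection details; credentials would normally come from a secret store.
spark = (SparkSession.builder
         .appName("db2-to-s3-sketch")
         .getOrCreate())

# Read the source table from DB2 over JDBC.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:db2://db2-host:50000/SAMPLEDB")
          .option("driver", "com.ibm.db2.jcc.DB2Driver")
          .option("dbtable", "SCHEMA1.ORDERS")
          .option("user", "db2_user")
          .option("password", "db2_password")
          .load())

# Land the extract in S3 as Parquet for downstream processing.
orders.write.mode("overwrite").parquet("s3a://example-bucket/raw/orders/")
```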
Environment: Oracle 10g, MS-Office, SQL, Spark, PySpark, Scala, CI/CD, Terraform, Machine Learning, Python, Snowflake, Teradata, Linux, UNIX
Confidential
Data Engineer
Responsibilities:
- Built new universes in BusinessObjects per user requirements by identifying the required tables from the data mart and defining the universe connections.
- Used BusinessObjects to create reports based on SQL queries. Generated executive dashboard reports with the latest company financial data by business unit and by product.
- Implemented Teradata RDBMS analysis with BusinessObjects to develop reports, interactive drill charts, balanced scorecards, and dynamic dashboards. Converted data into different formats for user/business requirements by streaming data pipelines from various sources, including Snowflake, unstructured data, and DynamoDB.
- Responsible for requirements gathering, status reporting, creating various metrics, and project deliverables.
- Responsible for estimating the cluster size, monitoring and troubleshooting of the Hadoop cluster.
- Responsible for managing the MongoDB environment from high availability, performance, and scalability perspectives.
- Replaced the existing MapReduce programs and Hive queries with Spark applications written in Scala.
- Developed a NoSQL database using CRUD operations, indexing, replication, and sharding in MongoDB (see the sketch after this list).
- Involved in analyzing and adding new Oracle 10g features such as DBMS_SCHEDULER, CREATE DIRECTORY, Data Pump, and CONNECT BY ROOT to an existing Oracle 9i application.
- Built ETL data pipelines on Hadoop/Teradata using Pig, Hive, and UDFs.
- Extensively used Erwin for data modeling and dimensional data modeling.
- Used EXPLAIN PLAN, TKPROF to tune SQL queries.
- Developed BO full-client reports and Web Intelligence reports in 6.5 and XI R2, and universes with contexts and loops.
- Set up full CI/CD pipelines so that each commit a developer makes goes through the standard software life cycle and is tested thoroughly before it can reach production.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
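A minimal PyMongo sketch of the basic MongoDB CRUD and indexing work mentioned above; the connection string, database, and collection names are placeholders, and replication/sharding are configured at the cluster level rather than in application code.

```python
from pymongo import MongoClient, ASCENDING

# Placeholder connection string, database, and collection names.
client = MongoClient("mongodb://localhost:27017/")
trades = client["portfolio"]["trades"]

# Create
trades.insert_one({"ticker": "ABC", "qty": 100, "price": 12.5})
# Read
doc = trades.find_one({"ticker": "ABC"})
# Update
trades.update_one({"ticker": "ABC"}, {"$set": {"qty": 150}})
# Delete
trades.delete_one({"ticker": "ABC"})

# Secondary index to speed up lookups by ticker.
trades.create_index([("ticker", ASCENDING)])
```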
Environment: Quality Center, QuickTest Professional 8.2, SQL Server, J2EE, UNIX, Snowflake, CI/CD, Python, NoSQL, MS Project, Oracle, WebLogic, shell scripts, JavaScript, HTML, Microsoft Office Suite 2010, Excel