Sr. Azure Data Engineer Resume
Cincinnati, Ohio
SUMMARY
- 8+ years of experience in IT, including analysis, design, and development of Big Data solutions as well as design and development of web applications.
- Experience in creating Data Governance Policies, Business Glossary, Data Dictionary, Reference Data, Metadata, Data Lineage, and Data Quality Rules.
- Solid work experience in Big Data analytics with hands-on experience installing, configuring, and using ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive, Pig, Flume, Cassandra, Kafka, and Spark.
- Worked on Airflow 1.8 (Python 2) and Airflow 1.9 (Python 3) for orchestration; familiar with building custom Airflow operators and orchestrating workflows with dependencies spanning multiple clouds.
- Strong knowledge of data warehousing implementation concepts in Redshift; completed a POC with Matillion and Redshift for DW implementation.
- Perform system and regression ETL testing with each release, ensuring all projects complete regression testing in pre-production, as applicable, before deploying to production.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, DataNode, MapReduce concepts, and the HDFS framework.
- Played a key role in setting up a 50-node Hadoop cluster utilizing Apache Spark by working closely with the Hadoop administration team.
- Good experience with agile methodology. Scheduled different Snowflake jobs using NiFi.
- Well versed with big data on AWS cloud services, i.e., EC2, S3, Glue, DynamoDB, and Redshift.
- Good understanding of Azure big data technologies like Azure Data Lake Analytics, Azure Data Lake Store, and Azure Data Factory; created a POC moving data from flat files and SQL Server using U-SQL jobs.
- Deployed VMs, Storage, Networks, and Resource Groups through the Azure Portal.
- Created Storage Pools and striped disks for Azure Virtual Machines; backed up, configured, and restored Azure Virtual Machines using Azure Backup.
- Developed a decision tree classifier using the Apache Spark distribution and the Scala programming language to provide insight into the type of lens to be prescribed to an individual (see the sketch after this list).
- Designed the platform architecture, covering platform utilities and patterns for deploying new services and new applications.
- Experience in processing log files to extract data and copy it into HDFS using Flume.
- Developed Hadoop test classes using MRUnit for checking input and output.
- Experience in integrating Hive and HBase for effective operations.
- Created rich dashboards using QlikView and prepared user stories to create compelling dashboards that deliver actionable insights.
- Responsible for building sales, marketing and finance reports, and dashboards.
- Experience working with QlikView Server to create data sources and users, install QlikView, and train business users in using QlikView.
- Hands-on knowledge of writing code in Scala; experience with testing, including UI testing.
- Good experience using data modelling techniques to derive results based on SQL and PL/SQL queries.
- Good working knowledge on Spring Framework.
- Experience working with different databases, such as Oracle, SQL Server, MySQL and writing stored procedures, functions, joins, and triggers for different Data Models.
- Expertise in implementing Service Oriented Architectures (SOA) with XML based Web Services (SOAP/REST).
- Coordinated with source system owners, monitored day-to-day ETL progress, and designed and maintained the data warehouse target schema (star schema). Monitored resources and applications using AWS CloudWatch, including creating alarms for metrics such as EBS, EC2, ELB, RDS, S3, and SNS, and configured notifications for the alarms generated based on defined events.
- Expertise in the Talend Data Integration Suite and Big Data Integration Suite for design and development of ETL/big data code and mappings for an enterprise DWH ETL Talend project.
- Generated graphs and reports using the ggplot package in RStudio for analytical models.
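Illustrative sketch for the decision tree bullet above: a minimal PySpark version of the lens-classification approach (the original was implemented in Scala); the file name and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

spark = SparkSession.builder.appName("lens-decision-tree").getOrCreate()

# Hypothetical training set: numeric patient attributes plus the prescribed lens type.
df = spark.read.csv("lenses.csv", header=True, inferSchema=True)

stages = [
    StringIndexer(inputCol="lens_type", outputCol="label"),        # assumed target column
    VectorAssembler(inputCols=["age", "tear_rate", "astigmatic"],  # assumed numeric features
                    outputCol="features"),
    DecisionTreeClassifier(maxDepth=5),
]

model = Pipeline(stages=stages).fit(df)
model.transform(df).select("lens_type", "prediction").show(5)
```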
TECHNICAL SKILLS
Big Data Technologies: HDFS, Hive, MapReduce, Pig, Hadoop distributions, HBase, Spark, Spark Streaming, Kafka.
Programming languages: Java, Python, R
Databases: MySQL, MS SQL Server 2012/16, Oracle, Teradata
NoSQL: MongoDB, Cassandra, HBase
Scripting/Web Languages: HTML5, CSS3, XML, SQL, Shell/Unix, Perl, Python, RStudio.
Operating Systems: Linux, Windows XP/7/8/10, Mac.
Software Life Cycle: SDLC, Waterfall and Agile models.
Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, Maven, Alteryx, Visio, Jenkins, Jira, IntelliJ.
Data Visualization Tools: Tableau, SSRS, Cloud Health.
Cloud Services: AWS (EC2, S3, EMR, RDS, Lambda, CloudWatch, Auto Scaling, Redshift, CloudFormation, Glue, etc.), Azure (Databricks, Azure Data Lake, Azure HDInsight)
PROFESSIONAL EXPERIENCE
Confidential - Cincinnati, Ohio
Sr. Azure Data Engineer
Responsibilities:
- Build scalable and reliable ETL systems to pull large and complex data together from different systems efficiently.
- Traced and catalogued data processes, transformation logic, and manual adjustments to identify data governance issues.
- Linked data lineage to data quality and business glossary work within the overall data governance program.
- Implemented Data Governance using Excel and Collibra.
- Built pipelines to copy data from source to destination in Azure Data Factory (ADF V1).
- Experienced in designing and deploying Hadoop clusters and different Big Data analytic tools, including Pig, Hive, HBase, Kafka, and Spark, with the Cloudera distribution.
- Transformed data in Azure Data Factory using ADF transformations.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL) and processed the data in Azure Databricks. Experience moving data in and out of Windows Azure SQL Databases and Blob Storage.
- Worked on developing a data pipeline to ingest Hive tables and file feeds and generate insights into Cassandra DB.
- Worked on the AI Processor piece of SOI. Well versed in building CI/CD pipelines with Jenkins, using a tech stack including GitLab, Jenkins, Helm, and Kubernetes.
- Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob storage, and Azure SQL Data Warehouse, and to write data back. Experience working with the SQL Database Migration Wizard (SQLAzureMW) and Azure SQL Database Preview v12. Worked on installing, configuring, and monitoring Apache Airflow for running both batch and streaming workflows.
- Wrote Scala and Python notebook scripts for Azure Databricks transformation tasks.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Ingested data into one or more Azure cloud services (Azure Data Lake, Azure Storage, Azure DW) and processed data in Azure Databricks as part of cloud migration.
- Good understanding of data ingestion, Airflow operators for data orchestration, and other related Python libraries.
- Used Spark Streaming APIs to perform the necessary transformations.
- Worked with Spark to consume data from Kafka and convert it to a common format using Scala (see the streaming sketch after this list).
- Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames, and the Spark SQL API.
- Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks.
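Illustrative sketch for the Kafka/Spark bullets above: a minimal PySpark Structured Streaming job that consumes Kafka messages and writes them out in a common Parquet format (the project code was in Scala; the broker, topic, schema, and paths here are hypothetical, and the spark-sql-kafka package is assumed to be available).

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-common-format").getOrCreate()

# Assumed message layout: a JSON payload with an id, event time, and body.
schema = (StructType()
          .add("event_id", StringType())
          .add("event_time", TimestampType())
          .add("body", StringType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  # hypothetical broker
          .option("subscribe", "raw-events")                   # hypothetical topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write the normalized records to the data lake in a common (Parquet) format.
(events.writeStream
       .format("parquet")
       .option("path", "/mnt/datalake/common/events")
       .option("checkpointLocation", "/mnt/datalake/_chk/events")
       .start()
       .awaitTermination())
```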
Environment: Hadoop, HDFS, Hive, Spark, Cloudera, Kafka, PySpark, Scala, Pig, Cassandra, Agile methods, MySQL.
Confidential - New York, New York
Sr. Big Data Engineer
Responsibilities:
- Experienced in development using the Cloudera distribution.
- Hands-on experience in Azure Cloud Services, Azure Synapse Analytics, SQL Azure, Data Factory, Azure Databricks services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
- Worked on creating tabular models on Azure analysis services for meeting business reporting requirements.
- Developed Python, PySpark, and Bash scripts to extract logs and to transform and load data across on-premise and cloud platforms.
- Worked on Apache Spark, utilizing the Spark SQL and Streaming components to support intraday and real-time data processing.
- Ingested data into one or more Azure cloud services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks as part of cloud migration.
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
- Understanding of structured data sets, data pipelines, ETL tools, and data reduction, transformation, and aggregation techniques; knowledge of tools such as dbt and DataStage.
- Good experience working with Azure Blob and Data Lake storage and loading data into Azure Synapse Analytics (SQL DW).
- Experience working with Azure SQL Database Import and Export Service.
- Created Snowpipe for continuous data loading from staged data residing on cloud gateway servers.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Performed Hive test queries on local sample files and HDFS files.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Developed transformation logic using Snowpipe. Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python/Java.
- Built ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL; wrote SQL queries against Snowflake.
- Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase, Spark, and Sqoop.
- Developed Spark Applications by using Scala and Implemented Apache Spark data processing project to handle data from various RDBMS and Streaming sources.
- Developed ETL processes using Spark, Scala, Hive, and HBase. Set up clusters and jobs for Azure Databricks.
- Used visualization tools such as Power View for Excel and Tableau for visualizing and generating reports.
- Worked on the NoSQL databases HBase and MongoDB. Expertise in address data cleansing using Informatica.
- Analyzed Data Profiling Results and Performed Various Transformations.
- Wrote Python scripts to parse JSON documents and load the data into the database (see the sketch after this list).
- Generated weekly and bi-weekly reports for the client business team using Business Objects and documented them.
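Illustrative sketch for the JSON-loading bullet above: a minimal Python script that parses JSON documents and loads them into a database (the JSON Lines layout, field names, and the use of sqlite3 as a stand-in target are assumptions).

```python
import json
import sqlite3

# Stand-in target database; the real job loaded into the enterprise database.
conn = sqlite3.connect("staging.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS events (event_id TEXT, event_type TEXT, payload TEXT)"
)

# Assumed layout: one JSON document per line (JSON Lines).
with open("events.jsonl", encoding="utf-8") as fh:
    rows = []
    for line in fh:
        doc = json.loads(line)
        rows.append((doc.get("id"), doc.get("type"), json.dumps(doc)))

conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()
```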
Environment: Hadoop, Azure, Azure Data Lake, Azure Databricks, Snowflake, Scala, ETL, Hive, Python, Maven, MySQL, Spark, Informatica Tool 10.0, IDQ Informatica Developer Tool 9.6.1 HF3
Confidential - Mountain View, CA
Sr. Data Engineer
Responsibilities:
- Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Experienced in development using the Cloudera distribution.
- Experienced Data Scientist with over 1 year of experience in Data Extraction, Data Modelling, Data Wrangling, Statistical Modeling, Data Mining, Machine Learning, and Data Visualization.
- Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity.
- Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy, and consistency.
- Defined and deployed monitoring, metrics, and logging systems on AWS. Architected and designed serverless applications on AWS using the AWS Serverless Application Model (Lambda).
- Analyzed existing application programs and tuned SQL queries using execution plans, Query Analyzer, SQL Profiler, and Database Engine Tuning Advisor to enhance performance.
- Implemented and managed ETL solutions and automated operational processes.
- Used the AWS Redshift, S3, Redshift Spectrum, and Athena services to query large amounts of data stored on S3, creating a virtual data lake without having to go through an ETL process (see the sketch after this list).
- Provided seamless connectivity between BI tools like Tableau and Qlik to Redshift endpoints.
- Managed IAM roles and console access for EC2, RDS, and ELB services.
- Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
- Worked on publishing interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.
- Worked on big data with AWS cloud services, i.e., EC2, S3, EMR, and DynamoDB.
- Built S3 buckets and managed their policies, and used S3 and Glacier for storage and backup on AWS.
- Managed security groups on AWS, focusing on high-availability, fault-tolerance, and auto scaling using Terraform templates.
- Implemented continuous integration and continuous deployment with AWS Lambda and AWS CodePipeline.
- Analyzed data & defined KPIs; created tasks and set dependencies using QlikView Publisher.
- Involved in performance tuning of various QlikView applications.
- Created ad hoc reports in QlikView and supported users with ad hoc reporting queries.
- Provided demos of QlikView dashboards to enhance users' knowledge of key capabilities of the QlikView application.
- Developed code to handle exceptions and push the code into the exception Kafka topic.
- Was responsible for ETL and data validation using SQL Server Integration Services.
- Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS).
- Used Hive SQL, Presto SQL, and Spark SQL for ETL jobs, choosing the right technology to get each job done.
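Illustrative sketch for the Athena "virtual data lake" bullet above: a minimal boto3 example that queries S3-resident data through Athena without a separate ETL step (the database, table, and result-bucket names are hypothetical).

```python
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Run a query against an external table whose data lives directly on S3.
run = athena.start_query_execution(
    QueryString="SELECT channel, SUM(amount) AS revenue FROM sales GROUP BY channel",
    QueryExecutionContext={"Database": "virtual_data_lake"},          # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

query_id = run["QueryExecutionId"]
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([field.get("VarCharValue") for field in row["Data"]])
```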
Environment: SQL Server, Kafka, Python, MapReduce, Oracle 11g, AWS, Redshift, ETL, EC2, S3, Informatica, RDS, NoSQL, Terraform, PostgreSQL, RStudio.
Confidential
Data Analyst
Responsibilities:
- Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
- Worked on publishing interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.
- Recommended structural changes and enhancements to systems and databases.
- Conducted Design reviews and technical reviews with other project stakeholders.
- Was a part of the complete life cycle of the project from the requirements to the production support.
- Created test plan documents for all back-end database modules.
- Optimized queries with modifications to SQL code, removed unnecessary columns, and eliminated data discrepancies.
Environment: Informatica, Java, MySQL, UNIX, ETL, AWS, RStudio, SSAS, Spark, MS Office, MS Excel.