Sr. Azure Data Engineer Resume
PROFESSIONAL SUMMARY:
- 9+ years of experience in the software industry, including 5 years of experience with Azure cloud services and 4 years of experience in data warehousing.
- Experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), and Databricks.
- Experience in developing, supporting, and maintaining ETL (Extract, Transform and Load) processes using Informatica.
- Experience in developing very complex mappings, reusable transformations, sessions, and workflows using Informatica ETL tool to extract data from various sources and load into targets.
- Proficiency in multiple databases like MongoDB, Cassandra, MySQL, ORACLE, and MS SQL Server.
- Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the sketch after this summary).
- Used various file formats such as Avro, Parquet, SequenceFile, JSON, ORC, and text for loading data, parsing, gathering, and performing transformations.
- Good experience with the Hortonworks and Cloudera distributions of Apache Hadoop.
- Designed and created Hive external tables using shared meta-store with Static & Dynamic partitioning, bucketing, and indexing.
- Experience improving the performance and optimization of existing algorithms in Hadoop using Spark, including Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Extensive hands-on experience tuning Spark jobs.
- Experienced in working with structured data using HiveQL, and optimizing Hive queries.
- Familiarity with Python libraries such as PySpark, NumPy, Pandas, and Matplotlib.
- Writing complex SQL queries using joins, group by, nested queries.
- Experience loading data into HBase using connectors and writing NoSQL queries.
- Experience with solid capabilities in exploratory data analysis, statistical analysis, and visualization using R, Python, SQL, and Tableau.
- Experience running and scheduling workflows using Oozie and Zookeeper, identifying failures, and integrating, coordinating, and scheduling jobs.
- In-depth understanding of Snowflake cloud technology.
- Hands-on experience with Kafka and Flume to load log data from multiple sources directly into HDFS.
- Widely used different features of Teradata such as BTEQ, FastLoad, MultiLoad, SQL Assistant, and DDL and DML commands; very good understanding of Teradata UPI and NUPI, secondary indexes, and join indexes.
- Working experience building RESTful web services and RESTful APIs.
- Implemented proofs of concept using AWS technologies such as S3 storage, Lambda, EMR, and Redshift.
- Good understanding of AWS Glue.
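A minimal PySpark sketch of the Databricks-style workflow summarized above (reading several file formats, aggregating usage, and writing a partitioned Hive table); the paths, table, and column names are hypothetical placeholders rather than details from any specific project.

```python
# Minimal PySpark sketch (hypothetical paths, table and column names).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("usage-aggregation")
         .enableHiveSupport()
         .getOrCreate())

# Read the same logical data from different file formats.
avro_df = spark.read.format("avro").load("/data/raw/usage_avro")   # requires the spark-avro package
json_df = spark.read.json("/data/raw/usage_json")
parquet_df = spark.read.parquet("/data/raw/usage_parquet")

# Union the sources and aggregate usage per customer and day.
usage = avro_df.unionByName(json_df).unionByName(parquet_df)
daily = (usage
         .withColumn("event_date", F.to_date("event_ts"))
         .groupBy("customer_id", "event_date")
         .agg(F.count("*").alias("events"),
              F.sum("bytes_used").alias("total_bytes")))

# Write as a partitioned table for downstream Hive / Spark SQL queries.
(daily.write
      .mode("overwrite")
      .partitionBy("event_date")
      .format("parquet")
      .saveAsTable("analytics.daily_customer_usage"))
```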
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, Hive, HBase, Flume, Kafka, YARN, Apache Spark.
Databases: Oracle, MySQL, SQL Server, MongoDB, DynamoDB, Cassandra.
Programming Languages: Python, PySpark, Shell scripting, Perl scripting, SQL, Scala.
Tools: PyCharm, Eclipse, Visual Studio, SQL*Plus, SQL Developer, SQL Navigator, SQL Server Management Studio, Postman.
Version Control: SVN, Git, GitHub, Maven
Operating Systems: Windows 10/7/XP/2000/NT/98/95, UNIX, Linux, macOS
Visualization/ Reporting: Tableau, ggplot2, matplotlib, Power BI.
Cloud Tech: Azure, Snowflake, AWS.
PROFESSIONAL EXPERIENCE:
Confidential
Sr. Azure Data Engineer
Responsibilities:
- Architect and implement ETL and data movement solutions using Azure Data Factory and SSIS.
- Understand business requirements, perform analysis, and translate them into application and operational requirements.
- Designed one-time load strategy for moving large databases to Azure SQL DWH.
- Extract, transform, and load data from source systems to Azure data storage services using Azure Data Factory and HDInsight.
- Created a framework for data profiling, cleansing, automatic restartability of batch pipelines, and rollback handling.
- Design and implement database solutions in Azure SQL Data Warehouse and Azure SQL Database.
- Led a team of six developers to migrate the application.
- Implemented masking and encryption techniques to protect sensitive data.
- Implemented SSIS IR to run SSIS packages from ADF.
- Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
- Used AWS Athena extensively to query structured data in S3 for loading into other systems such as Redshift and for producing reports.
- Created tables along with sort and distribution keys in AWS Redshift.
- Developed mapping document to map columns from source to target.
- Created Azure Data Factory (ADF) pipelines using Azure Blob storage.
- Performed ETL using Azure Databricks; migrated the on-premises Oracle ETL process to Azure Synapse Analytics.
- Involved in migrating large amounts of data from OLTP to OLAP using ETL packages.
- Worked on Python scripting to automate script generation; performed data curation using Azure Databricks.
- Worked on Azure Databricks, PySpark, HDInsight, Azure SQL Data Warehouse, and Hive to load and transform data.
- Implemented and developed Hive bucketing and partitioning.
- Implemented Kafka and Spark Structured Streaming for real-time data ingestion (see the sketch at the end of this section).
- Used Azure Data Lake as a source and pulled data using Azure Blob storage.
- Good experience working with analysis tools like Tableau and Splunk for regression analysis, pie charts, and bar graphs.
- Developed reports, dashboards using Tableau for quick reviews to be presented to Business and IT users.
- Used Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF.
- Worked on creating a star schema for drill-down reporting. Created PySpark procedures, functions, and packages to load data.
- Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Ingest data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL DW) and process the data in Azure Databricks.
- Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
- Created Databricks notebooks using SQL and Python and automated notebooks using jobs.
- Creating Spark clusters and configuring high concurrency clusters using Azure Databricks to speed up the preparation of high-quality data.
- Create and maintain optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.
Environment: Hadoop, Hive, MapReduce, Teradata, SQL, Azure Event Hubs, Azure Synapse, Azure Data Factory, Azure Databricks.
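A minimal sketch of the Kafka plus Spark Structured Streaming ingestion described in this role; the topic name, event schema, ADLS paths, and trigger interval are assumptions for illustration only.

```python
# Minimal Structured Streaming sketch (hypothetical topic, schema and ADLS paths).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read events from Kafka as a stream.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "orders")
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers key/value as binary; parse the JSON payload into columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

# Land the parsed events in the data lake as Parquet, with a checkpoint for restartability.
query = (events.writeStream
         .format("parquet")
         .option("path", "abfss://curated@mydatalake.dfs.core.windows.net/orders/")
         .option("checkpointLocation", "abfss://curated@mydatalake.dfs.core.windows.net/_checkpoints/orders/")
         .trigger(processingTime="1 minute")
         .start())
```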
Confidential
Azure Data Engineer
Responsibilities:
- Used Agile methodology for data warehouse development, managed with Kanbanize.
- Developed data pipeline using Spark, Hive and HBase to ingest customer behavioral data and financial histories into Hadoop cluster for analysis.
- Working experience on the Azure Databricks cloud, organizing data into notebooks and making it easy to visualize using dashboards.
- Performed ETL on data from different source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Created database tables and stored procedures as required for reporting and ETL needs.
- Configured Databricks jobs and refactored ETL Databricks notebooks.
- Worked on managing the Spark Databricks clusters through proper troubleshooting, estimation, and monitoring.
- Implemented data ingestion from various source systems using sqoop and PySpark.
- Performed end-to-end architecture and implementation assessment of various AWS services such as Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis.
- Hands-on experience with Spark and Hive job performance tuning.
- Performed data aggregation and validation on Azure HDInsight using Spark scripts written in Python.
- Performed monitoring and management of the Hadoop cluster by using Azure HDInsight.
- Involved in extraction, transformation and loading of data directly from different source systems (flat files/Excel/Oracle/SQL) using SAS/SQL, SAS/macros.
- Generated PL/SQL scripts for data manipulation, validation, and materialized views for remote instances.
- Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis.
- Good experience working with analysis tools like Tableau and Splunk for regression analysis, pie charts, and bar graphs.
- Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.
- Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base.
- Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
- Wrote Python scripts to parse XML documents and load the data into the database (see the sketch at the end of this section).
- Used Hive, Impala and Sqoop utilities and Oozie workflows for data extraction and data loading.
- Created HBase tables to store various formats of data coming from different sources.
- Responsible for importing log files from various sources into HDFS using Flume.
- Responsible for translating business and data requirements into logical data models in support of enterprise data models, ODS, OLAP, OLTP, and operational data structures.
- Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files, and CSV files.
- Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, actively driving Proof of Concept (POC) and Proof of Technology (POT) evaluations and implementing Big Data solutions.
Environment: ADF, Databricks, ADLS, Spark, Hive, HBase, Sqoop, Flume, Blob storage, Cosmos DB, MapReduce, HDFS, Cloudera, SQL, Apache Kafka, Azure, Python, Power BI, Unix, SQL Server.
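A minimal sketch of the XML-parsing load script mentioned in this role; the element names, file path, and target table are hypothetical, and SQLite stands in for the actual target database.

```python
# Minimal sketch: parse an XML feed and load rows into a database table.
# Element names, file path and the target table are hypothetical; sqlite3
# stands in for the real target database.
import sqlite3
import xml.etree.ElementTree as ET

def load_customers(xml_path: str, db_path: str) -> int:
    tree = ET.parse(xml_path)
    rows = []
    for cust in tree.getroot().findall("customer"):
        rows.append((
            cust.findtext("id"),
            cust.findtext("name"),
            cust.findtext("city"),
        ))

    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute("""CREATE TABLE IF NOT EXISTS customers
                        (id TEXT PRIMARY KEY, name TEXT, city TEXT)""")
        conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)
    conn.close()
    return len(rows)

if __name__ == "__main__":
    print(load_customers("customers.xml", "staging.db"))
```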
Confidential
Big Data Developer
Responsibilities:
- Involved in Requirement gathering, Business Analysis and translated business requirements into technical design in Hadoop and Big Data.
- Involved in Sqoop implementation, which helps in loading data from various RDBMS sources to Hadoop systems and vice versa.
- Developed Python scripts to extract the data from the web server output files to load into HDFS.
- Wrote a Python script that automates launching the EMR cluster and configuring the Hadoop applications (see the sketch at the end of this section).
- Extensively worked with Avro and Parquet files, converting data between the two formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.
- Involved in Configuring Hadoop cluster and load balancing across the nodes.
- Involved in Hadoop installation, commissioning, decommissioning, balancing, troubleshooting, monitoring, and debugging, and in configuring multiple nodes using the Hortonworks platform.
- Involved in working with Spark on top of Yarn/MRv2 for interactive and Batch Analysis.
- Involved in managing and monitoring Hadoop cluster using Cloudera Manager.
- Used Python and Shell scripting to build pipelines.
- Developed data pipeline using Sqoop, HQL, Spark and Kafka to ingest Enterprise message delivery data into HDFS.
- Created AWS RDS (Relational Database Service) to serve as a Hive metastore, making it possible to integrate metadata from 20 EMR clusters into a single RDS instance and avoid data loss even if an EMR cluster was terminated.
- Worked extensively on migrating existing on-prem data pipelines to the AWS cloud for better scalability and infrastructure maintenance.
- Developed workflows in Oozie and Airflow to automate the tasks of loading data into HDFS and pre-processing it with Pig and Hive.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Prepared ETL design document which consists of the database structure, change data capture, Error handling, restart and refresh strategies.
- Hands-on experience architecting the ETL transformation layers and writing Spark jobs to do the processing.
- Integrated Hadoop into traditional ETL, accelerating the extraction, transformation, and loading of massive semi-structured and unstructured data. Loaded unstructured data into the Hadoop Distributed File System (HDFS).
- Skilled in Tableau Desktop 10.x for data visualization, reporting, and analysis.
- Created Hive tables with dynamic and static partitioning, including buckets for efficiency; also created external tables in Hive for staging purposes.
- Loaded Hive tables with data, wrote Hive queries that run on MapReduce, and created a customized BI tool for manager teams to perform query analytics using HiveQL.
- Aggregated RDDs based on the business requirements, converted RDDs into DataFrames saved as temporary Hive tables for intermediate processing, and stored the results in HBase/Cassandra and RDBMSs.
Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, jQuery.
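A minimal boto3 sketch of the EMR-launch automation mentioned in this role; the release label, instance types, subnet, and IAM roles are placeholder assumptions.

```python
# Minimal boto3 sketch for launching an EMR cluster with Hadoop/Spark/Hive.
# Release label, instance types, roles and subnet are placeholders.
import boto3

def launch_emr_cluster() -> str:
    emr = boto3.client("emr", region_name="us-east-1")
    response = emr.run_job_flow(
        Name="etl-cluster",
        ReleaseLabel="emr-6.9.0",
        Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}, {"Name": "Hive"}],
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
            "Ec2SubnetId": "subnet-0123456789abcdef0",
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
        VisibleToAllUsers=True,
    )
    return response["JobFlowId"]

if __name__ == "__main__":
    print("Launched cluster:", launch_emr_cluster())
```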
Confidential
Data Engineer
Responsibilities:
- Anchored artifacts for multiple milestones (application design, code development, testing, and deployment) in the software lifecycle.
- Developed an Apache Storm program to consume alarms in real-time streaming from Kafka, enrich the alarms, and pass them to the EEIM application.
- Created a rules engine in Apache Storm to categorize alarms into Detection, Interrogation, and Association types before processing.
- Responsible for developing the EEIM application as an Apache Maven project and committing the code to Git.
- Analyzed the alarms and enhanced the EEIM application using Apache Storm to predict the root cause of an alarm and the exact device where the network failure happened.
- Worked extensively on migrating existing on-prem data pipelines to the AWS cloud for better scalability and infrastructure maintenance.
- Worked extensively on migrating/rewriting existing Oozie jobs to AWS Simple Workflow.
- Accumulated the EEIM alarm data in the NoSQL database MongoDB and retrieved it from MongoDB when necessary.
- Built Fiber to the Neighborhood/Node (FTTN) and Fiber to the Premises (FTTP) topologies using Apache Spark and Apache Hive.
- Processed system logs using Logstash, stored them in Elasticsearch, and created dashboards using Kibana.
- Regularly tuned the performance of Hive queries to improve data processing and retrieval.
- Provide the technical support for debugging, code fix, platform issues, missing data points, unreliable data source connections and big data transit issues.
- Developed Java and Python applications to call external REST APIs to retrieve weather, traffic, and geocode information (see the sketch at the end of this section).
- Working experience on the Azure Databricks cloud, organizing data into notebooks and making it easy to visualize using dashboards.
- Worked on managing the Spark Databricks clusters through proper troubleshooting, estimation, and monitoring.
- Performed data aggregation and validation on Azure HDInsight using Spark scripts written in Python.
- Performed monitoring and management of the Hadoop cluster by using Azure HDInsight.
- Worked with Jira, Bitbucket, and source control systems like Git and SVN, and with development tools like Jenkins and Artifactory.
Environment: PySpark, MapReduce, HDFS, Sqoop, flume, Kafka, Hive, Pig, HBase, SQL, Shell Scripting, Eclipse, SQL Developer, Git, SVN, JIRA, Unix.
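A minimal sketch of the external REST API calls mentioned in this role; the endpoint URL, parameters, and response fields are hypothetical.

```python
# Minimal sketch of calling an external REST API for geocode/weather lookups.
# The endpoint URL, parameters and response fields are hypothetical.
import requests

def get_weather(lat: float, lon: float) -> dict:
    response = requests.get(
        "https://api.example.com/v1/weather",      # placeholder endpoint
        params={"lat": lat, "lon": lon, "units": "metric"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    data = get_weather(32.78, -96.80)
    print(data.get("temperature"), data.get("conditions"))
```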
Confidential
Data Warehouse Developer
Responsibilities:
- Created, manipulated, and supported SQL Server databases.
- Involved in data modeling and the physical and logical design of the database.
- Helped in integration of the front end with the SQL Server backend.
- Created stored procedures, triggers, indexes, user-defined functions, constraints, etc. on various database objects to obtain the required results.
- Imported and exported data from one server to other servers using tools like Data Transformation Services (DTS).
- Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries.
- Transferred data from various data sources and business systems, including MS Excel, MS Access, and flat files, to SQL Server using SSIS/DTS with features such as data conversion; also created derived columns from existing columns per the given requirements.
- Supported the team in resolving SQL Reporting Services and T-SQL issues; proficient in creating and formatting different types of reports such as cross-tab, conditional, drill-down, top N, summary, form, OLAP, and sub-reports.
- Provided application support via phone. Developed and tested Windows command files and SQL Server queries for production database monitoring in 24/7 support.
- Created logging for ETL loads at the package and task level to record the number of records processed by each package and each task using SSIS (see the sketch at the end of this section).
- Developed, monitored and deployed SSIS packages.
Environment: IBM WebSphere DataStage EE/7.0/6.0 (Manager, Designer, Director, Administrator), Ascential Profile Stage 6.0, Ascential QualityStage 6.0, Erwin, TOAD, Autosys, Oracle 9i, PL/SQL, SQL, UNIX Shell Scripts, Sun Solaris, Windows 2000.
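A minimal sketch of package- and task-level ETL load logging as described in this role; the original implementation used SSIS event handlers, so this Python/pyodbc analogue, its connection string, and the ETL_Log table are illustrative assumptions only.

```python
# Minimal sketch of package/task-level ETL load logging.
# The connection string and the ETL_Log table are hypothetical; the original
# implementation used SSIS event handlers, shown here as a Python/pyodbc analogue.
import pyodbc
from datetime import datetime

CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=sqlprod01;DATABASE=ETL_Audit;Trusted_Connection=yes")

def log_task(package_name: str, task_name: str, rows_processed: int, status: str) -> None:
    # Insert one audit row per task execution; the connection context manager commits on exit.
    with pyodbc.connect(CONN_STR) as conn:
        conn.execute(
            """INSERT INTO dbo.ETL_Log
                   (PackageName, TaskName, RowsProcessed, Status, LoggedAt)
               VALUES (?, ?, ?, ?, ?)""",
            package_name, task_name, rows_processed, status, datetime.now(),
        )

if __name__ == "__main__":
    log_task("LoadSales.dtsx", "Load_FactSales", 125000, "Succeeded")
```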