Azure Data Engineer Resume
Dallas, Texas
SUMMARY
- IT professional with 7+ years of experience as a Data Engineer & ETL Developer, including implementation of data models for enterprise-level applications.
- Created, monitored, & restored Azure SQL databases; migrated Microsoft SQL Server databases to Azure SQL Database.
- Experience with Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Big Data technologies (Apache Spark), & Databricks.
- Developed ETL pipelines in & out of the data warehouse using a mix of Python & Snowflake's SnowSQL, & wrote SQL queries against Snowflake (a minimal loading sketch follows at the end of this summary).
- Extensive experience developing & implementing cloud architecture on Microsoft Azure.
- Excellent understanding of connecting Azure Data Factory V2 with a range of data sources & processing the data utilizing pipelines, pipeline parameters, activities, activity parameters, & manual/window-based/event-based trigger scheduling.
- Created a connection from Azure to an on-premises data center using Azure ExpressRoute for single & multi-subscription setups.
- Excellent understanding of technologies on systems that include huge amounts of data & run in a highly distributed fashion in Cloudera, Hortonworks Hadoop distributions, & Amazon AWS.
- Working knowledge of AWS databases such as ElastiCache (Memcached & Redis) & NoSQL databases such as HBase, Cassandra, & MongoDB, as well as database performance tuning & data modeling.
- Worked on ETL Migration services by creating & deploying AWS Lambda functions to provide a serverless data pipeline that can be written to Glue Catalog & queried from Athena.
- Experience in Analytics & cloud migration from on-premises to AWS Cloud with AWS EMR, S3, & DynamoDB.
- Experience in creating & managing reporting & analytics infrastructure for internal business clients using AWS services including Athena, Redshift, Redshift Spectrum, EMR, & QuickSight.
- Extensive expertise with Amazon Web Services such as Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, & DynamoDB.
- Proficiency in setting up the CI/CD pipelines using Jenkins, GitHub, Chef, Terraform & AWS.
- Working knowledge of Python programming with a variety of packages such as NumPy, Matplotlib, SciPy, & Pandas.
- Integrated Jenkins with Docker containers using the CloudBees Docker Pipeline plugin & provisioned EC2 instances using the Amazon EC2 plugin.
- Extensive experience creating Web Services with the Python programming language, including implementation of JSON-based RESTful & XML-based SOAP web services.
- Experienced in writing complex Python scripts with Object-Oriented principles such as class creation, constructors, overloading, & modules.
- Experience establishing & maintaining multi-node development & production Hadoop clusters.
- Worked with Spark to improve the speed & optimization of current Hadoop algorithms utilizing Spark Context, Spark-SQL, Data Frame, Pair RDD, & Spark YARN.
- Experience with Hortonworks Ambari in building & maintaining multi-node development & production Hadoop clusters with various Hadoop components (HIVE, PIG, SQOOP, OOZIE, FLUME, HCATALOG, HBASE, ZOOKEEPER).
- Worked with the MapReduce programming paradigm & the Hadoop Distributed File System (HDFS).
- Expertise in all aspects of the Software Development Life Cycle (SDLC), including Agile & Waterfall techniques.
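A minimal sketch of the Python-to-Snowflake loading pattern referenced above, assuming the snowflake-connector-python package; the credentials (read from environment variables), warehouse, database, stage, & table names are illustrative placeholders rather than project artifacts:

```python
import os
import snowflake.connector  # pip install snowflake-connector-python

# Connection settings come from environment variables; every name below
# (warehouse, database, schema, tables) is an illustrative placeholder.
conn = snowflake.connector.connect(
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Stage a local extract into the table stage, then bulk-load it (SnowSQL-style COPY).
    cur.execute("PUT file:///tmp/orders.csv @%STG_ORDERS OVERWRITE = TRUE")
    cur.execute("""
        COPY INTO STG_ORDERS
        FROM @%STG_ORDERS
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
    # Simple transform-and-load from staging into the warehouse fact table.
    cur.execute("""
        INSERT INTO ANALYTICS.CORE.FACT_ORDERS
        SELECT order_id, customer_id, order_ts::timestamp, amount::number(12,2)
        FROM STG_ORDERS
        WHERE amount IS NOT NULL
    """)
finally:
    conn.close()
```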
TECHNICAL SKILLS
Hadoop/Big Data: Hadoop, MapReduce, Sqoop, Hive, Oozie, Spark, Zookeeper, Cloudera Manager, Kafka, Flume.
Amazon AWS: EMR, EC2, EBS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB, Redshift, ECS, QuickSight, Kinesis.
Microsoft Azure: Databricks, Data Lake, Blob Storage, Azure Data Factory, SQL Database, SQL Data Warehouse, Cosmos DB, Azure Active Directory.
Monitoring & Reporting: PowerBI, Tableau, Custom shell scripts
Hadoop Distribution: Hortonworks, Cloudera
Application Servers: Apache Tomcat, JDBC, ODBC
Build Tools: Maven
Programming & Scripting: Python, Scala, SQL, Shell Scripting
Databases: Oracle, MySQL, Teradata
Version Control: GIT
IDE Tools: Eclipse, Jupyter, Anaconda
Operating Systems: Linux, Unix, Mac OS X, CentOS, Windows 10, Windows 8, Windows 7
ETL Tools: Informatica
NoSQL Databases: HBase, Cassandra, DynamoDB, MongoDB.
Containers & Orchestration: Docker, Kubernetes
Development Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, Dallas, Texas
Azure Data Engineer
Responsibilities:
- Worked with data transfer from on-premises SQL servers to cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
- Created pipelines in Azure Data Factory using Linked Services, Datasets, & Pipelines to extract, transform, & load data from a variety of sources, including Azure SQL, Blob Storage, & Azure SQL Data Warehouse, & to write data back in the reverse direction.
- Created CI/CD pipelines using Azure DevOps.
- Created infrastructure using ARM templates & automated deployments with Azure DevOps pipelines.
- Integrated data storage options with Spark, notably with Azure Data Lake Storage and Blob storage.
- Ingestion of data into one or more Azure Services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) & processing of data in Azure Databricks.
- Worked directly with the Big Data Architecture Team, which created the foundation of this Enterprise Analytics initiative in a Hadoop-based Data Lake.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large amounts of data sets to determine the optimal way to aggregate & report on it.
- Developed simple to complex MapReduce jobs using Hive to cleanse & load downstream data.
- Created partitioned tables in Hive & managed & reviewed Hadoop log files (a PySpark sketch of the partitioned-table load follows this role's environment list).
- Involved in creating Hive tables, loading them with data, & writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze partitioned & bucketed data & compute various metrics for reporting.
- Load & transform large sets of structured, semi-structured & unstructured data & manage data coming from different sources.
- Parsed high-level design specifications to simple ETL coding & mapping standards.
- Designed & customized data models for the Data Warehouse, supporting data from multiple sources in real time.
- Involved in building the ETL architecture & Source to Target mapping to load data into the Data warehouse.
- Extracted the data from the flat files & other RDBMS databases into the staging area & populated it in the Data warehouse.
- Used various transformations like Filter, Expression, Sequence Generator, Update Strategy, Joiner, Stored Procedure, & Union to develop robust mappings in the Informatica Designer.
- Developed mapping parameters & variables to support SQL override.
- Created mapplets for reuse across different mappings.
- Developed mappings to load into staging tables & then to Dimensions & Facts.
- Used existing ETL standards to develop these mappings.
- Worked on different tasks in Workflows such as sessions, event raise, event wait, decision, e-mail, command, worklets, assignment, timer, & scheduling of the workflow.
- Created sessions, configured workflows to extract data from various sources, transformed data, & loaded it into the data warehouse.
Environment: Azure Cloud, Azure HDInsight, DataBricks (ADBX), CosmosDB, Azure SQL Server, Azure Data Warehouse, MySQL, Azure DevOps, Azure AD, Azure Data Lake, Git, Blob Storage, Data Factory, Data Storage Explorer, Scala, Spark v2.0.2, Airflow, Hive, Sqoop, HBase
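A minimal PySpark sketch of the partitioned Hive table load described in the responsibilities above, assuming a Spark session with Hive support; the storage path, database, table, & column names are illustrative placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support lets saveAsTable/spark.sql target the Hive metastore.
spark = (SparkSession.builder
         .appName("clickstream-hive-load")   # illustrative app name
         .enableHiveSupport()
         .getOrCreate())

# Read raw JSON landed in the data lake (path is a placeholder).
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/clickstream/")

# Light cleansing plus a partition column derived from the event timestamp.
events = (raw
          .filter(F.col("user_id").isNotNull())
          .withColumn("event_date", F.to_date("event_ts")))

# Write as a partitioned Hive table so downstream queries can prune by event_date.
(events.write
       .mode("overwrite")
       .partitionBy("event_date")
       .saveAsTable("analytics.clickstream_events"))

# Hive-side aggregation that runs as a distributed job.
spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM analytics.clickstream_events
    GROUP BY event_date
""").show()
```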
Confidential
AWS Data Engineer
Responsibilities:
- Involved in importing data from various data sources into HDFS using Sqoop, applying various transformations using Hive & Apache Spark, & then loading the data into Hive tables or AWS S3 buckets.
- Extensively used AWS Athena to import structured data from S3 into other systems such as Redshift & to generate reports (a minimal boto3 Athena sketch follows this role's environment list).
- Developed pipelines for migrating data from Oracle DB to the AWS Data Lake, using Glue & Lambda as needed.
- Created Apache Presto & Apache Drill configurations on an AWS EMR (Elastic MapReduce) cluster to integrate different databases such as MySQL & Hive, allowing comparison of operations such as joins & inserts across many data sources controlled from a single platform.
- Proposed & implemented improvements to increase process efficiency & effectiveness, providing input to solution designs to ensure consistent, secure, & fault-tolerant AWS solutions. Used AWS services such as EC2 & S3 for data set processing & storage. Experienced in maintaining a Hadoop cluster on AWS EMR.
- Involved in the development of the new AWS Fargate API, which is comparable to the ECS run task API.
- Experience in implementing CI/CD processes using AWS CodeCommit, CodeBuild, CodeDeploy, CodePipeline, Jenkins, Bitbucket Pipelines, & Elastic Beanstalk.
- Processed raw data at scale on the Hadoop big data platform, loading disparate data sets from various environments.
- Developed ETL data flows in Scala using Hadoop & Spark ecosystem components.
- Implemented Spark applications in Scala for faster testing & processing of data.
- Explored Spark to improve the performance & optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, RDDs, & Spark on YARN.
- Implemented advanced Spark procedures such as text analytics & processing using Spark's in-memory computing capabilities.
- Led the development of large-scale, high-speed, low-latency data solutions in the areas of large-scale data manipulation, long-term data storage, data warehousing, low-latency retrieval systems, & real-time reporting & analytics applications.
- Monitored the SQL scripts & modified them for improved performance using PySpark SQL.
- Enhanced & optimized product Spark code to aggregate, group, & run data mining tasks using the Spark framework.
- Proven track record of optimizing Spark application performance for optimal batch interval time, parallelism level, & memory optimization.
- Implemented Spark best practices such as partitioning, caching, & checkpointing for faster processing.
- Wrote jobs for processing unstructured data into structured data for analysis, pre-processing, matching & ingesting data.
- Created various analytical reports using Hive & HiveQL in the MapReduce Hadoop environment.
- Involved in designing various configurations of Hadoop & Hive for better performance.
Environment: Spark, Spark Streaming, Spark SQL, AWS EMR, AWS Athena, MapReduce, HDFS, Hive, Apache Sqoop, Python, PySpark, Shell scripting, Linux, MySQL, Oracle Enterprise DB, Jenkins, Eclipse.
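A minimal boto3 sketch of the Athena reporting pattern referenced above; the region, database, table, & S3 output location are illustrative placeholders:

```python
import time
import boto3  # AWS SDK for Python

athena = boto3.client("athena", region_name="us-east-1")  # region is an assumption

# Database, table, and result bucket below are illustrative placeholders.
resp = athena.start_query_execution(
    QueryString="""
        SELECT order_date, SUM(amount) AS revenue
        FROM sales_db.orders
        GROUP BY order_date
        ORDER BY order_date
    """,
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/reports/"},
)
query_id = resp["QueryExecutionId"]

# Poll until Athena finishes scanning the S3 data.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows[1:]:  # first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])
```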
Confidential
Data Engineer
Responsibilities:
- Wrote MapReduce code to parse data from various sources & stored the parsed data in HBase & Hive.
- Imported data from different relational data sources such as Oracle & Teradata into HDFS using Sqoop.
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Wrote ETL jobs using Spark data pipelines to process data from different sources & transform it for multiple targets.
- Created Scala applications for loading/streaming data into NoSQL databases (MongoDB) & HDFS.
- Created streams using Spark, processed real-time data into RDDs & DataFrames, & created analytics using Spark SQL (a comparable PySpark Structured Streaming sketch follows this role's environment list).
- Developed distributed high-performance systems with Spark and Scala.
- Involved in client meetings, explained views, & supported requirements gathering.
- Designed data models for dynamic & real-time data to be used by various applications with OLAP & OLTP needs.
- Hands-on experience importing & exporting data between relational databases & HDFS, Hive, & HBase using Sqoop.
- Experienced in writing Python ETL frameworks & PySpark jobs to process massive amounts of data daily.
- Used Python to extract, transform, & load source data from transaction systems & generated reports, insights, & key conclusions.
- Effectively communicated plans, project status, project risks, & project metrics to the project team & planned test strategies within the project scope.
Environment: Hortonworks 2.0, Hadoop, Hive v1.0.0, HBase, Sqoop v1.4.4, PySpark, Python, Druid, Kafka v0.8.1, SQL, Teradata, Oracle, NoSQL, MongoDB, MySQL, Tableau v9.x, SVN, Jira.
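The streaming work in this role was done in Scala; the following is a comparable PySpark Structured Streaming sketch, assuming the spark-sql-kafka connector is available & using placeholder broker, topic, & field names:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-analytics").getOrCreate()

# Schema of the incoming JSON events (field names are illustrative assumptions).
schema = StructType([
    StructField("device_id", StringType()),
    StructField("metric", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read a stream from Kafka; broker address and topic are placeholders.
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "telemetry")
          .load())

# Kafka values arrive as bytes; parse them into typed columns.
events = (stream
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Windowed aggregation expressed with Spark SQL functions.
agg = (events
       .withWatermark("event_ts", "10 minutes")
       .groupBy(F.window("event_ts", "5 minutes"), "device_id")
       .agg(F.avg("metric").alias("avg_metric")))

# Write incremental results to the console for inspection.
query = (agg.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()
```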
Confidential
Data Analyst
Responsibilities:
- Performed data analysis & developed complex SQL queries based on requirements to generate data for mock-up reports.
- Performed statistical analysis using SQL, Python, & Excel.
- Participated in project reviews & team meetings to provide report updates.
- Created dashboards & report deliverables in Tableau, utilizing advanced features, capabilities, & designs.
- Created pivot tables in Excel & utilized VLOOKUPs & other Excel functionality for data presentation (a brief pandas sketch of this analysis pattern follows this role's environment list).
- Analyzed & designed data in a SQL Server database environment.
- Collaborated with business stakeholders, accountants, & programmers to develop as needed to meet the business needs on reporting & data analysis.
- Implemented ETL (extract, transform, & load) in SAS to import data from multiple sources such as mainframe, flat files, & spreadsheets to perform data analysis & validations & build tabular reports.
- Analyzed business workflow & system needs.
- Created dashboard reports & ad-hoc reports on a weekly & monthly basis.
- Worked closely with the team of data analysts in defining sources & content for the data warehouse component.
- Worked on integration of data from various sources, created comprehensive data mapping, & dataflow diagrams.
Environment: Informatica, LoadRunner 8.x, HP QC 10/11, SQL, PL/SQL, Tableau, Microsoft Power BI, Microsoft Excel, Agile & Scrum methodologies.
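A brief Python sketch of the SQL-plus-pivot analysis pattern above, assuming pandas & pyodbc against SQL Server; the connection string, table, & column names are illustrative placeholders:

```python
import pandas as pd
import pyodbc  # SQL Server ODBC driver; connection details below are placeholders

# Pull the reporting extract with a parameterized query.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=reporting-db;DATABASE=Sales;Trusted_Connection=yes;"
)
df = pd.read_sql(
    "SELECT region, product, order_month, revenue FROM dbo.MonthlyRevenue WHERE order_month >= ?",
    conn,
    params=["2020-01-01"],
)

# Pivot the extract the way the Excel pivot table was laid out.
pivot = pd.pivot_table(
    df,
    values="revenue",
    index="region",
    columns="order_month",
    aggfunc="sum",
    margins=True,  # adds an 'All' total row/column
)

# Basic descriptive statistics per region for the analysis write-up.
summary = df.groupby("region")["revenue"].agg(["count", "mean", "median", "std"])
print(pivot.head())
print(summary)
```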
Confidential, Bangalore, KA
Data Analyst
Responsibilities:
- Worked on design, development, and testing of mappings, sessions, and workflows to transfer the data from the policy center to BIC- Business Intelligence Center.
- Developed a solution to determine at which stage of a policy life cycle an underwriting issue occurred; worked on root cause analysis for policy cancellations using SAS and SQL.
- Conducted user interviews, gathering requirements, analyzing, and prioritizing Product Backlog.
- Designed and developed Use Cases, flow diagrams, and business functional requirements for Scrum.
- Functioned as a liaison between the Scrum Master, QA Manager, and End-Users in defect tracking prioritization, escalation, and resolution (Environment: Windows 7, Oracle, Mainframes, SharePoint, Structured data, Semi-Structured Data, Unstructured data.)
- Applied models and data to understand and predict infrastructure costs, presenting findings to stakeholders.
- Created interactive cohort analysis report in Tableau.
- Built forecasting using parameters, trend lines, and reference lines. Implemented security guidelines using user filters and row-level security. Performed Python data wrangling, web scraping, streaming data from sources, and data parsing (a short scraping sketch follows the environment list below).
Environment: Data Warehousing, Python/R, Snowflake, Redshift, Data Visualization (SAS/Tableau), Data Science Research Methods (Power BI), Statistical Computing Methods, Experimental Design & Analysis, JSON, SQL, PowerShell, Git, and GitHub.
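A short Python sketch of the web scraping and data wrangling step mentioned above, using requests, BeautifulSoup, and pandas; the URL, table id, and column names are hypothetical:

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Target page and table structure are hypothetical placeholders.
URL = "https://example.com/infrastructure-costs"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Parse an HTML table into a list of row dictionaries.
table = soup.find("table", {"id": "cost-table"})
headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = []
for tr in table.find_all("tr")[1:]:
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(dict(zip(headers, cells)))

# Wrangle into a typed DataFrame for downstream forecasting and visualization.
df = pd.DataFrame(rows)
df["monthly_cost"] = pd.to_numeric(df["monthly_cost"], errors="coerce")
print(df.describe())
```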