
Azure Data Engineer Resume


PROFESSIONAL SUMMARY:

  • Proficient IT professional with 7+ years of experience as a Big Data Engineer, ETL Developer, and Software Engineer who has designed, developed, and implemented data models for enterprise-level systems.
  • Hands-on experience with Azure Cloud services, management tools, migration, storage, network & content delivery, Active Directory, Azure Container Service, VPN Gateway, content delivery management, Azure Storage Services, and Azure Database Services.
  • Experience in Azure Cloud Services.
  • Experience in Azure platform development and deployment concepts, hosted cloud services, platform services, and close interface with Windows Azure Multi-Factor Authentication.
  • Understand business requirements, create cloud-based solutions to meet them, and architect cloud applications on Azure.
  • Worked on deploying SQL databases to Virtual Machines and using Azure Tables for non-relational data.
  • Created and configured AWS Auto Scaling Groups for EC2 instances.
  • Configured AWS EC2 instances in a Virtual Private Cloud (VPC), S3 buckets, and CloudFormation services based on the requirements of different applications.
  • Experience in converting existing AWS infrastructure to serverless architecture (AWS Lambda, Kinesis), deployed via Terraform and AWS CloudFormation templates.
  • Experience in using Kinesis to analyze and stream data into AWS.
  • Developed infrastructure using technologies such as Kafka, Splunk, and Cassandra.
  • Used Kafka to load real-time data from multiple data sources into HDFS.
  • Experience in building real-time data pipelines with Kafka Connect and Spark Streaming.
  • Wrote AWS Lambda functions in Python and invoked Python scripts for data transformations and analytics on large data sets in EMR clusters and AWS Kinesis data streams.
  • Designed and built multiple Elastic MapReduce (EMR) clusters on AWS cloud for Hadoop MapReduce applications, enabling multiple proofs of concept.
  • Set up data models and generated a data lake on AWS Athena from S3 for use with AWS QuickSight.
  • Optimized AWS volumes and EC2 instances.
  • Migrated data from Hive using Spark SQL and the PySpark library.
  • Experience in machine learning methods such as Natural Language Processing, Naïve Bayes, Logistic Regression, and Decision Trees using Python/PySpark in a Cloudera Hadoop environment.
  • Experience in handling Python and Spark contexts when writing PySpark programs for ETL.
  • Carried out data transformation and cleansing using SQL queries, Python, and PySpark.
  • Wrote Python Boto3 scripts to automate launching, starting, and stopping EC2 instances and creating server snapshots. Implemented advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities through PySpark.
  • Used Airflow DAGs to schedule jobs and manage PySpark data pipeline dependencies.
  • Hands-on experience with the Django framework, PyCharm, and Airflow workflow management.
  • Wrote Python Boto3 scripts integrated with the AWS API to control instance operations (a minimal sketch follows this list).
  • Built Alteryx workflows to compare data sets from Azure and built the corresponding pipelines; scheduled the workflows on the Alteryx server and published the output to Tableau Server.
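
A minimal sketch of the Boto3-based EC2 automation described above; the region, tag filter, and snapshot description are illustrative assumptions rather than values from this resume:

    # Minimal sketch: EC2 start/stop and snapshot automation with Boto3.
    # Region, tag key, and volume selection are illustrative assumptions.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

    def instances_by_tag(tag_key, tag_value):
        """Return the IDs of instances carrying the given tag."""
        response = ec2.describe_instances(
            Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
        )
        return [
            instance["InstanceId"]
            for reservation in response["Reservations"]
            for instance in reservation["Instances"]
        ]

    def stop_environment(tag_value):
        """Stop every instance tagged Environment=<tag_value>."""
        ids = instances_by_tag("Environment", tag_value)
        if ids:
            ec2.stop_instances(InstanceIds=ids)

    def snapshot_volume(volume_id, description="nightly snapshot"):
        """Create a point-in-time snapshot of an EBS volume."""
        return ec2.create_snapshot(VolumeId=volume_id, Description=description)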

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Hadoop, MapReduce, Sqoop, Hive, Spark, Zookeeper, Cloudera

ETL Tools: Informatica

NoSQL Databases: HBase, Cassandra, DynamoDB

Monitoring & Reporting: Power BI, Tableau, Alteryx

Hadoop Distribution: Hortonworks, Cloudera

Programming & Scripting: Python, Scala, SQL, Shell Scripting

Databases: Oracle, MySQL, Teradata

Version Control: GIT

IDE Tools: Eclipse, Jupyter, Anaconda

Operating Systems: Linux, Unix, Mac OS-X, Windows 10, Windows 8, Windows 7

Cloud Computing: AWS (S3, Lambda, EC2, ECS, Kinesis, RDS, SDS, Glue), Azure (Azure Databricks, Azure Data Factory, Azure SQL, Azure Analytics, Azure Data Services)

Development Methodologies: Agile, Waterfall

PROFESSIONAL EXPERIENCE:

Confidential

Azure Data Engineer

Responsibilities:

  • Extracted, converted, and loaded data from various sources to Azure Data Storage services using Azure Data Factory and T-SQL for Data Lake Analytics.
  • Performed data transformations for MLOps, including adding calculated columns, maintaining relationships, establishing various metrics, merging & appending queries, changing values, splitting columns, and grouping by date & time columns.
  • Ingested data into Azure services, including Azure Data Lake, Azure Storage, Azure SQL, and Azure DW, and processed the data in Azure Databricks.
  • Using Server Manager, created batches and sessions to transport data at preset intervals and on-demand.
  • Combined several data connections and produced multiple joins across different data sources for data preparation.
  • Developed Python scripts to do file validations in Databricks and used ADF to automate the process.
  • Developed audit, balancing, and control architecture utilizing SQL DB audit tables to manage the Azure ingestion, transformation, and load processes.
  • Created tables in Azure SQL DW for business-related data reporting and visualization.
  • Utilized Power BI desktop to generate visualization reports, dashboards, and KPI scorecards.
  • Conceived, designed, and implemented ETL solutions with SQL Server Integration Services (SSIS).
  • Created pipelines in Azure Data Factory using Linked Services, Datasets, and Pipelines to extract, transform, and load data from a variety of sources, including Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back.
  • Created CI/CD pipelines using Azure DevOps.
  • Used native integration with Azure Active Directory (Azure AD) and other Azure services to build modern data warehouse, machine learning, and real-time analytics solutions.
  • Used a blend of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics) to gather, convert, and load data from source systems to Azure Data Storage services.
  • Configured Spark Streaming to receive real-time data from Kafka and used backpressure to control message queuing on the topic (a minimal sketch follows this list).
  • Connected several applications to the existing database; created databases, schema objects, tables, and indexes, and developed functions, stored procedures, and triggers.
  • Normalized and de-normalized existing tables, using joins and indexes effectively for query optimization and fast retrieval.
  • Created and monitored notifications for data integration events (success/failure).
  • Collaborated with product managers, scrum masters, and engineers to build Agile processes and documentation projects for retrospectives, the backlog, and meetings.
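
A minimal PySpark sketch of the Kafka-to-Spark streaming ingestion mentioned above. The broker address, topic name, and sink/checkpoint paths are assumptions; the resume cites backpressure for classic Spark Streaming, while here the maxOffsetsPerTrigger option plays the analogous rate-limiting role for Structured Streaming:

    # Minimal sketch: Structured Streaming read from Kafka with rate limiting.
    # Requires the spark-sql-kafka connector package on the classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
        .option("subscribe", "events")                        # assumed topic
        .option("maxOffsetsPerTrigger", 10000)  # caps records per micro-batch
        .load()
        .select(col("key").cast("string"), col("value").cast("string"))
    )

    query = (
        events.writeStream.format("parquet")
        .option("path", "/mnt/datalake/events")               # assumed sink path
        .option("checkpointLocation", "/mnt/checkpoints/events")
        .start()
    )
    query.awaitTermination()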

Environment: Azure Data Storage, Azure Data Factory, Azure Services, Azure SQL Server, Azure Data Warehouse, MySQL, ETL, Kafka, Power BI, SQL Database, T-SQL, U-SQL, Azure Data Lake, Azure Databricks, SQL Server Integration Services (SSIS).

Confidential

AWS Data Engineer

Responsibilities:

  • Designed and deployed multi-tier applications using AWS services (EC2, Route53, S3, RDS, DynamoDB, SNS, SQS, IAM), with an emphasis on high availability, fault tolerance, and auto scaling, via AWS CloudFormation.
  • Developed and maintained an appropriate data pipeline design.
  • Responsible for loading data from the internal server and the Snowflake data warehouse into S3 buckets.
  • Created Airflow DAGs for batch processing to orchestrate Python data pipelines that prepare CSV files before ingestion, using conf to parameterize runs for a multitude of input files from different hospitals and launching separate TaskInstances (a minimal DAG sketch follows this list).
  • Wrote Python scripts using the Boto3 library to automatically launch instances on AWS EC2 and OpsWorks stacks, integrating Auto Scaling with preset AMIs.
  • Constructed the framework for effective data extraction, transformation, and loading (ETL) from several data sources.
  • Launched Amazon EC2 instances and configured them for specific applications using Amazon Web Services (Linux/Ubuntu).
  • Extensive work was performed to migrate data from Snowflake to S3 for the TMCOMP/ESD feeds.
  • Extensively utilized AWS Athena to import structured data from S3 into several systems, including Redshift, and to provide reports. Utilized Spark Streaming APIs to execute on-the-fly conversions and operations for the common learner data model, which obtains data from Kinesis in near real time.
  • Created Snowflake views for loading and unloading data from and to an AWS S3 bucket, as well as deploying the code to production.
  • Extracted and processed data from source systems and fed it into CSV data files using Python and SQL queries.
  • Utilized Informatica Power Center Workflow manager to generate sessions, workflows, and batches to execute with the logic encoded within the mappings.
  • Created DAGs in Airflow to automate the process with scheduled Python jobs.
  • Analyzed Hive data with the Spark API on an EMR cluster running Hadoop YARN; enhanced existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and Pair RDDs.
  • Assisted with the development of Hive tables and the loading and analysis of data using Hive queries.
  • Used Python (matplotlib, NumPy, pandas, seaborn) for exploratory data analysis and data visualization.
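
A minimal Airflow DAG sketch for the parameterized, pre-ingestion CSV preparation described above. The hospital names, paths, and the prepare_csv callable are illustrative assumptions, and the import path assumes Airflow 2.x:

    # Minimal sketch: batch DAG parameterized via dag_run.conf, one task per input source.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator  # Airflow 2.x import path

    def prepare_csv(hospital, **context):
        """Validate and clean one hospital's CSV drop before ingestion."""
        source = context["dag_run"].conf.get("source_dir", "/data/incoming")  # assumed default
        print(f"Preparing {source}/{hospital}.csv")  # placeholder for real cleansing logic

    with DAG(
        dag_id="csv_pre_ingestion",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        for hospital in ["hospital_a", "hospital_b"]:  # assumed inputs
            PythonOperator(
                task_id=f"prepare_{hospital}",
                python_callable=prepare_csv,
                op_kwargs={"hospital": hospital},
            )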

Environment: AWS services, AWS EC2, AWS S3, DynamoDB, SNS, AWS Athena, Amazon EC2 Cloud Instances, Boto3, AMI, ETL, Hive, Spark API, Spark SQL, Hive Tables, Python, Spark, HDFS, Sqoop, MySQL, Linux, Snowflake.

Confidential

Hadoop Developer

Responsibilities:

  • Created Scala packages and UDFs in Spark for data aggregation, querying, and writing data back into OLTP systems using DataFrames/SQL/Datasets and RDD/MapReduce.
  • Proven track record of tuning Spark application performance for batch interval time, level of parallelism, and memory usage.
  • Used Spark Context, Spark-SQL, Data Frames, and Pair RDDs to improve current Hadoop approaches.
  • Executed complex processes such as text analytics and processing using Spark's in-memory computing capabilities.
  • Ingested large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, transformations, and other techniques.
  • Designed, built, and supported Hadoop and RDBMS data integration projects with traditional and non-traditional source systems and RDBMS and NoSQL data storage for data access and analysis.
  • Created a proof of concept comparing Impala and Apache Hive processing times for batch applications before adding Impala to the project.
  • Developed a proof of concept (POC) with Apache Spark using Scala to integrate Spark into the project.
  • Used Apache Spark to ingest Kafka data and to load and transform large volumes of structured, semi-structured, and unstructured data.
  • Helped transfer data from the Linux file system to HDFS; used HDFS, Hive, and Sqoop to import and export data, with Hive partitioning, dynamic partitioning, and bucketing (see the PySpark sketch after this list).
  • Contributed to creating the HBase Data Model from the existing Oracle Data Model.
  • Using Zookeeper services, developed high-availability and automatic-failover infrastructure to eliminate the NameNode's single point of failure.
  • Coordinated with several teams to finish hardware and software installation on the production cluster.
  • Used MCS to monitor the functioning of the Hadoop cluster and worked with NoSQL databases such as HBase.
  • Created Hive tables, helped with data loading, and wrote Hive UDFs.
  • Worked with the Linux server admin team to administer the server hardware and operating systems.
  • Directly collaborated with data analysts to create novel solutions for their analytical tasks, including maintaining and evaluating Hadoop and HBase log files.
  • Used Sqoop to import and export Oracle data to HDFS and Hive.
  • Collaborated with application teams to provide operating system and Hadoop updates, fixes, and version upgrades.
  • Wrote shell scripts to automate data extraction from numerous databases and load it into Hadoop.
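
A minimal PySpark sketch of the Hive dynamic-partition loading mentioned above. The database, table, column names, and staging path are illustrative assumptions, and the target Hive table is assumed to already exist:

    # Minimal sketch: load a partitioned Hive table using dynamic partitioning via Spark SQL.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("hive-load")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Allow Hive to create partitions on the fly from the data itself.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    staged = spark.read.option("header", True).csv("/staging/transactions")  # assumed path and schema
    staged.createOrReplaceTempView("staged_transactions")

    # Partition column (load_date) must come last in the SELECT for dynamic partition inserts.
    spark.sql("""
        INSERT INTO TABLE warehouse.transactions PARTITION (load_date)
        SELECT txn_id, amount, load_date
        FROM staged_transactions
    """)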

Environment: Scala, Spark scripts, Spark, Spark SQL, Hadoop, RDBMS, NoSQL, Impala, Apache Hive, Apache Spark, Kafka, Linux, HDFS, Sqoop, Zookeeper services, MCS, Hive UDFs.

Confidential

Data Analyst

Responsibilities:

  • Identified and documented accurate business rules and use cases based on requirements analysis.
  • Investigated and resolved issues about data flow integrity into databases.
  • Tested data prediction systems against historical data.
  • Analyzed transactions to provide a consistent business intelligence model for real-time reporting needs.
  • Drafted and optimized SQL scripts to evaluate the flow of online quotes into the database and validate the data.
  • Collected and iteratively refined management specifications to complement existing pivot-table reporting with high-quality Excel dashboard graphics.
  • Developed and maintained application SQL, PL/SQL stored procedures, triggers, partitions, primary keys, indexes, constraints, and views.
  • Created and ran new and existing SQL queries feeding business intelligence (BI) tools, including Jupyter notebooks (Python), Tableau dashboards, and Excel reports (a minimal pandas sketch follows this list).
  • Extracted and analyzed data patterns to translate insights into practical outcomes.
  • Created multiple Excel documents to help collect metrics data and present information to stakeholders to provide straightforward explanations of optimal resource deployment.
  • Analyzed datasets provided by more than 20 APIs and developed reports from underlying tables and fields to offer the leadership team a better understanding of crucial data trends.
  • Designed and delivered interactive dashboards with a range of charts for easy comprehension.
  • In Tableau, created joins, relationships, data blending (when merging several sources), calculated fields, Level-of-Detail (LOD) expressions, parameters, hierarchies, sorting, and groups, and applied actions (filter, highlight, and URL) to dashboards.
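
A minimal pandas sketch of pulling quote data with SQL into a notebook and building a pivot-style summary, comparable to the Excel reporting described above. SQLite is used here as a stand-in connection, and the table and column names are assumptions:

    # Minimal sketch: SQL query into pandas, then a pivot-table summary.
    import sqlite3
    import pandas as pd

    conn = sqlite3.connect("quotes.db")  # assumed local database file

    quotes = pd.read_sql_query(
        "SELECT quote_date, region, premium FROM online_quotes", conn
    )

    # Average premium per region per day, mirroring a pivot-table report.
    summary = quotes.pivot_table(
        index="quote_date", columns="region", values="premium", aggfunc="mean"
    )
    print(summary.head())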

Environment: Jupyter Notebook, SQL, Python, Tableau, Microsoft Excel.

Confidential

Python Developer

Responsibilities:

  • Worked on an online logistics platform to increase data storage efficiency.
  • Used the Django framework.
  • Hands-on experience with database issues and connections to SQL and NoSQL databases such as MongoDB via the installation and configuration of various Python packages.
  • Used Git version control within a team of programmers to organize and track development changes.
  • Used Python core packages and modules such as NumPy to boost data processing and analysis efficiency.
  • Developed REST APIs for web applications to improve web system interoperability.
  • Wrote Python code in an integrated development environment, such as Visual Studio Code, to perform Git and Terminal activities more simply.
  • Worked in a Linux environment and ran Unix-based commands
  • Developed complex SQL queries and PL/SQL procedures.
  • Used Python's XML parser framework (SAX) and DOM API to track small amounts of data without the requirement for a database.
  • Used the Python package Beautiful Soup for web scraping (a minimal sketch follows this list).
  • Contributed to the automation of VLAN, Trunk port, and Routing configurations.
  • Designed the Linux Services to run REST web services using a Shell script.
  • Created an RPM package for the product to allow feature upgrades.
  • Designed and developed REST API test cases; participated in developing REST API test framework.
  • Designed and built test cases for CLI automation in Python.
  • Used the PyUnit framework for unit testing and constructed unit test cases.
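
A minimal sketch of web scraping with requests and Beautiful Soup as referenced above; the URL and the CSS selector are illustrative assumptions, not details from this resume:

    # Minimal sketch: fetch a page and extract headings with Beautiful Soup.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get("https://example.com/releases", timeout=10)  # assumed URL
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    titles = [heading.get_text(strip=True) for heading in soup.select("h2.release-title")]  # assumed selector
    print(titles)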

Environment: Django, SQL, NoSQL, MongoDB, Python, REST, XML parser framework, DOM, RPM, PyUnit framework.
