
Azure Data Engineer Resume


New Jersey

SUMMARY

  • Around 9 years of experience in the software industry, including 5 years of experience with AWS and Azure cloud services and 4 years in a data warehouse and ETL developer role.
  • Extensive experience deploying cloud-based applications using Amazon Web Services such as Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, and DynamoDB.
  • Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline that writes to the Glue Data Catalog and can be queried from Athena.
  • Proven expertise in deploying major software solutions for various high-end clients to meet business requirements such as big data processing, ingestion, analytics, and cloud migration from on-prem to AWS using AWS EMR, S3, and DynamoDB.
  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL Database, Blob Storage, and Azure SQL Data Warehouse, and to write data back to those sources.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse, as well as controlling and granting database access and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
  • Demonstrated understanding of the Fact/Dimension data warehouse design model, including star and snowflake design methods.
  • Designed and developed logical and physical data models that utilize concepts such as Star Schema, Snowflake Schema, and Slowly Changing Dimensions.
  • Hands-on experience across the Hadoop ecosystem, including extensive experience with Big Data technologies such as HDFS, MapReduce, YARN, Apache Cassandra, HBase, Hive, Oozie, Impala, Pig, Zookeeper, Flume, Kafka, Sqoop, and Spark.
  • Extensive hands-on experience tuning Spark jobs.
  • Experienced in working with structured data using HiveQL, and optimizing Hive queries.
  • Built real-time data pipelines by developing Kafka producers and Spark Streaming applications to consume the data.
  • Experienced in improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, the DataFrame API, Spark Streaming, and pair RDDs, working extensively with PySpark.
  • Familiar with data processing performance optimization techniques such as dynamic partitioning, bucketing, file compression, and cache management in Hive, Impala, and Spark.
  • Excellent understanding and knowledge of handling database issues and connections with SQL and NoSQL databases such as MongoDB, Cassandra, HBase, and SQL Server.
  • Experience in dimensional data modeling concepts such as star join schema modeling, snowflake modeling, fact and dimension tables, and physical and logical data modeling.
  • Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables (see the sketch after this list).
  • Experience in designing and implementing RDBMS tables, views, user-defined data types, indexes, stored procedures, cursors, triggers, and transactions.
  • Strong team player, ability to work alone as well as in a team, capacity to adapt to a quickly changing environment, dedication to learning. Have good communication, project management, documentation, and interpersonal skills.
  • Experience in creating and maintaining reporting and analytics infrastructure for internal business clients using AWS services such as Athena, Redshift, Redshift Spectrum, EMR, and QuickSight.
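
The bullet on partitioned Parquet Hive tables above is illustrated by the following minimal PySpark sketch; the database, table, and column names (staging.raw_avro_events, analytics.events_parquet, event_date, user_id) are hypothetical placeholders, not values from an actual engagement.

    from pyspark.sql import SparkSession

    # Build a Hive-enabled session and request Snappy compression for Parquet output.
    spark = (
        SparkSession.builder
        .appName("avro-to-parquet")
        .config("spark.sql.parquet.compression.codec", "snappy")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read the existing Avro-backed Hive table (hypothetical name).
    events = spark.table("staging.raw_avro_events")

    # Write a partitioned, bucketed Parquet Hive table.
    (
        events.write
        .format("parquet")
        .partitionBy("event_date")
        .bucketBy(32, "user_id")
        .sortBy("user_id")
        .mode("overwrite")
        .saveAsTable("analytics.events_parquet")
    )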

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala

Hadoop Distribution: Cloudera CDH, Hortonworks HDP, Apache, AWS

Languages: Shell scripting, SQL, PL/SQL, Python, R, PySpark, Pig, HiveQL, Scala, Regular Expressions

Web Technologies: HTML, JavaScript, RESTful, SOAP

Operating Systems: Windows (98/2000/XP/7/8/10), UNIX, Linux, Ubuntu, CentOS, macOS

Version Control: Git, GitHub

IDE & Tools, Design: Eclipse, Visual Studio, NetBeans, JUnit, CI/CD, SQL Developer, MySQL Workbench, Tableau

Databases: Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, MS Access, Snowflake, NoSQL databases (HBase, MongoDB)

Cloud Technologies: AWS (EC2, EMR, Lambda, IAM, S3, Athena, Glue, Kinesis, CloudWatch, RDS, Redshift), Azure (Data Factory, Data Lake, Databricks, Logic Apps)

Data Engineer/Big Data Tools / Cloud / Visualization / Other Tools: Databricks, Hadoop Distributed File System (HDFS), Hive, Pig, Sqoop, MapReduce, Flume, YARN, Hortonworks, Cloudera, Mallis, Oozie, Zookeeper, etc.; AWS S3, EC2, Athena, Glue, Kinesis; Azure Databricks, Azure Data Explorer, Azure HDInsight; Salesforce, Linux, Bash Shell, Unix, etc.; Power BI, SAS, Crystal Reports, Dashboard Design.

PROFESSIONAL EXPERIENCE

Azure Data Engineer

Confidential, New Jersey

Responsibilities:

  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back to those sources.
  • Strong experience leading multiple Azure Big Data and data transformation implementations in the Banking and Financial Services, High Tech, and Utilities industries.
  • Implemented large Lambda architectures using Azure Data platform capabilities like Azure Data Lake, Azure Data Factory, Azure Data Catalog, HDInsight, Azure SQL Server, Azure ML and Power BI.
  • Designed end-to-end scalable architecture to solve business problems using various Azure Components like HDInsight, Data Factory, Data Lake, Storage and Machine Learning Studio.
  • Developed JSON Scripts for deploying the Pipeline in Azure Data Factory (ADF) that process the data using the SQL Activity.
  • Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Implemented and developed Hive bucketing and partitioning.
  • Implemented Kafka and Spark Structured Streaming for real-time data ingestion (see the streaming sketch after this list).
  • Used Delta Lake for scalable metadata handling and unified streaming and batch processing.
  • Used Delta Lake time travel, as data versioning enables rollbacks, full historical audit trails, and reproducible machine learning experiments (see the Delta Lake sketch after this list).
  • Leveraged Delta Lake merge, update, and delete operations to enable complex use cases.
  • Used Azure Databricks as a fast, easy, and collaborative Spark-based platform on Azure.
  • Used Databricks to integrate easily with the whole Microsoft stack.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Used Azure Data Catalog, which helps organize data assets and get more value from existing investments.
  • Used Azure Synapse to bring these capabilities together with a unified experience to ingest, explore, prepare, manage, and serve data for immediate BI and machine learning needs.
  • Followed the organization-defined naming conventions for naming flat file structures, Talend jobs, and the daily batches that execute the Talend jobs.
  • Created Talend jobs to copy files from one server to another and utilized Talend FTP components.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
  • Collected JSON data from an HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
  • Used Azure Data Factory with the SQL API and MongoDB API to integrate data from MongoDB, MS SQL, and cloud stores (Blob Storage, Azure SQL DB, Cosmos DB).
  • Analyzed SQL scripts and redesigned them using PySpark SQL for faster performance.
  • Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with different schemas into Hive ORC tables (see the CSV-to-ORC sketch after this list).
  • Worked on reading and writing multiple data formats like JSON, ORC, Parquet on HDFS using PySpark.
  • Performed all necessary day-to-day Git support for different projects and was responsible for maintaining the Git repositories and access control strategies.
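
A minimal streaming sketch of the Kafka-to-Spark Structured Streaming ingestion referenced above; the broker address, topic, schema, and storage paths are assumed placeholders rather than actual project values.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType, TimestampType

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    # Placeholder schema for the incoming JSON events.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_time", TimestampType()),
    ])

    # Subscribe to the Kafka topic as a streaming DataFrame (placeholder broker/topic).
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "events")
        .option("startingOffsets", "latest")
        .load()
    )

    # Parse the JSON payload and append it to a Delta table with checkpointing.
    parsed = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")
    query = (
        parsed.writeStream
        .format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/events")
        .outputMode("append")
        .start("/mnt/delta/events")
    )
    query.awaitTermination()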
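
A short Delta Lake sketch of the merge and time-travel capabilities mentioned above, as they would be used on Databricks; the table path, staging path, and join key are illustrative assumptions.

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-merge").getOrCreate()

    # Upsert incoming changes into an existing Delta table (illustrative paths/keys).
    target = DeltaTable.forPath(spark, "/mnt/delta/customers")
    updates = spark.read.parquet("/mnt/staging/customer_updates")

    (
        target.alias("t")
        .merge(updates.alias("u"), "t.customer_id = u.customer_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

    # Time travel: read the table as of an earlier version for audits or rollbacks.
    v0 = spark.read.format("delta").option("versionAsOf", 0).load("/mnt/delta/customers")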
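
A brief CSV-to-ORC sketch of the PySpark loading pattern described above; the landing path and target table name are hypothetical.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("csv-to-orc")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read landing-zone CSV files with header and schema inference (hypothetical path).
    df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/data/landing/csv/")
    )

    # Append into a Hive-managed ORC table (hypothetical name).
    df.write.format("orc").mode("append").saveAsTable("curated.events_orc")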

Environment: Hadoop 2.x, Hive v2.3.1, Spark v2.1.3, Databricks, Lambda, Glue, Azure data grid, Azure Synapse Analytics, Azure Data Catalog, Service Bus, ADF, Delta Lake, Blob, Cosmos DB, Python, PySpark, Java, Scala, SQL, Sqoop v1.4.6, Kafka, Airflow v1.9.0, Oozie, HBase, Oracle, Teradata, Cassandra, Mallis, Tableau, Maven, Git, Jira.

AWS Data Engineer

Confidential, San Ramon, California

Responsibilities:

  • Developed various data loading strategies and performed various transformations for analyzing the datasets using the Hortonworks Distribution of the Hadoop ecosystem.
  • Operated with big data components such as HDFS, Spark, YARN, Hive, HBase, Druid, Sqoop, and Pig.
  • Performed end-to-end architecture and implementation assessment of various AWS services such as Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis.
  • Applied transformations using Databricks and Spark data analysis after cleaning the data.
  • Involved in creating database objects such as tables, views, stored procedures, triggers, packages, and functions using T-SQL to provide structure and maintain data efficiently.
  • Developed a reconciliation process to ensure the Elasticsearch index document count matches source records, using Python and Flask.
  • Implemented data ingestion from various source systems using Sqoop and PySpark.
  • Hands-on experience implementing Spark and Hive job performance tuning.
  • Developed a data ingestion pipeline from HDFS into AWS S3 buckets using Apache NiFi.
  • Created external and permanent tables in Snowflake on the AWS data (see the Snowflake sketch after this list).
  • Used Flume to collect, aggregate, and store the weblog data from different sources like web servers, mobile and network devices, and pushed to HDFS.
  • Migrated data between traditional RDBMSs and HDFS using Sqoop.
  • Implemented Hive bucketing and partitioning.
  • Ingested data into HDFS from Teradata, Oracle, and MySQL; identified required tables and views and exported them into Hive.
  • Used AWS Athena extensively to ingest structured data from S3 into other systems such as Redshift and to produce reports (see the Athena sketch after this list).
  • For faster data access, performed ad-hoc queries using Hive joins, partitioning, and bucketing techniques.
  • Responsible for collecting, scrubbing, and extracting data from various sources to generate reports, dashboards, and analytical solutions; helped debug Tableau dashboards.
  • Worked in Agile development environment and participated in daily scrum and other design related meetings.
  • Worked extensively on migrating/rewriting existing Oozie jobs to AWS Simple Workflow.
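
An Athena sketch of querying S3-backed data with boto3, as referenced above; the database, table, and results bucket are placeholder names.

    import time

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Start an Athena query over S3 data (placeholder database/table/bucket).
    execution = athena.start_query_execution(
        QueryString="SELECT event_type, COUNT(*) AS cnt FROM web_logs GROUP BY event_type",
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query finishes, then fetch the result rows.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]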
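
A Snowflake sketch of defining an external table over S3 data through the Python connector, as referenced above; the connection parameters, storage integration, stage, and columns are illustrative assumptions.

    import snowflake.connector

    # Illustrative connection parameters.
    conn = snowflake.connector.connect(
        account="my_account",
        user="etl_user",
        password="***",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="RAW",
    )
    cur = conn.cursor()

    # External stage over the S3 bucket (assumes a storage integration already exists).
    cur.execute("""
        CREATE STAGE IF NOT EXISTS s3_events_stage
        URL = 's3://my-data-bucket/events/'
        STORAGE_INTEGRATION = s3_int
    """)

    # External table that reads Parquet files directly from the stage.
    cur.execute("""
        CREATE EXTERNAL TABLE IF NOT EXISTS events_ext (
            event_id STRING AS (value:event_id::STRING),
            event_time TIMESTAMP AS (value:event_time::TIMESTAMP)
        )
        LOCATION = @s3_events_stage
        FILE_FORMAT = (TYPE = PARQUET)
    """)

    cur.close()
    conn.close()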

Environment: Python, Databricks, PySpark, Kafka, PyCharm, ADF V2, AWS (EMR, EC2, S3), Data Lake, Snowflake, Hive, Flume, Apache NiFi, Shell scripting, SQL, Pig, Sqoop v1.4.4, Oozie v4.1.0, Oracle, SQL Server, Tableau, Agile Methodology, Hadoop, HDFS, Spark

Data Engineer

Confidential

Responsibilities:

  • Designed robust, reusable, and scalable data-driven solutions and data pipeline frameworks to automate the ingestion, processing, and delivery of both structured and semi-structured batch and real-time streaming data.
  • Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
  • Created a PySpark framework to bring data from DB2 to Amazon S3.
  • Applied efficient and scalable data transformations on the ingested data using Spark framework.
  • Gained good knowledge in troubleshooting and performance tuning Spark applications and Hive scripts to achieve optimal performance.
  • Developed various custom UDFs in Spark for performing transformations on date fields and complex string columns and for encrypting PII fields (see the UDF sketch after this list).
  • Wrote complex Hive scripts for performing various data analyses and creating reports requested by business stakeholders.
  • Used Oozie and Oozie Coordinators for automating and scheduling our data pipelines.
  • The Spark-Streaming APIs were used to conduct on-the-fly transformations and actions for creating the common learner data model, which receives data from Kinesis in near real time.
  • Implemented data ingestion from various source systems using Sqoop and PySpark.
  • Hands-on experience implementing Spark and Hive job performance tuning.
  • Used Hive as the primary query engine on EMR and built external table schemas for the data being processed.
  • Hands-on experience creating and managing PostgreSQL instances on Amazon RDS.
  • Installed, configured, tested, monitored, upgraded, and tuned new and existing PostgreSQL databases.
  • Worked extensively on automating the creation and termination of EMR clusters as part of starting the data pipelines (see the EMR sketch after this list).
  • Good experience working with analysis tools such as Tableau and Splunk for regression analysis, pie charts, and bar graphs.
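
A UDF sketch of the date-normalization and field-masking transformations described above; the column names, date format, and hashing-based masking scheme are illustrative assumptions.

    import hashlib

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date, udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-example").getOrCreate()

    # Illustrative UDF: one-way hash of a sensitive column before persisting it.
    @udf(returnType=StringType())
    def mask_pii(value):
        if value is None:
            return None
        return hashlib.sha256(value.encode("utf-8")).hexdigest()

    df = spark.read.parquet("/data/raw/customers")  # hypothetical input path

    cleaned = (
        df.withColumn("signup_date", to_date(col("signup_date"), "MM/dd/yyyy"))
          .withColumn("ssn_hash", mask_pii(col("ssn")))
          .drop("ssn")
    )
    cleaned.write.mode("overwrite").parquet("/data/curated/customers")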
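
An EMR sketch of automating transient cluster creation and termination with boto3, as mentioned above; the release label, instance types, and IAM roles are placeholder values.

    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    # Spin up a transient cluster for a pipeline run (placeholder configuration).
    response = emr.run_job_flow(
        Name="nightly-etl",
        ReleaseLabel="emr-5.30.0",
        Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    cluster_id = response["JobFlowId"]

    # Terminate explicitly once the pipeline finishes (the cluster also auto-terminates
    # when KeepJobFlowAliveWhenNoSteps is False and all submitted steps complete).
    emr.terminate_job_flows(JobFlowIds=[cluster_id])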

Data Warehouse Developer

Confidential

Responsibilities:

  • Created, manipulated, and supported SQL Server databases.
  • Involved in data modeling and the physical and logical design of databases.
  • Helped in integration of the front end with the SQL Server backend.
  • Created stored procedures, triggers, indexes, user-defined functions, constraints, etc. on various database objects to obtain the required results.
  • Imported and exported data from one server to other servers using tools such as Data Transformation Services (DTS).
  • Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries (see the sketch after this list).
  • Transferred data from various data sources/business systems, including MS Excel, MS Access, and flat files, to SQL Server using SSIS/DTS with features such as data conversion; in addition, created derived columns from existing columns for the given requirements.
  • Supported the team in resolving SQL Reporting Services and T-SQL issues; proficient in creating and formatting different types of reports such as cross-tab, conditional, drill-down, Top N, summary, form, OLAP, and sub-reports.
  • Provided application support via phone; developed and tested Windows command files and SQL Server queries for production database monitoring in 24/7 support.
  • Developed, monitored and deployed SSIS packages.
  • Generated multiple Enterprise reports (SSRS/Crystal/Impromptu) from SQL Server Database (OLTP) and SQL Server Analysis Services Database (OLAP) and included various reporting features such as group by, drilldowns, drill through, sub-reports, navigation reports (Hyperlink) etc.
  • Created different parameterized reports (SSRS 2005/2008) with report criteria to minimize report execution time and limit the number of records returned.
  • Worked on all report types, such as tables, matrices, charts, and sub-reports.
  • Created linked reports, ad-hoc reports, etc. based on requirements; linked reports were created on the Report Server to reduce report duplication.
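
For illustration only, a small Python/pyodbc sketch of the kind of parameterized T-SQL referenced above; the connection string, table, and stored procedure are hypothetical.

    import pyodbc

    # Hypothetical connection string for the reporting database.
    conn = pyodbc.connect(
        "DRIVER={SQL Server};SERVER=db-server;DATABASE=SalesDW;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    # Parameterized T-SQL keeps plans reusable and avoids string concatenation.
    cursor.execute(
        "SELECT OrderID, OrderDate, TotalDue FROM dbo.Orders "
        "WHERE OrderDate >= ? AND RegionID = ?",
        ("2010-01-01", 3),
    )
    for row in cursor.fetchall():
        print(row.OrderID, row.OrderDate, row.TotalDue)

    # Call a reporting stored procedure with a parameter (hypothetical procedure name).
    cursor.execute("EXEC dbo.usp_MonthlySalesSummary @Year = ?", (2010,))
    conn.close()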

Environment: Microsoft Office, Windows 2007, T-SQL, DTS, SQL Server 2008, HTML, SSIS, SSRS, XML.
