
Sr. Azure Data Engineer Resume


Chicago, Naperville

SUMMARY

  • Overall, around 9 years of strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
  • Very strong interpersonal skills and the ability to work independently or within a group; quick learner who adapts easily to new working environments.
  • Solid experience and understanding of implementing large-scale data warehousing programs and end-to-end data integration solutions on Snowflake Cloud, AWS Redshift, Informatica Intelligent Cloud Services (IICS - CDI), and Informatica PowerCenter, integrated with multiple relational databases (MySQL, Teradata, Oracle, Sybase, SQL Server).
  • Expertise in using major Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Flume, Oozie, Zookeeper, and Hue.
  • Experience in importing and exporting data between HDFS and relational database systems using Sqoop, and loading it into partitioned Hive tables.
  • Developed custom Kafka producers and consumers for publishing to and subscribing from Kafka topics. Good working experience with Spark (Spark Streaming, Spark SQL) using Scala and Kafka; worked on reading multiple data formats on HDFS using Scala (see the streaming sketch after this list).
  • Experienced with Teradata utilities (FastLoad, MultiLoad, BTEQ scripting, FastExport, SQL Assistant) and tuning of Teradata queries using EXPLAIN plans. Worked on dimensional data modeling in Star and Snowflake schemas and Slowly Changing Dimensions (SCD).
  • Extensive knowledge in writing Hadoop jobs for data analysis as per the business requirement.
  • Designed, developed, and deployed data lakes, data marts, and data warehouses on AWS using S3, RDS, Redshift, Terraform, Lambda, Glue, EMR, Step Functions, CloudWatch Events, SNS, IAM, etc.
  • Designed, developed, and deployed data warehouses on AWS Redshift, applying best practices.
  • Designed, developed, and deployed data lakes, data marts, and data warehouses on Azure using ADLS Gen2, Blob Storage, Azure Data Factory, Databricks, Azure Synapse, Key Vault, and Event Hubs.
  • Experience in writing complex SQL queries, creating reports and dashboards.
  • Proficient with Unix-based command-line interfaces; expertise in handling ETL tools like Informatica.
  • Strong experience using PySpark, HDFS, MapReduce, Hive, Pig, Spark, Sqoop, Oozie, and HBase.
  • Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
  • Experienced working with various Hadoop Distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new features.
  • Experience in developing Spark applications using the Spark RDD, Spark SQL, and DataFrame APIs.
  • Worked with real-time data processing and streaming techniques using Spark streaming and Kafka.
  • Experience in moving data into and out of the HDFS and Relational Database Systems (RDBMS) using Apache Sqoop.
  • Expertise in working with Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and developing and tuning HQL queries.
  • Significant experience writing custom UDFs in Hive and custom Input Formats in MapReduce.
  • Involved in creating Hive tables, loading them with data, and writing ad-hoc Hive queries that run internally on MapReduce and Tez. Replaced existing MR jobs and Hive scripts with Spark SQL and Spark data transformations for more efficient data processing. Experience developing Kafka producers and consumers that stream millions of events per second.
  • Good knowledge in Database Creation and maintenance of physical data models with Oracle, Teradata, Netezza, DB2, MongoDB, HBase and SQL Server databases.
  • Deep understanding of MapReduce with Hadoop and Spark. Good knowledge of Big Data ecosystem like Hadoop 2.0 (HDFS, Hive, Pig, Impala), Spark (SparkSQL, Spark MLlib, Spark Streaming).
  • Experienced in writing complex SQL, including stored procedures, triggers, joins, and subqueries.
  • Interpret and solve business problems using data analysis, data mining, optimization tools, machine learning techniques, and statistics.
  • Built and supported large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
  • Experienced in fact and dimension modeling (Star schema, Snowflake schema), transactional modeling, and Slowly Changing Dimensions (SCD).
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala, Scala) and NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Integrated Kafka with Spark Streaming for real time data processing.
  • Skilled in data parsing, data manipulation, and data preparation, including methods for describing data contents.
  • Strong experience in the analysis, design, development, testing, and implementation of Business Intelligence solutions using data warehouse/data mart design, ETL, BI, and client/server applications, and in writing ETL scripts using regular expressions and custom tools (Informatica, Pentaho, and SyncSort).
  • Experienced with Hadoop ecosystem and Big Data components including Apache Spark, Scala, Python, HDFS, MapReduce, and Kafka.
  • Expert in designing Server jobs using various types of stages like Sequential file, ODBC, Hashed file, Aggregator, Transformer, Sort, Link Partitioner and Link Collector.
  • Proficiency in Big Data Practices and Technologies like HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Flume, Spark, Kafka.
  • Expertise in configuring monitoring and alerting tools, such as AWS CloudWatch, according to requirements.
  • Extensive experience in Text Analytics, generating data visualizations using R, Python and creating dashboards using tools like Tableau.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
  • Experienced with JSON-based RESTful web services and XML/QML-based SOAP web services; worked on various applications using Python-integrated IDEs like Sublime Text and PyCharm.
  • Developed web-based applications using Python, Django, Qt, C++, XML, CSS3, HTML5, DHTML, JavaScript, and jQuery.
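The sketch below is a minimal, hedged illustration of the Kafka and Spark Streaming work referenced above: a PySpark Structured Streaming job that consumes a Kafka topic and lands parsed events on HDFS. The broker address, topic name, event schema, and paths are hypothetical placeholders, and the job assumes the spark-sql-kafka connector is available.

```python
# Minimal sketch (assumptions: broker, topic, schema, and HDFS paths are
# placeholders; the spark-sql-kafka-0-10 connector is on the classpath).
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-events-stream").getOrCreate()

# Assumed layout of the JSON messages on the topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Subscribe to the (hypothetical) Kafka topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "user-events")
       .load())

# Kafka delivers key/value as binary; parse the value column as JSON.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Append parsed events to HDFS as Parquet with checkpointing.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events/parquet")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())
query.awaitTermination()
```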

PROFESSIONAL EXPERIENCE

Confidential, Chicago, Naperville

Sr. Azure Data Engineer

Responsibilities:

  • Develop deep understanding of the data sources, implement data standards, maintain data quality and master data management.
  • Build complex distributed systems involving large-scale data handling, metrics collection, data pipeline construction, and analytics.
  • Extract, transform, and load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Involved in data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL DW) and processing the data in Azure Databricks.
  • Analyze, design and build Modern data solutions using Azure PaaS service to support visualization of data. Understand current Production state of application and determine the impact of new implementation on existing business processes.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data to and from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and the write-back tool.
  • Involved in developing Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations (see the sketch after this list).
  • Create and maintain optimal data pipeline architecture in cloud Microsoft Azure using Data Factory and Azure Databricks
  • Involved in architecting and developing Azure Data Factory pipelines, creating the datasets and source/destination connections to move data from an Oracle database to the Azure Data Lake Store raw zone.
  • Create self-service reporting in Azure Data Lake Store Gen2 using an ELT approach.
  • Transfer data in logical stages from the system of record to raw, refined, and produce zones for easy translation and denormalization.
  • Set up Azure infrastructure such as storage accounts, integration runtimes, service principal IDs, and app registrations to enable scalable and optimized support for business users' analytical requirements in Azure.
  • Created Data Factory pipelines that bulk-copy multiple tables at once from relational databases to Azure Data Lake Gen2.
  • Created a custom logging framework for ELT pipeline logging using Append Variable activities in Data Factory.
  • Enabled monitoring and Azure Log Analytics to alert the support team on usage and statistics of the daily runs.
  • Involved in configuring and upgrading the on-premises data gateway between various data sources, such as SQL Server, and Azure Analysis Services and the Power BI service.
  • Design and develop business intelligence dashboards, analytical reports, and data visualizations in Power BI, creating multiple measures using DAX expressions for user groups such as the sales, operations, and finance teams.
  • Took proof-of-concept project ideas from the business, then led, developed, and created production pipelines that deliver business value using Azure Data Factory.
  • Kept our data separated and secure across national boundaries through multiple data centers and regions.
  • Utilized Ansible playbooks for code pipeline deployment; delivered denormalized data from the produce layer of the data lake to Power BI consumers for modeling and visualization.
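As a hedged sketch of the Databricks table-to-table work mentioned above, the PySpark snippet below reads a raw-zone dataset from ADLS Gen2, applies light cleansing, and writes it to the refined zone. Storage account, container, column names, and file formats are assumptions, and authentication is assumed to be already configured (for example via a Key Vault-backed secret scope and a service principal).

```python
# Illustrative raw-zone to refined-zone job (paths, columns, and formats are
# hypothetical; ADLS Gen2 authentication is assumed to be configured).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession.builder.appName("orders-raw-to-refined").getOrCreate()

raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/sales/orders/"
refined_path = "abfss://refined@examplestorage.dfs.core.windows.net/sales/orders/"

orders = spark.read.format("parquet").load(raw_path)

# Basic cleansing: de-duplicate on the assumed business key and derive a
# partition column from the order timestamp.
refined = (orders
           .dropDuplicates(["order_id"])
           .withColumn("order_date", to_date(col("order_ts")))
           .filter(col("order_status").isNotNull()))

(refined.write
 .mode("overwrite")
 .partitionBy("order_date")
 .format("parquet")
 .save(refined_path))
```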

Confidential, Charlotte, NC

AWS Developer

Responsibilities:

  • Designed and developed scalable and cost-effective architecture in AWS Big Data services for data life cycle of collection, ingestion, storage, processing, and visualization.
  • Developed a PySpark data ingestion framework to ingest source claims data into Hive tables, performing data cleansing and aggregations and applying de-dup logic to identify updated and latest records (see the de-dup sketch after this list).
  • Involved in creating End-to-End data pipeline within distributed environment using the Big data tools, Spark framework and Tableau for data visualization.
  • Worked on developing CloudFormation templates (CFTs) for migrating the infrastructure from lower environments to higher environments.
  • Leverage Spark features such as in-memory processing, distributed cache, broadcast variables, accumulators, and map-side joins to implement data preprocessing pipelines with minimal latency.
  • Experience in creating a Python topology script to generate the CloudFormation template for creating the EMR cluster in AWS.
  • Experience in using the AWS services Athena, Redshift and Glue ETL jobs.
  • Integrated Big Data Spark jobs with EMR and Glue to create ETL jobs for around 450 GB of data daily.
  • Created S3 bucket structure and Data Lake layout for optimal use of glue crawlers and S3 buckets.
  • Used the Hive-compatible Glue Data Catalog to obtain and validate the schema of the data, and Lake Formation for data governance.
  • Involved in loading data from AWS S3 to Snowflake and processed data for further analysis.
  • Developed analytical dashboards in Snowflake and shared data with downstream consumers.
  • Developed Spark Applications by using Scala and Python and Implemented Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Used Spark and PySpark for streaming and batch applications on many ETL jobs to and from data sources.
  • Developed new API Gateway for streaming to Kinesis and ingestion of event streaming data.
  • Worked on building data-centric queries for cost optimization in Snowflake.
  • Good knowledge on AWS Services like EC2, EMR, S3, Service Catalog, and Cloud Watch.
  • Experience in using Spark SQL to handle structured data from Hive in AWS EMR Platform.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, and transformations during the ingestion process itself.
  • Wrote unit test cases for Spark code as part of the CI/CD process.
  • Good knowledge of configuration management and CI/CD tools such as Bitbucket/GitHub and Bamboo.
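The snippet below is a hedged sketch of the de-dup logic described above: a window function keeps only the latest version of each claim before the data is written to Hive. The bucket, table, and column names are hypothetical.

```python
# De-dup sketch: keep the newest record per claim_id (names are placeholders).
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, row_number

spark = (SparkSession.builder
         .appName("claims-ingest")
         .enableHiveSupport()
         .getOrCreate())

claims = spark.read.parquet("s3://example-bucket/raw/claims/")

# Rank versions of each claim by update timestamp and keep the latest one.
w = Window.partitionBy("claim_id").orderBy(col("last_updated_ts").desc())
latest = (claims
          .withColumn("rn", row_number().over(w))
          .filter(col("rn") == 1)
          .drop("rn"))

(latest.write
 .mode("overwrite")
 .format("parquet")
 .saveAsTable("claims_db.claims_latest"))
```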

Confidential, Chicago, Illinois

Sr. AWS Data Engineer

Responsibilities:

  • Designed, Developed and Deployed Batch and streaming pipelines using AWS Services.
  • Developed data pipelines using cloud and container services like Docker and Kubernetes.
  • Developed high performance data pipelines using AWS Glue and PySpark Jobs in EMR cluster.
  • Developed multiple ETL pipelines to deliver data to the Stakeholders.
  • Designed and developed monitoring solution using AWS services like AWS CloudWatch, AWS IAM, AWS Glue and AWS QuickSight.
  • Used AWS services like Lambda, Glue, EMR, EC2, and EKS for data processing.
  • Used Spark and Kafka for building batch and streaming pipelines.
  • Developed Data Marts, Data Lakes and Data Warehouse using AWS services.
  • Extensive experience using AWS storage and querying tools like AWS S3, AWS RDS and AWS Redshift.
  • Evaluated and implemented next generation AWS Serverless Architecture.
  • Developed UDFs to perform standardization across the entire dataset.
  • Worked on end-to-end development of the ingestion framework using AWS Glue, IAM, CloudFormation, Athena, and REST APIs.
  • Worked on the core masking logic and created the masking utility using SHA-2 (see the sketch after this list).
  • Used Redshift to store the hashed and un-hashed values of each PII attribute and to map users to their email ID or Oxygen ID, which is unique per user.
  • Converted all existing datasets into GDPR-compliant datasets and pushed them to production.
  • Worked on the successful rollout of the project to production (PRD) among users so that the company is fully GDPR compliant.
  • Provided Architectural guidance, validation, and Implementation.
  • Wrote pipelines for automated hashing and un-hashing of datasets for business-critical scenarios.
  • Actively responsible for onboarding new technologies by evaluating tools or components which meet with functional and cost needs.
  • Created POC projects in dbt and wrote macros to calculate formula fields for Salesforce data.
  • Worked on improving operational efficiency by partnering with Business Partners and Data Scientists.
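As a hedged illustration of the SHA-2 masking utility mentioned above, the sketch below hashes a PII value deterministically with SHA-256 so the hash can be joined against a mapping table. The function name and salt handling are assumptions; a production version would pull the salt from a secrets manager.

```python
# Minimal SHA-2 masking sketch (function name and salt handling are assumed).
import hashlib
from typing import Optional


def mask_pii(value: Optional[str], salt: str) -> Optional[str]:
    """Return a deterministic SHA-256 hash of a PII value, or None for nulls."""
    if value is None:
        return None
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


# Example: hash an email so the raw value never leaves the pipeline, while the
# hash can still be mapped back through the (hypothetical) Redshift lookup.
hashed_email = mask_pii("user@example.com", salt="example-salt")
print(hashed_email)
```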

Confidential

Hadoop Developer

Responsibilities:

  • Creating end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization activities on user behavioral data.
  • Developed a custom input adaptor utilizing the HDFS FileSystem API to ingest clickstream log files from an FTP server into HDFS.
  • Developed an end-to-end data pipeline using the FTP adaptor, Spark, Hive, and Impala (see the sketch after this list).
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Handled importing other enterprise data from different sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HBase tables.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Created components like Hive UDFs for missing functionality in HIVE for analytics.
  • Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Created, validated, and maintained scripts to load data using Sqoop manually.
  • Utilized Oozie and its coordinators to deploy end-to-end data processing pipelines and schedule the workflows.
  • Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
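The sketch below illustrates the clickstream load step described above: cleansed log data written to a partitioned Hive table. The project used Scala, but the sketch is in PySpark for consistency with the rest of this page; paths, columns, and table names are hypothetical.

```python
# Clickstream-to-Hive sketch (paths, columns, and table names are placeholders).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = (SparkSession.builder
         .appName("clickstream-load")
         .enableHiveSupport()
         .getOrCreate())

logs = spark.read.json("hdfs:///landing/clickstream/")

# Basic validation plus a derived partition column.
cleansed = (logs
            .filter(col("user_id").isNotNull())
            .withColumn("event_date", to_date(col("event_ts"))))

(cleansed.write
 .mode("append")
 .partitionBy("event_date")
 .format("parquet")
 .saveAsTable("analytics.clickstream_events"))
```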

Confidential

Data Engineering Analyst

Responsibilities:

  • The Enterprise Insurance data warehouse is a conversion project that migrates existing data marts into one integrated place to gain the advantages of a corporate-wide data warehouse.
  • It involves rewriting/developing existing data marts and adding new subject areas to them, giving business users a platform to query across various subject areas using a single OLAP tool (Cognos).
  • Created the mapping design document to transfer data from source systems to the data warehouse and built an ETL pipeline that made analysts' jobs easier and reduced patients' treatment expenses by up to 40%.
  • Development of Informatica Mappings, Sessions, Worklets, Workflows.
  • Wrote shell scripts to monitor load on the database and Perl scripts to format data extracted from the data warehouse based on user requirements.
  • Designed, developed, and delivered jobs and transformations that enrich the data and progressively promote it for consumption in the delta lake layers (see the sketch after this list).
  • Managed multiple small projects with a team of 5 members, planned milestones, scheduled project milestones, and tracked project deliverables.
  • Performed network traffic analysis using data mining, the Hadoop ecosystem (MapReduce, HDFS, Hive), and visualization tools, considering raw packet data, network flow, and Intrusion Detection Systems (IDS).
  • Analyzed the company’s expenses on software tools and came up with a strategy to reduce those expenses by 30%.
  • Created a chatbot to receive complaints from customers and give them an estimated waiting time for issue resolution.
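As a hedged sketch of the enrich-and-promote pattern described above, the snippet below joins two refined-layer tables and writes the result to a curated delta lake layer. Paths, join keys, and the use of the Delta format (which assumes Databricks or the delta-spark package) are assumptions.

```python
# Enrichment sketch: refined layer -> curated delta layer (names are assumed).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("policy-enrichment").getOrCreate()

policies = spark.read.format("delta").load("/lake/refined/policies")
claims = spark.read.format("delta").load("/lake/refined/claims")

# Enrich policies with aggregated claim counts before promoting to curated.
claim_counts = (claims.groupBy("policy_id")
                .count()
                .withColumnRenamed("count", "claim_count"))
curated = policies.join(claim_counts, on="policy_id", how="left")

(curated.write
 .mode("overwrite")
 .format("delta")
 .save("/lake/curated/policies_enriched"))
```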
