
Sr. Azure Data Engineer Resume


SUMMARY

  • Around 8 years of IT experience spanning data analytics engineering and programmer analyst roles. Experienced with cloud platforms including Amazon Web Services and Azure, and with Databricks on both Azure and AWS.
  • Proficient with complex workflow orchestration tools, namely Oozie, Airflow, AWS Data Pipeline and Azure Data Factory, as well as CloudFormation and Terraform.
  • Implemented data warehouse solutions covering ETL and on-premises-to-cloud migration, with solid expertise in building and deploying batch and streaming data pipelines in cloud environments.
  • Worked on Airflow 1.8 (Python 2) and Airflow 1.9 (Python 3) for orchestration; familiar with building custom Airflow operators and orchestrating workflows with dependencies spanning multiple clouds (see the DAG sketch after this summary).
  • Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX and Spark Streaming for processing and transforming complex data using in-memory computing, primarily written in Scala.
  • Worked with Spark to improve the efficiency of existing algorithms using Spark Context, Spark SQL, Spark MLlib, DataFrames, Pair RDDs and Spark on YARN.
  • Experience integrating various data sources such as Oracle SE2, SQL Server, flat files and unstructured files into a data warehouse.
  • Able to use Sqoop to migrate data between RDBMS, NoSQL databases and HDFS.
  • Experience in Extraction, Transformation and Loading (ETL) of data from various sources into data warehouses, as well as data processing such as collecting, aggregating and moving data using Apache Flume, Kafka, Power BI and Microsoft SSIS.
  • Hands-on experience with Hadoop architecture and components such as the Hadoop Distributed File System (HDFS), JobTracker, TaskTracker, NameNode, DataNode and Hadoop MapReduce programming.
  • Comprehensive experience in developing simple to complex MapReduce and streaming jobs using Scala and Java for data cleansing, filtering and aggregation, with detailed knowledge of the MapReduce framework.
  • Leveraged Spark as an ETL tool for building data pipelines on various cloud platforms such as AWS EMR, Azure HDInsight and MapR CLDB architectures.
  • Career interests and future aspirations include, but are not limited to, ML, AI, RPA and pervasive automation.
  • Follower of Spark for ETL, Databricks enthusiast, and cloud adoption and data engineering enthusiast in the open-source community.
  • Proven expertise in deploying major software solutions for various high-end clients meeting the business requirements such as Big Data Processing, Ingestion, Analytics and Cloud Migration from On-prem to Cloud.
  • Proficient with Azure Data Lake Storage (ADLS), Databricks and IPython notebook formats, Databricks Delta Lake, and Amazon Web Services (AWS).
  • Orchestration experience using Azure Data Factory, Airflow 1.8 and Airflow 1.10 on multiple cloud platforms, with a solid understanding of how to leverage Airflow operators.
  • Developed and deployed various Lambda functions in AWS using built-in AWS Lambda libraries, and also deployed Lambda functions in Scala with custom libraries.
  • Expert understanding of AWS DNS services through Route 53, including Simple, Weighted, Latency, Failover and Geolocation routing policies.
  • Architect and implement ETL and data movement solutions using Azure Data Factory (ADF) and SSIS.
  • Develop Power BI reports & effective dashboards after gathering and translating end-user requirements.
  • Recreating existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data warehouse environment.
  • Propose architectures that account for Azure cost/spend and develop recommendations to right-size data infrastructure. Strong virtualization experience across data center migration and Azure data services.
  • Experience in troubleshooting and resolving architecture problems including database and storage, network, security and applications.
  • Experience managing Big Data platforms deployed in Azure Cloud.
  • Implemented Copy activity, Custom Azure Data Factory Pipeline Activities for On-cloud ETL processing.
  • Experience in Monitoring and Tuning SQL Server Performance.
  • Experience in configuration of report server and report manager for job scheduling, giving permissions to a different level of users in SQL Server Reporting Services (SSRS).
  • Expert in creating, debugging, configuring, and deploying ETL packages designed in MS SQL Server Integration Services (SSIS). Configure the SQL Azure firewall as a security mechanism.
  • Comfortable wearing multiple hats: Azure architect/systems engineering, network operations and data engineering.
  • Design & implement migration strategies for traditional systems on Azure (Lift and shift/Azure Migrate, other third-party tools).
  • Collaborate with application architects on infrastructure as a service (IaaS) applications to Platform as a Service (PaaS). Deploy Azure Resource Manager JSON Templates from PowerShell.
  • Experience in Performance Tuning and Optimization (PTO), Microsoft Hyper-V virtual infrastructure. Fluent programming experience with Scala, Java, Python, SQL, T-SQL, R.
  • Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL, Kafka.
  • Adept at configuring and installing Hadoop/Spark Ecosystem Components.
  • Used IDEs like Eclipse, IntelliJ IDE, PyCharm IDE, Notepad++, and Visual Studio for development.
  • Seasoned practice in Machine Learning algorithms and Predictive Modeling such as Linear Regression, Logistic Regression, Naïve Bayes, Decision Tree, Random Forest, KNN, Neural Networks, and K-means clustering.
  • Ample knowledge of data architecture including data ingestion pipeline design, Hadoop/Spark architecture, data modeling, data mining, machine learning and advanced data processing.
  • Experience working with NoSQL databases like Cassandra and HBase and developed real-time read/write access to very large datasets via HBase.
  • Developed Spark Applications that can handle data from various RDBMS (MySQL, Oracle Database) and Streaming sources.
  • Proficient in SQL for querying, data extraction and transformation, and developing queries for a wide range of applications.
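
A minimal sketch of the kind of Airflow orchestration described above (Airflow 1.x style; the DAG id, schedule and task bodies are hypothetical placeholders, not from any specific project):

    # Hypothetical Airflow 1.x DAG: a two-step daily ETL with retries.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    def extract(**context):
        # Placeholder: pull data from a source system.
        print("extracting from source")

    def load(**context):
        # Placeholder: write transformed data to the warehouse.
        print("loading into warehouse")

    default_args = {"owner": "data-eng", "retries": 2,
                    "retry_delay": timedelta(minutes=5)}

    with DAG(dag_id="example_etl", default_args=default_args,
             start_date=datetime(2019, 1, 1),
             schedule_interval="@daily") as dag:
        extract_task = PythonOperator(task_id="extract",
                                      python_callable=extract,
                                      provide_context=True)
        load_task = PythonOperator(task_id="load",
                                   python_callable=load,
                                   provide_context=True)
        extract_task >> load_task  # load runs only after extract succeeds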

TECHNICAL SKILLS

Hadoop/Spark Ecosystem: Hadoop, MapReduce, Pig, Hive/Impala, YARN, Kafka, Flume, Oozie, Zookeeper, Spark, Airflow, MongoDB, Cassandra, HBase, and Storm.

Hadoop Distribution: Cloudera and Hortonworks

Programming Languages: Scala, Java (Hibernate, JDBC), SQL, R, Shell Scripting, JSON, HTML, CSS

Scripting Languages: JavaScript, jQuery, Python.

Databases: Oracle, SQL Server, MySQL, Cassandra, Teradata, PostgreSQL, MS Access, Snowflake, NoSQL, HBase, MongoDB

Cloud Platforms: AWS, Azure, GCP

Distributed Messaging System: Apache Kafka

Data Visualization Tools: Tableau, Power BI, SAS, Excel

Batch Processing: Hive, MapReduce, Pig, Spark

Operating System: Linux (Ubuntu, Red Hat), Microsoft Windows

Reporting Tools/ETL Tools: Informatica PowerCenter, Tableau, Pentaho, SSIS, SSRS, Power BI

PROFESSIONAL EXPERIENCE

Confidential

Sr. Azure Data Engineer

Responsibilities:

  • Involved in requirements gathering, design, implementation, deployment, testing and maintenance of applications to meet the organization's needs, using the SCRUM methodology.
  • Participated in scrum meetings and coordinated with Business Analysts to understand the business needs and implement the same into a functional design.
  • Used Azure Data Factory extensively for ingesting data from disparate source systems. Involved in requirements gathering, business analysis, design and development, testing and implementation of business rules.
  • Understood business use cases and integration needs, and wrote business and technical requirements documents, logic diagrams, process flow charts, and other application-related documents.
  • Used Pandas in Python for Data Cleansing and validating the source data.
  • Designed and developed ETL pipeline in Azure cloud which gets customer data from API and processes it to Azure SQL DB.
  • Orchestrated all Data pipelines using Azure Data Factory and built a custom alerts platform for monitoring.
  • Created custom alert queries in Log Analytics and used webhook actions to automate custom alerts.
  • Created Databricks job workflows which extract data from SQL Server and upload the files to SFTP using PySpark and Python (see the sketch after this list).
  • Used Azure Key Vault as a central repository for maintaining secrets, and referenced the secrets in Azure Data Factory and in Databricks notebooks.
  • Built Teradata ELT frameworks which ingest data from different sources using Teradata legacy load utilities.
  • Built a common SFTP download/upload framework using Azure Data Factory and Databricks. Maintained and supported the Teradata architectural environment for EDW applications.
  • Involved in the full lifecycle of projects, including requirements gathering, system design, application development, enhancement, deployment, maintenance and support.
  • Involved in logical modeling, physical database design, data sourcing and data transformation, data loading, SQL and performance tuning.
  • Provided project development estimates to the business and, upon agreement, delivered projects accordingly. Created proper Teradata Primary Indexes (PI), taking into consideration both planned access of data and even distribution of data across all available AMPs.
  • Considering both business requirements and other factors, created appropriate Teradata NUSIs for smooth (fast and easy) access of data.
  • Developed data extraction, transformation and loading jobs from flat files, Oracle, SAP, and Teradata sources into Teradata using BTEQ, FastLoad, FastExport, MultiLoad and stored procedures.
  • Designed process-oriented UNIX scripts and ETL processes for loading data into the data warehouse. Developed mappings in Informatica to load data from various sources into the data warehouse, using transformations like Source Qualifier, Expression, Lookup, Aggregator, Update Strategy, and Joiner.
  • Worked on Informatica advanced concepts, including implementation of Informatica Pushdown Optimization technology and pipeline partitioning.
  • Performed bulk data loads from multiple data sources (Oracle 8i, legacy systems) to the Teradata RDBMS using BTEQ, MultiLoad and FastLoad.
  • Used various transformations like Source Qualifier, Aggregator, Lookup, Filter, Sequence Generator, Router, Update Strategy, Expression, Sorter, Normalizer, Stored Procedure, Union, etc.
  • Used Informatica Power Exchange to handle the change data capture (CDC) data from the source and load into Data Mart by following slowly changing dimensions (SCD) Type II process.
  • Used Power Center Workflow Manager to create workflows, sessions, and also used various tasks like command, event wait, event raise, email.
  • Designed, created and tuned physical database objects (tables, views, indexes, PPI, UPI, NUPI, and USI) to support normalized and dimensional models.
  • Created a cleanup process for removing all the Intermediate temp files that were used prior to the loading process.
  • Used volatile tables and derived queries to break up complex queries into simpler ones.
  • Responsible for performance monitoring, resource and priority management, space management, user management, index management, access control, execute disaster recovery procedures.
  • Used Python and Shell scripts to Automate Teradata ELT and Admin activities.
  • Performed application-level DBA activities: creating tables and indexes, and monitoring and tuning Teradata BTEQ scripts using the Teradata Visual Explain utility.
  • Performance tuning, monitoring, UNIX shell scripting, and physical and logical database design.
  • Developed UNIX scripts to automate different tasks involved as part of the loading process.
  • Worked on Tableau software for the reporting needs.
  • Created Tableau dashboard reports, heat map charts and pie charts, and supported numerous dashboards built on the Teradata database.
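
A minimal sketch of the Databricks job pattern described above: a JDBC read from SQL Server with a Key Vault-backed secret, staging an extract, then an SFTP upload. Hostnames, secret scope, table and paths are all hypothetical; spark and dbutils are provided by the Databricks runtime, and paramiko is assumed to be installed on the cluster.

    # Hypothetical Databricks notebook job: SQL Server -> CSV -> SFTP.
    import glob
    import paramiko

    # Secrets come from an Azure Key Vault-backed secret scope.
    sql_pwd = dbutils.secrets.get(scope="kv-scope", key="sql-password")
    sftp_pwd = dbutils.secrets.get(scope="kv-scope", key="sftp-password")

    df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://myserver:1433;database=sales")
          .option("dbtable", "dbo.orders")
          .option("user", "etl_user")
          .option("password", sql_pwd)
          .load())

    # Stage a single-part CSV extract on DBFS.
    (df.coalesce(1).write.mode("overwrite")
       .option("header", "true").csv("/tmp/orders_extract"))
    local_part = glob.glob("/dbfs/tmp/orders_extract/part-*.csv")[0]

    # Upload the extract to the partner SFTP server.
    transport = paramiko.Transport(("sftp.example.com", 22))
    transport.connect(username="svc_user", password=sftp_pwd)
    sftp = paramiko.SFTPClient.from_transport(transport)
    sftp.put(local_part, "/inbound/orders.csv")
    sftp.close()
    transport.close()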

Environment: Azure Cloud, Azure Data Factory (ADF v2), Azure Function Apps, Azure Data Lake, Blob Storage, SQL Server, Teradata utilities, Windows Remote Desktop, UNIX shell scripting, Azure PowerShell, Databricks, Python, Erwin data modelling tool, Azure Cosmos DB, Azure Stream Analytics, Azure Event Hub, Azure Machine Learning.

Confidential, NJ

Sr. Data Engineer-AWS

Responsibilities:

  • Followed Agile Software Development Methodology to build the application iteratively and incrementally. Participated in scrum related activities and daily scrum meetings.
  • Involved in gathering requirements, design, implementation, deployment, testing and maintaining of the applications to meet the organization's needs using SCRUM methodology.
  • Helped developers automatically build and deploy software into production multiple times a day, safely, while maintaining compliance in a highly regulated financial industry.
  • Extensively used tools like Atlassian Bamboo, Bitbucket, Confluence, JIRA, Jenkins, Sonatype Nexus and Nexus IQ, SonarQube, Grunt, and Maven to get the job done.
  • Built Function-as-a-Service (FaaS) solutions: a category of cloud computing services that provides a platform allowing customers to develop, run, and manage application functionality without the complexity of building and maintaining the underlying infrastructure.
  • Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.
  • Created a Lambda deployment function and configured it to receive events from an S3 bucket (see the sketch after this list).
  • Designed the data models to be used in data intensive AWS Lambda applications which are aimed to do complex analysis creating analytical reports for end-to-end traceability, lineage, definition of Key Business elements from Aurora.
  • Wrote code that optimizes the performance of AWS services used by application teams, and provided code-level application security for clients (IAM roles, credentials, encryption, etc.).
  • Using SonarQube for continuous inspection of code quality and to perform automatic reviews of code to detect bugs. Managing AWS infrastructure and automation with CLI and API.
  • Created AWS Lambda functions using Python for deployment management in AWS; designed, investigated and implemented public-facing websites on Amazon Web Services and integrated them with other application infrastructure.
  • Created different AWS Lambda functions and API Gateways, allowing data submitted via API Gateway to be processed by Lambda functions.
  • Responsible for building CloudFormation templates for SNS, SQS, Elasticsearch, DynamoDB, Lambda, EC2, VPC, RDS, S3, IAM and CloudWatch service implementations, integrated with Service Catalog.
  • Performed regular monitoring activities on Unix/Linux servers, such as log verification, server CPU usage, memory checks, load checks and disk space verification, to ensure application availability and performance using CloudWatch and AWS X-Ray. Implemented the AWS X-Ray service inside Confidential; it allows development teams to visually detect node and edge latency distributions directly from the service map.
  • Design and Develop ETL Processes in AWS Glue to migrate Campaign data from external sources like S3, ORC/Parquet/Text Files into AWS Redshift.
  • Automate Datadog Dashboards with the stack through Terraform Scripts.
  • Developed file-cleaning utilities using Python libraries.
  • Utilized Python libraries like Boto3 and NumPy for AWS work.
  • Used Amazon EMR for MapReduce jobs and tested locally using Jenkins.
  • Data Extraction, aggregations and consolidation of Adobe data within AWS Glue using PySpark. Create external tables with partitions using Hive, AWS Athena and Redshift.
  • Developed PySpark code for AWS Glue jobs and for EMR.
  • Good understanding of other AWS services like S3, EC2, IAM and RDS, with experience in orchestration and data pipelines using AWS Step Functions, Data Pipeline and Glue.
  • Designed, developed, and deployed ETL pipelines using AWS services like Lambda, Glue, EMR, Step Functions, CloudWatch Events, SNS, Redshift, S3, IAM, etc.
  • Designed, developed, and deployed data warehouses on AWS Redshift, applying best practices.
  • Experience in writing SAM template to deploy serverless applications on AWS cloud.
  • Design, develop and implement next generation cloud infrastructure at Confidential.
  • Hands-on experience on working with AWS services like Lambda function, Athena, DynamoDB, Step functions, SNS, SQS, S3, IAM etc.
  • Developed internationalized multi-tenant SaaS solutions with responsive UIs using React or AngularJS, with NodeJS and CSS. Handled creation of indexes, forwarder and indexer management, Splunk Field Extractor (IFX), search head clustering, indexer clustering, and Splunk upgrades.
  • Installed and configured Splunk clustered search heads and indexers, deployment servers, and deployers.
  • Designed and implemented Splunk-based best practice solutions.
  • Designed and Developed ETL jobs to extract data from Salesforce replica and load it in data mart in Redshift. Responsible for Designing Logical and Physical data modelling for various data sources on Confidential Redshift. Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources.
  • Integrated lambda with SQS and DynamoDB with step functions to iterate through list of messages and updated the status into DynamoDB table.
  • Deployed microservices, including provisioning AWS environments, using Ansible playbooks.
  • Automated various infrastructure activities like continuous deployment, application server setup and stack monitoring using Ansible playbooks, and integrated Ansible with Jenkins.
  • Prepared projects, dashboards, reports and questions for all JIRA related services.
  • Built a POC to explore AWS Glue capabilities for data cataloging and data integration (see the sketch after this list).
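
A minimal sketch of the S3-triggered Lambda pattern referenced above; the audit table, bucket wiring (an S3 event notification) and field names are hypothetical.

    # Hypothetical S3-triggered Lambda: audit incoming objects to DynamoDB.
    import urllib.parse

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("ingest-audit")  # hypothetical audit table

    def handler(event, context):
        # An S3 event notification can carry several records.
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            size = record["s3"]["object"].get("size", 0)
            # Record the object so downstream jobs can trace lineage.
            table.put_item(Item={"object_key": key,
                                 "bucket": bucket,
                                 "size_bytes": size})
        return {"processed": len(event["Records"])}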

Confidential, Florida

Hadoop Engineer

Responsibilities:

  • Involved in importing the real time data to Hadoop using Kafka and implemented the Oozie job for daily data.
  • Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Developed Spark scripts using Python shell commands as per requirements.
  • Issued SQL queries via Impala to process the data stored in HDFS and HBase.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra.
  • Used a RESTful web services API to connect with MapR tables; the connection to the database was developed through this API.
  • Involved in developing Hive DDLs to create, alter and drop Hive tables, and worked with Storm and Kafka.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Experience in data migration from RDBMS to Cassandra. Created data-models for customer data using the Cassandra Query Language.
  • Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.
  • Involved in developing Spark scripts for data analysis in both Python and Scala. Designed and developed various modules of the application with J2EE design architecture.
  • Implemented modules using Core Java APIs, Java collection and integrating the modules.
  • Experienced in transferring data from different data sources into HDFS systems using Kafka producers, consumers and Kafka brokers
  • Installed Kibana using Salt scripts and built custom dashboards to visualize aspects of important data stored in Elasticsearch.
  • Used File System Check (FSCK) to check the health of files in HDFS, and used Sqoop to import data from SQL Server to Cassandra.
  • Streamed transactional data to Cassandra using Spark Streaming and Kafka (see the sketch after this list).
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
  • Created ConfigMap and DaemonSet files to install Filebeat on Kubernetes pods and send log files to Logstash or Elasticsearch, in order to monitor the different types of logs in Kibana.
  • Created a database in InfluxDB, worked on the interface created for Kafka, and checked the measurements on the databases.
  • Installed Kafka Manager for monitoring consumer lag and Kafka metrics; it was also used for adding topics, partitions, etc. Successfully generated consumer group lags from Kafka using its API.
  • Ran log aggregation, website activity tracking and commit logs for distributed systems using Apache Kafka.
  • Involved in creating Hive tables, and loading and analyzing data using hive queries.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing. Loaded data from different sources (databases and files) into Hive using the Talend tool.
  • Used Oozie and Zookeeper operational services for coordinating the cluster and scheduling workflows.
  • Implemented Flume, Spark, and the Spark Streaming framework for real-time data processing.
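
A compact sketch of the Kafka-to-Cassandra streaming path described above, shown with Structured Streaming and the spark-cassandra-connector for brevity; the broker, topic, keyspace and record schema are hypothetical.

    # Hypothetical Kafka -> Spark -> Cassandra stream. Requires the
    # spark-sql-kafka and spark-cassandra-connector packages.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import (DoubleType, StringType, StructField,
                                   StructType)

    spark = (SparkSession.builder.appName("txn-stream")
             .config("spark.cassandra.connection.host", "cassandra-host")
             .getOrCreate())

    schema = StructType([StructField("txn_id", StringType()),
                         StructField("account", StringType()),
                         StructField("amount", DoubleType())])

    txns = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")
            .option("subscribe", "transactions")
            .load()
            .select(from_json(col("value").cast("string"), schema).alias("t"))
            .select("t.*"))

    def write_batch(batch_df, batch_id):
        # Each micro-batch is appended to the Cassandra table.
        (batch_df.write.format("org.apache.spark.sql.cassandra")
         .options(keyspace="payments", table="transactions")
         .mode("append").save())

    txns.writeStream.foreachBatch(write_batch).start().awaitTermination()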

Environment: Hadoop, Python, HDFS, Hive, Scala, MapReduce, Agile, Cassandra, Kafka, Storm, AWS, YARN, Spark, ETL, Teradata, NoSQL, Oozie, Java, Talend, Linux, Kibana, HBase

Confidential

Java/Scala Developer

Responsibilities:

  • Involved in various phases of the Software Development Life Cycle (SDLC) such as requirements gathering, modeling, analysis, architecture design and development; the project was developed using Agile methodologies.
  • Used the Java Development Kit (JDK) 1.4 for Java/J2EE development.
  • Developed the E-commerce site using JSP, Servlet, EJBs, JavaScript, JDBC.
  • Experience in creating EJBs that implemented business logic.
  • Implemented the presentation layer using JSP, HTML and CSS, with client-side validations using JavaScript and jQuery.
  • Designed and developed GUI using JSP, HTML, XHTML and CSS.
  • Involved in both WebLogic Portal 9.2 for portal development and WebLogic 8.1 for Data Services programming. Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Developed Spark programs using Scala and Java APIs and performed transformations and actions on RDDs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python (see the sketch after this list). Developed ETL processes using Spark, Scala, Hive and HBase.
  • Developed REST APIs using Scala, Play framework and Akka.
  • Used ScalaTest for writing test cases and coordinated with QA team on end to end testing.
  • Developed REST APIs using Scala and Play framework to retrieve processed data from Cassandra database.
  • Developed UDFs in Java for Hive and Pig, and worked on reading multiple data formats on HDFS using Scala.
  • Used Scala collection framework to store and process the complex consumer information.
  • Used Scala functional programming concepts to develop business logic. Developed programs in JAVA, Scala-Spark for data reformation after extraction from HDFS for analysis.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Processed the schema oriented and non-schema-oriented data using Scala and Spark.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Provided architecture and design as the product was migrated to Scala, the Play framework and Sencha UI. Implemented applications with Scala along with Akka and the Play framework.
  • Expert in implementing advanced procedures like text analytics and processing using the in-memory computing capabilities like Apache Spark written in Scala.
  • Auction web app - calculated bids for energy auctions utilizing Scala, JPA and Oracle.
  • Built a Kafka-Spark-Cassandra Scala simulator for MetricStream, a big data consultancy, along with Kafka-Spark-Cassandra prototypes.
  • Developed a RESTful API in Scala for tracking open-source projects on GitHub and computing in-process metrics for those projects.
  • Developed analytical components using Scala, Spark, Apache Mesos and Spark Streaming.
  • Experience using the Docker container system with Kubernetes integration. Developed a web application using Java with the Google Web Toolkit API, PostgreSQL and Redis.
  • Creating a dashboard using Flask, Python libraries, and AngularJS to visualize their progress.
  • Improved site performance by making better use of caches via Memcached on Amazon Web Services.
  • Used R for prototyping on sample data to identify the best algorithmic approach, then wrote Scala scripts using Spark's machine learning module.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.
  • Implemented a Python-based distributed random forest via Python streaming.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, Caffe, TensorFlow, MLlib and Python with a broad variety of machine learning methods, including classification, regression and clustering.
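
A minimal sketch of the Hive-to-Spark conversion pattern mentioned above. The work described used Scala; the same translation is shown here in PySpark for brevity, and the table and column names are hypothetical.

    # The same aggregation as a Hive/SQL query and as a DataFrame chain.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder.appName("hive-to-spark")
             .enableHiveSupport().getOrCreate())

    # Original Hive/SQL form.
    sql_result = spark.sql("""
        SELECT customer_id, SUM(amount) AS total_spend
        FROM sales.orders
        WHERE order_date >= '2019-01-01'
        GROUP BY customer_id
    """)

    # Equivalent Spark DataFrame transformation.
    df_result = (spark.table("sales.orders")
                 .filter(F.col("order_date") >= "2019-01-01")
                 .groupBy("customer_id")
                 .agg(F.sum("amount").alias("total_spend")))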

Environment: Java 1.6, HTML, CSS, JSP, JSF, Spring 2.0, web services, microservices, Maven, JavaScript, jQuery, JUnit, Oracle 10g, WebSphere.
