Azure Data Engineer Resume
TX
SUMMARY
- Experienced Data Engineer with 6+ years of progressive experience in the analysis, design, development, and implementation of end-to-end production data pipelines.
- Expert in providing ETL solutions for any type of business model.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Experience writing Spark transformations and actions using Spark SQL in Scala.
- Handled a Confidential data cube on the Spark framework by writing Spark SQL queries in Scala, improving data-processing efficiency and reporting query response times.
- Good experience in writing Spark applications using Scala.
- Developed Spark code in Scala on Databricks.
- Implemented large Lambda architectures using Azure data platform capabilities such as Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Power BI.
- Experience in the design and development of scalable systems using Hadoop technologies in various environments, mostly with Python. Extensive experience analyzing data with the Hadoop ecosystem, including HDFS, MapReduce, Hive, and Pig.
- In-depth knowledge of Apache Hadoop architecture (1.x and 2.x) and Apache Spark 1.x architecture.
- Experience building data pipelines using Azure Databricks and PySpark (a minimal sketch follows this list).
- Developed SQL/HiveQL queries inside Azure Databricks.
- Experience in understanding Hadoop security requirements.
- Developed Spark applications using PySpark and Scala in the Azure Databricks environment.
- Used Azure Pipelines for CI/CD and production DevOps.
- Strong skills in visualization tools: Power BI and Confidential Excel (formulas, pivot tables, charts, and DAX commands).
- Proficient in SQL, PL/SQL, and Python coding.
- Experience developing on-premises and real-time processes.
- Developed DAGs using Airflow to manage real-time data pipelines (a minimal DAG example follows this list).
- Excellent understanding of enterprise data warehouse best practices; involved in full life-cycle data warehousing development.
- Expertise in DBMS concepts.
- Involved in building data models and dimensional modeling with 3NF, star, and snowflake schemas for OLAP and operational data store (ODS) applications.
- Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
- Skilled in designing and implementing ETL architectures for cost-effective, efficient environments.
- Optimized and tuned ETL processes and SQL queries for better performance.
- Performed complex data analysis and provided critical reports to support various departments.
- Worked with business intelligence tools such as Business Objects and data visualization tools such as Tableau.
- Extensive shell/Python scripting experience for scheduling and process automation.
- Good exposure to Development, Testing, Implementation, Documentation and Production support.
- Developed effective working relationships with client teams to understand and support requirements, developed tactical and strategic plans to implement technology solutions, and effectively managed client expectations.
- An excellent team member who can also work independently, with good interpersonal skills, strong communication skills, a strong work ethic, and a high level of motivation.
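A minimal PySpark sketch of the kind of Databricks pipeline mentioned above; the storage account, container paths, and column names are hypothetical placeholders, assuming raw CSV lands in the lake and curated Parquet is written back.

```python
# Illustrative sketch only: read raw CSV from the data lake, aggregate, and
# write curated Parquet back. Paths and columns are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curation-pipeline").getOrCreate()

raw = (spark.read
       .option("header", "true")
       .csv("abfss://raw@examplelake.dfs.core.windows.net/sales/"))

curated = (raw
           .withColumn("sale_date", F.to_date("sale_date"))
           .groupBy("region", "sale_date")
           .agg(F.sum("amount").alias("daily_amount")))

(curated.write
 .mode("overwrite")
 .partitionBy("sale_date")
 .parquet("abfss://curated@examplelake.dfs.core.windows.net/sales_daily/"))
```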
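A minimal Airflow DAG sketch illustrating the kind of scheduled pipeline mentioned above; the DAG id, schedule, and ingest function are hypothetical, and Airflow 2.x is assumed.

```python
# Illustrative sketch only: a daily DAG with a single Python ingest task.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_daily_files(**context):
    # Placeholder for the real extract/load logic (e.g., triggering a PySpark job).
    print("ingesting files for", context["ds"])

with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(
        task_id="ingest_daily_files",
        python_callable=ingest_daily_files,
    )
```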
TECHNICAL SKILLS
Languages: Scala, SQL, UNIX shell script, JDBC, Python, Spark, PySpark
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, HBase, Kafka, ZooKeeper, Sqoop, Oozie, DataStax & Apache Cassandra, Drill, Flume, Spark, NiFi
Cloud: Azure Databricks, Azure Blob Storage, Azure Virtual Machines, AWS
Web Technologies: JDBC, HTML5, DHTML, XML, CSS3, Web Services, WSDL
Databases: Snowflake (cloud), Teradata, IBM DB2, Oracle, SQL Server, MySQL, NoSQL, Cassandra
Data Warehousing: Informatica PowerCenter/PowerMart/Data Quality/Big Data, Pentaho, ETL development, Amazon Redshift, IDQ
Version Control Tools: SVN, GitHub, Bitbucket
BI Tools: Power BI, Tableau
Operating Systems: Windows, Linux, Unix, macOS
PROFESSIONAL EXPERIENCE
Confidential, TX
Azure Data Engineer
Responsibilities:
- Performed data profiling to understand behavior across features such as traffic pattern, location, and date and time.
- Developed data lakes using Azure Data Lake Storage and Blob Storage.
- Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Worked extensively with Azure Data Factory to create batch pipelines.
- Developed Spark applications using Scala and Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal sketch follows this list).
- Worked on designing and developing a real-time tax computation engine using Oracle, StreamSets, NiFi, Spark Structured Streaming, and MemSQL.
- Developed Spark code in Scala in the IntelliJ IDE using SBT.
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back to those sources.
- Implemented PySpark, utilizing DataFrames and the Spark SQL API for faster data processing.
- Worked on PySpark data sources, PySpark DataFrames, Spark SQL, and streaming using Scala.
- Worked extensively with Azure components such as Databricks, Virtual Machines, and Blob Storage.
- Experience developing PySpark applications using Scala SBT.
- Performed a POC to compare the time taken for Change Data Capture (CDC) of Oracle data across Striim, StreamSets, and Dbvisit.
- Expertise in using different file formats such as text files, CSV, Parquet, and JSON.
- Experience writing custom compute functions using Spark SQL and performing interactive querying.
- Responsible for masking and encrypting sensitive data on the fly.
- Responsible for creating multiple applications that read data from different Oracle instances into NiFi flows.
- Responsible for creating and maintaining DAGs using Apache Airflow.
- Responsible for setting up a MemSQL cluster on an Azure Virtual Machine instance.
- Experience in real-time data streaming using PySpark with NiFi (a streaming sketch follows this list).
- Responsible for creating a NiFi cluster with multiple nodes.
- Experience working with Vagrant boxes to set up local NiFi and StreamSets pipelines.
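A minimal PySpark sketch of the multi-format extraction and aggregation pattern described above; the mount paths, columns, and join key are hypothetical placeholders.

```python
# Illustrative sketch only: read JSON and Parquet sources, join, and aggregate
# daily usage metrics per customer segment.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage-insights").getOrCreate()

events = (spark.read.json("/mnt/raw/events/")           # semi-structured events
          .withColumn("event_time", F.to_timestamp("event_time")))
customers = spark.read.parquet("/mnt/raw/customers/")   # structured dimension

usage = (events.join(customers, "customer_id")
         .groupBy("segment", F.window("event_time", "1 day"))
         .agg(F.count("*").alias("event_count")))

usage.write.mode("overwrite").parquet("/mnt/curated/usage_by_segment/")
```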
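A minimal Spark Structured Streaming sketch of the real-time pattern described above, assuming the NiFi flow publishes records to a Kafka topic; the broker address, topic, and paths are hypothetical, and the spark-sql-kafka connector is assumed to be available.

```python
# Illustrative sketch only: consume a stream of records and persist them as
# Parquet with checkpointing.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("nifi-stream").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "tax-events")
          .load()
          .select(F.col("value").cast("string").alias("payload")))

query = (stream.writeStream
         .format("parquet")
         .option("path", "/mnt/streaming/tax-events/")
         .option("checkpointLocation", "/mnt/checkpoints/tax-events/")
         .start())

query.awaitTermination()
```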
Environment: Azure Databricks, Azure Data Lake, Azure Data Factory, Spark 2.2, Scala, Linux, Apache NiFi 1.0, Striim, StreamSets, PySpark, Spark SQL, Spark Structured Streaming, IntelliJ, SBT, Git.
Confidential, TX
Data Engineer
Responsibilities:
- Explored clickstream event data with Spark SQL.
- Architected and delivered a hands-on production implementation of a big data MapR Hadoop solution for digital media marketing using telecom data, shipment data, point-of-sale (POS) data, and exposure and advertising data related to consumer product goods.
- Used Spark SQL, as part of the Apache Spark big data framework, to process structured shipment, POS, consumer, household, individual digital impression, and household TV impression data.
- Created DataFrames from different data sources such as existing RDDs, structured data files, JSON datasets, Hive tables, and external databases.
- Loaded terabytes of raw data at various levels into Spark RDDs for computation to generate the output response.
- Led a major new initiative focused on media analytics and forecasting to deliver the sales lift associated with customer marketing campaign initiatives.
- Responsibilities included platform specification and redesign of load processes, as well as projections of future platform growth.
- Coordinated deployments to the QA and PROD environments.
- Used Python to automate Hive jobs and read configuration files.
- Used Spark for fast data processing, working with both the Spark shell and a Spark standalone cluster.
- Used Hive to analyze partitioned data and compute various metrics for reporting.
- Involved in ETL, data integration, and migration.
- Responsible for creating Hive UDFs that helped spot market trends.
- Optimized Hadoop MapReduce code and Hive/Pig scripts for better scalability, reliability, and performance.
- Experience storing analyzed results back into the Cassandra cluster.
- Developed custom aggregate functions using Spark SQL and performed interactive querying (a minimal sketch follows this list).
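A minimal sketch of a custom aggregate of the kind described above, written as a pandas grouped-aggregate UDF in PySpark (assumes Spark 3.x with pyarrow installed); the table path and column names are hypothetical.

```python
# Illustrative sketch only: a custom median aggregate applied per region.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.appName("custom-agg").getOrCreate()

@pandas_udf("double")
def median_sales(values: pd.Series) -> float:
    # Custom aggregate: median of the grouped values.
    return float(values.median())

pos = spark.read.parquet("/mnt/curated/pos_shipments/")  # hypothetical table
(pos.groupBy("region")
    .agg(median_sales("weekly_sales").alias("median_weekly_sales"))
    .show())
```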
Environment: MapReduce, HDFS, Hive, Python, Scala, Kafka, Spark, Spark SQL, Oracle, SQL, MapR, Sqoop, Data Pipeline, Jenkins, Git, JIRA, Unix/Linux, Agile Methodology, Scrum.
Confidential
Graduate Trainee
Responsibilities:
- The company uses relational databases to store and serve data; the purpose of this project was to use the necessary tools to extract data from those databases.
- Procured data from company back-end databases and integrated it into the front-end user interface using HTML.
- Built responsive interfaces using buttons, dropdowns, and lists, connecting them to triggers to display requested information using SQL.
- Wrote Linux shell scripts and commands for automation to increase operational efficiency.
- Developed middleware with adequate content bandwidth to handle requests with low latency using the Apache server.
- Enhanced the company's proprietary analytical tools; identified and executed processing improvements, working hands-on with technologies such as Oracle, Informatica, and Business Objects.
- Experience designing data marts and data warehouses using star and snowflake schemas to implement decision support systems; able to quickly stage and shape data from data warehouses into reporting and analytics solutions.
- Created and maintained data mappings and transformation logic (for logical and physical models) to support implementation and ongoing maintenance of the operational data stores and warehouses.
- Performed data analysis, data migration, data cleansing, transformation, integration, data import, and data export using Python (a minimal cleansing sketch follows this list).
- Deployed marketing strategies that could adapt to rapid market changes using recurring ad-hoc analysis and ad-stock modeling.
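A minimal pandas sketch of the Python-based cleansing and import/export work described above; the file names, columns, and cleansing rules are hypothetical placeholders.

```python
# Illustrative sketch only: load an export, apply simple cleansing rules, and
# write a cleansed file for downstream import.
import pandas as pd

df = pd.read_csv("customers_export.csv")

# Cleansing: normalize email values, blank out placeholder phone entries,
# and drop duplicate customer records.
df["email"] = df["email"].str.strip().str.lower()
df["phone"] = df["phone"].replace({"": None, "N/A": None})
df = df.drop_duplicates(subset=["customer_id"])

df.to_csv("customers_cleansed.csv", index=False)
```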