
Sr. Azure Data Engineer Resume


Irving, TX

SUMMARY

  • More than 10 years of experience in the IT industry, including big data environments, data engineering, the Hadoop ecosystem, and the design, development, and maintenance of various applications.
  • Strong experience leading multiple Azure Big Data and data transformation implementations in the Banking and Financial Services and Healthcare industries.
  • Hands-on experience in the Hadoop ecosystem, including Spark, Kafka, HBase, Scala, Pig, Impala, Sqoop, Oozie, Flume, and Storm, and experienced in developing custom UDFs for Pig and Hive to incorporate methods and functionality from Python, Scala, and Java into Pig Latin and HQL (HiveQL).
  • Experience in implementing large Lambda architectures using Azure Data platform capabilities like Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML and Power BI.
  • Expertise in core Java and JDBC, proficient in using Java APIs for application development, with experience developing web-based applications using Core Java, JDBC, Java Servlets, JSP, Struts, Hibernate, HTML, JavaScript, XML, and Oracle.
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Experience building scripts using Maven and working with continuous integration/continuous delivery (CI/CD) tooling such as Docker, Kubernetes, and Jenkins.
  • Good experience in Tableau for data visualization and analysis on large data sets; leveraged and integrated Google Cloud Storage and BigQuery applications connected to Tableau for end-user web-based dashboards and reports.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Expertise in synthesizing machine learning, predictive analytics, and big data technologies into integrated solutions; worked with Spark SQL, Spark Streaming, and the core Spark API to build data pipelines.
  • Experience in database design and development with business intelligence using SQL Server 2014/2016, Integration Services (SSIS), DTS packages, SQL Server Analysis Services (SSAS), DAX, OLAP cubes, star schema, and snowflake schema.
  • Exploring Azure Cognitive Services (LUIS), Machine Learning, and IoT technologies.
  • Experienced in developing big data projects using Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce open-source tools, and in installing, configuring, supporting, and managing Hadoop clusters.
  • Experienced in data modeling and data analysis using dimensional and relational data modeling, star schema/snowflake modeling, fact and dimension tables, and physical and logical data modeling.
  • Excellent knowledge of big data infrastructure, distributed file systems (HDFS), and the MapReduce parallel-processing framework.
  • Experience in installing, configuring, supporting, and managing the Cloudera Hadoop platform (CDH4 and CDH5 clusters) and Hortonworks.
  • Experience in database design, entity relationships, database analysis, and SQL programming, including writing PL/SQL stored procedures, functions, packages, and triggers in Oracle.

TECHNICAL SKILLS

Big Data Ecosystem: MapReduce, HDFS, Hive, Pig, Sqoop, Flume, HDP, Oozie, ZooKeeper, Spark, Kafka, Storm, Hue

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5) and Hortonworks

SQL and NoSQL Databases: Oracle 12c/11g, MySQL, MS-SQL, Teradata, HBase, MongoDB, Cassandra.

Version Control: GIT, GitLab, SVN

Data Warehousing & Reporting Tools: Informatica, Erwin, ER Studio, Databricks, Snowflake, SSIS, SSRS, PowerBI and Tableau.

Cloud Technologies: Azure Data Lake, Azure Data Factory, Azure Databricks, Synapse, Azure SQL, Azure DW and Storage Blob.

Programming Languages: Java, Python, Pyspark, SQL, PL/SQL, HiveQL, UNIX Shell Scripting, Scala.

Methodologies: Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC (Software Testing Life cycle) & UML, Design Patterns (Core Java and J2EE)

Operating Systems: Windows, UNIX/Linux and Mac OS.

CI/CD: Docker, Jenkins and Kubernetes.

PROFESSIONAL EXPERIENCE

Sr. Azure Data Engineer

Confidential - Irving TX

Responsibilities:

  • Analyze, design, and build modern data solutions using Azure PaaS services to support visualization of data; understand the current production state of the application and determine the impact of new implementations on existing business processes.
  • Implemented solutions for ingesting data from various sources and processing the data at rest utilizing big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
  • Involved in extracting, transforming, and loading data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Managed and led the development effort with the help of a diverse internal and overseas group; designed, architected, and implemented complex projects dealing with considerable data sizes (TB/PB) and high complexity.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, and to write data back in the reverse direction.
  • Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases with Spark ML and MLlib; and explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Performed data profiling and transformation on the raw data using Pig, Python, and Java, and developed predictive analytics using the Apache Spark Scala APIs.
  • Performed ETL using Azure Databricks and migrated an on-premises Oracle ETL process to Azure Synapse Analytics.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (a minimal sketch appears after this list).
  • Developed JSON definitions for deploying pipelines in Azure Data Factory (ADF) that process the data using SQL activities.
  • Analyzed data using HiveQL to generate payer reports and payment summaries for transmission to payers; performed big data analysis using Pig and user-defined functions (UDFs); and created Hive external tables, loaded data into them, and queried the data using HQL.
  • Imported millions of structured records from relational databases using Sqoop for processing with Spark, stored the data in HDFS in CSV format, and used the DataFrame API in Scala to work with the distributed collection of data organized into named columns.
  • Worked on moving data from Teradata to Snowflake for consumption on Databricks, and worked on Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad, TPump, FastLoad, and FastExport.
  • Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning, and designed and implemented streaming solutions using Kafka and Azure Stream Analytics (see the streaming sketch after this list).
  • Created Spark clusters and configured high-concurrency clusters using Azure Databricks to speed up the preparation of high-quality data, and worked on Azure Stream Analytics with Event Hubs, sending output to a Power BI dashboard.
  • Explored DAGs, their dependencies, and logs using Airflow pipelines for automation; used Apache Airflow to schedule and run DAGs; and scheduled the Airflow workflow engine to run multiple Hive and Pig jobs using Python.
  • Enhanced the traditional star-schema data warehouse, updated data models, performed data analytics and reporting using Tableau, and was involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
  • Developed shell, Perl, and Python scripts to automate and provide control flow for Pig scripts, and developed HiveQL scripts to perform transformation logic and load data from the staging zone to the landing and semantic zones.
  • Maintained and worked with a data pipeline that transfers and processes several terabytes of data using Spark, Scala, Python, Apache Kafka, Pig/Hive, and Impala, and developed a prototype for big data analysis using Spark RDDs, DataFrames, and the Hadoop ecosystem with CSV, JSON, and Parquet files on HDFS.
  • Involved in creating Oozie workflow and coordinator jobs to kick off Hive jobs on time for data availability, and worked on the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract the data in a timely manner.
  • Virtualized servers using Docker for test and development environment needs, and utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
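
Below is a minimal, illustrative PySpark sketch of the Databricks-style batch transformation described in the bullets above; the mount paths, column names, and aggregation logic are hypothetical placeholders rather than the actual project code.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # On Databricks a SparkSession is provided as `spark`; this builder is for standalone runs.
    spark = SparkSession.builder.appName("usage-insights-etl").getOrCreate()

    # Hypothetical input locations: raw usage events land in ADLS as JSON and CSV.
    events = spark.read.json("/mnt/raw/usage_events/*.json")
    accounts = spark.read.option("header", "true").csv("/mnt/raw/accounts/*.csv")

    # Join, cleanse, and aggregate to daily usage per customer account.
    daily_usage = (
        events.join(accounts, on="account_id", how="inner")
              .withColumn("event_date", F.to_date("event_ts"))
              .groupBy("account_id", "event_date")
              .agg(F.count("*").alias("event_count"),
                   F.sum("duration_sec").alias("total_duration_sec"))
    )

    # Persist the curated result as Parquet for downstream reporting.
    daily_usage.write.mode("overwrite").partitionBy("event_date").parquet("/mnt/curated/daily_usage")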
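
A second hedged sketch, for the Spark/Kafka streaming ETL bullet above, using Spark Structured Streaming; the broker address, topic, schema, and checkpoint paths are assumptions for illustration only.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("transactions-stream-etl").getOrCreate()

    # Assumed schema of the incoming JSON events.
    schema = StructType([
        StructField("txn_id", StringType()),
        StructField("account_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_ts", StringType()),
    ])

    # Read a stream from Kafka (bootstrap server and topic are placeholders).
    raw = (spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "transactions")
                .load())

    # Kafka delivers the payload as bytes in the `value` column; parse the JSON.
    parsed = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    # Write the parsed stream out as Parquet with checkpointing for fault tolerance.
    query = (parsed.writeStream
                   .format("parquet")
                   .option("path", "/mnt/streaming/transactions")
                   .option("checkpointLocation", "/mnt/checkpoints/transactions")
                   .start())
    query.awaitTermination()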

Environment: Big Data, Spark, YARN, HIVE, Azure Databricks, Azure Data Lake, Azure Stream, Data Factory, Azure ADF, Azure DW, Azure SQL, Pig, ETL, CI/CD, Jenkins, JavaScript, Azure Synapse, JSP, HTML, Scala, Python, Hadoop Framework, Dynamo DB, Snowflake, Cloudera, ETL, Airflow, RDS, Oozie, Zookeeper, SQL, EMR, JDBC, NOSQL, Sqoop, MYSQL.

Sr. Azure Data Engineer

Confidential - Denver CO

Responsibilities:

  • Evaluated deep learning algorithms for text summarization using Python, Keras, and TensorFlow on a Cloudera Hadoop system; used the Spark API for machine learning and translated a predictive model from SAS code to Spark; and used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Utilized Azure PolyBase to run T-SQL queries on external data in Hadoop, as well as to import and export data from Azure Blob Storage.
  • Developed Sqoop scripts to extract data from various RDBMS databases into HDFS; developed scripts to automate the workflow of various processes using Python and shell scripting; and collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
  • Developed data pipelines using Pig and Hive from Teradata and DB2 data sources; these pipelines used customized UDFs to extend the ETL functionality, and ETL methodology was used extensively to support data extraction, transformation, and loading with Hadoop.
  • Created a real-time streaming operational data store environment with Azure Event Hubs and Stream Analytics, which streamed directly into Microsoft Power BI for corporate reporting.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMSs through Sqoop.
  • Wrote Hive join queries to fetch information from multiple tables, wrote multiple MapReduce jobs to collect output from Hive, and used Hive to analyze the partitioned and bucketed data and compute various metrics for dashboard reporting.
  • Deployed Azure Resource Manager JSON templates from PowerShell and worked on the Azure suite: Azure SQL Database, Azure Data Lake, Azure Data Factory, Azure SQL Data Warehouse, and Azure Analysis Services.
  • Developed MapReduce programs using Java and Python to parse the raw data and store the refined data in Hive, and used UDFs to implement business logic in Hadoop, using Hive to read, write, and query the Hadoop data in HBase.
  • Implemented microservices, data lake ingestion, and BI functions using the U-SQL scripting language provided by Azure along with Python.
  • Worked on writing Perl scripts covering data feed handling, implementing MarkLogic, and communicating with web services through the SOAP::Lite module and WSDL.
  • Used the Oozie workflow engine to run multiple Hive and Pig scripts, with the help of Kafka for real-time processing of data, navigating data sets in HDFS storage and loading log file data directly into HDFS using Flume.
  • Developed Python MapReduce programs for log analysis and designed an algorithm for detecting fake reviews using Python (a Hadoop Streaming sketch appears after this list).
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs; developed the code for importing and exporting data into HDFS and Hive using Sqoop; and used Hive to analyze data ingested into HBase via Hive-HBase integration, computing various metrics for dashboard reporting.
  • Used Spark SQL to create DataFrames by loading JSON data and analyzing it, and developed Spark code using Scala and Spark SQL for faster testing and data processing (see the sketch after this list).
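
An illustrative example of the Spark SQL pattern from the last bullet above, loading JSON into a DataFrame and querying it; the HDFS path and column names (payer_id, paid_amount) are invented for illustration.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-analysis").getOrCreate()

    # Hypothetical path to the JSON data in HDFS.
    claims = spark.read.json("hdfs:///data/claims/*.json")

    # Register the DataFrame as a temporary view so it can be queried with SQL.
    claims.createOrReplaceTempView("claims")

    # Simple per-payer summary; payer_id and paid_amount are assumed column names.
    summary = spark.sql("""
        SELECT payer_id,
               COUNT(*)         AS claim_count,
               SUM(paid_amount) AS total_paid
        FROM claims
        GROUP BY payer_id
        ORDER BY total_paid DESC
    """)
    summary.show(20, truncate=False)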
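
And a hedged sketch of the Python MapReduce log-analysis work mentioned above, following the common Hadoop Streaming mapper/reducer pattern; the access-log layout and field position are assumptions.

    #!/usr/bin/env python
    # mapper.py -- emit (status_code, 1) for each access-log line read from stdin.
    import sys

    for line in sys.stdin:
        parts = line.split()
        if len(parts) > 8:        # crude guard against malformed lines
            status = parts[8]     # assumed position of the HTTP status code
            print("%s\t1" % status)

    #!/usr/bin/env python
    # reducer.py -- sum the counts per status code (Hadoop sorts input by key).
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key == current_key:
            count += int(value)
        else:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

Scripts like these are typically submitted through the hadoop-streaming jar with -mapper, -reducer, -input, and -output options.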

Environment: Pig, Sqoop, Kafka, Apache Cassandra, Oozie, Impala, Azure Data Factory, Azure SQL, Azure ADF, Azure DW, Cloudera, Flume, Apache Hadoop, HDFS, Hive, Azure Synapse, Power BI, MapReduce, Cassandra, Zookeeper, MySQL, Eclipse, Dynamo DB, PL/SQL and Python.

Sr. Azure Data Engineer

Confidential - Cary NC

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop; designed the projects using MVC architecture, providing multiple views using the same model and thereby providing efficient modularity and scalability.
  • Designed, deployed, maintained, and led the implementation of cloud solutions using Microsoft Azure and underlying technologies, and analyzed business requirements, facilitating the planning and implementation phases of the OLAP model in team meetings.
  • Extensively worked with Avro and Parquet files, converting data between formats; parsed semi-structured JSON data and converted it to Parquet using DataFrames in Spark (a conversion sketch appears after this list).
  • Involved in the complete SSIS life cycle: creating SSIS packages and building, deploying, and executing the packages in both environments (development and production).
  • Designed complex, data-intensive reports in Power BI utilizing various visual features such as gauges and funnels.
  • Implemented Spark Core in Scala to process data in memory and performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
  • Interacted with multiple teams responsible for the Azure platform to fix Azure platform bugs, and worked on container-based technologies like Docker and Kubernetes.
  • Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework, and used Hadoop Pig, Hive, and MapReduce to analyze the data, extracting data sets for meaningful information.
  • Handled importing of data from various data sources, performed transformations using MapReduce and Spark, and loaded data into HDFS.
  • Developed a workflow in Oozie to orchestrate a series of Pig scripts to cleanse data, such as merging many small files into a handful of large, compressed files, using Pig pipelines in the data preparation stage.
  • Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse, and wrote Azure PowerShell scripts to copy or move data from the local file system to HDFS/Blob storage.
  • Used Pig in three distinct workloads (pipelines, iterative processing, and research), used Pig UDFs written in Python, and applied sampling to large data sets.
  • Used various transformations in SSIS data flow and control flow, including For Loop containers and fuzzy lookups.
  • Involved in transferring data from legacy tables to HDFS and HBase tables using Sqoop; implemented exception-tracking logic using Pig scripts; and moved log files generated from various sources to HDFS for further processing through Flume.
  • Involved in the design of the data warehouse using star-schema methodology, and converted data from various sources to SQL tables.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them; followed a test-driven development (TDD) process; and have extensive experience with Agile and Scrum methodologies.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala, and scheduled MapReduce jobs in the production environment using the Oozie scheduler.
  • Involved in cluster maintenance, monitoring, and troubleshooting; managed and reviewed data backups and log files; and exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
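
A short, hypothetical sketch of the JSON-to-Parquet conversion with Spark DataFrames referenced above; the input path and nested field names are placeholders rather than project code.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

    # Hypothetical semi-structured JSON input with a nested `customer` object.
    raw = spark.read.json("/mnt/raw/events/*.json")

    # Flatten the nested fields of interest into top-level columns.
    flat = raw.select(
        F.col("event_id"),
        F.col("customer.id").alias("customer_id"),
        F.col("customer.region").alias("region"),
        F.to_timestamp("event_ts").alias("event_ts"),
    )

    # Columnar Parquet output is much cheaper to scan for downstream reporting.
    flat.write.mode("overwrite").parquet("/mnt/curated/events_parquet")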

Environment: Hadoop, MS Azure, Map Reduce, Spark, SSIS, SSRS, Kafka, SQL, HDFS, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Flume, Cloudera, Oracle 10g, UNIX Shell Scripting, Azure Synapse, Scala, MongoDB, HBase, Cassandra, Python.

Sr. Data Analyst

Confidential - Amherst OH

Responsibilities:

  • Worked with business users during requirements gathering and business analysis to prepare high-level logical and physical data models; performed reverse engineering of the current application using ER/Studio; and developed logical and physical data models for central model consolidation.
  • Analyzed functional and non-functional categorized data elements for data profiling and mapping from source to target data environments; wrote SQL scripts to test the mappings; and developed a traceability matrix of business requirements mapped to test scripts to ensure any change control in requirements leads to test case updates.
  • Involved in integration of various relational and non-relational sources such as DB2, Teradata 13.1, Oracle 9i, SFDC, Netezza, SQL Server, COBOL, XML and Flat Files.
  • Redefined many attributes and relationships in the reverse-engineered model and cleansed unwanted tables/columns on the Teradata database as part of data analysis responsibilities.
  • Performed data mining on claims data using complex SQL queries, discovered claims patterns, and created DML code and statements for underlying and impacted databases (a query sketch appears after this list).
  • Involved in normalization/denormalization, normal forms, and database design methodology; expertise in using data modeling tools like MS Visio and ER/Studio for logical and physical database design.
  • Performed data modeling, database design, and data analysis with extensive use of ER/Studio, and documented ER diagrams, logical and physical models, business process diagrams, and process flow diagrams.
  • Created reports in Oracle Discoverer by importing PL/SQL functions into the admin layer to meet sophisticated client requests.
  • Extensively used SQL, Transact-SQL, and PL/SQL to write stored procedures, functions, packages, and triggers; created tables, views, sequences, indexes, and constraints; and generated SQL scripts for implementing the physical data model.
  • Performed data management projects and fulfilled ad-hoc requests according to user specifications utilizing data management software and tools like Perl, Toad, MS Access, Excel, and SQL.
  • Created the physical data model from the logical data model using the Compare and Merge utility in ER/Studio, and worked with the naming standards utility.
  • Involved in implementing the land process of loading the customer data set into Informatica PowerCenter and MDM from various source systems.
  • Migrated databases from DB2 to SQL Server 2005/2008, and performed tuning and code optimization using techniques such as dynamic SQL, dynamic cursors, SQL query tuning, and writing generic procedures, functions, and packages.
  • Extensively worked on shell scripts for running SSIS programs in batch mode on UNIX, and created mappings using pushdown optimization to achieve good performance when loading data into Netezza.
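
As an illustration of the kind of SQL-based claims mining described above, here is a hedged Python sketch using pyodbc against SQL Server; the connection string, table, and column names are hypothetical.

    import pyodbc

    # Hypothetical SQL Server connection; server, database, and credentials are placeholders.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=claims-db.example.com;DATABASE=Claims;UID=analyst;PWD=***"
    )

    # Example pattern-discovery query: monthly claim counts and billed totals per provider.
    sql = """
        SELECT provider_id,
               DATEPART(year, service_date)  AS svc_year,
               DATEPART(month, service_date) AS svc_month,
               COUNT(*)                      AS claim_count,
               SUM(billed_amount)            AS total_billed
        FROM dbo.claims
        GROUP BY provider_id, DATEPART(year, service_date), DATEPART(month, service_date)
        ORDER BY total_billed DESC;
    """

    cursor = conn.cursor()
    for row in cursor.execute(sql):
        print(row.provider_id, row.svc_year, row.svc_month, row.claim_count, row.total_billed)
    conn.close()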

Environment: ER/Studio, Teradata 13.1, SSIS, SAS, Excel, T-SQL, SSRS, Tableau, SQL Server, Cognos, Pivot tables, Graphs, MDM, PL/SQL, ETL, DB2, Oracle 9i, SQL, Teradata 14.1, Informatica PowerCenter etc.

Data Analyst

Confidential - Minneapolis MN

Responsibilities:

  • Performed data analysis and profiling of source data to better understand the sources.
  • Created the logical data model from the conceptual model and converted it into the physical database design using Erwin.
  • Worked with DBAs to create a best-fit physical data model from the logical data model.
  • Designed source-to-target mappings from SQL Server, Excel/flat files, and XML files to Teradata using Informatica PowerCenter, with data cleansing, integration, and matching using Informatica Data Quality (IDQ).
  • Redefined many attributes and relationships in the reverse-engineered model and cleansed unwanted tables/columns as part of data analysis responsibilities.
  • Interacted with teh database administrators and business analysts for data type and class words.
  • Conducted design sessions with business analysts and ETL developers to come up with a design that satisfies the organization's requirements.
  • Worked on an enterprise logical data modeling project (in third normal form) to gather data requirements for OLTP enhancements, and converted third-normal-form ERDs into dimensional ERDs for the data warehouse effort.
  • Used Erwin's Model Mart for effective model management, sharing, dividing, and reusing model information and designs for productivity improvement.
  • Created ER diagrams and data flow diagrams, grouped and created the tables, validated the data, and identified PK/FK relationships for lookup tables.
  • Created 3NF business-area data models with denormalized physical implementations, and performed data and information requirements analysis using the Erwin tool.
  • Developed star schemas and snowflake schemas when translating the logical model into the dimensional model.
  • Assisted the ETL team in documenting the transformation rules for data migration from OLTP to the warehouse environment for reporting purposes.
  • Implemented necessary DQ rules in IDQ Analyst while profiling the data.
  • Involved in extensive data analysis on the Teradata and Oracle systems, querying and writing SQL in TOAD.
  • Used SQL joins, aggregate functions, analytical functions, and GROUP BY and ORDER BY clauses, and interacted with DBAs and developers on query optimization and tuning.
  • Conducted several physical data model training sessions with the ETL developers and worked with them on a day-to-day basis to resolve any questions on the physical model.

Environment: CA Erwin 9.1, Oracle 11g, SQL Server 2005, IBM DB2, Informatica PowerCenter, IDQ, SQL BI 2008, Oracle BI, Visual Studio, SSIS & SSRS, Tibco Spotfire, SQL Server Management Studio 2012.
