Azure Data Engineer Resume

Wilmington, OH

SUMMARY

  • Over 7 years of strong experience in data analysis and data mining with large sets of structured and unstructured data, including Data Acquisition, Data Validation, Predictive Modeling, Statistical Modeling, Data Modeling, Data Visualization, Web Crawling, and Web Scraping. Adept in statistical programming languages like Confidential and Python, SAS, Apache Spark, and Matlab, including Big Data technologies like Hadoop, Hive, and Pig.
  • Experience with the analysis, design, development, test, and deployment phases of the software development lifecycle targeting Microsoft Azure.
  • Extensively used SQL, NumPy, Pandas, scikit-learn, Spark, and Hive for data analysis and model building.
  • Performed the migration of Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR and Qubole.
  • Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
  • Expert in understanding data and designing/implementing enterprise platforms like AWS data lakes and large data warehouses.
  • Experience in creating Views, Constraints, Triggers, Joins, Cursors, Temp Table, Table Variable, Functions and Stored Procedures
  • Experience in Hadoop architecture, design, development, installation, and tuning.
  • Designed and developed a reporting framework using BIRT, writing Hive queries.
  • Experience in handling, configuration, and administration of databases like MySQL and NoSQL databases like MongoDB and Cassandra.
  • Experience in working with databases like MongoDB, MySQL and Cassandra.
  • Extensively worked on Sqoop, Hadoop, Hive, Spark, Cassandra to build ETL and Data Processing systems having various data sources, data targets and data formats
  • Proficient in Data Warehousing methodologies and concepts, including Star Schemas, Snowflake Schemas, Dimensional Modeling, and Reporting tools.
  • Expert in writing complex SQL queries and optimizing queries in Oracle, SQL Server, and Teradata. Excellent understanding of the Software Development Life Cycle (SDLC), with good working knowledge of testing methodologies, disciplines, tasks, resources, and scheduling.
  • Implementation and technical expertise in data flow diagrams, data dictionary, database normalization, entity relation modeling and design techniques
  • Responsible for creating SSIS packages using ETL tools such as SSIS Designer for exporting heterogeneous data from OLE DB sources and Excel spreadsheets to SQL Server.
  • Experience in developing custom reports and various Tabular Reports, Matrix Reports, Ad hoc Reports, Distributed Reports in multiple formats using SSRS and BIDS.
  • Excellent experience in using Teradata SQL Assistant, Teradata Administrator, PMON, and data load/export utilities like BTEQ, FastLoad, MultiLoad, and FastExport, with exposure to TPump on UNIX/Windows environments and to running batch processes for Teradata.
  • Excellent experience in writing SQL queries to validate data movement.
  • Developed complex mappings in Informatica & DataStage to load data from various sources into the Data Warehouse, using different transformations/stages like Joiner, Transformer, Aggregator, Update Strategy, Rank, Router, Lookup, Sequence Generator, Filter, Sorter, Source Qualifier, and Stored Procedure transformation.
  • Good exposure to all phases of the Software Development Life Cycle from analysis, design, development, testing, implementation, and maintenance with timely delivery against aggressive deadlines.
  • Experience working with MongoDB Ops Manager, Cloud Manager, and Atlas Manager.
  • Proficient knowledge in Statistics and Machine Learning algorithms with excellent understanding of business operations and analytics tools for effective data analysis.
  • Experience in testing Business Intelligence reports generated by various BI Tools like Cognos and Tableau.

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, Azure, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume and Talend.

Databases: Microsoft Access, Oracle 11g, MySQL, Amazon DynamoDB, MongoDB

ETL Tools: Informatica, Alteryx, Talend, DataStage, Pentaho

Database Programming: SQL, PL/SQL, HIVE, PostgreSQL

BI and Visualization tools: Tableau, Cognos, QlikView, Microsoft Power BI

Programming Languages: Python, Confidential, SQL, Perl, Java

Cloud: AWS, Microsoft Azure, Google Cloud

Microsoft Tools: SSIS, SSAS, SSRS, Microsoft Excel

PROFESSIONAL EXPERIENCE

Confidential, Wilmington, OH

Azure Data Engineer

Responsibilities:

  • Responsible for gathering business requirements, following the SDLC process, and designing data maps and data models.
  • Interacting with Business Analysts and Developers in identifying the requirements, designing, and implementing the Database Schema.
  • Performing codebase maintenance and quality checks for Microsoft Azure.
  • Documenting and maintaining database system specifications, diagrams, and connectivity charts.
  • Experience with creating scripts for data modeling and data import and export. Extensive experience in deploying, managing, and developing MongoDB clusters.
  • Created, configured, and monitored shard sets. Analyzed the data to be sharded and chose a shard key to distribute data evenly. Performed architecture and capacity planning for MongoDB clusters. Implemented scripts for MongoDB import, export, dump, and restore.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design. Created multiple databases with sharded collections and chose shard keys based on the requirements. Experience in managing MongoDB environments from availability, performance, and scalability perspectives.
  • Responsible for using Cloudera Manager, an end-to-end tool, to manage Hadoop operations.
  • Utilized Apache Spark with Python/Java to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
  • Developed Spark/Scala and Python/Java code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS using Scala.
  • Worked on Azure Databricks workspace, cluster, mount, secret key, and Delta DB setup. Stored schema on an external SQL Server.
  • Hands-on experience in Spark and Spark Streaming, creating RDDs and applying operations (Transformations and Actions).
  • Used Spark SQL for reading data from external sources and processed the data using the Scala computation framework.
  • Developed Pig scripts to transform the raw data into intelligent data as specified by business users.
  • Worked in AWS environment for Development and deployment of custom Hadoop Applications.
  • Involved in designing and deploying multi-tier applications using AWS services such as EC2, Route53, S3, RDS, DynamoDB, SNS, SQS, and IAM, focusing on high availability, fault tolerance, and auto-scaling in AWS CloudFormation.
  • Supported continuous storage in AWS using Elastic Block Store, S3, and Glacier. Created volumes and configured snapshots for EC2 instances.
  • Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline whose output can be written to the Glue Catalog and queried from Athena.
  • Architected and designed the data flow for the collapse of 4 legacy data warehouses into an AWS Data Lake.
  • Used ETL tools to ingest the data into the Data Lake. Used MapReduce and Pig for data processing and transformation. Confidential served as the staging and core databases. Explored the feasibility of Spotfire, BusinessObjects, and Tableau as the reporting tools.
  • Expertise in the design and deployment of Hadoop clusters and different Big Data analytic tools including Pig, Hive, HBase, Oozie, Zookeeper, Sqoop, Flume, Spark, Impala, and Cassandra with the Hortonworks distribution.
  • Analyzed performance improvement pathways by parallelizing the scheduling of Informatica jobs using UNIX shell scripting, thereby improving time to deliver.
  • Assisted in the upgrade, configuration, and maintenance of various Hadoop infrastructure components like Pig, Hive, and HBase.
  • Generated custom and parameterized reports, sub reports in SSRS.
  • Created template-based SSRS reports that could not be accommodated through SSIS, such as ad hoc, drill-down, and parameterized reports.
  • Developing and optimizing Stored Procedures, Views, and User-Defined Functions for the Application
  • Developing physical data models and creating DDL scripts to create database schema and database objects.
  • Compiled source to target mapping documents to design the ETL jobs.
  • Created Clustered and Non-Clustered Indexes to improve data access performance.
  • Developed and implemented ETL (Extract, Transform, and Load) components based on the filter rules to obtain needed data from different source systems for calculating required metrics using Informatica PowerCenter, UNIX, and PL/SQL.
  • Actively involved in developing Complex SSRS Reports, Sub Reports, Matrix/Tabular Reports, Charts and Graphs.
  • Designing dashboards and reports, parameterized reports, predictive analysis in Power BI.
  • Deploying and managing user permissions for reports and dashboards on Power BI web portal.
  • Creating DAX queries to generate computed columns in Power BI.
  • Responsible for database backup and restoration using SQL native tools.
  • Worked with VB Script and UNIX Shell scripting for File Validations.
  • Partnering closely with business and IT teams in meeting the deadlines pertaining to design and development deliverables and maintaining audit and compliance needs.

Environment: SQL Server Reporting Services, Azure, SQL Server Integration Services, UNIX Operating System, Informatica PowerCenter 8.x/7.x/6.x/5.x, Power BI, Tableau.

Confidential, Watertown, MA

Azure Data Engineer

Responsibilities:

  • Developed scripts to migrate data and database objects to/from multiple sources like patient demographics, patient vital information, hospital visits, and insurance claims history.
  • Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Executed and verified report data regarding claims (EDI), insurance, population, treatment plans and other funding/financial data.
  • Designed and implemented reports for Healthcare Accreditation, NCQA (National Committee for Quality Assurance).
  • Collected data from the Health Information Network system for reporting purposes.
  • Collaborated with PCE Systems, a provider of Electronic Health Records (EHR), to ensure claims and health data accuracy for reporting purposes.
  • Used Apache Spark for streaming applications and wrote the API using Scala, Python, and Java.
  • Expertise in Spark authentication and encryption, and in managing and monitoring Spark applications.
  • Used the Spark shell for interactive data analysis and processing, and Spark SQL to query structured data.
  • Experienced in developing scripts for transformations using Scala.
  • Created Databricks notebooks using Python (PySpark), Scala, and Spark SQL for transforming the data stored in Azure Data Lake Storage Gen2 from Raw to Stage and Curated zones.
  • Created data pipelines for different events to load the data from DynamoDB to AWS S3 bucket and then into HDFS location.
  • Worked on an ETL pipeline to source these tables and to deliver the calculated ratio data from AWS to the Datamart (SQL Server) and Credit Edge server.
  • Extensively used joins and sub-queries to simplify complex queries involving multiple tables.
  • Worked with developers to implement database schema changes.
  • Designed and developed efficient SSIS packages for processing fact and dimension tables with complex transforms.
  • Created SSIS packages to load data using Foreach Loop Containers, Lookup, Fuzzy Lookup, Derived Column, Conditional Split, Term Extraction, Aggregate, Data Conversion, Pivot Transformation, Slowly Changing Dimension, FTP task, and SMTP task.
  • Developed various reports using Tableau Desktop, Tableau Prep and published on Tableau Online.
  • Created stored procedures, triggers, functions and added/changed tables for extraction, transformation and loading data.
  • Involved in restore operations to refresh the data from production to test servers.

Environment: SQL Server Integration Services, Tableau, Azure MySQL Server, Aginity Pro-Netezza driver.

Confidential

Senior Data Engineer

Responsibilities:

  • Designed and developed Tableau dashboards from scratch to generate actionable insights and solutions for client services. Built a unified dashboard displaying implementation and opportunity stages, opportunities won by sales rep, most-used features, live customer count, and a breakdown of users by persona, which helped increase product sales by 4% and improve the overall user experience.
  • Created databases and schema objects including tables and indexes, applied constraints, connected various applications to the database, and wrote functions, stored procedures, and triggers.
  • Built and published customized interactive reports and dashboards, and scheduled reports using Tableau Server.
  • Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
  • Designed, developed, and implemented BI solutions for Sales, Product and Customer KPIs.
  • Involved in re-designing of database to create de-normalized reporting schema for better throughput.
  • Worked in both Waterfall and Agile Methodology.
  • Experience in Troubleshooting Report Processing, Subscription, Delivery and Configuration Problems.
  • HR Predictive Modeling: Fetched employee data from database. Pre-processed data using Python/Java and identified trends in attrition and retention rate over the years using Tableau. Created Dashboards and Story to convey the analysis & predictions.
  • Created stored procedures and user-defined functions; designed and implemented database triggers, views, and indexes.
  • Excellent report creation skills using Microsoft SQL Server Reporting Services (SSRS) with proficiency in using Report Designer as well as Report Builder.
  • Experience in designing dashboards and reports, parameterized reports, predictive analysis in Tableau.
  • Performed day-to-day Database Maintenance tasks including Database Monitoring, Backups, Space and Resource Utilization.
  • Identified, documented, and created diagrams linked to Excel using Microsoft Visio.

Environment: Matplot, Oracle 11g, SQL Server Reporting Services, MS Access SQL, Tableau, Power BI, Microsoft Visio.

Confidential

Data Analyst Engineer

Responsibilities:

  • Worked through all the phases of Software Development Life Cycle (SDLC) including Requirements Gathering, Analysis, Design, Development, Testing, Production and Post-production Support.
  • Developed Stored Procedures and User Defined Functions for providing input feed for front end applications.
  • Created complex Stored Procedures, Triggers, Functions, Indexes, Tables, Views, and joins.
  • Designed and worked on the Extraction, Transformation, and Loading (ETL) process by pulling large volumes of data from various data sources using SSIS.
  • Developed a 360-degree business dashboard in Tableau with multiple panels and parameters for the Salesforce team.
  • Mapped sources to targets using Excel macro functions and SQL scripts, including BusinessObjects Data Services/BODI.
  • Designed and developed ETL processes. Established coding standards, performed peer reviews, and automated health checks of the platform.
  • Developed mappings, sessions, and workflows.
  • Developed complex SQL queries using stored procedures, common table expressions (CTEs), and temporary tables to support SSRS reports.
  • Actively involved in developing Complex SSRS Reports involving Sub Reports, Matrix/Tabular Reports, Charts and Graphs.
  • Worked on the data warehouse design and maintained different dimension and fact tables.
  • Installed/configured SQL Server in Virtual Environments using VMWare.
  • Reduced the timeline of individual migration projects by several months through optimization and documentation.

Environment: Microsoft SQL Server Integration Services, MS Excel Macros, SSRS, VMware.
