Azure Data Engineer Resume

CA

SUMMARY

  • Over 8 years of professional experience as a Big Data Engineer working with the Apache Hadoop ecosystem, including HDFS, MapReduce, Hive, Sqoop, Oozie, HBase, Spark (PySpark), Kafka, and big data analytics.
  • Experience in designing and implementing large-scale data pipelines for data curation using Spark/Databricks along with Python and PySpark.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
  • Highly experienced in developing Hive Query Language (HiveQL) and Pig Latin scripts.
  • Experienced in using distributed computing architectures such as AWS products (EC2, Redshift, EMR, Elasticsearch, Athena, and Lambda), Hadoop, Python, and Spark, with effective use of MapReduce, SQL, and Cassandra to solve big data problems.
  • Experience in job/workflow scheduling and monitoring tools like Oozie, AWS Data Pipeline, and Autosys.
  • Defined and deployed monitoring, metrics, and logging systems on AWS.
  • Built data pipelines using Azure Data Factory and Azure Databricks.
  • Loaded data into Azure Data Lake and Azure SQL Database.
  • Used Azure SQL Data Warehouse to control and grant database access.
  • Worked with Azure services such as HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Storage Explorer, SQL DB, SQL DWH, and Cosmos DB.
  • Experience in developing CI/CD (continuous integration and continuous deployment) pipelines and automation using Jenkins, Git, Docker, and Kubernetes for ML model deployment.
  • Expertise in data migration, data profiling, data cleansing, transformation, integration, data import, and data export using ETL tools such as Informatica PowerCenter.
  • Experience in designing, building, and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, MongoDB, and Spark.
  • Extensive hands-on experience tuning Spark jobs.
  • Experienced in working with structured data using HiveQL and in optimizing Hive queries.
  • Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
  • Working experience in migrating several other databases to Snowflake.
  • Strong experience architecting highly performant databases using MySQL and MongoDB.
  • Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, and Sqoop).
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting, as well as object-oriented programming (OOP), multithreading in Core Java, and JDBC.
  • Excellent working experience in Scrum/Agile and Waterfall project execution methodologies.
  • Hands-on experience scheduling data ingestion processes to data lakes using Apache Airflow (a minimal DAG sketch follows this list).
  • Good knowledge of Spark architecture and components, with excellent knowledge of Spark Core, Spark SQL, and Spark Streaming for interactive analysis, batch processing, and stream processing.
  • Demonstrated expertise in building PySpark applications.
  • Worked with various streaming ingestion services with batch and real-time processing using Spark Streaming, Kafka, Confluent, Storm, Flume, and Sqoop.
  • Good experience working with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
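As an illustration of the Airflow scheduling noted above, a minimal daily ingestion DAG might look like the sketch below (Airflow 2.x style). The DAG id, schedule, and spark-submit command are hypothetical placeholders rather than details taken from the projects in this resume.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical daily ingestion DAG; names and paths are placeholders.
    default_args = {
        "owner": "data-engineering",
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
    }

    with DAG(
        dag_id="daily_lake_ingestion",
        start_date=datetime(2023, 1, 1),
        schedule_interval="0 2 * * *",  # run once a day at 02:00
        catchup=False,
        default_args=default_args,
    ) as dag:
        ingest = BashOperator(
            task_id="ingest_to_data_lake",
            bash_command="spark-submit /opt/jobs/ingest_to_lake.py --run-date {{ ds }}",
        )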

TECHNICAL SKILLS

Languages: Python, Java, R, PySpark, SQL, PL/SQL, T-SQL, NoSQL.

Web Technologies: HTML, CSS, XML.

Big data eco system: Hadoop, Hive, Pig, Spark, Sqoop, Oozie, Kafka, Zookeeper, Cloudera, Hortonworks.

Databases: Oracle, SQL Server, Neo4j, MongoDB.

Development Tools: Jupyter, Anaconda, Eclipse, SSIS, SSRS, PyCharm.

Visualization Tools: Tableau, Power BI

Cloud Technologies: Azure, AWS (S3, Redshift, Glue, EMR, Lambda, Athena)

Automation/ Scheduling: Jenkins, Docker, Kubernetes, Airflow.

Version Control: Git, SVN.

PROFESSIONAL EXPERIENCE

Confidential, CA

Azure Data Engineer

Responsibilities:

  • Implemented big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive for ingesting data from diverse sources and processing data at rest.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Used Hadoop technologies such as Spark and Hive, including the PySpark library, to create Spark DataFrames and convert them to pandas DataFrames for analysis (see the PySpark sketch after this list).
  • Played a key role in migrating Cassandra and Hadoop clusters to Azure and defined different read/write strategies.
  • Designed and built a data lake using Hadoop and its ecosystem components.
  • Developed Spark and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Worked with data investigation, discovery, and mapping tools to scan every data record from many sources.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in ORC format.
  • Executed multiple Spark SQL queries after forming the database to gather specific data corresponding to an image.
  • Developed a prototype for big data analysis using Spark, RDDs, DataFrames, and the Hadoop ecosystem with CSV, JSON, and distributed files.
  • Implemented data ingestion from various source systems using Sqoop and PySpark.
  • Knowledgeable about partitioning Kafka messages and setting replication factors in a Kafka cluster; implemented reprocessing of failed messages in Kafka using offset IDs.
  • Reviewed Kafka cluster configurations and provided best practices to achieve peak performance.
  • Extracted, transformed, and loaded data from source systems to Azure data services.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Implemented Copy activities and custom Azure Data Factory pipeline activities.
  • Analyzed existing systems and proposed process and system improvements, including the use of modern scheduling tools like Airflow and the migration of legacy systems into an enterprise data lake built on Azure Cloud.
  • Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
  • Implemented software enhancements to port legacy software systems to Spark and Hadoop ecosystems on Azure Cloud.
  • Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in Azure.
  • Designed and implemented error-free data warehouse ETL and Hadoop integration.
  • Enhanced conventional data warehouses based on the star schema, updated data models, and delivered Tableau data analytics and reporting.
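To make the DataFrame work above concrete, here is a minimal PySpark sketch of loading a Hive table, aggregating it with Spark SQL, and converting the small result to pandas for analysis. It assumes a Spark session with Hive support; the database, table, and column names are hypothetical placeholders.

    from pyspark.sql import SparkSession

    # Spark session with access to the Hive metastore.
    spark = (
        SparkSession.builder
        .appName("claims_analysis")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Load a Hive table as a Spark DataFrame and aggregate it with Spark SQL.
    claims = spark.table("curated_db.claims")
    claims.createOrReplaceTempView("claims")
    monthly = spark.sql("""
        SELECT claim_month, COUNT(*) AS claim_count
        FROM claims
        GROUP BY claim_month
    """)

    # Convert only the small aggregated result to pandas for local analysis.
    monthly_pdf = monthly.toPandas()
    print(monthly_pdf.head())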

Environment: Azure, ADF, ADL, Spark, Hadoop, YARN, HTML, Python, Databricks, Kubernetes, JDBC, Teradata, NoSQL, Sqoop, MySQL.

Confidential, NC

AWS Data Engineer

Responsibilities:

  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Created the automated build and deployment process for the application and the application setup for a better user experience, leading up to building a continuous integration system.
  • Developed PySpark scripts to encrypt raw data using hashing algorithm concepts on client-specified columns (a minimal hashing sketch follows this list).
  • Utilized the Spark SQL API in PySpark to extract and load data and perform SQL queries.
  • Developed data pipelines using Spark, Hive, and Python to ingest customer data.
  • Worked on migrating MapReduce programs into Spark transformations using Python.
  • Worked on Spark data sources (Hive and JSON files), Spark DataFrames, Spark SQL, and streaming using Python.
  • Developed Spark scripts by writing custom RDDs in Python for data transformations and performed actions on RDDs.
  • Implemented Kafka and Spark Structured Streaming for real-time data ingestion.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them using MapReduce programs.
  • Designed and optimized Spark SQL queries and DataFrames, imported data from data sources, performed transformations, and stored the results in an output directory in AWS S3.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS.
  • Created Spark jobs to apply data cleansing/data validation rules to new source files in the inbound bucket and route rejected records to a reject-data S3 bucket.
  • Converted Hive/SQL queries into Spark transformations using Python and performed complex joins on Hive tables with various optimization techniques.
  • Created Hive tables as per requirements, defining internal or external tables with appropriate static and dynamic partitions for efficiency.
  • Worked extensively with Hive DDLs and Hive Query Language (HQL).
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Kubernetes to orchestrate the deployment, scaling, and management of Docker containers.
  • Developed Kafka producers and consumers, HBase clients, and Spark jobs using Python, along with components on HDFS and Hive.
  • Created a Kafka producer API to send live-stream data into various Kafka topics and developed Spark Streaming applications to consume the data from those topics and insert the processed streams into HBase.
  • Extracted, transformed, and loaded data from various heterogeneous data sources and destinations using AWS.
  • Hands-on experience architecting the ETL transformation layers and writing Spark jobs to perform the processing.
  • Prepared the ETL design document, covering the database structure, change data capture, error handling, and restart and refresh strategies.
  • Worked in a production environment, building a CI/CD pipeline in Jenkins with various stages, from code checkout from GitHub to deploying code in a specific environment.
  • Developed AWS CloudFormation templates, set up Auto Scaling for EC2 instances, and was involved in the automated provisioning of the AWS cloud environment using Jenkins.
  • Created automated pipelines in AWS CodePipeline to deploy Docker containers to AWS ECS using S3.
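As a concrete example of the column-hashing work above, the sketch below uses PySpark's built-in sha2 function to replace client-specified columns with one-way SHA-256 digests. The input path, column list, and output bucket are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("mask_sensitive_columns").getOrCreate()

    # Hypothetical raw customer data landing in an inbound S3 bucket.
    raw = spark.read.parquet("s3a://inbound-bucket/customers/")

    SENSITIVE_COLUMNS = ["ssn", "email", "phone"]  # client-specified columns (placeholders)

    masked = raw
    for col_name in SENSITIVE_COLUMNS:
        # Replace each sensitive value with its SHA-256 digest (one-way hash).
        masked = masked.withColumn(col_name, F.sha2(F.col(col_name).cast("string"), 256))

    masked.write.mode("overwrite").parquet("s3a://curated-bucket/customers-masked/")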

Environment: Spark, Hive, HBase, Sqoop, Flume, ADF, Blob, Cosmos DB, MapReduce, HDFS, Cloudera, SQL, Apache Kafka, AWS, S3, Kubernetes, Python, Unix.

Confidential, Ohio

Big Data Engineer

Responsibilities:

  • Implemented big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive for ingesting data from diverse sources and processing data at rest.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Used Hadoop technologies such as Spark and Hive, including the PySpark library, to create Spark DataFrames and convert them to pandas DataFrames for analysis.
  • Played a key role in migrating Cassandra and Hadoop clusters to Azure and defined different read/write strategies.
  • Designed and built a data lake using Hadoop and its ecosystem components.
  • Developed Spark and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Worked with data investigation, discovery, and mapping tools to scan every data record from many sources.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in ORC format.
  • Executed multiple Spark SQL queries after forming the database to gather specific data corresponding to an image.
  • Developed a prototype for big data analysis using Spark, RDDs, DataFrames, and the Hadoop ecosystem with CSV, JSON, and distributed files.
  • Implemented data ingestion from various source systems using Sqoop and PySpark.
  • Hands-on experience implementing performance tuning for Spark and Hive jobs.
  • Knowledgeable about partitioning Kafka messages and setting replication factors in a Kafka cluster; implemented reprocessing of failed messages in Kafka using offset IDs (see the structured streaming sketch after this list).
  • Reviewed Kafka cluster configurations and provided best practices to achieve peak performance.
  • Extracted, transformed, and loaded data from source systems to Azure data services.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Implemented Copy activities and custom Azure Data Factory pipeline activities.
  • Analyzed existing systems and proposed process and system improvements, including the use of modern scheduling tools like Airflow and the migration of legacy systems into an enterprise data lake built on Azure Cloud.
  • Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in Azure.
  • Designed and implemented error-free data warehouse ETL and Hadoop integration.
  • Enhanced conventional data warehouses based on the star schema, updated data models, and delivered Tableau data analytics and reporting.
  • Evaluated the performance of the Databricks environment by converting complex Redshift scripts to Spark SQL as part of a new technology adoption project.
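As a sketch of reprocessing failed Kafka messages from known offsets, the snippet below uses Spark Structured Streaming's Kafka source with explicit startingOffsets (it requires the spark-sql-kafka connector on the classpath, and the offsets are only honored when the query starts without an existing checkpoint). The broker addresses, topic, offsets, and storage paths are hypothetical placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka_reprocess").getOrCreate()

    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
        .option("subscribe", "payments")
        # Replay from explicit per-partition offsets instead of latest/earliest.
        .option("startingOffsets", '{"payments": {"0": 42150, "1": 39801}}')
        .load()
        .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "partition", "offset")
    )

    query = (
        events.writeStream
        .format("parquet")
        .option("path", "abfss://lake@account.dfs.core.windows.net/reprocessed/payments/")
        .option("checkpointLocation", "abfss://lake@account.dfs.core.windows.net/checkpoints/payments/")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()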

Environment: Spark, Hadoop, YARN, Azure, HTML, Python, Databricks, Kubernetes, JDBC, Teradata, NoSQL, Sqoop, MySQL.

Confidential - Chicago, IL

ETL Developer / Data Warehouse Developer

Responsibilities:

  • Created, manipulated, and supported SQL Server databases.
  • Involved in data modelling and the physical and logical design of the database.
  • Helped integrate the front end with the SQL Server back end.
  • Created stored procedures, triggers, indexes, user-defined functions, constraints, etc. on various database objects to obtain the required results.
  • Imported and exported data between servers using tools like Data Transformation Services (DTS).
  • Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries (a minimal retrieval sketch follows this list).
  • Transferred data from various data sources/business systems, including MS Excel, MS Access, and flat files, to SQL Server using SSIS/DTS, using features such as data conversion; also created derived columns from existing columns per the given requirements.
  • Supported the team in resolving SQL Server Reporting Services and T-SQL related issues; proficient in creating and formatting different report types such as cross-tab, conditional, drill-down, top N, summary, form, OLAP, and sub-reports.
  • Created logging for the ETL load at the package and task levels to record the number of records processed by each package and each task in a package using SSIS.
  • Developed, monitored, and deployed SSIS packages.
  • Created various parameterized reports (SSRS 2005/2008) that use report criteria to minimize report execution time and limit the number of records returned.
  • Worked on all report types, such as tables, matrices, charts, and sub-reports.
  • Created linked reports, ad-hoc reports, etc. based on requirements; linked reports were created on the Report Server to reduce report duplication.
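The T-SQL retrieval work above can be illustrated with a small Python sketch using pyodbc to run a parameterized query against SQL Server (the original work used T-SQL, DTS, and SSIS directly; this is just one scripted way to issue the same kind of query). The driver, server, database, table, and column names are hypothetical placeholders.

    import pyodbc

    # Hypothetical connection details; use whichever ODBC driver is installed.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=sqlserver01;DATABASE=SalesDW;Trusted_Connection=yes;"
    )

    sql = """
        SELECT TOP (50) OrderID, CustomerID, OrderDate, TotalDue
        FROM dbo.Orders
        WHERE OrderDate >= ?
        ORDER BY OrderDate DESC;
    """

    cursor = conn.cursor()
    cursor.execute(sql, "2024-01-01")  # parameter marker avoids string concatenation
    for row in cursor.fetchall():
        print(row.OrderID, row.CustomerID, row.OrderDate, row.TotalDue)

    cursor.close()
    conn.close()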

Environment: Microsoft Office, Windows 2007, T-SQL, DTS, SQL Server 2008, HTML, SSIS, SSRS, XML.

Confidential

DW/ETL Developer

Responsibilities:

  • Involved as a developer for the commercial business group data warehouse.
  • Developed source data profiling and analysis; reviewed data content and metadata to facilitate data mapping and validate assumptions that were made in the business requirements.
  • Created logical and physical database designs and ER diagrams for relational and dimensional databases using Erwin.
  • Extracted data from Oracle relational databases and flat files.
  • Developed complex transformations and mapplets using Informatica PowerCenter 8.6.1 to extract, transform, and load data into the Operational Data Store (ODS).
  • Led, created, and launched new automated testing tools and accelerators for SOA services and data-driven automation built within our practice.
  • Designed complex mappings using Source Qualifier, Joiner, Lookup (connected and unconnected), Expression, Filter, Router, Aggregator, Sorter, Update Strategy, Stored Procedure, and Normalizer transformations.
  • Ensured data consistency by cross-checking sampled data upon migration between database environments.
  • Developed a process to extract the source data and load it into flat files after cleansing, transforming, and integrating it.
  • Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS cubes.
  • Worked with the architecture and modelling teams and used middleware SOA services.
  • Performed data alignment and data cleansing and used the debugger to test mappings and fix bugs.
  • Created sessions, including sequential and concurrent sessions, for proper execution of mappings in Workflow Manager.
  • Provided SSRS and SSIS support for internal IT projects requiring report development.
  • Involved in System Integration Testing (SIT) and User Acceptance Testing (UAT).

Environment: Informatica 8.6.1, SQL Server 2005, RDBMS, Fast load, FTP, SFTP.
