
Azure Data Engineer Resume


Malvern, PA

SUMMARY

  • Cloud Machine Learning Engineer with nearly 7 years of experience as a Big Data Engineer, with expertise in SQL, Python, and Spark/Hadoop; significant experience working with AWS and Azure.
  • Worked in large-scale database environments like Hadoop and MapReduce, with a working understanding of Hadoop clusters, nodes, and the Hadoop Distributed File System.
  • Good experience with the Hadoop ecosystem and the Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL, and PySpark.
  • Experience in Data Architecture, Design, Pipelining, Configuration and Management using Hadoop and Apache Spark ecosystems on different distributions.
  • Ran and scheduled workflows using Oozie and ZooKeeper, identifying failures and integrating, coordinating, and scheduling jobs.
  • Experience working with Azure Blob Storage, Azure Data Lake, Azure Data Factory, Azure SQL, Azure SQL Data Warehouse, Azure Analytics, PolyBase, Azure HDInsight, and Azure Databricks; AWS cloud experience with technologies such as S3, EC2, and IAM.
  • Development-level experience in Microsoft Azure, providing data movement and scheduling functionality for cloud-based technologies such as Azure Blob Storage and Azure SQL Database.
  • Experience in building high-performance and scalable solutions using various Hadoop ecosystem tools like Pig, Hive, Sqoop, and Spark.
  • Extensively worked on Spark and its components like Spark SQL, SparkR, and Spark Streaming.
  • Extensive experience with SQL, PL/SQL, PostgreSQL, and database concepts.
  • Built data pipelines to migrate data between systems across database types, including relational databases (MySQL, Postgres, Oracle) and non-relational/NoSQL databases (MongoDB).
  • Strong experience working with databases like SQL Server, Oracle, and MS Access, and NoSQL databases like MongoDB and Cassandra.
  • Experience analyzing data from multiple sources and creating reports with interactive dashboards using Power BI.
  • In-depth knowledge of statistics and machine learning algorithms such as classification and regression models.

TECHNICAL SKILLS

Big Data Frameworks: HDFS, MapReduce, Apache Spark, Apache Hive, YARN, Apache HBase, Apache Pig, Spark Streaming, Spark SQL, Spark ML, Oozie, Hue, Sqoop, Flume, Kafka, ZooKeeper

Languages & Scripting: Python, Java, JavaScript, R, HTML, CSS, SQL, SAS, XML, Django, machine learning, PySpark, Scala

Cloud Services: AWS, Azure SQL, Azure Blob Storage, Azure SQL Data Warehouse, Azure Data Lake, Azure Storage, Azure DW

Library & Packages: matplotlib, NumPy, SciPy, Pandas, MLlib, NLTK

Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, Visual Studio

Database: MySQL, MS SQL Server, Postgres, MongoDB, NoSQL

PROFESSIONAL EXPERIENCE

Confidential, Malvern, PA

Azure Data Engineer

Responsibilities:

  • Automated Spark jobs to load data using PySpark, applying various actions and transformations to the datasets.
  • Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data with Spark SQL (an illustrative sketch follows this list).
  • Used Spark stream processing in Python to bring data in-memory, implemented RDD transformations, and performed actions.
  • Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target destinations.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python.
  • Implemented ETL and data movement solutions using Azure Data Factory and SSIS; created and ran SSIS packages with the ADF V2 Azure-SSIS Integration Runtime.
  • Created DDLs for tables and executed them to create tables in the warehouse for ETL data loads.
  • Deployed and optimized two-tier Java and Python web applications through Azure DevOps CI/CD, using Repos to commit code, Test Plans for unit testing, and App Service for deployment; collected health, performance, and usage data with Azure Application Insights and stored artifacts in Blob Storage.
  • Managed Azure infrastructure: Azure Web Roles, Worker Roles, SQL Azure, Azure Storage, and Azure AD licenses; performed virtual machine backup and recovery from a Recovery Services vault using Azure PowerShell and the portal.
  • Developed star- and snowflake-schema-based dimensional models to grow the data warehouse.
  • Designed and developed the website's user interface using HTML, CSS, Bootstrap, AJAX, and JSON.
  • Refactored RESTful APIs and Django modules to deliver data in specific formats.
  • Worked on performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
  • Troubleshot Spark applications to make them more fault tolerant.
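
A minimal sketch of the kind of PySpark load described above, shown for illustration only (paths, table names, and columns are hypothetical, not from the actual project):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    # Hypothetical paths and table names for illustration
    spark = (SparkSession.builder
             .appName("json_to_hive_load")
             .enableHiveSupport()
             .getOrCreate())

    # Load semi-structured JSON and let Spark infer the schema
    raw_df = spark.read.json("/data/landing/events/*.json")

    # Example transformations: drop bad records, derive a partition column
    clean_df = (raw_df
                .filter(col("event_id").isNotNull())
                .withColumn("event_date", to_date(col("event_ts"))))

    # Write the structured result into a partitioned Hive table
    (clean_df.write
     .mode("append")
     .partitionBy("event_date")
     .saveAsTable("analytics.events"))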

Environment: Python, Spark SQL, PySpark, MS Azure, SQL Azure, Azure Storage, Azure Data Warehouse, Snowflake, HTML, CSS.

Confidential, New Jersey

Azure Data Engineer

Responsibilities:

  • Worked on building modern data pipelines to ingest, process, transform, and model data into Cerner's Corporate Data Platform.
  • Participated in architecture discussions, design sessions, and code reviews covering functional correctness, architectural maintainability, and performance.
  • Used big data tools to design batch and streaming feature pipelines and pipelines to populate a data lake.
  • Developed and documented the strategy to maintain and support the platform, pipelines, and tooling.
  • Worked on Kafka APIs to ingest data into Azure Data Lake (an illustrative sketch follows this list).
  • Created ETL pipelines from corporate systems using a combination of Azure Data Factory, Azure Data Lake, Azure Databricks, Snowflake, and data analytics.
  • Created clear, well-constructed strategic technical designs for large or complex-scope projects with assistance.
  • Worked with Python and Spark to transform data in the data lake.
  • Used Git and GitHub for version control and deployment.
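
A minimal sketch of a Kafka-to-data-lake ingestion job consistent with the work above, assuming Spark Structured Streaming with the spark-sql-kafka connector; broker addresses, topic names, and storage paths are placeholders:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka_to_datalake").getOrCreate()

    # Read a Kafka topic as a streaming DataFrame (broker and topic are placeholders)
    stream_df = (spark.readStream
                 .format("kafka")
                 .option("kafka.bootstrap.servers", "broker1:9092")
                 .option("subscribe", "corporate-events")
                 .option("startingOffsets", "latest")
                 .load())

    # Kafka payloads arrive as bytes; cast to strings before downstream parsing
    events = stream_df.select(col("key").cast("string"),
                              col("value").cast("string"),
                              col("timestamp"))

    # Land raw events in the data lake as Parquet, with checkpointing for recovery
    query = (events.writeStream
             .format("parquet")
             .option("path", "abfss://raw@examplelake.dfs.core.windows.net/events/")
             .option("checkpointLocation", "abfss://raw@examplelake.dfs.core.windows.net/_chk/events/")
             .outputMode("append")
             .start())
    query.awaitTermination()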

Environment: Python, Spark SQL, PySpark, Azure, Data Lake, Azure Data Factory, Databricks, Azure Storage, Azure Data Warehouse, Kafka, Teradata, Snowflake.

Confidential

Big Data Engineer

Responsibilities:

  • Handled stream processing and storage of data feeding into HDFS using Apache Spark and Sqoop.
  • Developed PySpark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the sketch after this list).
  • Created pipelines in ADF using datasets and pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
  • Created source-to-target mappings for SSIS package development; designed ETL/SSIS packages to process data from various sources into target databases; created SQL Server configurations and performed performance tuning of stored procedures and SSIS packages.
  • Performed performance tuning using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Developed JSON definitions for deploying pipelines in Azure Data Factory (ADF) that process data using the Cosmos activity.
  • Developed scripts for build, deployment, maintenance, and related tasks using Jenkins, Maven, Python, and Bash.
  • Involved in various phases of the project: analysis, design, development, and testing.
  • Developed a rich user interface for the driver using CSS, HTML, JavaScript, and jQuery.
  • Using the Django framework, implemented MVC architecture and developed web applications with a clean interface.
  • Involved in loading and transforming data from Teradata databases into HDFS using Sqoop.
  • Created Sqoop jobs and Pig and Hive scripts to ingest data from relational databases and compare it with historical data.
  • Involved in scheduling Hadoop jobs using Oozie workflows to organize events for high data availability.
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
  • Created an Oozie workflow environment to import real-time data into the Hadoop system using Kafka.
  • Responsible for developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Involved in developing a snowflake schema to model the data warehouse.
  • Used Power BI and Power Pivot to develop data analysis prototypes, and used Power View and Power Map to visualize reports.
  • Developed SSIS packages bringing data from diverse sources such as Excel, SQL Server, flat files, and Oracle DB in the daily load to create and maintain a centralized data warehouse.
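
A minimal sketch of the kind of Databricks PySpark extraction and aggregation described above, assuming CSV and Parquet inputs in Azure storage; storage account, container, path, and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import count, sum as spark_sum

    spark = SparkSession.builder.appName("multi_format_aggregation").getOrCreate()

    # Extract: read two different file formats from Azure storage (paths are placeholders)
    orders = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("wasbs://raw@examplestorage.blob.core.windows.net/orders/*.csv"))
    customers = spark.read.parquet(
        "wasbs://curated@examplestorage.blob.core.windows.net/customers/")

    # Transform: join and aggregate to summarize usage per customer segment
    usage_by_segment = (orders
                        .join(customers, "customer_id")
                        .groupBy("segment")
                        .agg(count("order_id").alias("orders"),
                             spark_sum("amount").alias("total_amount")))

    # Load: write the aggregate back to curated storage for downstream reporting
    (usage_by_segment.write
     .mode("overwrite")
     .parquet("wasbs://curated@examplestorage.blob.core.windows.net/usage_by_segment/"))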

Environment: Azure SQL, Blob Storage, Azure SQL Data Warehouse, Azure Data Lake, Azure Storage, HDFS, Hive, Spark, Scala, Spark SQL, YARN, Flume, Sqoop, Pig, Kafka, Oozie, MySQL, MapReduce, SAS, XML, RC, Sequence, ETL, SSIS, Snowflake, CSS, HTML, JavaScript, jQuery, Django.

Confidential, Houston, TX

Spark / Hadoop Developer

Responsibilities:

  • Worked on big data infrastructure for batch processing as well as real-time processing; responsible for building scalable distributed data solutions using Hadoop.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and PySpark concepts; encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark (a minimal sketch follows this list).
  • Experience designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Updated knowledge of Amazon AWS concepts like the EMR and EC2 web services, which provide fast and efficient processing of big data.
  • Experience with both SQLContext and SparkSession.
  • Experienced working with the Spark ecosystem using Spark SQL and Scala queries on different formats like text and CSV files.
  • Involved in gathering requirements from the client and estimating timelines for developing complex queries using Hive.
  • Developed data pipelines using Flume, Pig, and Sqoop to ingest data and customer histories into HDFS for analysis.
  • Experienced in monitoring the cluster using Cloudera Manager.
  • Extensively used AWS S3 for storing data and EMR for resource-intensive jobs.
  • Used GitHub as code repository and version control system.
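
A minimal sketch of converting a HiveQL query into equivalent DataFrame transformations, as described above; the table and column names are illustrative, not from the actual project:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import avg, col

    spark = (SparkSession.builder
             .appName("hive_to_spark_conversion")
             .enableHiveSupport()
             .getOrCreate())

    # Original Hive-style query, executed through Spark SQL
    hive_style = spark.sql("""
        SELECT region, AVG(order_total) AS avg_total
        FROM sales.orders
        WHERE order_date >= '2017-01-01'
        GROUP BY region
    """)

    # The same logic expressed as DataFrame transformations
    df_style = (spark.table("sales.orders")
                .filter(col("order_date") >= "2017-01-01")
                .groupBy("region")
                .agg(avg("order_total").alias("avg_total")))

    df_style.show()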

Environment: MapReduce, Hadoop, HDFS, Scala, Python, EMR, EC2, S3, Hive, SQLContext, SparkSession, Spark RDD, Flume, Pig, Sqoop, PySpark.

Confidential

Hadoop/Java Developer

Responsibilities:

  • Designed, implemented, and tested clustered, multi-tiered e-commerce products. Core technologies used include IIS, SQL Server, ASP, XML/XSLT, JSP, Tomcat, JavaBeans, and Java Servlets.
  • Developed the XML schema and web services for data maintenance and structures.
  • Experienced in implementing different kinds of joins, such as map-side and reduce-side joins, to integrate data from different data sets.
  • Involved in loading data from the UNIX file system into HDFS.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for the reporting dashboard (a brief sketch follows this list).
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
  • Developed scripts to extract data from MySQL into HDFS.
  • Ran data formatting scripts in Java and created terabyte-scale CSV files to be consumed by Hadoop MapReduce jobs.
  • Strong expertise in the MapReduce programming model using XML, JSON, and CSV file formats.
  • Experience in managing and reviewing Hadoop log files.
  • Job workflow scheduling and monitoring using tools like Oozie.
  • Implemented test scripts to support test driven development and continuous integration.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
  • Followed Agile methodology (TDD, Scrum) to satisfy customers and wrote JUnit test cases for unit testing the integration layer.
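
A brief sketch of querying a partitioned, bucketed Hive table for a dashboard metric, as mentioned above. The original work ran in Hive directly; this sketch issues the same HiveQL through PySpark's spark.sql to stay consistent with the other examples, and the table and column names are hypothetical:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive_partition_metrics")
             .enableHiveSupport()
             .getOrCreate())

    # Assumes web.page_views was created in Hive PARTITIONED BY (view_date)
    # and CLUSTERED BY (user_id) INTO buckets; filtering on the partition
    # column prunes the scan to a single partition.
    daily_metrics = spark.sql("""
        SELECT page, SUM(visits) AS total_visits
        FROM web.page_views
        WHERE view_date = '2016-07-01'
        GROUP BY page
        ORDER BY total_visits DESC
    """)
    daily_metrics.show()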

Environment: HDFS, Hadoop, Flume, Eclipse, SQL Server, Map Reduce, Hive, Pig, JavaBeans, Sqoop, Oozie, Zookeeper and NOSQL database.
