Cloud Data Engineer Resume

Omaha, NE

SUMMARY

  • Approximately 8 years of professional experience in the analysis, design, development, integration, deployment, and maintenance of quality software applications using Big Data and cloud technologies.
  • Proficient in Azure cloud services such as Azure AD Domain Configuration, Azure Data Factory V2, Azure Data Lake Storage (ADLS) maintenance and troubleshooting, Azure DevOps, Azure Functions, Azure Data Lake Analytics, Azure SQL DW, and PolyBase T-SQL queries.
  • Good exposure to NoSQL databases such as Cassandra.
  • Extensive experience working with structured data using HiveQL, including join operations.
  • Experience importing and exporting data between HDFS and relational databases using Sqoop.
  • Experienced with job workflow scheduling and monitoring tools such as Oozie.
  • Experienced in migrating MapReduce programs to Spark RDD transformations and actions to improve performance.
  • Experience using various Hadoop distributions (Cloudera, Hortonworks) to fully implement and leverage new Hadoop features.
  • Worked on custom Pig loader and storage classes to handle a variety of data formats, such as JSON and compressed CSV.
  • Experience with source control repositories in Azure DevOps.
  • Strong experience working in UNIX/Linux environments and writing shell scripts.
  • Working knowledge of, and experience with, Agile and Waterfall methodologies.
  • Excellent problem-solving and analytical skills.
  • Hands-on experience writing complex shell scripts for custom scenarios.
  • Good knowledge of security principles for configuring Cloudbreak in an Azure environment.
  • Good knowledge of Hive performance optimizations such as partitioning and bucketing, performing several types of joins on Hive tables, and implementing Hive SerDes such as JSON and Avro.
  • Reviewed application architectures to understand dependencies, file formats, data types, tools, service accounts, and other factors important for migrating applications to the HDP platform.
  • Application software: worked on Hortonworks (HDP 2.3 and HDP 2.1) and Cloudera (CDH3, CDH4, CDH5) on Linux.
  • Installed and configured applications and operating systems on mobile devices (laptops, iPads).
  • Able to troubleshoot and resolve a variety of PC hardware and software issues (responding to error codes and beep codes).
  • Installed and configured the Domain Name System (DNS) for Active Directory.
  • Installed and configured the Active Directory infrastructure (created user accounts, groups, and organizational units).
  • Created and maintained Active Directory objects (implemented Group Policy Objects and linked GPOs to the respective organizational units, groups, and departments).
  • Planned and configured the root domain, child domains, DNS, and DHCP, and implemented NAT to connect to the internet through the assigned subnet during the team lab. Assigned permissions to the created users and groups with a batch file through delegation of control, and set up trust relationships with other forests.
  • Installed a replica server and a backup server for the root server during the team lab for redundancy.
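Hive bucketing, mentioned in the summary above, assigns each row to one of a fixed number of buckets by hashing the bucketing column. The Python sketch below illustrates the idea for integer keys using a hash-modulo rule. It is a simplified model, not Hive's actual hash function, which varies by column type and Hive version. The names `assign_bucket` and `bucketize` are hypothetical.

```python
from collections import defaultdict


def assign_bucket(key: int, num_buckets: int) -> int:
    """Map an integer key to a bucket id in [0, num_buckets).

    Simplified model (assumption): mask the key to a non-negative 31-bit
    value, then take it modulo the bucket count. Hive's real hashing
    differs by column type and version.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return (key & 0x7FFFFFFF) % num_buckets


def bucketize(keys, num_buckets):
    """Group keys by bucket id; equal keys always land in the same bucket."""
    buckets = defaultdict(list)
    for k in keys:
        buckets[assign_bucket(k, num_buckets)].append(k)
    return dict(buckets)
```

Equal keys always map to the same bucket id. As a result, two tables bucketed on the same column with the same bucket count can be joined bucket by bucket, which is what makes bucketed joins cheaper. For example, `bucketize([1, 5, 9, 2, 6], 4)` groups the keys into `{1: [1, 5, 9], 2: [2, 6]}`.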

PROFESSIONAL EXPERIENCE

Confidential, Omaha, NE

Cloud Data Engineer

Responsibilities:

  • Designed and developed data integration processes, from data ingestion through consumption, for analytical and operational needs.
  • Worked on designing and developing a reusable, robust framework for the ADW layer on top of the data lake using Databricks.
  • Collaborated with the business to create and maintain logical and physical data models for the Enterprise Data Warehouse in Azure SQL Database, using dimensional modeling techniques and best practices.
  • Ensured that all deliverables adhered to appropriate coding standards and best practices by conducting periodic design and code reviews.
  • Used Azure Data Factory V2 to integrate with on-premises SQL Servers and scheduled data imports to Azure Data Lake Store.
  • Designed and implemented a strategy for applying data quality checks in Azure.
  • Built data pipelines in Azure DevOps and automated them to run daily using Tidal jobs.
  • Organized data in the data lake by designing basic organizational structures up front for ease of discovery and optimal data retrieval.
  • Helped design and develop a data-driven controls framework between the project's layers (on-prem SQL, Azure Data Lake, Azure Data Warehouse) to prove data integrity.
  • Implemented a hybrid cloud solution using Azure Databricks to support noncritical batch analytics.
  • Created baseline POCs for different data warehousing scenarios using stored procedures on MS SQL DW in the Azure environment.

Environment: Microsoft Azure Cloud, Databricks, Tidal, Azure SQL DB, Azure Synapse, Azure Data Factory, Azure DevOps, Visual Studio.
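A data-integrity controls framework between layers, like the one described above, commonly compares row counts and checksums of the same data in each layer. The Python sketch below is a hypothetical illustration of that idea, not the framework built for this project. The function names and the SHA-256-based row digest are assumptions.

```python
import hashlib
from typing import Iterable, Sequence

Row = Sequence[object]


def row_digest(row: Row) -> int:
    """Stable 64-bit digest of one row (values joined with a unit separator)."""
    text = "\x1f".join("" if v is None else str(v) for v in row)
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def layer_fingerprint(rows: Iterable[Row]) -> tuple:
    """Return (row_count, checksum); summing digests makes it order-independent."""
    count = 0
    checksum = 0
    for row in rows:
        count += 1
        checksum = (checksum + row_digest(row)) % (1 << 64)
    return count, checksum


def reconcile(source_rows: Iterable[Row], target_rows: Iterable[Row]) -> dict:
    """Compare two layers (e.g. an on-prem source and its data lake copy)."""
    src_count, src_sum = layer_fingerprint(source_rows)
    tgt_count, tgt_sum = layer_fingerprint(target_rows)
    return {
        "source_count": src_count,
        "target_count": tgt_count,
        "counts_match": src_count == tgt_count,
        "checksums_match": src_sum == tgt_sum,
    }
```

In practice, each layer's rows would come from a query against that layer. Summing digests modulo 2^64 keeps the checksum independent of row order. The trade-off is that a mismatch shows only that the layers differ, not which row is different.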

Confidential, Plano, TX

Azure/Data Engineer

Responsibilities:

  • Worked on schema mapping and conversion as part of the Integration and Consumption team.
  • Generated reports for the business team by identifying Critical Data Elements.
  • Implemented a change data capture (CDC) process.
  • Implemented SCD Types 1, 2, and 3 to build historical and current version tables.
  • Used Azure Data Factory V2 to integrate with the on-premises cluster and scheduled data imports to Azure Data Lake Store.
  • Used Python, U-SQL, and PowerShell to process data in ADLS.
  • Built data pipelines in Azure Automation runbooks and automated them to run daily.
  • Worked on designing and developing a reusable, robust framework for the Consumption layer on top of the data lake using Apache Spark.
  • Helped design and develop a data-driven controls framework between the project's layers (Ingestion, Integration, Consumption, and BSIG) to prove data integrity.
  • Implemented a hybrid cloud solution using Azure Databricks to support noncritical batch analytics.
  • Created baseline POCs for different data warehousing scenarios using stored procedures on MS SQL DW in the Azure environment.
  • Created APIs for front-end developers to integrate with the underlying Hadoop data, and persisted the transformed data using Azure Data Factory.
  • Acquired business and domain knowledge from the respective teams at Confidential and developed data consumption views with the CDEs (critical data elements) using Hive.

Environment: Microsoft Azure Cloud, Cloudera Data Platform (CDP), Hadoop, HDFS, Spark, Hive.
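SCD Type 2, listed above, keeps history by closing a dimension row's current version and appending a new version whenever a tracked attribute changes. The Python sketch below shows that logic on in-memory rows. It assumes a batch of changes keyed by business key, such as the output of a CDC step. The `DimRow` layout and `apply_scd2` are hypothetical and do not describe the tables built in this role.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    key: str                  # business key (hypothetical)
    attrs: tuple              # tracked attribute values
    valid_from: date
    valid_to: Optional[date]  # None means "still current"
    is_current: bool


def apply_scd2(dim: list, changes: dict, as_of: date) -> list:
    """Apply a batch of changes (business key -> new attrs) with SCD Type 2 rules."""
    result = []
    current = {}
    for row in dim:
        if row.is_current:
            current[row.key] = row
        else:
            result.append(row)  # history rows are never modified
    for key, row in current.items():
        new_attrs = changes.get(key)
        if new_attrs is None or new_attrs == row.attrs:
            result.append(row)  # unchanged: keep the current version
        else:
            result.append(replace(row, valid_to=as_of, is_current=False))  # expire old version
            result.append(DimRow(key, new_attrs, as_of, None, True))       # add new version
    for key, new_attrs in changes.items():
        if key not in current:
            result.append(DimRow(key, new_attrs, as_of, None, True))  # brand-new key
    return result
```

By contrast, Type 1 would overwrite `attrs` in place, and Type 3 would keep only a single "previous value" column.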

Confidential, Neenah, WI

Hadoop Engineer

Responsibilities:

  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, and capacity planning in the cloud environment (Microsoft Azure).
  • Managed a 29-node Hadoop cluster running the Hortonworks (HDP 2.6) distribution with Ambari, on cloud infrastructure from Microsoft Azure.
  • Used Cloudbreak to provision and manage Apache Hadoop clusters in the cloud (Microsoft Azure). Cloudbreak, part of the Hortonworks Data Platform, makes it easy to provision, configure, and elastically grow HDP clusters on cloud infrastructure.
  • Monitored the cluster for performance, networking, and data integrity issues. Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Formulated procedures for installing Hadoop patches, updates, and version upgrades.
  • Installed and configured Tableau Desktop to connect to the Hortonworks Hive framework (database), which contains clickstream data from Mixpanel and Google Analytics.
  • Used Apache NiFi to ingest data from the Mixpanel API onto HDFS in raw JSON format.
  • Developed optimal strategies for distributing the clickstream data over the cluster by importing it into HDFS through the Mixpanel and Google Analytics APIs.
  • Developed custom shell scripts to connect to the Mixpanel and Google Analytics APIs, and used crontab for scheduling.
  • Designed and implemented Hive queries and functions for evaluating, filtering, loading, and storing data.
  • Developed Hive tables on top of the JSON data consumed from the Mixpanel API and stored them in ORC format for optimized querying in Tableau.
  • Used custom shell scripts to convert the Google Analytics data format (dic) to JSON, then loaded it onto HDFS for further analytics.
  • Worked on the Hive database to provide both historical and live clickstream data from Mixpanel and Google Analytics to Tableau for historical and live reporting.

Environment: Hortonworks Data Platform (HDP), Hortonworks Data Flow (HDF), Hadoop, HDFS, Spark, Hive, MapReduce, Apache NiFi, Tableau Desktop, Linux, Microsoft Azure, Cloudbreak.
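The Google Analytics conversion above was done with shell scripts, and the "(dic)" format is not described further. The Python sketch below assumes that each line holds a Python dictionary literal and converts the lines to JSON Lines ready for HDFS. Both the input format and the sample field names such as `ga:date` are assumptions.

```python
import ast
import json


def dict_lines_to_jsonl(lines):
    """Convert lines holding Python dict literals into JSON Lines strings.

    Assumes one dict literal per line, e.g. "{'ga:date': '20170101', 'ga:sessions': 42}".
    ast.literal_eval parses literals only, so no code in the input is executed.
    """
    out = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = ast.literal_eval(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a dict literal")
        out.append(json.dumps(record, sort_keys=True))  # None -> null, True -> true
    return out
```

Using `ast.literal_eval` rather than `eval` is the main design choice: it accepts Python literal syntax such as single quotes, `None`, and `True`, which plain JSON parsers reject, and it never executes code.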

Confidential, Jacksonville, FL

Big Data Developer

Responsibilities:

  • Collected and aggregated large amounts of data from different sources, such as COSMA (Confidential Onboard System Management Agent), BOMR (Back Office Message Router), ITCM (Interoperable Train Control Messaging), and onboard mobile and network devices on the PTC (Positive Train Control) network, using Apache NiFi, and stored the data in HDFS for analysis.
  • Used Apache NiFi to ingest data from IBM MQ message queues.
  • Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Developed Java MapReduce programs to transform ITCM log data into a structured format.
  • Developed optimal strategies for distributing the ITCM log data over the cluster, importing and exporting the stored log data into HDFS and Hive using Apache NiFi.
  • Developed custom code to read messages from IBM MQ and push them onto NiFi queues.
  • Worked with Apache NiFi flows to convert raw XML data into JSON and Avro.
  • Implemented Hive GenericUDFs to incorporate business logic into Hive queries.
  • Configured Spark Streaming to receive real-time data from IBM MQ and store the stream data in HDFS.
  • Analyzed the locomotive bandwidth data using HiveQL to extract the bandwidth consumed by each locomotive per day on each carrier (AT&T, Verizon, or Wi-Fi).
  • Designed and implemented Hive queries and functions for evaluating, filtering, loading, and storing data.
  • Installed and configured Tableau Desktop to connect, through the Hortonworks ODBC connector, to the Hortonworks Hive framework (database) containing the locomotive bandwidth data for further analytics.
  • Collected and provided locomotive communication usage data by locomotive, channel, protocol, and application.
  • Analyzed locomotive communication usage from COSMA to monitor inbound and outbound traffic bandwidth by communication channel.
  • Worked on the back-end Hive database to provide both historical and live locomotive bandwidth data to Tableau for historical and live reporting.

Environment: Hortonworks Data Platform (HDP), Hortonworks Data Flow (HDF), Hadoop, HDFS, Spark, Hive, MapReduce, Apache NiFi, Tableau Desktop, Linux.
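The bandwidth analysis above, per locomotive, per day, and per carrier, has the shape of a grouped sum. The Python sketch below reproduces that aggregation on in-memory records. The `UsageRecord` fields are hypothetical, and the HiveQL in the docstring only illustrates the equivalent query shape. It is not the query used on the project.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, NamedTuple


class UsageRecord(NamedTuple):
    locomotive_id: str   # hypothetical field names
    timestamp: datetime
    carrier: str         # e.g. "AT&T", "Verizon", "Wi-Fi"
    bytes_used: int


def daily_bandwidth(records: Iterable[UsageRecord]) -> dict:
    """Sum bytes per (locomotive, calendar day, carrier).

    Same shape as: SELECT locomotive_id, to_date(ts), carrier, SUM(bytes_used)
                   FROM usage GROUP BY locomotive_id, to_date(ts), carrier
    """
    totals = defaultdict(int)
    for r in records:
        totals[(r.locomotive_id, r.timestamp.date(), r.carrier)] += r.bytes_used
    return dict(totals)
```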

Confidential, Houston, TX

Big Data Systems Engineer

Responsibilities:

  • Installed and configured a three-node cluster with Hortonworks Data Platform (HDP 2.3) on the HP infrastructure and Management.
  • Worked with HP Intelligent Provisioning and the Smart Storage array to set up the disks for the installation.
  • Used the Big Data benchmark tool BigBench to benchmark the three-node cluster.
  • Configured BigBench and ran it on one of the nodes in the cluster.
  • Ran the benchmark on datasets of 5 GB, 10 GB, 50 GB, 100 GB, and 1 TB.
  • Worked with structured, semi-structured, and unstructured data generated automatically by BigBench, running workloads that use Spark's machine learning libraries.
  • Configured PAT (Performance Analysis Tool) to dump the benchmark results into automated MS Excel charts.
  • Used PAT to collect performance metrics from the Hadoop nodes, analyze resource utilization, and draw automated charts in MS Excel.
  • Worked with various performance monitoring tools such as top, dstat, and atop, as well as Ambari Metrics.
  • Collected the results of the dataset tests (5 GB, 10 GB, 50 GB, 100 GB, and 1 TB) on the server and loaded them into PAT for further analysis of resource utilization.
  • Worked with HPE Insight CMU (Cluster Management Utility) to manage the cluster and with HPE Vertica for SQL on Hadoop.
  • Configured the performance tuning parameters used during the benchmark.
  • Used Tableau Desktop to create visual dashboards of CPU utilization, disk I/O, memory, network I/O, and query times obtained from the PAT automated MS Excel charts.
  • Loaded the benchmark results from the automated charts into Tableau Desktop for further data analytics.
  • Installed and configured Tableau Desktop on one of the three nodes to connect to the Hortonworks Hive Framework (Database) through the Hortonworks ODBC connector for further analytics of the cluster.

Environment: Hortonworks Data Platform (HDP), Hadoop, HDFS, Spark, Hive, MapReduce, BigBench, Tableau Desktop, Linux.