Big Data Developer Resume

SUMMARY:

Seeking opportunities in Big Data and Data Analytics.

SKILLS:

Apache Spark, Scala, SQL, Java, Python, R, HDFS, Java MapReduce, Hive, IntelliJ, Kusto, Azure HDInsight, Hadoop, Databricks, VSTS, RStudio, Orange Toolkit, Git, Microsoft Azure ML Studio

EXPERIENCE:

BIG DATA DEVELOPER

Confidential

Responsibilities:

  • Extended the Reserved Instances feature to additional services: SQL Databases, SUSE Linux, and Cosmos DB.
  • Implemented the core logic for allocating usage based on its type, identified by the resource provider that emitted the usage, and applying the corresponding discount.
  • Scheduled and orchestrated the process using Azure Data Factory.
  • Set up monitors to ensure data quality using SQL stored procedures.
  • Actively involved in unit testing, integration testing, and real-time testing to ensure sanity and conformance to system requirements.
  • Resolved incidents as part of a scheduled on-call rotation to keep the system error-free and reduce latency, and raised tickets against other services or teams when issues were encountered.
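
The allocation-and-discount step described above can be sketched as follows. This is a hypothetical illustration, not the production code: the `UsageRecord` type, provider names, and discount rates are all invented for the example.

```scala
// Hypothetical sketch of the allocation step: usage records are classified
// by the resource provider that emitted them, and a discount rate is applied
// accordingly. Provider names and rates are illustrative only.
case class UsageRecord(resourceProvider: String, quantity: Double, unitPrice: Double)

object DiscountAllocator {
  // Illustrative discount rates per resource provider (not real Azure rates).
  private val discountByProvider: Map[String, Double] = Map(
    "Microsoft.Sql"        -> 0.20, // SQL Databases
    "Microsoft.Compute"    -> 0.30, // SUSE Linux VMs
    "Microsoft.DocumentDB" -> 0.15  // Cosmos DB
  )

  /** Discounted cost for a usage record; unknown providers get no discount. */
  def discountedCost(u: UsageRecord): Double = {
    val rate = discountByProvider.getOrElse(u.resourceProvider, 0.0)
    u.quantity * u.unitPrice * (1.0 - rate)
  }
}
```

For example, `discountedCost(UsageRecord("Microsoft.Sql", 10.0, 1.0))` yields the pre-discount cost of 10.0 reduced by the 20% rate.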

PROGRAMMER ANALYST

Confidential

Responsibilities:

  • Reserved Instances is a Microsoft Azure offering that lets users save on billing with an upfront one- or three-year commitment, whereas Free Tier is a special offer that lets new users consume certain services at a discounted rate for their first year.
  • Usage is captured in near real time using Microsoft Event Hubs and Spark Streaming; ETL is performed and data is processed in batches using Apache Spark, Scala, and HQL, with clustered processing on Azure HDInsight. Benefits are applied accordingly and the results are sent to the billing team to calculate the post-discount billing cost.
  • Migrated the Scala code base of Reserved Instances to AutoSpark, a framework that converts HQL to Spark SQL.
  • Implemented the core logic for allocating usage based on eligibility for Reserved Instance benefits, Free Tier benefits, or both, and set up alerts for under-allocation to avoid overcharging customers.
  • Scheduled and orchestrated the process using Azure Data Factory.
  • Set up monitors to ensure data quality using SQL stored procedures.
  • Actively involved in unit testing, integration testing, and real-time testing to ensure sanity and conformance to system requirements.
  • The Recommendation Engine shows customers the savings they would have made, based on past usage, had instances been reserved, and recommends reservations accordingly. Recommendations are currently provided only for Virtual Machines and are surfaced in Microsoft Azure Advisor and the Microsoft Customer Portal.
  • Implemented the end-to-end feature, which involves multiple levels of aggregation, cost calculation, and savings estimation. Metrics are processed and calculated using Apache Spark with cluster computing on Azure HDInsight.
  • Set up monitors to check data quality using SQL stored procedures.
  • Performed analysis on the processed output using Kusto, covering data quality checks, sanity checks, and investigations of customer complaints or questions about the recommended data.
  • Captured and stored router packets in PCAP format using Wireshark.
  • Converted PCAP data to CSV and XML formats using TShark commands, processed the data with Spark DataFrames, and stored the results in Hive.
  • Established an ODBC connection between Hive on the Hadoop cluster and Power BI for reporting and analytics, with Hive residing on the server and Power BI on the client/local system.
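
The eligibility-based allocation with under-allocation alerting described above can be sketched roughly as below. All types, capacities, and the alert condition are illustrative assumptions, not the actual production logic.

```scala
// Hypothetical sketch: each usage record is allocated first against Reserved
// Instance capacity, then against Free Tier capacity, and an alert condition
// is flagged when benefit-eligible usage remains unallocated (which would
// overcharge the customer). All names and values are illustrative.
sealed trait Benefit
case object ReservedInstance extends Benefit
case object FreeTier extends Benefit

case class Usage(meterId: String, quantity: Double, eligible: Set[Benefit])

case class Allocation(usage: Usage, allocated: Map[Benefit, Double]) {
  def unallocated: Double = usage.quantity - allocated.values.sum
}

object BenefitAllocator {
  /** Allocates usage to Reserved Instance capacity first, then to Free Tier. */
  def allocate(u: Usage, riCapacity: Double, freeTierCapacity: Double): Allocation = {
    val ri   = if (u.eligible(ReservedInstance)) math.min(u.quantity, riCapacity) else 0.0
    val free = if (u.eligible(FreeTier)) math.min(u.quantity - ri, freeTierCapacity) else 0.0
    Allocation(u, Map(ReservedInstance -> ri, FreeTier -> free))
  }

  /** Under-allocation alert: eligible usage remains that no benefit covered. */
  def underAllocated(a: Allocation): Boolean =
    a.unallocated > 0.0 && a.usage.eligible.nonEmpty
}
```

For instance, 10 units of usage eligible for both benefits, with 6 units of Reserved Instance capacity and 3 of Free Tier, leaves 1 unit unallocated and would trigger the alert.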
