Big Data Developer Resume

Fairfax, VA

SUMMARY

  • 6+ years of professional experience as a Data Analyst and Big Data Developer, with expertise in Hadoop, Spark, and programming languages including Python, Scala, and Java.
  • Gathered information and requirements from users and documented them in a Business Requirement Document (BRD) and a Functional Specification Document (FSD).
  • Built data pipelines using Sqoop, Flume, and Kafka.
  • Used Sqoop to transfer data between RDBMS and HDFS.
  • Collected and aggregated large amounts of streaming data into HDFS using Flume, and defined channel selectors to multiplex data into different sinks.
  • Developed Kafka producers and consumers to stream data from different sources to HDFS.
  • Wrote web scraping programs in Python and Java to extract data.
  • Implemented complex MapReduce programs to perform map side joins using distributed cache.
  • Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and compressed file formats.
  • Created Hive tables based on business requirements.
  • Implemented Partitioning, Dynamic Partitions and Buckets in Hive for efficient data access.
  • Implemented UDFs, UDAFs, and UDTFs in Java and Python for Hive to handle processing that Hive's built-in functions cannot perform.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Created and transformed RDDs in Spark using Scala and Python.
  • Used Spark SQL to analyze data.
  • Used Spark Streaming to extract real-time data from different sources.
  • Used Oozie and Airflow to schedule automated workflows of Sqoop, MapReduce, and Hive jobs.
  • Applied Machine Learning Algorithms to the applications developed in Spark.
  • Involved in NoSQL database design, integration and implementation.
  • Loaded data into the NoSQL database HBase for data processing.
  • Created a search server using Solr, indexed files, and queried them using HTTP GET calls.
  • Held weekly meetings with technical collaborators and participated actively in ETL code review sessions with senior and junior developers.
  • Translated high-level design specifications into simple ETL coding and mapping standards.
  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Developed front-end web applications using AngularJS and back-end web applications using Node.js.
  • Developed an Android application user interface using Android Studio.
  • Used Informatica for cleaning, enhancing, and protecting data.
  • Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.

TECHNICAL SKILLS

  • Hadoop
  • Spark
  • Hive
  • Pig
  • Sqoop
  • Flume
  • Kafka
  • MySQL
  • HBase
  • NiFi
  • Linux scripting
  • Spark Streaming
  • Spark SQL
  • Spark MLlib and Kafka-Spark integration
  • Java
  • Scala
  • Python
  • HDFS

PROFESSIONAL EXPERIENCE

Confidential, Fairfax, VA

Big Data Developer

Responsibilities:

  • Gathered information and requirements from users and documented them in the BRD and FSD.
  • Used Kafka to build pipelines from different sources to HDFS.
  • Wrote Java MapReduce programs on AWS EMR to convert semi-structured and unstructured data into structured data and to incorporate all business transformations.
  • Developed the process to move MapReduce output to MarkLogic and HBase for analytics.
  • Used Informatica for cleaning, enhancing, and protecting data.
  • Ran Hive queries to analyze data in HDFS and identify issues.
  • Worked on shell scripting to automate jobs.
  • Helped build the Hadoop cluster.
  • Configured the Hive Metastore to use an Oracle/MySQL database to support multiple connections.
  • Retrieved data from MySQL and Oracle databases into HDFS using Sqoop and ingested it into HBase for data processing.
  • Derived new requirements for ETL applications using a business data-driven method.
  • Created a search server using Solr, indexed files, and queried them using HTTP GET calls.
  • Used Oozie and Airflow to schedule automated workflows in the Hadoop ecosystem.
  • Wrote web scraping programs in Python and Java.
  • Used PySpark SQL and Streaming to query and analyze real-time data.
  • Wrote Spark programs in Python and Scala.
  • Built PySpark models using machine learning algorithms.

Environment: HDFS, MapReduce, Hive, Kafka, Spark 2.1.0, Java, Python, Scala, MySQL, HBase, Oracle, Sqoop, MarkLogic, Informatica

Confidential

Software Engineer, Oracle Data Cloud

Responsibilities:

  • Obtained requirements from business SMEs, documented system requirements, and created data models, reporting specifications, and test plans/cases.
  • Developed custom MapReduce programs to convert semi-structured data into structured data and to incorporate all business transformations.
  • Developed the process to move MapReduce output to MarkLogic and HBase for analytics.
  • Used Informatica for cleaning, enhancing, and protecting data.
  • Designed new modules to support the market share project.
  • Developed the market share project.
  • Modified existing processes in response to change requests or to fix identified issues.
  • Participated in re-processing when major changes occurred.
  • Participated in implementing security measures to anonymize sensitive customer data.
  • Built data pipelines using Kafka and loaded the data into HDFS.
  • Created HBase tables and designed HDFS data models to optimize storage.
  • Enriched project data by integrating it with registration or other telemetry data.
  • Built Spark applications using the Spark SQL and Streaming libraries.
  • Wrote Scala and Python programs to build Spark models.
  • Developed web scraping programs in Python to pull data from retail stores.
  • Processed the scraped data to perform sentiment analysis for a printer and cartridge.

Environment: Hadoop, Hive, Kafka, Spark 1.6.0, Spark SQL, Spark Streaming, Java, Python, Scala, MySQL, HBase, Oracle, Informatica

Confidential

Big Data Developer, BI Data Modeler

Responsibilities:

  • Obtained requirements from business SMEs, documented system requirements, and created data models, reporting specifications, and test plans/cases.
  • Worked with offshore and nearshore developers (India, Brazil) to communicate requirements in the form of design documents.
  • Developed the project schedule, estimates, and work breakdown structures for the reporting track.
  • Developed the project framework consisting of the ad hoc environment and the objects leveraged for reports.
  • Developed the ETL framework and mapping sheets.
  • Tested deliverables to ensure conformance to requirements and design.
  • Reported the status of deliverables to the team, business, and management.
  • Developed architecture documents, including the Application Design Document and System Design Document.
  • Created tables, indexes, sequences, procedures, packages, views, materialized views, and partitions; performed performance tuning using AWR reports, table analysis, and statistics collection.
  • Established the Hadoop cluster to archive historical data and helped analysts query it with SQL through Hive. Wrote Java MapReduce programs to produce aggregated data for the data warehouse.
  • Built a proof of concept (POC) loading vehicle testing video data and catalog data into MongoDB for web services.

Environment: Hadoop, Java, MongoDB, Hive, SQL, MapReduce.