Big Data developer Resume Fairfax, VA - Hire IT People

SUMMARY

6+ years of professional experience as a Data Analyst and Big Data Developer, with expertise in Hadoop, Spark, and programming languages including Python, Scala, and Java.
Gathered information and requirements from users and documented them in a Business Requirement Document (BRD) and a Functional Specification Document (FSD).
Built data pipelines using Sqoop, Flume, and Kafka.
Used Sqoop to transfer data between RDBMS and HDFS.
Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
Developed Kafka producers and consumers to stream data from different sources to HDFS.
Wrote web scraping programs in Python and Java to extract data.
Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and compressed file formats.
Created Hive tables based on business requirements.
Implemented Partitioning, Dynamic Partitions and Buckets in Hive for efficient data access.
Implemented UDFs, UDAFs, and UDTFs in Java and Python for Hive to handle processing that Hive's built-in functions cannot perform.
Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
Created and transformed RDDs in Spark using Scala and Python.
Used Spark SQL to analyze data.
Used Spark Streaming to ingest real-time data from different sources.
Used Oozie and Airflow to schedule automated workflows of Sqoop, MapReduce, and Hive jobs.
Applied machine learning algorithms in applications developed in Spark.
Involved in NoSQL database design, integration and implementation.
Loaded data into NoSQL database HBase for data processing.
Created a search server using Solr, indexed files, and queried them using HTTP GET calls.
Held weekly meetings with technical collaborators and actively participated in ETL code review sessions with senior and junior developers.
Translated high-level design specifications into simple ETL coding and mapping standards.
Gathered the business requirements from the Business Partners and Subject Matter Experts.
Developed front-end web applications using AngularJS and back-end web applications using Node.js.
Developed an Android application user interface using Android Studio.
Used Informatica for cleaning, enhancing, and protecting data.
Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.

TECHNICAL SKILLS

Hadoop
Spark
Hive
Pig
Sqoop
Flume
Kafka
MySQL
HBase
NiFi
Linux scripting
Spark Streaming
Spark SQL
Spark MLlib and Kafka-Spark integration
Java
Scala
Python
HDFS

PROFESSIONAL EXPERIENCE

Confidential, Fairfax, VA

Big Data Developer

Responsibilities:

Gathered information and requirements from users and documented them in the BRD and FSD.
Used Kafka to build pipelines from different sources to HDFS.
Wrote Java MapReduce programs on AWS EMR to convert semi-structured and unstructured data into structured data and to apply all business transformations.
Developed the process to move MapReduce output to MarkLogic and HBase for analytics.
Used Informatica for cleaning, enhancing, and protecting data.
Ran Hive queries to analyze data in HDFS and identify issues.
Wrote shell scripts to automate jobs.
Involved in building the Hadoop cluster.
Configured the Hive Metastore to use an Oracle/MySQL database to support multiple connections.
Retrieved data from MySQL and Oracle databases into HDFS using Sqoop and ingested it into HBase for data processing.
Derived new requirements for ETL applications using a business data-driven approach.
Created a search server using Solr, indexed files, and queried them using HTTP GET calls.
Used Oozie and Airflow to schedule automated workflows in the Hadoop ecosystem.
Wrote web scraping programs in Python and Java.
Used PySpark SQL and Streaming to query and analyze real-time data.
Wrote Spark programs in Python and Scala.
Built PySpark models using machine learning algorithms.

Environment: HDFS, MapReduce, Hive, Kafka, Spark 2.1.0, Java, Python, Scala, MySQL, HBase, Oracle, Sqoop, MarkLogic, Informatica

Confidential

Software Engineer - Oracle Data Cloud

Responsibilities:

Obtained requirements from business SMEs, documented system requirements, and created data models, reporting specifications, and test plans/cases.
Developed custom MapReduce programs to convert semi-structured data into structured data and to apply all business transformations.
Developed the process to move MapReduce output to MarkLogic and HBase for analytics.
Used Informatica for cleaning, enhancing, and protecting data.
Designed new modules to support the market share project.
Developed the market share project.
Modified existing processes as part of change requests or to fix identified issues.
Participated in re-processing whenever major changes occurred.
Participated in implementing security measures to anonymize sensitive customer data.
Built data pipelines using Kafka and loaded the data into HDFS.
Created HBase tables and designed HDFS data models to optimize storage.
Enriched project data by integrating it with registration and other telemetry data.
Built Spark applications using the Spark SQL and Streaming libraries.
Wrote Scala and Python programs to build Spark models.
Developed web scraping programs in Python to pull data from retail stores.
Processed the scraped data to perform sentiment analysis on a printer and cartridge.

Environment: Hadoop, Hive, Kafka, Spark 1.6.0, Spark SQL, Spark Streaming, Java, Python, Scala, MySQL, HBase, Oracle, Informatica

Confidential

Big Data Developer, BI Data Modeler

Responsibilities:

Obtained requirements from business SMEs, documented system requirements, and created data models, reporting specifications, and test plans/cases.
Worked with offshore and nearshore developers (India, Brazil) to communicate requirements in the form of design documents.
Developed the project schedule for the reporting track, estimates, and work breakdown structures.
Developed the project framework consisting of the ad hoc environment and the objects leveraged for reports.
Developed the ETL framework and mapping sheets.
Tested deliverables to ensure conformance to requirements and design.
Reported the status of deliverables to the team, business, and management.
Developed architecture documents such as the Application Design Document and System Design Document.
Created tables, indexes, sequences, procedures, packages, views, materialized views, and partitions; performed performance tuning using AWR reports, table analysis, and statistics collection.
Established the Hadoop cluster to archive historical data and helped analysts query it with SQL through Hive. Wrote Java MapReduce programs to produce aggregated data for the data warehouse.
Built a POC loading vehicle testing video data and catalog data into MongoDB for web services.

Environment: Hadoop, Java, MongoDB, Hive, SQL, MapReduce.
