Big Data Developer Resume
Fairfax, VA
SUMMARY
- 6+ years of professional experience as a Data Analyst and Big Data Developer, with expertise in Hadoop and Spark and programming languages including Python, Scala, and Java.
- Gathered information and requirements from users and documented them in Business Requirement Documents (BRD) and Functional Specification Documents (FSD).
- Built data pipelines using Sqoop, Flume, and Kafka.
- Used Sqoop to transfer data between RDBMS and HDFS.
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume, and defined channel selectors to multiplex data into different sinks.
- Developed Kafka producers and consumers to stream data from different sources to HDFS (a minimal producer sketch follows this summary).
- Wrote web scraping programs in Python and Java to extract data.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
- Wrote multiple MapReduce programs for extraction, transformation, and aggregation of data from multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Responsible for creating Hive tables based on business requirements
- Implemented Partitioning, Dynamic Partitions and Buckets in Hive for efficient data access.
- Implemented UDFs, UDAFs, and UDTFs in Java and Python for Hive to perform processing that cannot be handled by Hive's built-in functions.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Created and transformed RDDs in Spark using Scala and Python.
- Used Spark SQL to analyze data.
- Used Spark Streaming to ingest real-time data from different sources.
- Effectively used Oozie & Airflow to schedule automatic workflows of Sqoop, MapReduce and Hive jobs.
- Applied Machine Learning Algorithms to the applications developed in Spark.
- Involved in NoSQL database design, integration and implementation.
- Loaded data into NoSQL database HBase for data processing.
- Created a search server using Solr, indexed files, and queried them using HTTP GET calls.
- Held weekly meetings with technical collaborators and actively participated in ETL code review sessions with senior and junior developers.
- Translated high-level design specifications into simple ETL coding and mapping standards.
- Gathered the business requirements from the Business Partners and Subject Matter Experts.
- Developed front-end web applications using AngularJS and back-end applications using Node.js.
- Developed Android application user interfaces using Android Studio.
- Used Informatica for cleaning, enhancing, and protecting data.
- Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
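Below is a minimal sketch of the kind of Kafka producer referenced in the summary, using the kafka-python client. The broker address, topic name, and input file are illustrative assumptions, not details from any specific project.

```python
# Minimal Kafka producer sketch (kafka-python); broker, topic, and the
# source file below are illustrative placeholders.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",          # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Stream newline-delimited JSON records into a topic; a downstream
# consumer or connector would land these records in HDFS.
with open("events.jsonl") as src:                # hypothetical input file
    for line in src:
        producer.send("events", json.loads(line))  # hypothetical topic name

producer.flush()
producer.close()
```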
TECHNICAL SKILLS
- Hadoop
- Spark
- Hive
- Pig
- Sqoop
- Flume
- Kafka
- MySQL
- HBase
- NiFi
- Linux Scripting
- Spark Streaming
- Spark SQL
- Spark MLlib and Integrating Kafka and Spark
- Java
- Scala
- Python
- HDFS
PROFESSIONAL EXPERIENCE
Confidential, Fairfax, VA
Big Data Developer
Responsibilities:
- Gathered information and requirements from users, then documented them in the BRD and FSD.
- Used Kafka to build pipelines from different sources to HDFS.
- Wrote Java MapReduce programs on AWS EMR to convert semi-structured and unstructured data into structured data and to incorporate the business transformations.
- Developed the process to move MapReduce output to MarkLogic and HBase for analytics.
- Used Informatica for cleaning, enhancing, and protecting data.
- Performed Hive Queries to analyze the data in HDFS and to identify issues.
- Worked on shell scripting to automate jobs.
- Involved in building Hadoop Cluster.
- Configured Hive MetaStore to use Oracle/MySQL database for establishing multiple connections.
- Retrieved data from MySQL and Oracle databases into HDFS using Sqoop and ingested it into HBase for processing.
- Responsible for deriving new requirements for ETL applications using a business data-driven approach.
- Created Search server using Solr, indexed the files and then queried using HTTP GET calls
- Used Oozie & Airflow to schedule automatic workflows in Hadoop Ecosystem.
- Wrote web scraping programs in Python and Java.
- Used PySpark SQL and Spark Streaming to query and analyze real-time data (a PySpark SQL sketch follows the environment line below).
- Wrote Spark programs in Python and Scala.
- Built PySpark models using machine learning algorithms.
Environment: HDFS, MapReduce, Hive, Kafka, Spark 2.1.0, Java, Python, Scala, MySQL, HBase, Oracle, Sqoop, MarkLogic, Informatica
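A minimal sketch of the PySpark SQL usage referenced above, in the Spark 2.x SparkSession style; the HDFS path, view name, and column names are assumptions for illustration only.

```python
# Minimal PySpark SQL sketch; the HDFS path, view name, and column
# names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hdfs-analysis")
         .enableHiveSupport()     # lets Spark SQL see the Hive metastore
         .getOrCreate())

# Read raw JSON landed in HDFS and expose it to SQL.
events = spark.read.json("hdfs:///data/raw/events/")   # hypothetical path
events.createOrReplaceTempView("events")

daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS event_count
    FROM events
    GROUP BY event_date
    ORDER BY event_date
""")
daily_counts.show()
spark.stop()
```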
Confidential
Software Engineer - Oracle Data Cloud
Responsibilities:
- Obtained requirements from business SMEs, documented system requirements, and created data models, reporting specifications, and test plans/cases.
- Developed custom MapReduce programs to convert semi-structured data into structured data and to incorporate the business transformations.
- Developed the process to move MapReduce output to MarkLogic and HBase for analytics.
- Used Informatica for cleaning, enhancing, and protecting data.
- Designed new modules to support the market share project.
- Developed the market share project.
- Modified existing processes as part of change requests or to fix identified issues.
- Participated in re-processing in case of major changes.
- Implemented security measures to anonymize sensitive customer data.
- Built data pipelines using Kafka and loaded the data into HDFS.
- Created HBase tables and designed HDFS data models to optimize storage.
- Enriched project data by integrating it with registrations and other telemetry data.
- Built Spark applications using the Spark SQL and Spark Streaming libraries.
- Wrote Scala and Python programs to build Spark models.
- Developed web scraping programs in Python to pull data from retail stores (a scraping sketch follows the environment line below).
- Processed the scraped data to perform sentiment analysis on printers and cartridges.
Environment: Hadoop, Hive, Kafka, Spark 1.6.0, Spark SQL, Spark Streaming, Java, Python, Scala, MySQL, HBase, Oracle, Informatica
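A minimal sketch of the Python web-scraping step described above, using requests and BeautifulSoup; the URL and CSS selector are placeholders, and TextBlob is shown only as one possible sentiment scorer, not necessarily the library used on the project.

```python
# Minimal scraping sketch; the URL and CSS selector are placeholders,
# and TextBlob is one possible sentiment scorer, not a project detail.
import requests
from bs4 import BeautifulSoup
from textblob import TextBlob

url = "https://example-retailer.com/printers/model-123/reviews"  # hypothetical
resp = requests.get(url, timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
reviews = [node.get_text(strip=True) for node in soup.select("div.review-text")]

# Score each review's polarity in [-1.0, 1.0]; aggregate for reporting.
scores = [TextBlob(text).sentiment.polarity for text in reviews]
if scores:
    print(f"{len(scores)} reviews, mean polarity {sum(scores) / len(scores):.2f}")
```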
Confidential
Big Data Developer, BI Data Modeler
Responsibilities:
- Obtained requirements from business SMEs, documented system requirements, and created data models, reporting specifications, and test plans/cases.
- Worked with offshore and nearshore developers (India, Brazil) to communicate requirements in the form of design documents.
- Developed the project schedule for the reporting track, including estimates and work breakdown structures.
- Developed the project framework consisting of the ad-hoc environment and the objects leveraged for reports.
- Developed the ETL framework and mapping sheets.
- Performed testing of deliverables to ensure conformance to requirements and design.
- Reported the status of deliverables to the team, business, and management.
- Developed architecture documents such as the Application Design Document and System Design Document.
- Created tables, indexes, sequences, procedures, packages, views, materialized views, and partitions; performed performance tuning, AWR reporting, table analysis, and statistics collection.
- Established a Hadoop cluster to archive historical data and helped analysts query it with SQL through Hive; wrote Java MapReduce programs to produce aggregated data for the data warehouse.
- Built a POC loading vehicle-testing video data and catalog data into MongoDB for web services (a MongoDB sketch follows the environment line below).
Environment: Hadoop, Java, MongoDB, Hive, SQL, MapReduce.
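A minimal sketch of the MongoDB POC described above, using pymongo; the connection string, database and collection names, and document fields are illustrative assumptions, not details from the project.

```python
# Minimal pymongo sketch for the MongoDB POC; connection string,
# database/collection names, and fields are illustrative assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
catalog = client["vehicle_testing"]["catalog"]

# Store a catalog document that points at the archived video in HDFS.
catalog.insert_one({
    "test_id": "VT-1001",
    "vehicle_model": "sedan-x",
    "video_path": "hdfs:///archive/videos/VT-1001.mp4",
    "recorded_on": "2015-06-01",
})

# Query pattern a web service might use.
for doc in catalog.find({"vehicle_model": "sedan-x"}):
    print(doc["test_id"], doc["video_path"])
```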