
Big Data Developer Resume


Piscataway, NJ

SUMMARY:

  • Big Data Hadoop and Java developer with solid working experience.
  • Proficient in developing Scala Spark programs for large-scale data collection, data mining, data wrangling, data transformation, data integration and data quality checks. Strong proficiency in implementing and optimizing MapReduce programs to support big data ETL procedures.
  • Hands-on experience designing and implementing distributed data processing pipelines using Spark, Hive, HDFS, Scala and other tools in the Big Data Hadoop ecosystem. Experience with real-time distributed stream processing for fast and big data using tools such as Spark Streaming, Kafka, Storm, Sqoop, Hive, HBase and Flume. Extensive experience administering and tuning Kafka and Spark applications. Hands-on experience developing data pipelines in a Lambda architecture using Kafka and Flume.
  • Practical experience implementing multi-threaded applications with concurrency frameworks.
  • Working experience with Hadoop distributions and platforms, including Cloudera and AWS.
  • Proficient in writing and optimizing HiveQL queries for data manipulation. Hands-on experience with data ingestion between HDFS/NoSQL and RDBMS using Sqoop. Working experience with databases including MySQL 5.x, HBase 0.98+ and MongoDB 2.4.
  • Expert in the design, development and evaluation of backend systems using frameworks such as Spring Boot, Spring MVC, Hibernate and Node.js and technologies such as JSP, Servlet and JDBC, with strong experience in RESTful API design.
  • Experience with front-end frameworks such as Angular and React and front-end technologies such as HTML, CSS, JavaScript, Bootstrap, jQuery and AJAX.
  • Knowledge of file and serialization formats such as SequenceFile, Avro and Parquet.
  • Experienced in descriptive and predictive analysis with Microsoft Excel, Impala and Hive.
  • Hands-on experience in unit testing with frameworks such as JUnit and ScalaTest.
  • Hands-on experience solving software design issues by applying design patterns, including the Singleton, Business Delegate, Controller, MVC, Factory, Abstract Factory, DAO and Template Method patterns.
  • Self-driven and goal-oriented, with excellent communication skills in collaborative teams and the motivation to take on independent responsibility.

TECHNICAL SKILLS:

Backend Development: AJAX, Spring 5.0.4, Spring MVC, Spring Boot, Hibernate 5.3.1, Node.js, Express.js .16.4, PHP, ASP.NET 4.7.1, RESTful API, GSON, Bootstrap Resample, Servlet, JSP, JDBC

Hadoop/Spark Ecosystem: Hadoop 2.7.3, MapReduce, HDFS, YARN, Spark Core 1.6.2, Spark SQL, Spark Streaming, Hive 1.2.1, Kafka 0.10.X, Storm, Sqoop, HBase, JUnit, ScalaTest

Frontend Development: AngularJS 1.6.9, HTML5, CSS3, React, jQuery 2.0.2, Bootstrap 3

Programming Languages: Java 8, Scala 2.11.X, Python 2.7/3.6, R, PHP 5, JavaScript

Database: MySQL 5.5.51, MongoDB, HBase

Version Control and Project Tracking: Git, JIRA

PROFESSIONAL EXPERIENCE:

Confidential, Piscataway, NJ

Big Data Developer

Responsibilities:

  • Implemented and supported big data ETL procedures using Kafka, Spark and related big data APIs.
  • Developed and optimized multi-threaded scripts using the Kafka Producer and Consumer APIs.
  • Used Spark with Scala for batch data processing and stored the output in HBase for scalable storage and fast queries.
  • Implemented custom Hive UDFs and ran HiveQL queries to analyze large data sets.
  • Used NiFi to ingest data from various sources into the NoSQL database HBase.
  • Created multiple Hive tables with partitioning and bucketing for efficient data access.
  • Used Hive and Impala for batch reporting and managed long-term storage on HDFS.
  • Worked with the application development team to build a web service application with Java, Spring MVC and Hibernate.
  • Used Git for version control, JIRA for project tracking and Confluence for documentation collaboration.
  • Actively participated in and provided constructive feedback during daily stand-up meetings and weekly iteration review meetings.

Environment: Hadoop 2.X, Linux, UNIX Shell, Kafka 0.10.X, Scala 2.11.X, Hive 0.14, Sqoop 1.4.5, HDFS, Spark 2.X, HBase, MapReduce, Git, JIRA

Confidential, Pittsburgh, PA

Big Data Developer

Responsibilities:

  • Designed, developed, implemented, tested and maintained data ingestion and integration ETL pipelines using Kafka and Spark Streaming to identify user behavior patterns and improve marketing strategy.
  • Used Flume to monitor and collect real-time log data from user requests and sink the log data into the Kafka message queue.
  • Used Spark Streaming for real-time processing of log data from the Kafka message queue, applying both stateless and stateful transformations to different types of log data.
  • Wrote UDFs with Spark SQL and Spark Core for ETL processes, including data processing and storage; transformed the processed data into tables and stored them in Hive for scalable storage and fast queries.
  • Designed and created Hive tables and applied performance optimizations such as partitioning and bucketing.
  • Implemented a web interface for near real-time data analytics and visualization using Java and Spring MVC.
  • Wrote unit tests with JUnit and ScalaTest for functional validation.

Environment: Hadoop 2.5, MapReduce, HDFS, Spark 1.6, Kafka 0.10.0.1, Hive 0.14, Sqoop 1.4.2, Flume 1.7.0, ETL, Zookeeper 3.4, Java, JUnit 4.8, ScalaTest

Confidential, Pittsburgh, PA

Full Stack Developer

Responsibilities:

  • Implemented a data pipeline that monitors, scrapes and deduplicates the latest news using MongoDB, Redis and Kafka.
  • Used Spark Core and Spark SQL to process log data, wrote the cleaned data into numerical tables and saved them in Hive in Parquet format.
  • Used TF-IDF to extract recommended content from the scraped news for users.
  • Built a single-page web application for browsing news using React, Node.js, RPC, SOA and JWT.
  • Designed and built an offline training pipeline for news topic modeling using TensorFlow.
  • Deployed an online classification service for news topic modeling using the trained model.
  • Built the entire server on Ubuntu and installed and configured Kafka, Spark, React and all Python packages used in the project.
  • Used Git for version control.

Environment: Linux, UNIX Shell, Kafka 1.0, Spark 2.X, Python 3, MongoDB, React 16.X, Git

Confidential, Pittsburgh, PA

Hadoop Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used Hue to run Hive queries and created daily partitions in Hive to improve performance.
  • Tuned the performance of Hive queries using map optimization and streaming techniques.
  • Worked with NoSQL databases such as HBase and Cassandra.
  • Implemented Kafka Connectors to move large datasets into and out of Kafka
  • Used Zookeeper for cluster coordination and monitoring.
  • Migrated MapReduce jobs and Hive queries to Spark transformations and actions to improve performance.
  • Gained hands-on experience with Spark, Spark Streaming and Spark SQL by creating RDDs and applying transformations and actions.
  • Developed Spark applications in Scala to ease the transition from Hadoop.
  • Involved in integrating the Hadoop cluster with the Spark engine to perform batch operations.
  • Worked on indexing, scalability and query language support in Cassandra.
  • Heavily involved in developing and implementing a Cassandra environment using different snitch types.
  • Created Sqoop scripts to import data from different data sources into Hive and HDFS.
  • Used Sqoop to move relational database data into dynamically partitioned Hive tables via staging tables.

Environment: Apache Spark 2.X, Kafka 1.X, Zookeeper, MapReduce, Cassandra, YARN, Sqoop, HDFS, Hive, Java, Hadoop distribution of Cloudera 5.9, Linux, MySQL

Confidential

Java Developer

Responsibilities:

  • Developed the user interface for the presentation tier using HTML, CSS3 and JavaScript.
  • Used JSP and JavaScript to encapsulate presentation logic for the sales module.
  • Developed a controller servlet to handle all requests and MySQL database access.
  • Involved in Spring integration and in developing the ORM layer with Hibernate.
  • Installed and configured Apache Tomcat.
  • Deployed the application on the server and supported and maintained its regular operation.

Environment: Java, Servlet 3.0, JSP 2.2, HTML, CSS3, JavaScript, Spring MVC, Hibernate 4.0, Apache Tomcat 7.0, MySQL 5.1.54, IntelliJ IDEA
