
Big Data Developer Resume


Piscataway, NJ

SUMMARY

  • 8+ years of IT experience as a Big Data Developer using Hadoop, HDFS, Hortonworks, MapReduce and the Hadoop ecosystem (Pig, Hive, Impala, Spark, Scala), Java and J2EE.
  • Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
  • Hands on experience in Test-driven development and Software Development Life Cycle (SDLC) methodologies like Agile and Scrum.
  • Experienced in performing real-time analytics on NoSQL distributed databases like Cassandra, HBase and MongoDB.
  • Good understanding of designing attractive data visualization dashboards using Tableau.
  • Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark for data aggregation, queries and writing data back into OLTP systems (a minimal Scala sketch follows this summary).
  • Created batch data processing jobs in Spark using the Scala API and developed data ingestion pipelines using Kafka.
  • Hands on experience in designing and developing POCs in Spark to compare the performance of Spark with Hive and SQL/Oracle using Scala.
  • Used Flume and Kafka to direct data from different sources to/from HDFS.
  • Worked with AWS and created EMR clusters with Spark for analyzing and processing raw data and for accessing data from S3 buckets.
  • Scripted an ETL pipeline in Python that ingests files from AWS S3 into a Redshift table.
  • Hands on experience with various file formats such as ORC, Avro, Parquet and JSON.
  • Knowledge of the Databricks platform, Cloudera Manager and Hortonworks Distribution for monitoring and managing clusters.
  • Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Expertise in working with Linux/Unix and shell commands on the Terminal.
  • Expertise with Python, Scala and Java in designing, developing, administering and supporting large-scale distributed systems.
  • Good experience in developing MapReduce jobs in J2EE/Java for data cleansing, transformations, pre-processing and analysis.
  • Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for big data analytics.
  • Experience collecting log data and JSON data into HDFS using Flume and processing the data using Hive/Pig.
  • Extensive experience developing Spark Streaming jobs by building RDDs (Resilient Distributed Datasets) and using Spark SQL as required.
  • Strong knowledge on Hadoop eco-systems including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper etc.
  • Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate.
  • Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns and AJAX calls.
  • Installation, configuration and administration experience with big data platforms: Cloudera Manager (Cloudera) and MCS (MapR).
  • Experience working with Hortonworks and Cloudera environments.
  • Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required.
  • Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
  • Experience on Spark and Spark SQL, Spark Streaming, Spark GraphX, Spark MLlib.
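
The Spark DataFrame/RDD aggregation bullet above refers to work along these lines. This is a minimal, illustrative Scala sketch only; the database, table, column names and JDBC URL are hypothetical placeholders rather than actual project code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object DailyAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-aggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical UDF that converts a raw cents field to dollars
    val toDollars = udf((cents: Long) => cents / 100.0)

    // Read a (hypothetical) Hive staging table and aggregate per customer and day
    val daily = spark.table("staging.orders")
      .withColumn("amount_usd", toDollars(col("amount_cents")))
      .groupBy(col("customer_id"), col("order_date"))
      .agg(sum("amount_usd").as("total_usd"))

    // Write the aggregates back to an OLTP system over JDBC (placeholder URL/credentials)
    daily.write
      .format("jdbc")
      .option("url", "jdbc:mysql://example-host:3306/reporting")
      .option("dbtable", "daily_order_totals")
      .option("user", sys.env.getOrElse("DB_USER", "reporting"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .mode("append")
      .save()

    spark.stop()
  }
}
```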

TECHNICAL SKILLS

Big Data Ecosystem: MapReduce, HDFS, Hive 2.3, HBase 1.2, Pig, Sqoop, Flume 1.8, HDP, Oozie, Zookeeper, Spark, Kafka, Storm, Hue

Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks

Cloud Platform: Amazon Web Services (AWS), MS Azure

Relational Databases: Oracle 12c, MySQL, MS SQL Server 2016

NoSQL Databases: HBase, Hive 2.3, and MongoDB

Version Control: GIT, GitLab, SVN

Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, UNIX Shell Scripting, Scala.

Software Development & Testing Life cycle: UML, Design Patterns (Core Java and J2EE), Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC

Web Technologies: JavaScript, CSS, HTML and JSP.

Operating Systems: Windows, UNIX/Linux and Mac OS.

Build Management Tools: Maven, Ant.

IDE & Command line tools: Eclipse, IntelliJ, Toad and NetBeans.

PROFESSIONAL EXPERIENCE

Confidential - Piscataway, NJ

Big Data Developer

Responsibilities:

  • As a Big Data Developer, worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
  • Primarily involved in Data Migration process using Azure by integrating with GitHub repository and Jenkins.
  • Implemented Azure platforms such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, Azure Data Lake and Data Factory.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Developed Spark scripts using Python on Azure HDInsight for Data Aggregation, Validation and verified its performance over MR jobs.
  • Built pipelines to move hashed and un-hashed data from Azure Blob to Data Lake.
  • Designed HBase row keys to store text and JSON as key values in the HBase table, structuring the row key so that gets/scans return data in sorted order.
  • Worked in Azure environment for development and deployment of Custom Hadoop Applications.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Java Persistence API (JPA) framework for object relational mapping which is based on POJO Classes.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
  • Maintained Hadoop, Hadoop ecosystems, and database with updates/upgrades, performance tuning and monitoring.
  • Created Hive schemas using performance techniques like partitioning and bucketing.
  • Developed customized Hive UDFs and UDAFs in Java, set up JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
  • Used Hadoop YARN to perform analytics on data in Hive.
  • Developed and maintained batch data flows using HiveQL and Unix scripting.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (see the sketch following this list).
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
  • Developed and executed data pipeline testing processes and validated business rules and policies.
  • Built code for real-time data ingestion using Java, MapR Streams (Kafka) and Storm.
  • Built large-scale data processing systems for data warehousing solutions and worked on unstructured data mining with NoSQL databases.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Specified the cluster size, resource pool allocation and Hadoop distribution by writing the specifications in JSON format.
  • Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
  • Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Involved in loading and transforming large sets of structured, semi-structured and unstructured data, and analyzed them by running Hive queries and Pig scripts.
  • Used Windows Azure SQL Reporting Services to create reports with tables, charts and maps.
  • Developed Java code that creates mappings in Elasticsearch before data is indexed into it.
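
The Spark Streaming bullet above (Kafka to HDFS) refers to a job along these lines. This is a hedged, minimal Scala sketch assuming the spark-streaming-kafka-0-10 integration; the broker list, topic, consumer group and output path are hypothetical placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-hdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Kafka consumer settings; broker addresses and group id are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "event-ingest",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream from a (hypothetical) "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    // Persist each non-empty micro-batch to HDFS, one directory per batch
    stream.map(_.value())
      .foreachRDD { (rdd, time) =>
        if (!rdd.isEmpty())
          rdd.saveAsTextFile(s"hdfs:///data/raw/events/batch_${time.milliseconds}")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```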

Environment: Hadoop 3.0, Azure, Java 8, MapReduce, Agile, HBase 1.2, JSON, Spark 2.4, Kafka, JDBC, Hive 2.3, Pig 0.17

Confidential - Phoenix, AZ

Hadoop Developer

Responsibilities:

  • Worked as a Hadoop Developer on Hadoop eco-systems including Hive, Zookeeper, Spark Streaming with MapR distribution.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Created and worked Sqoop jobs with incremental load to populate Hive External tables.
  • Implemented Amazon EMR for processing big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Worked with Apache NiFi to develop custom processors for processing and distributing data among cloud systems.
  • Developed Scala UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation, queries and writing data back into RDBMS through Sqoop.
  • Exported analyzed data to relational databases using Sqoop after ingesting data from various sources into HDFS, and built reports.
  • Exported analyzed data to relational database using Sqoop for visualization to generate reports for the BI team.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
  • Involved in developing ETL data pipelines for performing real-time streaming by ingesting data into HDFS and HBase using Kafka and Storm.
  • Involved in moving log files generated from varied sources to HDFS, further processing through Flume.
  • Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
  • Worked on Apache Flume for collecting and aggregating huge amount of log data and stored it on HDFS for doing further analysis.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in MongoDB.
  • Designed and implemented static and dynamic partitioning and bucketing in Hive (see the sketch following this list).
  • Developed multiple POCs using PySpark, deployed them on the YARN cluster and compared the performance of Spark with Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Used Spark to create the structured data from large amount of unstructured data from various sources.
  • Used Apache Spark on YARN for fast, large-scale data processing and to increase performance.
  • Responsible for design & development of Spark SQL Scripts using Scala/Java based on Functional Specifications.
  • Worked on Cluster co-ordination services through Zookeeper.
  • Involved in building applications with Maven and integrating with CI servers like Jenkins to build jobs.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
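
The Hive partitioning bullet above corresponds to statements like the following, shown here as a minimal Scala/Spark SQL sketch; the database, table and column names are illustrative only, and bucketing would be added in Hive itself with a CLUSTERED BY ... INTO n BUCKETS clause.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Partitioned Hive table (illustrative names)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS analytics.page_views (
        |  user_id  STRING,
        |  url      STRING,
        |  duration INT
        |)
        |PARTITIONED BY (view_date STRING)
        |STORED AS ORC""".stripMargin)

    // Static partition insert: the partition value is fixed in the statement
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.page_views PARTITION (view_date = '2018-06-01')
        |SELECT user_id, url, duration
        |FROM staging.raw_page_views
        |WHERE view_date = '2018-06-01'""".stripMargin)

    // Dynamic partition insert: Hive derives view_date from the query output
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE analytics.page_views PARTITION (view_date)
        |SELECT user_id, url, duration, view_date
        |FROM staging.raw_page_views""".stripMargin)

    spark.stop()
  }
}
```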

Environment: Hadoop 3.0, Kafka 2.0.0, Pig 0.17, Hive 2.3, MVC, Scala 2.12, JDBC, AWS, POC, Sqoop 2.0, Zookeeper 3.4, Python, Spark 2.3, HDFS, Agile

Confidential - Brooklyn, NY

Spark Developer

Responsibilities:

  • Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Migrated existing on-premises application to AWS.
  • Used AWS services like EC2 and S3 for small data sets processing and storage.
  • Worked with Spark to improve performance and optimize the existing algorithms in Hadoop using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.
  • Maintained the Hadoop cluster on AWS EMR.
  • Implemented Spark RDDs in Scala.
  • Configured Spark Streaming to receive ongoing data from Kafka and store the stream data to HDFS.
  • Used Kafka capabilities such as distribution, partitioning and the replicated commit log service for messaging systems while maintaining feeds.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Involved in loading data from REST endpoints to Kafka producers and transferring the data to Kafka brokers.
  • Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (see the first sketch following this list).
  • Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Implemented Elasticsearch on the Hive data warehouse platform.
  • Worked with Elastic MapReduce and set up the Hadoop environment on AWS EC2 instances.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
  • Experienced in using Spark Core for joining data to deliver reports and to flag fraudulent activities.
  • Used the DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping (see the second sketch following this list).
  • Created data models for the client's transactional logs and analyzed the data from Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL).
  • Tested cluster performance using the cassandra-stress tool to measure and improve reads/writes.
  • Developed Sqoop Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
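
The JSON-flattening bullet above corresponds to a preprocessing job along these lines. This is a minimal, illustrative Scala sketch; the S3 paths and field names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object FlattenJsonJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("flatten-json")
      .getOrCreate()

    // Read nested JSON documents (path and schema are illustrative)
    val raw = spark.read.json("s3a://example-bucket/incoming/orders/*.json")

    // Flatten nested struct fields and explode the array of line items
    val flat = raw
      .withColumn("item", explode(col("line_items")))
      .select(
        col("order_id"),
        col("customer.id").as("customer_id"),
        col("customer.region").as("region"),
        col("item.sku").as("sku"),
        col("item.quantity").as("quantity"))

    // Write the result out as a flat, delimited file
    flat.write
      .option("header", "true")
      .csv("s3a://example-bucket/processed/orders_flat/")

    spark.stop()
  }
}
```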
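
Similarly, the Spark-Cassandra connector bullet refers to loading data with the DataStax connector. A minimal sketch follows; the keyspace, table, columns and connection host are hypothetical placeholders.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraLoadSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-load")
      .set("spark.cassandra.connection.host", "cassandra-host") // placeholder host
    val sc = new SparkContext(conf)

    // Hypothetical transaction records: account_id, event_time, amount
    val txns = sc.textFile("hdfs:///data/processed/transactions")
      .map(_.split(","))
      .map(f => (f(0), f(1).toLong, f(2).toDouble))

    // Write the tuples into a Cassandra table with matching column names
    txns.saveToCassandra("finance", "transactions",
      SomeColumns("account_id", "event_time", "amount"))

    sc.stop()
  }
}
```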

Environment: Spark, Scala, Java, Hadoop 3.0, Kafka, JSON, AWS, Hive 2.3, Pig 0.17, Sqoop, Oozie, Cassandra 3.11

Confidential - Bellevue, WA

Java/J2EE Developer

Responsibilities:

  • Worked as a Java/J2EE Developer to manage data and to develop web applications.
  • Involved in Documentation and Use case design using UML modeling include development of Class diagrams, Sequence diagrams, and Use case diagrams.
  • Extensively worked on n-tier architecture systems as part of application development.
  • Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
  • Developed the UI using HTML, CSS, Bootstrap, jQuery and JSP for interactive cross-browser functionality and a complex user interface.
  • Developed the web interface using the MVC design pattern with the Struts framework.
  • Designed and implemented most of the Java related portions of the application including EJBs for encapsulating business logic.
  • Developed server side utilities using J2EE technologies Servlets, JSP, JDBC using JDeveloper.
  • Developed the JSPs using the Struts framework tag libraries.
  • Developed the workflow concept using the Struts framework to avoid back-button problems.
  • Responsible for analyzing an existing C++ project to prepare business logic documents.
  • Responsible for communicating with the end client to support the application, analyzing issues and fixing them.
  • Maintained the Struts config files, Tiles definition files and web.xml.
  • Designed Session Beans to handle inserting, updating and deleting data in the database.
  • Developed and executed the business validation logic in form beans.
  • The application framework is built on Struts, which internally uses J2EE design patterns.
  • Developed the servlets and beans for the application.
  • Preparation of Test Plans.
  • Involved in the application development and unit testing.
  • Responsible for design and architecture of the project by using MVC Struts framework.
  • Implemented Business Logic using POJO's and used WebSphere to deploy the applications.
  • Used the build tool Maven to build JAR and WAR files, and Ant to package all source files and web content into WAR files.
  • Worked on various SOAP and RESTful services used in various internal applications.

Environment: Java, J2EE, Eclipse, HTML, CSS, Bootstrap, JQuery, MVC, Struts, ANT, Maven

Confidential

Java Developer

Responsibilities:

  • Responsible for design and implementation of various modules of the application using Struts-Spring-Hibernate architecture.
  • Created user-friendly GUI interface and Web pages using HTML, CSS and JSP.
  • Developed web components using MVC pattern under Struts framework.
  • Wrote JSPs and Servlets and deployed them on the WebLogic application server.
  • Used JSPs and HTML on the front end, Servlets as front controllers and JavaScript for client-side validations.
  • Wrote the Hibernate mapping XML files to define the mapping between Java classes and database tables.
  • Developed the UI using JSP, HTML, CSS and AJAX, and learned to implement jQuery, JSP and client- and server-side validations using JavaScript.
  • Implemented MVC architecture using Spring to send and receive data between the front end and the business layer.
  • Designed, developed and maintained the data layer using JDBC and performed configuration of Java Application Framework.
  • Extensively used Hibernate in data access layer to access and update information in the database.
  • Migrated the Servlets to the Spring Controllers and developed Spring Interceptors, worked on JSPs, JSTL, and JSP Custom Tags.
  • Used Jenkins for continuous integration with SVN for version control and JUnit and Mockito for unit testing, creating design documents and test cases for development work.
  • Worked in the Eclipse IDE as the front-end development environment; performed insert, update and retrieval operations on the Oracle database by writing stored procedures.
  • Developed the application using Servlets and JSP for the presentation layer, along with JavaScript for client-side validations.
  • Wrote Hibernate classes, DAO's to retrieve & store data, configured Hibernate files.
  • Used WebLogic for application deployment and Log4j for logging/debugging.
  • Used CVS as the version control tool and Ant as the project build tool.
  • Used various Core Java concepts such as multi-threading, Exception Handling, Collection APIs to implement various features and enhancements.
  • Wrote and debugged the Maven Scripts for building the entire web application.
  • Designed and developed AJAX calls to populate screen parts on demand.

Environment: JSP, CSS, HTML, Struts, Spring, Hibernate, MVC, JavaScript, XML, AJAX, JSP, Maven
