Big Data Developer Resume
Piscataway, NJ
SUMMARY
- 8+ years of IT experience as a Big Data Developer using Hadoop, HDFS, Hortonworks, MapReduce and the Hadoop ecosystem (Pig, Hive, Impala, Spark and Scala), along with Java and J2EE.
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Hands on experience in Test-driven development and Software Development Life Cycle (SDLC) methodologies such as Agile and Scrum.
- Experienced in performing real-time analytics on distributed NoSQL databases such as Cassandra, HBase and MongoDB.
- Good understanding of designing attractive data visualization dashboards using Tableau.
- Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark for data aggregation, queries and writing data back into OLTP systems (see the sketch after this list).
- Built batch data-ingestion pipelines in Spark using the Scala API together with Kafka.
- Hands on experience in designing and developing POCs in Spark (Scala) to compare its performance against Hive and SQL/Oracle.
- Used Flume and Kafka to move data between different sources and HDFS.
- Worked with AWS cloud, creating EMR clusters with Spark to analyze and process raw data accessed from S3 buckets.
- Scripted an ETL pipeline in Python that ingests files from AWS S3 into Redshift tables.
- Hands on experience with various file formats such as ORC, Avro, Parquet and JSON.
- Knowledge of using the Databricks platform, Cloudera Manager and the Hortonworks Distribution to monitor and manage clusters.
- Excellent implementation knowledge of enterprise, web and client-server applications using Java and J2EE.
- Expertise in working with Linux/Unix and shell commands on the Terminal.
- Expertise with Python, Scala and Java in designing, developing, administering and supporting large-scale distributed systems.
- Good experience in developing MapReduce jobs in J2EE/Java for data cleansing, transformations, pre-processing and analysis.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for big data analytics.
- Experience collecting log and JSON data into HDFS using Flume and processing it with Hive/Pig.
- Extensive experience developing Spark Streaming jobs built on RDDs (Resilient Distributed Datasets), using Spark SQL as required.
- Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop and Zookeeper.
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate.
- Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns and AJAX calls.
- Installation, configuration and administration experience with big data platforms such as Cloudera Manager (Cloudera) and MCS (MapR).
- Experience working with Hortonworks and Cloudera environments.
- Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required.
- Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
- Experience with Spark Core, Spark SQL, Spark Streaming, Spark GraphX and Spark MLlib.
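To illustrate the Spark/Scala development summarized above, below is a minimal sketch of a DataFrame aggregation with a UDF whose result is written back to an OLTP system over JDBC; the table names, columns and connection details are hypothetical placeholders, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object SalesAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SalesAggregation")
      .getOrCreate()

    // Hypothetical UDF: normalize a free-text region code before grouping
    val normalizeRegion = udf((region: String) =>
      Option(region).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    // Source table and column names are placeholders
    val sales = spark.read.table("staging.sales_events")

    val daily = sales
      .withColumn("region", normalizeRegion(col("region")))
      .groupBy("region", "sale_date")
      .agg(sum("amount").alias("total_amount"))

    // Write the aggregate back to a (hypothetical) OLTP table over JDBC
    daily.write
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "reporting.daily_sales")
      .option("user", sys.env.getOrElse("DB_USER", "app"))
      .option("password", sys.env.getOrElse("DB_PASS", ""))
      .mode("append")
      .save()

    spark.stop()
  }
}
```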
TECHNICAL SKILLS
Big Data Ecosystem: MapReduce, HDFS, Hive 2.3, HBase 1.2, Pig, Sqoop, Flume 1.8, HDP, Oozie, Zookeeper, Spark, Kafka, Storm, Hue
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Cloud Platform: Amazon Web Services (AWS), MS Azure
Relational Databases: Oracle 12c, MySQL, MS SQL Server 2016
NoSQL Databases: HBase, Cassandra and MongoDB
Version Control: GIT, GitLab, SVN
Programming Languages: Java, Python, Scala, SQL, PL/SQL, HiveQL and UNIX Shell Scripting.
Software Development & Testing Lifecycle: UML, Design Patterns (Core Java and J2EE), SDLC (Waterfall and Agile), STLC
Web Technologies: JavaScript, CSS, HTML and JSP.
Operating Systems: Windows, UNIX/Linux and Mac OS.
Build Management Tools: Maven, Ant.
IDE & Command line tools: Eclipse, IntelliJ, Toad and NetBeans.
PROFESSIONAL EXPERIENCE
Confidential - Piscataway, NJ
Big Data Developer
Responsibilities:
- As a Big Data Developer, worked on a Hadoop cluster that scaled from 4 nodes in the development environment to 8 nodes in pre-production and 24 nodes in production.
- Primarily involved in the data migration process on Azure, integrating with a GitHub repository and Jenkins.
- Implemented Azure platforms such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, Azure Data Lake and Data Factory.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance against MapReduce jobs.
- Built pipelines to move hashed and un-hashed data from Azure Blob to Data Lake.
- Designed the HBase row key to store text and JSON as key values and to support get/scan access in sorted order.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Java Persistence API (JPA) framework for object relational mapping which is based on POJO Classes.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Maintained Hadoop, Hadoop ecosystems, and database with updates/upgrades, performance tuning and monitoring.
- Created Hive schemas using performance techniques like partitioning and bucketing.
- Developed customized Hive UDFs and UDAFs in Java, set up JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
- Used Hadoop YARN to perform analytics on data in Hive.
- Developed and maintained batch data flows using HiveQL and Unix scripting.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
- Configured Spark Streaming to receive real-time data from Kafka and persist the stream to HDFS using Scala (see the sketch at the end of this section).
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
- Developed and executed data pipeline testing processes and validated business rules and policies.
- Built code for real-time data ingestion using Java, MapR Streams (Kafka) and Storm.
- Built large-scale data processing systems for data warehousing solutions and worked on unstructured data mining with NoSQL stores.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Specified cluster size, resource pool allocation and Hadoop distribution by writing the specifications in JSON format.
- Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
- Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Used Windows Azure SQL Reporting services to create reports with tables, charts and maps.
- Developed Java code that creates Elasticsearch mappings before data is indexed.
Environment: Hadoop 3.0, Azure, Java 8, MapReduce, Agile, HBase 1.2, JSON, Spark 2.4, Kafka, JDBC, Hive 2.3, Pig 0.17
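A minimal sketch of the Kafka-to-HDFS Spark Streaming flow referenced in this section, using the Kafka direct stream API in Scala; the broker addresses, topic, consumer group and HDFS output path are hypothetical placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Broker list, topic and consumer group are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-ingest",
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each non-empty micro-batch to HDFS as text files
    stream.map(_.value()).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/raw/events/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```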
Confidential - Phoenix, AZ
Hadoop Developer
Responsibilities:
- Worked as a Hadoop Developer on Hadoop eco-systems including Hive, Zookeeper, Spark Streaming with MapR distribution.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Created and worked Sqoop jobs with incremental load to populate Hive External tables.
- Implemented usage of Amazon EMR for processing Big Data across Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
- Worked with Apache NiFi to develop custom processors for processing and distributing data among cloud systems.
- Developed Scala UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation, queries and writing data back into an RDBMS through Sqoop.
- Exported analyzed data to relational databases using Sqoop, after ingesting data from various sources into HDFS, for visualization and to generate reports for the BI team.
- Configured Spark streaming to receive real time data from Kafka and store the stream data to HDFS using Scala.
- Involved in developing ETL data pipelines for performing real-time streaming by ingesting data into HDFS and HBase using Kafka and Storm.
- Involved in moving log files generated from varied sources into HDFS through Flume for further processing.
- Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
- Worked on Apache Flume to collect and aggregate large amounts of log data and stored it on HDFS for further analysis.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
- Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in MongoDB.
- Designed and implemented static and dynamic partitioning and bucketing in Hive (see the sketch at the end of this section).
- Developed multiple POCs using PySpark, deployed them on the YARN cluster and compared the performance of Spark with Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used Spark to create the structured data from large amount of unstructured data from various sources.
- Used Apache Spark on YARN for fast, large-scale data processing and to increase performance.
- Responsible for design & development of Spark SQL Scripts using Scala/Java based on Functional Specifications.
- Worked on Cluster co-ordination services through Zookeeper.
- Built applications using Maven and integrated them with CI servers such as Jenkins to automate build jobs.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
Environment: Hadoop 3.0, Kafka 2.0.0, Pig 0.17, Hive 2.3, MVC, Scala 2.12, JDBC, AWS, POC, Sqoop 2.0, Zookeeper 3.4, Python, Spark 2.3, HDFS, Agile
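As an illustration of the Hive partitioning and bucketing work referenced in this section, here is a minimal Spark/Scala sketch that writes a partitioned, bucketed table and computes a metric over one partition; the table and column names are assumptions, and Spark's datasource bucketing is used rather than Hive-native bucketing.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedOrders {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedOrders")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging table with raw order rows
    val orders = spark.table("staging.orders_raw")

    // Partition by load date and bucket by customer id; Spark's datasource
    // bucketing differs from Hive-native bucketing but serves the same
    // purpose of partition pruning and faster joins on the bucket key
    orders.write
      .partitionBy("load_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("orc")
      .mode("overwrite")
      .saveAsTable("analytics.orders_part")

    // Compute metrics over a single partition for reporting
    spark.sql("""
      SELECT load_date, COUNT(*) AS orders, SUM(amount) AS revenue
      FROM analytics.orders_part
      WHERE load_date = '2019-01-01'
      GROUP BY load_date
    """).show()

    spark.stop()
  }
}
```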
Confidential - Brooklyn, NY
Spark Developer
Responsibilities:
- Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Migrated existing on-premises application to AWS.
- Used AWS services like EC2 and S3 for small data sets processing and storage.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.
- Maintained the Hadoop cluster on AWS EMR.
- Implemented Spark RDD's in Scala.
- Configured Spark Streaming to receive ongoing data from Kafka and store the stream to HDFS.
- Used Kafka features such as partitioning, replication and its distributed commit log to maintain messaging feeds.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Involved in loading data from rest endpoints to Kafka Producers and transferring the data to Kafka Brokers.
- Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (see the sketch at the end of this section).
- Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.
- Involved in performance tuning of Spark jobs by caching and taking full advantage of the cluster environment.
- Implemented Elastic Search on Hive data warehouse platform.
- Worked with Elastic MapReduce and setup Hadoop environment in AWS EC2 Instances.
- Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka as a data pipe-line system.
- Experienced in using Spark Core to join data for delivering reports and detecting fraudulent activities.
- Used DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
- Experienced in creating data models for the client's transactional logs in Cassandra and querying them with the Cassandra Query Language (CQL).
- Tested the cluster Performance using Cassandra-stress tool to measure and improve the Read/Writes.
- Developed Sqoop Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
- Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
Environment: Spark, Scala, Java, Hadoop 3.0, Kafka, JSON, AWS, Hive 2.3, Pig 0.17, Sqoop, Oozie, Cassandra 3.11
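A minimal sketch of the JSON-flattening preprocessing job referenced in this section, using Spark DataFrames in Scala; the input schema, S3 paths and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object FlattenJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FlattenJson")
      .getOrCreate()

    // Hypothetical input: one JSON document per line in S3
    val events = spark.read.json("s3a://example-bucket/raw/events/")

    // Flatten nested struct fields and explode the array of line items
    val flat = events
      .select(
        col("id"),
        col("customer.name").alias("customer_name"),
        col("customer.address.city").alias("city"),
        explode(col("items")).alias("item"))
      .select(
        col("id"),
        col("customer_name"),
        col("city"),
        col("item.sku").alias("sku"),
        col("item.qty").alias("qty"))

    // Write the flattened records out as CSV, one possible flat-file target
    flat.write
      .option("header", "true")
      .mode("overwrite")
      .csv("s3a://example-bucket/curated/events_flat/")

    spark.stop()
  }
}
```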
Confidential - Bellevue, WA
Java/J2EE Developer
Responsibilities:
- Worked as a Java/J2EE Developer to manage data and to develop web applications.
- Involved in documentation and use case design using UML modeling, including development of class diagrams, sequence diagrams and use case diagrams.
- Extensively worked on n-tier architecture systems and application development.
- Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
- Developed the UI using HTML, CSS, Bootstrap, jQuery and JSP for interactive cross-browser functionality and complex user interfaces.
- Developed the web interface using the MVC design pattern with the Struts framework.
- Designed and implemented most of the Java related portions of the application including EJBs for encapsulating business logic.
- Developed server side utilities using J2EE technologies Servlets, JSP, JDBC using JDeveloper.
- Developed JSPs using the Struts framework tag libraries.
- Implemented a workflow concept using the Struts framework to avoid back-button problems.
- Responsible for analyzing an existing C++ project to prepare business logic documents.
- Responsible for communicating with the end client to support the application, analyzing issues and fixing them.
- Maintained the Struts configuration files, Tiles definition files and web.xml.
- Designed session beans to handle inserting, updating and deleting data in the database.
- Developed and executed the business validation logic in form beans.
- The application framework was built on Struts, which internally uses J2EE design patterns.
- Developed servlets and beans for the application.
- Prepared test plans.
- Involved in the application development and unit testing.
- Responsible for design and architecture of the project by using MVC Struts framework.
- Implemented Business Logic using POJO's and used WebSphere to deploy the applications.
- Used Maven to build JAR and WAR files and Ant to package all source files and web content into WAR files.
- Worked on various SOAP and Restful services used in various internal applications.
Environment: Java, J2EE, Eclipse, HTML, CSS, Bootstrap, JQuery, MVC, Struts, ANT, Maven
Confidential
Java Developer
Responsibilities:
- Responsible for design and implementation of various modules of the application using Struts-Spring-Hibernate architecture.
- Created user-friendly GUI interface and Web pages using HTML, CSS and JSP.
- Developed web components using MVC pattern under Struts framework.
- Wrote JSPs, Servlets and deployed them on Weblogic Application server.
- Used JSPs and HTML on the front end, servlets as front controllers, and JavaScript for client-side validations.
- Wrote Hibernate mapping XML files to define the mapping between Java classes and database tables.
- Developed the UI using JSP, HTML, CSS and AJAX, and implemented jQuery along with client- and server-side validations using JavaScript.
- Implemented the MVC architecture using Spring to send and receive data between the front end and the business layer.
- Designed, developed and maintained the data layer using JDBC and performed configuration of Java Application Framework.
- Extensively used Hibernate in data access layer to access and update information in the database.
- Migrated the Servlets to the Spring Controllers and developed Spring Interceptors, worked on JSPs, JSTL, and JSP Custom Tags.
- Used Jenkins for continuous integration with SVN as version control and JUnit and Mockito for unit testing; created design documents and test cases for development work.
- Worked in the Eclipse IDE on front-end development and on insert, update and retrieval operations against the Oracle database by writing stored procedures.
- Developed the application using Servlets and JSP for the presentation layer along with JavaScript for the client side validations.
- Wrote Hibernate classes and DAOs to retrieve and store data, and configured Hibernate files.
- Used WebLogic for application deployment and Log4j for logging and debugging.
- Used CVS as the version control tool and Ant as the project build tool.
- Used various Core Java concepts such as multi-threading, Exception Handling, Collection APIs to implement various features and enhancements.
- Wrote and debugged the Maven Scripts for building the entire web application.
- Designed and developed Ajax calls to populate screens parts on demand.
Environment: JSP, CSS, HTML, Struts, Spring, Hibernate, MVC, JavaScript, XML, AJAX, JSP, Maven