Big Data Developer Resume
Piscataway, NJ
SUMMARY
- 8+ years of IT experience as a Big Data Developer using Hadoop, HDFS, Hortonworks, MapReduce and the Hadoop ecosystem (Pig, Hive, Impala, Spark and Scala), along with Java and J2EE.
- Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
- Hands on experience in Test-driven development and Software Development Life Cycle (SDLC) methodologies such as Agile and Scrum.
- Experienced in performing real-time analytics on distributed NoSQL databases such as Cassandra, HBase and MongoDB.
- Good understanding of designing attractive data visualization dashboards using Tableau.
- Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark for data aggregation, queries and writing data back into OLTP systems (see the sketch after this list).
- Built batch data-ingestion pipelines in Spark using the Scala API together with Kafka.
- Hands on experience in designing and developing POCs in Spark (Scala) to compare its performance against Hive and SQL/Oracle.
- Used Flume and Kafka to move data between different sources and HDFS.
- Worked with AWS cloud, creating EMR clusters with Spark to analyze and process raw data accessed from S3 buckets.
- Scripted an ETL pipeline in Python that ingests files from AWS S3 into Redshift tables.
- Hands on experience with various file formats such as ORC, Avro, Parquet and JSON.
- Knowledge of using the Databricks platform, Cloudera Manager and the Hortonworks Distribution to monitor and manage clusters.
- Excellent implementation knowledge of enterprise, web and client-server applications using Java and J2EE.
- Expertise in working with Linux/Unix and shell commands on the Terminal.
- Expertise with Python, Scala and Java in designing, developing, administering and supporting large-scale distributed systems.
- Good experience in developing MapReduce jobs in J2EE/Java for data cleansing, transformations, pre-processing and analysis.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2, which provide fast and efficient processing for big data analytics.
- Experience collecting log and JSON data into HDFS using Flume and processing it with Hive/Pig.
- Extensive experience developing Spark Streaming jobs built on RDDs (Resilient Distributed Datasets), using Spark SQL as required.
- Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop and Zookeeper.
- Extensive experience with advanced J2EE frameworks such as Spring, Struts, JSF and Hibernate.
- Expertise in JavaScript, JavaScript MVC patterns, object-oriented JavaScript design patterns and AJAX calls.
- Installation, configuration and administration experience with big data platforms such as Cloudera Manager (Cloudera) and MCS (MapR).
- Experience working with Hortonworks and Cloudera environments.
- Good knowledge in implementing various data processing techniques using Apache HBase for handling the data and formatting it as required.
- Excellent experience in installing and running various Oozie workflows and automating parallel job executions.
- Experience with Spark Core, Spark SQL, Spark Streaming, Spark GraphX and Spark MLlib.
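To illustrate the Spark/Scala development summarized above, below is a minimal sketch of a DataFrame aggregation with a UDF whose result is written back to an OLTP system over JDBC; the table names, columns and connection details are hypothetical placeholders, not taken from any specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object SalesAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SalesAggregation")
      .getOrCreate()

    // Hypothetical UDF: normalize a free-text region code before grouping
    val normalizeRegion = udf((region: String) =>
      Option(region).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    // Source table and column names are placeholders
    val sales = spark.read.table("staging.sales_events")

    val daily = sales
      .withColumn("region", normalizeRegion(col("region")))
      .groupBy("region", "sale_date")
      .agg(sum("amount").alias("total_amount"))

    // Write the aggregate back to a (hypothetical) OLTP table over JDBC
    daily.write
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "reporting.daily_sales")
      .option("user", sys.env.getOrElse("DB_USER", "app"))
      .option("password", sys.env.getOrElse("DB_PASS", ""))
      .mode("append")
      .save()

    spark.stop()
  }
}
```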
TECHNICAL SKILLS
Big Data Ecosystem: MapReduce, HDFS, Hive 2.3, HBase 1.2, Pig, Sqoop, Flume 1.8, HDP, Oozie, Zookeeper, Spark, Kafka, Storm, Hue
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks
Cloud Platform: Amazon Web Services (AWS), MS Azure
Relational Databases: Oracle 12c, MySQL, MS SQL Server 2016
NoSQL Databases: HBase, Cassandra and MongoDB
Version Control: GIT, GitLab, SVN
Programming Languages: Java, Python, Scala, SQL, PL/SQL, HiveQL and UNIX Shell Scripting.
Software Development & Testing Lifecycle: UML, Design Patterns (Core Java and J2EE), SDLC (Waterfall and Agile), STLC
Web Technologies: JavaScript, CSS, HTML and JSP.
Operating Systems: Windows, UNIX/Linux and Mac OS.
Build Management Tools: Maven, Ant.
IDE & Command line tools: Eclipse, IntelliJ, Toad and NetBeans.
PROFESSIONAL EXPERIENCE
Confidential - Piscataway, NJ
Big Data Developer
Responsibilities:
- As a Big Data Developer, worked on a Hadoop cluster that scaled from 4 nodes in the development environment to 8 nodes in pre-production and 24 nodes in production.
- Primarily involved in the data migration process on Azure, integrating with a GitHub repository and Jenkins.
- Implemented Azure platforms such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, Azure Data Lake and Data Factory.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Developed Spark scripts using Python on Azure HDInsight for data aggregation and validation, and verified their performance against MapReduce jobs.
- Built pipelines to move hashed and un-hashed data from Azure Blob to Data Lake.
- Designed the HBase row key to store text and JSON as key values and to support get/scan access in sorted order.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Java Persistence API (JPA) framework for object relational mapping which is based on POJO Classes.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Maintained Hadoop, Hadoop ecosystems, and database with updates/upgrades, performance tuning and monitoring.
- Created Hive schemas using performance techniques like partitioning and bucketing.
- Developed customized Hive UDFs and UDAFs in Java, set up JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
- Used Hadoop YARN to perform analytics on data in Hive.
- Developed and maintained batch data flows using HiveQL and Unix scripting.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
- Configured Spark Streaming to receive real-time data from Kafka and persist the stream to HDFS using Scala (see the sketch at the end of this section).
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala and Python.
- Developed and executed data pipeline testing processes and validated business rules and policies.
- Built code for real-time data ingestion using Java, MapR Streams (Kafka) and Storm.
- Built large-scale data processing systems for data warehousing solutions and worked on unstructured data mining with NoSQL stores.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Specified cluster size, resource pool allocation and Hadoop distribution by writing the specifications in JSON format.
- Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
- Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Used Windows Azure SQL Reporting services to create reports with tables, charts and maps.
- Developed Java code that creates Elasticsearch mappings before data is indexed.
Environment: Hadoop 3.0, Azure, Java 8, MapReduce, Agile, HBase 1.2, JSON, Spark 2.4, Kafka, JDBC, Hive 2.3, Pig 0.17
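A minimal sketch of the Kafka-to-HDFS Spark Streaming flow referenced in this section, using the Kafka direct stream API in Scala; the broker addresses, topic, consumer group and HDFS output path are hypothetical placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfs")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // Broker list, topic and consumer group are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "events-ingest",
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each non-empty micro-batch to HDFS as text files
    stream.map(_.value()).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/raw/events/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```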
Confidential - Phoenix, AZ
Hadoop Developer
Responsibilities:
- Worked as a Hadoop Developer on Hadoop eco-systems including Hive, Zookeeper, Spark Streaming with MapR distribution.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Created and worked Sqoop jobs with incremental load to populate Hive External tables.
- Implemented usage of Amazon EMR for processing Big Data across Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3)
- Worked with Apache NiFi to develop custom processors for processing and distributing data among cloud systems.
- Developed Scala UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation, queries and writing data back into an RDBMS through Sqoop.
- Exported analyzed data to relational databases using Sqoop, after ingesting data from various sources into HDFS, for visualization and to generate reports for the BI team.
- Configured Spark streaming to receive real time data from Kafka and store the stream data to HDFS using Scala.
- Involved in developing ETL data pipelines for performing real-time streaming by ingesting data into HDFS and HBase using Kafka and Storm.
- Involved in moving log files generated from varied sources into HDFS through Flume for further processing.
- Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
- Worked on Apache Flume to collect and aggregate large amounts of log data and stored it on HDFS for further analysis.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
- Responsible for developing data pipeline with Amazon AWS to extract the data from weblogs and store in MongoDB.
- Designed and implemented static and dynamic partitioning and bucketing in Hive (see the sketch at the end of this section).
- Developed multiple POCs using PySpark, deployed them on the YARN cluster and compared the performance of Spark with Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used Spark to create the structured data from large amount of unstructured data from various sources.
- Used Apache Spark on YARN for fast, large-scale data processing and to increase performance.
- Responsible for design & development of Spark SQL Scripts using Scala/Java based on Functional Specifications.
- Worked on Cluster co-ordination services through Zookeeper.
- Built applications using Maven and integrated them with CI servers such as Jenkins to automate build jobs.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
Environment: Hadoop 3.0, Kafka 2.0.0, Pig 0.17, Hive 2.3, MVC, Scala 2.12, JDBC, AWS, POC, Sqoop 2.0, Zookeeper 3.4, Python, Spark 2.3, HDFS, Agile
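As an illustration of the Hive partitioning and bucketing work referenced in this section, here is a minimal Spark/Scala sketch that writes a partitioned, bucketed table and computes a metric over one partition; the table and column names are assumptions, and Spark's datasource bucketing is used rather than Hive-native bucketing.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedOrders {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedOrders")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging table with raw order rows
    val orders = spark.table("staging.orders_raw")

    // Partition by load date and bucket by customer id; Spark's datasource
    // bucketing differs from Hive-native bucketing but serves the same
    // purpose of partition pruning and faster joins on the bucket key
    orders.write
      .partitionBy("load_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("orc")
      .mode("overwrite")
      .saveAsTable("analytics.orders_part")

    // Compute metrics over a single partition for reporting
    spark.sql("""
      SELECT load_date, COUNT(*) AS orders, SUM(amount) AS revenue
      FROM analytics.orders_part
      WHERE load_date = '2019-01-01'
      GROUP BY load_date
    """).show()

    spark.stop()
  }
}
```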
Confidential - Brooklyn, NY
Spark Developer
Responsibilities:
- Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Migrated existing on-premises application to AWS.
- Used AWS services like EC2 and S3 for small data sets processing and storage.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, Spark MLlib, DataFrames, pair RDDs and Spark on YARN.
- Maintained the Hadoop cluster on AWS EMR.
- Implemented Spark RDD's in Scala.
- Configured Spark Streaming to receive ongoing data from Kafka and store the stream to HDFS.
- Used Kafka features such as partitioning, replication and its distributed commit log to maintain messaging feeds.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Involved in loading data from rest endpoints to Kafka Producers and transferring the data to Kafka Brokers.
- Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files (see the sketch at the end of this section).
- Loaded DStream data into Spark RDDs and performed in-memory computation to generate output responses.
- Involved in performance tuning of Spark jobs by caching and taking full advantage of the cluster environment.
- Implemented Elastic Search on Hive data warehouse platform.
- Worked with Elastic MapReduce and setup Hadoop environment in AWS EC2 Instances.
- Experienced in writing live Real-time Processing and core jobs using Spark Streaming with Kafka as a data pipe-line system.
- Experienced in using Spark Core to join data for delivering reports and detecting fraudulent activities.
- Used DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
- Experienced in creating data models for the client's transactional logs in Cassandra and querying them with the Cassandra Query Language (CQL).
- Tested the cluster Performance using Cassandra-stress tool to measure and improve the Read/Writes.
- Developed Sqoop Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
- Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
Environment: Spark, Scala, Java, Hadoop 3.0, Kafka, JSON, AWS, Hive 2.3, Pig 0.17, Sqoop, Oozie, Cassandra 3.11
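A minimal sketch of the JSON-flattening preprocessing job referenced in this section, using Spark DataFrames in Scala; the input schema, S3 paths and column names are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object FlattenJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FlattenJson")
      .getOrCreate()

    // Hypothetical input: one JSON document per line in S3
    val events = spark.read.json("s3a://example-bucket/raw/events/")

    // Flatten nested struct fields and explode the array of line items
    val flat = events
      .select(
        col("id"),
        col("customer.name").alias("customer_name"),
        col("customer.address.city").alias("city"),
        explode(col("items")).alias("item"))
      .select(
        col("id"),
        col("customer_name"),
        col("city"),
        col("item.sku").alias("sku"),
        col("item.qty").alias("qty"))

    // Write the flattened records out as CSV, one possible flat-file target
    flat.write
      .option("header", "true")
      .mode("overwrite")
      .csv("s3a://example-bucket/curated/events_flat/")

    spark.stop()
  }
}
```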
Confidential - Bellevue, WA
Java/J2EE Developer
Responsibilities:
- Worked as a Java/J2EE Developer to manage data and to develop web applications.
- Involved in documentation and use case design using UML modeling, including development of class diagrams, sequence diagrams and use case diagrams.
- Extensively worked on n-tier architecture systems and application development.
- Extensively used Eclipse IDE for developing, debugging, integrating and deploying the application.
- Developed the UI using HTML, CSS, Bootstrap, jQuery and JSP for interactive cross-browser functionality and complex user interfaces.
- Developed the web interface using the MVC design pattern with the Struts framework.
- Designed and implemented most of the Java related portions of the application including EJBs for encapsulating business logic.
- Developed server side utilities using J2EE technologies Servlets, JSP, JDBC using JDeveloper.
- Developed JSPs using the Struts framework tag libraries.
- Implemented a workflow concept using the Struts framework to avoid back-button problems.
- Responsible for analyzing an existing C++ project to prepare business logic documents.
- Responsible for communicating with the end client to support the application, analyzing issues and fixing them.
- Maintained the Struts configuration files, Tiles definition files and web.xml.
- Designed session beans to handle inserting, updating and deleting data in the database.
- Developed and executed the business validation logic in form beans.
- The application framework was built on Struts, which internally uses J2EE design patterns.
- Developed servlets and beans for the application.
- Prepared test plans.
- Involved in the application development and unit testing.
- Responsible for design and architecture of the project by using MVC Struts framework.
- Implemented Business Logic using POJO's and used WebSphere to deploy the applications.
- Used Maven to build JAR and WAR files and Ant to package all source files and web content into WAR files.
- Worked on various SOAP and Restful services used in various internal applications.
Environment: Java, J2EE, Eclipse, HTML, CSS, Bootstrap, JQuery, MVC, Struts, ANT, Maven
Confidential
Java Developer
Responsibilities:
- Responsible for design and implementation of various modules of the application using Struts-Spring-Hibernate architecture.
- Created user-friendly GUI interface and Web pages using HTML, CSS and JSP.
- Developed web components using MVC pattern under Struts framework.
- Wrote JSPs, Servlets and deployed them on Weblogic Application server.
- Used JSPs and HTML on the front end, servlets as front controllers, and JavaScript for client-side validations.
- Wrote Hibernate mapping XML files to define the mapping between Java classes and database tables.
- Developed the UI using JSP, HTML, CSS and AJAX, and implemented jQuery along with client- and server-side validations using JavaScript.
- Implemented the MVC architecture using Spring to send and receive data between the front end and the business layer.
- Designed, developed and maintained the data layer using JDBC and performed configuration of Java Application Framework.
- Extensively used Hibernate in data access layer to access and update information in the database.
- Migrated the Servlets to the Spring Controllers and developed Spring Interceptors, worked on JSPs, JSTL, and JSP Custom Tags.
- Used Jenkins for continuous integration with SVN as version control and JUnit and Mockito for unit testing; created design documents and test cases for development work.
- Worked in the Eclipse IDE on front-end development and on insert, update and retrieval operations against the Oracle database by writing stored procedures.
- Developed the application using Servlets and JSP for the presentation layer along with JavaScript for the client side validations.
- Wrote Hibernate classes and DAOs to retrieve and store data, and configured Hibernate files.
- Used WebLogic for application deployment and Log4j for logging and debugging.
- Used CVS as the version control tool and Ant as the project build tool.
- Used various Core Java concepts such as multi-threading, Exception Handling, Collection APIs to implement various features and enhancements.
- Wrote and debugged the Maven Scripts for building the entire web application.
- Designed and developed Ajax calls to populate screens parts on demand.
Environment: JSP, CSS, HTML, Struts, Spring, Hibernate, MVC, JavaScript, XML, AJAX, JSP, Maven