Sr. Big Data/Hadoop Developer Resume

Boston, MA

SUMMARY:

  • Overall 9 years of working experience as a Big Data/Hadoop Developer, designing and developing various applications using big data, Hadoop and Java/J2EE open-source technologies.
  • Experience in leveraging big data tools such as Spark, Hadoop, Hive, HBase, Kafka, Zookeeper, Flume, MapReduce, Oozie, Yarn and Pig.
  • Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Hands on experience in Test-driven development, Software Development Life Cycle (SDLC) methodologies like Agile and Scrum.
  • Experienced in performing real-time analytics on distributed NoSQL databases like Cassandra, HBase and MongoDB.
  • Procedural knowledge of data cleaning and analysis using HiveQL and custom MapReduce programs.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
  • Good understanding of designing attractive data visualization dashboards using Tableau.
  • Experience in using various IDEs/Text Editors such as PyCharm, Jupyter, Nano, Emacs and repositories such as Git, SVN.
  • Develop Scala scripts, UDFs using both Data frames and RDDs in Spark for Data Aggregation, queries and writing data back into OLTP Systems.
  • Create batch data by using spark with the help of Scala API in developing Data Ingestion pipelines using Kafka.
  • Hands on experience in designing and developing POCs in Spark to compare the performance of Spark with Hive and SQL/Oracle using Scala.
  • Used Flume and Kafka to direct data from different sources to/from HDFS.
  • Worked with AWS cloud and created EMR clusters with Spark for analyzing and processing raw data and accessing data in S3 buckets.
  • Scripted an ETL pipeline in Python that ingests files from AWS S3 into a Redshift table.
  • Hands on experience with various file formats such as ORC, Avro, Parquet and JSON.
  • Knowledge of using the Databricks platform, Cloudera Manager and Hortonworks Distribution to monitor and manage clusters.
  • Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Expertise in working with various Python packages such as NumPy, SciPy, Pandas, Matplotlib, Plotly and Cufflinks.
  • Expertise in working with Linux/Unix and shell commands on the Terminal.
  • Expertise in RDBMS, namely Oracle and MS SQL Server, including stored procedures, views and triggers.
  • Expertise with Python, Scala and Java in the design, development, administration and support of large-scale distributed systems.
  • Hands on experience in developing Spark Applications using Scala for Spark Streaming, Spark SQL and Structured Streaming for Real Time data processing.
  • Extensive use of Spark RDDs, Spark SQL and DataFrames for faster, optimized streaming and testing.
  • Experience in working on CQL (Cassandra Query Language), for retrieving the data present in Cassandra cluster by running queries in CQL.
  • Ability to develop MapReduce program using Java and Python.
  • Good understanding and exposure to Python programming.
  • Experience in using various IDEs such as Eclipse and IntelliJ and repositories such as SVN and Git.
  • Experience exporting and importing data to and from Oracle using SQL Developer for analysis.
  • Good experience in using Sqoop for traditional RDBMS data pulls and worked with different distributions of Hadoop like Hortonworks and Cloudera.
  • Experience in designing components using UML (Use Case, Class, Sequence, Deployment and Component diagrams) for the requirements.

PROFESSIONAL EXPERIENCE:

Confidential - Boston, MA

Sr. Big Data/Hadoop Developer

Responsibilities:

  • Worked as a Sr. Big Data/Hadoop Developer with Hadoop Ecosystems components like HBase, Sqoop, Zookeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
  • Followed Agile development methodology and was an active member in Scrum meetings.
  • Worked in Azure environment for development and deployment of Custom Hadoop Applications.
  • Created Hive schemas using performance techniques like partitioning and bucketing.
  • Implemented various Azure platforms such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.
  • Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
  • Developed analytical components using Spark with Scala and Spark Streaming.
  • Developed a POC using Scala, deployed it on the YARN cluster and compared the performance of Spark with Hive and SQL.
  • Involved in converting Hive queries into Spark transformations using Spark RDDs, Python and Scala.
  • Developed a Spark job in Java which indexes data into Elasticsearch from external Hive tables stored in HDFS.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Imported several transactional logs from web servers with Flume to ingest the data into HDFS.
  • Worked on MongoDB and HBase (NoSQL) databases, which differ from classic relational databases.
  • Involved in converting HiveQL into Spark transformations using Spark RDD and through Scala programming.
  • Worked on Apache Flume for collecting and aggregating huge amount of log data and stored it on HDFS for doing further analysis.
  • Developed several new MapReduce programs to analyze and transform the data to uncover insights into the customer usage patterns.
  • Created Hive tables, loaded the data and Performed data manipulations using Hive queries in MapReduce Execution Mode.
  • Explored MLlib algorithms in Spark to understand the possible Machine Learning functionalities that can be used for our use case.
  • Extensively involved in Design phase and delivered Design documents.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Implemented a distributed messaging queue to integrate with Cassandra using Zookeeper.
  • Implemented a Kafka event log producer to publish logs to a Kafka topic, which Elasticsearch consumes to analyze the logs produced by the Hadoop cluster.
  • Imported data from different sources like HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and in NoSQL databases such as HBase and Cassandra.
  • Documented the requirements, including the existing code to be implemented using Spark, Hive, HDFS, HBase and Elasticsearch.
  • Performed transformations like event joins, bot traffic filtering and some pre-aggregations using Pig.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Windows Azure SQL Reporting Services to create reports with tables, charts and maps.
  • Developed code in Java which creates the mapping in Elasticsearch before data is indexed into it.
  • Configured Oozie workflow to run multiple Hive and Pig jobs which run independently with time and data availability.
  • Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Environment: Hadoop 3.0, HBase, Sqoop 1.4, Zookeeper 3.4, Oozie, Hive 2.3, Pig 0.17, MS Azure, Scala 2.12, Spark 2.3, Apache Flume 1.8, NoSQL, MongoDB 4.0, MapReduce, HDFS, Cassandra 3.11, Kafka 1.1, Java

Confidential - Troy, NY

Big Data/Hadoop Developer

Responsibilities:

  • Worked as a Big Data/Hadoop Developer providing solutions for big data problems.
  • Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks. Participated in daily scrum and other design related meetings.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Implemented Hortonworks NiFi and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi.
  • Developed a NiFi flow to move data from different sources to HDFS and from HDFS to S3 buckets.
  • Worked on Spark SQL, created Data frames by loading data from Hive tables and created prep data and stored in AWS S3.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Responsible for loading the customer's data and event logs from Kafka into HBase using the REST API.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model, which gets data from Kafka in near real time and persists it to HBase.
  • Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
  • Developed Spark scripts by using Java, and Python shell commands as per the requirement.
  • Involved with ingesting data received from various relational database providers, on HDFS for analysis and other big data operations.
  • Experienced in bringing up EMR cluster and deploying code into the cluster in S3 buckets.
  • Migrated the existing on-prem code to an AWS EMR cluster.
  • Implemented Spark to migrate MapReduce jobs into Spark RDD transformations and to stream data using Spark Streaming.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into Cassandra.
  • Responsible for implementing MapReduce programs into Spark transformations using Spark and Scala.
  • Used NoSQL databases HBase and MongoDB. Exported the result set from Hive to MySQL using shell scripts.
  • Extensively worked with Cloudera Hadoop distribution components and custom packages.
  • Used Oozie and Zookeeper operational services for coordinating cluster and scheduling workflows.
  • Used Oozie to orchestrate the MapReduce jobs and worked with HCatalog to open up access to Hive's metadata.
  • Performed various data warehousing operations like de-normalization and aggregation on Hive using DML statements.
  • Created Partitions, Buckets based on State to further process using Bucket based Hive joins.
  • Installed and Configured Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
  • Wrote complex Hive queries and UDFs in Java and Python.
  • Analyzed the data by performing Hive queries (HiveQL), ran Pig scripts, Spark SQL and Spark streaming.
  • Developed tools using Python, Shell scripting, XML to automate some of the menial tasks.

Environment: Hadoop 3.0, Agile, Spark 2.3, Scala 2.12, Python 3.7, Hortonworks, NiFi, HDFS, Hive 2.3, AWS, NoSQL, HBase, Kafka, Java, EMR, MapReduce, Cassandra 3.11, MongoDB, MySQL, Zookeeper 3.4, Oozie, Pig 0.17, Sqoop 1.4, XML

Confidential - Boston, MA

Sr. Java/Hadoop Developer

Responsibilities:

  • Worked as Sr. Java/Hadoop Developer and responsible for taking care of everything related to the clusters.
  • Responsible for building scalable distributed data solutions using Hadoop cluster environment with Hortonworks distribution.
  • Developed Spark scripts by using Java, and Python shell commands as per the requirement.
  • Used Spark Streaming APIs to perform necessary transformations and actions on the data received from Kafka and persisted it into the Cassandra database.
  • Developed Spark scripts by writing custom RDDs in Scala and Python for data transformations and actions on RDDs.
  • Worked on developing the application involving Spring MVC implementations and Restful web services.
  • Implemented some of the big data operations on AWS cloud.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Developed Spark scripts by using Scala Shell commands as per the requirement.
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Responsible for designing Rich user Interface Applications using JavaScript, CSS, HTML, XHTML and AJAX.
  • Configured spark streaming data to receive real time data from Kafka and store it in HDFS.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
  • Implemented Amazon EMR for big data processing across a Hadoop cluster of virtual servers on Amazon EC2 and S3.
  • Worked on custom Pig Loaders and storage classes to work with variety of data formats such as JSON and XML file formats.
  • Developed code using Core Java to implement technical enhancement following Java Standards.
  • Worked on Spark SQL and Data frames for faster execution of Hive queries using Spark SQL Context.
  • Used Data frames/ Datasets to write SQL type queries using Spark SQL to work with datasets sitting on HDFS.
  • Implemented Hibernate utility classes, session factory methods, and different annotations to work with back end data base tables.
  • Worked on creating Hive tables and writing Hive queries for data analysis to meet business requirements; experienced in using Sqoop to import and export data from Oracle and MySQL.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala.
  • Used Hibernate reverse engineering tools to generate domain model classes, perform association mapping and inheritance mapping using annotations and XML.
  • Involved in implementing and integrating various NoSQL databases such as HBase and Cassandra.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Used Spark API over Hadoop Yarn as execution engine for data analytics using Hive.
  • Configured Continuous Integration system to execute suites of automated test on desired frequencies using Jenkins, Maven & GIT.

Environment: Hadoop, Pig, Hive, HBase, Oozie, Sqoop, Kafka, Spark, AWS, EC2, Scala, Zookeeper, HDFS, JSON, XML, Oracle, MySQL, Cassandra, Jenkins, Maven, GIT

Confidential - San Francisco, CA

Sr. Java/J2EE Developer

Responsibilities:

  • Translated business requirements into technical documents by interacting with Business Analysts and Subject Matter Experts (SMEs) to carefully understand business requirements.
  • Involved in the requirement analysis, design and development of the application built in Java/J2EE using JavaScript, JSP, AJAX, JDBC and Web Services with JAX-WS.
  • Contributed in design and development of Spring MVC web based application.
  • Designing and Developing Micro-services that are highly scalable, fault-tolerant using Spring Boot.
  • Involved in the design, development and implementation of the application using the Spring and J2EE frameworks.
  • Used JSP, Servlets, and HTML to create web interfaces. Developed JavaBeans and used custom tag libraries for embedding dynamic content into JSP pages.
  • Used advanced HTML, JavaScript, CSS and pure CSS layouts (table-less layout).
  • Developed dynamic e-mails using JavaScript, and hand coding of HTML and CSS.
  • Developed and refactored the Action Servlets and JSPs using Struts following MVC architecture.
  • Involved in exposing the application as a Web Service (RESTful, JAXB, JAX-RPC, and SOAP) and configuring to connect to other web services.
  • Developed the application using Spring MVC Framework by implementing controller and backend service classes.
  • Responsible for design and implementation of various modules of the application using Struts-Spring-Hibernate architecture.
  • Responsible for writing Struts action classes, Hibernate POJO classes and integrating Struts and Hibernate with spring for processing business needs.
  • Used Spring framework along with JSP, HTML, CSS, AngularJS and JavaScript to construct the dynamic web pages (presentation layer) for the application.
  • Developed J2EE application using Spring framework with Hibernate, Spring MVC, Spring Test Context Framework and JUnit for unit testing.
  • Used Spring MVC for loading database configuration and hibernate mapping files, created data source instance for instantiating Hibernate DAO Support's session factory object.
  • Involved in developing Class diagrams in UML showing generalization and other relationships.
  • Combined Spring MVC and JQuery to perform Ajax requests and responses to create and validate a form on the server side.
  • Created web application prototype using JQuery and AngularJS.
  • Involved in the JMS Connection Pool and the implementation of publish and subscribe using Spring JMS.
  • Involved in the designing and developing modules in application using Spring.
  • Designed and developed User Interface using JSP, JSTL, HTML, AJAX, and JQuery.
  • Used Hibernate implemented JPA for persisting backend database transaction results in persisted classes.
  • Built web-based applications using Spring MVC Architecture suitable for Apache Axis framework.
  • Created an XML configuration file for Hibernate for Database connectivity.
  • Created connections to database using Hibernate Session Factory, using Hibernate APIs to retrieve and store data to the database with Hibernate transaction control.
  • Implemented spring services and Spring DAO's for controller interactions to operate on data.
  • Implemented Java and J2EE design patterns such as MVC and DAO.

Environment: Java, J2EE, JavaScript, AJAX, Spring MVC, HTML, JavaBeans, CSS, Struts, RESTful, SOAP, Hibernate, POJO, AngularJS, JUnit, JQuery, XML

Confidential

Java Developer

Responsibilities:

  • Involved in the complete Software Development Life Cycle (SDLC) including Requirement Analysis, Design, Implementation, Testing and Maintenance.
  • Used core java to design application modules, base classes and utility classes.
  • Designed and implemented customized exception handling to handle the exceptions in the application.
  • Used Dependency Injection (DI), also known as Inversion of Control (IoC), to obtain bean references in the Spring framework using annotations.
  • Involved in Implementation of the application by following the Java best practices and patterns.
  • Used both Java Objects and Hibernate framework to develop Business components to map the Java classes to the database.
  • Used Spring framework for dependency injection and transaction management. Used Spring MVC framework controllers for the controller part of the MVC.
  • Implemented Business Logic using POJO's and used WebSphere to deploy the applications.
  • Used Spring Framework for MVC for writing Controller, Validations and View.
  • Used Eclipse as IDE for development of the application.
  • Built data-driven Web applications with server side Java technologies like Servlets/JSP and generated dynamic Web pages with Java Server Pages (JSP)
  • Involved in mapping of data representation from MVC model to Oracle Relational data model with a SQL-based schema using Hibernate, object/relational-mapping (ORM) solution.
  • Used Spring IOC framework to integrate with Hibernate.
  • Integrated the Apache HTTP plug-in with WebLogic servers.
  • Implemented a Maven script to create the JAR and dependency JARs and deploy the entire project onto the WebLogic Application Server.
  • Coded JavaBeans and implemented Model View Controller (MVC) Architecture.
  • Developed Client applications to consume the Web services based on both SOAP and REST protocol.
  • Utilized log4j for logging and debugging the application.
  • Created and implemented Oracle Queries, functions using SQL and PL/SQL.
  • Involved in bug fixing during the System testing, Joint System testing and User acceptance testing.
  • Worked on various SOAP and RESTful services used in various internal applications.
  • Consumed REST-based microservices with RestTemplate based on RESTful APIs.
  • Developed front end web application using AngularJS along with cutting edge HTML and CSS.
  • Developed processing component to retrieve customer information from MySQL database, developed DAO layer using Hibernate.
  • Used MAVEN for developing build scripts and deploying the application onto WebLogic.

Environment: Java, Spring, Hibernate, MVC, POJO, WebSphere, Eclipse, HTTP, Maven, JavaBeans, SOAP, log4j, SQL, PL/SQL, CSS, MySQL

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Hive 2.3, Pig 0.17, Sqoop 1.4, Flume 1.8, Oozie 4.3, Spark 2.3, Kafka 1.1, Storm 1.0.5 and Zookeeper 3.4

Languages: C, Java, Python 3.7, Scala 2.12, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, JQuery

Frameworks: MVC Struts, Spring, Hibernate 5.3.1

NoSQL Databases: HBase, Cassandra 3.11, MongoDB 4.0.0

Operating Systems: HP-Unix, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML5, DHTML, XML, AJAX, WSDL

Web/Application servers: Apache Tomcat 9.0.10, WebLogic, JBoss

Databases: Oracle 12c, DB2, SQL Server, MySQL, Teradata r15

Tools and IDE: Eclipse 4.8, NetBeans, Toad, Maven, ANT 1.10.3, Sonar, JDeveloper, DB Visualizer

Version control & Web Services: SVN, CVS, GIT, REST, SOAP
