
Sr. Spark/Hadoop Developer Resume


Eden Prairie, MN

SUMMARY:

  • Over 8 years of IT experience in software analysis, design, development, testing, and implementation of Big Data, Hadoop, NoSQL, and Java/J2EE technologies.
  • Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features.
  • Installed Kafka on the Hadoop cluster and wrote producer and consumer code in Java to stream tweets with popular hashtags from a Twitter source into HDFS (see the sketch after this list).
  • Experience working with Data Frames, RDD, Spark SQL, Spark Streaming, APIs, System Architecture, and Infrastructure Planning.
  • Experience with core Java components: Collections, Generics, Inheritance, Exception Handling, and Multi-threading.
  • Very good understanding of NoSQL databases such as MongoDB and HBase.
  • Experience with major components of the Hadoop ecosystem, including Hive, Sqoop, and Flume, plus knowledge of the MapReduce/HDFS framework.
  • Hands-on programming experience in various technologies such as Java, J2EE, HTML, and XML.
  • Experience developing and deploying applications on WebLogic, Apache Tomcat, and JBoss.
  • Experience working with developer toolkits such as Force.com IDE, Force.com Ant Migration Tool, Eclipse IDE, and Maven.
  • Experience in Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology, with good knowledge of J2EE and Core Java design patterns.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Experience in Apache Flume for collecting, aggregating, and moving large volumes of data from various sources such as web servers and telnet sources.
  • Experience in installation, configuration and deployment of Big Data solutions.
  • Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (Amazon EMR) for processing and managing the Hadoop framework on dynamically scalable Amazon EC2 instances.
  • Experience setting up standards and processes for Hadoop-based application design and implementation.
  • Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
  • Hands on experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Expertise in using XML related technologies such as XML, DTD, XSD, XPATH, XSLT, DOM, SAX, JAXP, JSON and JAXB.
  • Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience implementing Spark solutions to enable real-time reporting on Cassandra data.
  • Hands-on expertise in row key and schema design with NoSQL databases such as MongoDB.
  • Experience extracting data from MongoDB through Sqoop, placing it in HDFS, and processing it there.
  • Hands on experience with Spark Core, Spark SQL and Data Frames/Data Sets/RDD API.
  • Experience using Kafka and Kafka brokers with Spark Streaming to process live streaming data as RDDs.
  • Developed Java applications using IDEs such as Spring Tool Suite and Eclipse.
  • Good knowledge in using Hibernate for mapping Java classes with database and using Hibernate Query Language (HQL).
  • Operated on Java/J2EE systems with different databases, which include Oracle, MySQL and DB2.
  • Built secure AWS solutions by creating VPCs with private and public subnets.
  • Extensive experience with application servers such as WebLogic, WebSphere, JBoss, and GlassFish, and web servers such as Apache Tomcat.
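
Below is a minimal sketch, in Java, of the kind of Kafka producer described in the Kafka/Twitter item above; the broker address, topic name, and sample payload are illustrative assumptions rather than project specifics.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TweetProducer {
    public static void main(String[] args) {
        // Hypothetical broker address and serializers; real values would come from cluster config.
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // In the real pipeline the payload would come from the Twitter streaming API;
            // here a static record stands in for a tweet tagged with a popular hashtag.
            String tweetJson = "{\"hashtag\":\"#bigdata\",\"text\":\"sample tweet\"}";
            producer.send(new ProducerRecord<>("tweets", "#bigdata", tweetJson));
        }
    }
}
```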

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Hadoop 3.0, HDFS, MapReduce, HBase 1.4, Apache Pig 0.17, Hive 2.3, Sqoop 1.4, Apache Impala 3.0, Oozie 4.3, Yarn, Apache Flume 1.8, Kafka 1.1, Zookeeper 3.4

Hadoop Distributions: Cloudera, Hortonworks, MapR

Cloud: AWS, Azure, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.

Programming Language: Java, Scala 2.12, Python 3.6, SQL, PL/SQL, Shell Scripting, Storm 1.0, JSP, Servlets

Frameworks: Spring 5.0.5, Hibernate 5.2, Struts 1.3, JSF, EJB, JMS

Web Technologies: HTML5, CSS, JavaScript, JQuery 3.3, Bootstrap 4.1, XML, JSON, AJAX

Databases: Oracle 12c/11g, SQL

Database Tools: TOAD, SQL PLUS, SQL

Operating Systems: Linux, Unix, Windows 10/8/7

IDE and Tools: Eclipse 4.7, NetBeans 8.2

NoSQL Databases: HBase 1.4, Cassandra 3.11, MongoDB

Web/Application Server: Apache Tomcat 9.0.7, JBoss, WebLogic, WebSphere

SDLC Methodologies: Agile, Waterfall

Version Control & Build Tools: Git, SVN, CVS, Maven

PROFESSIONAL EXPERIENCE:

Confidential - Eden Prairie, MN

Sr. Spark/Hadoop Developer

Responsibilities:

  • As a Spark/Hadoop Developer, worked on Hadoop ecosystem components including Hive, MongoDB, Zookeeper, and Spark Streaming with the MapR distribution.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Designed HBase row keys to store text and JSON as key values in HBase tables, structuring the keys so rows could be fetched and scanned in sorted order.
  • Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
  • Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
  • Maintained Hadoop, Hadoop ecosystem components, and databases with updates/upgrades, performance tuning, and monitoring.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Prepared data analytics processing and data egress to make analytics results available to visualization systems, applications, and external data stores.
  • Built large-scale data processing systems for data warehousing solutions and worked with unstructured data mining on NoSQL.
  • Responsible for design and development of Spark SQL Scripts based on Functional Specifications.
  • Used AWS services like EC2 and S3 for small data sets processing and storage.
  • Provisioned Cloudera Director AWS instances and added the Cloudera Manager repository to scale up the Hadoop cluster in AWS.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Specified the cluster size, allocated resource pools, and configured the Hadoop distribution by writing the specifications in JSON format.
  • Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
  • Used Spark SQL on DataFrames to access Hive tables from Spark for faster processing of data (see the sketch after this list).
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
  • Developed data pipeline using MapReduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.
  • Used Different Spark Modules like Spark core, Spark SQL, Spark Streaming, Spark Data sets and Data frames.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL database for huge volume of data.
  • Designed and developed automation test scripts using Python.
  • Integrated Apache Storm with Kafka to perform web analytics and move clickstream data from Kafka to HDFS.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra.
  • Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
  • Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Wrote Pig scripts to transform raw data from several data sources into baseline data.
  • Analyzed SQL scripts and designed the solution for implementation using PySpark.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Extracted large volumes of data feed on different data sources, performed transformations and loaded the data into various Targets.
  • Developed data-formatted web applications and deployed the scripts using HTML, XHTML, CSS, and client-side scripting with JavaScript.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Assisted in cluster maintenance, monitoring, and troubleshooting, and managed and reviewed data backups and log files.
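
A minimal sketch of the Spark SQL pattern referenced above: reading a Hive table into a DataFrame and persisting the result to HDFS. The table name, query, and output path are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveToSparkExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-to-spark-example")
                .enableHiveSupport()   // lets Spark SQL read existing Hive tables
                .getOrCreate();

        // Run a simple aggregation over a (hypothetical) Hive table as a DataFrame.
        Dataset<Row> counts = spark.sql(
                "SELECT customer_id, COUNT(*) AS events FROM web_logs GROUP BY customer_id");

        // Persist the result back to HDFS in Parquet for downstream reporting.
        counts.write().mode("overwrite").parquet("hdfs:///analytics/customer_event_counts");

        spark.stop();
    }
}
```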

Environment: Hadoop 3.0, Spark 2.3, MapReduce, Java, MongoDB, HBase 1.2, JSON, Hive 2.3, Zookeeper 3.4, AWS, MySQL, Scala 2.12, Python, Cassandra 3.11, HTML5, JavaScript

Confidential - Philadelphia, PA

Sr. Big Data/Hadoop Developer

Responsibilities:

  • Worked as a Sr. Big Data/Hadoop Developer with Hadoop Ecosystems components.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
  • Primarily involved in the data migration process using Azure, integrating with the GitHub repository and Jenkins.
  • Used Kibana, an open-source browser-based analytics and search dashboard for Elasticsearch.
  • Used Java Persistence API (JPA) framework for object relational mapping which is based on POJO Classes.
  • Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
  • Designed & Developed a Flattened View (Merge and Flattened dataset) de-normalizing several Datasets in Hive/HDFS.
  • Worked on NoSQL support for enterprise production and loaded data into HBase using Impala and Sqoop.
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
  • Handled importing of data from various data sources, performed transformations using Hive, Pig, and loaded data into HDFS.
  • Involved in identifying job dependencies to design workflow for Oozie & Yarn resource management.
  • Designed solution for various system components using Microsoft Azure.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Developed NiFi flows dealing with various kinds of data formats such as XML, JSON, and Avro.
  • Developed and designed data integration and migration solutions in Azure.
  • Worked on Proof of concept with Spark with Scala and Kafka.
  • Worked on visualizing the aggregated datasets in Tableau.
  • Worked on importing data from HDFS to a MySQL database and vice versa using Sqoop.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
  • Performed data analytics in Hive and then exported those metrics back to Oracle Database using Sqoop.
  • Performance tuning of Hive queries, MapReduce programs for different applications.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Worked on MongoDB and HBase databases, which differ from classic relational databases.
  • Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala programming (see the sketch after this list).
  • Integrated Kafka with Spark Streaming for high-throughput, reliable processing.
  • Worked on Apache Flume for collecting and aggregating huge amounts of log data and stored it in HDFS for further analysis.
  • Tuned Hive and Pig scripts to improve performance and resolved performance issues in both.
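
A minimal sketch of converting a simple HiveQL aggregation into a Spark pair-RDD transformation, as referenced above; the input path, record layout, and column position are illustrative assumptions.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class HiveQlToRddExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("hiveql-to-rdd-example");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Hypothetical input: one comma-separated record per line, status code in column 3.
            JavaRDD<String> lines = sc.textFile("hdfs:///staging/claims/part-*");

            // Rough equivalent of: SELECT status, COUNT(*) FROM claims GROUP BY status
            JavaPairRDD<String, Long> countsByStatus = lines
                    .mapToPair(line -> new Tuple2<>(line.split(",")[2], 1L))
                    .reduceByKey(Long::sum);

            countsByStatus.saveAsTextFile("hdfs:///analytics/claims_by_status");
        }
    }
}
```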

Environment: Hadoop 3.0, Agile, Pig 0.17, HBase 1.4.3, Jenkins 2.12, NoSQL, Sqoop 1.4, Impala 3.0.0, Hive 2.3, MapReduce, YARN, Oozie, Microsoft Azure, NiFi, Avro, MySQL, Kafka, Scala 2.12, Spark, Apache Flume 1.8

Confidential - St. Louis, MO

Hadoop Developer

Responsibilities:

  • Extensively worked on Hadoop ecosystem components including Hive and Spark Streaming with the MapR distribution.
  • Implemented J2EE design patterns like DAO, Singleton, and Factory.
  • Managed connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures (see the sketch after this list).
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and Sqoop.
  • Upgraded the Hadoop Cluster from CDH3 to CDH4, setting up High Availability Cluster and integrating Hive with existing applications.
  • Worked on NoSQL support for enterprise production and loaded data into HBase using Impala and Sqoop.
  • Performed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
  • Handled importing of data from various data sources, performed transformations using Hive and Pig, and loaded data into HDFS.
  • Moved data between HDFS and relational database systems in both directions using Sqoop, and handled maintenance and troubleshooting.
  • Developed a Java/J2EE-based multi-threaded application built on top of the Struts framework.
  • Used Spring/MVC framework to enable the interactions between JSP/View layer and implemented different design patterns with J2EE and XML technology.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Created Hive Tables, loaded claims data from Oracle using Sqoop and loaded the processed data into target database.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
  • Implemented the J2EE design patterns Data Access Object (DAO), Session Façade and Business Delegate.
  • Developed NiFi flows dealing with various kinds of data formats such as XML, JSON, and Avro.
  • Implemented MapReduce jobs in Hive by querying the available data.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Collaborated with business users/product owners/developers to contribute to the analysis of functional requirements.
  • Implemented the application using MVC architecture, integrating the Hibernate and Spring frameworks.
  • Utilized various JavaScript and JQuery libraries Bootstrap, Ajax for form validation and other interactive features.
  • Involved in converting HiveQL into Spark transformations using Spark RDD and through Scala programming.
  • Integrated Kafka with Spark Streaming for high-throughput, reliable processing.
  • Worked in tuning Hive & Pig to improve performance and solved performance issues in both scripts.
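
A minimal sketch of the JDBC query pattern referenced above, using a prepared statement; the connection URL, credentials, and table/column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ClaimLookupDao {
    // Hypothetical connection details and credentials, for illustration only.
    private static final String URL = "jdbc:mysql://db-host:3306/claims";

    public double totalForCustomer(String customerId) throws Exception {
        String sql = "SELECT SUM(amount) FROM claim WHERE customer_id = ?";
        try (Connection conn = DriverManager.getConnection(URL, "app_user", "secret");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, customerId);   // bind parameters instead of concatenating SQL
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getDouble(1) : 0.0;
            }
        }
    }
}
```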

Environment: Hadoop 3.0, Hive 2.1, J2EE, JDBC, Pig 0.16, HBase 1.1, Sqoop, NoSQL, Impala, Java, Spring, MVC, XML, Spark 1.9, PL/SQL, HDFS, JSON, Hibernate, Bootstrap, JQuery, JavaScript, Ajax

Confidential - Peoria IL

Sr. Java/Hadoop Developer

Roles & Responsibilities:

  • Designed and developed application modules using the Spring and Hibernate frameworks.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Used Maven for developing build scripts and deploying the application onto WebLogic.
  • Implemented Spark RDD transformations to map business analysis logic and applied actions on top of those transformations.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark data frames, Scala and Python.
  • Implemented MVC architecture using the Spring Framework; coding involved writing Action classes, custom tag libraries, and JSPs.
  • Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Created Hive tables with periodic backups and wrote complex Hive/Impala queries to run on Impala.
  • Implemented partitioning and bucketing in Hive, using file formats and compression techniques with optimizations (see the sketch after this list).
  • Involved in designing and developing modules at both Client and Server Side.
  • Worked on JDBC framework encapsulated using DAO pattern to connect to the database.
  • Developed UI screens using JSP and HTML and performed client-side validation with JavaScript.
  • Worked on various SOAP and RESTful services used in various internal applications.
  • Developed JSP and Java classes for various transactional/ non-transactional reports of the system using extensive SQL queries.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
  • Implemented Storm topologies to pre-process data before moving into HDFS system.
  • Implemented POC to migrate MapReduce programs into Spark transformations using Spark and Scala.
  • Involved in configuring builds using Jenkins with Git and used Jenkins to deploy the applications onto Dev and QA environments.
  • Involved in unit testing, system integration testing, and enterprise user testing using JUnit.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
  • Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.
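
A minimal sketch of loading a partitioned, compressed Hive table from Spark, as referenced in the partitioning/bucketing item above; the database, table, and partition column names are assumptions, and the sketch uses the current DataFrame writer API rather than the exact project setup.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class PartitionedHiveLoad {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("partitioned-hive-load")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical staging table; assumed to contain "year" and "month" columns.
        Dataset<Row> txns = spark.table("staging.transactions");

        // Write as Snappy-compressed Parquet, partitioned by date columns, so queries
        // that filter on year/month prune partitions instead of scanning everything.
        txns.write()
            .mode(SaveMode.Overwrite)
            .partitionBy("year", "month")
            .option("compression", "snappy")
            .format("parquet")
            .saveAsTable("analytics.transactions_partitioned");

        spark.stop();
    }
}
```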

Environment: Spring 4.0, Hibernate 5.0.7, Hadoop 2.6.5, Spark 1.1, Hive, Python 3.3, Scala, Sqoop, Flume 1.3.1, Impala, MapReduce, Linux

Confidential - Detroit, MI

Sr. Java/J2EE Developer

Roles & Responsibilities:

  • As a Java/J2EE developer, worked on both the back-end and front-end development teams.
  • Designed and developed various modules of the application with J2EE design architecture and frameworks such as Spring MVC and Spring Bean Factory, using IoC and AOP concepts.
  • Implemented Java/J2EE design patterns such as Factory, DAO, Session Façade, and Singleton.
  • Used Hibernate in the persistence layer and developed POJOs and Data Access Objects (DAOs) to handle all database operations (see the sketch after this list).
  • Used Maven as the build tool, GIT for version control, Jenkins for Continuous Integration and JIRA as a defect tracking tool.
  • Developed the user interface components using HTML, CSS, JavaScript, AJAX, JQuery and also created custom tags.
  • Implemented the project structure based on the Spring MVC pattern using Spring Boot.
  • Worked on JavaScript to validate input, manipulated HTML elements using JavaScript.
  • Developed external JavaScript codes that can be used in several different web pages.
  • Developed Web pages using JSP, HTML, CSS, Struts Tag libs and AJAX for the Credit Risk module.
  • Used Spring Beans to encapsulate business logic and Implemented Application MVC Architecture using Spring MVC framework.
  • Developed XMLs, JavaScript and Java classes for dynamic HTML generation to perform the server side processing on the client requests.
  • Used JUnit framework for unit testing of application and Maven to build the application and deployed on Jetty server.
  • Developed unit test cases in JUnit and documented all the test scenarios as per the user specifications.
  • Implemented Spring framework based on the Model View Controller design paradigm.
  • Involved in development activities using Core Java/J2EE, Servlets, JSP, and JSF for creating the web application, along with XML and Spring.
  • Used the Spring Framework for dependency injection and integrated it with Hibernate.
  • Developed RESTful Web Services client to consume JSON messages using Spring JMS configuration.
  • Developed services using Spring IOC and Hibernate persistence layer with Oracle Database.
  • Implemented build script using ANT for compiling, building and deploying the application on WebSphere application server.
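
A minimal sketch of the Hibernate DAO pattern referenced above; the Customer entity, its fields, and the SessionFactory wiring are illustrative assumptions rather than actual project classes.

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// "Customer" is a hypothetical mapped POJO used only to illustrate the DAO pattern.
@Entity
class Customer {
    @Id
    private Long id;
    private String name;
    // getters/setters omitted for brevity
}

public class CustomerDao {
    private final SessionFactory sessionFactory;

    public CustomerDao(SessionFactory sessionFactory) {
        // In practice this would typically be injected by the Spring container.
        this.sessionFactory = sessionFactory;
    }

    public void save(Customer customer) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(customer);   // INSERT generated from the entity mapping
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();            // keep the database consistent on failure
            throw e;
        } finally {
            session.close();
        }
    }

    public Customer findById(Long id) {
        Session session = sessionFactory.openSession();
        try {
            return (Customer) session.get(Customer.class, id);
        } finally {
            session.close();
        }
    }
}
```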

Environment: J2EE, Java, Spring MVC 3.0, POJO, Jenkins, HTML, JavaScript, AJAX, JQuery, CSS, XML, Maven, JUnit, Hibernate 4.2.8, ANT, Oracle 10g.
