
Hadoop / Spark Developer Resume


Tallahassee, Florida

PROFESSIONAL SUMMARY:

  • Over 8 years of experience in Information Technology, with 5+ years in Hadoop/Big Data processing and 3 years in Java/J2EE technologies.
  • Solid experience with Hadoop MRv1 and MRv2 (YARN) architectures.
  • Experience with Hadoop distributions such as Cloudera and Hortonworks.
  • Strong experience setting up high availability for Hadoop cluster components and edge nodes.
  • Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, HBase, Spark, Sqoop, Flume, and Oozie.
  • Excellent working knowledge of the HDFS filesystem and Hadoop daemons such as the ResourceManager, NodeManager, NameNode, DataNode, and Secondary NameNode, as well as YARN containers.
  • Experienced in developing MapReduce programs in Java using Apache Hadoop for working with Big Data.
  • Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control-flow nodes.
  • Used ZooKeeper to provide coordination services to the cluster.
  • Experience in application development using Scala, Java, and Python.
  • In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
  • Experience in creating Spark contexts, Spark SQL contexts, and Spark Streaming contexts to process huge data sets.
  • Strong experience in installing and working with NoSQL databases such as HBase and Cassandra.
  • Experience in processing different file formats, including XML, JSON, ORC, text, and SequenceFile.
  • Good knowledge of cloud technologies such as AWS and Amazon Elastic MapReduce (EMR).
  • Hands-on experience in customizing Splunk apps and dashboards and building advanced visualizations, configurations, reports, and search capabilities.
  • Good experience in creating Business Intelligence solutions with Tableau Desktop by connecting to multiple data sources such as Hive, flat files, CSV, SQL Server, and Oracle.
  • Experience in Servlets, JDBC, JSP, Struts, Spring, JavaScript, JSON, Apache Tomcat web/application servers, HTML, CSS, XML, and HTML5.
  • Hands-on experience in database development using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex queries against Oracle, DB2, and MySQL.
  • Involved in the complete SDLC, including requirements gathering, design documentation, development, testing, and production deployment.
  • Good experience with Agile engineering practices, Scrum, Test-Driven Development, and Waterfall methodologies.
  • Experience working both independently and collaboratively to solve problems and deliver high-quality results in a fast-paced, unstructured environment.
  • Ability to quickly master new concepts and applications.
  • Exhibited strong written and oral communication skills.

TECHNICAL SKILLS:

Big Data Frameworks: Hadoop (HDFS, MapReduce), Spark, Spark SQL, Spark Streaming, Hive, Impala, Kafka, HBase, Flume, Pig, Sqoop, Oozie, Cassandra, MongoDB.

Big Data Distributions: Cloudera, Hortonworks, Amazon EMR

Programming languages: Core Java, Scala, Python, SQL, Shell Scripting

Operating Systems: Windows, Linux (Ubuntu, CentOS)

Databases: Oracle, SQL Server, MySQL

Designing Tools: UML, Visio

IDEs: Eclipse, NetBeans

Java Technologies: JSP, JDBC, Servlets, JUnit, Spring, Hibernate

Web Technologies: XML, HTML, JavaScript, jQuery, JSON

Linux Experience: System Administration Tools, Puppet

Web Services: RESTful and SOAP

Frameworks: Jakarta Struts 1.x, Spring 2.x

Development methodologies: Agile, Waterfall

Logging Tools: Log4j

Application / Web Servers: Apache Tomcat, WebSphere

Messaging Services: ActiveMQ, Kafka, JMS

Version Control: Git, CVS

Analytics: Tableau

Others: PuTTY, WinSCP, Data Lake, Talend

PROFESSIONAL EXPERIENCE:

Confidential, Tallahassee, Florida

Hadoop / Spark Developer

Responsibilities:

  • Supported and managed Hadoop clusters on the Hortonworks distribution, deployed on the AWS cloud.
  • Collected and aggregated large amounts of web log data from different sources, such as web servers and mobile and network devices, using Apache Kafka.
  • Developed the ingestion framework in Python.
  • Created RDDs and DataFrames for faster execution and performed data transformations and actions using Spark.
  • Developed optimal strategies for distributing the web log data over the cluster.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries (see the UDF sketch after this list).
  • Configured Spark Streaming to receive real-time data from Kafka for high-speed processing and stored the streamed data in HDFS (see the streaming sketch after this list).
  • Worked on Spark SQL for faster execution of Hive queries using the Spark SQL context.
  • Invoked HQL from Spark SQL and stored the results in Parquet as the storage format.
  • Created Hive tables and worked on them using HQL to evaluate, filter, load, and store data.
  • Designed and implemented incremental imports into Hive tables using Sqoop.
  • Analyzed the web log data using HQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
  • Developed ETL jobs, scheduled them in Oozie, and performed ingestion into HDFS.
  • Maintained cluster authentication and security using Kerberos.
  • Designed and developed various data visualizations, dashboards, stories, and workbooks in Tableau.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.
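
A minimal sketch of the kind of Spark Streaming consumer described above, using the Kafka direct-stream integration (spark-streaming-kafka-0-10) that matches the Spark 2.0 environment below. The broker address, consumer group, topic name, and HDFS path are hypothetical placeholders, not values from the project.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class WebLogStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("WebLogStream");
            // 10-second micro-batches
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");      // placeholder
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "weblog-consumers");           // placeholder

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("weblogs"), kafkaParams)); // placeholder topic

            // Persist each non-empty micro-batch to a time-stamped HDFS directory.
            stream.map(ConsumerRecord::value)
                  .foreachRDD((rdd, time) -> {
                      if (!rdd.isEmpty()) {
                          rdd.saveAsTextFile("hdfs:///data/weblogs/raw/" + time.milliseconds());
                      }
                  });

            jssc.start();
            jssc.awaitTermination();
        }
    }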
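
And a minimal sketch of a Hive generic UDF of the sort mentioned above. The whitespace and case normalization stands in for the actual business logic, and the function and class names are hypothetical. Once registered with ADD JAR and CREATE TEMPORARY FUNCTION clean_string AS 'CleanStringUdf', it can be called directly from HQL.

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

    public class CleanStringUdf extends GenericUDF {
        private StringObjectInspector input;

        @Override
        public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1 || !(args[0] instanceof StringObjectInspector)) {
                throw new UDFArgumentException("clean_string() expects one string argument");
            }
            input = (StringObjectInspector) args[0];
            return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
        }

        @Override
        public Object evaluate(DeferredObject[] args) throws HiveException {
            Object value = args[0].get();
            if (value == null) {
                return null; // pass nulls through unchanged
            }
            // Stand-in for the real business logic: normalize whitespace and case.
            return input.getPrimitiveJavaObject(value).trim().toLowerCase();
        }

        @Override
        public String getDisplayString(String[] children) {
            return "clean_string(" + children[0] + ")";
        }
    }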

Environment: HDFS, Hive, Sqoop, Oozie, Storm, Scala 2.11.8, Spark 2.0, Spark SQL, Spark Streaming, Python, Kafka, GitHub, Hortonworks (HDP), Kerberos, AWS, Amazon S3, Amazon EC2, Amazon EBS, Tableau.

Confidential, Pleasanton, California

Hadoop Developer

Responsibilities:

  • Imported data from relational data stores into Hadoop using Sqoop, and exported it back out the same way.
  • Developed Flume configurations to ingest data into HDFS.
  • Used Oozie for automating end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
  • Created various MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the map-only job sketch after this list).
  • Reworked existing MapReduce batch applications for better performance.
  • Involved in creating Hive tables and loading and analyzing data using Hive.
  • Developed Hive queries and custom UDFs and UDAFs.
  • Used Hive table definitions to map the output files to tables.
  • Loaded tables into Impala for faster retrieval using different file formats.
  • Worked with HBase, a NoSQL database.
  • Used the HBase Java API from a Java application (see the client sketch after this list).
  • Integrated Hive and HBase using storage handlers to work around the limitations of each.
  • Upgraded the operating system and Hadoop distribution as new versions were released, using Puppet.
  • Benchmarked Hadoop clusters for ingestion from various applications.
  • Reviewed HDFS usage and system design for future scalability and fault tolerance.
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Developed a Custom Service Descriptor for Cloudera.
  • Created reports and dashboards from structured and unstructured data using SAP BO.
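
A minimal sketch of the kind of map-only cleaning job described above, using the Hadoop 2.x (new) MapReduce API. The tab-separated, seven-field record layout is a hypothetical stand-in for the real validation rules.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LogCleanJob {
        public static class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String line = value.toString().trim();
                // Drop blank and malformed records (hypothetical 7-field layout).
                if (!line.isEmpty() && line.split("\t").length == 7) {
                    ctx.write(NullWritable.get(), new Text(line));
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "log-clean");
            job.setJarByClass(LogCleanJob.class);
            job.setMapperClass(CleanMapper.class);
            job.setNumReduceTasks(0); // map-only cleaning pass
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }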
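
And a minimal sketch of basic HBase Java API usage as mentioned above, using the HTable-style client that shipped with the HBase versions of that era; the table name, column family, row key, and values are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CustomerClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
            HTable table = new HTable(conf, "customers");     // placeholder table
            try {
                // Write one cell: row key, column family "d", qualifier "name".
                Put put = new Put(Bytes.toBytes("row-001"));
                put.add(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
                table.put(put);

                // Read it back.
                Result result = table.get(new Get(Bytes.toBytes("row-001")));
                byte[] name = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("name"));
                System.out.println(Bytes.toString(name));
            } finally {
                table.close();
            }
        }
    }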

Environment: Apache Hadoop 2.x (MapReduce & HDFS), Hive, Impala, Pig, HBase, Sqoop, Flume, Oozie, Linux, Java 7, Eclipse, Cloudera, SAP BO.

Confidential, Portland, Oregon

Hadoop Developer

Responsibilities:

  • Involved in all phases of the Big Data implementation, including requirements analysis, design, development, building, testing, and deployment of the Hadoop cluster in fully distributed mode, as well as mapping DB2 V9.7/V10.x data types to Hive data types and validating them.
  • Imported unstructured data into HDFS using Flume.
  • Developed shell scripts for workflow integration.
  • Wrote MapReduce Java programs to analyze log data for large-scale data sets.
  • Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
  • Automated all the jobs for extracting data from different data sources, such as MySQL, and pushing the result sets to the Hadoop Distributed File System.
  • Developed Pig Latin scripts to extract the data from the output files and load it into HDFS.
  • Developed custom UDFs and implemented Pig scripts (see the Pig UDF sketch after this list).
  • Implemented MapReduce jobs using the Java API as well as Pig Latin and HiveQL.
  • Hands-on design and development of an application using Hive UDFs.
  • Provided support to data analysts in running Pig and Hive queries.
  • Involved in importing and exporting data between HDFS and relational database systems such as Oracle, MySQL, DB2, and Teradata using Sqoop.
  • Configured the HA cluster for both manual and automatic failover.
  • Specified the cluster size, allocated resource pools, and configured the Hadoop distribution by writing specifications in JSON format.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team to implement it.
  • Exported result sets from Hive to MySQL using shell scripts.
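
A minimal sketch of a custom Pig UDF (an EvalFunc) like those mentioned above; the upper-casing logic is a stand-in for the real transformation, and the class name is hypothetical. After REGISTERing the jar in a Pig script, it can be called inline, e.g. B = FOREACH A GENERATE ToUpper(name);

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class ToUpper extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            // Guard against empty or null input tuples.
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            // Stand-in for the real transformation logic.
            return ((String) input.get(0)).toUpperCase();
        }
    }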

Environment: Apache Hadoop (HDFS & MapReduce), Hive, Sqoop, Flume, Pig 0.10 and 0.11, Linux, Java, Eclipse, Maven, Oracle 9i/10g, MySQL

Confidential

Java Developer

Responsibilities:

  • Involved in developing business domain concepts into Use Cases, Sequence Diagrams, Class Diagrams, Component Diagrams, and Implementation Diagrams.
  • Implemented various J2EE design patterns, including Model-View-Controller, Data Access Object, Business Delegate, and Transfer Object.
  • Responsible for analysis and design of the application based on the MVC architecture, using the open source Struts framework.
  • Involved in configuring Struts and Tiles and developing the configuration files.
  • Developed Struts Action classes and validation classes using the Struts controller component and the Struts validation framework.
  • Developed and deployed UI-layer logic using JSP, XML, JavaScript, and HTML/DHTML.
  • Used the Spring Framework and integrated it with Struts.
  • Involved in configuring web.xml and struts-config.xml according to the Struts framework.
  • Designed a lightweight model for the product using the Inversion of Control principle and implemented it successfully using the Spring IoC container.
  • Used the transaction interceptor provided by Spring for declarative transaction management.
  • Managed dependencies between classes with Spring's Dependency Injection to promote loose coupling.
  • Developed JDBC connection pooling to optimize database connections (see the pooling sketch after this list).
  • Created tables and stored procedures for data manipulation and retrieval, and modified the database using SQL, PL/SQL, stored procedures, triggers, and views in Oracle 10g and MySQL.
  • Developed an Ant script for auto-generation and deployment of the web service.
  • Wrote stored procedures and used Java APIs to call them.
  • Developed various test cases, such as unit tests, mock tests, and integration tests, using JUnit.
  • Experience in writing stored procedures, functions, and packages.
  • Used log4j for logging in the applications.
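
A minimal sketch of the kind of JDBC connection pooling described above. The resume doesn't name a pooling library, so Apache Commons DBCP (common in Spring 2.x-era stacks) is assumed here; the driver, URL, and credentials are placeholders.

    import java.sql.Connection;
    import java.sql.SQLException;
    import org.apache.commons.dbcp.BasicDataSource;

    public final class ConnectionPool {
        private static final BasicDataSource DS = new BasicDataSource();

        static {
            DS.setDriverClassName("oracle.jdbc.OracleDriver");
            DS.setUrl("jdbc:oracle:thin:@//dbhost:1521/orcl"); // placeholder
            DS.setUsername("app_user");                        // placeholder
            DS.setPassword("secret");                          // placeholder
            DS.setInitialSize(5);  // connections opened at startup
            DS.setMaxActive(20);   // upper bound on concurrent connections
        }

        private ConnectionPool() { }

        public static Connection getConnection() throws SQLException {
            return DS.getConnection(); // borrows from the pool; close() returns it
        }
    }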

Environment: Java, J2EE, Struts MVC, Tiles, JDBC, JSP, JavaScript, HTML, Spring IoC, Spring AOP, JAX-WS, Ant, SQL, Oracle 10g, JUnit, log4j and Eclipse.

Confidential

Java Developer

Responsibilities:

  • Involved in full life-cycle development in a distributed environment using Java and the J2EE framework.
  • Responsible for developing and modifying the existing service layer based on the business requirements.
  • Involved in designing and developing web services using SOAP and WSDL.
  • Involved in database design.
  • Provided database connections using JDBC and developed SQL queries to manipulate the data.
  • Created core Java interfaces and abstract classes for different functionalities.
  • Involved in high-level design and prepared the logical view of the application.
  • Created the user interface using JSF.
  • Responsible for analysis, design, development, and integration of UI components with the backend using J2EE technologies such as Servlets, JSP, and JDBC.
  • Used technologies such as JSP, JSTL, JavaScript, HTML, XML, and Tiles for the presentation tier.
  • Implemented multithreading concepts in Java classes to avoid deadlocks (see the sketch after this list).
  • Involved in integration testing of the business logic layer and data access layer.
  • Unit-tested the application using the JUnit framework.
  • Wrote stored procedures, functions, and views to retrieve data from relational databases.
  • Used Maven builds to wrap around Ant build scripts.
  • Used CVS for version control of code and project documents.
  • Responsible for mentoring and working with team members to ensure standards and guidelines were followed and tasks were delivered on time.
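
A minimal sketch of one standard way to avoid deadlock in multithreaded Java code: acquiring locks in a consistent global order. The resume doesn't say which technique was used, so this example is illustrative only, and the account-transfer scenario is hypothetical.

    public class AccountService {
        public static final class Account {
            final long id;
            private long balance;
            Account(long id, long balance) { this.id = id; this.balance = balance; }
        }

        // Always lock the account with the smaller id first, so two concurrent
        // transfers between the same pair of accounts can never deadlock by
        // acquiring the two locks in opposite orders.
        public void transfer(Account from, Account to, long amount) {
            Account first  = from.id < to.id ? from : to;
            Account second = from.id < to.id ? to   : from;
            synchronized (first) {
                synchronized (second) {
                    from.balance -= amount;
                    to.balance   += amount;
                }
            }
        }
    }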

Environment: Java, J2EE, JDBC, Oracle 10g, JSP, Servlets, jQuery, JSF, JUnit, HTML, JavaScript, PL/SQL, Maven, UML, CVS and Web Services.
