
Hadoop/spark Developer Resume


Reston, VA

PROFESSIONAL SUMMARY:

  • 8+ years of overall experience in IT as a Developer, Designer & Database Administrator with cross-platform integration experience using Hadoop and Java/J2EE.
  • Over 4 years of experience exclusively in the Big Data ecosystem using the Hadoop framework and related technologies such as HDFS, MapReduce, Hive, Pig, HBase, Flume, Oozie, Sqoop, and ZooKeeper, including working experience with Spark Core, Spark SQL, Spark Streaming, and Kafka.
  • Excellent understanding of Hadoop architecture, its daemons, and components such as HDFS, YARN, ResourceManager, NodeManager, NameNode, DataNode, and the MapReduce programming paradigm.
  • Hands-on experience using Cloudera and Hortonworks Hadoop distributions.
  • Experience in using Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters. Good experience with AWS Elastic Block Store (EBS), its different volume types, and selecting EBS volume types based on requirements.
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services like EC2, S3, EBS, RDS, and VPC.
  • Implemented AWS computing and networking services to meet application needs.
  • Ability to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
  • Hands-on experience on Scala programming language.
  • Experience with complex data processing pipelines, including ETL and data ingestion, dealing with unstructured and semi-structured data.
  • Hands-on experience in loading unstructured data (log files, XML data) into HDFS using Flume/Kafka.
  • Expertise in writing MapReduce jobs in native Java, Pig, and Hive for data processing.
  • Worked on importing and exporting data into HDFS and Hive using Sqoop.
  • Worked on import and export of data between MySQL and HDFS using the ETL tool Sqoop.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Hands-on experience with message brokers such as Apache Kafka, IBM WebSphere MQ, and RabbitMQ.
  • Wrote Hive queries for data analysis to meet the requirements.
  • Experience in tuning performance using partitioning, bucketing, and indexing in Hive (a partitioned-write sketch follows this summary).
  • Created Hive tables to store data in HDFS and processed data using HiveQL.
  • Handled different file formats such as Parquet, ORC, SequenceFile, and flat text files.
  • Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
  • Experience with enterprise monitoring tools such as Nagios, Ganglia, Ambari, and Cloudera Manager.
  • Experience in job/workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Knowledge of job workflow scheduling and monitoring, and of administrative tasks such as installing Hadoop and its ecosystem components (Flume, Oozie, Hive, Pig) and commissioning and decommissioning nodes.
  • Good knowledge of Amazon AWS concepts like EMR and EC2 web services, which provide fast and efficient processing of Big Data, as well as Windows Azure.
  • Knowledge of DevOps tools like Maven, Git/GitHub, and Jenkins.
  • Hands-on experience testing Hadoop jobs using the MRUnit framework.
  • Experience in developing and designing Web Services (SOAP and Restful Web services).
  • Excellent Java development skills using J2EE, Spring, J2SE, Servlets, JUnit, MRUnit, JSP, JDBC.
  • Experience in writing complex SQL queries involving multiple tables with inner and outer joins.
  • Experience with web-based UI development using jQuery UI, jQuery, Bootstrap, CSS, HTML, HTML5, XHTML, and JavaScript.
  • Knowledge in Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).
  • Good understanding of creating Conceptual Data Models, Process/Data Flow Diagrams, Use Case Diagrams, Class Diagrams and State Diagrams.
  • An excellent team player and self-starter with good communication skills and proven abilities to finish tasks before target deadlines.
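
As a brief illustration of the Spark SQL and Hive-style partitioning work summarized above, the sketch below reads semi-structured JSON logs from HDFS and writes them back as date-partitioned Parquet. It is a minimal example only: the application name, paths, and column names are hypothetical placeholders, not details taken from any specific engagement.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class LogIngestJob {
    public static void main(String[] args) {
        // Read semi-structured JSON logs from HDFS, drop malformed records,
        // and write them out as Parquet partitioned by date.
        SparkSession spark = SparkSession.builder()
                .appName("log-ingest")            // placeholder application name
                .enableHiveSupport()
                .getOrCreate();

        Dataset<Row> logs = spark.read().json("hdfs:///data/raw/logs/");   // placeholder input path

        logs.filter("status IS NOT NULL")                                   // placeholder column
            .write()
            .mode(SaveMode.Overwrite)
            .partitionBy("event_date")                                      // placeholder partition column
            .parquet("hdfs:///data/curated/logs/");                         // placeholder output path

        spark.stop();
    }
}
```

Partitioning the output by a date column, as shown, is what lets the Hive-side partition pruning and bucketing mentioned above pay off at query time.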

TECHNICAL SKILLS:

Hadoop/Big Data: Hadoop 1.2.1/2.0, HDFS, MapReduce, Pig 0.8, Hive, Sqoop 1.4.4, ZooKeeper 3.4.5, Flume, Oozie, NiFi, Spark, Storm, Kafka

NoSQL: HBase, Cassandra, MongoDB

Java Technologies: Java, J2EE, JSTL, JDBC 3.0/2.1, JSP 1.2/1.1, Java Servlets, JMS, JUNIT, Log4j

Frameworks: Struts 1.2, Spring 3.0, Hibernate 3.2

Languages: C, Java, Scala, Unix Shell Scripts, Python, PHP

Client Technologies: JavaScript, CSS, HTML5, Bootstrap, XHTML, jQuery

Web services: XML, SOAP, WSDL, SOA, JAX-WS, DOM, SAX, XPATH, XSLT, UDDI, JAX-RPC, REST, and JAXB 2.0

Databases: Oracle 11g/10g/9i, DB2, MS-SQL Server, MySQL, MS-Access.

Data Warehouse: Teradata, Netezza

Web Servers: WebLogic 10.3, WebSphere 6.1, Apache Tomcat 5.5/6.0.

Analytical Tools: Tableau, Spotfire.

Modeling Tools: UML on Rational Rose.

IDE Development Tools: Eclipse, NetBeans, IntelliJ

Build Tools: Maven, Scala Build Tool (SBT), Ant

Version control: Git, SVN, CVS

Operating systems: Windows, Linux (Red Hat, Ubuntu, CentOS).

PROFESSIONAL EXPERIENCE:

Confidential

Hadoop/Spark Developer

Responsibilities:

  • Worked with the business analyst team for gathering requirements and client needs.
  • Designed and deployed the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, and Flume.
  • Imported and exported data between RDBMS and HDFS using Sqoop.
  • Loaded unstructured data (log files, XML data) into HDFS using Flume.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Developed Spark code and Spark SQL/Streaming jobs in Scala for faster processing and testing of data.
  • Implemented the online application using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, Web Services, SOAP, and WSDL.
  • Designed and configured a Kafka cluster to accommodate heavy throughput of 1 million messages per second and used the Kafka producer 0.8.3 APIs to produce messages (see the producer sketch after this list).
  • Developed a data pipeline using Kafka and Storm to store data into HDFS and performed real-time analytics on the incoming data.
  • Hands-on experience with Spark and Spark Streaming, creating RDDs and applying transformations and actions (see the streaming sketch at the end of this role).
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services like EC2, S3, EBS, RDS, and VPC.
  • Implemented AWS computing and networking services to meet application needs.
  • Used NoSQL technology (Amazon DynamoDB) to gather and track event-based metrics.
  • Created Phoenix tables and Phoenix queries on top of HBase tables to boost query performance.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Wrote optimized Pig scripts and was involved in developing and testing Pig Latin scripts.
  • Wrote Hive queries to process the data for visualization.
  • Worked on different file formats (ORCFILE, TEXTFILE) and different compression codecs (GZIP, SNAPPY, LZO).
  • Extensively created mappings in Talend using Talend features.
  • Extensive experience in using Talend features such as context variables, triggers, and connectors for databases and flat files like tMySqlInput, tMySqlConnection, tOracle, tMSSqlInput, tMSSqlOutput, tMSSqlRow, tFileCopy, tFileInputDelimited, tFileExists.
  • Implemented a POC with Spark SQL to interpret complex JSON records; used Cassandra as storage for Spark analytics and worked on MongoDB.
  • Experience in collecting metrics for Hadoop clusters using Ambari.
  • Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
  • Used Zookeeper to provide coordination services to the cluster.
  • Involved in defining job flows using Oozie for scheduling jobs, managing Apache Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Managed and scheduled Spark jobs on a Hadoop cluster using Oozie.
  • Drove the application from development to production using a Continuous Integration and Continuous Deployment (CI/CD) model with Maven and Jenkins.
  • Tracked and maintained tasks/projects to completion on time and within scope across onsite and offshore teams.
  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
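
A minimal sketch of the Kafka producer side referenced above. For clarity it uses the current org.apache.kafka.clients.producer API rather than the legacy 0.8.3 client named in the bullet; broker addresses, topic name, and payloads are hypothetical placeholders.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "1");       // favor throughput over strongest durability on a high-volume feed
        props.put("linger.ms", "5");  // allow small batches to form to sustain a high message rate

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1000; i++) {
                // Placeholder topic and payload; real messages would carry serialized events.
                producer.send(new ProducerRecord<>("click-events", Integer.toString(i),
                        "{\"event\":" + i + "}"));
            }
        }
    }
}
```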

Environment: Hadoop, HDP, HDFS, MapReduce, Pig, Hive, Sqoop, Kafka, Solr, HBase, Oozie, Flume, Spark Core, Spark SQL, Spark Streaming, Java, GitHub, SQL Scripting, Unix/Linux Shell Scripting, MongoDB, Cassandra.
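
The streaming sketch below outlines how a Spark Streaming job can consume the Kafka feed described above and run per-batch analytics. It assumes the spark-streaming-kafka-0-10 integration purely for illustration; the broker, topic, and group id are hypothetical placeholders, and the per-batch count stands in for the real analytics logic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class ClickStreamJob {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("click-stream");   // placeholder app name
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");          // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "click-stream-consumers");         // placeholder group id

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Arrays.asList("click-events"), kafkaParams));

        // Count records in each micro-batch as a stand-in for the real-time analytics on incoming data.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```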

Confidential, Reston, VA

Hadoop Developer

Responsibilities:

  • Worked on use cases, data requirements and business value for implementing a Big Data Analytics platform.
  • Installed and configured Apache Hadoop, Hive and Pig environment on Amazon EMR.
  • Extensively involved in installation and configuration of the Cloudera distribution of Hadoop: NameNode, JobTracker, TaskTrackers, and DataNodes.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Configured MySQL Database to store Hive metadata.
  • Developed Java RESTful web services to upload data from local storage to Amazon S3, list S3 objects, and perform file manipulation operations (see the S3 sketch after this list).
  • Successfully ran all Hadoop MapReduce programs on Amazon Elastic MapReduce framework by using Amazon S3 for Input and Output.
  • Experience with Amazon S3 for storing objects and sharing data with Hadoop.
  • Worked extensively on Redshift database development: copying data from S3, bulk-inserting records, creating schemas, clusters, and tables, and tuning queries for better performance.
  • Worked on performance analysis and improvements for Hive and Pig scripts at MapReduce job tuning level.
  • Worked on implementing several POCs to validate and fit various Hadoop ecosystem tools on CDH.
  • Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
  • Responsible for smooth, error-free configuration of the DWH ETL solution and its integration with Hadoop.
  • Designed warehousing infrastructure on top of HDFS data with Hive.
  • Designed and implemented a semi-structured data analytics platform leveraging Hadoop.
  • Proficient in data modeling with Hive partitioning, bucketing, and other Hive optimization techniques.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Transformed and analyzed the data using the Tableau UI tool.
  • Involved in building applications using Maven and integrating with CI servers like Jenkins to build jobs.
  • Managed cluster coordination services through Apache ZooKeeper.
  • Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Performed daily status checks on Oozie workflows, monitored Cloudera Manager, and checked DataNode status to ensure nodes were up and running.
  • Drove the application from development to production using a Continuous Integration and Continuous Deployment (CI/CD) model with Maven and Jenkins.
  • Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs and data.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
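
A minimal sketch of the S3 upload and listing operations mentioned above (see the bullet on Java RESTful web services), written against the AWS SDK for Java v1. The region, bucket name, object keys, and local path are hypothetical placeholders, and credentials are assumed to come from the default provider chain.

```java
import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class S3UploadService {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion("us-east-1")                       // placeholder region
                .build();

        // Upload a local file so it can be used as MapReduce/Hive input on EMR.
        s3.putObject("example-analytics-bucket", "raw/2016/01/events.log",
                new File("/tmp/events.log"));

        // List objects under the same prefix, as a REST listing endpoint would.
        ObjectListing listing = s3.listObjects("example-analytics-bucket", "raw/2016/01/");
        for (S3ObjectSummary summary : listing.getObjectSummaries()) {
            System.out.println(summary.getKey() + " (" + summary.getSize() + " bytes)");
        }
    }
}
```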

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, AWS, Redshift, Java, Oracle 10g, MySQL, Flume, Sqoop, Apache ZooKeeper, Unix Shell Scripting, Cloudera Manager, Tableau, Jenkins, CentOS.

Confidential, Houston, TX

Hadoop Developer

Responsibilities:

  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Interacted with Business Analysts to understand the requirements and the impact of the ETL on the business.
  • Collaborated with business users on requirement gathering for building Tableau reports per business needs.
  • Installed, configured, and maintained Apache Hadoop clusters for application development, and Hadoop tools like Hive, Pig, HBase, ZooKeeper, and Sqoop for a POC.
  • Worked on configuring security for the Hadoop cluster (Kerberos, Active Directory).
  • Assisted in gathering and ingesting large amounts of data in multiple formats using Flume into HDFS through a multi-agent source-channel-sink combination.
  • Used Sqoop to extract data from multiple structured sources and to export data to other external RDBMS tables for querying and reporting.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Created custom reusable UDFs, UDAFs, UDTFs, and macros in Pig Latin/Hive and used them in various reporting queries.
  • Created Hive tables and was involved in data loading and writing Hive UDFs (a UDF sketch follows this list).
  • Exported the analyzed data to the relational database MySQL using Sqoop for visualization and to generate reports.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Used storage formats like Avro to quickly access data in complex queries.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Developed UI of Web Service using Spring MVC Framework .
  • Used Oozie workflow engine to run multiple Hive and Pig Jobs.
  • Responsible for creating a Solr schema from the indexer settings.
  • Wrote Solr queries for various search documents.
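
A minimal sketch of a reusable Hive UDF of the kind described above. The class name and the normalization it performs are hypothetical; it simply trims and upper-cases a string column before that column is used in reporting joins.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Hypothetical reusable Hive UDF: normalizes a code column before reporting joins. */
public final class NormalizeCode extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;        // pass NULLs through unchanged
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

A UDF like this is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called like a built-in inside the reporting queries.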

Environment: Apache Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Solr, Cloudera, Kerberos, Flume, HBase, ZooKeeper, Oracle, PL/SQL, NoSQL, and Unix/Linux.

Confidential, Columbus, Ohio

Java/Hadoop Developer

Responsibilities:

  • Developed web components using JSP , Servlets and JDBC .
  • Extensively worked with Java Script for front-end validations.
  • Involved in installing Hadoop ecosystem components.
  • Installed and configured Hadoop MapReduce and HDFS.
  • Developed multiple Java MapReduce programs for data analysis, data cleaning, and pre-processing (see the MapReduce sketch after this list).
  • Participated in building a test cluster for implementing Kerberos authentication.
  • Supported MapReduce programs running on the cluster.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Wrote MapReduce jobs using the Java API.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Involved in managing and reviewing Hadoop log files.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created Hive tables and worked on them using HiveQL.
  • Utilized Agile Scrum methodology to help manage and organize the team, with regular code review sessions.
  • Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
  • Used JUnit for unit testing and Continuum for integration testing.
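
A minimal sketch of a Java MapReduce program of the kind described above, combining simple data cleaning (skipping malformed rows) with aggregation (counting records per status code). The delimiter, field index, and job name are hypothetical placeholders.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Hypothetical cleaning/aggregation job: counts records per status code in delimited log lines. */
public class StatusCountJob {

    public static class StatusMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text status = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");   // placeholder delimiter
            if (fields.length > 3) {                           // skip malformed rows (data cleaning)
                status.set(fields[3]);                         // placeholder status field index
                context.write(status, ONE);
            }
        }
    }

    public static class StatusReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "status-count");
        job.setJarByClass(StatusCountJob.class);
        job.setMapperClass(StatusMapper.class);
        job.setReducerClass(StatusReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```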

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, MySQL, Java, JUnit

Confidential, Scambell, CA

JAVA/J2EE Developer

Responsibilities:

  • Involved in complete requirement analysis, design, coding and testing phases of the project.
  • Developed use case diagrams, class diagrams, database tables, and mapping between relational database tables.
  • Gathered and analyzed information for developing, supporting, and modifying existing web applications based on prioritized business needs.
  • Designed, Implemented, Tested and Deployed Enterprise Java Beans both Session and Entity using WebLogic as Application Server.
  • Designed and developed the UI using JSP, HTML, CSS, JavaScript, and AJAX.
  • Used EJBs to develop business logic and coded reusable components in Java Beans.
  • Developed database interaction code using the JDBC API, making extensive use of SQL (see the JDBC sketch after this list).
  • Wrote SQL, PL/SQL, stored procedures for implementing business rules and transformations.
  • Wrote query statements and advanced prepared statements; designed tables and indexes.
  • Used connection pooling through the JDBC interface for optimal performance.
  • Used EJB entity and session beans to implement business logic, session handling, and transactions; developed the user interface using JSP, Servlets, and JavaScript.
  • Wrote complex SQL queries and stored procedures.
  • Actively involved in the system testing.
  • Prepared the Installation, Customer guide and Configuration document which were delivered to the customer along with the product.
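
A minimal JDBC sketch of the database-interaction pattern described above, using a prepared statement with bound parameters. The connection URL, table, and columns are hypothetical; in the deployed application the connection would come from the container's connection pool rather than DriverManager.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OrderDao {
    // Placeholder connection URL; real settings lived in the server configuration.
    private static final String URL = "jdbc:oracle:thin:@//dbhost:1521/APPDB";

    public void printOpenOrders(long customerId, String user, String password) throws SQLException {
        String sql = "SELECT order_id, total FROM orders WHERE customer_id = ? AND status = ?";
        try (Connection conn = DriverManager.getConnection(URL, user, password);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, customerId);
            ps.setString(2, "OPEN");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("order_id") + " -> " + rs.getBigDecimal("total"));
                }
            }
        }
    }
}
```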

Environment: Java, JSP, Servlets, JDBC, EJB, JavaScript, SQL, PL/SQL, HTML, XHTML, CSS, AJAX, WebLogic.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development, and unit testing.
  • Involved in analysis and design of the application.
  • Involved in preparing the detailed design document for the project.
  • Developed the application using J2EE architecture.
  • Involved in developing JSP forms.
  • Designed and developed web pages using HTML and JSP.
  • Designed various applets using JBuilder.
  • Designed and developed Servlets to communicate between presentation and business layer.
  • Used EJB as a middleware in developing a three-tier distributed application.
  • Developed session beans and entity beans for business and data processing.
  • Used JMS in the project for sending and receiving messages on the queue (a JMS sketch follows this list).
  • Developed the Servlets for processing the data on the server.
  • Transferred the processed data to the database through entity beans.
  • Used JDBC for database connectivity with MySQL Server.
  • Wrote query statements and advanced prepared statements; designed tables and indexes.
  • Wrote complex SQL queries and stored procedures.
  • Used CVS for version control.
  • Involved in unit testing using JUnit.
  • Prepared the Installation, Customer guide and Configuration document which were delivered to the customer along with the product.
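
A minimal sketch of sending a message to a queue with JMS, as described above. The JNDI names and payload are hypothetical placeholders; the actual names depend on how the application server's connection factory and queue are configured.

```java
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class OrderMessageSender {
    public void send(String payload) throws Exception {
        // Placeholder JNDI names; real names come from the application server configuration.
        InitialContext ctx = new InitialContext();
        QueueConnectionFactory factory = (QueueConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/OrderQueue");

        QueueConnection connection = factory.createQueueConnection();
        try {
            QueueSession session = connection.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueSender sender = session.createSender(queue);
            TextMessage message = session.createTextMessage(payload);
            sender.send(message);   // the consuming bean on the queue picks this up
        } finally {
            connection.close();
        }
    }
}
```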

Environment: Core Java, J2EE, JSP, Servlets, XML, XSLT, EJB, JDBC, JBuilder 8.0, JBoss, Swing, JavaScript, JMS, HTML, CSS, MySQL Server, CVS, Windows 2000.
