- IT professional with around 6 years of experience and extensive knowledge across the Software Development Life Cycle: analysis, design, development, debugging and deployment of various software applications. More than 4 years of hands-on experience with Big Data and the Hadoop ecosystem in ingestion, storage, querying, processing and analysis using HDFS, MapReduce, Pig, Hive, Spark, Flume, Kafka, Oozie, etc. 2 years of work experience using Java/J2EE technologies.
- Good experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem, such as Hadoop 2.x, YARN, Hive, Pig, MapReduce, HBase, Flume, Sqoop, Spark, Storm, Kafka, Oozie and Zookeeper.
- Experience in installing, configuring, supporting and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
- Excellent programming skills at a high level of abstraction using Scala and Spark.
- Good understanding of real-time data processing using Spark.
- Experience importing and exporting data between HDFS and databases such as MySQL, Oracle and Teradata using Sqoop.
- Strong experience building real-time streaming applications and batch-style, large-scale distributed computing applications using tools like Spark Streaming, Kafka, Flume, MapReduce and Hive.
- Managed and scheduled batch jobs on Hadoop clusters using Oozie.
- Experience in managing and reviewing Hadoop Log files.
- Experience in setting up Zookeeper to provide coordination services to the cluster.
- Experienced using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Experience and understanding of Spark and Storm.
- Hands-on experience extracting data from log files and copying it into HDFS using Flume.
- Experience analyzing data using Hive, Pig Latin and custom MapReduce programs in Java.
- Hands-on experience in the Analysis, Design, Coding and Testing phases of the Software Development Life Cycle (SDLC).
- Experience with multiple databases and tools: SQL analytical functions, Oracle PL/SQL, SQL Server and DB2.
- Experience designing and coding ETL/Talend jobs that process data into target databases.
- Experience working with Amazon Web Services EC2 instances and S3 buckets.
- Worked with different file formats such as Avro, Parquet, RCFile and JSON.
- Wrote Python scripts to build a disaster-recovery process for data currently being processed in the data center, using its current static location.
- Hands-on experience with NoSQL databases such as MongoDB, HBase and Cassandra, and their integration with Hadoop clusters.
- Experience in ingesting data into Cassandra and consuming the ingested data from Cassandra to HDFS .
- Used Apache NiFi to load PDF documents from Microsoft SharePoint into HDFS.
- Used Avro serialization technique to serialize data for handling schema evolution.
- Experience designing and coding web applications using core Java and web technologies (JSP, Servlets and JDBC), with a full understanding of the J2EE technology stack, including Java frameworks such as Spring and ORM frameworks (Hibernate).
- Developed web applications with the open-source Spring framework, utilizing Spring MVC.
- Good interpersonal and communication skills, strong problem-solving skills; explore and adapt to new technologies with ease; a good team member.
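The MapReduce-style analysis mentioned above (map, shuffle, reduce) can be sketched locally in plain Python, with no Hadoop cluster required; the word-count job and its sample input here are illustrative only:

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) pairs, mirroring a Hadoop map task.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reducer(key, values):
    # Sum the counts for one key, mirroring a reduce task.
    return key, sum(values)

def word_count(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, v) for k, v in shuffle(mapped))

print(word_count(["big data big cluster", "data pipeline"]))
```

On a cluster the shuffle is done by the framework across nodes; this local version only shows the dataflow.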
Programming Languages: Java, C, Python, Shell Scripting
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Hue, Impala, Sqoop, Apache Spark, Apache Kafka, Apache Ignite, Apache NiFi, Oozie, Flume, Zookeeper, YARN
NoSQL Databases: MongoDB, HBase, Cassandra
Hadoop Distribution: Hortonworks, Cloudera, MapR
Databases: Oracle 10g, MySQL, MSSQL
IDE/Tools: Eclipse, NetBeans, Maven
Version control: Git, SVN, ClearCase
Platforms: Windows, Unix, Linux
BI Tools: Tableau, MS Excel
Web/Server Application: Apache Tomcat, WebLogic, WebSphere, MS SQL Server, Oracle Server
Confidential, Valley Forge, PA
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Worked on data serialization formats for converting complex objects into byte sequences using Avro, Parquet and CSV formats.
- Leveraged Hive queries to create ORC tables.
- Created Views from Hive Tables on top of data residing in Data Lake.
- Worked on setting up Kafka for streaming data and on monitoring the Kafka cluster.
- Involved in designing and creating data-ingestion pipelines using technologies such as Apache Storm and Kafka.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Explored Spark to improve performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Developed Kafka producer and consumers, Spark and Hadoop MapReduce jobs.
- Imported data from different sources like HDFS/HBase into Spark RDDs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Wrote complex SQL to pull data from the Teradata EDW and create Ad-Hoc reports for key business personnel within the organization.
- Used the Git version control system to access repositories and coordinated with CI tools.
- Integrated Maven with Git to manage and deploy project-related tags.
- Experience with AWS S3: creating buckets and configuring them with permissions, logging, versioning and tagging.
- Implemented a continuous integration and continuous deployment framework using Jenkins and Maven in a Linux environment.
- Used the Oozie workflow scheduler to manage Hadoop jobs with control flows.
- Exported the aggregated data to Oracle using Sqoop for reporting on the Tableau dashboard.
Environment: Hadoop, HDFS, Spark, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Git, Jenkins, AWS (S3), Python, Java, SQL scripting and Linux shell scripting, Hortonworks.
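As a rough illustration of the partitioned and bucketed Hive tables described in this role, the sketch below routes rows to partition directories and bucket files in plain Python; the `dt`/`user_id` schema is invented, and CRC32 merely stands in for Hive's own bucketing hash:

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # assumed CLUSTERED BY ... INTO 4 BUCKETS

def bucket_for(key, num_buckets=NUM_BUCKETS):
    # Hive assigns a row to bucket hash(key) mod num_buckets; CRC32 is a
    # deterministic stand-in, not Hive's actual hash function.
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def layout(rows):
    # Group rows into partition directories (by date) and bucket files
    # (by user_id hash), mirroring Hive's on-disk table layout.
    files = defaultdict(list)
    for row in rows:
        path = f"dt={row['dt']}/bucket_{bucket_for(row['user_id'])}"
        files[path].append(row)
    return dict(files)

rows = [
    {"dt": "2017-01-01", "user_id": "u1", "visits": 3},
    {"dt": "2017-01-01", "user_id": "u2", "visits": 1},
    {"dt": "2017-01-02", "user_id": "u1", "visits": 2},
]
for path, contents in sorted(layout(rows).items()):
    print(path, len(contents))
```

Partition pruning and bucketed joins work precisely because the same key always lands in the same directory and bucket.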
Confidential, Denver, CO
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters, following an agile methodology.
- Monitored multiple Hadoop cluster environments, tracking workload, job performance and capacity planning using Cloudera Manager.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Worked extensively with Sqoop for importing and exporting the data from HDFS to Relational Database systems/mainframe and vice-versa.
- Developed Oozie workflow for scheduling Pig and Hive Scripts.
- Configured the Hadoop Ecosystem components like YARN, Hive, Pig, HBase and Impala.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day and the visit duration.
- Involved in setting up the QA environment by implementing Pig and Sqoop scripts.
- Involved in creating the workflow to run multiple Hive and Pig jobs, which run independently with time and data availability.
- Developed Pig Latin scripts to do operations of sorting, joining and filtering source data.
- Wrote MapReduce programs on log data to transform it into a structured format and extract fields such as customer name and age group.
- Pro-actively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Executed test cases in an automation tool; performed system, regression and integration testing; reviewed results and logged defects.
- Participated in functional reviews, test specifications and documentation review.
- Documented system processes and procedures for future reference, and was responsible for managing data coming from different sources.
Environment: Cloudera Hadoop Distribution, HDFS, Talend, Map Reduce (JAVA), Impala, Pig, Sqoop, Flume, Hive, Oozie, HBase, Shell Scripting, Agile Methodologies.
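The web-log analysis in this role (unique visitors per day, computed in HiveQL) can be mimicked in a few lines of plain Python; the `day ip url` log-line format is an assumption for illustration:

```python
from collections import defaultdict

def unique_visitors_per_day(log_lines):
    # Local equivalent of HiveQL's
    # SELECT day, COUNT(DISTINCT ip) FROM weblogs GROUP BY day;
    # assumes whitespace-separated "day ip url" records.
    visitors = defaultdict(set)
    for line in log_lines:
        day, ip = line.split()[:2]
        visitors[day].add(ip)
    return {day: len(ips) for day, ips in visitors.items()}

logs = [
    "2016-05-01 10.0.0.1 /home",
    "2016-05-01 10.0.0.1 /cart",
    "2016-05-01 10.0.0.2 /home",
    "2016-05-02 10.0.0.3 /home",
]
print(unique_visitors_per_day(logs))
```

At cluster scale the distinct-count is distributed across reducers, but the grouping logic is the same.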
Confidential, Richardson, TX
- Worked on analyzing Hadoop cluster and different Big Data analytic tools including Pig, Hive and HBase database.
- Extracted the data from Oracle into HDFS using Sqoop to store and generate reports for visualization purpose.
- Collected and aggregated large amounts of web log data from different sources such as web servers and mobile devices using Apache Flume, and stored the data in HDFS for analysis.
- Built a data flow pipeline using Flume, Java (MapReduce) and Pig.
- Developed Hive scripts to analyze data, categorize mobile numbers into different segments, and offer promotions to customers based on their segment.
- Extensive experience in writing Pig scripts to transform raw data into baseline data.
- Created Hive tables, partitions and loaded the data to analyze using HiveQL queries.
- Worked on Oozie workflow engine for job scheduling.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Worked on analyzing and writing Hadoop MapReduce jobs using Java API, Pig and Hive.
- Responsible for building scalable distributed data solutions using Hadoop.
- Leveraged Solr API to search user interaction data for relevant matches.
- Designed the Solr schema and used the Solr client API for storing, indexing and querying the schema fields.
- Loaded data into HBase using bulk load and the HBase API.
- Validated application functionality, usability and compatibility during the functional, exploratory, regression, system integration, content and UAT testing phases.
Environment: Hortonworks Hadoop Distribution, MapReduce, HBase, Hive, Pig, Sqoop, Oozie, Flume, Solr, Shell script.
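A data-cleaning pass like the MapReduce preprocessing jobs in this role can be sketched in plain Python; the three-field `name, age, city` record layout is an assumed format, not the actual one used:

```python
import csv

EXPECTED_FIELDS = 3  # assumed layout: name, age, city

def clean(record_lines):
    # Mirrors a cleansing MapReduce job: drop malformed rows and
    # normalize whitespace/case before downstream Hive analysis.
    good, bad = [], []
    for row in csv.reader(record_lines):
        if len(row) != EXPECTED_FIELDS or not row[1].strip().isdigit():
            bad.append(row)
            continue
        good.append([field.strip().lower() for field in row])
    return good, bad

lines = ["Alice, 34, Dallas", "broken,row", "Bob, x, Austin", "Carol, 29, Plano"]
good, bad = clean(lines)
print(len(good), len(bad))
```

In a real job the rejected rows would typically go to a side output for inspection rather than being discarded.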
- Analyzed project requirements for this product and was involved in design using UML infrastructure.
- Interacted with system analysts and business users for design and requirement clarification.
- Handled the Java multithreading portions of back-end components.
- Developed HTML reports for various modules as per the requirement.
- Developed web services using SOAP, SOA, WSDL and Spring MVC, and developed DTDs and XSD schemas for XML (parsing, processing and design) to communicate with the Active Directory application using a RESTful API.
- Created multiple RESTful web services using the Jersey 2 framework.
- Used Aqua Logic BPM (Business Process Managements) for workflow management.
- Developed the application using the NoSQL database MongoDB for storing data on the server.
- Developed the complete business tier with stateful session beans and CMP entity beans using EJB 2.0.
- Developed integration services using SOA, Web Services, SOAP and WSDL.
- Designed, developed and maintained the data layer using the Hibernate ORM framework.
- Used the Spring framework's JMS support for writing to a JMS queue and HibernateDaoSupport for interfacing with the database, and integrated Spring with JSF.
- Responsible for managing the Sprint production test data with tools such as Telegence and CRM for tweaking test data during IAT/UAT testing.
- Wrote unit test cases using JUnit and was involved in integration testing.
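The RESTful services in this role were built with Jersey 2 in Java; purely as a language-neutral sketch of the same JSON-over-HTTP request/response pattern, here is a minimal resource served with Python's standard library (the `/users` resource and its data are invented):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

USERS = {"1": {"name": "alice"}}  # illustrative in-memory store

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # GET /users/<id> returns the user as JSON, 404 otherwise.
        user = USERS.get(self.path.rsplit("/", 1)[-1])
        body = json.dumps(user if user else {"error": "not found"}).encode()
        self.send_response(200 if user else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
resp = json.load(urlopen(f"http://127.0.0.1:{port}/users/1"))
print(resp)
server.shutdown()
```

A framework like Jersey adds routing annotations, content negotiation and marshalling on top of exactly this HTTP exchange.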