Hadoop Developer Resume
Richmond, VA
SUMMARY:
- 7 years of professional IT experience in developing, implementing, configuring, and testing Hadoop ecosystem components and in maintaining various web-based applications using Java/J2EE.
- 3+ years of real-time experience with the Hadoop framework and its ecosystem.
- Experience in installing, configuring, and managing Cloudera (CDH3 and CDH4) and Hortonworks Hadoop platforms and clusters.
- Worked in multi-cluster environments and set up the Cloudera Hadoop ecosystem.
- Excellent knowledge of Hadoop architecture and ecosystem concepts such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and Kafka.
- Experience working with Apache Hadoop components such as HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, Flume, ZooKeeper, and Ambari.
- Good knowledge of Spark's in-memory capabilities and its modules: Spark Streaming, Spark SQL, and Spark MLlib.
- Hands-on experience developing Spark applications in both Scala and Python; performed various actions and transformations on Spark RDDs and DataFrames.
- Good understanding of NoSQL databases, including HBase and MongoDB.
- Expertise in writing Hadoop jobs to process and analyze data using MapReduce, Hive, and Pig; experienced in extending Hive and Pig core functionality with custom UDFs written in Java.
- Hands-on experience with the YARN (MapReduce 2.0) architecture and its components.
- Hands-on experience with Core Java, UNIX shell scripting, and RDBMSs.
- Excellent Java development skills using J2EE, J2SE, Servlets, JUnit, JSP, JDBC.
- Very good hands-on technical knowledge of ETL tools (DataStage), SQL, and PL/SQL.
- Extensive experience with Teradata, including converting projects from Teradata to Hadoop.
- Experience implementing SOAP and REST web services.
- Hands-on experience with visualization tools such as Tableau and Arcadia Data.
- Well versed in Agile environments using JIRA and version control tools such as Git and SVN.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Zookeeper, YARN, TEZ, Flume, Spark, Kafka
Java&J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI and Java Beans
Databases: Teradata, Oracle 11g/10g, MySQL, DB2, SQL Server, NoSQL (HBase, MongoDB)
Web Technologies: JavaScript, AJAX, HTML, XML and CSS.
Programming Languages: Java, jQuery, Scala, Python, UNIX Shell Scripting
IDE: Eclipse, NetBeans, PyCharm
Integration & Security: MuleSoft, Oracle IDM & OAM, SAML, EDI, EAI
Web Services: SOAP, REST
Build Management Tools: Maven, Apache Ant
Predictive Modelling Tools: SAS Editor, SAS Enterprise Guide, SAS Miner, IBM Cognos.
Scheduling Tools: Crontab, AutoSys, Control-M
Visualization Tools: Tableau, Arcadia Data.
PROFESSIONAL EXPERIENCE:
Confidential, Richmond, VA
Hadoop Developer
Responsibilities:
- Imported retail and commercial data from various vendors into HDFS using the EDE process and Sqoop.
- Designed the Cascading flow setup from the edge node to HDFS (the data lake).
- Created Cascading code to perform several types of data transformations as required by the DA.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python (PySpark); see the first sketch after this list.
- Ran analytics workloads and long-running services on the Apache Mesos cluster manager.
- Developed Apache Spark applications in Scala and Java and implemented a Spark data processing project to handle data from various RDBMS and streaming sources.
- Used Hue to create external Hive tables on both the imported and the transformed data.
- Developed Cascading code to remove or replace erroneous fields in the data.
- Created custom functions for several data type conversions and for handling errors in the vendor-provided data.
- Monitored the Cascading flows using the Driven component to ensure the desired results were obtained.
- Optimized a Confidential tool, Docs, for importing data and converting it into the Parquet file format after validation.
- Tested Spark for exporting data from HDFS to an external database in a POC.
- Developed shell scripts to automate the Cascading jobs for the Control-M schedule.
- Tested AWS Redshift connectivity with a SQL database for testing and storing data in a POC.
- Developed Hive queries to analyze data by customer rating ID for several projects.
- Developed various Spark Streaming applications using Python (PySpark); see the second sketch after this list.
- Developed Spark code in PySpark, applying various transformations and actions for faster data processing.
- Working knowledge of the Spark Streaming API, which enables scalable, high-throughput, fault-tolerant processing of live data streams.
- Used Spark Streaming to bring data into memory, implemented RDD transformations, and performed actions.
- Converted raw files (CSV, TSV) to file formats such as Parquet and Avro, with data type conversion, using Cascading.
- Wrote test cases for the Cascading jobs using the Plunger framework.
- Loaded structured and semi-structured data into Spark using Spark SQL and the DataFrames API.
- Set up the Cascading environment and troubleshot environment issues related to Cascading.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
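First sketch (illustrative only; the table, column, and path names are assumptions, not project artifacts): converting a Hive/SQL aggregation into PySpark DataFrame transformations and writing the result as Parquet, as referenced above.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark-sketch")
             .enableHiveSupport()          # allows reading the external Hive tables
             .getOrCreate())

    # Roughly equivalent to:
    #   SELECT customer_rating_id, COUNT(*) AS txn_count, SUM(amount) AS total_amount
    #   FROM retail_transactions GROUP BY customer_rating_id
    txns = spark.table("retail_transactions")            # hypothetical external Hive table
    summary = (txns
               .filter(F.col("amount").isNotNull())      # drop records with error fields
               .groupBy("customer_rating_id")
               .agg(F.count("*").alias("txn_count"),
                    F.sum("amount").alias("total_amount")))

    # Persist the transformed data in the data lake as Parquet
    summary.write.mode("overwrite").parquet("/data/lake/retail/rating_summary")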
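Second sketch (illustrative only; the socket source and comma-separated record format are assumptions): a minimal PySpark Streaming job of the kind referenced above, counting keyed events per micro-batch.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="streaming-sketch")
    ssc = StreamingContext(sc, batchDuration=10)          # 10-second micro-batches

    lines = ssc.socketTextStream("localhost", 9999)       # assumed live data source
    counts = (lines.map(lambda line: line.split(","))
                   .filter(lambda fields: len(fields) >= 2)
                   .map(lambda fields: (fields[0], 1))    # key on the first field
                   .reduceByKey(lambda a, b: a + b))      # per-batch counts

    counts.pprint()                                       # output action for each batch

    ssc.start()
    ssc.awaitTermination()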
Environment: MapReduce, HDFS, Sqoop, Cascading, Linux, Shell, Hadoop, Spark, Hive, AWS Redshift, Hadoop Cluster
Confidential, Sunnyvale, CA
Hadoop Developer
Responsibilities:
- Involved in the end-to-end process of Hadoop cluster installation, configuration, and monitoring.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Participated in the requirement-gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users.
- Moved log files generated by various sources to HDFS through Flume for further processing.
- Created HBase tables to store data of varying formats coming from different applications.
- Developed Spark jobs in Scala on top of YARN (MRv2) for interactive and batch analysis.
- Improved the performance of existing algorithms in Hadoop using Scala and Spark, including SparkContext, Spark SQL, pair RDDs, and Spark on YARN; a brief sketch follows this list.
- Good understanding of the DAG execution cycle for the entire Spark application flow via the Spark application web UI.
- Transferred data from legacy tables to HDFS and HBase tables using Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Imported data from various sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Analyzed the data with Hive queries and Pig scripts to study the behavior of lab equipment.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked on Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
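Sketch (illustrative only; the HDFS paths and tab-separated log layout are assumptions): the pair-RDD style of tuning referenced above, using reduceByKey so values are combined map-side rather than shuffling every record as groupByKey would.

    from pyspark import SparkContext

    # Typically submitted to the cluster with: spark-submit --master yarn <script>
    sc = SparkContext(appName="equipment-log-aggregation-sketch")

    # Assumed log layout: timestamp <TAB> equipment_id <TAB> status_code
    logs = sc.textFile("hdfs:///flume/lab_equipment/*/*")
    pairs = (logs.map(lambda line: line.split("\t"))
                 .filter(lambda fields: len(fields) == 3)
                 .map(lambda fields: ((fields[1], fields[2]), 1)))

    counts = pairs.reduceByKey(lambda a, b: a + b).cache()   # combined map-side before the shuffle
    counts.saveAsTextFile("hdfs:///analytics/equipment_status_counts")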
Environment: Hadoop YARN architecture, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse, Linux, NoSQL.
Confidential, New York, NY
Java/Hadoop Developer
Responsibilities:
- Worked on a Hadoop cluster that ranged from 4 to 8 nodes during the pre-production stage and was at times extended to 24 nodes in production.
- Used Sqoop to import data from RDBMSs into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Built custom MapReduce programs to analyze data and used Pig Latin to clean out unwanted data.
- Applied various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Created Hive tables and applied HiveQL to them for data validation.
- Moved data from Hive tables into MongoDB collections; see the sketch after this list.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Participated in requirement gathering from experts and business partners and converted the requirements into technical specifications.
- Used ZooKeeper to manage coordination among the clusters.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Worked on the Cloudera platform to analyze data stored in HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Assisted application teams with installing Hadoop updates, operating system patches, and version upgrades when required.
- Assisted in cluster maintenance, monitoring, and troubleshooting, and managed and reviewed data backups and log files.
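Sketch (illustrative only; the file layout, field names, and collection name are assumptions): loading Hive query output into a MongoDB collection with PyMongo, as referenced above. It assumes the Hive results were exported to a local tab-separated file (for example via INSERT OVERWRITE LOCAL DIRECTORY with a tab field delimiter).

    import csv
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    collection = client["analytics"]["customer_summary"]      # assumed database and collection

    # Hive writes its output as delimited part files; 000000_0 is a typical file name
    with open("/tmp/hive_export/000000_0", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        docs = [{"customer_id": row[0], "segment": row[1], "total_spend": float(row[2])}
                for row in reader if len(row) == 3]

    if docs:
        collection.insert_many(docs)                           # one document per Hive row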
Environment: Hadoop, Pig, Hive, Sqoop, Cloudera Manager (CDH3), Flume, MapReduce, HDFS, JavaScript, WebSphere, HTML, AngularJS, Linux, Oozie, MongoDB.
Confidential, Raleigh, NC
Java Developer
Responsibilities:
- Worked with business users to determine requirements and technical solutions.
- Followed Agile methodology (Scrum Standups, Sprint Planning, Sprint Review, Sprint Showcase and Sprint Retrospective meetings).
- Developed business components using core Java concepts and classes such as inheritance, polymorphism, collections, serialization, and multithreading.
- Used the Spring framework to handle application logic and make calls to business objects, configuring them as Spring beans.
- Implemented and configured data sources and the session factory, and used HibernateTemplate to integrate Spring with Hibernate.
- Developed web services to allow communication between applications through SOAP over HTTP with JMS and Mule ESB.
- Actively involved in coding using Core Java and the Collections API, including Lists, Sets, and Maps.
- Developed a web service (SOAP, WSDL) shared between the front end and the cable bill review system.
- Implemented a REST-based web service using JAX-RS annotations and the Jersey implementation for data retrieval with JSON.
- Developed Maven scripts to build and deploy the application onto WebLogic Application Server, ran UNIX shell scripts, and implemented an automated deployment process.
- Used Maven as the build tool, with builds scheduled and triggered by Jenkins.
- Developed JUnit test cases for application unit testing.
- Implemented Hibernate for data persistence and management.
- Used the SoapUI tool to test web service connectivity.
- Used SVN as version control to check in the code, Created branches and tagged the code in SVN.
- Used RESTful services to interact with the client by providing RESTful URL mappings.
- Used the Log4j framework for application logging, tracking, and debugging.
Environment: JDK 1.6, Eclipse IDE, Core Java, J2EE, Spring, Hibernate, Unix, Web Services, SoapUI, Maven, WebLogic Application Server, SQL Developer, Camel, JUnit, SVN, Agile, SONAR, Log4j, REST, JSON, JBPM.