Hadoop Developer Resume
Richmond, VA
SUMMARY:
- 7 years of professional IT experience in developing, implementing, configuring, and testing Hadoop ecosystem components and in maintaining various web-based applications using Java/J2EE.
- 3+ years of real-time experience with the Hadoop framework and its ecosystem.
- Experience in installing, configuring, and managing Cloudera (CDH3 and CDH4) and Hortonworks Hadoop platforms and clusters.
- Worked in multi-cluster environments and set up the Cloudera Hadoop ecosystem.
- Excellent knowledge of Hadoop architecture and ecosystem concepts such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and Kafka.
- Experience working with Apache Hadoop components such as HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, Flume, ZooKeeper, and Ambari.
- Good knowledge of Spark's in-memory capabilities and its modules: Spark Streaming, Spark SQL, and Spark MLlib.
- Hands-on experience developing Spark applications in both Scala and Python; performed various actions and transformations on Spark RDDs and DataFrames.
- Good understanding of NoSQL databases, including HBase and MongoDB.
- Expertise in writing Hadoop jobs to process and analyze data using MapReduce, Hive, and Pig; experienced in extending Hive and Pig core functionality with custom UDFs written in Java.
- Hands-on experience with the YARN (MapReduce 2.0) architecture and its components.
- Hands-on experience with Core Java, UNIX shell scripting, and RDBMSs.
- Excellent Java development skills using J2EE, J2SE, Servlets, JUnit, JSP, JDBC.
- Very good hands-on technical knowledge of ETL tools (DataStage), SQL, and PL/SQL.
- Extensive experience with Teradata, including converting projects from Teradata to Hadoop.
- Experience implementing SOAP and REST web services.
- Hands-on experience with visualization tools such as Tableau and Arcadia Data.
- Well versed in Agile environments using JIRA and version control tools such as Git and SVN.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Zookeeper, YARN, TEZ, Flume, Spark, Kafka
Java&J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI and Java Beans
Databases: Teradata, Oracle 11g/10g, MySQL, DB2, SQL Server, NoSQL (HBase, MongoDB)
Web Technologies: JavaScript, AJAX, HTML, XML and CSS.
Programming Languages: Java, jQuery, Scala, Python, UNIX Shell Scripting
IDE: Eclipse, NetBeans, PyCharm
Integration & Security: MuleSoft, Oracle IDM & OAM, SAML, EDI, EAI
Web Services: SOAP, REST
Build Management Tools: Maven, Apache Ant
Predictive Modelling Tools: SAS Editor, SAS Enterprise Guide, SAS Miner, IBM Cognos.
Scheduling Tools: Crontab, AutoSys, Control-M
Visualization Tools: Tableau, Arcadia Data.
PROFESSIONAL EXPERIENCE:
Confidential, Richmond, VA
Hadoop Developer
Responsibilities:
- Imported retail and commercial data from various vendors into HDFS using the EDE process and Sqoop.
- Designed the Cascading flow setup from the edge node to HDFS (the data lake).
- Created Cascading code to perform several types of data transformations as required by the DA.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python (PySpark); see the first sketch after this list.
- Ran analytics workloads and long-running services on the Apache Mesos cluster manager.
- Developed Apache Spark applications in Scala and Java and implemented a Spark data processing project to handle data from various RDBMS and streaming sources.
- Used Hue to create external Hive tables on both the imported and the transformed data.
- Developed Cascading code to remove or replace erroneous fields in the data.
- Created custom functions for several data type conversions and for handling errors in the vendor-provided data.
- Monitored the Cascading flows using the Driven component to ensure the desired results were obtained.
- Optimized a Confidential tool, Docs, for importing data and converting it into the Parquet file format after validation.
- Tested Spark for exporting data from HDFS to an external database in a POC.
- Developed shell scripts to automate the Cascading jobs for the Control-M schedule.
- Tested AWS Redshift connectivity with a SQL database for testing and storing data in a POC.
- Developed Hive queries to analyze data by customer rating ID for several projects.
- Developed various Spark Streaming applications using Python (PySpark); see the second sketch after this list.
- Developed Spark code in PySpark, applying various transformations and actions for faster data processing.
- Working knowledge of the Spark Streaming API, which enables scalable, high-throughput, fault-tolerant processing of live data streams.
- Used Spark Streaming to bring data into memory, implemented RDD transformations, and performed actions.
- Converted raw files (CSV, TSV) to file formats such as Parquet and Avro, with data type conversion, using Cascading.
- Wrote test cases for the Cascading jobs using the Plunger framework.
- Loaded structured and semi-structured data into Spark using Spark SQL and the DataFrames API.
- Set up the Cascading environment and troubleshot environment issues related to Cascading.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
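First sketch (illustrative only; the table, column, and path names are assumptions, not project artifacts): converting a Hive/SQL aggregation into PySpark DataFrame transformations and writing the result as Parquet, as referenced above.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark-sketch")
             .enableHiveSupport()          # allows reading the external Hive tables
             .getOrCreate())

    # Roughly equivalent to:
    #   SELECT customer_rating_id, COUNT(*) AS txn_count, SUM(amount) AS total_amount
    #   FROM retail_transactions GROUP BY customer_rating_id
    txns = spark.table("retail_transactions")            # hypothetical external Hive table
    summary = (txns
               .filter(F.col("amount").isNotNull())      # drop records with error fields
               .groupBy("customer_rating_id")
               .agg(F.count("*").alias("txn_count"),
                    F.sum("amount").alias("total_amount")))

    # Persist the transformed data in the data lake as Parquet
    summary.write.mode("overwrite").parquet("/data/lake/retail/rating_summary")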
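Second sketch (illustrative only; the socket source and comma-separated record format are assumptions): a minimal PySpark Streaming job of the kind referenced above, counting keyed events per micro-batch.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="streaming-sketch")
    ssc = StreamingContext(sc, batchDuration=10)          # 10-second micro-batches

    lines = ssc.socketTextStream("localhost", 9999)       # assumed live data source
    counts = (lines.map(lambda line: line.split(","))
                   .filter(lambda fields: len(fields) >= 2)
                   .map(lambda fields: (fields[0], 1))    # key on the first field
                   .reduceByKey(lambda a, b: a + b))      # per-batch counts

    counts.pprint()                                       # output action for each batch

    ssc.start()
    ssc.awaitTermination()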
Environment: MapReduce, HDFS, Sqoop, Cascading, Linux, Shell, Hadoop, Spark, Hive, AWS Redshift, Hadoop Cluster
Confidential, Sunnyvale, CA
Hadoop Developer
Responsibilities:
- Involved in the end-to-end process of Hadoop cluster installation, configuration, and monitoring.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Participated in the requirement-gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users.
- Moved log files generated by various sources to HDFS through Flume for further processing.
- Created HBase tables to store data of varying formats coming from different applications.
- Developed Spark jobs in Scala on top of YARN (MRv2) for interactive and batch analysis.
- Improved the performance of existing algorithms in Hadoop using Scala and Spark, including SparkContext, Spark SQL, pair RDDs, and Spark on YARN; a brief sketch follows this list.
- Good understanding of the DAG execution cycle for the entire Spark application flow via the Spark application web UI.
- Transferred data from legacy tables to HDFS and HBase tables using Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Imported data from various sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Analyzed the data with Hive queries and Pig scripts to study the behavior of lab equipment.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked on Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team.
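Sketch (illustrative only; the HDFS paths and tab-separated log layout are assumptions): the pair-RDD style of tuning referenced above, using reduceByKey so values are combined map-side rather than shuffling every record as groupByKey would.

    from pyspark import SparkContext

    # Typically submitted to the cluster with: spark-submit --master yarn <script>
    sc = SparkContext(appName="equipment-log-aggregation-sketch")

    # Assumed log layout: timestamp <TAB> equipment_id <TAB> status_code
    logs = sc.textFile("hdfs:///flume/lab_equipment/*/*")
    pairs = (logs.map(lambda line: line.split("\t"))
                 .filter(lambda fields: len(fields) == 3)
                 .map(lambda fields: ((fields[1], fields[2]), 1)))

    counts = pairs.reduceByKey(lambda a, b: a + b).cache()   # combined map-side before the shuffle
    counts.saveAsTextFile("hdfs:///analytics/equipment_status_counts")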
Environment: Hadoop YARN architecture, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse, Linux, NoSQL.
Confidential, New York, NY
Java/Hadoop Developer
Responsibilities:
- Worked on a Hadoop cluster that ranged from 4 to 8 nodes during the pre-production stage and was at times extended to 24 nodes in production.
- Used Sqoop to import data from RDBMSs into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Built custom MapReduce programs to analyze data and used Pig Latin to clean out unwanted data.
- Applied various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Created Hive tables and applied HiveQL to them for data validation.
- Moved data from Hive tables into MongoDB collections; see the sketch after this list.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Participated in requirement gathering from experts and business partners and converted the requirements into technical specifications.
- Used ZooKeeper to manage coordination among the clusters.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Worked on the Cloudera platform to analyze data stored in HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Assisted application teams with installing Hadoop updates, operating system patches, and version upgrades when required.
- Assisted in cluster maintenance, monitoring, and troubleshooting, and managed and reviewed data backups and log files.
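Sketch (illustrative only; the file layout, field names, and collection name are assumptions): loading Hive query output into a MongoDB collection with PyMongo, as referenced above. It assumes the Hive results were exported to a local tab-separated file (for example via INSERT OVERWRITE LOCAL DIRECTORY with a tab field delimiter).

    import csv
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    collection = client["analytics"]["customer_summary"]      # assumed database and collection

    # Hive writes its output as delimited part files; 000000_0 is a typical file name
    with open("/tmp/hive_export/000000_0", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        docs = [{"customer_id": row[0], "segment": row[1], "total_spend": float(row[2])}
                for row in reader if len(row) == 3]

    if docs:
        collection.insert_many(docs)                           # one document per Hive row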
Environment: Hadoop, Pig, Hive, Sqoop, Cloudera Manager (CDH3), Flume, MapReduce, HDFS, JavaScript, WebSphere, HTML, AngularJS, Linux, Oozie, MongoDB.
Confidential, Raleigh, NC
Java Developer
Responsibilities:
- Worked with business users to determine requirements and technical solutions.
- Followed Agile methodology (Scrum Standups, Sprint Planning, Sprint Review, Sprint Showcase and Sprint Retrospective meetings).
- Developed business components using core Java concepts and classes such as inheritance, polymorphism, collections, serialization, and multithreading.
- Used the Spring framework to handle application logic and make calls to business objects, configuring them as Spring beans.
- Implemented and configured data sources and the session factory, and used HibernateTemplate to integrate Spring with Hibernate.
- Developed web services to allow communication between applications through SOAP over HTTP with JMS and Mule ESB.
- Actively involved in coding using Core Java and the Collections API, including Lists, Sets, and Maps.
- Developed a web service (SOAP, WSDL) shared between the front end and the cable bill review system.
- Implemented a REST-based web service using JAX-RS annotations and the Jersey implementation for data retrieval with JSON.
- Developed Maven scripts to build and deploy the application onto WebLogic Application Server, ran UNIX shell scripts, and implemented an automated deployment process.
- Used Maven as the build tool, with builds scheduled and triggered by Jenkins.
- Developed JUnit test cases for application unit testing.
- Implemented Hibernate for data persistence and management.
- Used the SoapUI tool to test web service connectivity.
- Used SVN as version control to check in the code, Created branches and tagged the code in SVN.
- Used RESTful services to interact with the client by providing RESTful URL mappings.
- Used the Log4j framework for application logging, tracking, and debugging.
Environment: JDK 1.6, Eclipse IDE, Core Java, J2EE, Spring, Hibernate, Unix, Web Services, SoapUI, Maven, WebLogic Application Server, SQL Developer, Camel, JUnit, SVN, Agile, SONAR, Log4j, REST, JSON, JBPM.