Hadoop Developer Resume

NY

SUMMARY

  • 6+ years of IT experience, including hands-on experience in Big Data/Hadoop development and strong object-oriented programming skills.
  • Good knowledge of Hadoop development and its components, such as HDFS, JobTracker, TaskTracker, DataNode, NameNode, and MapReduce concepts.
  • Experience in all the phases of Data warehouse life cycle involving Requirement Analysis, Design, Coding, Testing, and Deployment.
  • Experience in developing MapReduce Programs using Apache Hadoop for analyzing the big data as per the requirement.
  • Experienced with major Hadoop ecosystem projects such as Pig, Hive, and HBase.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Good knowledge of job scheduling and coordination tools such as Oozie and ZooKeeper.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera.
  • Experience installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH3 and CDH4 clusters.
  • Experience in tuning and troubleshooting performance issues in Hadoop clusters.
  • Monitored multiple Hadoop cluster environments using Nagios and Ganglia.
  • Experience with Ambari and the Hortonworks HDP 2.1 and 2.2 distributions.
  • Experience in importing and exporting data using Sqoop from HDFS file system to Relational Database Systems and vice-versa.
  • Experience in providing security for Hadoop Cluster with Kerberos.
  • Experience in extracting data from RDBMS into HDFS using Sqoop.
  • Experience in collecting logs from a log collector into HDFS using Flume.
  • Good understanding of NoSQL databases such as Cassandra, HBase and MongoDB.
  • Experience in analyzing data in HDFS through MapReduce, Hive and Pig.
  • Imported and exported data using Sqoop from HDFS to RDBMS.
  • Experienced in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
  • Knowledge of job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
  • Understanding of Hadoop, the MapReduce paradigm, and Spark.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Knowledge in Full Life Cycle development of Data Warehousing.
  • Good understanding of Data Mining and Machine Learning techniques.
  • Experience in creating custom Lucene/Solr Query components.
  • Migrated Informatica mappings, sessions, and workflows from Dev and QA to Prod environments.
  • Experience creating and upgrading ETL and Relational Database Frameworks.
  • Worked on the data warehouse product Amazon Redshift, which is part of AWS (Amazon Web Services).
  • Experience in developing Pig scripts and Hive Query Language.
  • Experience in managing and reviewing Hadoop Log files.
  • Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources.
  • Evaluated ETL tools and recommended the most suitable solutions based on business needs.
  • Have worked on data visualization using Tableau.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
  • Created UNIX shell scripts to run Informatica workflows and control the ETL flow.
  • Strong with relational database design concepts.
  • Well experienced in, and possessing strong knowledge of, Unix shell scripting.
  • Hands on experience in application development using Java, RDBMS and Linux Shell Scripting.
  • Detailed understanding of Software Development Life Cycle (SDLC) and knowledge of project management methodologies including Agile.
  • Strong experience working with databases such as Oracle, DB2, SQL Server 2008, and MySQL, and proficiency in writing complex SQL queries.

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Oozie, ZooKeeper, Cassandra, Spark, Storm, Kafka

Java Technologies: Java, J2EE, JSTL, JDBC 3.0/2.1, JSP 1.2/1.1, Java Servlets, JMS, JUnit, Log4j

IDE Development Tools: Eclipse 3.5, NetBeans, Oracle JDeveloper 10.1.3

Search Engine: Elasticsearch 2.3, Solr

Frameworks: MVC, Struts, Hibernate, Spring

Visualization Tools: Tableau, Talend

Programming languages: C, C++, Java, Python, Ruby, Linux shell scripts

Databases: Oracle 9i/10g, MySQL, DB2, MS-SQL Server

Web Servers/Cloud: IIS 7.0, AWS (EC2, S3, RDS, ELB)

Web Technologies: HTML, XML, AngularJS, JavaScript, AJAX, SOAP, WSDL

ETL Tools: Informatica, Pentaho, SSRS, SSIS, Cognos

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential, NY

Responsibilities:

  • Developed a data pipeline using Kafka, Storm, Pig, and Java MapReduce to ingest customer behavioural data and financial histories into HDFS for analysis.
  • Involved in writing MapReduce jobs.
  • Ingested data using Sqoop and the HDFS put and copyFromLocal commands.
  • Worked extensively with Apache Sqoop and developed Sqoop scripts to transfer data from a MySQL database into the Hadoop Distributed File System (HDFS). Utilized the parallel processing of the Hadoop framework to ensure resource efficiency.
  • Successfully automated the Flume workflow using the Oozie scheduler.
  • Reworked and structured the data obtained, using Pig and Hive in the Hadoop framework, to draw substantial conclusions for management assessment and use.
  • Used Pig for transformations, event joins, traffic filtering, and some pre-aggregations before storing the data in HDFS.
  • Experience in managing and reviewing Hadoop Log files.
  • Worked on Tableau for generating reports on HDFS data.
  • Worked on NoSQL databases like Cassandra, CouchDB and MongoDB
  • Created managed tables and external tables in Hive and loaded data from HDFS.
  • Optimized Hive tables using partitioning and bucketing to improve HiveQL query performance.
  • Created partitioned tables and loaded data using both static and dynamic partitioning.
  • Developed a pipeline using Kafka, Storm, and the Java API to maintain a message flow across multiple producers and multiple subscribers.
  • Created a Pub-Sub message queue for MSC Direct.
  • Configured WebHCat to allow the presentation dashboard to query Hive-indexed data.
  • Coordinated with the BI team to visualize the transformed data into a dashboard using Tableau.
  • Assisted in creating and maintaining technical documentation for launching Hadoop clusters and executing Hive queries and Pig scripts.
  • Implemented new search techniques for MSC associates to pull records from Elasticsearch.
  • Worked on data flows with different file formats to ingest data into Elasticsearch for fast search.
  • Created a new Elasticsearch environment parallel to Endeca for the new workflow.
  • Introduced Elasticsearch as a new search engine at Confidential.
  • Converted large .csv files into .txt files using Java.
  • Converted large Excel files into .txt files using Java.
  • Built a proof of concept (POC) on Elasticsearch with the Java API to set up a workflow between different clients and Elasticsearch.

Environment: HDFS, Kafka, Pig, Hive, Storm, MapReduce, Cassandra, MongoDB, Sqoop, CouchDB, Oozie, Elasticsearch, Kibana, Big Data, Java APIs, Java collection, Python, SQL, NoSQL, HBase.

Hadoop Developer

Confidential, Jersey City, NJ

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytics tools, including Kafka, Pig, Hive, and MapReduce.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Streamed data in real time using Spark with Kafka.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Implemented test scripts to support test driven development and continuous integration.
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data to and from HDFS using Sqoop and Kafka.
  • Developed Hadoop streaming MapReduce jobs using Python.
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
  • Expertise with NoSQL databases such as HBase, Cassandra, DynamoDB (AWS), and MongoDB.
  • Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required in the environment.
  • Expertise in AWS data migration between different database platforms, such as SQL Server to Amazon Aurora, using the RDS tool.
  • Supported MapReduce programs running on the cluster.
  • Gained experience in managing and reviewing Hadoop log files.
  • Scheduled the Oozie workflow engine to run multiple Pig jobs.
  • Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows.
  • Involved in defining job flows.
  • Experienced in monitoring the logs.
  • Experienced in running Hadoop streaming jobs to process terabytes of data in XML format.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources.
  • Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
  • Developed enterprise Lucene/Solr-based solutions, including custom type/object modeling and implementation in the Lucene/Solr analysis (tokenizers/filters) pipeline.
  • Created custom Solr query components to enable optimum search matching.
  • Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Administered Tableau Server, backing up reports and granting privileges to users.
  • Presented the retrieved results through Tableau.
  • Used Eclipse and Ant to build the application.
  • Used the NoSQL databases Cassandra and MongoDB.
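
For illustration, a Hadoop streaming job in Python of the kind listed above might look roughly like the following. This is a minimal sketch, not the code used on the project: the word-count logic and the names map_lines and reduce_pairs are assumptions chosen for the example.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'token<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for token in line.split():
            yield f"{token.lower()}\t1"


def reduce_pairs(lines):
    """Reducer: sum the counts for each key.

    Input must already be sorted by key, which Hadoop streaming
    does between the map and reduce phases.
    """
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    # Hypothetical invocation: pass "map" when the script is used as the
    # streaming mapper and "reduce" when it is used as the reducer.
    step = map_lines if sys.argv[1] == "map" else reduce_pairs
    for record in step(sys.stdin):
        print(record)
```

Keeping the mapper and reducer as plain generator functions makes them easy to check locally with a sorted list before submitting the job to a cluster.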

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Spark, AWS EC2, S3, RDS, Kafka, Solr, Linux, Cloudera, Big Data, Java APIs, Java collection, Python, SQL, NoSQL, Cassandra, Tableau, HBase.

Hadoop Developer

Confidential

Responsibilities:

  • Involved in different phases of big data projects, such as data acquisition, data processing, and data serving using dashboards.
  • Imported and exported data between an Oracle database and HDFS using Sqoop, Hue, and JDBC.
  • Gathered data from different sources, such as the Internet, sensors, and user behavior, and moved it to HDFS using optimized join-based MapReduce programs.
  • Implemented custom input formats that handle input files received from Java applications for processing in MapReduce.
  • Implemented joins and data aggregation using Apache Crunch.
  • Wrote MapReduce pipeline programs for testing using Apache Crunch.
  • Developed unit tests for Hadoop MapReduce jobs using MRUnit.
  • Divided each data set into corresponding categories by following the MapReduce binning design pattern.
  • Implemented filter mappers to eliminate unnecessary records.
  • Experience in using Pig as an ETL tool for event joins, filters, transformations, and pre-aggregations.
  • Created partitions and buckets by state in Hive to handle structured data.
  • Implemented dashboards that handle HiveQL queries internally, such as aggregation functions, basic Hive operations, and different kinds of join operations.
  • Implemented business logic based on state in Hive using generic UDFs. Used HBase-Hive integration.
  • Integrating bulk data into Cassandra file system using MapReduce programs.
  • Involved in creating data-models for customer data using Cassandra Query Language.
  • Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
  • Created production jobs using Oozie workflows that integrated different actions such as MapReduce, Sqoop, and Hive.
  • Experience in managing and reviewing Hadoop Log files.
  • Experienced with monitoring the cluster using Cloudera Manager.
  • Worked on building BI reports in Tableau with Spark using Shark and Spark SQL.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Implemented data ingestion and handling clusters in real time processing using Kafka.
  • Experience with Core Distributed computing and Data Mining Library using Apache Spark.
  • Experienced in configuring Maven builds that integrated dependencies, Checkstyle, and test coverage.
  • Responsible for analyzing multi-platform applications using Python.
  • Developed MapReduce jobs in Python for data cleaning and data processing.
  • Designed test plans and test cases and performed system testing.
  • Involved in daily Scrum meetings to discuss the development and progress of sprints, and was active in making Scrum meetings more productive.
  • Experience in integrating RHadoop for categorization and statistical analysis to generate reports.
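
The binning and filtering bullets above can be illustrated with a small in-memory Python sketch. On Hadoop, binning is typically a map-only job that writes each record to a per-category output; here a dictionary stands in for those outputs. The field names (state, amount) and the filter rule are hypothetical and not taken from the project.

```python
from collections import defaultdict


def keep_record(record):
    """Filter step: drop records with no state or a non-positive amount."""
    return bool(record.get("state")) and record.get("amount", 0) > 0


def bin_records(records):
    """Binning step: route each surviving record to the bin named by its state."""
    bins = defaultdict(list)
    for record in records:
        if keep_record(record):
            bins[record["state"]].append(record)
    return dict(bins)


# Hypothetical sample data, for illustration only.
sample = [
    {"state": "NY", "amount": 10},
    {"state": "NJ", "amount": 5},
    {"state": "", "amount": 7},    # filtered out: no state
    {"state": "NY", "amount": 0},  # filtered out: non-positive amount
]
binned = bin_records(sample)
```

Filtering before binning keeps unwanted records out of every downstream category, which is the same reason a filter mapper usually runs first in a pipeline.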

Environment: Big Data, Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Crunch, Scala, Spark, Storm, Kafka, Cassandra, Linux, Python, RHadoop, Oracle 10g, Cloudera Manager

Java Developer

Confidential

Responsibilities:

  • Involved in the project from requirements gathering through various stages, including design and testing, until production.
  • Involved in designing Application based on MVC architecture.
  • Implemented the Spring MVC framework, which included writing controller classes for handling requests and processing form submissions, and performed validations using Commons Validator.
  • Knowledge of Spring Batch, which provides functions for processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management.
  • Implemented various design patterns in the project such as Business Delegate, Data Transfer Object, Service Locator, Data Access Object and Singleton.
  • Developed web services for web store components using JAXB, and was involved in generating stubs and JAXB data model classes based on annotations.
  • Developed components of web services end to end, using different standards with clear understanding on SOAP using various message patterns.
  • Developed XML configuration and data descriptions using Hibernate. Used the Hibernate Transaction Manager to maintain transaction persistence.
  • Used DAO pattern to retrieve the data from database.
  • Designed and developed web-based applications using HTML5, CSS, JavaScript (jQuery), AJAX, and JSP.
  • Set up the build environment by writing the Maven build file (pom.xml), then built, configured, and deployed the application on all servers.
  • Extensively worked on JavaScript (jQuery) for client side validation and various GUI elements.
  • Used the jQuery library to create dynamic web pages and web applications, leveraging its advanced cross-browser functionality.
  • Installed, configured, and administered the WebLogic environment, including deployment of Servlets.
  • Designed and developed middleware components using WebLogic Application Server, including a persistence registration object, request entry handling (controller) object, concurrency object, and transaction object.
  • Implemented all the business logic in the middle tier using Java classes and JavaBeans.

Environment: JDK 1.6, J2EE, Eclipse, Servlets, JSP, Spring, HTML, JavaScript Prototypes, XML, jQuery, AJAX, Oracle, WebLogic Application Server, Maven, JDBC, Hibernate.

Java Developer

Confidential

Responsibilities:

  • Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC) using an Agile development methodology.
  • Involved in business requirement gathering and technical specifications.
  • Implemented J2EE standards, MVC2 architecture using Struts Framework.
  • Implemented Servlets, JSP, and AJAX to design the user interface.
  • Used JSP, JavaScript, HTML5, and CSS to manipulate, validate, and customize error messages in the user interface.
  • Built presentation components in JSP pages using ICEfaces tag libraries.
  • Used ICEfaces libraries in all presentation pages, such as Search/Inquiry and data collection pages.
  • Used JBoss for EJB and JTA, and for caching and clustering purposes.
  • Used EJBs (session beans) to implement the business logic, JMS to send updates to various other applications, and MDBs to route priority requests.
  • Developed the GUI using JSP, AJAX, JavaScript, and the Spring Framework.
  • Involved in the Development of Spring Framework Controllers.
  • Worked with field-level engineers and teams to make the product more user-friendly.
  • Wrote Web Services using SOAP for sending and getting data from the external interface.
  • Used XSL/XSLT for transforming and displaying reports. Developed schemas for XML.
  • Involved in writing Ant scripts to build and deploy the application.
  • Developed web-based reporting for the monitoring system with HTML and Tiles using the Struts framework.

Environment: Java multithreading, collections, J2EE, EJB, UML, SQL, PHP, Sybase, Eclipse, JavaScript, WebSphere, JBoss, HTML5, DHTML, CSS, XML, Log4j, Ant, Struts 1.3.8, JUnit, JSP, Servlets, Rational Rose, Hibernate.
