Hadoop Developer Resume
NY
SUMMARY
- Over 6 years of IT experience, including hands-on experience in Big Data/Hadoop development and strong object-oriented programming skills.
- Good knowledge of Hadoop development and its components, such as HDFS, JobTracker, TaskTracker, DataNode, NameNode, and MapReduce concepts.
- Experience in all the phases of the data warehouse life cycle, involving requirement analysis, design, coding, testing, and deployment.
- Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per the requirements.
- Experienced with major Hadoop ecosystem projects such as Pig, Hive, and HBase.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Good knowledge of job scheduling and monitoring tools like Oozie and ZooKeeper.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera.
- Experience in installation, configuration, supporting and managing Cloudera Hadoop platform along with CDH3, CDH4 clusters.
- Experience in tuning and troubleshooting performance issues in Hadoop cluster.
- Monitored multiple Hadoop cluster environments using Nagios and Ganglia.
- Experience with Ambari and Hortonworks HDP 2.1 and 2.2 distributions.
- Experience in importing and exporting data using Sqoop from HDFS file system to Relational Database Systems and vice-versa.
- Experience in providing security for Hadoop Cluster with Kerberos.
- Experience in extracting data from RDBMS into HDFS using Sqoop.
- Experience in collecting logs from a log collector into HDFS using Flume.
- Good understanding of NoSQL databases such as Cassandra, HBase and MongoDB.
- Experience in analyzing data in HDFS through MapReduce, Hive and Pig.
- Imported and exported data using Sqoop from HDFS to RDBMS.
- Experienced in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Understanding of Hadoop, the MapReduce paradigm, and Spark.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Knowledge in Full Life Cycle development of Data Warehousing.
- Good understanding of Data Mining and Machine Learning techniques.
- Experience in creating custom Lucene/Solr Query components.
- Migration of Informatica Mappings/Sessions/Workflows from Dev, QA to Prod environments.
- Experience creating and upgrading ETL and Relational Database Frameworks.
- Worked on the data warehouse product Amazon Redshift, which is part of AWS (Amazon Web Services).
- Experience in developing Pig scripts and Hive Query Language.
- Experience in managing and reviewing Hadoop Log files.
- Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources.
- Evaluated ETL tools and recommended the most suitable solutions based on business needs.
- Have worked on data visualization using Tableau.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Created UNIX shell scripts to run the Informatica workflows and control the ETL flow.
- Strong with relational database design concepts.
- Well experienced, with strong knowledge of Unix shell scripting.
- Hands on experience in application development using Java, RDBMS and Linux Shell Scripting.
- Detailed understanding of the Software Development Life Cycle (SDLC) and knowledge of project management methodologies including Agile.
- Strong experience in working with databases like Oracle, DB2, SQL Server 2008, and MySQL, and proficiency in writing complex SQL queries.
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, MongoDB, Oozie, ZooKeeper, Cassandra, Spark, Storm, Kafka
Java Technologies: Java, J2EE, JSTL, JDBC 3.0/2.1, JSP 1.2/1.1, Java Servlets, JMS, JUnit, Log4j
IDE Development Tools: Eclipse 3.5, NetBeans, Oracle JDeveloper 10.1.3
Search Engine: Elasticsearch 2.3, Solr
Frameworks: MVC, Struts, Hibernate, Spring
Visualization Tools: Tableau, Talend
Programming languages: C, C++, Java, Python, Ruby, Linux shell scripts
Databases: Oracle 9i/10g, MySQL, DB2, MS-SQL Server
Web Servers: IIS 7.0, AWS, EC2, S3, RDS, ELB
Web Technologies: HTML, XML, AngularJS, JavaScript, AJAX, SOAP, WSDL
ETL Tools: Informatica, Pentaho, SSRS, SSIS, Cognos.
PROFESSIONAL EXPERIENCE
Hadoop Developer
Confidential, NY
Responsibilities:
- Developed a data pipeline using Kafka, Storm, Pig, and Java MapReduce to ingest customer behavioural data and financial histories into HDFS for analysis.
- Involved in writing MapReduce jobs.
- Used Sqoop and the HDFS put and copyFromLocal commands to ingest data.
- Worked comprehensively with Apache Sqoop and developed Sqoop scripts to transfer data from a MySQL database into the Hadoop Distributed File System (HDFS). Utilized parallel processes of the Hadoop framework to ensure resource efficiency.
- Successfully automated the Flume workflow using the Oozie scheduler.
- Reworked and structured the obtained data using Pig and Hive in the Hadoop framework to draw substantial conclusions for management assessment and use.
- Used Pig to do transformations, event joins, traffic filtering, and some pre-aggregations before storing the data in HDFS.
- Experience in managing and reviewing Hadoop Log files.
- Worked on Tableau for generating reports on HDFS data.
- Worked on NoSQL databases like Cassandra, CouchDB and MongoDB
- Created managed tables and external tables in Hive and loaded data from HDFS.
- Optimized the Hive tables using techniques like partitioning and bucketing to provide better performance with HiveQL queries.
- Created partitioned tables and loaded data using both static and dynamic partition method.
- Developed a pipeline using Kafka, Storm, and the Java API to maintain a flow of messages between multiple producers and multiple subscribers.
- Created a Pub-Sub message queue for MSC direct.
- Configured WebHCat to allow the presentation dashboard to query Hive-indexed data.
- Coordinated with the BI team to visualize the transformed data in a dashboard using Tableau.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and executing Hive queries and Pig scripts.
- Implemented new search techniques for MSC Associates to pull records from Elasticsearch.
- Worked on data flows with different file formats to ingest into Elasticsearch for quick searching.
- Created a new Elasticsearch environment parallel to Endeca for the new workflow.
- Introduced Elasticsearch as a new search engine at Confidential.
- Converted huge .csv files into .txt files using Java.
- Converted huge Excel files into .txt files using Java.
- Did the proof of concept (POC) on Elasticsearch with the Java API to set up a workflow between different clients and Elasticsearch.
Environment: HDFS, Kafka, Pig, Hive, Storm, MapReduce, Cassandra, MongoDB, Sqoop, CouchDB, Oozie, Elasticsearch, Kibana, Big Data, Java APIs, Java collection, Python, SQL, NoSQL, HBase.
Hadoop Developer
Confidential, Jersey City, NJ
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Kafka, Pig, Hive and MapReduce.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Streamed data in real time using Spark with Kafka.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in loading data from the Linux file system to HDFS.
- Importing and exporting data into HDFS using Sqoop and Kafka.
- Developed Hadoop Streaming MapReduce jobs using Python.
- Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
- Expertise with NoSQL databases like HBase, Cassandra, DynamoDB (AWS), and MongoDB.
- Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required in the environment.
- Expertise in AWS data migration between different database platforms like SQL Server to Amazon Aurora using RDS tool.
- Supported MapReduce programs running on the cluster.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in scheduling the Oozie workflow engine to run multiple Pig jobs.
- Automated all the jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
- Involved in defining job flows.
- Experienced in monitoring the logs.
- Experienced in running Hadoop streaming jobs to process terabytes of data in xml format.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
- Developed enterprise Lucene/Solr-based solutions, including custom type/object modeling and implementation in the Lucene/Solr analysis (Tokenizers/Filters) pipeline.
- Created custom Solr query components to enable optimum search matching.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Administered Tableau Server, backing up the reports and providing privileges to users.
- Presented the retrieved results through Tableau.
- Used Eclipse and Ant to build the application.
- Used NoSQL databases Cassandra and MongoDB.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Spark, AWS EC2, S3, RDS, Kafka, Solr, Linux, Cloudera, Big Data, Java APIs, Java collection, Python, SQL, NoSQL, Cassandra, Tableau, HBase.
Hadoop Developer
Confidential
Responsibilities:
- Involved in different phases of big data projects, such as data acquisition, data processing, and data serving using dashboards.
- Imported and exported data between an Oracle database and HDFS using Sqoop, Hue, and JDBC.
- Gathered data from different sources like the Internet, sensors, and user behavior, and moved it to HDFS using optimized join-based MapReduce programs.
- Implemented custom input formats that handle input files received from Java applications for processing in MapReduce.
- Implemented joins and data aggregation using Apache Crunch.
- Wrote MapReduce pipeline programs for testing using Apache Crunch.
- Developed Hadoop MapReduce jobs for unit testing using MRUnit.
- Divided each data set into corresponding categories by following the MapReduce Binning design pattern.
- Implemented filter mappers to eliminate unnecessary records.
- Experience in using Pig as an ETL tool for event joins, filters, transformations, and pre-aggregations.
- Created partitions, bucketing across state in Hive to handle structured data.
- Implemented dashboards that handle HiveQL queries internally, such as aggregation functions, basic Hive operations, and different kinds of join operations.
- Implemented business logic based on state in Hive using generic UDFs. Used HBase-Hive integration.
- Integrating bulk data into Cassandra file system using MapReduce programs.
- Involved in creating data-models for customer data using Cassandra Query Language.
- Managing and scheduling batch Jobs on a Hadoop Cluster using Oozie.
- Created production jobs using Oozie workflows that integrated different actions like MapReduce, Sqoop, and Hive.
- Experience in managing and reviewing Hadoop Log files.
- Experienced with monitoring clusters using Cloudera Manager.
- Worked on building BI reports in Tableau with Spark using Shark and Spark SQL.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Implemented data ingestion and handling clusters in real time processing using Kafka.
- Experience with Core Distributed computing and Data Mining Library using Apache Spark.
- Experienced in configuring Maven builds that integrated dependencies, Checkstyle, and test coverage.
- Responsible for analyzing multi-platform applications using python.
- Developed MapReduce jobs in Python for data cleaning and data processing.
- Designed test plans and test cases, and performed system testing.
- Involved in daily Scrum meetings to discuss the development and progress of sprints, and was active in making Scrum meetings more productive.
- Experience in integrating RHadoop for categorization and statistical analysis to generate reports.
Environment: Big Data, Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Crunch, Scala, Spark, Storm, Kafka, Cassandra, Linux, Python, RHadoop, Oracle 10g, Cloudera Manager
Java Developer
Confidential
Responsibilities:
- Involved in the project from requirements gathering through stages such as design and testing, up to production.
- Involved in designing Application based on MVC architecture.
- Implemented Spring MVC framework which includes writing Controller classes for handling requests, processing form submissions and also performed validations using Commons validator.
- Knowledge of Spring Batch, which provides functions for processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management.
- Implemented various design patterns in the project such as Business Delegate, Data Transfer Object, Service Locator, Data Access Object, and Singleton.
- Developed web service for web store components using JAXB and involved in generating stub and JAXB data model class based on annotation.
- Developed components of web services end to end, using different standards with clear understanding on SOAP using various message patterns.
- Developed XML configuration and data description using Hibernate. Hibernate Transaction Manager was used to maintain transaction persistence.
- Used the DAO pattern to retrieve data from the database.
- Designed and developed web-based applications using HTML5, CSS, JavaScript (jQuery), AJAX, and the JSP framework.
- Set up the build environment by writing the Maven build.xml, and built, configured, and deployed the application on all the servers.
- Extensively worked on JavaScript (jQuery) for client side validation and various GUI elements.
- Used the jQuery library to create powerful dynamic web pages and web applications, drawing on its advanced and cross-browser functionality.
- Installed, configured, and administered the WebLogic environment, including deployment of Servlets.
- Designed and developed middleware components using WebLogic Application Server: persistence registration object, request entry handling (controller) object, concurrency object, and transaction object.
- Implemented all the business logic in the middle tier using Java classes and JavaBeans.
Environment: JDK 1.6, J2EE, Eclipse, Servlets, JSP, Spring, HTML, JavaScript Prototypes, XML, jQuery, AJAX, Oracle, WebLogic Application Server, Maven, JDBC, Hibernate.
Java Developer
Confidential
Responsibilities:
- Involved in the analysis, design, development, and testing phases of the Software Development Life Cycle (SDLC) using agile development methodology.
- Involved in business requirement gathering and technical specifications.
- Implemented J2EE standards, MVC2 architecture using Struts Framework.
- Implemented Servlets, JSP, and Ajax to design the user interface.
- Used JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the user interface.
- Built presentation components in JSP pages using ICEfaces tag libraries.
- Used ICEfaces libraries in all presentation pages, such as Search/Inquiry and data collection pages.
- Used JBoss for EJB and JTA, and for caching and clustering purposes.
- Used EJBs (Session beans) to implement the business logic, JMS for sending updates to various other applications, and MDB for routing priority requests.
- Developed the GUI using JSP, AJAX, JavaScript, and the Spring framework.
- Involved in the development of Spring Framework controllers.
- Worked with field-level engineers and teams to make the product more user-friendly.
- Wrote web services using SOAP for sending data to and getting data from the external interface.
- Used XSL/XSLT for transforming and displaying reports. Developed schemas for XML.
- Involved in writing the Ant scripts to build and deploy the application.
- Developed web-based reporting for the monitoring system with HTML and Tiles using the Struts framework.
Environment: JAVA multithreading, collections, J2EE, EJB, UML, SQL, PHP, Sybase, Eclipse, JavaScript, WebSphere, JBOSS, HTML5, DHTML, CSS, XML, Log4j, ANT, STRUTS 1.3.8, JUNIT, JSP, Servlets, Rational Rose, Hibernate.
