Hadoop Developer Resume
Orlando, FL
SUMMARY
- Over 8 years of IT experience, including close to 5 years of work experience in Big Data and Hadoop ecosystem technologies.
- Experienced in Agile Scrum, RUP (Rational Unified Process) and TDD (Test-Driven Development) software development methodologies.
- Excellent understanding of the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, HBase, Oozie, ZooKeeper, Flume and Sqoop-based Big Data platforms.
- Expertise in design and implementation of Big Data solutions in the Banking, Insurance, Telecommunication, Retail and E-commerce domains.
- Good knowledge of Hadoop MRv1 and MRv2 (YARN) architecture.
- Implemented data quality and price-gap rules in the Talend ETL tool; extensive prior experience with Informatica PowerCenter and Designer.
- Extensive experience with OLTP/OLAP systems and E-R modeling, developing database schemas such as Star and Snowflake schemas used in relational, dimensional and multidimensional modeling.
- Experience in data processing - collecting, aggregating and moving data from various sources - using Apache Flume and Kafka.
- Good exposure to performance tuning of Hive queries, MapReduce jobs and Spark jobs.
- Comprehensive experience in building web-based applications using J2EE frameworks such as Spring, Hibernate, EJB, Struts and JMS.
- Excellent ability to use analytical tools to mine data and evaluate the underlying patterns.
- Assisted in Cluster maintenance, Cluster Monitoring and Troubleshooting, Managing and Reviewing data backups and log files.
- Good working experience in client-side development with HTML, XHTML, CSS, JavaScript, jQuery, JSON and AJAX.
- Good understanding of the WebLogic and WebSphere application servers and XML technologies (XML, XSL, XSD), including Web Services such as SOAP and REST.
- Hands-on experience with NoSQL databases such as HBase, Cassandra and MongoDB.
- Worked with various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS).
- Hands on experience in developing MapReduce programs using Apache Hadoop for analyzing the Big Data.
- Expertise in optimizing traffic across network using Combiners, joining multiple schema datasets using Joins and organizing data using Partitioners and Buckets.
- Experience in writing Custom Counters for analyzing the data and testing using the MRUnit framework.
- Experienced in writing complex MapReduce programs that work with different file formats like Text, Sequence, Xml and Avro.
- Expertise in composing MapReduce Pipelines with many user-defined functions using Apache Crunch.
- Expertise in writing ad-hoc MapReduce programs using Pig Scripts.
- Used Pig as an ETL tool to perform transformations, event joins, filtering and some pre-aggregations.
- Implemented business logic by writing Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources.
- Expertise in Hive Query Language (HiveQL), Hive Security and debugging Hive issues.
- Responsible for performing extensive data validation using Hive dynamic partitioning and bucketing.
- Experience in developing custom UDFs for Pig and Hive to incorporate methods and functionality of Python/Java into Pig Latin and HiveQL (a minimal UDF sketch follows this summary).
- Analyzed data by performing Hive queries and used Hive UDFs for complex querying.
- Expert database engineer with NoSQL and relational data-modeling experience.
- Expertise in HBase Cluster Setup, Configurations, HBase Implementation and HBase Client API.
- Worked on importing data into HBase using HBase Shell and HBase Client API.
- Expertise in several J2EE technologies like JDBC, Servlets, JSP, Struts, Spring, Hibernate, JPA, JSF, EJB, JMS, JAX-WS, SOAP, jQuery, AJAX, XML, JSON, HTML5/HTML, XHTML, Maven, and Ant.
- Expert knowledge over J2EE Design Patterns like MVC Architecture, Front Controller, Session Facade, Business Delegate and Data Access Object for building J2EE Applications.
- Extensive experience in developing Internet and intranet applications using J2EE, Servlets, JSP, JBoss, WebLogic, Tomcat, and the Struts framework.
- Extensive experience with the DB2 database (database design and SQL queries).
- Good experience in SQL, PL/SQL, Perl Scripting, Shell Scripting, Partitioning, Data modeling, OLAP, Logical and Physical Database Design, Backup and Recovery procedures.
- Experienced with the build tools Maven and Ant and with continuous integration tools such as Jenkins.
- Experience in administering, installing, configuring, troubleshooting, securing, backing up, performance monitoring and fine-tuning Red Hat Linux.
- Developed Unit test cases using JUnit testing framework.
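For illustration, a minimal sketch of the kind of custom Hive UDF mentioned above, written in Java against the classic org.apache.hadoop.hive.ql.exec.UDF API. The package and class names are hypothetical, not taken from any project listed here:

    package com.example.hive.udf; // hypothetical package

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Simple Hive UDF: trims and upper-cases a string column.
    public final class NormalizeString extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null; // pass NULLs through unchanged
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Such a class is packaged into a JAR, added to the session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before it can be called from HiveQL.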
TECHNICAL SKILLS
Big Data Technology: Hadoop, Teradata, MapReduce, Spark, HDFS, HBase, Pig, Hive, Sqoop, Oozie, Storm, Kafka and Flume
Spark Streaming Technologies: Spark Streaming, Storm
Java/J2EE Technology: JSP, JSF, Servlets, EJB, JDBC, Struts, Spring, Spring MVC, Spring Portlet, Spring Web Flow, Hibernate, iBATIS, JMS, MQ, JCA, JNDI, Java Beans, JAX-RPC, JAX-WS, RMI, RMI-IIOP, EAD4J, Axis, Castor, SOAP, WSDL, UDDI, JiBX, JAXB, DOM, SAX, MyFaces (Tomahawk), Facelets, JPA, Portal, Portlet, JSR 168/286, Liferay, WebLogic Portal, LDAP, JUnit
Hadoop Distribution: Cloudera, Hortonworks, IBM Big Insights
Cloud Computing Service: AWS (Amazon Web Services)
Scripting Languages: Python, Bash, JavaScript, HTML5, CSS3
Programming Languages: Java (1.4/5/6), C/C++, Swing, SQL, HTML, CSS, i18n, l10n, DHTML, XML, XSD, XHTML, XSL, XSLT, XPath, XQuery, PL/SQL, UML, JavaScript, AJAX (DWR), jQuery, Dojo, ExtJS, Shell Scripts, Perl
Development Framework/IDE: RAD 8.x/7.x/6.0, IBM WebSphere Integration Developer 6.1, WSAD 5.x, Eclipse Galileo/Europa/3.x/2.x, MyEclipse 3.x/2.x, NetBeans 7.x/6.x, IntelliJ 7.x, Workshop 8.1/6.1, Adobe Photoshop, Adobe Dreamweaver, Adobe Flash, Ant, Maven, Rational Rose, RSA, MS Visio, OpenMake Meister
Web/Application Servers: WebSphere Application Server 8.x/7.0/6.1/5.1/5.0, WebSphere Portal Server 7.0/6.1, WebSphere Process Server 6.1, WebLogic Application Server 8.1/6.1, JBoss 5.x/3.x, Apache 2.x, Tomcat 7.x/6.x/5.x/4.x, MS IIS, IBM HTTP Server
Databases: NoSQL, Oracle 11g/10g/9i/8i, DB2 9.x/8.x, MS SQL Server 2008/2005/2000, MySQL
NoSQL: HBase, Cassandra, MongoDB
ETL Tools: Talend, Informatica
Reporting/Analysis Tools: Tableau, SAS
Operating Systems: Windows XP, 2K, MS-DOS, Linux (Red Hat), Unix (Solaris), HP UX, IBM AIX
Version Control: CVS, SourceSafe, ClearCase, Subversion
Monitoring Tools: Embarcadero J Optimizer 2009, TPTP, IBM Heap Analyzer, Wily Introscope, JMeter
Other: JBoss Drools 4.x, REST, IBM Lotus WCM, MS ISA, CA SiteMinder, BMC WAM, Mingle
PROFESSIONAL EXPERIENCE
Confidential - Orlando, FL
Hadoop Developer
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into HDFS and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Built wrapper shell scripts to drive the Oozie workflows.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Involved in creating Hadoop streaming jobs using Python.
- Provided ad-hoc queries and data metrics to business users using Hive and Pig.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Worked on MapReduce joins to query multiple semi-structured datasets as per analytic needs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created many Java UDFs and UDAFs in Hive for functions not built into Hive, such as rank and cumulative sum (Csum).
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Developed a proof of concept for Apache Kafka (see the producer sketch after this section).
- Explored Spark for improving performance and optimizing existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Gained knowledge on building Apache Spark applications using Scala.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Stored and loaded data between HDFS and Amazon S3, and backed up namespace data to NFS filers.
- Enabled concurrent access to Hive tables with shared and exclusive locking, backed by the ZooKeeper ensemble in the cluster.
- Created and implemented business, validation, coverage and price-gap rules on Hive using Talend.
- Involved in development of Talend components to validate the data quality across different data sources.
- Involved in analysis of business validation rules and finding options for the implementation of the rules in Talend.
- Automated and scheduled the rules on a weekly and monthly basis in TAC (Talend Administration Center).
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions.
- Familiarity with NoSQL databases including HBase and MongoDB.
- Wrote shell scripts to automate rolling day-to-day processes.
Environment: Hadoop, MapReduce, YARN, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera CDH3, Flume, HBase, ZooKeeper, Talend, MongoDB, Cassandra, Oracle, NoSQL, Unix/Linux, Spark, Kafka, Amazon Web Services.
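An illustrative sketch of the Apache Kafka proof of concept referenced above: a minimal Java producer that publishes one log event. The broker address, topic name and payload are assumptions for illustration, not project specifics:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LogEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // assumed broker address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            // Publish a single log event to a hypothetical "weblogs" topic.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("weblogs", "host01", "GET /index.html 200"));
            }
        }
    }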
Confidential - Kenilworth, NJ
Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this section).
- Imported and exported data between HDFS and an Oracle 10.2 database using Sqoop.
- Experienced in defining and coordinating job flows.
- Gained experience in reviewing and managing Hadoop log files.
- Extracted files from NoSQL databases (MongoDB, HBase) through Sqoop and placed them in HDFS for processing.
- Involved in writing data refinement Pig scripts and Hive queries.
- Gained good knowledge of running Hadoop streaming jobs to process terabytes of XML data.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Coordinated cluster services using ZooKeeper.
- Used XML Technologies like DOM for transferring data.
- Implemented object-relational mapping and persistence using Hibernate ORM.
- Developed custom validators in Struts and implemented server-side validations using annotations.
- Created the struts-config.xml file for the ActionServlet to extract data from the specified ActionForm and pass it to the specified Action class instance.
- Used Oracle for the database and WebLogic as the application server.
- Involved in coding for DAO Objects using JDBC (using DAO pattern).
- Used Flume to transport logs to HDFS.
- Moved data from Hive tables into Cassandra for real-time analytics.
- Organized documents into more usable clusters using Mahout.
- Configured connection between HDFS and Tableau using Impala for Tableau developer team.
- Responsible for managing data coming from different sources.
- Gained good experience with various NoSQL databases.
- Handled administration activities using Cloudera Manager.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Worked on Talend ETL tool, developed and scheduled jobs in Talend integration suite.
- Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
- Worked on the visualization tool Tableau for visually analyzing the data.
Environment: Apache Hadoop, Java, JDK 1.6, J2EE, JDBC, Servlets, JSP, Linux, XML, WebLogic, SOAP, WSDL, HBase, Hive, Pig, Sqoop, ZooKeeper, NoSQL, R, Mahout, MapReduce, Cloudera, HDFS, Flume, Impala, Tableau, Talend, MySQL, HTML5, CSS, MongoDB
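A minimal sketch of the kind of Java MapReduce data-cleaning job described above: a map-only job that keeps well-formed records and silently drops the rest. The expected field count and class names are assumptions:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CleanRecordsJob {

        // Keep tab-delimited records with the expected number of fields; drop the rest.
        public static class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                if (fields.length == 7) { // assumed record width
                    context.write(NullWritable.get(), value);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "clean-records");
            job.setJarByClass(CleanRecordsJob.class);
            job.setMapperClass(CleanMapper.class);
            job.setNumReduceTasks(0); // map-only job
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }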
Confidential - Pittsburgh, PA
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence purposes.
- Used default MapReduce Input and Output Formats.
- Developed HQL queries to implement select, insert and update operations against the database by creating HQL named queries.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed simple to complex Map/Reduce jobs using Java, and scripts using Hive and Pig.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion and egress.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
- Loaded and transformed large sets of structured and semi-structured data.
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
- Exported filtered data into HBase for fast querying (see the client API sketch after this section).
- Involved in creating Hive tables, loading with data and writing Hive queries.
- Created data-models for customer data using the Cassandra Query Language.
- Ran many performance tests using the Cassandra-stress tool in order to measure and improve the read and write performance of the cluster.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
- Queried and analyzed data from DataStax Cassandra for quick searching, sorting and grouping.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
Environment: Apache Hadoop (Cloudera), HBase, Hive, Pig, MapReduce, Sqoop, Oozie, Eclipse, Java
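A minimal sketch of exporting a filtered record into HBase through the client API (HBase 1.x style), as referenced above. The table name, column family, qualifier and row key are assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExporter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("filtered_events"))) {
                // One cell per filtered record: row key = customer id, m:total_spend = metric.
                Put put = new Put(Bytes.toBytes("cust-0001"));
                put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("total_spend"),
                        Bytes.toBytes("123.45"));
                table.put(put);
            }
        }
    }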
Confidential - New York, NY
Java Developer/Oracle Developer
Responsibilities:
- Estimation, design, and development of various modules.
- Implemented MVC architecture.
- Responsible for developing use case, class diagrams and sequence diagrams for the modules using UML.
- Responsible for re-engineering Confidential legal eCommerce Java/J2EE/JEE based Portal applications.
- Designed, developed and tested Java/J2EE/JEE/Portal applications using Spring, Spring IoC, Spring MVC, Spring Portlet, Hibernate, and WebSphere Portal.
- Designed, developed and modified UI components using JSP, JSF, JavaScript, jQuery, DWR (AJAX), CSS, HTML, XHTML, XML, and Velocity.
- Created batch print component that converted MS Word documents to PDF and sent the merged document Stream to client side for printing using Aspose.Words for Java and iText.
- Configured Spring and Hibernate components.
- Designed and developed business and persistence layer components using Spring, Spring IoC and Hibernate.
- Wrote complex SQL queries to interact with backend Oracle 11g/10g databases.
- Created test cases and performed Unit and Integration testing using Spring Test API.
- Built, deployed and tested developed components on WebSphere Portal Server 6.1.
- Worked in an Agile software development environment.
- Involved in development of user interface modules using HTML, CSS, JSP.
- Designed the applications using MVC framework for easy maintainability.
- Wrote many JSP scriptlets where required to meet requirements.
- Developed notification and customer classes.
- Involved in writing SQL queries.
- Used JDBC to access relational data from the database (a minimal DAO sketch follows this section).
- Handled backend data by creating optimal stored procedures in Oracle database.
- Migrated employees' manual Excel-sheet work to the newly developed automated system.
- Improved project efficiency by cutting 4 hours/day spent on manual Excel-sheet maintenance.
- Developed Servlets as controllers to perform requisite functions.
- Worked daily with Core Java, MySQL and HTML.
- Regularly fixed and troubleshot bugs and issues in modules.
- Designed and developed a vendor portal application to track shipping information for requested orders.
- Enhanced the file processors that send orders to drop shippers and update order shipment status (Released/Scheduled) from drop shippers, based on requirements.
Environment: HTML, CSS, AJAX, jQuery, JavaScript, Flash, Core Java, J2EE, Struts 2.0, Servlets, JSP, JSTL, XML, MyEclipse 9.0, JBoss 4.0, Oracle
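A minimal sketch of the JDBC DAO pattern referenced above, assuming a container-managed DataSource and a hypothetical ORDERS table; all names are illustrative only:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import javax.sql.DataSource;

    // DAO that encapsulates all JDBC access for order shipment status.
    public class OrderDao {
        private final DataSource dataSource; // e.g. looked up from the app server's JNDI tree

        public OrderDao(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        public String findShipmentStatus(long orderId) throws SQLException {
            String sql = "SELECT SHIPMENT_STATUS FROM ORDERS WHERE ORDER_ID = ?";
            try (Connection conn = dataSource.getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, orderId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString("SHIPMENT_STATUS") : null;
                }
            }
        }
    }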
Confidential
Java Developer/(DW/BI) Developer
Responsibilities:
- Involved in Architecture/Designing the State Portal Application.
- Involved in Functional and Detailed Designs.
- Involved in Presentation Development using Struts Framework.
- Involved in the analysis, design, development and testing phases of the Software Development Lifecycle (SDLC) using the Agile development methodology.
- Involved in business requirement gathering and technical specifications.
- Implemented J2EE standards, MVC2 architecture using Struts Framework.
- Implemented Servlets, JSP and Ajax to design the user interface.
- The presentation tier was built using the Struts framework.
- Implemented and configured various Action classes for handling the client requests using Struts 2 framework.
- Used EJBs (Stateless Session beans) to implement the business logic, MDBs (JMS) for asynchronous communication internal and external to the system.
- All business logic in all modules was written in core Java.
- The workflow (order flow) was built using JMS technology.
- Developed Web Services using SOAP to send data to and receive data from the external interface.
- Used Source Integrity tool to build and deploy the application.
- Used Design patterns such as Business delegate, Service locator, Model View Controller, Session façade, DAO.
- Involved in implementing JMS (Java Message Service) for asynchronous communication (see the sketch after this section).
- Used JMS queues and JMS topics for one-to-one and one-to-many communication in the application.
- The backend application layer was implemented using EJB (Enterprise JavaBeans) in a WebSphere Application Server environment.
- Created stored procedures using PL/SQL for data modification (DML insert, update, delete) in Oracle.
- Interaction with Oracle database is implemented using Hibernate.
- Designed the business intelligence module of the project.
- Developed stored procedures in PostgreSQL to support analytical reports.
- Integrated business intelligence module in the existing Reporting Framework (Java Based).
- Used SAS Visual Analytics for report generation and SAS Data Integrator for transforming data from OLTP to OLAP environment.
Environment: J2EE, EJB, WebServices, XML, XSD, RUP, Microsoft Visio, Clear Case, Source Integrity, Oracle 10g, WebSphere 10.3, JMS, SOA, LDAP, RAD, LOG4j, Servlets, JSP, Unix, Struts 2.0, Hibernate, Informatica, SAS.
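A minimal sketch of the JMS-based asynchronous messaging referenced above, using the javax.jms 1.1 API. The JNDI names and message payload are assumptions:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.naming.InitialContext;

    public class OrderFlowSender {
        public static void main(String[] args) throws Exception {
            InitialContext ctx = new InitialContext(); // assumes jndi.properties on the classpath
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory"); // assumed name
            Queue queue = (Queue) ctx.lookup("jms/OrderQueue"); // assumed name

            Connection connection = cf.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                // One-to-one, asynchronous hand-off of an order event to the workflow.
                producer.send(session.createTextMessage("ORDER-1001:CREATED"));
            } finally {
                connection.close();
            }
        }
    }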
