Hadoop Developer/Admin Resume
Piscataway, NJ
SUMMARY
- About 8 years of experience, with an emphasis on Big Data technologies and the development and design of Java-based enterprise applications.
- Excellent understanding/knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience in installing, configuring, supporting, and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).
- Hands-on experience with major components of the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, and Flume, plus knowledge of the MapReduce/HDFS framework.
- Set up standards and processes for Hadoop based application design and implementation.
- Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
- Good experience in data analysis using Pig and Hive, and an understanding of Sqoop and Puppet.
- Expertise in database performance tuning & data modeling.
- Developed automated UNIX shell scripts for performing RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database maintenance activities.
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
- Expertise in working with databases such as Oracle, MS SQL Server, PostgreSQL, and MS Access 2000, along with exposure to Hibernate for mapping an object-oriented domain model to a traditional relational database.
- Extensive experience in data analysis using tools like Syncsort and HZ, along with shell scripting on UNIX.
- Involved in log file management: logs older than 7 days were removed from the log folder, loaded into HDFS, and retained there for 3 months.
- Experienced in installing, configuring, and administrating Hadoop cluster of major Hadoop distributions.
- Expertise in development support activities including installation, configuration and successful deployment of changes across all environments.
- Experience in creating a design and framework for generic Ab Initio code to handle the multiple and ever-expanding list of data files coming from source systems.
- Familiarity and experience with data warehousing and ETL tools.
- Good working knowledge of OOA & OOD using UML and designing use cases.
- Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
- Experience in production support and application support by fixing bugs.
- Used HP Quality Center for logging test cases and defects.
- Major strengths: familiarity with multiple software systems; the ability to learn new technologies quickly and adapt to new environments; and being a self-motivated, focused team player and quick learner with excellent interpersonal, technical, and communication skills.
TECHNICAL SKILLS
Big Data: Hadoop, Hive, Sqoop, Pig, Puppet, Ambari, HBase, MongoDB, Cassandra, PowerPivot, Datameer, Pentaho, Spark, Flume, SolrCloud
Operating Systems: Windows, Ubuntu, Red Hat Linux, Linux, UNIX
Project Management: Plan View, MS-Project
Programming or Scripting Languages: Java, SQL, Unix Shell Scripting, C, Python
Modeling Tools: UML, Rational Rose
IDE/GUI: Eclipse
Framework: Struts, Hibernate
Database: MS-SQL, Oracle, MS-Access
Middleware: WebSphere, TIBCO
ETL: Informatica, Pentaho, Netezza
Business Intelligence: OBIEE, Business Objects
Testing: Quality Center, Win Runner, Load Runner, QTP
PROFESSIONAL EXPERIENCE
Confidential, Piscataway, NJ
Hadoop Developer/Admin
Responsibilities:
- Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation.
- Developed multiple MapReduce jobs in Java for data cleaning.
- Developed Hive UDF to parse the staged raw data to get the Hit Times of the claims from a specific branch for a particular insurance type code.
- Scheduled these jobs with the Oozie workflow engine; actions can be performed both sequentially and in parallel using Oozie.
- Built wrapper shell scripts to invoke this Oozie workflow.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Involved in creating Hadoop streaming jobs using Python.
- Used Ganglia to monitor the cluster and Nagios to send alerts about it around the clock.
- Provided ad-hoc queries and data metrics to the Business Users using Hive, Pig.
- Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Worked on MapReduce Joins in querying multiple semi-structured data as per analytic needs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created many Java UDFs and UDAFs in Hive for functions not built into Hive, such as rank and cumulative sum.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Developed POC for Apache Kafka.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Stored and loaded data from HDFS to Amazon S3, and backed up the namespace data to NFS filers.
- Enabled concurrent access for Hive tables with shared and exclusive locking, supported by the ZooKeeper implementation in the cluster.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Familiarity with NoSQL databases including HBase and MongoDB.
- Wrote shell scripts to automate rolling day-to-day processes.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
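The Hadoop streaming jobs mentioned above were written in Python; a minimal sketch of such a job follows. The tab-separated record layout and the event-type column are assumptions for illustration only. Hadoop streaming simply feeds input lines on stdin and reads tab-separated key/value lines from stdout, with the reducer receiving mapper output sorted by key:

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map step: emit one (event_type, 1) pair per input record.
    Assumes tab-separated records with the event type in column 2
    (a hypothetical layout)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1:
            yield fields[1], 1

def reducer(pairs):
    """Reduce step: sum the counts for each key. Relies on the input
    being sorted by key, as Hadoop streaming guarantees."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

if __name__ == "__main__":
    # A real job would invoke this script twice via the streaming jar:
    #   hadoop jar hadoop-streaming.jar -mapper 'job.py map' \
    #       -reducer 'job.py reduce' -input ... -output ...
    mode = sys.argv[1] if len(sys.argv) > 1 else "map"
    if mode == "map":
        for key, count in mapper(sys.stdin):
            print(f"{key}\t{count}")
    else:
        pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for key, total in reducer((k, int(v)) for k, v in pairs):
            print(f"{key}\t{total}")
```

Keeping the map and reduce logic in plain functions makes the job unit-testable without a cluster.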
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, MongoDB, Cassandra, Oracle, NoSQL, Unix/Linux, Kafka, Amazon Web Services.
Confidential, Englewood, CO
Hadoop Developer / Admin
Responsibilities:
- Experience with the Cloudera and Hortonworks distributions of Hadoop.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Managing and scheduling Jobs on a Hadoop cluster.
- Deployed the Hadoop cluster in the standard modes (standalone, pseudo-distributed, and fully distributed).
- Developed multiple MapReduce jobs in Java for data cleaning and access.
- Managed Hadoop clusters: setup, monitoring, and maintenance.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in defining job flows.
- Implemented NameNode backup using NFS for high availability.
- Worked on importing and exporting data from Oracle and DB2 into HDFS using Sqoop.
- Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Created Hive External tables and loaded the data in to tables and query data using HQL.
- Wrote shell scripts to automate document indexing to SolrCloud in production.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Used CGI scripts to access the Dbccdb database.
- Analyzed the web log data using Hive to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
- Converted the Oracle table components to Teradata table components in Ab Initio graphs.
- Used Ambari to manage, provision, and monitor the Hadoop cluster.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
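One of the web-log metrics listed above, unique visitors per day, can be computed from raw access logs with a short Python function. It assumes Apache common log format (client IP first, timestamp in square brackets), which is an assumption about the actual log layout rather than a detail from this project:

```python
from collections import defaultdict

def unique_visitors_per_day(log_lines):
    """Count distinct client IPs per day from web server access logs.

    Assumes Apache common log format, e.g.:
    127.0.0.1 - - [10/Oct/2014:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
    """
    visitors = defaultdict(set)
    for line in log_lines:
        try:
            ip = line.split(" ", 1)[0]                     # client IP
            day = line.split("[", 1)[1].split(":", 1)[0]   # e.g. 10/Oct/2014
        except IndexError:
            continue  # skip malformed lines
        visitors[day].add(ip)
    return {day: len(ips) for day, ips in visitors.items()}
```

The same grouping logic maps directly onto a Hive `COUNT(DISTINCT ip) ... GROUP BY day` query over an external table of the logs.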
Environment: Hadoop 1.x, Hive, Pig, HBase, Sqoop, Flume, Zookeeper, HDFS, Ambari, Oracle, CDH3.
Confidential - Cambridge, MA
Java/Hadoop Developer
Responsibilities:
- Exported data from DB2 to HDFS using Sqoop.
- Developed MapReduce jobs using Java API.
- Installed and configured Pig and also wrote Pig Latin scripts.
- Wrote MapReduce jobs using Pig Latin.
- Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
- Worked on Cluster coordination services through Zookeeper.
- Worked on loading log data directly into HDFS using Flume.
- Involved in loading data from LINUX file system to HDFS.
- Responsible for managing data coming from multiple sources.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Implemented JMS for asynchronous auditing purposes.
- Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
- Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Experience in defining, designing, and developing Java applications, especially using Hadoop MapReduce and leveraging frameworks such as Cascading and Hive.
- Experience developing monitoring and performance metrics for Hadoop clusters.
- Experience documenting designs and procedures for building and managing Hadoop clusters.
- Strong experience in troubleshooting the operating system, cluster issues, and Java-related bugs.
- Experienced in importing/exporting data into HDFS/Hive from relational databases and Teradata using Sqoop.
- Successfully loaded files into Hive and HDFS from MongoDB and Solr.
- Experience automating deployment, management, and self-serve troubleshooting of applications.
- Defined and evolved the existing architecture to scale with growing data volume, users, and usage.
- Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
- Installed and configured Hive and wrote Hive UDFs.
- Experience in managing the CVS and migrating into Subversion.
- Experience in managing development time, bug tracking, project releases, development velocity, release forecasting, scheduling, and more.
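The streaming jobs above processed terabytes of XML; a common pattern there is reassembling multi-line XML records from the line-oriented stream before parsing them. The `record` tag name and `id` field below are hypothetical, chosen only to illustrate the technique:

```python
import xml.etree.ElementTree as ET

def parse_records(lines, record_tag="record"):
    """Reassemble multi-line XML records from a stream of lines and
    yield each one as a parsed Element.

    Hadoop streaming delivers input one line at a time, so records
    spanning several lines must be buffered until the closing tag.
    """
    buf = []
    inside = False
    open_tag = f"<{record_tag}"
    close_tag = f"</{record_tag}>"
    for line in lines:
        if open_tag in line:
            inside = True
        if inside:
            buf.append(line)
        if close_tag in line:
            inside = False
            yield ET.fromstring("".join(buf))
            buf = []
```

A mapper would iterate `parse_records(sys.stdin)` and emit one key/value pair per record; Hadoop's `StreamXmlRecordReader` offers a built-in alternative to this manual buffering.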
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, PIG, Eclipse, MySQL and Ubuntu, Zookeeper, Java (JDK 1.6)
Confidential, Chicago, IL
Java Developer
Responsibilities:
- Gathered user requirements followed by analysis and design. Evaluated various technologies for the client.
- Developed HTML and JSP to present the client-side GUI.
- Involved in developing JavaScript code for client-side validations.
- Designed and developed the HTML-based web pages for displaying the reports.
- Developed java classes and JSP files.
- Extensively used JSF framework.
- Extensively used XML documents with XSLT and CSS to translate the content into HTML for presentation in the GUI.
- Developed dynamic content for the presentation layer using JSP.
- Developed user-defined tags using XML.
- Used JavaMail for automatic emailing and JNDI to interact with the knowledge server.
- Used Struts Framework to implement J2EE design patterns (MVC).
- Developed, Tested and Debugged the Java, JSP and EJB components using Eclipse.
- Developed Enterprise JavaBeans: entity beans, session beans (both stateless and stateful), and message-driven beans.
Environment: Java, J2EE, EJB 2.1, JSP 2.0, Servlets 2.4, JNDI 1.2, JavaMail 1.2, JDBC 3.0, Struts, HTML, XML, CORBA, XSLT, JavaScript, Eclipse 3.2, Oracle 10g, WebLogic 8.1, Windows 2003.
Confidential
Java Developer
Responsibilities:
- Created the Database, User, Environment, Activity, and Class diagram for the project (UML).
- Implemented the database using the Oracle database engine.
- Designed and developed a fully functional, generic n-tiered J2EE application platform; the environment was Oracle-technology driven.
- The entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF Rich Faces.
- Created entity objects (business rules and policies, validation logic, default-value logic, security).
- Created View objects, View Links, Association Objects, Application modules with data validation rules (Exposing Linked Views in an Application Module), LOV, dropdown, value defaulting, transaction management features.
- Web application development using J2EE: JSP, Servlets, JDBC, Java Beans, Struts, Ajax, JSF, JSTL, Custom Tags, EJB, JNDI, Hibernate, ANT, JUnit and Apache Log4J, Web Services, Message Queue (MQ).
- Designed GUI prototypes using ADF 11g GUI components before finalizing them for development.
- Created reusable components (ADF Libraries and ADF Task Flows).
- Experience using Version controls such as CVS, PVCS, and Rational Clear Case.
- Created modules using bounded and unbounded task flows.
- Generated WSDL (web services) and created workflows using BPEL.
- Handled AJAX functions (partial trigger, partial submit, auto submit).
- Created the Skin for the layout.
Environment: Core Java, Servlets, JSF, ADF Rich Client UI Framework, ADF-BC (BC4J) 11g, web services using Oracle SOA (BPEL), Oracle WebLogic.