Hadoop Developer Resume
MN
PROFESSIONAL SUMMARY:
- Over 7 years of experience with an emphasis on Big Data technologies and the design and development of Java-based enterprise applications.
- Excellent understanding and knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience in installing, configuring, supporting, and managing Hadoop clusters using Apache, Hortonworks, and Cloudera (CDH4, CDH5) distributions, as well as on Amazon Web Services (AWS).
- Hands-on experience with major components of the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, and Flume, and knowledge of the MapReduce/HDFS framework.
- Set up standards and processes for Hadoop-based application design and implementation.
- Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
- Good experience in data analysis using Pig and Hive, and understanding of Sqoop and Puppet.
- Extensive experience in data analysis using tools like Syncsort and HZ, along with shell scripting and UNIX.
- Experience in ETL development using Kafka, Flume, and Sqoop, and expertise in database performance tuning and data modeling.
- Developed automated Unix shell scripts for RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, EXPORT, and other database maintenance activities.
- Experienced in developing MapReduce programs using Apache Hadoop to work with Big Data (a minimal sketch follows this summary).
- Good understanding of XML methodologies (XML, XSL, XSD), including Web Services and SOAP.
- Expertise in working with different databases such as Oracle, MS SQL Server, PostgreSQL, and MS Access 2000, along with exposure to Hibernate for mapping an object-oriented domain model to a traditional relational database.
- Familiarity and experience with data warehousing and ETL tools.
- Good knowledge of Apache Spark and Scala.
- Good understanding of Scrum methodologies, Test-Driven Development, and continuous integration.
- Experience in production and application support, including bug fixes.
- Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and being a self-motivated, focused team player with excellent interpersonal, technical, and communication skills.
- Good knowledge of Teradata Manager, TDWM, PMON, DBQL, SQL Assistant, and BTEQ.
- Expertise in creating databases, users, tables, triggers, macros, views, stored procedures, functions, packages, joins, and hash indexes in the Teradata database.
- Extensively worked with Teradata utilities like BTEQ, FastExport, FastLoad, and MultiLoad to export and load data to/from different source systems, including flat files.
- Hands-on experience using query tools like TOAD, SQL Developer, PL/SQL Developer, Teradata SQL Assistant, and Queryman.
- Expertise in writing large/complex queries using SQL.
- Proficient in performance analysis, monitoring, and SQL query tuning using EXPLAIN PLAN, collect statistics, hints, and SQL trace in both Teradata and Oracle.
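A minimal sketch of the kind of MapReduce program referenced above, assuming a simple count of records per key over tab-delimited input; the class name, key position, and paths are illustrative placeholders rather than code from the projects below.

    // Minimal MapReduce sketch: counts records per key in tab-delimited input.
    // Class name, key position, and paths are illustrative assumptions.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class KeyCountJob {

      public static class KeyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
          String[] fields = line.toString().split("\t");
          if (fields.length > 0 && !fields[0].isEmpty()) {
            outKey.set(fields[0]);            // first column is the grouping key (assumption)
            context.write(outKey, ONE);
          }
        }
      }

      public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
            throws IOException, InterruptedException {
          long total = 0;
          for (LongWritable c : counts) {
            total += c.get();
          }
          context.write(key, new LongWritable(total));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "key-count");
        job.setJarByClass(KeyCountJob.class);
        job.setMapperClass(KeyMapper.class);
        job.setCombinerClass(SumReducer.class);   // counting is associative, so the reducer doubles as combiner
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

A job like this is packaged as a JAR and submitted with the standard hadoop jar command.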
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, Storm, Spark, Kafka, HBase, Sqoop, Flume, Zookeeper, Oozie, Avro
Apache Hadoop Distributions: Cloudera (CDH3/CDH4/CDH5), Hortonworks
Operating Systems: Windows, CentOS, Ubuntu, Red Hat Linux, Linux, UNIX
Programming/Scripting Languages: Java, SQL, Unix Shell Scripting, PL/SQL, Python
Databases: Oracle 8i/9i/10g, MySQL, Microsoft SQL Server, Teradata, NoSQL
Java Technologies: JDBC, Servlets, JSP, Spring and Hibernate
IDE Tools: Eclipse, NetBeans
Application Servers: Tomcat, WebLogic, WebSphere
ETL/Reporting Tools: Informatica, Pentaho, SSRS, SSIS, Cognos
Frameworks: MVC, Struts, Hibernate, Spring
Development Strategies: Agile, Lean Agile, Scrum, Waterfall, and Test-Driven Development
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, MN
Responsibilities:
- Installed Hadoop (MapReduce, HDFS) on AWS and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW tables and historical metrics.
- Responsible for working with message broker systems such as Kafka.
- Worked on a Storm topology integrated with Kafka to achieve near-real-time processing of the various tests involved in truck manufacturing (a consumer sketch follows this list).
- Worked on Amazon Redshift, the AWS data warehouse product.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Implemented extraction of data from HBase using Spark.
- Shared responsibility for administration of Hadoop, Hive, and Pig, and built wrapper shell scripts to drive the Oozie workflows.
- Involved in testing AWS Redshift connectivity with a SQL database for testing and storing data in a POC.
- Designed and developed a data management system using MySQL.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Involved in creating Hadoop streaming jobs using Python.
- Provided ad-hoc queries and data metrics to business users using Hive and Pig.
- Worked with NoSQL databases including HBase and MongoDB.
- Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS.
- Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Worked on MapReduce joins to query multiple semi-structured datasets as per analytic needs.
- Worked on performance optimization of Spotfire applications and Spotfire server configuration.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created many Java UDFs and UDAFs in Hive for functions not available out of the box, such as rank and cumulative sum (CSUM); a representative UDF sketch also follows this list.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Developed a POC for Apache Kafka.
- Gained knowledge on building Apache Spark applications using Scala.
- Stored and loaded data from HDFS to Amazon S3 and backed up the NameNode namespace data to NFS filers.
- Enabled concurrent access to Hive tables with shared and exclusive locking, supported by the ZooKeeper ensemble in the cluster.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Wrote and automated shell scripts for rolling day-to-day processes.
- Used Teradata utilities FastLoad, MultiLoad, and TPump to load data.
- Wrote Teradata macros and used various Teradata analytic functions.
- Involved in migration projects to move data from Oracle/DB2 data warehouses to Teradata.
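As context for the Kafka/Storm work above, the following is a minimal sketch of a plain Java Kafka consumer polling a topic; the broker address, topic name, and group id are illustrative placeholders, not details from this project.

    // Hypothetical Kafka consumer sketch; broker, topic, and group id are assumptions.
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TestResultConsumer {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");                 // assumption
        props.put("group.id", "manufacturing-tests");                   // assumption
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
          consumer.subscribe(Collections.singletonList("truck-test-events")); // assumed topic
          while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
              // Hand each event to the downstream processing pipeline here.
              System.out.printf("key=%s value=%s%n", record.key(), record.value());
            }
          }
        }
      }
    }

In a Storm-based pipeline, this consumption would typically be handled by a Kafka spout inside the topology rather than a standalone consumer.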
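The Hive UDF work above would follow the standard UDF pattern; below is an illustrative sketch of a simple string-normalizing UDF, where the function name and behavior are assumptions for the example rather than the project's actual functions.

    // Illustrative Hive UDF sketch; name and behavior are assumptions.
    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    @Description(name = "normalize_code",
        value = "_FUNC_(str) - trims and upper-cases a code column")
    public class NormalizeCode extends UDF {
      public Text evaluate(Text input) {
        if (input == null) {
          return null;               // preserve NULL semantics
        }
        return new Text(input.toString().trim().toUpperCase());
      }
    }

Such a UDF is packaged as a JAR, registered with ADD JAR and CREATE TEMPORARY FUNCTION, and then called from HiveQL like any built-in function.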
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, Spotfire, Zookeeper, CDH3, MongoDB, AWS Redshift, Cassandra, Oracle, NoSQL, Unix/Linux, Spark, Kafka, Amazon Web Services.
Hadoop Consultant
Confidential, Morris Plains, NJ
Responsibilities:
- Installed and configured various components of Cloudera Hadoop ecosystem and maintained their integrity
- Implemented the Fair Scheduler on the JobTracker to allocate a fair amount of resources to small jobs.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive (see the Spark sketch after this list).
- Worked on implementing custom Hive and Pig UDF's to transform large amounts of data.
- Installed and configured Cloudera Manager and was involved in building a multi-node Hadoop cluster.
- Used Apache Flume to ingest log data from multiple sources directly into Accumulo, file roll, and HDFS.
- Worked in an AWS environment for the development and deployment of custom Hadoop applications.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard
- Extracted data from mainframes, fed it to Kafka, and ingested it into HBase to perform analytics.
- Implemented installation and configuration of a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2.
- Worked on NoSQL databases including HBase and Elasticsearch.
- Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Cassandra and SQL.
- Involved in extracting customers' Big Data from various sources into Hadoop HDFS, including data from mainframes, databases, and log data from servers.
- Worked on commissioning and decommissioning DataNodes and TaskTrackers.
- Installed and configured Cassandra cluster and CQL on the cluster.
- Developed multiple Map Reduce jobs in Java for data cleaning and processing.
- Analyzed data using Hive queries, Pig scripts, and MapReduce programs.
- Implemented Oozie workflow engine to manage inter-dependent Hadoop jobs and to automate Hadoop jobs such as Hive, Sqoop and system-specific jobs.
- Exported the business-required information to an RDBMS using Sqoop to make the data available for the BI team to generate reports.
- Worked on creating keyspaces and loading data on the Cassandra cluster.
- Monitored the nodes and the streaming between nodes during the start-up of new nodes, and cleared keys that were no longer used, using the nodetool utility.
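A sketch of the kind of Spark-on-YARN analysis over Hive data described above, using the Spark Java API; the database, table, and column names are placeholder assumptions.

    // Hedged sketch of analyzing a Hive table with the Spark Java API.
    // Table and column names are assumptions.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class HiveAnalyticsJob {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("hive-analytics")
            .enableHiveSupport()          // read Hive tables through the metastore
            .getOrCreate();               // master is supplied by spark-submit

        Dataset<Row> daily = spark.sql(
            "SELECT event_date, COUNT(*) AS events "
                + "FROM analytics.web_events "    // assumed database.table
                + "GROUP BY event_date");

        daily.show(20);
        spark.stop();
      }
    }

A job like this is typically submitted with spark-submit --master yarn so that YARN handles resource allocation.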
Environment: Linux, Java, MapReduce, AWS Redshift, HDFS, Oracle, SQL Server, Accumulo, Hive, Pig, Sqoop, Cloudera Manager, Cassandra
Java/Hadoop Developer
Confidential, Fremont, CA
Responsibilities:
- Exported data from DB2 to HDFS using Sqoop and developed MapReduce jobs using the Java API.
- Installed and configured Pig and wrote Pig Latin scripts.
- Created and maintained technical documentation for launching Cloudera Hadoop clusters and for executing Hive queries and Pig scripts.
- Wrote MapReduce jobs using Pig Latin and worked on cluster coordination services through ZooKeeper.
- Designed and developed a fully functional, generic n-tiered J2EE application platform in an Oracle technology-driven environment; the entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF Rich Faces.
- Programmed ETL functions between Oracle and Amazon Redshift.
- Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Optimized the configuration of Amazon Redshift clusters, data distribution, and data processing.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Implemented various Java/J2EE design patterns such as Singleton, Session Façade, Business Delegate, Value Object, Data Access Object
- Implemented various advanced join operations using Pig Latin.
- Involved in loading data from the Linux file system to HDFS.
- Imported data into HDFS using Sqoop.
- Ran Hadoop streaming jobs to process terabytes of XML-format data.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Used SoapUI for testing web service responses.
- Defined, designed, and developed Java applications, especially Hadoop MapReduce, leveraging frameworks such as Cascading and Hive.
- Documented designs and procedures for building and managing Hadoop clusters.
- Troubleshot operating system and cluster issues as well as Java-related bugs.
- Successfully loaded files into Hive and HDFS from MongoDB and Solr.
- Automated deployment, management, and self-serve troubleshooting of applications.
- Defined and evolved the existing architecture to scale with growth in data volume, users, and usage.
- Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services (a connection sketch follows this list).
- Installed and configured Hive and wrote Hive UDFs.
- Managed development time, bug tracking, project releases, development velocity, release forecasting, and scheduling.
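To illustrate the kind of Java-to-Cassandra connectivity mentioned above, here is a minimal sketch using the DataStax 3.x Java driver; the contact point, keyspace, table, and columns are placeholder assumptions.

    // Illustrative Cassandra connection sketch with the DataStax 3.x driver.
    // Contact point, keyspace, table, and columns are assumptions.
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class CommerceDao {
      public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")                 // assumed contact point
                .build();
             Session session = cluster.connect("commerce")) { // assumed keyspace

          ResultSet rs = session.execute(
              "SELECT order_id, status FROM orders LIMIT 10"); // assumed table
          for (Row row : rs) {
            System.out.println(row.getString("order_id") + " " + row.getString("status"));
          }
        }
      }
    }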
Environment: Hadoop, HDFS, Hive, Flume, Sqoop, HBase, Oracle, Pig, Eclipse, MySQL, Ubuntu, Zookeeper, Redshift, Java (JDK 1.6)
Java Developer
Confidential
Responsibilities:
- Gathered user requirements followed by analysis and design. Evaluated various technologies for the client.
- Designed and developed the HTML based web pages for displaying the reports and developed Java classes and JSP files.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
- Developed, Tested and Debugged the Java, JSP and EJB components using Eclipse.
- Prepared the installation, customer, and configuration guides that were delivered to the customer along with the product.
- Extensively used JSF framework.
- Extensively used XML documents with XSLT and CSS to translate the content into HTML for presentation in the GUI.
- Involved in the development of JavaScript code for client-side validations.
- Involved in the implementation of business logic in the Struts framework and Hibernate in the back end.
- Used J2EE to develop the application based on the MVC architecture.
- Wrote complex SQL queries and stored procedures; developed JavaMail for automatic emailing and JNDI to interact with the knowledge server (see the JavaMail sketch after this list).
- Used Struts Framework to implement J2EE design patterns (MVC).
- Developed Enterprise JavaBeans: entity beans, session beans (both stateless and stateful), and message-driven beans.
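A minimal sketch of automated emailing with JavaMail, as mentioned above; the SMTP host, addresses, and subject are placeholder assumptions.

    // Hedged JavaMail sketch for automated notification emails.
    // Host, addresses, and subject are placeholder assumptions.
    import java.util.Properties;

    import javax.mail.Message;
    import javax.mail.MessagingException;
    import javax.mail.Session;
    import javax.mail.Transport;
    import javax.mail.internet.InternetAddress;
    import javax.mail.internet.MimeMessage;

    public class ReportMailer {
      public static void send(String body) throws MessagingException {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.com");              // assumption

        Session session = Session.getInstance(props);
        Message message = new MimeMessage(session);
        message.setFrom(new InternetAddress("reports@example.com"));  // assumption
        message.setRecipient(Message.RecipientType.TO,
            new InternetAddress("team@example.com"));                 // assumption
        message.setSubject("Nightly report");
        message.setText(body);
        Transport.send(message);
      }
    }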
Environment: Java, J2EE, EJB 2.1, JSP 2.0, Servlets 2.4, JNDI 1.2, JavaMail 1.2, JDBC 3.0, Struts, HTML, XML, CORBA, XSLT, JavaScript, Eclipse 3.2, Oracle 10g, WebLogic 8.1, Windows 2003.
Java Developer
Confidential
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project
- Created the database, user, environment, activity, and class diagrams for the project (UML).
- Implemented the database using the Oracle database engine.
- Designed and developed a fully functional, generic n-tiered J2EE application platform in an Oracle technology-driven environment.
- The entire infrastructure application was developed using Oracle JDeveloper in conjunction with Oracle ADF-BC and Oracle ADF Rich Faces.
- Created entity objects (business rules and policies, validation logic, default value logic, security).
- Worked in Agile development following the Scrum process, with sprints and daily stand-up meetings.
- Created View objects, View Links, Association Objects, Application modules with data validation rules (Exposing Linked Views in an Application Module), LOV, dropdown, value defaulting, transaction management features.
- Developed the web application using J2EE: JSP, Servlets, JDBC, JavaBeans, Struts, Ajax, JSF, JSTL, custom tags, EJB, JNDI, Hibernate, ANT, JUnit, Apache Log4j, Web Services, and Message Queue (MQ); a servlet/JDBC sketch follows this list.
- Designed GUI prototypes using ADF 11g components before finalizing them for development.
- Used version control systems such as CVS, PVCS, and Rational ClearCase.
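An illustrative sketch of the Servlets/JDBC portion of the stack above, looking up a DataSource through JNDI; the JNDI name, table, and column are placeholder assumptions.

    // Illustrative servlet sketch using JDBC via a JNDI DataSource.
    // JNDI name, table, and column are assumptions.
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    import javax.naming.InitialContext;
    import javax.naming.NamingException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.sql.DataSource;

    public class CustomerListServlet extends HttpServlet {
      @Override
      protected void doGet(HttpServletRequest req, HttpServletResponse resp)
          throws ServletException, IOException {
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        try {
          DataSource ds = (DataSource) new InitialContext()
              .lookup("java:comp/env/jdbc/AppDS");                        // assumed JNDI name
          try (Connection con = ds.getConnection();
               Statement st = con.createStatement();
               ResultSet rs = st.executeQuery("SELECT name FROM customers")) { // assumed table
            out.println("<ul>");
            while (rs.next()) {
              out.println("<li>" + rs.getString("name") + "</li>");
            }
            out.println("</ul>");
          }
        } catch (NamingException | SQLException e) {
          throw new ServletException(e);
        }
      }
    }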
Environment: Core Java, Servlets, JSF, ADF Rich Client UI Framework, ADF-BC (BC4J), web services using Oracle SOA (BPEL), Oracle WebLogic.