Hadoop Developer Resume
Franklin Lakes, NJ
PROFESSIONAL SUMMARY:
- 7+ years of overall experience in systems administration and enterprise application development across diverse industries, including hands-on experience with Big Data ecosystem technologies.
- Expertise in setting up, configuring, and monitoring Hadoop clusters using Cloudera CDH3/CDH4, Apache tarballs, and Hortonworks Ambari on Ubuntu, Red Hat, CentOS, and Windows.
- Hands-on experience with major Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, and Flume.
- Experience in managing and reviewing Hadoop Log files.
- Experience with the Oozie workflow engine and scheduler, setting up workflow jobs whose actions run Hadoop MapReduce and Pig jobs.
- Experience importing and exporting data between HDFS and relational database systems using Sqoop.
- Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Experience in managing Hadoop clusters and services using Cloudera Manager.
- Experience in troubleshooting errors in HBase Shell/API, Pig, Hive and MapReduce.
- Collected log data from various sources and integrated it into HDFS using Flume.
- Assisted Deployment team in setting up Hadoop cluster and services.
- Experience in setting up clusters on Amazon EC2 and S3, including automating cluster provisioning and scaling in the AWS cloud.
- Experience in Talend Big Data Studio 6.0.
- Pushed data as delimited files into HDFS using Talend Big Data Studio.
- Expertise in commissioning and decommissioning nodes in a Hadoop cluster using Cloudera Manager.
- Experience with Oracle OBIEE.
- Set up HDFS quotas to enforce fair sharing of storage across the cluster.
- Experience in rebalancing an HDFS cluster.
- Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting.
- Hands-on experience analyzing log files for Hadoop and ecosystem services and finding root causes.
- Expertise in benchmarking and in performing backup and disaster recovery of NameNode metadata and of important, sensitive data residing on the cluster.
- Configured rack awareness for quick availability and processing of data.
- Experience in designing and implementing secure Hadoop clusters using Kerberos.
- Successfully loaded files to Hive and HDFS from Oracle, SQL Server, MySQL, and Teradata using Sqoop.
- Loaded streaming log data from various web servers into HDFS using Flume.
- Created Hive internal and external tables defined with appropriate static and dynamic partitions.
- Experience in dynamically creating and managing HBase clusters using Apache Slider, and in starting and stopping HBase clusters running on Slider.
- Strong knowledge of Spark concepts such as RDD operations, caching, and persistence (a brief Java sketch follows this summary).
- Experience in upgrading Apache Ambari, CDH, and HDP clusters.
- Extensive knowledge of job scheduling with Oozie and of the centralized coordination service ZooKeeper.
- Expertise in collaborating across multiple technology groups and getting things done.
- Worked with both the traditional Waterfall model and Agile methodology; sound knowledge of data warehousing concepts.
- Expertise in creating Hive internal/external tables and views using a shared metastore, writing HiveQL scripts, and performing data transformation and file processing with Pig Latin scripts.
- Worked on Oracle, Teradata, and Vertica database systems, with good experience in UNIX shell scripting.
- Experience in data modeling for both OLTP and OLAP systems and in Kimball and Inmon data warehousing environments.
- Experience in extracting data from both relational systems and flat files.
- Analyzed and developed mappings using transformations in Informatica.
- Hands-on experience in Linux administration activities.
- Excellent communication skills; team player, quick learner, organized, resilient, and self-motivated.
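To illustrate the RDD operations, caching, and persistence concepts noted above, here is a minimal sketch using Spark's Java API; the application name, HDFS path, and filter conditions are hypothetical placeholders rather than details of any project listed below.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;

    public class RddCachingSketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("rdd-caching-sketch");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Hypothetical HDFS path; point this at a real log directory.
            JavaRDD<String> logs = sc.textFile("hdfs:///data/logs/*");

            // Transformation: keep only error lines. persist() with MEMORY_AND_DISK keeps the
            // filtered RDD around (spilling to disk if memory is tight); cache() is the
            // memory-only shorthand.
            JavaRDD<String> errors = logs.filter(line -> line.contains("ERROR"));
            errors.persist(StorageLevel.MEMORY_AND_DISK());

            long errorCount = errors.count();                                   // first action materializes and persists the RDD
            long fatalCount = errors.filter(l -> l.contains("FATAL")).count();  // second action reuses the persisted data

            System.out.println("errors=" + errorCount + ", fatal=" + fatalCount);
            sc.stop();
        }
    }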
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop MapReduce, HDFS, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume, YARN
DB Languages: SQL, PL/SQL, Oracle
Programming Languages: Java, C
Frameworks: Spring, Hibernate, JSF, EJB, JMS.
Scripting Languages: JSP & Servlets, JavaScript, XML, HTML, Python
Web Services: SOAP, RESTful
Databases: RDBMS, HBase, Cassandra
Tools: Eclipse, NetBeans.
Platforms: Windows, Linux, Unix
Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Franklin Lakes, NJ
Hadoop developer
Responsibilities:- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Involved in developing and testing Pig Latin Scripts.
- Involved in requirements gathering, design, development, and testing.
- Wrote MapReduce jobs to parse web logs stored in HDFS.
- Developed services to run the MapReduce jobs on an as-needed basis.
- Developed design documents considering all possible approaches and identifying the best one.
- Launched and set up the Hadoop/HBase cluster, including configuring the different components of Hadoop and HBase.
- Hands-on experience loading data from the UNIX file system into HDFS.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data from HBase through Sqoop and placing it in HDFS for further processing.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Managed and scheduled jobs on a Hadoop cluster using Oozie.
- Involved in creating Hive tables, loading data, and running Hive queries on that data.
- Extensive working knowledge of partitioned tables, UDFs, and performance tuning in Hive.
- Developed a Sqoop ingestion framework on UNIX, with a generic script used to ingest data from various sources.
- Connection details are fed in through metadata so that a common script can be maintained; logs are audited and stored in HDFS as a Hive table.
- Fine-tuned Sqoop jobs and parameters to improve performance.
- Data is ingested as text files into the landing zone for audit purposes with the replication factor set to 1, and then insert-overwritten into the staging zone as an external ORC table.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
- Worked on MapReduce Joins in querying multiple semi-structured data as per analytic needs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created many Java UDFs and UDAFs in Hive for functions not available out of the box, such as rank and cumulative sum (a minimal UDF sketch follows this section).
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Developed POC for Apache Kafka.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Gained knowledge on building Apache Spark applications using Scala.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Stored and loaded data between HDFS and Amazon S3, and backed up the namespace data to NFS filers.
- Enabled concurrent access to Hive tables with shared and exclusive locking, which is turned on in Hive with the help of the ZooKeeper ensemble in the cluster.
- Created and implemented business, validation, coverage, and price-gap rules on Hive using Talend.
- Involved in development of Talend components to validate the data quality across different data sources.
Environment: Amazon EC2, Apache Hadoop 1.0.1, MapReduce, HDFS, HBase, Hive, Pig, Oozie, Flume, Java (JDK 1.6), Eclipse, Spark, SQL.
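The Hive UDF work above is summarized only; as a hedged illustration of the same kind of code, here is a minimal custom Hive UDF. The class, function name, and behavior are hypothetical, not the actual project UDFs.

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical example UDF: trims and lower-cases a string column.
    @Description(name = "clean_str", value = "_FUNC_(str) - trims and lower-cases a string")
    public final class CleanStringUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                 // pass NULLs through unchanged
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Once packaged into a jar, a UDF of this shape would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.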
Confidential, Northbrook, IL
Hadoop Administrator
Responsibilities:- Hands-on experience in the installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in different environments such as development, test, and production.
- Used the JobTracker to assign MapReduce tasks to TaskTrackers in the cluster of nodes.
- Valuable experience on cluster audit findings and tuning configuration parameters.
- Implemented Kerberos security in all environments.
- Defined file system layout and data set permissions.
- Implemented the Capacity Scheduler to share cluster resources among the MapReduce jobs submitted by users (a driver sketch follows this section).
- Wrote complex HiveQL queries to generate the data required in the final reports and passed them to Ruby programs that convert them into MapReduce programs.
- Imported and exported data into HDFS and Hive using Sqoop.
- Maintained Operators, Categories, Alerts, Notifications, Jobs and Schedules
- Demonstrated an understanding of the concepts, best practices, and functions needed to implement a Big Data solution in a corporate environment.
- Worked on pulling data from Oracle databases into the Hadoop cluster.
- Helped design scalable Big Data clusters and solutions.
- Managed and reviewed data backups and log files; experience deploying Java applications on the cluster.
- Commissioning and Decommissioning Nodes from time to time.
- Worked with Hadoop developers and designers to troubleshoot MapReduce job failures and other issues.
- Worked with network and Linux system engineers to define optimal network configurations, server hardware, and operating systems.
- Evaluated and proposed new tools and technologies to meet the needs of the organization.
- Production support responsibilities included cluster maintenance.
Environment: Hadoop 1.2.1, MapReduce, HDFS, Pig, Hive, Java (J2EE), XML, Microsoft Word & Excel, Linux.
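As a sketch of how the Capacity Scheduler bullet above plays out on the job-submission side, the driver below sends a small MapReduce job to a named scheduler queue; the queue name "etl", class names, and paths are hypothetical, and the property shown is the Hadoop 1.x mapred.job.queue.name.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class QueueAwareLineCount {

        // Emits ("lines", 1) for every input record.
        public static class LineMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final Text KEY = new Text("lines");
            private static final IntWritable ONE = new IntWritable(1);
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws java.io.IOException, InterruptedException {
                ctx.write(KEY, ONE);
            }
        }

        // Sums the counts emitted by the mappers.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws java.io.IOException, InterruptedException {
                int total = 0;
                for (IntWritable v : values) {
                    total += v.get();
                }
                ctx.write(key, new IntWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Route the job to a specific Capacity Scheduler queue (Hadoop 1.x property name).
            conf.set("mapred.job.queue.name", "etl");

            Job job = new Job(conf, "queue-aware line count");
            job.setJarByClass(QueueAwareLineCount.class);
            job.setMapperClass(LineMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }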
Confidential, Basking Ridge, NJ
Hadoop Developer
Responsibilities:- Responsible for building scalable distributed data solutions using Hadoop.
- Understood business needs, analyzed functional specifications, and mapped them to development tasks.
- Involved in loading data from Mainframe DB2 into HDFS using Sqoop.
- Handled Delta processing or incremental updates using Hive.
- Responsible for daily ingestion of data from DATALAKE to CDB Hadoop tenant system.
- Developed Pig Latin scripts for transformations while extracting data from the source system.
- Worked on data-issue tickets and provided fixes.
- Monitored and fixed production job failures.
- Reviewed team members' design documents and code.
- Documented system processes and procedures for future reference, including design and code reviews.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Implemented data ingestion from multiple sources like IBM Mainframes, Oracle.
- Developed transformations and aggregated the data for large data sets using Pig and Hive scripts.
- Worked on partitioning and bucketing in Hive tables and ran the scripts in parallel to improve performance.
- Thorough knowledge of the Spark architecture and of how RDDs work internally.
- Exposure to Spark SQL (a Java-API sketch follows this section).
- Experience in the Scala programming language, used extensively with Spark for data processing.
Environment: HDFS, Hive, Pig, HBase, UNIX shell scripting, Talend, Spark, Scala.
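The Spark SQL exposure above was through Scala; purely as an illustration in the Java API, the same kind of query over a Hive-registered table looks like the sketch below. The database, table, and column names are hypothetical.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SparkSqlSketch {
        public static void main(String[] args) {
            // enableHiveSupport() lets Spark SQL read tables registered in the Hive metastore.
            SparkSession spark = SparkSession.builder()
                    .appName("spark-sql-sketch")
                    .enableHiveSupport()
                    .getOrCreate();

            // Hypothetical staging table; counts rows loaded per ingestion date.
            Dataset<Row> daily = spark.sql(
                    "SELECT ingest_date, COUNT(*) AS rows_loaded "
                    + "FROM datalake.claims_staging GROUP BY ingest_date");

            daily.show(20, false);
            spark.stop();
        }
    }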
Confidential, Overland Park, KS
Java Programmer
Responsibilities:- Prepared technical design, unit test cases, detailed time estimates, traceability matrix, impact analysis, and code review documents for each iteration task.
- Designed, implemented, and tested different applications for payments systems.
- Used Web services, Web Processing Service (WPS), BPEL, REST extensively for Operations module.
- Used Log4J for writing to separate log files, the application log and the error log.
- Used the Spring Model View Controller (MVC) 2 architecture, with JSPs in the front end, the Spring framework in the business layer, and Hibernate in the persistence layer.
- Used Spring IoC to communicate with the persistence layer.
- Used the Spring AOP framework for reusable logging.
- Developed front-end content using JSP, JavaScript, JQuery, HTML, JHTML and JSTL.
- Wrote SQL queries and stored procedures to interact with Oracle 11g.
- Implemented RESTful web services using the Jersey API and JSON (a resource sketch follows this section).
- Extensively used RAD with various plugins for implementing various modules.
- Developed Ant build scripts for deploying the project on WebSphere application server.
- Developed UNIX Shell scripts for automating project management tasks.
- Configured Data Sources for the Oracle database system using IBM WebSphere.
Environment: Java, JavaScript, jQuery, Servlets, JSF, Spring 3.0, JSTL, Hibernate 3.1, Web Services, PCI, Node.js, WSDL, UML, HTML, CSS, IBM WebSphere Application Server, Log4J, RAD, JUnit, UNIX, Oracle 10g.
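A minimal sketch of the Jersey/JAX-RS style of REST resource mentioned above; the resource path, fields, and status payload are hypothetical stand-ins for the real payments service.

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    // Hypothetical resource exposing payment status as JSON.
    @Path("/payments")
    public class PaymentResource {

        @GET
        @Path("/{paymentId}/status")
        @Produces(MediaType.APPLICATION_JSON)
        public Response status(@PathParam("paymentId") String paymentId) {
            // A real implementation would look the payment up in the service layer;
            // a fixed JSON body stands in for that call here.
            String body = "{\"paymentId\":\"" + paymentId + "\",\"status\":\"SETTLED\"}";
            return Response.ok(body).build();
        }
    }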
Confidential
Java Developer
Responsibilities:- Developed the user interface using JavaServer Pages (JSP), HTML, and JavaScript for the presentation tier.
- Developed JSP pages and client-side validation using JavaScript.
- Involved in team meetings with the corporate webmaster UI team and the end-user client to understand needs.
- Developed web components using JSP and Servlets, and server-side components using EJB, under J2EE.
- Developed a custom realm for the Apache Tomcat server for authenticating users.
- Developed a front controller Servlet to handle all requests.
- Developed a controller Servlet to handle database access.
- Used the Spring Framework as the middle-tier application framework, with a persistence strategy based on Spring's Hibernate support for database integration.
- Worked on Server-side pagination for processing high volume of data to the UI.
- Performed validation using the Struts validator.
- Worked extensively with J2EE technologies and followed the Spring MVC framework for the development of the project.
- Implemented Hibernate in the data access object layer to access and update information in the Oracle 10g database.
- Created a front-end application using JSPs and Spring MVC for registering a new entry, and configured it to connect to the database using Hibernate (a controller sketch follows this section).
- Configured the Hibernate configuration files to persist the data to the Oracle 10g Database.
- Used Hibernate as ORM tool for accessing database.
- Designed, developed, and analyzed the front end and back end using JSP, Servlets, and Spring.
- Evaluated JavaScript frameworks for real-time applications, settling on AngularJS for the front end.
- Implemented REST messages for communication between web service client and service provider.
- Designed and developed complex, large web pages using AngularJS, HTML5, and CSS.
- Designed and developed a RESTful service interface to the underlying customer event API using Spring Boot.
- Designed and developed front-end using Servlet, JSP, JSF, DHTML, Java Script and AJAX.
- Developed various J2EE components like Servlet, JSP.
- Developed custom tags to display the data in JSP pages.
- Tested and debugged in the browser using Firebug.
- Coded JavaScript for page functionality and pop-up screens, and used HTML for dropdown menus and for displaying parts of a page on user request.
- Deployed the application in the production environment.
- Supported the application on the client side.
Environment: Java 1.5/J2EE, Core Java, JSF, Hibernate, JDBC, Eclipse, Spring, JSP, XML, XSL, JSTL, JavaScript, jQuery, MVC, Servlets, AJAX, HTML, CSS, UML, POJO, Log4j, JUnit, SOAP, JMS, ANT, SVN, DAO, DTO, Apache Tomcat, Oracle SQL.
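A minimal sketch of the Spring MVC registration flow described above; the URLs, view names, and form fields are hypothetical, and the Hibernate hand-off is only indicated by a comment.

    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.ModelAttribute;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;

    // Hypothetical controller backing a "register new entry" JSP form.
    @Controller
    @RequestMapping("/entries")
    public class EntryController {

        @RequestMapping(value = "/new", method = RequestMethod.GET)
        public String showForm(Model model) {
            model.addAttribute("entry", new EntryForm());  // backing object for the JSP form
            return "entryForm";                            // resolved by the view resolver to a JSP
        }

        @RequestMapping(value = "/new", method = RequestMethod.POST)
        public String submit(@ModelAttribute("entry") EntryForm entry) {
            // A real implementation would pass the form to a Hibernate-backed DAO here.
            return "redirect:/entries/new";
        }

        // Simple form-backing bean; the single field is illustrative.
        public static class EntryForm {
            private String name;
            public String getName() { return name; }
            public void setName(String name) { this.name = name; }
        }
    }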
Confidential
Java Developer
Responsibilities:- Developed new features for the application based on requirements gathered from the business.
- Used the Spring framework integrated with the security service to secure the application and its services.
- Implemented the Spring MVC architecture.
- Configured Bean properties using setter injection.
- Worked extensively with JSPs and Servlets to accommodate all presentation customizations on the front end.
- Developed JSPs for the presentation layer.
- Created DML statements to insert and update data in the database, and DDL statements to create and drop tables in the Oracle database.
- Configured Hibernate for storing objects in the database, retrieving objects, querying objects and persisting relationships between objects.
- Configured the Hibernate configuration files to connect to the database.
- Implemented the DAO design pattern to retrieve and store data from web services and to populate user account information for admins to modify or create alternate/secondary IDs for the primary user ID account (a DAO sketch follows this section).
- Used JUnit for unit testing of the application.
- Deployed EAR files using the build tools in the WebLogic application server.
Environment: JDK 1.6, JSP 2.0, Struts 1.1, HTML, XML, WinCVS, Tomcat, and WebLogic.
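A minimal sketch of the DAO-over-Hibernate pattern described above; the UserAccount entity and its fields are hypothetical stand-ins for the real mapped objects.

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    // Hypothetical DAO wrapping Hibernate session handling for a UserAccount entity.
    public class UserAccountDao {

        private final SessionFactory sessionFactory;

        public UserAccountDao(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public UserAccount findById(Long id) {
            Session session = sessionFactory.openSession();
            try {
                return (UserAccount) session.get(UserAccount.class, id);
            } finally {
                session.close();
            }
        }

        public void save(UserAccount account) {
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            try {
                session.saveOrUpdate(account);   // insert or update the mapped row
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();                   // keep the database consistent on failure
                throw e;
            } finally {
                session.close();
            }
        }
    }

    // Minimal mapped entity stub; the real mapping would live in an hbm.xml file or annotations.
    class UserAccount {
        private Long id;
        private String loginId;
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getLoginId() { return loginId; }
        public void setLoginId(String loginId) { this.loginId = loginId; }
    }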