Hadoop Admin Resume
SUMMARY:
- 8 years of total experience as a Software Engineer in IT, with strong working knowledge of Java and the Big Data Hadoop ecosystem.
- Over 5 years of experience in Hadoop infrastructure including MapReduce, Hive, Oozie, Sqoop, HBase, Pig, HDFS, YARN, Spark, Impala, and SAS interface configuration projects in direct client-facing roles.
- Good knowledge of data structures, algorithms, object-oriented design, and data modeling.
- Strong experience in core Java programming using collections, generics, exception handling, and multithreading.
- Good knowledge of data warehousing, ETL development, distributed computing, and large-scale data processing.
- Good knowledge of the design and implementation of big data pipelines.
- Extensive experience in installing, configuring, and administering Hadoop clusters for major Hadoop distributions such as CDH5 and HDP.
- Experience in implementing ETL/ELT processes with MapReduce, Pig, and Hive.
- Hands-on experience with major Hadoop ecosystem components including Hive, HBase, HBase and Hive integration, Sqoop, and Flume, and knowledge of the MapReduce/HDFS framework.
- Set up standards and processes for Hadoop-based application design and implementation.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Experience with Hortonworks and Cloudera Hadoop environments.
- Good experience in analysis using Pig and Hive and a solid understanding of Sqoop and Puppet.
- Good knowledge of Apache Knox and Apache Ranger.
- Good knowledge of Informatica as an ETL tool and of stored procedures to pull data from source systems/files, then cleanse, transform, and load it into databases.
- Strong knowledge of creating and monitoring Hadoop clusters on VMs with Hortonworks Data Platform 2.1 & 2.2 and CDH3/CDH4 with Cloudera Manager on Linux and Ubuntu OS.
- Involved in developing Informatica mappings and tuning them for better performance.
- Good knowledge of JDBC/ODBC.
- Knowledge of MS SQL Server 2012/2008/2005 and Oracle 11g/10g/9i.
- Involved in best practices for Cassandra, migrating the application from the legacy platform to Cassandra for Choice, and upgrading Cassandra from 2.0 to 2.2.5.
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Hands-on experience coding MapReduce/YARN programs in Java, Scala, and Python to analyze Big Data.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
- Implemented a variety of AWS computing and networking services to meet application needs.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Experience in creating databases, users, tables, triggers, macros, views, stored procedures, functions, packages, joins, and hash indexes in the Teradata database.
- Good understanding of Machine Learning, Data Mining, and the underlying algorithms.
- Good hands-on experience with OLAP tools.
- Strong knowledge in Software Development Life Cycle (SDLC)/ IT Life Cycle (ITLC).
- Strong understanding of Agile and Waterfall SDLC methodologies.
- Involved in log file management in which logs older than 7 days were moved from the log folder into HDFS and retained for 3 months (see the sketch after this list).
- Good knowledge of creating reports using Qlik View/Qlik Sense.
- Experienced in installing, configuring, and administering Hadoop clusters.
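The log housekeeping described above can be sketched as a scheduled shell job along the following lines; the directory names, archive layout, and retention handling shown here are hypothetical placeholders rather than the actual production setup.

    #!/bin/bash
    # Hypothetical housekeeping job: move local logs older than 7 days into HDFS,
    # keeping the archived copies for roughly 3 months.
    LOG_DIR=/var/log/app                       # local log folder (placeholder)
    ARCHIVE=/archive/app-logs/$(date +%Y%m%d)  # dated HDFS archive dir (placeholder)

    hdfs dfs -mkdir -p "$ARCHIVE"
    find "$LOG_DIR" -type f -mtime +7 | while read -r f; do
        hdfs dfs -put "$f" "$ARCHIVE/" && rm -f "$f"
    done

    # A separate scheduled cleanup job would prune dated archive directories
    # older than ~90 days to enforce the 3-month retention window.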
TECHNICAL SKILLS:
Big Data: Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, Ambari, HBase, MongoDB, Cassandra, Spark, Flume, Impala, Kafka, Oozie, Zookeeper, Cloudera Manager
Hadoop Distribution: Cloudera, Hortonworks, AWS, MapR
Project Management: MS-Project
Programming & Scripting Languages: Java, SQL, PL-SQL, JavaScript, Scala, Unix Shell Scripting, C, Python, R programming
Reporting Tools: Qlik View, Qlik Sense, Tableau, SOAP UI
IDE/GUI: Eclipse, IntelliJ IDEA, NetBeans, Visual Studio
Database: MS-SQL, Oracle Database, MS-Access, AWS, Teradata, MongoDB, Cassandra
Other Tools: Wireshark, Cisco Packet Tracer
Operating Systems: Windows 10, Windows 8, Windows 7, Windows Server 2008/2003, Mac OS, Ubuntu, Red Hat Linux, Linux, UNIX
PROFESSIONAL EXPERIENCE:
Confidential
Hadoop Admin
Responsibilities:
- Configuring hosts as edge nodes with the desired filesystems and adding them to the cluster.
- Involved in weekly releases and maintained sync between environments such as Production, Disaster Recovery, and Pre-Production.
- Added or deleted users on request in Hue, DataRobot, and Trifacta.
- Actively involved in OS patching activities, Cloudera upgrades, and other maintenance activities.
- Actively involved in the planning and implementation of Rack Awareness in the different environments.
- Involved in migrating the MySQL and PSQL databases to Oracle.
- Performed Requirement Analysis, Planning, Architecture Design and Installation of the Hadoop cluster
- Acted as a point of contact between the vendor and the team on different issues.
- Actively involved in the planning and implementation of the load balancer with a single GTM and multiple LTMs.
- Wrote automation scripts for different applications and purposes, such as installing the applications.
- Configured LDAP on different applications for secure login.
- Actively involved in troubleshooting user issues on a 24/7 basis.
- Implemented a strategy to upgrade the OS on all cluster nodes from RHEL5 to RHEL6 while keeping the cluster up and running.
- Involved in cluster-level security: perimeter security (authentication with Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions with Sentry), visibility (audit and lineage with Navigator), and data (encryption at rest).
- Worked on the YARN Capacity Scheduler by creating queues to guarantee resources to specific groups (see the sketch after this list).
- Worked on installing the production cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slot configuration.
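A minimal sketch of the kind of Capacity Scheduler queue setup referred to above; the queue names and capacity percentages are hypothetical placeholders, and in a Cloudera-managed cluster these properties would normally be edited through Cloudera Manager rather than by hand.

    # Hypothetical queue layout for capacity-scheduler.xml (names/percentages are placeholders).
    cat > capacity-scheduler-queues.xml <<'EOF'
    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>etl,analytics,default</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.etl.capacity</name>
        <value>40</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.analytics.capacity</name>
        <value>40</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>20</value>
      </property>
    </configuration>
    EOF

    # After merging the properties into the live capacity-scheduler.xml,
    # reload the queues without restarting the ResourceManager.
    yarn rmadmin -refreshQueues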
Environment: Hadoop, HDFS, Kerberos, Sentry, YARN, Hive, Pig, Java, SQL, Cloudera Manager, Sqoop, Flume, Oozie, MongoDB, Cassandra, HBase, Eclipse, Oracle, LDAP, DataRobot and Trifacta
Confidential, Richardson, Texas
Hadoop Admin
Responsibilities:
- Supported and managed Hadoop clusters using the Hortonworks distribution.
- Interacted with Hortonworks support, logged issues in the Hortonworks portal, and fixed them per the recommendations.
- Scheduled several time-based Oozie workflows by developing Python scripts.
- Implemented custom Flume interceptors to filter data and defined channel selectors to multiplex the data into different sinks (see the sketch after this list).
- Created instances in AWS and migrated data from the data center to AWS using Snowball and the AWS migration service.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs in Java.
- Involved in extracting data from various sources into Hadoop HDFS for processing.
- Worked on analyzing the Hadoop cluster and different big data analytics tools including Pig, the HBase database, and Sqoop.
- Created and truncated HBase tables in Hue and took backups of submitter ID(s).
- Responsible for building scalable distributed data solutions using Hadoop
- Commissioned and decommissioned nodes on a Hortonworks Hadoop cluster on Red Hat Linux.
- Worked with BI teams on generating reports and designing ETL workflows in Tableau.
- Configured, supported, and maintained all network, firewall, storage, load balancer, operating system, and software components in AWS EC2, and created detailed AWS security groups that behaved as virtual firewalls controlling the traffic allowed to reach one or more AWS EC2 instances.
- Hands-on experience in Hadoop administration and support activities, installing and configuring Apache Big Data tools and Hadoop clusters using Ambari.
- Strong capability to utilize Unix shell programming methods, diagnose and resolve complex configuration issues, and adapt the Unix domain for Hadoop tools.
- Maintained EC2 (Elastic Compute Cloud) and RDS (Relational Database Service) instances in Amazon Web Services.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
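A minimal sketch of a Flume agent with an interceptor and a multiplexing channel selector, in the spirit of the item above; the agent name, interceptor class, header values, channels, and HDFS paths are hypothetical placeholders.

    # Hypothetical Flume agent config: a custom interceptor filters/tags events,
    # and a multiplexing channel selector routes them to different HDFS sinks.
    cat > flume-multiplex.properties <<'EOF'
    a1.sources  = src1
    a1.channels = errCh otherCh
    a1.sinks    = errSink otherSink

    a1.sources.src1.type = exec
    a1.sources.src1.command = tail -F /var/log/app/app.log
    a1.sources.src1.channels = errCh otherCh

    # Custom interceptor (class name is a placeholder) sets an "eventType" header.
    a1.sources.src1.interceptors = i1
    a1.sources.src1.interceptors.i1.type = com.example.flume.FilterInterceptor$Builder

    # Multiplexing selector routes events by the header value.
    a1.sources.src1.selector.type = multiplexing
    a1.sources.src1.selector.header = eventType
    a1.sources.src1.selector.mapping.error = errCh
    a1.sources.src1.selector.default = otherCh

    a1.channels.errCh.type = memory
    a1.channels.otherCh.type = memory

    a1.sinks.errSink.type = hdfs
    a1.sinks.errSink.channel = errCh
    a1.sinks.errSink.hdfs.path = /data/logs/errors
    a1.sinks.otherSink.type = hdfs
    a1.sinks.otherSink.channel = otherCh
    a1.sinks.otherSink.hdfs.path = /data/logs/other
    EOF

    flume-ng agent --conf conf --conf-file flume-multiplex.properties --name a1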
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java, SQL, Ambari, Hortonworks, Sqoop, Flume, Oozie, CDH3, MongoDB, Cassandra, HBase, Eclipse, Oracle and Unix/Linux.
Confidential, Whitehouse Station, NJ
Hadoop Admin
Responsibilities:
- Worked on analyzing a live 65-node Hadoop cluster and different big data analytics tools including Pig, the HBase database, and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Created HBase tables to store variable data.
- Managing and reviewing Hadoop log files and debugging failed jobs.
- Implemented Kerberos Security Authentication protocol for production cluster.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with Infrastructure teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Backed up data on a regular basis to a remote cluster using DistCp (see the sketch after this list).
- Responsible for managing data coming from different sources.
- Provided cluster coordination services through ZooKeeper.
- Loaded the datasets into Hive for ETL operations.
- Worked at the logical/physical data model level using ER/Studio according to requirements.
- Automated all the jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
- Implemented the Fair Scheduler to allocate a fair amount of resources to small jobs.
- Worked with the BI team by partitioning and querying the data in Hive.
- Involved in analyzing large data sets to determine the optimal way to aggregate and report on them.
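A minimal sketch of the DistCp backup mentioned above; the NameNode hostnames, ports, and paths are hypothetical placeholders, and a command like this would typically be run from cron or an Oozie coordinator.

    # Hypothetical scheduled backup of a production HDFS path to a remote DR cluster.
    # -update copies only new/changed files; -p preserves ownership and permissions.
    hadoop distcp -update -p \
      hdfs://prod-nn.example.com:8020/data/warehouse \
      hdfs://dr-nn.example.com:8020/backups/data/warehouse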
Environment: Hadoop, HDFS, MapReduce, Hortonworks, Ambari, YARN, Hive, Pig, Oozie, Sqoop, ER/Studio, HBase.
Confidential, Secaucus, NJ
Hadoop Admin
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in Installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (see the sketch after this list).
- Hands on experience in Hadoop administration and support activities for installations and configuring Apache Big Data Tools and Hadoop clusters using Cloudera Manager.
- Capable of handling Hadoop cluster installations in various environments such as Unix, Linux, and Windows; able to implement and execute Pig Latin scripts in the Grunt shell.
- Strong capability to utilize Unix shell programming methods, able to diagnose and resolve complex configuration issues, ability to adapt Unix domain for Hadoop Tools.
- Experienced with file manipulation and advanced research to resolve problems and correct data integrity for critical Big Data issues in the NoSQL/Hadoop HDFS database layer.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Translated high-level requirements into ETL processes.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Implemented NameNode metadata backup using NFS for high availability.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
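A minimal sketch of the Sqoop import/export flow described above; the JDBC URLs, database names, tables, directories, and credentials are hypothetical placeholders.

    # Hypothetical import of a MySQL table into HDFS (staging for Hive).
    sqoop import \
      --connect jdbc:mysql://mysql-host.example.com:3306/salesdb \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4

    # Hypothetical export of analyzed data back to the relational database for reporting.
    sqoop export \
      --connect jdbc:mysql://mysql-host.example.com:3306/reporting \
      --username etl_user -P \
      --table order_summary \
      --export-dir /data/processed/order_summary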
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, CDH3, MongoDB, Cassandra, HBase, Eclipse, Oracle and Unix/Linux.
Confidential
Java Developer
Responsibilities:
- Involved in the design and prepared activity diagrams, sequence diagrams, class diagrams and use case diagrams for various use cases using Microsoft Visio.
- Followed Agile methodology and a test-driven approach in building the system.
- The application was based on the Model-View-Controller (MVC-2) architecture; used the Spring MVC framework at the web tier to isolate each layer of the application so that integration complexity is reduced and maintenance is easy.
- Developed user interface using JSP, JSTL, HTML, CSS and JavaScript to simplify the complexities of the application.
- Used the Spring validation to validate form data.
- Interacted with the Microsoft SQL Server database using the Hibernate object/relational mapping framework, with HQL, Criteria, and named queries.
- Configured Hibernate mapping files and Hibernate configuration files to connect with the database.
- Implemented various J2EE design patterns, like DTO, DAO and Singleton.
- Communicated between different applications through Web Services (XML, WSDL, UDDI, and SOAP) and exchanged data.
- Used Jira for project tracking, Bug tracking and Project Management.
- Configured and used Log4J for logging all the debugging and error information.
- Worked with ANT build scripts for compiling and building the project and with CVS for version control.
Environment: JDK, HTML, JavaScript, Servlet 2.4, JSP 2.0, Spring 3.0, Hibernate 3.2, Web Services (SOAP, WSDL, UDDI), XML, Log4J, ANT, JUnit, Microsoft SQL Server 2005, JBoss 5.1, Eclipse, CVS, Windows 7/Server 2003.