Hadoop Administrator Resume
MD
SUMMARY
- 8+ years of professional experience in IT, including 4+ years of work experience in Big Data, Hadoop and Ecosystem Analytics.
- Passionate about working in Big Data and Analytics environments.
- Well versed in installing, configuring, supporting and managing Big Data and the underlying infrastructure of Hadoop clusters.
- Experience in designing and implementing complete end-to-end Hadoop infrastructure using MapReduce, Spark, Kafka, Pig, Hive, Impala, Sqoop, Oozie, Flume and HBase.
- In-depth knowledge of Hadoop architecture and Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker.
- Experience in writing Map Reduce programs using Apache Hadoop for analyzing Big Data.
- Hands-on experience in writing ad hoc queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experience in writing Hadoop Jobs for analyzing data using Pig Latin Commands.
- Experience in Integrating Hive and Sqoop with HBase and analyzing data in HBase.
- Good Knowledge in NoSQL Databases like HBase, Cassandra and MongoDB.
- Knowledge in extending Hive and Pig core functionality by writing custom UDFs like UDAFs and UDTFs.
- Real-time data streaming using Spark and Kafka.
- Good working experience in PySpark and Spark SQL.
- Familiar with Scala, closures, higher order functions, monads.
- Knowledge of administrative tasks such as installing Hadoop and its ecosystem components such as Hive and Pig in Pseudo-Distributed Mode.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Experience in using Apache Flume for collecting, aggregating and moving large amounts of data from application servers.
- Experience in using Zookeeper and Oozie Operational Services for coordinating the cluster and scheduling workflows.
- Good Knowledge in configuring and monitoring tools like Ganglia and Nagios.
- Good knowledge of Amazon AWS concepts like EMR and EC2 web services, which provide fast and efficient processing of Big Data.
- Experience in launching EC2 instances in Amazon EMR using the console.
- Knowledge of reporting tools like Tableau, which is used for analytics on data in the cloud.
- Extensive experience with SQL, PL/SQL and database concepts.
- Experience in developing applications using Java & J2EE technologies.
- Extensive Knowledge in Java, J2EE, Servlets, JSP, JDBC, Struts and Spring Framework.
- Experience in working with popular frameworks like Struts 2.0, Hibernate 3.0, Spring IOC, and Spring MVC.
- Experience in Web Services using XML, HTML and SOAP.
- Experience in using version control management tools like CVS, SVN and Rational Clear Case.
- Experience in loading data to HDFS from UNIX (Ubuntu, Fedora, CentOS) file systems.
- Highly motivated, self-starter with a positive attitude, willingness to learn new concepts and acceptance of challenges.
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, Spark, Spark SQL, Scala, Impala, YARN, Hive, Pig, Zookeeper, Sqoop, Oozie, Flume, Kafka.
Web Technologies: HTML, XML, JDBC, JSP, JavaScript, AJAX, REST, SOAP, jQuery.
Methodologies: Agile, UML, Design Patterns (Core Java and J2EE)
NOSQL Technologies: HBase, MongoDB, Cassandra
Databases: Oracle 11g/10g/9i, DB2, MS-SQL Server, Confidential, IBM, MySQL, MS-Access
Frameworks: MVC, Hibernate
Languages: C, C++, Java, SQL, PL/SQL, Python, Scala, Unix shell scripting, VB
Web Servers: WebLogic 10.3, WebSphere 6.1, Apache Tomcat 5.5/6.0.
Tools & Utilities: Eclipse, PuTTY, Cygwin, MS Office, Crystal Reports, Access Report Designer, SVN, Git, Maven, Jira.
Reporting Tool: Tableau
Operating System: Windows XP, UNIX (Solaris, Linux)
Software Package: MS Office 2010.
PROFESSIONAL EXPERIENCE
Confidential, MD
Hadoop Administrator
Responsibilities:
- Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation, administration and support for Hadoop.
- Installed and configured Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Worked with the team to increase cluster size; additional data nodes were configured through the Hadoop commissioning process.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
- Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Managed and scheduled jobs on a Hadoop cluster.
- Involved in defining job flows, managing and reviewing log files.
- Installed Oozie workflow engine to run multiple Map Reduce, Hive HQL and Pig jobs.
- Collected the log data from web servers and integrated into HDFS using Flume.
- Involved in HDFS maintenance and administering it through the Hadoop Java API.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Experience in managing and reviewing Hadoop log files.
- Worked on setting up the Kerberos installation.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop. Worked on tuning the performance of Pig queries.
- Implemented best income logic using Pig scripts and UDFs.
- Performed component unit testing using Azure Emulator.
- Analyzed escalated incidents within the Azure SQL database. Implemented test scripts to support test-driven development and continuous integration.
Environment: Hadoop, MapReduce, Spark, Kafka, HDFS, ZooKeeper, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Sqoop, Flume, Oracle 11g, Knox, SQL, SharePoint, Ranger, UNIX Shell Scripting.
Confidential, MD
Hadoop Administrator
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs.
- Deployed a Hadoop cluster and integrated it with Nagios and Ganglia.
- Extensively involved in cluster capacity planning, hardware planning, installation and performance tuning of the Hadoop cluster.
- Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name node recovery, Capacity planning, Cassandra and slots configuration.
- Hands-on experience in provisioning and managing multi-node Hadoop clusters on the Amazon Web Services (AWS) EC2 public cloud and on private cloud infrastructure.
- Monitored multiple cluster environments using Metrics and Nagios.
- Experienced in providing security for Hadoop clusters with Kerberos.
- Dumped data from a MySQL database to HDFS and vice-versa using Sqoop.
- Used Ganglia and Nagios to monitor the cluster around the clock.
- Dumped data from one cluster to another using DistCp, and automated the dumping procedure using shell scripts.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Worked on analyzing data with Hive and Pig.
- Implemented Kerberos for authenticating all the services in the Hadoop cluster.
- Configured ZooKeeper to implement node coordination in support of clustering.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Cloudera, Flume, SQL Server, UNIX, RedHat and CentOS.
Confidential, OH
Hadoop Administrator
Responsibilities:
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files on clusters.
- Responsible for onboarding new users to the Hadoop cluster (adding a home directory for each user and providing access to the datasets).
- Played a responsible role in deciding the hardware configurations for the cluster along with other teams in the company.
- Resolved user-submitted tickets and P1 issues; troubleshot, documented and resolved errors.
- Experienced in writing automated scripts for monitoring the file systems and key MapR services.
- Responsible for giving presentations about new ecosystems to be implemented in the cluster with the teams and managers.
- Helped the users in production deployments throughout the process.
- Managed and reviewed Hadoop log files as a part of administration for troubleshooting purposes. Communicated and escalated issues appropriately.
- Applied patches to cluster.
- Added new Data Nodes when needed and ran balancer.
- Responsible for building scalable distributed data solutions using Hadoop.
- Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs, which run independently with time and data availability.
- Performed major and minor upgrades to the Hadoop cluster.
- Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Performed stress testing, performance testing and benchmarking of the cluster.
- Commissioned and decommissioned Data Nodes in the cluster when problems arose.
- Debugged and resolved major issues with Cloudera Manager by working with the Cloudera team.
Environment: Flume, Oozie, Pig, Sqoop, Mongo, HBase, Hive, MapReduce, YARN, Cloudera Manager.
Confidential
Java Developer
Responsibilities:
- Involved in various SDLC phases like Design, Development and Testing.
- Developed front end using Struts and JSP.
- Developed web pages using HTML, JavaScript, jQuery and CSS.
- Used various Core Java concepts such as Exception Handling and Collection APIs to implement various features and enhancements.
- Developed server-side Servlet components for the application.
- Involved in coding, maintaining, and administering Servlets and JSP components to be deployed on a WebSphere application server.
- Implemented Hibernate ORM to map relational data directly to Java objects.
- Worked with Complex SQL queries, Functions and Stored Procedures.
- Involved in developing with the Spring Web MVC framework for the portals application.
- Implemented the logging mechanism using log4j framework.
- Developed REST APIs and Web Services.
- Wrote test cases in JUnit for unit testing of classes.
- Used Maven to build the J2EE application.
- Used SVN to track and maintain the different versions of the application.
- Involved in maintenance of different applications with onshore team.
- Good working experience in Tapestry claims processing.
- Working experience with professional billing claims.
Environment: Java, Spring Framework, Struts, Hibernate, RAD, SVN, Maven, WebSphere Application Server, Web Services, Oracle Database 11g, IBM MQ, JMS, HTML, JavaScript, XML, CSS, REST API.
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in client requirement gathering, analysis & application design.
- Used UML to draw use case diagrams, class & sequence diagrams.
- Implemented client side data validations using JavaScript.
- Implemented server side data validations using Java Beans.
- Implemented views using JSP & JSTL 1.0.
- Developed Business Logic using Session Beans.
- Implemented Entity Beans for Object Relational mapping.
- Implemented Service Locator Pattern using local caching.
- Worked with collections.
- Implemented Session Facade Pattern using Session and Entity Beans.
- Developed message driven beans to listen to JMS.
- Performed application-level logging using log4j for debugging purposes.
- Involved in fine-tuning of application.
- Thoroughly involved in the testing phase and implemented test cases using JUnit.
- Involved in the development of Entity Relationship Diagrams using Rational Data Modeler.
Environment: Java SDK 1.4, Entity Bean, Session Bean, JSP, Servlet, JSTL 1.0, CVS, JavaScript, Oracle 9i, SQL, JBoss 3.0, Eclipse 2.1
