
Hadoop Administrator Resume


Orlando

SUMMARY:

  • Around 11 years of experience in IT, including 5+ years in operating, developing, maintaining, monitoring and upgrading Hadoop clusters (Hortonworks and Cloudera distributions).
  • Hands on experience in installing/configuring/maintaining Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Spark, Kafka, Impala, Zookeeper, Hue and Sqoop using both Hortonworks and Cloudera.
  • Hands on experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem such as Hadoop 2.x, YARN, Hive, Pig, MapReduce, Spark, Impala, Kafka, Storm, Oozie, HBase, Flume, Sqoop and Zookeeper.
  • Experience in converting Hive/SQL queries into Spark transformations using Java, and in ETL development using Kafka, Flume, and Sqoop.
  • Built large - scale data processing pipelines and data storage platforms using open-source big data technologies.
  • Experience in installation, configuration, management and deployment of Big Data solutions and the underlying infrastructure of Hadoop Cluster.
  • Experience in installing and configuring Hive, its services and the Metastore. Exposure to Hive Query Language and to table operations such as importing data, altering and dropping tables.
  • Experience in tuning and debugging running Spark and Impala applications.
  • Experience integrating Kafka with Spark for real-time data processing.
  • Adept in full life cycle of Enterprise BI application tools such as Tableau, Informatica, Pentaho.
  • Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the operations, implementation, administration and support of ETL processes for large-scale Data Warehouses.
  • Experience in developing dashboards and building data pipelines using the Pentaho platform.
  • Experience in data migration projects using Pentaho DI (jobs, transformations).
  • In-depth knowledge of database imports; worked with imported data to populate tables in Hive. Familiar with moving data from relational databases into the Hadoop Distributed File System (a Sqoop sketch follows this summary).
  • Experience in setting up the High-Availability Hadoop Clusters.
  • Good knowledge of Hadoop cluster planning: choosing the distribution, selecting hardware for master and slave nodes, and sizing the cluster.
  • Experience in developing Shell Scripts for system management.
  • Experience in Hadoop administration with good knowledge of Hadoop features such as safe mode and auditing.
  • Experience with Software Development Processes & Models: Agile, Waterfall & Scrum Model.
  • Working experience with sprint-planning and collaboration tools such as Jira and HipChat, and with GitHub and NiFi Registry for version control.
  • Experience in UNIX shell scripting and a good understanding of OOP and data structures.
  • Team player and fast learner with good analytical and problem-solving skills.
  • Self-starter with the ability to work independently as well as in a team.
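
As a hedged illustration of the relational-to-HDFS imports mentioned above, the sketch below shows a typical Sqoop import; the JDBC URL, credentials, table and target directory are placeholders rather than details from any engagement on this resume.

    # Minimal Sqoop import sketch -- connection string, table and paths are hypothetical
    sqoop import \
      --connect jdbc:mysql://db-host:3306/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/raw/orders \
      --num-mappers 4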

TECHNICAL SKILLS:

Hadoop Technologies: HDFS, MapReduce, Hive, Oozie, Pig, Spark, Impala, Kafka, HBase, Zookeeper, NiFi, Sqoop, Flume, Accumulo

Languages: SQL, PL/SQL, Java, C, JavaScript, HTML, DHTML, XML, UNIX Shell Script, XSD, XPATH, XSLT, WSDL

Databases: Oracle 11g/10g/9i/8i, SQL Server 2005/2008, MS Access, MySQL, MariaDB, MongoDB

Application Servers: WebLogic, WebSphere, JBoss, Tomcat, Sun Application Server, Oracle Application Server, WebSphere Portal Server

Operating System: Windows XP/2003/2000/NT, Unix, Sun Solaris, Linux

Tools: Tableau, R-Studio, Jupyter, TOAD, SQL*Plus, SQL Navigator, SQL*Loader, PuTTY, VSS (Visual Source Safe), Eclipse, BEA Workshop 10, RAD, JDeveloper, MyEclipse

PROFESSIONAL EXPERIENCE:

Confidential, Orlando

Hadoop Administrator

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop for storing, processing and analyzing data.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
  • Responsible for understanding the security requirements and accordingly securing the data across all HDP environments.
  • Installed/configured/maintained Apache Hadoop clusters for application development and Hadoop tools like HDFS, Hive, MapReduce, HBase, Spark, Kafka, NiFi, Zookeeper, Ranger etc., using the Hortonworks distribution.
  • Involved in architecting and designing HDP and HDF clusters.
  • Implemented Ambari blueprints to build HDP clusters.
  • Upgraded Ambari from 2.6.1.0 to 2.6.2.2.
  • Upgraded HDP from 2.6.4.0 to 2.6.4.125.
  • Performed multiple patch upgrades for HDF, i.e., from 3.1.1.1 to 3.1.2.0 and from 3.1.2.0 to 3.1.2.18, on development and production environments.
  • Secured Hadoop environments by enabling and managing Kerberos, two-way SSL, Ranger and Ranger KMS.
  • Integrated Hadoop clusters with Centrify and Active Directory for authentication and authorization of users.
  • Encrypted data by defining encryption zones and managed access policies through Ranger KMS.
  • Restricted user access to data in Hadoop clusters at the database, table and column level, depending on requirements, by defining Ranger policies and ACLs.
  • Installed and configured R-Studio, Jupyter and Tableau and guided users on reporting.
  • Worked on containerizing R-Studio and Jupyter using Docker.
  • Responsible for capacity planning, management and troubleshooting for HDFS, YARN/MapReduce, Hive, and HBase to optimize Hive analytics
  • Installed and configured Kafka service to stream real-time data
  • Worked on creating EC2 instances on AWS for a POC on swapping master nodes.
  • Enabled High Availability in Hadoop environments for multiple components like HDFS NameNode, YARN ResourceManager, Hive Server, Metastore, Kafka, NiFi, Oozie etc.
  • Created Ambari Views for Tez, Hive and HDFS for users to monitor job processes.
  • Responsible for setting rack awareness on all clusters
  • Responsible for DDL deployments as per requirement and validated DDLs among different environments
  • Created Oozie workflows to schedule Hive and Spark jobs using the Workflow Manager view from Ambari.
  • Worked on swapping old master nodes with new master nodes with minimal downtime.
  • Enabled MariaDB master-slave replication.
  • Tuned MariaDB, Hive, YARN, NiFi, Spark, etc. for optimal performance.
  • Wrote scripts for disk monitoring and log compression (see the sketch after this list).
  • Configured CGroups for resource allocation to the third-party application Paxata running on the cluster.
  • Worked on integrating NiFi with Elasticsearch for monitoring the logs.
  • Worked on integrating Ambari with SolarWinds for monitoring the components.
  • Ran benchmark tests, analyzed system bottlenecks and prepared solutions to eliminate them.
  • Performed data summarization, ad-hoc querying and analysis of large datasets using Hive and Spark.
  • Involved in designing and developing NiFi workflows to transfer datasets between Hadoop and external data sources such as RDBMS, NoSQL, S3, HTTP ports, files and clickstream sources, and to import and export data between HDFS and RDBMS.
  • Created and maintained technical documentation for all processes, such as cluster builds using Ambari Blueprints, installation of third-party tools, usage guides for users and best-practices guides.
  • Commissioned and decommissioned nodes from the cluster, performed NameNode recovery and cluster balancing, scheduled jobs on the cluster, monitored all daemons and cluster health daily using Ambari, created HDFS snapshots, and partitioned and mounted disks for optimal cluster performance.
  • Cluster sizing, Balancing Nodes, Managing Nodes and tuning servers.
  • Involved in daily database maintenance activities such as installing, monitoring and backing up MariaDB and MongoDB databases.
  • Copied large volumes of data between Hadoop environments using DistCp; analyzed log files of Hadoop and ecosystem services to find root causes.
  • Promoted code from lower environments to production environment.
  • Supported the developer team with job failures related to Hive queries, Spark, HBase and Kafka.
  • Provided guidance to users on re-writing their queries to improve performance and reduce cluster usage.
  • Provided regular user and application support for highly complex issues involving multiple components such as Hive, HBase, Spark, Kafka, MapReduce.
  • Responsible for on-call support on a rotating basis to help with hosts systems issues.
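
The disk-monitoring and log-compression scripts referenced in the list above followed roughly the pattern below; this is a minimal sketch with hypothetical thresholds, paths and alert recipients, not the production script.

    #!/bin/bash
    # Alert when any mounted filesystem crosses a usage threshold (placeholder values).
    THRESHOLD=85
    LOG_DIR=/var/log/hadoop

    df -hP | awk 'NR>1 {gsub("%","",$5); print $5, $6}' | while read usage mount; do
      if [ "$usage" -gt "$THRESHOLD" ]; then
        echo "Disk usage on $mount is ${usage}%" | mail -s "Disk alert on $(hostname)" hadoop-admins@example.com
      fi
    done

    # Compress service logs older than 7 days to reclaim space.
    find "$LOG_DIR" -name "*.log.*" ! -name "*.gz" -mtime +7 -exec gzip {} \;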

Environment: HDFS, Hive, MapReduce, HBase, Spark, Sqoop, Kafka, NiFi, Zookeeper, Ranger, Shell Scripting, Linux Red Hat, Java, MySQL, Tableau

Confidential, Denver, CO

Hadoop Administrator

Responsibilities:

  • Responsible for building a scalable distributed data solution using Hadoop, which analyzes existing client data in flat files and extracts the requisite data to the Data Warehouse for further analysis.
  • Analyze the system and convert functional and technical requirements into detailed design.
  • Designed the high level and low level design for the data transformation process using PIG scripts and UDFs.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala.
  • Collaborate with engineers and stakeholders to identify and minimize process inefficiency and provide exponential increase in performance of the applications.
  • Provided regular user and application support for highly complex issues involving multiple components such as Hive, Impala, Spark, Kafka, and MapReduce.
  • Handled data imports from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS.
  • Participate in analysis of data stores and help uncover insights using latest data analysis tools.
  • Day-to-day monitoring and troubleshooting of workflows and jobs, problems and performance issues across different clusters.
  • Provided guidance to users on re-writing their queries to improve performance and reduce cluster usage.
  • Worked on Impala for interactive reporting in Tableau and Pentaho.
  • Worked on real time data processing with Spark.
  • Evaluated Spark’s performance vs Impala on transactional data.
  • Built 120-node Cloudera Hadoop clusters.
  • Involved in setting up Kerberos for Cloudera and Kafka cluster and rewriting applications to use Kerberos.
  • Written test scripts for test driven development and continuous integration.
  • Written scripts for automating the processes such as taking periodic backups, setting up user batch jobs.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Spark, Kafka, Impala, Zookeeper, Hue and Sqoop using both Cloudera and Hortonworks.
  • Involved in upgrading CDH and Cloudera Manager (5.8 and 5.10 to 5.12).
  • Enabled HA for Cloudera Manager, Resource Manager, Name Node, and HiveMetastore.
  • Enabled Active Directory/LDAP for Cloudera Manager, Cloudera Navigator and Hue.
  • Enabled a load balancer for Impala to distribute load across all Impala daemons in the cluster.
  • Expertise in setting up in-memory layers such as Spark (1.6 and 2.x) and Impala and maintaining them, e.g., resolving out-of-memory issues and balancing load across daemons.
  • Implemented the Fair Scheduler on the ResourceManager to share cluster resources among the MapReduce jobs submitted by users.
  • Migrated data across clusters using DistCp.
  • Scheduled workflows through the UC4 application.
  • Wrote scripts for disk monitoring and log compression.
  • Involved in ongoing maintenance, support and improvement in Hadoop cluster.
  • Created Kafka topics, granted ACLs to users, and set up the REST mirror and MirrorMaker to transfer data between two Kafka clusters (see the sketch after this list).
  • Helped users connect to Kerberized Hive from SQL Workbench and BI tools.
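
The Kafka topic and ACL work mentioned above looked roughly like the commands below; this is an illustrative sketch for a ZooKeeper-backed Kafka of that era, and the hosts, topic, consumer group and principal are placeholders.

    # Create a replicated topic (script name/path may differ per distribution).
    kafka-topics.sh --create \
      --zookeeper zk1:2181,zk2:2181,zk3:2181 \
      --topic clickstream-events \
      --partitions 6 --replication-factor 3

    # Grant a consumer application read access to the topic and its consumer group.
    kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
      --add --allow-principal User:analytics_app \
      --operation Read --operation Describe \
      --topic clickstream-events --group analytics-consumers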

Environment: Hadoop, HDFS, Hive, Spark, Impala, Kafka, Accumulo, Pentaho, Shell Scripting, Linux Red Hat

Confidential, Dallas, TX

Hadoop Administrator

Responsibilities:

  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, Spark, Kafka, Impala, Zookeeper, Hue and Sqoop using both Cloudera and Hortonworks.
  • Responsible for building a scalable distributed data solution using Hadoop, which analyzes existing client data in flat files and extracts the requisite data to the Data Warehouse for further analysis.
  • Exported the analyzed data to relational databases using Sqoop for business use cases and for generating additional reports.
  • Involved in cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration, along with the admin.
  • The data extraction was done from flat files to HDFS and transformation was done through PIG scripts and HiveQL.
  • Streamed data in real time using Spark with Kafka.
  • Provided regular user and application support for highly complex issues involving multiple components such as Hive, Impala, Spark, Kafka, MapReduce
  • Configured Spark streaming to receive real time data from the Kafka and store the stream data to HDFS using Scala
  • Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
  • Involved in importing the real-time data to HDFS using Kafka and implemented the Oozie job for daily imports.
  • Developed and executed various jobs and transformations using Pentaho.
  • Debugged the Pentaho ETL processes and resolved issues with jobs and transformations.
  • Implemented loading and transforming of large sets of structured, semi-structured and unstructured data.
  • Designed the high level and low level design for the data transformation process using PIG scripts and UDFs
  • Written test scripts for test driven development and continuous integration.
  • Migrated data across clusters using DistCp (see the sketch after this list).
  • Improved the performance of the cluster for extraction and transformation by finding the optimal input split size and number of reducers.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades for different environments.
  • Designed a technical solution for real-time analytics using Kafka, Storm and HBase.
  • Configured cluster coordination services through Zookeeper.
  • Installed the Oozie workflow engine to schedule and run Hive, Pig and MapReduce action nodes.
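
The cross-cluster copies with DistCp mentioned above typically take the form below; the NameNode addresses and paths are placeholders, not actual cluster details.

    # -update copies only files that differ, -p preserves permissions/ownership,
    # -m caps the number of map tasks used for the copy.
    hadoop distcp -update -p -m 50 \
      hdfs://source-nn:8020/apps/hive/warehouse/sales.db \
      hdfs://target-nn:8020/apps/hive/warehouse/sales.db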

Environment: Hadoop, HDFS, Hive, Spark, Impala, Pig, Sqoop, Oozie, HBase, Shell Scripting, Ubuntu, Linux Red Hat, Kafka, Apache Storm, Pentaho

Confidential

Data Engineer

Responsibilities:

  • Involved in building multi-node Hadoop Cluster spanning multiple racks.
  • Worked on writing transformer/mapping Map-Reduce pipelines using Java.
  • Involved in creating Hive Tables, loading with data and writing Hive queries, which will invoke and run MapReduce jobs in the backend.
  • Designed and implemented Incremental Imports into Hive tables.
  • Worked in Loading and transforming large sets of structured, semi structured and unstructured data.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data (see the sketch after this list).
  • Involved in importing the real-time data to HDFS using Kafka and implemented the Oozie job for daily imports.
  • Developed and executed various jobs and transformations using Pentaho.
  • Debugged the Pentaho ETL processes and resolved issues with jobs and transformations.
  • Implemented loading and transforming of large sets of structured, semi-structured and unstructured data.
  • Experienced in managing and reviewing the Hadoop log files.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins and some pre-aggregations before storing the data in HDFS.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Worked with Avro Data Serialization system to work with JSON data formats.
  • Worked on different file formats like Sequence files, XML files and Map files using MapReduce Programs.
  • Worked extensively with Pentaho Data Integration tools such as Spoon, Kitchen and Pan.
  • Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
  • Developed scripts and automated data management from end to end and sync up between all the clusters.
  • Involved in the setup and benchmarking of Hadoop/HBase clusters for internal use.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Pig Scripts.
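
The Hive work on structuring log data mentioned above can be sketched as follows; this assumes logs already parsed into tab-delimited files in HDFS, and the database, columns and path are hypothetical.

    # Expose parsed, tab-delimited web logs as a Hive external table for ad-hoc HQL.
    hive -e "
    CREATE DATABASE IF NOT EXISTS weblogs;
    CREATE EXTERNAL TABLE IF NOT EXISTS weblogs.access_log (
      host STRING,
      request_time STRING,
      url STRING,
      status INT,
      bytes BIGINT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/raw/access_logs';

    -- typical ad-hoc query over the structured logs
    SELECT status, COUNT(*) AS hits FROM weblogs.access_log GROUP BY status;
    "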

Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, Kafka, Spark, Apache Storm, Java, Oracle, MySQL, Hive, MapReduce, SQL, Ubuntu, Linux Red Hat

Confidential

Hadoop Administrator

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions (see the sketch after this list).
  • Managing and Scheduling Jobs on a Hadoop cluster.
  • Involved in taking up the Backup, Recovery and Maintenance.
  • Worked with MapReduce internals such as the shuffle algorithm, direct disk access, built-in compression, and code written in Java.
  • Worked on importing and exporting data from Oracle into HDFS and HIVE using Sqoop.
  • Developed PIG scripts to extract the data from the web server output files to load into HDFS.
  • Created Hive external tables, loaded data into the tables and queried the data using HQL.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Involved in commissioning and decommissioning nodes at the time of node failure.
  • Provided regular user and application support.
  • Used Cloudera connectors for improving performance when importing and exporting data.
  • Ran transformations on the data sources using Hive and Pig.
  • Wrote scripts for automating processes.
  • Scheduled the jobs with Oozie.
  • Good understanding of Hive partitions.
  • Good understanding of the file formats used in HDFS.
  • Processed XML files using Pig.
  • Installed and maintained the CDH cluster; the installation included HDFS, MapReduce, Hive, Pig, Oozie and Sqoop.
  • Worked on Hadoop Development cluster maintenance including metadata backups and upgrades.
  • Worked actively with the DevOps team to meet the specific business requirements for individual customers and proposed Hadoop solutions.
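
The daemon health checks mentioned above followed the general shape below; the service list, alert address and reaction are placeholders in this sketch, not the actual script.

    #!/bin/bash
    # Check that the expected Hadoop daemons are running on this host (placeholder list).
    SERVICES="NameNode DataNode JobTracker TaskTracker"
    ADMIN="hadoop-admins@example.com"

    for svc in $SERVICES; do
      # jps lists running Java processes; a missing entry means the daemon is down.
      if ! jps | grep -qw "$svc"; then
        echo "$(date): $svc is not running on $(hostname)" | mail -s "Hadoop daemon down: $svc" "$ADMIN"
      fi
    done

    # Cluster-wide view from the NameNode: surface dead DataNodes, if any.
    hdfs dfsadmin -report | grep -i "dead"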

Environment: Hadoop, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Flume, Storm, Kafka, Linux, Java, C++, SQL, Eclipse.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in the Design, Coding and Testing phases of the software development cycle.
  • Developed a large-scale distributed MVC J2EE application using the Struts, Hibernate and Spring frameworks.
  • Preparing the Business Requirements documents and getting sign off from the client
  • Designing the components and preparing the high level and detail level design documents
  • Writing test cases for unit testing, integration testing and system testing
  • Involved in the development of the application framework including setting up of session management, caching and paging
  • Created applications, connection pools, deployment of JSPs, Servlets, and EJBs in WebSphere.
  • Implemented the associated business modules integration using Spring and Hibernate.
  • Developed application service components and configured beans using Spring IoC, creation of Hibernate mapping files and generation of database schema.
  • Designed and developed Enterprise Eligibility business objects and domain objects with Object Relational Mapping framework such as Hibernate.
  • Coded the Java back end, JSPs, Struts, JavaScript and business classes.
  • Enriched the UI with the jQuery JavaScript library to facilitate dynamic and asynchronous screen manipulation and AJAX requests.
  • Developed Presentation layer components comprising of JSP, AJAX, Struts Action, Struts Form Beans and AJAX tag libraries.
  • Used JavaScript for developing UI Components like Editable drop down, data-driven menu customizations.
  • Developed back-end stored procedures and triggers using Oracle PL/SQL, involved in database objects creation, performance tuning of stored procedures, and query plan.
  • Developed SQL queries, joins with JDBC API, Hibernate ORM to access data.
  • Develop innovative and quality solutions by making use of latest tools and technologies like Apache CXF, Spring Core, and Spring AOP.
  • Developed REST architecture based web services to facilitate communication between client and servers.
  • Engaged with Eclipse for visually designing, constructing, testing and deploying J2EE application and web services.
  • Developed the UML Use Cases, Activity, Sequence and Class diagrams using Rational Rose.
  • Developed the different components of application such as JSPs, Servlets, EJB's using Web Sphere Studio Application Developer and used CVS for version control.
  • Developed a filter view and back-end components with Spring MVC, iBatis, JSTL, Dojo and jQuery.
  • Used JSPs, JavaScript and Servlets to generate dynamic web pages and web content.
  • Used jQuery framework to implement the modal functionality and make AJAX calls
  • Developed pom.xml for the build of the application using Maven
  • Involved in the design and development of application built in Java/J2EE using Struts, Spring and Hibernate.
  • Prepared the REST and SOAP based service calls depending on the data passing to the web service.
  • Developed DAOs (Data Access Object) using Hibernate as ORM to interact with Oracle database.
  • Designed and developed Generate PDF functionality using Spring framework and iText
  • Used Value Objects, Service Locator and Singleton design patterns.
  • Involved in the migration of existing projects from Ant to Maven 2.
  • Performed unit testing using JUnit and JProbe.

Environment: Java, J2EE, Spring, Hibernate, Struts, jQuery, AJAX, Sencha ExtJS, JavaScript, Oracle, CRUD, PL/SQL, JDBC, Apache CXF, REST, Eclipse, WebLogic, ClearCase, JUnit, Agile, UML, JSP, JSTL, Servlet, Maven, iText, Jasper Reports, ILOG, Web 2.0, SOA.

Confidential

Java Developer

Responsibilities:

  • Designed and developed the user interface using asynchronous technologies such as AJAX and the Struts framework.
  • Responsible for development of configuration, mapping and Java beans for Persistent layer (Object and Relational Mapping) using Hibernate.
  • Involved in implementing Message Driven Beans for asynchronous processing.
  • Involved in integrating the business layer with DAO layer using ORM tool Hibernate.
  • Involved in working for the development of stateless session beans as part of enterprise layer.
  • Involved in Integration to integrate with external systems using SOA (Web services, WSDL, SOAP, UDDI, XML).
  • Designed and developed interface components using HTML, JSP and JSTL tags framework.
  • Implemented the project using IDE Eclipse 3.0
  • Fine-tuned the application for performance by doing query optimization.
  • Implemented design patterns MVC, DAO, Singleton, Factory etc.
  • Involved in developing and modifying SQL queries and stored procedures using TOAD.
  • Implemented new features to deploy on WebLogic application servers.
  • Involved in writing test cases using JUNIT for various modules
  • Review source code and generate peer review reports.
  • Involved in unit testing and bug fixing.
  • Used IBM Rational ClearCase as the version control system.

Environment: Java 1.4, J2EE, JSP 2.0, HTML, JavaScript, JFC (Swing), JDBC, SQL, PL/SQL procedures, WebLogic Application Server 8.1, Oracle 9i, Struts Framework 1.2, Ant, JUnit, Log4j and Windows NT.
