Hadoop Engineer Resume

Minneapolis, MN

PROFESSIONAL SUMMARY:

  • Over 7 years of administration experience, including 4+ years with the Hadoop ecosystem, installing and configuring Hadoop ecosystem components in existing clusters.
  • Experience in Hadoop administration (HDFS, MapReduce, Hive, Pig, Sqoop, Flume, and Oozie) and NoSQL administration.
  • Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS, Rackspace, and OpenStack.
  • Setting up automated 24x7 monitoring and escalation infrastructure for Hadoop cluster using Nagios and Ganglia.
  • Experience in installing Hadoop cluster using different distributions of Apache Hadoop, Cloudera and Hortonworks.
  • Good experience in understanding clients' big data business requirements and transforming them into Hadoop-centric solutions.
  • Analyzed clients' existing Hadoop infrastructure to identify performance bottlenecks and provided performance tuning accordingly.
  • Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive.
  • Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
  • Experience in configuring ZooKeeper to provide cluster coordination services.
  • Strong experience in writing custom UDFs in Java for Hive and Pig.
  • Good experience in managing and reviewing Hadoop log files.
  • Good experience in analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs using Java.
  • Good working knowledge of extending Hive and Pig core functionality by writing custom UDFs.
  • Loading logs from multiple sources directly into HDFS using tools like Flume.
  • Good experience in performing minor and major upgrades.
  • Experience in benchmarking and in performing backup and recovery of NameNode metadata and data residing in the cluster.
  • Familiar with commissioning and decommissioning of nodes on a Hadoop cluster.
  • Adept at configuring NameNode High Availability.
  • Worked on disaster recovery for Hadoop clusters.
  • Well experienced in building DHCP, PXE with Kickstart, DNS, and NFS servers and using them to build infrastructure in a Linux environment.
  • Experienced in Linux Administration tasks like IP Management (IP Addressing, Subnetting, Ethernet Bonding and Static IP).
  • Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
  • Experience in deploying and managing multi-node development, testing, and production Hadoop clusters.
  • Experience in understanding Hadoop security requirements and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, and generating and managing keytab files for every service using keytab tools (a sample keytab workflow appears after this summary).
  • Worked on setting up NameNode high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
  • Effective problem-solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines. Ability to learn and use new technologies quickly.
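
A minimal sketch of the keytab workflow referenced above, assuming an MIT Kerberos KDC; the realm (EXAMPLE.COM), host name, and paths are placeholders:

    # Create a service principal for the NameNode (host and realm are hypothetical)
    kadmin.local -q "addprinc -randkey nn/nn1.example.com@EXAMPLE.COM"
    # Export the principal's keys to a keytab file for the service to use
    kadmin.local -q "xst -k /etc/security/keytabs/nn.service.keytab nn/nn1.example.com@EXAMPLE.COM"
    # Verify the keytab contents
    klist -kt /etc/security/keytabs/nn.service.keytab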

TECHNICAL SKILLS:

Languages: Java, Python

Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper

Security: Kerberos

Cluster management tools: Cloudera Manager, Ambari, Ganglia, Nagios

Databases: Oracle, MySQL, SQL Server, Cassandra

Scripting/Automation: Shell scripting, Puppet

Software Development Tools: Eclipse, NetBeans

Web Servers: Apache Tomcat

Operating Systems: Windows, Linux (Red Hat, CentOS)

Build Tools: Maven

PROFESSIONAL EXPERIENCE:

Confidential, Minneapolis, MN

Hadoop Engineer

Responsibilities:

  • Managed a 300+ node HDP 2.2.4 cluster with 14 petabytes of data using Ambari 2.0 on CentOS 6.5.
  • Installed and configured Hortonworks Ambari for easy management of existing Hadoop cluster.
  • Responsible for the design and implementation of a multi-datacenter Hadoop environment intended to support the analysis of large amounts of unstructured data along with ETL processing.
  • Coordinated with the Hortonworks support team through the support portal to resolve critical issues during upgrades.
  • Conducted root cause analysis (RCA) to find data issues and resolve production problems.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Extensive knowledge of Teradata performance tuning; successfully tuned many long-running queries.
  • Wrote shell scripts to health-check Hadoop daemon services and respond to warning or failure conditions (see the sketch after this list).
  • Enabled Kerberos authentication for the Hadoop cluster and integrated it with Active Directory for managing users and application groups.
  • Developed Sqoop jobs to extract data from relational databases such as Oracle and Teradata.
  • Loaded data from Teradata into HDFS using the Teradata Hadoop connectors.
  • Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
  • Loaded Avro schemas into Hive tables and prepared shell scripts for executing Hadoop commands in a single run.
  • Worked with big data developers, designers, and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig, and Sqoop.
  • Experienced with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Worked on design and implementation, configuration, performance tuning of Hortonworks HDP 2.3 Cluster with High Availability and Ambari 2.2.
  • Analyzed server logs for errors and exceptions; scheduled Jenkins job builds and monitored console output.
  • Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment.
  • Used JIRA and ServiceNow to track issues on the big data platform.
  • Experienced in managing and reviewing Hadoop log files.
  • Configured Jenkins for successful deployment to test and production environments.
  • Used Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive.
  • Worked on setting up high availability for the major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
  • Set up HBase high availability and verified it manually with failover tests.
  • Created queues and allocated cluster resources to prioritize jobs.
  • Maintained MySQL databases, created users, and backed up cluster metadata databases with cron jobs.
  • Provided technical assistance for configuration, administration and monitoring of Hadoop clusters.
  • Coordinated with technical teams to install Hadoop and related third-party applications on systems.
  • Supported technical team members for automation, installation and configuration tasks.
  • Suggested improvements to process automation scripts and tasks.
  • Assisted in the design, development, and architecture of Hadoop and HBase systems.
  • Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
  • Responsible for cluster maintenance, monitoring, troubleshooting, tuning, and commissioning and decommissioning of nodes.
  • Responsible for cluster availability and experienced in on-call support.
  • Analyzed system failures, identified root causes, and recommended courses of action; documented system processes and procedures for future reference.
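
A minimal sketch of the daemon health-check script mentioned above; the daemon list and alert address are assumptions, and mail delivery is presumed already configured:

    #!/usr/bin/env bash
    # Alert if key Hadoop daemons are missing from the JVM process list
    for svc in NameNode DataNode ResourceManager NodeManager; do
      if ! jps | grep -qw "$svc"; then
        echo "$(date): $svc not running on $(hostname)" \
          | mail -s "Hadoop alert: $svc down" ops@example.com  # hypothetical address
      fi
    done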

Confidential, Santa Cruz, CA

Hadoop Admin

Responsibilities:

  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Retrieved data from HDFS into relational databases with Sqoop; parsed, cleansed, and mined useful data in HDFS using MapReduce for further analysis.
  • Fine-tuned Hive jobs for optimized performance.
  • Partitioned and queried the data in Hive for further analysis by the BI team.
  • Implemented Apache Impala for data processing on top of Hive.
  • Benchmarked the cluster with mechanisms such as TeraSort and TestDFSIO (see the sample commands after this list).
  • Involved in extracting the data from various sources into Hadoop HDFS for processing.
  • Used the RegEx, JSON, and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data, and implemented custom Hive UDFs.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented nodes on a CDH3 Hadoop cluster on Red Hat Linux.
  • Loaded data from the Linux file system into HDFS.
  • Imported weblogs from the web servers into HDFS using Flume.
  • Created HBase tables to store various formats of PII data coming from different portfolios.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Worked on tuning the performance of Pig queries.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Responsible for managing data coming from different sources.
  • Provided cluster coordination services through ZooKeeper.
  • Experience in managing and reviewing Hadoop log files.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
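
Illustrative benchmark invocations of the kind referenced above; jar locations vary by distribution and version, so the paths here are placeholders:

    # TeraGen/TeraSort: generate 10 million 100-byte rows, then sort them
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 10000000 /benchmarks/tera-in
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /benchmarks/tera-in /benchmarks/tera-out
    # TestDFSIO: measure HDFS write throughput with 10 files of 1000 MB each
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000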

Confidential

Linux/MySQL Administrator

Responsibilities:

  • Installed and configured Linux for new build environments.
  • Created volume groups, logical volumes, and partitions on the Linux servers and mounted file systems.
  • Deep understanding of monitoring and troubleshooting mission-critical Linux machines.
  • Improved system performance by working with the development team to analyze, identify, and resolve issues quickly.
  • Ensured data recovery by implementing system- and application-level backups.
  • Performed various configurations including networking, iptables, host name resolution, and SSH key-based login.
  • Managed disk file systems, server performance, user creation, file access permissions, and RAID configurations.
  • Automated administration tasks through scripting and job scheduling with cron.
  • Installed and maintained the Linux servers.
  • Monitored system metrics and logs for any problems.
  • Scheduled data backups via crontab (a sample entry appears after this list).
  • Added, removed, and updated user account information and reset passwords.
  • Used Java JDBC to load data into MySQL.
  • Maintained the MySQL server and granted database access to required users.
  • Created and managed logical volumes.
  • Installed and updated packages using YUM.
  • Supported pre-production and production support teams in analyzing critical services and assisted with maintenance operations.
  • Performed performance tuning for high-transaction, high-volume data in a mission-critical environment.
  • Set up MySQL alerting and thresholds (uptime, users, replication status, and alerts based on different queries).
  • Estimated MySQL database capacities and developed methods for monitoring database capacity and usage.
  • Developed and optimized the physical design of MySQL database systems.
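
A sample crontab entry for the nightly MySQL backups described above; the paths and credentials are placeholders (note that % must be escaped inside crontab):

    # Nightly logical backup of all databases at 02:00
    0 2 * * * /usr/bin/mysqldump --all-databases --single-transaction -u backup -p'secret' | gzip > /backups/mysql-$(date +\%F).sql.gz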

Confidential

Linux Administrator

Responsibilities:

  • Implemented a suite of Linux infrastructure services including DHCP, DNS, PXE, and NFS.
  • Evaluated new hardware, software and infrastructure solutions.
  • Provided 24x7 on-call rotation coverage and assistance for the team.
  • Made extensive use of LVM, creating volume groups and logical volumes (see the sketch after this list).
  • Performed RPM and YUM package installations, patching, and other server management.
  • Performed scheduled backup and necessary restoration.
  • Configured the Domain Name System (DNS) for hostname-to-IP resolution.
  • Troubleshot and fixed issues at the user, system, and network levels using various tools and utilities; scheduled backup jobs via cron during non-business hours.
  • Developed and maintained installation and configuration procedures.
  • Performed backup and restores in Linux environment.
  • Implemented system and maintenance tasks using shell scripts.
  • Designed virtual servers for testing purposes using VMware.
  • Developed data flow, Entity Relationship and data structure diagrams.
  • Added and configured devices such as hard disks.
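
A minimal LVM sketch of the volume work listed above; the device name, volume names, and sizes are placeholders:

    pvcreate /dev/sdb1                  # initialize the partition as a physical volume
    vgcreate datavg /dev/sdb1           # create a volume group backed by it
    lvcreate -L 50G -n datalv datavg    # carve out a 50 GB logical volume
    mkfs.ext4 /dev/datavg/datalv        # create a filesystem on the logical volume
    mount /dev/datavg/datalv /data      # mount it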
