Hadoop Engineer Resume
Minneapolis, MN
PROFESSIONAL SUMMARY:
- Over 7 years of administration experience, including 4+ years installing and configuring Hadoop ecosystem components in existing clusters.
- Experience in Hadoop administration (HDFS, MapReduce, Hive, Pig, Sqoop, Flume and Oozie) and NoSQL administration.
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS, Rackspace and OpenStack.
- Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Experience in installing Hadoop clusters using different distributions: Apache Hadoop, Cloudera and Hortonworks.
- Good experience in understanding clients' big data business requirements and transforming them into Hadoop-centric solutions.
- Analyzed clients' existing Hadoop infrastructure, identified performance bottlenecks and tuned performance accordingly.
- Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive (an example import appears after this list).
- Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
- Experience in configuring Zookeeper to provide Cluster coordination services.
- Strong experience in writing custom UDFs in Java for Hive and Pig.
- Good experience in managing and reviewing Hadoop log files.
- Good experience in analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs using Java.
- Good working knowledge of extending Hive and Pig core functionality by writing custom UDFs.
- Loading logs from multiple sources directly into HDFS using tools like Flume.
- Good experience in performing minor and major upgrades.
- Experience in benchmarking and in performing backup and recovery of NameNode metadata and data residing in the cluster.
- Familiar with commissioning and decommissioning of nodes on a Hadoop cluster.
- Adept at configuring NameNode High Availability.
- Worked on disaster management with Hadoop clusters.
- Well experienced in building servers like DHCP, PXE with Kickstart, DNS and NFS and used them in building infrastructure in a Linux environment.
- Experienced in Linux Administration tasks like IP Management (IP Addressing, Subnetting, Ethernet Bonding and Static IP).
- Strong knowledge on Hadoop HDFS architecture and Map-Reduce framework.
- Experience in deploying and managing multi-node development, testing and production clusters.
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, and generating and managing a keytab file for each service using keytab tools.
- Worked on setting up NameNode high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
- Effective problem-solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines. Ability to learn and use new technologies quickly.
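For illustration, a minimal sketch of the kind of Sqoop import referenced above that lands a MySQL table in Hive; the host, schema, credentials and table names are hypothetical:

```sh
# Hypothetical connection details; -P prompts for the password instead of embedding it.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --split-by order_id \
  --num-mappers 4 \
  --hive-import \
  --hive-table staging.orders
```

An export back to the RDBMS takes the same shape with sqoop export and an --export-dir pointing at the HDFS data.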
TECHNICAL SKILLS:
Languages: Java, Python
Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper
Security: Kerberos
Cluster management tools: Cloudera Manager, Ambari, Ganglia, Nagios
Databases: Oracle, MySQL, SQL Server, Cassandra
Scripting/automation: Shell scripting, Puppet
Software Development Tool: Eclipse, NetBeans
Web Servers: Apache Tomcat
Operating Systems: Windows, Linux (Red Hat, CentOS)
Build Tools: Maven
PROFESSIONAL EXPERIENCE:
Confidential, Minneapolis, MN
Hadoop Engineer
Responsibilities:
- Managed a 300+ node HDP 2.2.4 cluster holding 14 petabytes of data using Ambari 2.0 on Linux CentOS 6.5.
- Installed and configured Hortonworks Ambari for easy management of existing Hadoop cluster.
- Responsible for the design and implementation of a multi-datacenter Hadoop environment intended to support the analysis of large amounts of unstructured data along with ETL processing.
- Coordinated with Hortonworks support team through support portal to sort out the critical issues during upgrades.
- Conducted root cause analysis (RCA) to find data issues and resolve production problems.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Extensive knowledge of Teradata performance tuning; successfully tuned many long-running queries.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions (a sketch appears after this list).
- Enabled Kerberos authentication for the Hadoop cluster and integrated it with Active Directory for managing users and application groups (see the keytab example after this list).
- Developed Sqoop jobs to extract data from RDBMS databases such as Oracle and Teradata.
- Loaded data from Teradata to HDFS using the Teradata Hadoop connectors.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities like Spark.
- Loaded Avro schemas into Hive tables and prepared shell scripts to execute Hadoop commands in a single run.
- Worked with big data developers, designers and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig and Sqoop.
- Experienced in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Worked on design and implementation, configuration, performance tuning of Hortonworks HDP 2.3 Cluster with High Availability and Ambari 2.2.
- Analyzed server logs for errors and exceptions; scheduled Jenkins job builds and monitored console outputs.
- Used Agile/scrum Environment and used Jenkins, GitHub for Continuous Integration and Deployment.
- Used JIRA and ServiceNow to track issues on the big data platform.
- Experienced in managing and reviewing Hadoop log files.
- Configured Jenkins for successful deployment to test and production environments.
- Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive.
- Worked on setting up high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
- Worked on HBase high availability and verified it manually with failover tests.
- Created queues and allocated cluster resources to set job priorities.
- Created and maintained MySQL databases, set up users, and backed up cluster metadata databases with cron jobs.
- Provided technical assistance for configuration, administration and monitoring of Hadoop clusters.
- Coordinated with technical teams for installation of Hadoop and related third-party applications on systems.
- Supported technical team members for automation, installation and configuration tasks.
- Suggested improvement processes for all process automation scripts and tasks.
- Assisted in the design, development and architecture of Hadoop and HBase systems.
- Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
- Responsible for cluster maintenance, monitoring, troubleshooting, tuning, and commissioning and decommissioning of nodes.
- Responsible for cluster availability and provided on-call support.
- Analyzed system failures, identified root causes and recommended courses of action; documented system processes and procedures for future reference.
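A minimal sketch of the daemon health-check scripting described in this role; the daemon list and the alert mailbox are assumptions:

```sh
#!/bin/bash
# Check that the expected Hadoop daemons are running on this node; mail an alert if not.
# The daemon names and ops-team address below are hypothetical placeholders.
for svc in NameNode ResourceManager; do
    if ! jps | grep -qw "$svc"; then
        echo "$(date): $svc is not running on $(hostname)" \
            | mail -s "Hadoop daemon alert: $svc" ops-team@example.com
    fi
done
```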
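And a sketch of the keytab workflow behind the Kerberos setup, assuming a hypothetical EXAMPLE.COM realm and host:

```sh
# Create a service principal with a random key, export it to a keytab, then verify.
kadmin -p admin/admin@EXAMPLE.COM -q "addprinc -randkey nn/master01.example.com@EXAMPLE.COM"
kadmin -p admin/admin@EXAMPLE.COM -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/master01.example.com@EXAMPLE.COM"
klist -kt /etc/security/keytabs/nn.service.keytab   # list the keytab entries to confirm
```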
Confidential, Santa Cruz, CA
Hadoop Admin
Responsibilities:
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Analyzed system failures, identified root causes and recommended courses of action.
- Retrieved data from HDFS into relational databases with Sqoop; parsed, cleansed and mined useful, meaningful data in HDFS using MapReduce for further analysis.
- Fine-tuned Hive jobs for optimized performance.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Implemented Apache Impala for data processing on top of Hive.
- Benchmarked the cluster using mechanisms like TeraSort and TestDFSIO (example runs appear after this list).
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data, and implemented custom Hive UDFs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented nodes on a CDH3 Hadoop cluster on Red Hat Linux.
- Involved in loading data from the Linux file system to HDFS.
- Imported weblogs from the web servers into HDFS using Flume (a sample agent configuration follows this list).
- Created HBase tables to store various formats of PII data coming from different portfolios.
- Implemented test scripts to support test-driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Responsible for managing data coming from different sources.
- Involved in loading data from the UNIX file system to HDFS.
- Provided cluster coordination services through ZooKeeper.
- Experience in managing and reviewing Hadoop log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
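Representative runs of the TeraSort and TestDFSIO benchmarks mentioned above; jar locations vary by distribution, and the paths and sizes here are illustrative:

```sh
# Generate ~1 TB of input (10 billion 100-byte rows), sort it, then validate the result.
EXAMPLES=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
hadoop jar "$EXAMPLES" teragen 10000000000 /benchmarks/terasort-input
hadoop jar "$EXAMPLES" terasort /benchmarks/terasort-input /benchmarks/terasort-output
hadoop jar "$EXAMPLES" teravalidate /benchmarks/terasort-output /benchmarks/terasort-validate

# HDFS I/O throughput: write, then read, 10 files of 1000 MB each.
TESTS=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar
hadoop jar "$TESTS" TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar "$TESTS" TestDFSIO -read -nrFiles 10 -fileSize 1000
```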
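A sketch of the kind of Flume agent used for the weblog ingestion above; the agent name, log path and HDFS directory are hypothetical:

```sh
# Minimal Flume config: tail an Apache access log into dated HDFS directories.
cat > /etc/flume/conf/weblog-agent.conf <<'EOF'
agent1.sources  = weblogs
agent1.channels = mem1
agent1.sinks    = hdfs1
agent1.sources.weblogs.type = exec
agent1.sources.weblogs.command = tail -F /var/log/httpd/access_log
agent1.sources.weblogs.channels = mem1
agent1.channels.mem1.type = memory
agent1.channels.mem1.capacity = 10000
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.hdfs.path = /data/weblogs/%Y-%m-%d
agent1.sinks.hdfs1.hdfs.fileType = DataStream
agent1.sinks.hdfs1.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs1.channel = mem1
EOF
# Start the agent against that config.
flume-ng agent --name agent1 --conf /etc/flume/conf --conf-file /etc/flume/conf/weblog-agent.conf
```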
Confidential
Linux/MySQL Administrator
Responsibilities:
- Installation and configuration of Linux for new build environment.
- Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems.
- Deep understanding of monitoring and troubleshooting mission-critical Linux machines.
- Improve system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recovery by implementing system and application level backups.
- Performed various configurations including networking, iptables rules, hostname resolution and passwordless SSH login.
- Managed disk file systems, server performance, user creation, file access permissions and RAID configurations.
- Automated administration tasks through scripting and job scheduling with cron.
- Installed and maintained Linux servers.
- Monitored system metrics and logs for any problems.
- Ran cron jobs to back up data (see the backup sketch after this list).
- Adding, removing, or updating user account information, resetting passwords, etc.
- Used Java JDBC to load data into MySQL.
- Maintained the MySQL server and managed authentication for required database users.
- Created and managed logical volumes.
- Installing and updating packages using YUM.
- Supported pre-production and production support teams in analyzing critical services and assisted with maintenance operations.
- Performed performance tuning for high-transaction, high-volume data in mission-critical environments.
- Set up alerting and thresholds for MySQL (uptime, users, replication status, and alerts based on specific queries).
- Estimated MySQL database capacities and developed methods for monitoring database capacity and usage.
- Developed and optimized the physical design of MySQL database systems.
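As an illustration of the cron-driven backups above, a minimal nightly mysqldump sketch; the paths and credentials file are hypothetical:

```sh
#!/bin/bash
# Nightly logical backup of all databases, compressed, pruned after 14 days.
# Crontab entry (01:30 daily):  30 1 * * * /opt/scripts/mysql_backup.sh
BACKUP_DIR=/backup/mysql
mysqldump --defaults-extra-file=/root/.my.backup.cnf \
    --single-transaction --all-databases \
    | gzip > "$BACKUP_DIR/all-databases-$(date +%F).sql.gz"
find "$BACKUP_DIR" -name '*.sql.gz' -mtime +14 -delete
```

--single-transaction gives a consistent snapshot for InnoDB tables without locking the server for the duration of the dump.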
Confidential
Linux Administrator
Responsibilities:
- Implemented different Linux infrastructure services such as DHCP, DNS, PXE and NFS.
- Evaluated new hardware, software and infrastructure solutions.
- Provided 24x7 on-call rotation coverage and assistance for the team.
- Made extensive use of LVM, creating volume groups and logical volumes (example commands appear after this list).
- Performed RPM and YUM package installations, patching and other server management.
- Performed scheduled backup and necessary restoration.
- Configured Domain Name System (DNS) for hostname-to-IP resolution.
- Troubleshot and fixed issues at the user, system and network level using various tools and utilities; scheduled backup jobs with cron during non-business hours.
- Developed and maintained installation and configuration procedures.
- Performed backup and restores in Linux environment.
- Implemented system and maintenance tasks using shell scripts.
- Designed virtual servers for testing purposes using VMware.
- Developed data flow, Entity Relationship and data structure diagrams.
- Worked on adding and configuring devices such as hard disks.
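Typical LVM commands behind the volume-management work above; device names, sizes and the mount point are illustrative:

```sh
pvcreate /dev/sdb1                    # initialize the partition as an LVM physical volume
vgcreate vg_data /dev/sdb1            # create a volume group on it
lvcreate -L 50G -n lv_app vg_data     # carve out a 50 GB logical volume
mkfs.ext4 /dev/vg_data/lv_app         # build a filesystem on the new volume
mkdir -p /app
mount /dev/vg_data/lv_app /app        # mount it (add an /etc/fstab entry to persist)
```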
