Hadoop Engineer Resume
Minneapolis, MN
PROFESSIONAL SUMMARY:
- Over 7 years of administration experience, including 4+ years installing and configuring Hadoop ecosystem components in existing clusters.
- Experience in Hadoop administration (HDFS, MapReduce, Hive, Pig, Sqoop, Flume and Oozie) and NoSQL administration.
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS, Rackspace and OpenStack.
- Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Experience in installing Hadoop clusters using different distributions: Apache Hadoop, Cloudera and Hortonworks.
- Good experience in understanding clients' big data business requirements and transforming them into Hadoop-centric solutions.
- Analyzed clients' existing Hadoop infrastructure, identified performance bottlenecks and tuned performance accordingly.
- Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive (an example import appears after this list).
- Defined job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
- Experience in configuring Zookeeper to provide Cluster coordination services.
- Strong experience in writing custom UDFs in Java for Hive and Pig.
- Good experience in managing and reviewing Hadoop log files.
- Good experience in analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs using Java.
- Good working knowledge of extending Hive and Pig core functionality by writing custom UDFs.
- Loading logs from multiple sources directly into HDFS using tools like Flume.
- Good experience in performing minor and major upgrades.
- Experience in benchmarking and in performing backup and recovery of NameNode metadata and data residing in the cluster.
- Familiar with commissioning and decommissioning of nodes on a Hadoop cluster.
- Adept at configuring NameNode High Availability.
- Worked on disaster management with Hadoop clusters.
- Well experienced in building servers like DHCP, PXE with Kickstart, DNS and NFS and used them in building infrastructure in a Linux environment.
- Experienced in Linux Administration tasks like IP Management (IP Addressing, Subnetting, Ethernet Bonding and Static IP).
- Strong knowledge on Hadoop HDFS architecture and Map-Reduce framework.
- Experience in deploying and managing multi-node development, testing and production clusters.
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, and generating and managing a keytab file for each service using keytab tools.
- Worked on setting up NameNode high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
- Effective problem-solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines. Ability to learn and use new technologies quickly.
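For illustration, a minimal sketch of the kind of Sqoop import referenced above that lands a MySQL table in Hive; the host, schema, credentials and table names are hypothetical:

```sh
# Hypothetical connection details; -P prompts for the password instead of embedding it.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --split-by order_id \
  --num-mappers 4 \
  --hive-import \
  --hive-table staging.orders
```

An export back to the RDBMS takes the same shape with sqoop export and an --export-dir pointing at the HDFS data.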
TECHNICAL SKILLS:
Languages: Java, Python
Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper
Security: Kerberos
Cluster management tools: Cloudera Manager, Ambari, Ganglia, Nagios
Databases: Oracle, MySQL, SQL Server, Cassandra
Scripting/automation: Shell scripting, Puppet
Software Development Tool: Eclipse, NetBeans
Web Servers: Apache Tomcat
Operating Systems: Windows, Linux (Red Hat, CentOS)
Build Tools: Maven
PROFESSIONAL EXPERIENCE:
Confidential, Minneapolis, MN
Hadoop Engineer
Responsibilities:
- Managed a 300+ node HDP 2.2.4 cluster holding 14 petabytes of data using Ambari 2.0 on Linux CentOS 6.5.
- Installed and configured Hortonworks Ambari for easy management of existing Hadoop cluster.
- Responsible for the design and implementation of a multi-datacenter Hadoop environment intended to support the analysis of large amounts of unstructured data along with ETL processing.
- Coordinated with Hortonworks support team through support portal to sort out the critical issues during upgrades.
- Conducted root cause analysis (RCA) to find data issues and resolve production problems.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Extensive knowledge of Teradata performance tuning; successfully tuned many long-running queries.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to warning or failure conditions (a sketch appears after this list).
- Enabled Kerberos authentication for the Hadoop cluster and integrated it with Active Directory for managing users and application groups (see the keytab example after this list).
- Developed Sqoop jobs to extract data from RDBMS databases such as Oracle and Teradata.
- Loaded data from Teradata to HDFS using the Teradata Hadoop connectors.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities like Spark.
- Loaded Avro schemas into Hive tables and prepared shell scripts to execute Hadoop commands in a single run.
- Worked with big data developers, designers and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig and Sqoop.
- Experienced in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Worked on design and implementation, configuration, performance tuning of Hortonworks HDP 2.3 Cluster with High Availability and Ambari 2.2.
- Analyzed server logs for errors and exceptions; scheduled Jenkins job builds and monitored console outputs.
- Used Agile/scrum Environment and used Jenkins, GitHub for Continuous Integration and Deployment.
- Used JIRA and ServiceNow to track issues on the big data platform.
- Experienced in managing and reviewing Hadoop log files.
- Configured Jenkins for successful deployment to test and production environments.
- Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive.
- Worked on setting up high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
- Worked on HBase high availability and verified it manually with failover tests.
- Created queues and allocated cluster resources to set job priorities.
- Created and maintained MySQL databases, set up users, and backed up cluster metadata databases with cron jobs.
- Provided technical assistance for configuration, administration and monitoring of Hadoop clusters.
- Coordinated with technical teams for installation of Hadoop and related third-party applications on systems.
- Supported technical team members for automation, installation and configuration tasks.
- Suggested improvement processes for all process automation scripts and tasks.
- Assisted in the design, development and architecture of Hadoop and HBase systems.
- Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
- Responsible for cluster maintenance, monitoring, troubleshooting, tuning, and commissioning and decommissioning of nodes.
- Responsible for cluster availability and provided on-call support.
- Analyzed system failures, identified root causes and recommended courses of action; documented system processes and procedures for future reference.
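A minimal sketch of the daemon health-check scripting described in this role; the daemon list and the alert mailbox are assumptions:

```sh
#!/bin/bash
# Check that the expected Hadoop daemons are running on this node; mail an alert if not.
# The daemon names and ops-team address below are hypothetical placeholders.
for svc in NameNode ResourceManager; do
    if ! jps | grep -qw "$svc"; then
        echo "$(date): $svc is not running on $(hostname)" \
            | mail -s "Hadoop daemon alert: $svc" ops-team@example.com
    fi
done
```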
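And a sketch of the keytab workflow behind the Kerberos setup, assuming a hypothetical EXAMPLE.COM realm and host:

```sh
# Create a service principal with a random key, export it to a keytab, then verify.
kadmin -p admin/admin@EXAMPLE.COM -q "addprinc -randkey nn/master01.example.com@EXAMPLE.COM"
kadmin -p admin/admin@EXAMPLE.COM -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/master01.example.com@EXAMPLE.COM"
klist -kt /etc/security/keytabs/nn.service.keytab   # list the keytab entries to confirm
```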
Confidential, Santa Cruz, CA
Hadoop Admin
Responsibilities:
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Analyzed system failures, identified root causes and recommended courses of action.
- Retrieved data from HDFS into relational databases with Sqoop; parsed, cleansed and mined useful, meaningful data in HDFS using MapReduce for further analysis.
- Fine-tuned Hive jobs for optimized performance.
- Partitioned and queried the data in Hive for further analysis by the BI team.
- Implemented Apache Impala for data processing on top of Hive.
- Benchmarked the cluster using mechanisms like TeraSort and TestDFSIO (example runs appear after this list).
- Involved in extracting the data from various sources into Hadoop HDFS for processing.
- Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization to parse the contents of streamed log data, and implemented custom Hive UDFs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented nodes on a CDH3 Hadoop cluster on Red Hat Linux.
- Involved in loading data from the Linux file system to HDFS.
- Imported weblogs from the web servers into HDFS using Flume (a sample agent configuration follows this list).
- Created HBase tables to store various formats of PII data coming from different portfolios.
- Implemented test scripts to support test-driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Responsible for managing data coming from different sources.
- Involved in loading data from the UNIX file system to HDFS.
- Provided cluster coordination services through ZooKeeper.
- Experience in managing and reviewing Hadoop log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
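Representative runs of the TeraSort and TestDFSIO benchmarks mentioned above; jar locations vary by distribution, and the paths and sizes here are illustrative:

```sh
# Generate ~1 TB of input (10 billion 100-byte rows), sort it, then validate the result.
EXAMPLES=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
hadoop jar "$EXAMPLES" teragen 10000000000 /benchmarks/terasort-input
hadoop jar "$EXAMPLES" terasort /benchmarks/terasort-input /benchmarks/terasort-output
hadoop jar "$EXAMPLES" teravalidate /benchmarks/terasort-output /benchmarks/terasort-validate

# HDFS I/O throughput: write, then read, 10 files of 1000 MB each.
TESTS=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar
hadoop jar "$TESTS" TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar "$TESTS" TestDFSIO -read -nrFiles 10 -fileSize 1000
```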
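A sketch of the kind of Flume agent used for the weblog ingestion above; the agent name, log path and HDFS directory are hypothetical:

```sh
# Minimal Flume config: tail an Apache access log into dated HDFS directories.
cat > /etc/flume/conf/weblog-agent.conf <<'EOF'
agent1.sources  = weblogs
agent1.channels = mem1
agent1.sinks    = hdfs1
agent1.sources.weblogs.type = exec
agent1.sources.weblogs.command = tail -F /var/log/httpd/access_log
agent1.sources.weblogs.channels = mem1
agent1.channels.mem1.type = memory
agent1.channels.mem1.capacity = 10000
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.hdfs.path = /data/weblogs/%Y-%m-%d
agent1.sinks.hdfs1.hdfs.fileType = DataStream
agent1.sinks.hdfs1.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs1.channel = mem1
EOF
# Start the agent against that config.
flume-ng agent --name agent1 --conf /etc/flume/conf --conf-file /etc/flume/conf/weblog-agent.conf
```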
Confidential
Linux/MySQL Administrator
Responsibilities:
- Installation and configuration of Linux for new build environment.
- Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems.
- Deep understanding of monitoring and troubleshooting mission-critical Linux machines.
- Improve system performance by working with the development team to analyze, identify and resolve issues quickly.
- Ensured data recovery by implementing system and application level backups.
- Performed various configurations including networking, iptables rules, hostname resolution and passwordless SSH login.
- Managed disk file systems, server performance, user creation, file access permissions and RAID configurations.
- Automated administration tasks through scripting and job scheduling with cron.
- Installed and maintained Linux servers.
- Monitored system metrics and logs for any problems.
- Ran cron jobs to back up data (see the backup sketch after this list).
- Adding, removing, or updating user account information, resetting passwords, etc.
- Used Java JDBC to load data into MySQL.
- Maintained the MySQL server and managed authentication for required database users.
- Created and managed logical volumes.
- Installing and updating packages using YUM.
- Supported pre-production and production support teams in analyzing critical services and assisted with maintenance operations.
- Performed performance tuning for high-transaction, high-volume data in mission-critical environments.
- Set up alerting and thresholds for MySQL (uptime, users, replication status, and alerts based on specific queries).
- Estimated MySQL database capacities and developed methods for monitoring database capacity and usage.
- Developed and optimized the physical design of MySQL database systems.
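As an illustration of the cron-driven backups above, a minimal nightly mysqldump sketch; the paths and credentials file are hypothetical:

```sh
#!/bin/bash
# Nightly logical backup of all databases, compressed, pruned after 14 days.
# Crontab entry (01:30 daily):  30 1 * * * /opt/scripts/mysql_backup.sh
BACKUP_DIR=/backup/mysql
mysqldump --defaults-extra-file=/root/.my.backup.cnf \
    --single-transaction --all-databases \
    | gzip > "$BACKUP_DIR/all-databases-$(date +%F).sql.gz"
find "$BACKUP_DIR" -name '*.sql.gz' -mtime +14 -delete
```

--single-transaction gives a consistent snapshot for InnoDB tables without locking the server for the duration of the dump.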
Confidential
Linux Administrator
Responsibilities:
- Implemented different Linux infrastructure services such as DHCP, DNS, PXE and NFS.
- Evaluated new hardware, software and infrastructure solutions.
- Provided 24x7 on-call rotation coverage and assistance for the team.
- Made extensive use of LVM, creating volume groups and logical volumes (example commands appear after this list).
- Performed RPM and YUM package installations, patching and other server management.
- Performed scheduled backup and necessary restoration.
- Configured Domain Name System (DNS) for hostname-to-IP resolution.
- Troubleshot and fixed issues at the user, system and network level using various tools and utilities; scheduled backup jobs with cron during non-business hours.
- Developed and maintained installation and configuration procedures.
- Performed backup and restores in Linux environment.
- Implemented system and maintenance tasks using shell scripts.
- Designed virtual servers for testing purposes using VMware.
- Developed data flow, Entity Relationship and data structure diagrams.
- Worked on adding and configuring devices such as hard disks.
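Typical LVM commands behind the volume-management work above; device names, sizes and the mount point are illustrative:

```sh
pvcreate /dev/sdb1                    # initialize the partition as an LVM physical volume
vgcreate vg_data /dev/sdb1            # create a volume group on it
lvcreate -L 50G -n lv_app vg_data     # carve out a 50 GB logical volume
mkfs.ext4 /dev/vg_data/lv_app         # build a filesystem on the new volume
mkdir -p /app
mount /dev/vg_data/lv_app /app        # mount it (add an /etc/fstab entry to persist)
```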
