
Senior Hadoop Administrator Resume


NJ

SUMMARY:

  • 7+ years of overall experience in Systems Administration and Enterprise Application Development across diverse industries, including expertise in Big Data ecosystem technologies.
  • 4 years of comprehensive experience as a Big Data & Analytics Administrator.
  • Experience with MapReduce programs on Apache Hadoop for processing Big Data.
  • Experience in installing, configuring, supporting and monitoring Hadoop clusters using HDFS.
  • Experience in importing and exporting data between HDFS and Relational Database Systems using Sqoop (a sample command sketch follows this summary).
  • Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Experience in designing, developing and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
  • Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
  • Experience in administering Hadoop ecosystem components such as MapReduce, Pig, Hive, HAWQ, Spring XD and Sqoop, as well as setting up Hadoop clusters with Isilon.
  • Optimized the utilization of data systems and improved the efficiency of security and storage solutions by controlling and preventing loss of sensitive data.
  • Experience in planning, designing, deploying, fine-tuning and administering large-scale production Hadoop clusters.
  • Experience in setting up, configuring and administering Hadoop clusters on the Pivotal distribution, including fine-tuning and benchmarking.
  • Experience in deploying Hadoop clusters in public and private cloud environments such as Amazon AWS.
  • Exposure to key Hadoop performance-tuning properties, RPC ports and daemon addresses.
  • Defining job flows in Hadoop environment using tools like Oozie for data scrubbing and processing.
  • Good knowledge of the MapReduce framework, including the MR daemons, the sort and shuffle phases, and task execution.
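
To illustrate the Sqoop transfers noted above, a minimal sketch follows; the JDBC URL, credentials, table names and HDFS paths are illustrative placeholders rather than values from any actual engagement.

    # Import a table from a relational database into HDFS
    # (connection string, credentials and paths are illustrative)
    sqoop import \
      --connect jdbc:oracle:thin:@dbhost:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMERS \
      --target-dir /data/staging/customers \
      --num-mappers 4

    # Export processed results from HDFS back into the database
    sqoop export \
      --connect jdbc:oracle:thin:@dbhost:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMER_SUMMARY \
      --export-dir /data/output/customer_summary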

TECHNICAL SKILLS:

Big Data Technologies: Apache Hadoop, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Oozie

Languages: Core Java, J2EE, SQL, PL/SQL, Unix Shell Scripting

Web Technologies: JSP, EJB 2.0, JNDI, JMS, JDBC, HTML, JavaScript

Web/Application servers: Tomcat 6.0/5.0/4.0, JBoss 5.1.0

Databases: Oracle 11G/10G, SQL Server, DB2, Sybase, Teradata

Operating Systems: MS-DOS, Windows XP, Windows 7, UNIX and Linux

IDE: IntelliJ IDEA 7.2, EditPlus 3, Eclipse 3.5, NetBeans 6.5, TOAD, PL/SQL Developer, Teradata SQL Assistant

Frameworks: Hadoop MapReduce, MVC, Struts 2.x/1.x

Version Control: VSS (Visual SourceSafe), Subversion, CVS

Testing Technologies: JUnit 4/3.8

Office Packages: MS-Office 2010, 2007, 2003 and Visio

Business Intelligence: Business Object XI 3.1, Cognos 8.4

PROFESSIONAL EXPERIENCE:

Senior Hadoop Administrator

Confidential, NJ 

Responsibilities:

  • Responsible for cluster maintenance: adding and removing cluster nodes, monitoring and troubleshooting the cluster, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Played a key role in deciding the hardware configuration for the cluster, along with other teams in the company.
  • Resolved submitted tickets and P1 issues; troubleshot, documented and resolved errors.
  • Added new DataNodes when needed and ran the HDFS balancer (see the command sketch after this list).
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
  • Performed major and minor upgrades to the Hadoop cluster.
  • Performed stress testing, performance testing and benchmarking of the cluster.
  • Worked closely with both internal and external cyber security customers.
  • Contributed to a research effort to tightly integrate Hadoop and HPC systems.
  • Compared Hadoop to commercial big-data appliances from Netezza, XtremeData and LexisNexis. Published and presented results.
  • Deployed and administered a 70-node Hadoop cluster, as well as two smaller clusters.
  • Developed Linux scripts for job automation.
  • Developed machine-learning capability via Apache Mahout.
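
A minimal sketch of the node-addition and rebalancing routine described above, using Hadoop 1.x-era commands; the threshold and file locations are illustrative.

    # On the new DataNode, after deploying the cluster configs:
    hadoop-daemon.sh start datanode

    # On the NameNode, after adding the host to the dfs.hosts include file:
    hadoop dfsadmin -refreshNodes

    # Redistribute blocks so no DataNode deviates more than 10% from average utilization
    hadoop balancer -threshold 10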

Environment: Hadoop, MapReduce, HDFS, Hive, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, Tomcat 6.

Senior Hadoop Administrator

Confidential, NY

Responsibilities:

  • Involved in a Hadoop implementation project covering, among other areas, Hadoop cluster management, writing MapReduce programs and Hive queries (HQL), and using Flume to analyze log files.
  • Installation, configuration and management of the BigData 2.x cluster (Hortonworks Data Platform 1.3/2.2).
  • Played a key role in the development of an ingestion framework based on Oozie and another framework using Java.
  • Developed Data Quality checks to match the ingested data with source in RDBMS using Hive.
  • Performed a POC for data ingestion using ETL tools including Talend, DataStage and Toad DataPoint.
  • Created custom component detection and monitoring using ZooKeeper APIs.
  • Supported the BigData 1.x cluster (HDP 1.3), resolving issues related to jobs and cluster utilization.
  • Deep understanding and related experience with Hadoop, HBase, Hive, and YARN/MapReduce.
  • Enabled High Availability for the NameNode and set up a fencing mechanism to guard against split-brain scenarios.
  • Enabled High Availability for the ResourceManager and several ecosystem components, including HiveServer2, the Hive Metastore, and HBase.
  • Configured YARN queues based on the Capacity Scheduler for resource management.
  • Configured YARN node labels to isolate resources at the node level, separating nodes dedicated to YARN applications from those dedicated to HBase.
  • Configured CGroups to collect CPU utilization stats.
  • Set up BucketCache on HBase-specific slave nodes for improved performance.
  • Set up rack awareness for the BigData cluster and a rack-topology script for improved fault tolerance (a script sketch follows this list).
  • Set up HDFS ACLs to restrict or enable access to HDFS data (example commands follow this list).
  • Performed performance evaluation of Hadoop/YARN using TestDFSIO and TeraSort (sample invocations follow this list).
  • Performed performance evaluation of Hive 0.14 with Tez using Hive TestBench.
  • Configured Talend, DataStage and Toad DataPoint for ETL activities on Hadoop/Hive databases.
  • Backed up HBase data and HDFS using HDFS snapshots and evaluated the performance overhead.
  • Created several recipes for automation of configuration parameters/scripts using Chef.
  • Managed and configured the log-file retention period for all services across the cluster.
  • Involved in development of ETL processes with Hadoop, YARN and Hive.
  • Developed Hadoop monitoring processes (capacity, performance, consistency) to assure processing issues are identified and resolved swiftly.
  • Coordinated with the Operations/L2 team for knowledge transfer.
  • Set up HDFS quotas and replication factors for user/group directories to keep disk usage under control.
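
A minimal sketch of the rack-topology script mentioned above, assuming a flat mapping file of "host rack" pairs; the file path, rack names and format are illustrative assumptions.

    #!/bin/bash
    # rack-topology.sh - the NameNode invokes this with one or more host/IP
    # arguments and expects one rack path per argument on stdout.
    # Assumed mapping file format: "<host-or-ip> <rack>", one entry per line.
    MAP_FILE=/etc/hadoop/conf/rack-map.txt
    DEFAULT_RACK=/default-rack

    for host in "$@"; do
      rack=$(awk -v h="$host" '$1 == h {print $2}' "$MAP_FILE")
      echo "${rack:-$DEFAULT_RACK}"   # fall back if the host is unmapped
    done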
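
Example commands for the HDFS ACL, quota and snapshot work listed above; the paths, group names and quota values are illustrative, not the production settings.

    # Grant a group read/execute access to a directory via HDFS ACLs
    hdfs dfs -setfacl -m group:analysts:r-x /data/curated

    # Cap a user directory by space and name count, and lower its replication factor
    hdfs dfsadmin -setSpaceQuota 500g /user/etl_user
    hdfs dfsadmin -setQuota 1000000 /user/etl_user
    hdfs dfs -setrep -R 2 /user/etl_user

    # Enable and take an HDFS snapshot (used here for HBase/HDFS backups)
    hdfs dfsadmin -allowSnapshot /apps/hbase/data
    hdfs dfs -createSnapshot /apps/hbase/data pre_upgrade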
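
Sample invocations for the TestDFSIO and TeraSort evaluations above; the jar paths follow the HDP layout (they vary by distribution and version) and the sizes are illustrative.

    # Write/read throughput benchmark with TestDFSIO
    hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar \
      TestDFSIO -write -nrFiles 10 -fileSize 1GB
    hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar \
      TestDFSIO -read -nrFiles 10 -fileSize 1GB

    # TeraSort: generate, sort and validate 100 GB (10^9 rows of 100 bytes)
    hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 1000000000 /bench/teragen
    hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar terasort /bench/teragen /bench/terasort
    hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teravalidate /bench/terasort /bench/report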

Environment: Hadoop 1.2.1, MapReduce, Hive 0.10.0, Pig 0.11.1, Oozie 3.3.0, HBase 0.94.11, Sqoop 1.4.4, Flume 1.4.0, Java, SQL, PL/SQL, Oracle 10g, Eclipse

Hadoop Administrator

Confidential, Yonkers, NY 

Responsibilities:

  • Build/deploy/configure/maintain multiple Hadoop clusters in production, staging, and development environments that process over 2 TB of events per day.
  • Build/deploy/configure/maintain multiple real-time clusters consisting of Apache products: Flume, Storm, Mesos, Spark, and Kafka.
  • Created scripts to form EC2 clusters for training and for processing.
  • Analyzed the data by performing Hive queries and running Pig scripts to know user behavior like frequency of calls, top calling customers.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Created Cassandra Advanced Data Modeling course for DataStax.
  • Secured production environments by setting up and deploying Kerberos in the Hadoop cluster.
  • Support application administrators, developers, and users. Release and support real-time products from development through production.
  • Participate in various Proof of Concept projects to evaluate new technologies for data collection and analysis.
  • Identify necessary alerts and remediation process for new components.
  • Assist in the capacity planning process.
  • Analyze and troubleshoot performance bottlenecks.
  • Serve as team facilitator for ancillary operations (Slack, PagerDuty, Jira, Change Control)
  • Imported data from MySQL server to HDFS using Sqoop.
  • Managed the day-to-day operations of the cluster for backup and support.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs (see the Oozie CLI sketch after this list).
  • Worked along with the Hadoop Operations team in Hadoop cluster planning, installation, maintenance, monitoring and upgrades.
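
A minimal sketch of driving the interdependent jobs above through the Oozie CLI; the Oozie URL, properties file and job ID are illustrative placeholders.

    # Submit and start a workflow defined in job.properties
    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run

    # Check status and logs of a workflow (the job ID below is a placeholder)
    oozie job -oozie http://oozie-host:11000/oozie -info 0000012-160101000000000-oozie-oozi-W
    oozie job -oozie http://oozie-host:11000/oozie -log 0000012-160101000000000-oozie-oozi-W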

Environment: Ambari, HBase, Hive, Pig, Sqoop, Apache Ranger, Splunk, YARN, Apache Oozie workflow scheduler, Flume, ZooKeeper, RegEx, JSON.

Hadoop Administrator

Confidential, Buffalo, NY

Responsibilities:

  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (a sample query follows this list).
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
  • Shared responsibility for administration of Hadoop, Hive and Pig.
  • Installed and configured MapReduce, Hive and HDFS; implemented a CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Supported code/design analysis, strategy development and project planning.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Assisted with data capacity planning and node forecasting.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Administered Pig, Hive and HBase, installing updates, patches and upgrades.
  • Performed both major and minor upgrades to the existing CDH cluster.
  • Upgraded the Hadoop cluster from CDH3 to CDH4.
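
A sketch of the kind of trend-spotting query described above, run through the CDH3-era Hive CLI; the table and column names are invented for illustration and do not reflect the actual EDW schema.

    hive -e "
      SELECT f.product_id,
             f.weekly_sales,
             h.avg_weekly_sales,
             (f.weekly_sales - h.avg_weekly_sales) / h.avg_weekly_sales AS pct_change
      FROM fresh_sales f
      JOIN edw_reference h ON f.product_id = h.product_id
      ORDER BY pct_change DESC
      LIMIT 20;"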

Environment: Hadoop, Hive, MapReduce, Amazon Web Services (AWS), NoSQL, HDFS, Java, UNIX, Red Hat and CentOS.

Linux Administrator

Confidential

Responsibilities:

  • Installed and maintained the Linux servers.
  • Installed CentOS using the PXE (Pre-Execution Environment) boot and Kickstart methods on multiple servers; updated system OS and application software according to requirements.
  • Set up security for users, reset user passwords, and locked/unlocked user accounts.
  • Monitored System Metrics and logs for any problems.
  • Ran crontab jobs to back up MySQL data (a sample entry follows this list).
  • Involved in adding, removing, or updating user account information, resetting passwords.
  • Configured the sendmail configuration file, created e-mail IDs, and created the alias database.
  • Used heterogeneous backup software for Windows and UNIX to back up and retrieve files.
  • Worked on management of heterogeneous clients and servers.
  • Maintained the RDBMS server and managed authentication for required database users.
  • Provided technical support for issues via helpdesk tickets and the telephone.
  • Took backups at regular intervals and maintained a solid disaster recovery plan.
  • Documented and maintained server, network, and support documentation, including application diagrams.
  • Planned storage and backup, including studying disk space requirements and backup device performance.
  • Responsible for issuing Standard Operating Procedures to the incident management team.
  • Responsible for handling escalated incidents and High Availability environment incident management on HP-UX, Linux, and Solaris platforms.
  • Performed problem management on the HP-UX platform and Root Cause Analysis using quality tools.
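
A sample crontab entry for the MySQL backup jobs mentioned above; the schedule, credentials file and backup path are illustrative assumptions.

    # Nightly logical backup at 02:30, compressed and date-stamped
    # (% must be escaped as \% inside crontab entries)
    30 2 * * * mysqldump --defaults-extra-file=/root/.my.cnf --all-databases \
      | gzip > /backup/mysql/all-dbs-$(date +\%F).sql.gz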

Environment: Hadoop, MapReduce, Hive, Pig, Oozie, HBase, Sqoop, Flume, Java, SQL, Eclipse, UNIX Script, BO, YARN

Linux/Unix Administrator

Confidential

Responsibilities:

  • Built Linux servers; upgraded and patched existing servers; compiled, built, and upgraded the Linux kernel.
  • Set up a Solaris custom JumpStart server and clients and implemented JumpStart installations.
  • Worked with Telnet and rlogin to interoperate between hosts.
  • Conducted various systems administration work under CentOS and Red Hat Linux environments.
  • Performed regular day-to-day system administration tasks including user management, backup, network management, and software management, with documentation.
  • Recommended system configurations for clients based on estimated requirements.
  • Performed reorganization of disk partitions and file systems, hard disk additions, and memory upgrades.
  • Monitored system activities, log maintenance, and disk space management.
  • Encapsulated root file systems and mirrored them to ensure systems had redundant boot disks (a command sketch follows this list).
  • Administered Apache servers and published clients' web sites on our Apache server.
  • Fixed system problems based on system email notifications and user complaints.
  • Upgraded software, added patches, and added new hardware to UNIX machines.
  • Configuration management: performed CMDB audits and regular CMDB updates.
  • Performed problem management on the HP-UX platform and Root Cause Analysis using quality tools.
  • Performed HA engineering and architecture for the HA environment, including OS/patch release management, application loads, and maintaining the configuration of central OS image servers.
  • Analyzed availability and configuration management for customer reporting; performed Disaster Recovery planning and conducted DR tests.
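
A minimal sketch of root-disk mirroring with Solstice DiskSuite as described above; the metadevice and disk slice names are illustrative.

    # Create state database replicas on both disks
    metadb -a -f -c 3 c0t0d0s7 c0t1d0s7

    # Build submirrors from the existing root slice and the second disk
    metainit -f d10 1 1 c0t0d0s0
    metainit d20 1 1 c0t1d0s0

    # Create a one-way mirror on the root submirror and update vfstab/system
    metainit d0 -m d10
    metaroot d0

    # After the required reboot, attach the second submirror to complete the mirror
    metattach d0 d20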

Environment: Solaris 2.8, Red Hat Linux 3, JumpStart, Automount, Samba, Apache, Tomcat, Netscape iPlanet Web Server, Java/Shell/Perl programming, Solstice DiskSuite, NIS/NFS, DNS, FTP, Win NT and 2000 Server.
