Senior Hadoop Administrator Resume
NJ
SUMMARY:
- 7+ years of overall experience in systems administration and enterprise application development across diverse industries, with expertise in Big Data ecosystem technologies.
- 4 years of comprehensive experience as a Big Data & Analytics Administrator.
- Experience working with MapReduce programs on Apache Hadoop to process Big Data.
- Experience in installing, configuring, supporting and monitoring Hadoop clusters using HDFS.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop (a minimal command sketch follows this summary).
- Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Experience in designing, developing and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent and results-oriented, with problem-solving and leadership skills.
- Experience in administering Hadoop ecosystem components like MapReduce, Pig, Hive, HAWQ, Spring XD and Sqoop, as well as setting up Hadoop clusters with Isilon.
- Optimized the utilization of data systems and improved the efficiency of security and storage solutions by controlling and preventing the loss of sensitive data.
- Experience in planning, designing, deploying, fine-tuning and administering large-scale production Hadoop clusters.
- Experience in setting up, configuring and administering Hadoop clusters on the Pivotal distribution, including fine-tuning and benchmarking.
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS.
- Exposure to key Hadoop performance-tuning properties, RPC ports and daemon addresses.
- Experience defining job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
- Good knowledge of the MapReduce framework, including the MR daemons, the sort and shuffle phases, and task execution.
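A minimal sketch of the Sqoop import/export flow referenced above; the connection string, table names and HDFS paths are illustrative placeholders, not from an actual engagement:

    # Pull a table from the RDBMS into HDFS, then push aggregates back.
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMERS \
      --target-dir /user/etl/customers \
      --num-mappers 4

    sqoop export \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMER_SUMMARY \
      --export-dir /user/etl/customer_summary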
TECHNICAL SKILLS:
Big Data Technologies: Apache Hadoop, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Oozie
Languages: Core Java, J2EE, SQL, PL/SQL, Unix Shell Scripting
Web Technologies: JSP, EJB 2.0, JNDI, JMS, JDBC, HTML, JavaScript
Web/Application servers: Tomcat 6.0/5.0/4.0, JBoss 5.1.0
Databases: Oracle 11G/10G, SQL Server, DB2, Sybase, Teradata
Operating Systems: MS-DOS, Windows XP, Windows 7, UNIX and Linux
IDE: IntelliJ IDEA 7.2, EditPlus 3, Eclipse 3.5, NetBeans 6.5, TOAD, PL/SQL, Teradata
Frameworks: Hadoop MapReduce, MVC, Struts 2.x/1.x
Version Control: Visual SourceSafe (VSS), Subversion, CVS
Testing Technologies: JUnit 4/3.8
Office Packages: MS-Office 2010, 2007, 2003 and Visio
Business Intelligence: Business Objects XI 3.1, Cognos 8.4
PROFESSIONAL EXPERIENCE:
Senior Hadoop Administrator
Confidential, NJ
Responsibilities:
- Responsible for cluster maintenance: adding and removing cluster nodes, monitoring and troubleshooting the cluster, and managing and reviewing data backups and Hadoop log files.
- Played a key role, along with other teams in the company, in deciding the hardware configuration for the cluster.
- Resolved tickets submitted by users and P1 issues; troubleshot, documented and resolved errors.
- Added new DataNodes when needed and ran the HDFS balancer (see the command sketch after this list).
- Responsible for building scalable distributed data solutions using Hadoop.
- Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
- Performed major and minor upgrades to the Hadoop cluster.
- Performed stress and performance testing and benchmarking for the cluster.
- Worked closely with both internal and external cyber-security customers.
- Participated in a research effort to tightly integrate Hadoop and HPC systems.
- Compared Hadoop to commercial big-data appliances from Netezza, XtremeData and LexisNexis; published and presented the results.
- Deployed and administered a 70-node Hadoop cluster, along with two smaller clusters.
- Developed Linux shell scripts for job automation.
- Developed machine-learning capability via Apache Mahout.
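A minimal sketch of the node-maintenance workflow from the bullets above, assuming an excludes file configured via dfs.hosts.exclude; the hostname, path and threshold are illustrative:

    # Decommission a DataNode: list it in the excludes file, then refresh.
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # After adding new DataNodes, rebalance so no node deviates more than
    # 10% from the average cluster utilization.
    hdfs balancer -threshold 10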
Environment: Hadoop, MapReduce, HDFS, Hive, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, Tomcat 6.
Senior Hadoop Administrator
Confidential, NY
Responsibilities:
- Involved in a Hadoop implementation project covering Hadoop cluster management, writing MapReduce programs and Hive queries (HQL), and using Flume to analyze log files.
- Installed, configured and managed a Big Data 2.x cluster (Hortonworks Data Platform 1.3/2.2).
- Played a key role in the development of an ingestion framework based on Oozie and another framework using Java.
- Developed data-quality checks in Hive to match the ingested data against the source RDBMS.
- Did a POC for data ingestion using ETL tools including Talend, DataStage and Toad Data Point.
- Created custom component detection and monitoring using ZooKeeper APIs.
- Supported a Big Data 1.x cluster (HDP 1.3), resolving issues related to jobs and cluster utilization.
- Deep understanding of and hands-on experience with Hadoop, HBase, Hive, and YARN/MapReduce.
- Enabled High Availability for the NameNode and set up a fencing mechanism to guard against split-brain scenarios.
- Enabled High Availability for the ResourceManager and several ecosystem components including HiveServer2, the Hive Metastore, and HBase.
- Configured YARN queues based on the Capacity Scheduler for resource management.
- Configured YARN node labels to isolate resources at the node level, separating nodes dedicated to YARN applications from those dedicated to HBase.
- Configured cgroups to collect CPU-utilization statistics.
- Set up BucketCache on HBase-specific slave nodes for improved performance.
- Set up rack awareness for the Big Data cluster, with a rack-topology script, for improved fault tolerance (sketched after this list).
- Set up HDFS ACLs to restrict or enable access to HDFS data.
- Evaluated Hadoop/YARN performance with TestDFSIO and TeraSort.
- Evaluated the performance of Hive 0.14 with Tez using Hive TestBench.
- Configured Talend, DataStage and Toad Data Point for ETL activities on Hadoop/Hive databases.
- Backed up HBase data and HDFS using HDFS snapshots and evaluated the performance overhead.
- Created several Chef recipes to automate configuration parameters and scripts.
- Managed and configured the retention period of log files for all services across the cluster.
- Involved in the development of ETL processes with Hadoop, YARN and Hive.
- Developed Hadoop monitoring processes (capacity, performance, consistency) to ensure processing issues are identified and resolved swiftly.
- Coordinated with the Operations/L2 team for knowledge transfer.
- Set up HDFS quotas and replication factors for user and group directories to keep disk usage under control.
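Minimal sketches of the rack-topology script and the ACL, snapshot and quota commands referenced in the bullets above; rack names, paths and limits are illustrative assumptions:

    #!/bin/bash
    # rack-topology.sh - maps DataNode IPs to racks for HDFS rack awareness;
    # wired in via net.topology.script.file.name in core-site.xml.
    while [ $# -gt 0 ]; do
      case "$1" in
        10.0.1.*) echo "/dc1/rack1" ;;
        10.0.2.*) echo "/dc1/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
      shift
    done

The day-to-day access, snapshot and quota work looked broadly like:

    # Grant a group read access to warehouse data via HDFS ACLs.
    hdfs dfs -setfacl -R -m group:analysts:r-x /data/warehouse

    # Allow and take a snapshot of the HBase root before a backup.
    hdfs dfsadmin -allowSnapshot /apps/hbase
    hdfs dfs -createSnapshot /apps/hbase pre-backup

    # Cap a user directory at 10 TB of raw space and lower its replication.
    hdfs dfsadmin -setSpaceQuota 10t /user/jdoe
    hdfs dfs -setrep -R 2 /user/jdoe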
Environment: Hadoop 1.2.1, MapReduce, Hive 0.10.0, Pig 0.11.1, Oozie 3.3.0, HBase 0.94.11, Sqoop 1.4.4, Flume 1.4.0, Java, SQL, PL/SQL, Oracle 10g, Eclipse
Hadoop Administrator
Confidential, Yonkers, NY
Responsibilities:
- Build/deploy/configure/maintain multiple Hadoop clusters in production, staging, and development environments that process over 2 TB of events per day.
- Build/deploy/configure/maintain multiple real-time clusters consisting of Apache products: Flume, Storm, Mesos, Spark, and Kafka.
- Created scripts to provision EC2 clusters for training and for processing.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior such as call frequency and top calling customers.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Created Cassandra Advanced Data Modeling course for DataStax.
- Secured production environments by setting up and deploying Kerberos in the Hadoop cluster.
- Support application administrators, developers, and users. Release and support real-time products from development through production.
- Participate in various Proof of Concept projects to evaluate new technologies for data collection and analysis.
- Identify necessary alerts and remediation process for new components.
- Assist in the capacity planning process.
- Analyze and troubleshoot performance bottlenecks.
- Serve as team facilitator for ancillary operations (Slack, PagerDuty, Jira, Change Control).
- Imported data from MySQL server to HDFS using Sqoop.
- Manage the day-to-day operations of the cluster for backup and support.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs (a submission sketch follows this list).
- Worked along with the Hadoop Operations team in Hadoop cluster planning, installation, maintenance, monitoring and upgrades.
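A minimal sketch of submitting one of these Oozie-managed workflows from an edge node; the server URL, HDFS application path and property values are assumptions:

    # Properties for a workflow application already deployed to HDFS.
    cat > job.properties <<'EOF'
    nameNode=hdfs://nn.example.com:8020
    jobTracker=rm.example.com:8050
    oozie.wf.application.path=${nameNode}/user/oozie/apps/ingest-wf
    EOF

    # Submit and start the workflow, then poll it with the id that -run prints.
    oozie job -oozie http://oozie.example.com:11000/oozie \
      -config job.properties -run
    oozie job -oozie http://oozie.example.com:11000/oozie -info "$JOB_ID"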
Environment: Ambari, HBase, Hive, Pig, Sqoop, Apache Ranger, Splunk, YARN, Apache Oozie workflow scheduler, Flume, ZooKeeper, RegEx, JSON.
Hadoop Administrator
Confidential , Buffalo, NY
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (see the query sketch after this list).
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Installed and configured MapReduce, Hive and HDFS; implemented a CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team by using Sqoop to move data into HDFS and Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Administrator for Pig, Hive and HBase; installed updates, patches and upgrades.
- Performed both major and minor upgrades to the existing CDH cluster.
- Upgraded the Hadoop cluster from CDH3 to CDH4.
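A minimal sketch of the kind of trend-spotting comparison run for the analysts; the database, table and column names are illustrative assumptions:

    # Compare freshly loaded data against EDW reference tables from the shell.
    hive -e "
      SELECT r.region, SUM(f.sales) AS total_sales
      FROM edw.fresh_sales f
      JOIN edw.region_ref r ON f.region_id = r.region_id
      WHERE f.load_date = '2015-06-01'
      GROUP BY r.region
      ORDER BY total_sales DESC;"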
Environment: Hadoop, Hive, MapReduce, Amazon Web Services (AWS), NoSQL, HDFS, Java, UNIX, Red Hat and CentOS.
Linux Administrator
Confidential
Responsibilities:
- Installed and maintained the Linux servers.
- Installed CentOS on multiple servers using the Preboot Execution Environment (PXE) boot and Kickstart method; updated system OS and application software according to requirements.
- Set up security for users, reset user passwords, and locked/unlocked user accounts.
- Monitored System Metrics and logs for any problems.
- Ran crontab jobs to back up MySQL data (a sketch follows this list).
- Involved in adding, removing, or updating user account information, resetting passwords.
- Configured the sendmail configuration file, created e-mail IDs, and created the alias database.
- Used heterogeneous backup software for Windows and UNIX to back up and retrieve files.
- Worked with heterogeneous Client & Server management.
- Maintained the RDBMS server and granted database access to the required users.
- Provided technical support for issues via helpdesk tickets and the telephone.
- Took backups at regular intervals and maintained a sound disaster recovery plan.
- Documented and maintained server, network, and support documentation, including application diagrams.
- Planned for storage and backup, including studying disk space requirements and backup device performance.
- Responsible for issuing standard operating procedures to the incident management team.
- Responsible for handling escalated incidents and High Availability environment incident management on the HP-UX, Linux, and Solaris platforms.
- Performed problem management on the HP-UX platform and root cause analysis using quality tools.
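A minimal sketch of the crontab-driven MySQL backup mentioned above; the paths, backup user and retention window are assumptions:

    # crontab entry: dump nightly at 02:30 and log the run.
    30 2 * * * /usr/local/bin/mysql_backup.sh >> /var/log/mysql_backup.log 2>&1

    #!/bin/bash
    # /usr/local/bin/mysql_backup.sh - dump all databases and prune old dumps.
    BACKUP_DIR=/backup/mysql
    STAMP=$(date +%Y%m%d)
    mysqldump --all-databases --single-transaction -u backup -p"$MYSQL_PW" \
      | gzip > "$BACKUP_DIR/all-databases-$STAMP.sql.gz"
    find "$BACKUP_DIR" -name '*.sql.gz' -mtime +14 -delete   # keep 14 days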
Environment: Hadoop, MapReduce, Hive, Pig, Oozie, Hbase, Sqoop, Flume, Java, SQL, Eclipse, UNIX Script, BO, YARN
Linux/Unix Administrator
Confidential
Responsibilities:
- Built Linux servers, upgraded and patched existing servers, and compiled, built and upgraded Linux kernels.
- Set up a Solaris custom JumpStart server and clients and implemented JumpStart installations.
- Worked with Telnet and rlogin to interoperate between hosts.
- Conducted various systems administration work under CentOS and Red Hat Linux environments.
- Performed regular day-to-day system administrative tasks including user management, backup, network management and software management, along with documentation.
- Recommended system configurations for clients based on estimated requirements.
- Performed reorganization of disk partitions and file systems, hard disk additions, and memory upgrades (a sketch follows this list).
- Monitored system activities, log maintenance, and disk space management.
- Encapsulated and mirrored root file systems to ensure systems had redundant boot disks.
- Administered Apache servers and published clients' web sites on our Apache server.
- Fixed system problems based on system email information and user complaints.
- Upgraded software, applied patches, and added new hardware to UNIX machines.
- Configuration management: performed CMDB audits and regular updates of the CMDB.
- Performed problem management on the HP-UX platform and root cause analysis using quality tools.
- Performed HA engineering for the architecture of the HA environment, including OS/patch release management, application loads, and maintaining the configuration of central OS image servers.
- Analyzed availability and configuration management for customer reporting; planned disaster recovery and conducted DR tests.
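A minimal sketch of the disk-addition work above; the device name, file system type and mount point are assumptions:

    # Partition the new disk interactively, then build and mount an ext3 FS.
    fdisk /dev/sdb                      # create /dev/sdb1
    mkfs -t ext3 /dev/sdb1
    mkdir -p /export/data
    mount /dev/sdb1 /export/data
    # Persist the mount across reboots and verify the new space.
    echo "/dev/sdb1 /export/data ext3 defaults 0 2" >> /etc/fstab
    df -h /export/data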
Environment: Solaris 2.8, Red Hat Linux 3, JumpStart, Automount, Samba, Apache, Tomcat, iPlanet Web Server, Java/Shell/Perl programming, Solstice DiskSuite, NIS/NFS, DNS, FTP, Win NT and 2000 Server.