Senior Hadoop Administrator Resume
NJ
SUMMARY:
- 7+ years of overall experience in systems administration and enterprise application development across diverse industries, with expertise in Big Data ecosystem technologies.
- 4 years of comprehensive experience as a Big Data & Analytics Administrator.
- Experience working with MapReduce programs on Apache Hadoop to process Big Data.
- Experience in installing, configuring, supporting and monitoring Hadoop clusters using HDFS.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop (a minimal command sketch follows this summary).
- Knowledge of job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
- Experience in designing, developing and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent and results-oriented, with problem-solving and leadership skills.
- Experience in administering Hadoop ecosystem components like MapReduce, Pig, Hive, HAWQ, Spring XD and Sqoop, as well as setting up Hadoop clusters with Isilon.
- Optimized the utilization of data systems and improved the efficiency of security and storage solutions by controlling and preventing the loss of sensitive data.
- Experience in planning, designing, deploying, fine-tuning and administering large-scale production Hadoop clusters.
- Experience in setting up, configuring and administering Hadoop clusters on the Pivotal distribution, including fine-tuning and benchmarking.
- Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS.
- Exposure to key Hadoop performance-tuning properties, RPC ports and daemon addresses.
- Experience defining job flows in the Hadoop environment using tools like Oozie for data scrubbing and processing.
- Good knowledge of the MapReduce framework, including the MR daemons, the sort and shuffle phases, and task execution.
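A minimal sketch of the Sqoop import/export flow referenced above; the connection string, table names and HDFS paths are illustrative placeholders, not from an actual engagement:

    # Pull a table from the RDBMS into HDFS, then push aggregates back.
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMERS \
      --target-dir /user/etl/customers \
      --num-mappers 4

    sqoop export \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMER_SUMMARY \
      --export-dir /user/etl/customer_summary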
TECHNICAL SKILLS:
Big Data Technologies: Apache Hadoop, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Oozie
Languages: Core Java, J2EE, SQL, PL/SQL, Unix Shell Scripting
Web Technologies: JSP, EJB 2.0, JNDI, JMS, JDBC, HTML, JavaScript
Web/Application servers: Tomcat 6.0/5.0/4.0, JBoss 5.1.0
Databases: Oracle 11G/10G, SQL Server, DB2, Sybase, Teradata
Operating Systems: MS-DOS, Windows XP, Windows 7, UNIX and Linux
IDE: IntelliJ IDEA 7.2, EditPlus 3, Eclipse 3.5, NetBeans 6.5, TOAD, PL/SQL, Teradata
Frameworks: Hadoop MapReduce, MVC, Struts 2.x/1.x
Version Control: Visual SourceSafe (VSS), Subversion, CVS
Testing Technologies: JUnit 4/3.8
Office Packages: MS-Office 2010, 2007, 2003 and Visio
Business Intelligence: Business Objects XI 3.1, Cognos 8.4
PROFESSIONAL EXPERIENCE:
Senior Hadoop Administrator
Confidential, NJ
Responsibilities:
- Responsible for cluster maintenance: adding and removing cluster nodes, monitoring and troubleshooting the cluster, and managing and reviewing data backups and Hadoop log files.
- Played a key role, along with other teams in the company, in deciding the hardware configuration for the cluster.
- Resolved tickets submitted by users and P1 issues; troubleshot, documented and resolved errors.
- Added new DataNodes when needed and ran the HDFS balancer (see the command sketch after this list).
- Responsible for building scalable distributed data solutions using Hadoop.
- Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
- Performed major and minor upgrades to the Hadoop cluster.
- Performed stress and performance testing and benchmarking for the cluster.
- Worked closely with both internal and external cyber-security customers.
- Participated in a research effort to tightly integrate Hadoop and HPC systems.
- Compared Hadoop to commercial big-data appliances from Netezza, XtremeData and LexisNexis; published and presented the results.
- Deployed and administered a 70-node Hadoop cluster, along with two smaller clusters.
- Developed Linux shell scripts for job automation.
- Developed machine-learning capability via Apache Mahout.
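A minimal sketch of the node-maintenance workflow from the bullets above, assuming an excludes file configured via dfs.hosts.exclude; the hostname, path and threshold are illustrative:

    # Decommission a DataNode: list it in the excludes file, then refresh.
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # After adding new DataNodes, rebalance so no node deviates more than
    # 10% from the average cluster utilization.
    hdfs balancer -threshold 10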
Environment: Hadoop, MapReduce, HDFS, Hive, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, Tomcat 6.
Senior Hadoop Administrator
Confidential, NY
Responsibilities:
- Involved in a Hadoop implementation project covering Hadoop cluster management, writing MapReduce programs and Hive queries (HQL), and using Flume to analyze log files.
- Installed, configured and managed a Big Data 2.x cluster (Hortonworks Data Platform 1.3/2.2).
- Played a key role in the development of an ingestion framework based on Oozie and another framework using Java.
- Developed data-quality checks in Hive to match the ingested data against the source RDBMS.
- Did a POC for data ingestion using ETL tools including Talend, DataStage and Toad Data Point.
- Created custom component detection and monitoring using ZooKeeper APIs.
- Supported a Big Data 1.x cluster (HDP 1.3), resolving issues related to jobs and cluster utilization.
- Deep understanding of and hands-on experience with Hadoop, HBase, Hive, and YARN/MapReduce.
- Enabled High Availability for the NameNode and set up a fencing mechanism to guard against split-brain scenarios.
- Enabled High Availability for the ResourceManager and several ecosystem components including HiveServer2, the Hive Metastore, and HBase.
- Configured YARN queues based on the Capacity Scheduler for resource management.
- Configured YARN node labels to isolate resources at the node level, separating nodes dedicated to YARN applications from those dedicated to HBase.
- Configured cgroups to collect CPU-utilization statistics.
- Set up BucketCache on HBase-specific slave nodes for improved performance.
- Set up rack awareness for the Big Data cluster, with a rack-topology script, for improved fault tolerance (sketched after this list).
- Set up HDFS ACLs to restrict or enable access to HDFS data.
- Evaluated Hadoop/YARN performance with TestDFSIO and TeraSort.
- Evaluated the performance of Hive 0.14 with Tez using Hive TestBench.
- Configured Talend, DataStage and Toad Data Point for ETL activities on Hadoop/Hive databases.
- Backed up HBase data and HDFS using HDFS snapshots and evaluated the performance overhead.
- Created several Chef recipes to automate configuration parameters and scripts.
- Managed and configured the retention period of log files for all services across the cluster.
- Involved in the development of ETL processes with Hadoop, YARN and Hive.
- Developed Hadoop monitoring processes (capacity, performance, consistency) to ensure processing issues are identified and resolved swiftly.
- Coordinated with the Operations/L2 team for knowledge transfer.
- Set up HDFS quotas and replication factors for user and group directories to keep disk usage under control.
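Minimal sketches of the rack-topology script and the ACL, snapshot and quota commands referenced in the bullets above; rack names, paths and limits are illustrative assumptions:

    #!/bin/bash
    # rack-topology.sh - maps DataNode IPs to racks for HDFS rack awareness;
    # wired in via net.topology.script.file.name in core-site.xml.
    while [ $# -gt 0 ]; do
      case "$1" in
        10.0.1.*) echo "/dc1/rack1" ;;
        10.0.2.*) echo "/dc1/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
      shift
    done

The day-to-day access, snapshot and quota work looked broadly like:

    # Grant a group read access to warehouse data via HDFS ACLs.
    hdfs dfs -setfacl -R -m group:analysts:r-x /data/warehouse

    # Allow and take a snapshot of the HBase root before a backup.
    hdfs dfsadmin -allowSnapshot /apps/hbase
    hdfs dfs -createSnapshot /apps/hbase pre-backup

    # Cap a user directory at 10 TB of raw space and lower its replication.
    hdfs dfsadmin -setSpaceQuota 10t /user/jdoe
    hdfs dfs -setrep -R 2 /user/jdoe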
Environment: Hadoop 1.2.1, MapReduce, Hive 0.10.0, Pig 0.11.1, Oozie 3.3.0, HBase 0.94.11, Sqoop 1.4.4, Flume 1.4.0, Java, SQL, PL/SQL, Oracle 10g, Eclipse
Hadoop Administrator
Confidential, Yonkers, NY
Responsibilities:
- Build/deploy/configure/maintain multiple Hadoop clusters in production, staging, and development environments that process over 2 TB of events per day.
- Build/deploy/configure/maintain multiple real-time clusters consisting of Apache products: Flume, Storm, Mesos, Spark, and Kafka.
- Created scripts to provision EC2 clusters for training and for processing.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior such as call frequency and top calling customers.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Created Cassandra Advanced Data Modeling course for DataStax.
- Secured production environments by setting up and deploying Kerberos in the Hadoop cluster.
- Support application administrators, developers, and users. Release and support real-time products from development through production.
- Participate in various Proof of Concept projects to evaluate new technologies for data collection and analysis.
- Identify necessary alerts and remediation process for new components.
- Assist in the capacity planning process.
- Analyze and troubleshoot performance bottlenecks.
- Serve as team facilitator for ancillary operations (Slack, PagerDuty, Jira, Change Control).
- Imported data from MySQL server to HDFS using Sqoop.
- Manage the day-to-day operations of the cluster for backup and support.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs (a submission sketch follows this list).
- Worked along with the Hadoop Operations team in Hadoop cluster planning, installation, maintenance, monitoring and upgrades.
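A minimal sketch of submitting one of these Oozie-managed workflows from an edge node; the server URL, HDFS application path and property values are assumptions:

    # Properties for a workflow application already deployed to HDFS.
    cat > job.properties <<'EOF'
    nameNode=hdfs://nn.example.com:8020
    jobTracker=rm.example.com:8050
    oozie.wf.application.path=${nameNode}/user/oozie/apps/ingest-wf
    EOF

    # Submit and start the workflow, then poll it with the id that -run prints.
    oozie job -oozie http://oozie.example.com:11000/oozie \
      -config job.properties -run
    oozie job -oozie http://oozie.example.com:11000/oozie -info "$JOB_ID"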
Environment: Ambari, HBase, Hive, Pig, Sqoop, Apache Ranger, Splunk, YARN, Apache Oozie workflow scheduler, Flume, ZooKeeper, RegEx, JSON.
Hadoop Administrator
Confidential , Buffalo, NY
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (see the query sketch after this list).
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Installed and configured MapReduce, Hive and HDFS; implemented a CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team by using Sqoop to move data into HDFS and Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Administrator for Pig, Hive and HBase; installed updates, patches and upgrades.
- Performed both major and minor upgrades to the existing CDH cluster.
- Upgraded the Hadoop cluster from CDH3 to CDH4.
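A minimal sketch of the kind of trend-spotting comparison run for the analysts; the database, table and column names are illustrative assumptions:

    # Compare freshly loaded data against EDW reference tables from the shell.
    hive -e "
      SELECT r.region, SUM(f.sales) AS total_sales
      FROM edw.fresh_sales f
      JOIN edw.region_ref r ON f.region_id = r.region_id
      WHERE f.load_date = '2015-06-01'
      GROUP BY r.region
      ORDER BY total_sales DESC;"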
Environment: Hadoop, Hive, MapReduce, Amazon Web Services (AWS), NoSQL, HDFS, Java, UNIX, Red Hat and CentOS.
Linux Administrator
Confidential
Responsibilities:
- Installed and maintained the Linux servers.
- Installed CentOS on multiple servers using the Preboot Execution Environment (PXE) boot and Kickstart method; updated system OS and application software according to requirements.
- Set up security for users, reset user passwords, and locked/unlocked user accounts.
- Monitored System Metrics and logs for any problems.
- Ran crontab jobs to back up MySQL data (a sketch follows this list).
- Involved in adding, removing, or updating user account information, resetting passwords.
- Configured the sendmail configuration file, created e-mail IDs, and created the alias database.
- Used heterogeneous backup software for Windows and UNIX to back up and retrieve files.
- Worked with heterogeneous Client & Server management.
- Maintained the RDBMS server and granted database access to the required users.
- Provided technical support for issues via helpdesk tickets and the telephone.
- Took backups at regular intervals and maintained a sound disaster recovery plan.
- Documented and maintained server, network, and support documentation, including application diagrams.
- Planned for storage and backup, including studying disk space requirements and backup device performance.
- Responsible for issuing standard operating procedures to the incident management team.
- Responsible for handling escalated incidents and High Availability environment incident management on the HP-UX, Linux, and Solaris platforms.
- Performed problem management on the HP-UX platform and root cause analysis using quality tools.
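A minimal sketch of the crontab-driven MySQL backup mentioned above; the paths, backup user and retention window are assumptions:

    # crontab entry: dump nightly at 02:30 and log the run.
    30 2 * * * /usr/local/bin/mysql_backup.sh >> /var/log/mysql_backup.log 2>&1

    #!/bin/bash
    # /usr/local/bin/mysql_backup.sh - dump all databases and prune old dumps.
    BACKUP_DIR=/backup/mysql
    STAMP=$(date +%Y%m%d)
    mysqldump --all-databases --single-transaction -u backup -p"$MYSQL_PW" \
      | gzip > "$BACKUP_DIR/all-databases-$STAMP.sql.gz"
    find "$BACKUP_DIR" -name '*.sql.gz' -mtime +14 -delete   # keep 14 days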
Environment: Hadoop, MapReduce, Hive, Pig, Oozie, Hbase, Sqoop, Flume, Java, SQL, Eclipse, UNIX Script, BO, YARN
Linux/Unix Administrator
Confidential
Responsibilities:
- Built Linux servers, upgraded and patched existing servers, and compiled, built and upgraded Linux kernels.
- Set up a Solaris custom JumpStart server and clients and implemented JumpStart installations.
- Worked with Telnet and rlogin to interoperate between hosts.
- Conducted various systems administration work under CentOS and Red Hat Linux environments.
- Performed regular day-to-day system administrative tasks including user management, backup, network management and software management, along with documentation.
- Recommended system configurations for clients based on estimated requirements.
- Performed reorganization of disk partitions and file systems, hard disk additions, and memory upgrades (a sketch follows this list).
- Monitored system activities, log maintenance, and disk space management.
- Encapsulated and mirrored root file systems to ensure systems had redundant boot disks.
- Administered Apache servers and published clients' web sites on our Apache server.
- Fixed system problems based on system email information and user complaints.
- Upgraded software, applied patches, and added new hardware to UNIX machines.
- Configuration management: performed CMDB audits and regular updates of the CMDB.
- Performed problem management on the HP-UX platform and root cause analysis using quality tools.
- Performed HA engineering for the architecture of the HA environment, including OS/patch release management, application loads, and maintaining the configuration of central OS image servers.
- Analyzed availability and configuration management for customer reporting; planned disaster recovery and conducted DR tests.
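A minimal sketch of the disk-addition work above; the device name, file system type and mount point are assumptions:

    # Partition the new disk interactively, then build and mount an ext3 FS.
    fdisk /dev/sdb                      # create /dev/sdb1
    mkfs -t ext3 /dev/sdb1
    mkdir -p /export/data
    mount /dev/sdb1 /export/data
    # Persist the mount across reboots and verify the new space.
    echo "/dev/sdb1 /export/data ext3 defaults 0 2" >> /etc/fstab
    df -h /export/data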
Environment: Solaris 2.8, Red Hat Linux 3, JumpStart, Automount, Samba, Apache, Tomcat, iPlanet Web Server, Java/Shell/Perl programming, Solstice DiskSuite, NIS/NFS, DNS, FTP, Win NT and 2000 Server.