Senior Hadoop Admin Resume
San Jose, California
SUMMARY:
- Senior Hadoop Admin/Developer with 9 years of professional IT experience, including 5+ years of experience with Hadoop ecosystem tools and related technologies.
- Excellent understanding and thorough knowledge of Hadoop architecture and components such as MapReduce and the Hadoop Distributed File System (HDFS).
- In-depth knowledge of all the Hadoop daemons: JobTracker, TaskTracker, NameNode, DataNode, and Secondary NameNode.
- Hands-on experience in upgrading and applying patches for the Cloudera distribution.
- Backed up data from the active cluster to a backup cluster using the DistCp utility (see the sketch after this summary).
- Experience in installing, configuring, and administering Hadoop clusters for the major Hadoop distributions.
- Experience in importing and exporting data between RDBMS and HDFS using Sqoop import and export.
- Experience in designing both time- and data-driven automated workflows using Oozie.
- Good understanding of Cloudera, Hortonworks and MapR distributions of Hadoop.
- Experience in creating and maintaining stored procedures, triggers, and functions in SQL Server, with strong RDBMS fundamentals.
- Analyzed large datasets using Hive queries and extended Hive and Pig core functionality with custom UDFs.
- Good understanding of Solr and the Azkaban server.
- Experience in working with NoSQL databases like HBase.
- Experience in working with Hadoop in standalone, pseudo-distributed, and fully distributed modes.
- Experience using Flume for efficiently collecting, aggregating, and moving large amounts of log data.
- Experience in deploying and managing multi-node Hadoop clusters with different Hadoop components (HDFS, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper) using Cloudera Manager and Hortonworks.
- Hands-on experience with various ecosystem tools such as HDFS, MapReduce, Hive, Pig, Oozie, Flume, and ZooKeeper.
- Hands-on experience in importing and exporting large volumes of data between data warehouses and HDFS.
- Implemented and configured a quorum-based High Availability Hadoop cluster for both HDFS and MapReduce using ZooKeeper and JournalNodes.
- Experience using Apache Oozie as a workflow scheduler to manage Apache Hadoop jobs.
- Hands-on experience in provisioning and managing multi-tenant Hadoop clusters on a public cloud environment - Amazon Web Services (AWS) - and on private cloud infrastructure - the OpenStack cloud platform.
- Experience in installation, configuration, and deployment of IBM WebSphere on AIX.
- Hands-on experience with automation tools like Puppet.
- Hands-on experience in managing disk file systems, server performance, user creation, and granting permissions.
- Monitoring and troubleshooting user issues with networks and systems.
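A minimal sketch of the DistCp backup described above; the NameNode host names and the /data path are hypothetical placeholders.

```sh
# Copy /data from the active cluster to the backup cluster.
# -update copies only changed files; -p preserves ownership, permissions, and timestamps.
hadoop distcp -update -p \
  hdfs://active-nn.example.com:8020/data \
  hdfs://backup-nn.example.com:8020/data
```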
TECHNICAL SKILLS:
- HDFS
- MapReduce
- HBase
- Hive
- Pig
- Oozie
- Zookeeper
- Flume
- Sqoop
- Cloudera
- Access control
- Cluster maintenance
- Performance tuning
- Storage capacity management
- C
- C++
- Pig Latin
- Shell Scripting
- Perl
- Python
- XML
- MySQL
- UNIX
- Linux
- Windows XP/Vista/7/8
- Mac OSX
- MS Office
- LaTeX
- Origin
PROFESSIONAL EXPERIENCE:
Senior Hadoop Admin
Confidential, San Jose, California
Responsibilities:
- Working with the Cloud and systems engineering teams to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Working with application teams to set up new Hadoop users.
- Performing cluster maintenance, including creation and removal of nodes, using tools such as Ganglia, Nagios, and Cloudera Manager Enterprise.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Screening Hadoop cluster job performance and handling capacity planning.
- Monitoring Hadoop cluster connectivity and security; managing and reviewing Hadoop log files.
- File system management and monitoring, HDFS support and maintenance (see the sketch after this list).
- Worked collaboratively with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Collaborating with application teams and external vendor partner resources to install operating system and Hadoop updates, patches, version upgrades when required.
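A minimal sketch of the routine HDFS health checks behind the file system management bullet above; the report path and balancer threshold are illustrative.

```sh
# Capacity, live/dead DataNodes, and under-replicated block counts.
hdfs dfsadmin -report

# Walk the namespace and flag missing or corrupt blocks.
hdfs fsck / -files -blocks -locations > /tmp/fsck_report.txt

# Spread data evenly; stop once every DataNode is within 10% of average utilization.
hdfs balancer -threshold 10
```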
Senior Hadoop Admin
Confidential, Santa Clara, CA
Responsibilities:
- Worked on installing, configuring, and administering the Hadoop cluster.
- Benchmarked clusters using TeraGen, TeraSort, and TeraValidate and identified performance bottlenecks (see the sketch after this section).
- Built a Hadoop cluster ensuring high availability for the NameNode, mixed-workload management, performance optimization, health monitoring, and backup and recovery across one or more nodes.
- Validated YARN and Hive parameters so that MapReduce jobs ran successfully.
- Installed the Azkaban server to manage Hadoop job dependencies.
- Installed and configured Apache Phoenix.
- Tuned the cluster and reported performance statistics.
- Good understanding of job schedulers such as the Fair Scheduler, which assigns resources so that all jobs receive, on average, an equal share over time, as well as the Capacity Scheduler.
- Continuously monitored and managed the Hadoop cluster.
- Implemented and configured a quorum-based High Availability Hadoop cluster for HDFS, Impala, and Solr.
- Hands-on experience with various ecosystem tools such as HDFS, MapReduce, Hive, Pig, Oozie, and ZooKeeper.
Environment: Hadoop, Cloudera CDH 5.3.3, HDFS, Map Reduce, Hive, Sqoop, Solr, HBase, Oozie, Pig.
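A minimal sketch of the TeraGen/TeraSort/TeraValidate benchmark run mentioned above; the examples jar location assumes a CDH parcel layout, and the row count (10 billion 100-byte rows, roughly 1 TB) is illustrative.

```sh
EXAMPLES_JAR=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

# Generate the input, sort it, then validate that the output is globally sorted.
hadoop jar "$EXAMPLES_JAR" teragen 10000000000 /benchmarks/teragen
hadoop jar "$EXAMPLES_JAR" terasort /benchmarks/teragen /benchmarks/terasort
hadoop jar "$EXAMPLES_JAR" teravalidate /benchmarks/terasort /benchmarks/teravalidate
```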
Senior Hadoop Admin
Confidential, San Francisco, CA
Responsibilities:
- Captured data from existing databases that provide SQL interfaces using Sqoop (see the Sqoop sketch after this section).
- Implemented the Hadoop stack and different big data analytics tools, and migrated data from different databases to Hadoop.
- Processed information from HDFS comprising various useful insights for the decision-making process; these insights were presented to users in the form of charts.
- Worked on different big data technologies with good knowledge of Hadoop, MapReduce, and Hive.
- Worked on deployment and automation tasks.
- Installed and configured Hadoop cluster in pseudo and fully distributed mode environments
- Involved in developing the data loading and extraction processes for big data analysis
- Worked on professional services engagements to help customers design and build clusters and applications and troubleshoot network, disk, and operating system related issues.
- Administered Linux servers and other Unix variants and managed Hadoop clusters.
- Worked with HBase and Hive scripts to extract, transform, and load data into HBase and Hive.
- Continuously monitored and managed the Hadoop cluster.
- Analyzed the data by performing Hive queries and running Pig scripts to know user behavior
- Installed Oozie workflow engine to run multiple Hive and Pig jobs
- Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various Hadoop programs using Oozie.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports
Environment: Hadoop, HDFS, Map Reduce, Hive, Flume, Sqoop, Cloudera CDH4, HBase, Oozie, Pig, AWS EC2 cloud
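A minimal sketch of the Sqoop import/export flow described above; the JDBC URL, credentials, and table names are hypothetical.

```sh
# Pull a source table from MySQL into HDFS with 4 parallel map tasks.
sqoop import \
  --connect jdbc:mysql://dbhost.example.com/sales \
  --username etl --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Push the analyzed results back to the relational database for reporting.
sqoop export \
  --connect jdbc:mysql://dbhost.example.com/sales \
  --username etl --password-file /user/etl/.db_password \
  --table order_summary \
  --export-dir /data/out/order_summary
```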
Senior Hadoop Admin/Developer
Confidential, San Bruno, CA
Responsibilities:
- Partitioned the datasets by date/timestamp and loaded them into Hive tables to improve query performance, and used compression codecs to increase storage efficiency (see the sketch after this section).
- Worked with Hive to perform analysis on the streamed log data, which includes user activities over social platforms and web sites to improve relevance.
- Developed custom MapReduce programs to perform parallel pattern searches on the datasets to extract unique insights and deliver the most targeted audiences for advertisers.
- Used Sqoop to integrate databases with Hadoop to import/export data.
- Worked with DevOps team in Hadoop cluster planning and installation.
- Good understanding of job schedulers such as the Fair Scheduler, which assigns resources so that all jobs receive, on average, an equal share over time, as well as the Capacity Scheduler.
- Responsible for performing peer code reviews, troubleshooting issues and maintaining status report.
- Worked on Hadoop cluster maintenance including data and metadata backups, file system checks, commissioning and decommissioning nodes and upgrades.
- Experience working with HDFS, a file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile devices, and network devices, and pushed it to HDFS.
- Developed Hive queries to pre-process the data for analysis by imposing a read-only structure on the streamed data.
- Automated jobs for pulling data from FTP server to load data into Hive tables using Oozie workflow.
- Developed Pig Latin scripts to extract and filter relevant data from the web server output files to load into HDFS.
Environment: HDFS, MapReduce, Hive, Flume, Sqoop, Oozie.
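A minimal sketch of a date-partitioned, compressed Hive table as described above; the table schema, the ORC/Snappy storage choice, and the staging path are assumptions.

```sh
hive -e "
  CREATE TABLE IF NOT EXISTS web_activity (
    user_id  STRING,
    action   STRING,
    event_ts TIMESTAMP
  )
  PARTITIONED BY (log_date STRING)
  STORED AS ORC
  TBLPROPERTIES ('orc.compress'='SNAPPY');

  -- Load one day of pre-processed data into its partition.
  LOAD DATA INPATH '/data/staging/web_activity/2015-06-01'
  INTO TABLE web_activity PARTITION (log_date='2015-06-01');
"
```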
Hadoop Admin
Confidential, MN
Responsibilities:
- Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, and backup & DR systems.
- Involved in analyzing system failures, identifying root causes, and recommending remediation actions; documented an issue log with solutions for future reference.
- Worked with systems engineering team for planning new Hadoop environment deployments, expansion of existing Hadoop clusters.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios, and monitored workload, job performance, and capacity planning using Cloudera Manager.
- Worked with application teams to install OS level updates, patches and version upgrades required for Hadoop cluster environments.
- Installed and configured Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Installation and configuration of Name Node High Availability (NNHA) using Zookeeper.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most-purchased products on the website.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports by our BI team.
- Experienced in Linux Administration tasks like IP Management (IP Addressing, Ethernet Bonding, Static IP and Subnetting).
- Responsible for daily system administration of Linux and Windows servers. Also implemented HTTPD, NFS, SAN and NAS on Linux Servers.
- Created UNIX shell scripts to watch for 'null' files and trigger jobs accordingly (see the sketch after this section); also have good knowledge of the Python scripting language.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
- Worked on disaster recovery for the Hadoop cluster.
- Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, SQL, Cloudera Manager, Sqoop, Flume, Oozie, Java (JDK 1.6), Zookeeper, Ganglia, Linux (CentOS/Red Hat).
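A minimal sketch of the 'null' file watcher script mentioned above, treating 'null' files as zero-byte marker files; the landing directory and the triggered Oozie job are hypothetical placeholders.

```sh
#!/bin/bash
# Look for zero-byte ('null') marker files in the landing directory and trigger
# the downstream load job once per marker, removing the marker afterwards.
WATCH_DIR=/data/landing
LOAD_JOB="oozie job -oozie http://oozie-host:11000/oozie -config /etc/etl/load.properties -run"

find "$WATCH_DIR" -maxdepth 1 -type f -empty | while read -r marker; do
    echo "$(date '+%F %T') found marker $marker, triggering load job"
    $LOAD_JOB && rm -f "$marker"
done
```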
Hadoop Admin/Developer
Confidential, San Jose, CA
Responsibilities:
- Proactively monitored systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Installed and configured Flume, Hive, Pig, Sqoop, HBase and Oozie on the Hadoop cluster.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile devices, and pushed it to HDFS.
- Good understanding of job schedulers such as the Fair Scheduler, which assigns resources so that all jobs receive, on average, an equal share over time, as well as the Capacity Scheduler.
- Responsible for performing peer code reviews, troubleshooting issues and maintaining status report.
- Analyzed data using Hadoop components Hive and Pig.
- Responsible for running Hadoop streaming jobs to process terabytes of XML data (see the sketch after this section).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/big data concepts.
- Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
- Imported and exported the data using Sqoop Export and Sqoop Import.
Environment: CDH4, Flume, Hive, HBase, Sqoop, Pig, Oozie, Cloudera Manager, Java, Linux, CentOS.
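A minimal sketch of a Hadoop streaming job for the XML processing mentioned above; the streaming jar path and the Python mapper/reducer scripts are hypothetical.

```sh
# Ship the scripts to the cluster with -files and process the raw XML directory.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -files parse_xml.py,aggregate.py \
  -mapper "python parse_xml.py" \
  -reducer "python aggregate.py" \
  -input /data/raw/xml \
  -output /data/parsed/xml
```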
Unix Admin
Confidential
Responsibilities:
- Set up Solaris Custom JumpStart server and clients and implemented JumpStart installations.
- Worked with Telnet, FTP, TCP/IP, and rlogin to interoperate between hosts.
- Carried out various systems administration tasks under CentOS and Red Hat Linux environments.
- Performed regular day-to-day system administrative tasks including User Management, Backup, Network Management, and Software Management including Documentation.
- Recommended system configurations for clients based on estimated requirements.
- Performed reorganization of disk partitions, file systems, hard disk addition, and memory upgrade.
- Monitored system activities, log maintenance, and disk space management.
- Encapsulated and mirrored root file systems to ensure systems had redundant boot disks.
- Administered Apache servers and published clients' web sites on them.
- Fixed system problems based on system email notifications and user complaints.
- Upgraded software, applied patches, and added new hardware to UNIX machines.
Java Developer
Confidential
Responsibilities:
- Developed JavaScript behavior code for user interaction.
- Created database program in SQL server to manipulate data accumulated by internet transactions.
- Wrote Servlets class to generate dynamic HTML pages.
- Developed Servlets and back-end Java classes using WebSphere application server.
- Developed an API to write XML documents from a database.
- Performed unit testing for the application using JUnit.
- Maintained a Java GUI application using JFC/Swing.
- Created complex SQL and used JDBC connectivity to access the database.
- Involved in the design and coding of the data capture templates, presentation and component templates.
- Part of the team that designed, customized and implemented metadata search and database synchronization.
Environment: Java, WebSphere 3.5, EJB, Servlets, JavaScript, JDBC, SQL, JUnit, Eclipse IDE.