Hadoop Admin Resume
San Ramon, CA
SUMMARY:
- Over 8 years of Information Technology experience, with expertise in administration and operations of Big Data and Cloud Computing technologies.
- Expertise in setting up fully distributed, multi-node Hadoop clusters with Apache and Cloudera Hadoop.
- Expertise in AWS services such as EC2, Simple Storage Service (S3), Auto Scaling, EBS, Glacier, VPC, ELB, RDS, IAM, CloudWatch, and Redshift.
- Expertise in MIT Kerberos, High Availability, and integration of Hadoop clusters.
- Experience in upgrading Hadoop clusters.
- Strong knowledge of installing, configuring, and using ecosystem components such as Hadoop MapReduce, Oozie, Hive, Sqoop, Pig, Flume, ZooKeeper, and Kafka, along with NameNode recovery and HDFS High Availability; experienced with Hadoop shell commands and with verifying, managing, and reviewing Hadoop log files.
- Designed and implemented CI/CD pipelines achieving end-to-end automation; supported server/VM provisioning, middleware installation, and deployment activities via Puppet.
- Wrote Puppet manifests to provision several pre-prod environments.
- Wrote Puppet modules to automate the build/deployment process and improve remaining manual processes.
- Designed, installed, and implemented Puppet; good knowledge of automation using Puppet.
- Implemented AWS architectures for web applications.
- Experience with EC2, S3, ELB, IAM, CloudWatch, and VPC in AWS.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Extensive experience performing administration, configuration management, monitoring, debugging, and performance tuning in Hadoop clusters.
- Performed AWS EC2 instance mirroring, WebLogic domain creation, and several proprietary middleware installations.
- Worked on Agile projects delivering an end-to-end continuous integration/continuous delivery pipeline by integrating tools such as Jenkins, Puppet, and AWS for VM provisioning.
- Evaluated EC2 instance performance (CPU and memory usage) and set up EC2 Security Groups and VPCs.
- Configured and managed Jenkins in various environments, Linux and Windows.
- Administered version control systems (Git) to create daily backups and checkpoint files.
- Created various branches in Git, merged from the development branch to the release branch, and created tags for releases.
- Experience creating, managing, and performing container-based deployments using Docker images containing middleware and applications together.
- Enabled/disabled passive and active checks for hosts and services in Nagios.
- Good knowledge of installing, configuring, and maintaining Chef server and workstation.
- Expertise in provisioning clusters and building manifest files in Puppet for required services.
- Excellent knowledge of importing/exporting structured and unstructured data from sources such as RDBMS, event logs, and message queues into HDFS, using tools such as Sqoop and Flume (see the Sqoop sketch after this list).
- Expertise in converting non-Kerberized Hadoop clusters to Kerberized Hadoop clusters.
- Administration and Operations experience with Big Data and Cloud Computing Technologies
- Hands-on experience setting up fully distributed, multi-node Hadoop clusters with Apache Hadoop on AWS EC2 instances.
- Hands-on experience with AWS services such as EC2, Simple Storage Service (S3), Auto Scaling, EBS, ELB, RDS, IAM, and CloudWatch.
- Performed administration, configuration management, monitoring, debugging, and performance tuning of Hadoop clusters.
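A minimal, hypothetical sketch of the kind of Sqoop import referenced above (RDBMS data into HDFS); the host, database, table, and target directory are illustrative placeholders, not details of an actual engagement:

  # Hypothetical example: import one MySQL table into HDFS with Sqoop
  sqoop import \
    --connect jdbc:mysql://db-host:3306/salesdb \
    --username etl_user -P \
    --table orders \
    --target-dir /data/raw/orders \
    --num-mappers 4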
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, HBase, Pig, Hive, Zookeeper, Sqoop, MapReduce, Kafka, Spark, Oozie, Hortonworks and Cloudera
Languages: Unix Shell Script, JavaScript, Python, Java, Pig, MySQL, HiveQL, CSS, HTML
DevOps Tools: Puppet, Jira, Jenkins, Docker, GIT, GitHub
Frameworks: Apache and Cloudera Hadoop, Amazon Web Services
Platforms: Ubuntu, Centos, Redhat, Amazon Linux
Web Servers: Apache/HTTPD, Nginx
Operating Systems: Windows, Macintosh, Ubuntu (Linux).
Monitoring Tools: Nagios, PagerDuty
RDBMS: MySQL, SQL Server
Data Warehousing: Hive
NoSQL Databases: MongoDB, HBase, Cassandra
Log Collector & Aggregation: Flume, Kafka
Source Control Tools: GitHub
Team Communication: Slack
Defect Tracking Tools: FogBugz, Jira
AWS Components: EC2, Simple Storage Service (S3), EBS, VPC, ELB, RDS, IAM, CloudWatch
Other Tools: S3Organizer, Sqoop, Oozie, Zookeeper, Hue, DBVisualizer, Google authentication to Services
PROFESSIONAL EXPERIENCE:
Confidential, San Ramon, CA
Hadoop Admin
Roles and Responsibilities:
- Administration and monitoring of Hadoop clusters.
- Worked on the Hadoop upgrade from version 4.5 to 5.2.
- Monitored Hadoop cluster job performance and performed capacity planning.
- Removed retired nodes of particular security groups from Nagios monitoring.
- Responsible for managing and scheduling jobs on Hadoop Cluster
- Replaced retired Hadoop slave nodes through the AWS console and Nagios repositories.
- Performed dynamic updates of Hadoop YARN and MapReduce memory settings.
- Worked with the DBA team to migrate the Hive and Oozie metastore database from MySQL to RDS.
- Worked with the Fair and Capacity schedulers: created new queues, added users to queues, increased mapper and reducer capacity, and administered permissions to view and submit MapReduce jobs.
- Experience in administration and maintenance of source control management systems such as Git and GitHub.
- Hands-on experience installing and administering CI tools like Jenkins.
- Experience integrating shell scripts with Jenkins.
- Installed and configured the automation tool Puppet, including installation and configuration of the Puppet master, agent nodes, and an admin control workstation.
- Worked with modules, classes, and manifests in Puppet.
- Experience in creating Docker images
- Used containerization technologies like Docker to build clusters and orchestrate container deployments.
- Operations: custom shell scripts, VM and environment management.
- Experience working with Amazon EC2, S3, and Glacier.
- Created multiple groups and set permission policies for various groups in AWS.
- Experience creating lifecycle policies in AWS S3 to back data up to Glacier.
- Experience in maintaining, executing, and scheduling build scripts to automate DEV/PROD builds.
- Configured Elastic Load Balancers with EC2 Auto Scaling groups.
- Created monitors, alarms, and notifications for EC2 hosts using CloudWatch (see the CloudWatch sketch after this list).
- Launched Amazon EC2 instances using Amazon Machine Images (Linux/Ubuntu) and configured launched instances for specific applications.
- Worked with the IAM service: created new IAM users and groups and defined roles, policies, and identity providers.
- Experience enabling MFA in AWS using IAM and on S3 buckets.
- Defined AWS Security Groups, which acted as virtual firewalls controlling the traffic allowed to reach one or more AWS EC2 instances.
- Used Amazon Route 53 to manage DNS zones and provide public DNS names mapped to Elastic Load Balancer IPs.
- Used default and custom VPCs to create private cloud environments with public and private subnets.
- Loaded data from Oracle, MS SQL Server, MySQL, and flat-file databases into HDFS and Hive.
- Fixed NameNode partition failures, un-rotated fsimage issues, and MR jobs failing with too many fetch failures, and troubleshot common Hadoop cluster issues.
- Implemented manifest files in Puppet for automated orchestration of Hadoop and Cassandra clusters.
- Maintained GitHub repositories for configuration management.
- Configured the distributed monitoring system Ganglia for Hadoop clusters.
- Managed cluster coordination services through ZooKeeper.
- Configured and deployed a NameNode High Availability Hadoop cluster with SSL and Kerberos.
- Handled restarts of several services and killed processes by PID to clear alerts.
- Monitored log files of several services and cleared files in case of disk space issues on sharethis nodes.
- Provided 24x7 production support on a weekly schedule with the Ops team.
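As a hypothetical illustration of the CloudWatch alarms described above, the sketch below uses the AWS CLI; the alarm name, instance ID, threshold, and SNS topic ARN are assumed placeholders:

  # Hypothetical example: alarm on sustained high CPU for one EC2 host, notify an SNS topic
  aws cloudwatch put-metric-alarm \
    --alarm-name hadoop-node-high-cpu \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions arn:aws:sns:us-west-2:123456789012:ops-alerts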
Environment: CentOS, CDH4, Hive, Sqoop, Flume, HBase, MySQL, Cassandra, Oozie, Puppet, PagerDuty, Nagios, AWS (S3, EC2, IAM, CloudWatch, RDS, ELB, Auto Scaling, EBS, VPC, EMR), GitHub.
Confidential, SFO, CA
Hadoop Administrator
Roles and Responsibilities:
- Maintained a Hortonworks Hadoop distribution cluster.
- Recovered lost EC2 key pairs.
- Loaded databases using Sqoop.
- Scheduled database backups to AWS S3 (see the backup sketch after this list).
- Provided data to the Sales and Reporting teams per their needs.
- Migrated reports as per requirements.
- Responsible for ongoing administration of data and analytics infrastructure.
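One possible shape for the scheduled database backups to S3 mentioned above; the database name, S3 bucket, and cron schedule are illustrative assumptions rather than the actual configuration, and MySQL credentials are assumed to come from ~/.my.cnf:

  # Hypothetical nightly backup: dump MySQL and copy the archive to S3
  # example cron entry:  0 2 * * *  /opt/scripts/mysql_backup_to_s3.sh
  DATE=$(date +%F)
  mysqldump --single-transaction salesdb | gzip > /tmp/salesdb-$DATE.sql.gz
  aws s3 cp /tmp/salesdb-$DATE.sql.gz s3://example-db-backups/salesdb/
  rm -f /tmp/salesdb-$DATE.sql.gz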
Environment: CentOS, Python, Shell Script, AWS (EC2, S3, IAM, CloudWatch), MySQL, Nagios, Kafka, Jira.
Confidential
Hadoop Administrator
Roles and Responsibilities:
- Modified the existing architecture using Auto Scaling and Elastic Load Balancer.
- Implemented Auto Scaling and Elastic Load Balancer.
- MySQL basic administration (backups, user privileges, etc.).
- Nagios and PagerDuty setup.
- Created custom metrics in CloudWatch (see the sketch after this list).
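A minimal sketch of publishing a custom CloudWatch metric with the AWS CLI, as referenced above; the namespace, metric name, and disk-space source are hypothetical:

  # Hypothetical example: publish free root-disk space (MB) as a custom metric
  FREE_MB=$(df -m / | awk 'NR==2 {print $4}')
  aws cloudwatch put-metric-data \
    --namespace "Custom/Hosts" \
    --metric-name RootFreeDiskMB \
    --unit Megabytes \
    --value "$FREE_MB" \
    --dimensions InstanceId=i-0123456789abcdef0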
Confidential
Hadoop Administrator
Roles and Responsibilities:
- Migrated the data from MySQL to HDFS using Sqoop.
- Wrote ETL scripts in Pig and wrote UDFs for Pig.
- Installed and configured multiple Hadoop nodes
- Moved the insight data into Teradata using scripts.
Confidential
Hadoop/AWS Trainee
Roles and Responsibilities:
- Prepared detailed schema and program specifications from which the database was modified.
- Implemented Auto Scaling and Elastic Load Balancer.
- MySQL basic administration.
- Nagios setup and creation of custom metrics in CloudWatch.
- Utilized UNIX shell scripting/programming to build re-usable utilities
- Trained on AWS, NoSQL databases, and cloud operations; implemented PoC projects.