Hadoop Engineer Resume
Milpitas, CA
SUMMARY:
- 8+ years of IT experience in the design, implementation, troubleshooting and maintenance of complex enterprise infrastructure.
- 6+ years of hands-on experience in installing, patching, upgrading and configuring Linux-based operating systems (RHEL and CentOS) across large clusters.
- 5+ years of experience in configuring, installing, benchmarking and managing the Apache, Hortonworks and Cloudera distributions of Hadoop.
- 4+ years of extensive hands-on experience in IP network design, network integration, deployment and troubleshooting.
- Experience in configuring AWS EC2, S3, VPC, RDS, Redshift data warehouse, CloudWatch, CloudFormation, CloudTrail, IAM and SNS.
- Expertise in using AWS API tools such as the Linux command line and Puppet-integrated AWS API tools.
- Experience in deploying scalable Hadoop clusters on AWS using S3 as the underlying file system for Hadoop.
- Experience in installing and monitoring the Hadoop cluster resources using Ganglia and Nagios.
- Experience in designing and implementation of secure Hadoop cluster using MIT and AD Kerberos, Apache Sentry, Knox and Ranger.
- Experience in managing Hadoop infrastructure tasks such as commissioning and decommissioning nodes, log rotation and rack topology implementation (a routine health-check sketch follows this summary).
- Experience in managing Hadoop clusters using Cloudera Manager.
- Experience in using Zookeeper for coordinating the distributed applications.
- Experience in developing Pig and Hive scripts for data processing on HDFS.
- Experience in scheduling jobs using Oozie workflows.
- Experience in configuring, installing, managing and administrating HBase clusters.
- Experience in managing Hadoop resource using Static and Dynamic Resource Pools.
- Experience in installing minor patches and upgrading Hadoop clusters to major versions.
- Experience in designing, installing and configuring Confidential ESXi within a vSphere 5 environment with Virtual Center management, Consolidated Backup, DRS, HA, vMotion and Confidential Data.
- Experience in designing and building disaster recovery plans for Hadoop clusters to provide business continuity.
- Extensive experience with operating systems including Windows, Red Hat Linux and CentOS.
- Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest levels of professionalism.
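The cluster-management bullets above imply routine health checks; below is a minimal, hypothetical shell sketch of the kind of checks involved, assuming CLI access as the HDFS superuser (paths and scope are illustrative only, not taken from any specific cluster).

    #!/usr/bin/env bash
    # Hypothetical routine health check for an HDFS/YARN cluster.
    set -euo pipefail

    hdfs dfsadmin -safemode get          # confirm the NameNode is not stuck in safe mode
    hdfs dfsadmin -report | head -n 40   # capacity, live/dead DataNodes, under-replicated blocks
    hdfs fsck /user -files -blocks -locations | tail -n 20   # filesystem integrity, scoped to one path
    yarn node -list -all                 # NodeManager availability as seen by the ResourceManager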
PROFESSIONAL EXPERIENCE:
Hadoop Engineer
Confidential, Milpitas, CA
Responsibilities:
- Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
- Installed and configured RHEL7 EC2 instances for Production, QA and Development environments.
- Installed Kerberos for authentication of application and Hadoop service users (a principal/keytab provisioning sketch appears at the end of this section).
- Responsible for planning, installing, and supporting AWS infrastructure.
- Supported technical team in management and review of Hadoop logs.
- Assisted in the creation of ETL processes for transformation of data from Teradata to the Hadoop Landing Zone.
- Installed applications on AWS EC2 instances and configured the storage on S3 buckets.
- Commissioned and decommissioned Data Nodes on the current Hadoop cluster.
- Used AWS S3 and local hard disks as the underlying file system (HDFS) for Hadoop.
- Worked on AWS (Amazon Cloud): EC2, Security Groups, Elastic IPs, load balancers, Auto Scaling groups, S3, Elastic Beanstalk, Direct Connect, VPC, CloudWatch and IAM, among other services.
- Created a Redshift data warehouse cluster with 10 petabytes of data inside the VPC using the AWS Management Console.
- Used AWS S3 as the data source for Redshift.
- Used ODBC/JDBC connections to the Hive metastore to transfer metadata to Redshift.
- Used the AWS Import/Export and Direct Connect services to transfer data privately to AWS S3 between our data center network and AWS.
- Configured replication of snapshots to S3 in another region for disaster recovery.
- Worked with ORC-formatted data in Redshift.
- Monitored metrics for compute utilization, storage utilization and read/write traffic to our Amazon Redshift data warehouse cluster.
- Designed the future-state architecture of various applications being migrated from the on-premises data center to AWS, taking into account the HA and DR requirements of those applications.
- Monitored resources such as EC2 instances (CPU and memory), Amazon RDS database services and EBS volumes using CloudWatch.
- Responsible for creating various CloudWatch alarms that send an Amazon Simple Notification Service (SNS) message when triggered (see the alarm and lifecycle sketch at the end of this section).
- Configured AWS S3 buckets and their lifecycle policies to back up files and archive them in Amazon Glacier.
- Designed stacks using AWS CloudFormation templates to launch AWS infrastructure and resources.
- Developed AWS CloudFormation templates to create custom-sized VPCs, subnets, EC2 instances, ELBs and security groups.
- Designed, configured and managed public/private cloud infrastructure on Amazon Web Services, including creating Amazon EC2 instances, setting up security groups and configuring Elastic Load Balancers.
- Worked on Auto Scaling of instances to design cost-effective, fault-tolerant and highly reliable systems.
- Worked on a POC for implementation of a NAS solution in AWS for various applications with dependencies.
- Worked on Amazon RDS, which provides automatic failover and high availability at the database layer for MySQL workloads.
- Captured regular snapshots of EBS volumes using CPM (Cloud Protection Manager).
- Created VPCs (virtual private clouds) with both public and private subnets for servers and created security groups to associate with the networks.
- Designed roles and groups for users and resources using AWS Identity and Access Management (IAM).
- Enabled multi-factor authentication (MFA) to secure the AWS accounts.
- Experienced in supporting multi-region and multi-AZ applications in AWS.
- Wrote templates for AWS infrastructure as code using Terraform to build staging and production environments.
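A minimal AWS CLI sketch of the CloudWatch-alarm-to-SNS and S3-lifecycle-to-Glacier setup described above; the topic name, e-mail address, instance ID and bucket name are placeholders, not values from the actual environment.

    #!/usr/bin/env bash
    set -euo pipefail

    # SNS topic that receives alarm notifications (subscriber address is hypothetical).
    TOPIC_ARN=$(aws sns create-topic --name hadoop-ops-alerts --query TopicArn --output text)
    aws sns subscribe --topic-arn "$TOPIC_ARN" --protocol email --notification-endpoint ops@example.com

    # CloudWatch alarm on EC2 CPU utilization that publishes to the topic when it triggers.
    aws cloudwatch put-metric-alarm \
      --alarm-name edge-node-high-cpu \
      --namespace AWS/EC2 --metric-name CPUUtilization \
      --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
      --statistic Average --period 300 --evaluation-periods 2 \
      --threshold 80 --comparison-operator GreaterThanThreshold \
      --alarm-actions "$TOPIC_ARN"

    # S3 lifecycle rule: move objects under backups/ to Glacier after 30 days, expire after a year.
    aws s3api put-bucket-lifecycle-configuration \
      --bucket example-hadoop-backups \
      --lifecycle-configuration '{"Rules":[{"ID":"archive-backups","Filter":{"Prefix":"backups/"},"Status":"Enabled","Transitions":[{"Days":30,"StorageClass":"GLACIER"}],"Expiration":{"Days":365}}]}'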
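The Kerberos bullet above maps to principal and keytab provisioning on the KDC; a hypothetical MIT Kerberos sketch follows (realm, hostname and keytab paths are placeholders).

    #!/usr/bin/env bash
    # Run as root on the KDC host; everything below is illustrative.
    set -euo pipefail
    REALM=EXAMPLE.COM
    HOST=worker01.example.com

    # Service principals for the Hadoop daemons on one node.
    kadmin.local -q "addprinc -randkey hdfs/${HOST}@${REALM}"
    kadmin.local -q "addprinc -randkey yarn/${HOST}@${REALM}"
    kadmin.local -q "addprinc -randkey HTTP/${HOST}@${REALM}"

    # Export keytabs that the daemons read at startup.
    kadmin.local -q "xst -k /etc/security/keytabs/hdfs.service.keytab hdfs/${HOST}@${REALM} HTTP/${HOST}@${REALM}"
    kadmin.local -q "xst -k /etc/security/keytabs/yarn.service.keytab yarn/${HOST}@${REALM} HTTP/${HOST}@${REALM}"

    # Sanity check: obtain a ticket from the keytab.
    kinit -kt /etc/security/keytabs/hdfs.service.keytab "hdfs/${HOST}@${REALM}"
    klist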
Hadoop Engineer
Confidential, Culver City, CA
Responsibilities:
- Worked on installing and configuring CDH 5.8, 5.9 and 5.10 Hadoop clusters on AWS using Cloudera Director and Cloudera Manager.
- Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
- Installed and configured RHEL6 EC2 instances for Production, QA and Development environments.
- Installed Kerberos for authentication of application and Hadoop service users.
- Used cron jobs to back up Hadoop service databases to S3 buckets.
- Supported technical team in management and review of Hadoop logs.
- Assisted in the creation of ETL processes for transformation of data from Oracle and SAP to the Hadoop Landing Zone.
- Installed applications on AWS EC2 instances and configured the storage on S3 buckets.
- Commissioned and decommissioned Data Nodes on the current Hadoop cluster.
- Used AWS S3 and local hard disks as the underlying file system (HDFS) for Hadoop.
- Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
- Created Cluster utilization reports for capacity planning and tuning resource allocation for YARN Jobs.
- Used Cloudera Navigator for data governance: audit and lineage.
- Configured Apache Sentry for fine-grained authorization and role-based access control of data in Hadoop.
- Created the AWS VPC network for the installed instances and configured the Security Groups and Elastic IPs accordingly.
- Monitored performance and tuned the configuration of services in the Hadoop cluster.
- Imported data from relational databases into HDFS using Sqoop (an import/backup sketch follows this section).
- Involved in creating Hive databases and tables and loading flat files.
- Used Oozie to schedule jobs.
- Configured Apache Phoenix on top of HBase to query data through SQL.
Environment: Oozie, CDH 5.8, 5.9 and 5.10 Hadoop Cluster, AWS, RHEL6 EC2, S3, Sqoop, Apache, SQL
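A hypothetical sketch combining the Sqoop import and cron-driven S3 backup bullets above; the JDBC URL, credentials file, table, database and bucket names are placeholders.

    #!/usr/bin/env bash
    set -euo pipefail

    # Sqoop import from a relational source directly into a Hive table.
    sqoop import \
      --connect jdbc:oracle:thin:@db.example.com:1521/ORCL \
      --username etl_user --password-file /user/etl/.db.password \
      --table SALES_ORDERS \
      --hive-import --hive-database landing --hive-table sales_orders \
      --num-mappers 4

    # Nightly backup of a Hadoop service database to S3 (MySQL credentials read from /root/.my.cnf).
    # Example crontab entry: 0 2 * * * /opt/scripts/backup_service_dbs.sh
    mysqldump --single-transaction --all-databases | gzip > "/tmp/service_dbs_$(date +%F).sql.gz"
    aws s3 cp "/tmp/service_dbs_$(date +%F).sql.gz" s3://example-cluster-backups/databases/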
Hadoop & AWS Administrator
Confidential, Palo Alto, CA
Responsibilities:
- Manage multiple AWS accounts with multiple VPCs for both production and non-production, where the primary objectives are automation, build-out, integration and cost control
- Design AWS CloudFormation templates to create VPC architecture, EC2 instances, subnets and NATs to meet high-availability application and security parameters across multiple AZs
- Design roles and groups for users and resources using IAM
- Create and manage S3 buckets and policies for storage and backup purposes
- Support, manage and maintain researchers' development efforts with custom applications on AWS, e.g. WordPress, Shiny apps on RStudio and Qlik Sense
- Work on automation and continuous integration process using Jenkins and Ansible
- Implement EC2 backup strategies by creating EBS Snapshots and attaching the volumes to EC2s when needed
- Manage migration of on-prem servers to AWS by creating golden images for upload and deployment
- Manage several Linux and some Windows servers
- Manage, maintain and deploy to test/development, staging and production environments
- Liaise with Developers to manage upgrades and new releases of applications, tools and systems
- Document all system configurations, build processes and best practices, backup procedures, troubleshooting guidelines using Atlassian Confluence
- Provide support across both technical and non-technical teams
- Responsible for the installation, configuration, maintenance and troubleshooting of Hadoop Cluster. Duties included monitoring cluster performance using various tools to ensure the availability, integrity and confidentiality of application and equipment.
- Experience in installing and configuring RHEL servers in Production, Test and Development environments, using them to build application and database servers.
- Deployed the Hadoop cluster in cloud environment with scalable nodes as per the business requirement.
- Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5 using Puppet.
- Monitored workload, job performance and capacity planning using the Cloudera Manager Interface.
- Improved the Hadoop cluster performance by considering the OS kernel, Disk I/O, Networking, memory, reducer buffer, mapper task, JVM task and HDFS by setting appropriate configuration parameters.
- Experience in commissioning and decommissioning nodes of the Hadoop cluster (a decommissioning sketch follows this section).
- Worked on the cluster disaster recovery plan for the Hadoop cluster by implementing cluster data backup in Amazon S3 buckets.
- Imported the data from relational databases into HDFS using Sqoop.
- Performed administration, troubleshooting and maintenance of ETL and ELT processes.
- Managing and reviewing Hadoop log files and supporting MapReduce programs running on the cluster.
- Used Apache Solr to search data in HDFS Hadoop cluster.
- Involved in creating Hive tables, loading data, and writing Hive queries
- Involved in upgrading the Hadoop cluster through both minor and major version upgrades.
- Implemented Apache Impala for data processing on top of Hive.
- Scheduled jobs using Oozie workflows.
- Used Hortonworks Apache Falcon for data management and pipeline processing in the Hadoop cluster.
- Installed and configured the ZooKeeper service for coordinating configuration-related information across all the nodes in the cluster to manage it efficiently.
- Developed Pig and Hive scripts for data processing on HDFS.
- Configured, installed, managed and administered HBase clusters.
- Experience in managing cluster resources by implementing the Fair and Capacity Schedulers.
- Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
- Supported technical team members for automation, installation and configuration tasks.
- Conducted detailed analysis of system and application architecture components as per functional requirements.
- Coordinated with technical team for production deployment of software applications for maintenance.
Environment: Puppet, HDFS, MapReduce, Apache Hadoop, Cloudera Distributed Hadoop, HBase, Hive, Flume, Sqoop, RHEL, MySQL
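A minimal sketch of DataNode decommissioning as referenced above, assuming dfs.hosts.exclude already points at the exclude file in hdfs-site.xml; the hostname and file path are placeholders.

    #!/usr/bin/env bash
    set -euo pipefail
    EXCLUDES=/etc/hadoop/conf/dfs.exclude

    # Add the node to the exclude list and have the NameNode re-read it.
    echo "worker07.example.com" >> "$EXCLUDES"
    hdfs dfsadmin -refreshNodes

    # Watch progress; the node can be stopped once it reports "Decommissioned".
    hdfs dfsadmin -report | grep -A 2 "worker07.example.com"

    # Re-balance the remaining DataNodes afterwards (10% utilization threshold).
    hdfs balancer -threshold 10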
Hadoop Administrator
Confidential
Responsibilities:
- Installed, configured and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed and configured CDH 5.3 cluster using Cloudera Manager.
- Implemented Commissioning and Decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
- Managed and reviewed Hadoop Log files.
- Implemented rack awareness for data locality optimization (a topology script sketch follows this section).
- Installed and configured Hive with remote Metastore using MySQL.
- Proactively monitored systems and services and implemented Hadoop deployment, configuration management, performance tuning and backup procedures.
- Monitored the health of Hadoop daemon services and responded accordingly to any warning or failure conditions.
- Worked on recovery from node failures.
- Performed Hadoop Upgrade activities.
- Managed and scheduled jobs on the Hadoop cluster.
- Worked on importing and exporting data between Oracle, DB2, HDFS and Hive using Sqoop.
- Installed and configured Kerberos for the authentication of users and Hadoop daemons.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Worked with support teams to resolve performance issues.
- Worked on testing, implementation and documentation.
Environment: HDFS, MapReduce, Apache Hadoop, Cloudera Distributed Hadoop, HBase, Hive, Flume, Sqoop, RHEL, MySQL
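Rack awareness (referenced above) is typically wired in through the net.topology.script.file.name property; below is a hypothetical topology script where the subnet-to-rack mapping is purely illustrative.

    #!/usr/bin/env bash
    # Invoked by the NameNode with one or more host/IP arguments;
    # it must print one rack path per argument.
    for host in "$@"; do
      ip=$(getent hosts "$host" | awk '{print $1}')
      ip=${ip:-$host}
      case "$ip" in
        10.1.1.*) echo "/dc1/rack1" ;;
        10.1.2.*) echo "/dc1/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
    done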
Hadoop Administrator - Developer
Confidential
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data (an Oozie submission sketch follows this section).
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Installed and configured MapReduce, HIVE and the HDFS; implemented CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning
- Created reports for the BI team using Sqoop to export data into HDFS and Hive
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Administrator for Pig, Hive and HBase installing updates, patches and upgrades.
- Performed both major and minor upgrades to the existing CDH cluster.
- Upgraded the Hadoop cluster from CDH3 to CDH4.
Environment: CDH3/CDH4, HDFS, MapReduce, Hive, Pig, HBase, Oozie, Sqoop, Java, CentOS
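A hypothetical sketch of driving the Oozie-automated load described above from the command line; the Oozie URL, HDFS paths and property values are placeholders, and the workflow definition (with its Pig pre-processing action) is assumed to already be deployed to HDFS.

    #!/usr/bin/env bash
    set -euo pipefail
    export OOZIE_URL=http://oozie.example.com:11000/oozie

    # Properties consumed by the already-deployed workflow, which runs a Pig
    # pre-processing action before loading data into HDFS.
    printf '%s\n' \
      'nameNode=hdfs://nn.example.com:8020' \
      'jobTracker=jt.example.com:8021' \
      'oozie.wf.application.path=${nameNode}/user/etl/workflows/preprocess' \
      'inputDir=/data/raw/2012-06-01' \
      'outputDir=/data/clean/2012-06-01' > job.properties

    # Submit and start the workflow, then poll its status.
    JOB_ID=$(oozie job -config job.properties -run | awk -F': ' '{print $2}')
    oozie job -info "$JOB_ID"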
System Administrator
Confidential
Responsibilities:
- Installation, configuration, upgrading and administration of Sun Solaris and Red Hat Linux.
- User account management and support.
- JumpStart and Kickstart OS integration; DDNS, DHCP, SMTP, Samba, NFS, FTP, SSH and LDAP integration.
- Network traffic control, IPsec, QoS, VLAN, proxy and RADIUS integration on Cisco hardware via Red Hat Linux software.
- Responsible for configuring and managing Squid server in Linux.
- Configuration and Administration of NIS environment.
- Managing file systems and disks using Solstice DiskSuite.
- Involved in installing and configuring NFS (an export sketch follows this section).
- Worked on Solaris Volume Manager to create file systems per user and database requirements.
- Troubleshooting system and end-user issues.
- Responsible for configuring real time backup of web servers.
- Managed log files for troubleshooting and identifying probable errors.
- Responsible for reviewing all open tickets and resolving and closing existing tickets.
- Documented solutions for issues that had not been encountered previously.
Environment: Confidential 3.5, Solaris 2.6/2.7/8, Oracle 10g, Weblogic10.x, Veritas NetBackup, Veritas Volume Manager, Samba, NFS, NIS, LVM, Linux, Shell Programming.
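A minimal sketch of the Linux side of the NFS install/configure work mentioned above; the export path and client subnet are placeholders.

    #!/usr/bin/env bash
    # Run as root on the NFS server.
    set -euo pipefail

    mkdir -p /export/projects
    echo '/export/projects 192.168.10.0/24(rw,sync,no_root_squash)' >> /etc/exports
    exportfs -ra                 # re-read /etc/exports
    service nfs restart          # RHEL-style init script
    showmount -e localhost       # verify the export is visible

    # On a client:
    #   mount -t nfs nfsserver.example.com:/export/projects /mnt/projects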