Hadoop Engineer Resume
Milpitas, CA
SUMMARY:
- 8+ years of IT experience in the design, implementation, troubleshooting and maintenance of complex enterprise infrastructure.
- 6+ years of hands-on experience in installing, patching, upgrading and configuring Linux-based operating systems (RHEL and CentOS) across large clusters.
- 5+ years of experience in configuring, installing, benchmarking and managing the Apache, Hortonworks and Cloudera distributions of Hadoop.
- 4+ years of extensive hands-on experience in IP network design, network integration, deployment and troubleshooting.
- Experience in configuring AWS EC2, S3, VPC, RDS, Redshift data warehouse, CloudWatch, CloudFormation, CloudTrail, IAM and SNS.
- Expertise in using AWS API tools such as the Linux command line and Puppet-integrated AWS API tools.
- Experience in deploying scalable Hadoop clusters on AWS using S3 as the underlying file system for Hadoop.
- Experience in installing and monitoring the Hadoop cluster resources using Ganglia and Nagios.
- Experience in designing and implementation of secure Hadoop cluster using MIT and AD Kerberos, Apache Sentry, Knox and Ranger.
- Experience in managing Hadoop infrastructure tasks such as commissioning and decommissioning nodes, log rotation and rack topology implementation (a routine health-check sketch follows this summary).
- Experience in managing Hadoop clusters using Cloudera Manager.
- Experience in using Zookeeper for coordinating the distributed applications.
- Experience in developing Pig and Hive scripts for data processing on HDFS.
- Experience in scheduling jobs using Oozie workflows.
- Experience in configuring, installing, managing and administrating HBase clusters.
- Experience in managing Hadoop resource using Static and Dynamic Resource Pools.
- Experience in installing minor patches and upgrading Hadoop clusters to major versions.
- Experience in designing, installing and configuring Confidential ESXi within a vSphere 5 environment with Virtual Center management, Consolidated Backup, DRS, HA, vMotion and Confidential Data.
- Experience in designing and building disaster recovery plans for Hadoop clusters to provide business continuity.
- Extensive experience with operating systems including Windows, Red Hat Linux and CentOS.
- Highly motivated, with the ability to work independently or as an integral part of a team, and committed to the highest levels of professionalism.
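The cluster-management bullets above imply routine health checks; below is a minimal, hypothetical shell sketch of the kind of checks involved, assuming CLI access as the HDFS superuser (paths and scope are illustrative only, not taken from any specific cluster).

    #!/usr/bin/env bash
    # Hypothetical routine health check for an HDFS/YARN cluster.
    set -euo pipefail

    hdfs dfsadmin -safemode get          # confirm the NameNode is not stuck in safe mode
    hdfs dfsadmin -report | head -n 40   # capacity, live/dead DataNodes, under-replicated blocks
    hdfs fsck /user -files -blocks -locations | tail -n 20   # filesystem integrity, scoped to one path
    yarn node -list -all                 # NodeManager availability as seen by the ResourceManager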
PROFESSIONAL EXPERIENCE:
Hadoop Engineer
Confidential, Milpitas, CA
Responsibilities:
- Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
- Installed and configured RHEL7 EC2 instances for Production, QA and Development environments.
- Installed Kerberos for authentication of application and Hadoop service users (a principal/keytab provisioning sketch appears at the end of this section).
- Responsible for planning, installing, and supporting AWS infrastructure.
- Supported technical team in management and review of Hadoop logs.
- Assisted in the creation of ETL processes for transformation of data from Teradata to the Hadoop Landing Zone.
- Installed applications on AWS EC2 instances and configured the storage on S3 buckets.
- Commissioned and decommissioned Data Nodes on the current Hadoop cluster.
- Used AWS S3 and local hard disks as the underlying file system (HDFS) for Hadoop.
- Worked on AWS (Amazon Cloud): EC2, Security Groups, Elastic IPs, load balancers, Auto Scaling groups, S3, Elastic Beanstalk, Direct Connect, VPC, CloudWatch and IAM, among other services.
- Created a Redshift data warehouse cluster with 10 petabytes of data inside the VPC using the AWS Management Console.
- Used AWS S3 as the data source for Redshift.
- Used ODBC/JDBC connections to the Hive metastore to transfer metadata to Redshift.
- Used the AWS Import/Export and Direct Connect services to transfer data privately to AWS S3 between our data center network and AWS.
- Configured replication of snapshots to S3 in another region for disaster recovery.
- Worked with ORC-formatted data in Redshift.
- Monitored metrics for compute utilization, storage utilization and read/write traffic to our Amazon Redshift data warehouse cluster.
- Designed the future-state architecture of various applications being migrated from the on-premises data center to AWS, taking into account the HA and DR requirements of those applications.
- Monitored resources such as EC2 instances (CPU and memory), Amazon RDS database services and EBS volumes using CloudWatch.
- Responsible for creating various CloudWatch alarms that send an Amazon Simple Notification Service (SNS) message when triggered (see the alarm and lifecycle sketch at the end of this section).
- Configured AWS S3 buckets and their lifecycle policies to back up files and archive them in Amazon Glacier.
- Designed stacks using AWS CloudFormation templates to launch AWS infrastructure and resources.
- Developed AWS CloudFormation templates to create custom-sized VPCs, subnets, EC2 instances, ELBs and security groups.
- Designed, configured and managed public/private cloud infrastructure on Amazon Web Services, including creating Amazon EC2 instances, setting up security groups and configuring Elastic Load Balancers.
- Worked on Auto Scaling of instances to design cost-effective, fault-tolerant and highly reliable systems.
- Worked on a POC for implementation of a NAS solution in AWS for various applications with dependencies.
- Worked on Amazon RDS, which provides automatic failover and high availability at the database layer for MySQL workloads.
- Captured regular snapshots of EBS volumes using CPM (Cloud Protection Manager).
- Created VPCs (virtual private clouds) with both public and private subnets for servers and created security groups to associate with the networks.
- Designed roles and groups for users and resources using AWS Identity and Access Management (IAM).
- Enabled multi-factor authentication (MFA) to secure the AWS accounts.
- Experienced in supporting multi-region and multi-AZ applications in AWS.
- Wrote templates for AWS infrastructure as code using Terraform to build staging and production environments.
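A minimal AWS CLI sketch of the CloudWatch-alarm-to-SNS and S3-lifecycle-to-Glacier setup described above; the topic name, e-mail address, instance ID and bucket name are placeholders, not values from the actual environment.

    #!/usr/bin/env bash
    set -euo pipefail

    # SNS topic that receives alarm notifications (subscriber address is hypothetical).
    TOPIC_ARN=$(aws sns create-topic --name hadoop-ops-alerts --query TopicArn --output text)
    aws sns subscribe --topic-arn "$TOPIC_ARN" --protocol email --notification-endpoint ops@example.com

    # CloudWatch alarm on EC2 CPU utilization that publishes to the topic when it triggers.
    aws cloudwatch put-metric-alarm \
      --alarm-name edge-node-high-cpu \
      --namespace AWS/EC2 --metric-name CPUUtilization \
      --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
      --statistic Average --period 300 --evaluation-periods 2 \
      --threshold 80 --comparison-operator GreaterThanThreshold \
      --alarm-actions "$TOPIC_ARN"

    # S3 lifecycle rule: move objects under backups/ to Glacier after 30 days, expire after a year.
    aws s3api put-bucket-lifecycle-configuration \
      --bucket example-hadoop-backups \
      --lifecycle-configuration '{"Rules":[{"ID":"archive-backups","Filter":{"Prefix":"backups/"},"Status":"Enabled","Transitions":[{"Days":30,"StorageClass":"GLACIER"}],"Expiration":{"Days":365}}]}'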
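The Kerberos bullet above maps to principal and keytab provisioning on the KDC; a hypothetical MIT Kerberos sketch follows (realm, hostname and keytab paths are placeholders).

    #!/usr/bin/env bash
    # Run as root on the KDC host; everything below is illustrative.
    set -euo pipefail
    REALM=EXAMPLE.COM
    HOST=worker01.example.com

    # Service principals for the Hadoop daemons on one node.
    kadmin.local -q "addprinc -randkey hdfs/${HOST}@${REALM}"
    kadmin.local -q "addprinc -randkey yarn/${HOST}@${REALM}"
    kadmin.local -q "addprinc -randkey HTTP/${HOST}@${REALM}"

    # Export keytabs that the daemons read at startup.
    kadmin.local -q "xst -k /etc/security/keytabs/hdfs.service.keytab hdfs/${HOST}@${REALM} HTTP/${HOST}@${REALM}"
    kadmin.local -q "xst -k /etc/security/keytabs/yarn.service.keytab yarn/${HOST}@${REALM} HTTP/${HOST}@${REALM}"

    # Sanity check: obtain a ticket from the keytab.
    kinit -kt /etc/security/keytabs/hdfs.service.keytab "hdfs/${HOST}@${REALM}"
    klist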
Hadoop Engineer
Confidential, Culver City, CA
Responsibilities:
- Worked on installing and configuring CDH 5.8, 5.9 and 5.10 Hadoop clusters on AWS using Cloudera Director and Cloudera Manager.
- Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
- Installed and configured RHEL6 EC2 instances for Production, QA and Development environments.
- Installed Kerberos for authentication of application and Hadoop service users.
- Used cron jobs to back up Hadoop service databases to S3 buckets.
- Supported technical team in management and review of Hadoop logs.
- Assisted in the creation of ETL processes for transformation of data from Oracle and SAP to the Hadoop Landing Zone.
- Installed applications on AWS EC2 instances and configured the storage on S3 buckets.
- Commissioned and decommissioned Data Nodes on the current Hadoop cluster.
- Used AWS S3 and local hard disks as the underlying file system (HDFS) for Hadoop.
- Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
- Created Cluster utilization reports for capacity planning and tuning resource allocation for YARN Jobs.
- Used Cloudera Navigator for data governance: audit and lineage.
- Configured Apache Sentry for fine-grained authorization and role-based access control of data in Hadoop.
- Created the AWS VPC network for the installed instances and configured the Security Groups and Elastic IPs accordingly.
- Monitored performance and tuned the configuration of services in the Hadoop cluster.
- Imported data from relational databases into HDFS using Sqoop (an import/backup sketch follows this section).
- Involved in creating Hive databases and tables and loading flat files.
- Used Oozie to schedule jobs.
- Configured Apache Phoenix on top of HBase to query data through SQL.
Environment: Oozie, CDH 5.8, 5.9 and 5.10 Hadoop Cluster, AWS, RHEL6 EC2, S3, Sqoop, Apache, SQL
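A hypothetical sketch combining the Sqoop import and cron-driven S3 backup bullets above; the JDBC URL, credentials file, table, database and bucket names are placeholders.

    #!/usr/bin/env bash
    set -euo pipefail

    # Sqoop import from a relational source directly into a Hive table.
    sqoop import \
      --connect jdbc:oracle:thin:@db.example.com:1521/ORCL \
      --username etl_user --password-file /user/etl/.db.password \
      --table SALES_ORDERS \
      --hive-import --hive-database landing --hive-table sales_orders \
      --num-mappers 4

    # Nightly backup of a Hadoop service database to S3 (MySQL credentials read from /root/.my.cnf).
    # Example crontab entry: 0 2 * * * /opt/scripts/backup_service_dbs.sh
    mysqldump --single-transaction --all-databases | gzip > "/tmp/service_dbs_$(date +%F).sql.gz"
    aws s3 cp "/tmp/service_dbs_$(date +%F).sql.gz" s3://example-cluster-backups/databases/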
Hadoop & AWS Administrator
Confidential, Palo Alto, CA
Responsibilities:
- Manage multiple AWS accounts with multiple VPCs for both production and non-production, where the primary objectives are automation, build-out, integration and cost control
- Design AWS CloudFormation templates to create VPC architecture, EC2 instances, subnets and NATs to meet high-availability application and security parameters across multiple AZs
- Design roles and groups for users and resources using IAM
- Create and manage S3 buckets and policies for storage and backup purposes
- Support, manage and maintain researchers' development efforts with custom applications on AWS, e.g. WordPress, Shiny apps on RStudio and Qlik Sense
- Work on automation and continuous integration process using Jenkins and Ansible
- Implement EC2 backup strategies by creating EBS Snapshots and attaching the volumes to EC2s when needed
- Manage migration of on-prem servers to AWS by creating golden images for upload and deployment
- Manage several Linux and some Windows servers
- Manage, maintain and deploy to test/development, staging and production environments
- Liaise with Developers to manage upgrades and new releases of applications, tools and systems
- Document all system configurations, build processes and best practices, backup procedures, troubleshooting guidelines using Atlassian Confluence
- Provide support across both technical and non-technical teams
- Responsible for the installation, configuration, maintenance and troubleshooting of Hadoop Cluster. Duties included monitoring cluster performance using various tools to ensure the availability, integrity and confidentiality of application and equipment.
- Experience in installing and configuring RHEL servers in Production, Test and Development environments, using them to build application and database servers.
- Deployed the Hadoop cluster in cloud environment with scalable nodes as per the business requirement.
- Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5 using Puppet.
- Monitored workload, job performance and capacity planning using the Cloudera Manager Interface.
- Improved the Hadoop cluster performance by considering the OS kernel, Disk I/O, Networking, memory, reducer buffer, mapper task, JVM task and HDFS by setting appropriate configuration parameters.
- Experience in commissioning and decommissioning nodes of the Hadoop cluster (a decommissioning sketch follows this section).
- Worked on the cluster disaster recovery plan for the Hadoop cluster by implementing cluster data backup in Amazon S3 buckets.
- Imported the data from relational databases into HDFS using Sqoop.
- Performed administration, troubleshooting and maintenance of ETL and ELT processes.
- Managing and reviewing Hadoop log files and supporting MapReduce programs running on the cluster.
- Used Apache Solr to search data in HDFS Hadoop cluster.
- Involved in creating Hive tables, loading data, and writing Hive queries
- Involved in upgrading the Hadoop cluster through both minor and major version upgrades.
- Implemented Apache Impala for data processing on top of Hive.
- Scheduled jobs using Oozie workflows.
- Used Hortonworks Apache Falcon for data management and pipeline processing in the Hadoop cluster.
- Installed and configured the ZooKeeper service for coordinating configuration-related information across all the nodes in the cluster to manage it efficiently.
- Developed Pig and Hive scripts for data processing on HDFS.
- Configured, installed, managed and administered HBase clusters.
- Experience in managing cluster resources by implementing the Fair and Capacity Schedulers.
- Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
- Supported technical team members for automation, installation and configuration tasks.
- Conducted detailed analysis of system and application architecture components as per functional requirements.
- Coordinated with technical team for production deployment of software applications for maintenance.
Environment: Puppet, HDFS, MapReduce, Apache Hadoop, Cloudera Distributed Hadoop, HBase, Hive, Flume, Sqoop, RHEL, MySQL
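A minimal sketch of DataNode decommissioning as referenced above, assuming dfs.hosts.exclude already points at the exclude file in hdfs-site.xml; the hostname and file path are placeholders.

    #!/usr/bin/env bash
    set -euo pipefail
    EXCLUDES=/etc/hadoop/conf/dfs.exclude

    # Add the node to the exclude list and have the NameNode re-read it.
    echo "worker07.example.com" >> "$EXCLUDES"
    hdfs dfsadmin -refreshNodes

    # Watch progress; the node can be stopped once it reports "Decommissioned".
    hdfs dfsadmin -report | grep -A 2 "worker07.example.com"

    # Re-balance the remaining DataNodes afterwards (10% utilization threshold).
    hdfs balancer -threshold 10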
Hadoop Administrator
Confidential
Responsibilities:
- Installed, configured and maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Installed and configured CDH 5.3 cluster using Cloudera Manager.
- Implemented Commissioning and Decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
- Managed and reviewed Hadoop Log files.
- Implemented rack awareness for data locality optimization (a topology script sketch follows this section).
- Installed and configured Hive with remote Metastore using MySQL.
- Proactively monitored systems and services and implemented Hadoop deployment, configuration management, performance tuning and backup procedures.
- Monitored the health of Hadoop daemon services and responded accordingly to any warning or failure conditions.
- Worked on recovery from node failures.
- Performed Hadoop Upgrade activities.
- Managed and scheduled jobs on the Hadoop cluster.
- Worked on importing and exporting data between Oracle, DB2, HDFS and Hive using Sqoop.
- Installed and configured Kerberos for the authentication of users and Hadoop daemons.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Worked with support teams to resolve performance issues.
- Worked on testing, implementation and documentation.
Environment: HDFS, MapReduce, Apache Hadoop, Cloudera Distributed Hadoop, HBase, Hive, Flume, Sqoop, RHEL, MySQL
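Rack awareness (referenced above) is typically wired in through the net.topology.script.file.name property; below is a hypothetical topology script where the subnet-to-rack mapping is purely illustrative.

    #!/usr/bin/env bash
    # Invoked by the NameNode with one or more host/IP arguments;
    # it must print one rack path per argument.
    for host in "$@"; do
      ip=$(getent hosts "$host" | awk '{print $1}')
      ip=${ip:-$host}
      case "$ip" in
        10.1.1.*) echo "/dc1/rack1" ;;
        10.1.2.*) echo "/dc1/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
    done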
Hadoop Administrator - Developer
Confidential
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data (an Oozie submission sketch follows this section).
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Installed and configured MapReduce, HIVE and the HDFS; implemented CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning
- Created reports for the BI team using Sqoop to export data into HDFS and Hive
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Administrator for Pig, Hive and HBase installing updates, patches and upgrades.
- Performed both major and minor upgrades to the existing CDH cluster.
- Upgraded the Hadoop cluster from CDH3 to CDH4.
Environment: CDH3/CDH4, HDFS, MapReduce, Hive, Pig, HBase, Oozie, Sqoop, Java, CentOS
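A hypothetical sketch of driving the Oozie-automated load described above from the command line; the Oozie URL, HDFS paths and property values are placeholders, and the workflow definition (with its Pig pre-processing action) is assumed to already be deployed to HDFS.

    #!/usr/bin/env bash
    set -euo pipefail
    export OOZIE_URL=http://oozie.example.com:11000/oozie

    # Properties consumed by the already-deployed workflow, which runs a Pig
    # pre-processing action before loading data into HDFS.
    printf '%s\n' \
      'nameNode=hdfs://nn.example.com:8020' \
      'jobTracker=jt.example.com:8021' \
      'oozie.wf.application.path=${nameNode}/user/etl/workflows/preprocess' \
      'inputDir=/data/raw/2012-06-01' \
      'outputDir=/data/clean/2012-06-01' > job.properties

    # Submit and start the workflow, then poll its status.
    JOB_ID=$(oozie job -config job.properties -run | awk -F': ' '{print $2}')
    oozie job -info "$JOB_ID"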
System Administrator
Confidential
Responsibilities:
- Installation, configuration, upgrading and administration of Sun Solaris and Red Hat Linux.
- User account management and support.
- JumpStart and Kickstart OS integration; DDNS, DHCP, SMTP, Samba, NFS, FTP, SSH and LDAP integration.
- Network traffic control, IPsec, QoS, VLAN, proxy and RADIUS integration on Cisco hardware via Red Hat Linux software.
- Responsible for configuring and managing Squid server in Linux.
- Configuration and Administration of NIS environment.
- Managing file systems and disks using Solstice DiskSuite.
- Involved in installing and configuring NFS (an export sketch follows this section).
- Worked on Solaris Volume Manager to create file systems per user and database requirements.
- Troubleshooting system and end-user issues.
- Responsible for configuring real time backup of web servers.
- Managed log files for troubleshooting and identifying probable errors.
- Responsible for reviewing all open tickets and resolving and closing existing tickets.
- Documented solutions for issues that had not been encountered previously.
Environment: Confidential 3.5, Solaris 2.6/2.7/8, Oracle 10g, Weblogic10.x, Veritas NetBackup, Veritas Volume Manager, Samba, NFS, NIS, LVM, Linux, Shell Programming.
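A minimal sketch of the Linux side of the NFS install/configure work mentioned above; the export path and client subnet are placeholders.

    #!/usr/bin/env bash
    # Run as root on the NFS server.
    set -euo pipefail

    mkdir -p /export/projects
    echo '/export/projects 192.168.10.0/24(rw,sync,no_root_squash)' >> /etc/exports
    exportfs -ra                 # re-read /etc/exports
    service nfs restart          # RHEL-style init script
    showmount -e localhost       # verify the export is visible

    # On a client:
    #   mount -t nfs nfsserver.example.com:/export/projects /mnt/projects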