
Hadoop Engineer Resume


Milpitas, CA

SUMMARY:

  • 8+ years of IT experience in the design, implementation, troubleshooting and maintenance of complex enterprise infrastructure.
  • 6+ years of hands-on experience in installing, patching, upgrading and configuring Linux-based operating systems (RHEL and CentOS) across large clusters.
  • 5+ years of experience in configuring, installing, benchmarking and managing the Apache, Hortonworks and Cloudera distributions of Hadoop.
  • 4+ years of extensive hands-on experience in IP network design, network integration, deployment and troubleshooting.
  • Experience in configuring AWS EC2, S3, VPC, RDS, RedShift Data Warehouse, CloudWatch, CloudFormation, CloudTrail, IAM, and SNS.
  • Expertise in using AWS API tools from the Linux command line and through Puppet-integrated AWS API tooling.
  • Experience in deploying scalable Hadoop clusters on AWS using S3 as the underlying file system for Hadoop (see the sketch after this summary).
  • Experience in installing and monitoring the Hadoop cluster resources using Ganglia and Nagios.
  • Experience in designing and implementation of secure Hadoop cluster using MIT and AD Kerberos, Apache Sentry, Knox and Ranger.
  • Experience in managing Hadoop infrastructure tasks such as commissioning and decommissioning nodes, log rotation and rack topology implementation.
  • Experience in managing Hadoop cluster using Cloudera Manager.
  • Experience in using Zookeeper for coordinating the distributed applications.
  • Experience in developing PIG and HIVE scripting for data processing on HDFS.
  • Experience in scheduling jobs using OOZIE workflow.
  • Experience in configuring, installing, managing and administering HBase clusters.
  • Experience in managing Hadoop resource using Static and Dynamic Resource Pools.
  • Experience in installing minor patches and upgrading Hadoop clusters to major versions.
  • Experience in designing, installing and configuring Confidential ESXi within a vSphere 5 environment with Virtual Center management, Consolidated Backup, DRS, HA, vMotion and Confidential Data.
  • Experience in designing and building disaster recovery plan for Hadoop Cluster to provide business continuity.
  • Extensive experience with operating systems including Windows, Red Hat and CentOS.
  • Highly motivated, with the ability to work independently or as an integral part of a team; committed to the highest levels of professionalism.
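
Illustrative sketch for the S3-backed Hadoop deployments noted above: a minimal example, assuming placeholder bucket name and credentials, of reading an S3 location through Hadoop's s3a connector from the command line. In practice the fs.s3a.* properties would normally be set in core-site.xml rather than passed on each command.

    # List a landing directory on S3 through the s3a connector, passing the
    # connector properties as generic Hadoop options (placeholder values):
    hadoop fs \
      -D fs.s3a.access.key=EXAMPLE_ACCESS_KEY \
      -D fs.s3a.secret.key=EXAMPLE_SECRET_KEY \
      -ls s3a://example-hadoop-bucket/landing/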

PROFESSIONAL EXPERIENCE:

Hadoop Engineer

Confidential, Milpitas, CA

Responsibilities:
  • Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
  • Installed and configured RHEL7 EC2 instances for Production, QA and Development environment.
  • Installed Kerberos for authentication of application and Hadoop service users.
  • Responsible for planning, installing, and supporting AWS infrastructure.
  • Supported technical team in management and review of Hadoop logs.
  • Assisted in creation of ETL processes for transformation of Data from Teradata to Hadoop Landing Zone.
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets.
  • Decommissioned and commissioned Data Nodes on the current Hadoop cluster.
  • Used AWS S3 and local disk (HDFS) as the underlying file systems for Hadoop.
  • Worked on AWS (Amazon Cloud) services: EC2, Security Groups, Elastic IPs, load balancers, Auto Scaling groups, S3, Elastic Beanstalk, Direct Connect, VPC, CloudWatch, IAM and many other services as well.
  • Created a RedShift data warehouse cluster holding 10 petabytes of data within the VPC using the AWS Management Console.
  • Used AWS S3 as the data source for RedShift.
  • Used ODBC/JDBC connections to the Hive metastore to transfer metadata to RedShift.
  • Used AWS import/export and direct connect service to transfer the data privately to AWS S3 between our network data center and AWS.
  • Configured replication of snapshots to S3 in another region for disaster recovery.
  • Worked with ORC-formatted data in RedShift.
  • Monitored metrics for compute utilization, storage utilization, and read/write traffic to our Amazon Redshift data warehouse cluster.
  • Designed the future-state architecture of various applications being migrated from the on-premises data center to AWS, considering the HA and DR of those applications.
  • Monitored resources such as EC2 CPU and memory, Amazon RDS DB services and EBS volumes using CloudWatch.
  • Responsible for creating CloudWatch alarms that send an Amazon Simple Notification Service (SNS) message when they trigger (see the sketch after this list).
  • Configured AWS S3 lifecycle policies to back up files and archive them in Amazon Glacier.
  • Designed stacks using Amazon CloudFormation templates to launch AWS infrastructure and resources.
  • Developed AWS CloudFormation templates to create custom-sized VPCs, subnets, EC2 instances, ELBs and security groups.
  • Designed, configured and managed public/private cloud infrastructure on Amazon Web Services; created Amazon EC2 instances, set up security groups and configured Elastic Load Balancers.
  • Worked on auto scaling the instances to design cost effective, fault tolerant and highly reliable systems.
  • Worked on the POC for implementation of NAS solution in AWS for various applications with dependencies.
  • Worked on Amazon RDS, which includes automatic failover and high availability at the database layer for MySQL workloads.
  • Captured regular snapshots of EBS volumes using CPM (Cloud Protection Manager).
  • Created VPCs (virtual private clouds) with both public and private subnets for servers, and created security groups to associate with the networks.
  • Designed roles and groups for users and resources using AWS Identity and Access Management (IAM).
  • Enabled multi-factor authentication (MFA) to secure the AWS accounts.
  • Experienced in supporting multi-region and multi-AZ applications in AWS.
  • Wrote templates for AWS infrastructure as code using Terraform to build staging and production environments.
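
Illustrative sketch for the CloudWatch/SNS alerting described above: a minimal AWS CLI example, with placeholder topic name, e-mail address and instance ID.

    # Create an SNS topic and subscribe an operations address to it
    TOPIC_ARN=$(aws sns create-topic --name hadoop-ops-alerts --query 'TopicArn' --output text)
    aws sns subscribe --topic-arn "$TOPIC_ARN" --protocol email --notification-endpoint ops@example.com

    # Alarm when average CPU on an EC2 instance stays above 80% for two 5-minute periods
    aws cloudwatch put-metric-alarm \
      --alarm-name hadoop-worker-high-cpu \
      --namespace AWS/EC2 \
      --metric-name CPUUtilization \
      --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
      --statistic Average \
      --period 300 \
      --evaluation-periods 2 \
      --threshold 80 \
      --comparison-operator GreaterThanThreshold \
      --alarm-actions "$TOPIC_ARN"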

Hadoop Engineer

Confidential, Culver City, CA

Responsibilities:

  • Worked on installing and configuring of CDH 5.8, 5.9 and 5.10 Hadoop Cluster on AWS using Cloudera Director, Cloudera Manager.
  • Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
  • Installed and configured RHEL6 EC2 instances for Production, QA and Development environment.
  • Installed Kerberos for authentication of application and Hadoop service users.
  • Used cron jobs to back up Hadoop service databases to S3 buckets.
  • Supported technical team in management and review of Hadoop logs.
  • Assisted in creation of ETL processes for transformation of Data from Oracle and SAP to Hadoop Landing Zone.
  • Installed application on AWS EC2 instances and configured the storage on S3 buckets.
  • Decommissioned and commissioned Data Nodes on the current Hadoop cluster.
  • Used AWS S3 and local disk (HDFS) as the underlying file systems for Hadoop.
  • Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
  • Created Cluster utilization reports for capacity planning and tuning resource allocation for YARN Jobs.
  • Used Cloudera Navigator for data governance: audit and lineage.
  • Configured Apache Sentry for fine-grained authorization and role-based access control of data in Hadoop.
  • Created the AWS VPC network for the installed instances and configured the Security Groups and Elastic IPs accordingly.
  • Monitored performance and tuned the configuration of services in the Hadoop cluster.
  • Imported data from relational databases into HDFS using Sqoop (see the Sqoop sketch below).
  • Involved in creating Hive databases and tables and loading flat files.
  • Used Oozie to schedule jobs.
  • Configured Apache Phoenix on top of HBase to query data through SQL.

Environment: Oozie, CDH 5.8, 5.9 and 5.10 Hadoop Cluster, AWS, RHEL6 EC2, S3, Sqoop, Apache, SQL
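
Illustrative sketch for the Sqoop/Hive ingestion described above: a minimal import of one Oracle table into a Hive table in the landing zone. The connection string, credentials, table, database and directory names are placeholders.

    # Import an Oracle table into a Hive table in the landing zone
    sqoop import \
      --connect jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCL \
      --username etl_user \
      --password-file /user/etl/.oracle_password \
      --table SALES.ORDERS \
      --num-mappers 4 \
      --target-dir /landing/orders \
      --hive-import \
      --hive-database landing \
      --hive-table orders

    # A recurring load can then be scheduled through Oozie, e.g.:
    # oozie job -oozie http://oozie-host:11000/oozie -config coordinator.properties -run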

Hadoop & AWS - Administrator

Confidential, Palo Alto, CA

Responsibilities:
  • Manage multiple AWS accounts with multiple VPCs for both production and non-production environments, where the primary objectives are automation, build-out, integration and cost control
  • Design AWS CloudFormation templates to create VPC architecture, EC2 instances, subnets and NAT gateways that meet high-availability and security requirements across multiple AZs
  • Design roles and groups for users and resources using IAM
  • Create and manage S3 buckets and policies for storage and backup purposes
  • Support, manage and maintain researchers' development efforts with custom applications on AWS, e.g. WordPress, Shiny apps on RStudio, Qlik Sense, etc.
  • Work on automation and continuous integration process using Jenkins and Ansible
  • Implement EC2 backup strategies by creating EBS Snapshots and attaching the volumes to EC2s when needed
  • Manage migration of on-prem servers to AWS by creating golden images for upload and deployment
  • Manage several Linux and some Windows servers
  • Manage, maintain and deploy to test/development, staging and production environments
  • Liaise with Developers to manage upgrades and new releases of applications, tools and systems
  • Document all system configurations, build processes and best practices, backup procedures, troubleshooting guidelines using Atlassian Confluence
  • Provide support across both technical and non-technical teams
  • Responsible for the installation, configuration, maintenance and troubleshooting of Hadoop Cluster. Duties included monitoring cluster performance using various tools to ensure the availability, integrity and confidentiality of application and equipment.
  • Installed and configured RHEL servers in production, test and development environments and used them to build application and database servers.
  • Deployed the Hadoop cluster in cloud environment with scalable nodes as per the business requirement.
  • Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5 using Puppet.
  • Monitored workload, job performance and capacity planning using the Cloudera Manager Interface.
  • Improved the Hadoop cluster performance by considering the OS kernel, Disk I/O, Networking, memory, reducer buffer, mapper task, JVM task and HDFS by setting appropriate configuration parameters.
  • Experience in commissioning and decommissioning nodes of Hadoop cluster.
  • Worked on the disaster recovery plan for the Hadoop cluster by implementing cluster data backup to Amazon S3 buckets (see the DistCp sketch below).
  • Imported the data from relational databases into HDFS using Sqoop.
  • Performed administration, troubleshooting and maintenance of ETL and ELT processes.
  • Managing and reviewing Hadoop log files and supporting MapReduce programs running on the cluster.
  • Used Apache Solr to search data in HDFS Hadoop cluster.
  • Involved in creating Hive tables, loading data, and writing Hive queries
  • Involved in upgrading Hadoop cluster from current version to minor version upgrade as well as to major versions.
  • Implemented APACHE IMPALA for data processing on top of HIVE.
  • Scheduled jobs using OOZIE workflow.
  • Used Hortonworks Apache Falcon for data management and pipeline process in the Hadoop cluster.
  • Installed and configured the ZooKeeper service for coordinating configuration-related information of all the nodes in the cluster to manage it efficiently.
  • Developed PIG and HIVE scripting for data processing on HDFS.
  • Configuring, installing, managing and administering HBase clusters.
  • Experience in managing the cluster resources by implementing fair and capacity scheduler.
  • Formulated procedures for planning and execution of system upgrades for all existing Hadoop clusters.
  • Supported technical team members for automation, installation and configuration tasks.
  • Conducted detailed analysis of system and application architecture components as per functional requirements.
  • Coordinated with technical team for production deployment of software applications for maintenance.

Environment: Puppet, HDFS, MapReduce, Apache Hadoop, Cloudera Distributed Hadoop, HBase, Hive, Flume, Sqoop, RHEL, MySQL
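
Illustrative sketch for the S3-based disaster recovery backup described above: a minimal DistCp copy of a critical HDFS directory to an S3 bucket, suitable for running from cron. The bucket, nameservice, paths and credentials are placeholders.

    # Incrementally copy the warehouse directory to a DR bucket (-update copies only changed files)
    hadoop distcp \
      -D fs.s3a.access.key=EXAMPLE_ACCESS_KEY \
      -D fs.s3a.secret.key=EXAMPLE_SECRET_KEY \
      -update \
      hdfs://nameservice1/data/warehouse \
      s3a://example-dr-bucket/backups/warehouse

    # Example cron entry to run the copy nightly at 02:00
    # 0 2 * * * /usr/local/bin/hdfs_s3_backup.sh >> /var/log/hdfs_s3_backup.log 2>&1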

Hadoop Administrator

Confidential

Responsibilities:
  • Installed, Configured and Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Installed and configured CDH 5.3 cluster using Cloudera Manager.
  • Implemented commissioning and decommissioning of data nodes, killed unresponsive task trackers and dealt with blacklisted task trackers (see the decommissioning sketch below).
  • Managed and reviewed Hadoop Log files.
  • Implemented Rack Awareness for data locality optimization.
  • Installed and configured Hive with remote Metastore using MySQL.
  • Proactively monitored systems and services; implemented Hadoop deployment, configuration management, performance tuning and backup procedures.
  • Monitored the health of Hadoop daemon services and responded to any warning or failure conditions.
  • Worked on Recovery of Node failure.
  • Performed Hadoop Upgrade activities.
  • Managed and scheduled jobs on the Hadoop cluster.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
  • Installed and configured Kerberos for the authentication of users and Hadoop daemons.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Worked with support teams to resolve performance issues.
  • Worked on testing, implementation and documentation.

Environment: HDFS, MapReduce, Apache Hadoop, Cloudera Distributed Hadoop, HBase, Hive, Flume, Sqoop, RHEL, MySQL
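
Illustrative sketch for the data node decommissioning described above: the underlying HDFS/YARN commands, assuming placeholder hostnames and a stock /etc/hadoop/conf location for the exclude files (on a Cloudera Manager cluster these steps are normally driven from the CM UI).

    # Add the node to the HDFS exclude file referenced by dfs.hosts.exclude,
    # then ask the NameNode to re-read it
    echo "worker05.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # Do the same for YARN so no new containers are scheduled on the node
    echo "worker05.example.com" >> /etc/hadoop/conf/yarn.exclude
    yarn rmadmin -refreshNodes

    # Watch progress until the node reports "Decommissioned", then stop its roles
    hdfs dfsadmin -report | grep -A 2 worker05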

Hadoop Administrator - Developer

Confidential

Responsibilities:
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
  • Shared responsibility for administration of Hadoop, Hive and Pig.
  • Installed and configured MapReduce, HIVE and the HDFS; implemented CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios (see the HBase/Hive sketch below).
  • Supported code/design analysis, strategy development and project planning
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Assisted with data capacity planning and node forecasting.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Administered Pig, Hive and HBase, installing updates, patches and upgrades.
  • Performed both major and minor upgrades to the existing CDH cluster.
  • Upgraded the Hadoop cluster from CDH3 to CDH4.

Environment: CDH3/CDH4, HDFS, MapReduce (Java), Hive, Pig, HBase, Oozie, Sqoop, CentOS
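
Illustrative sketch for the HBase and Hive work described above: creating an HBase table for semi-structured event data and aggregating a staging table into a partitioned EDW table. Table, database, column and partition names are placeholders.

    # Create an HBase table with a single column family for event data
    echo "create 'web_events', 'd'" | hbase shell

    # Aggregate a day of staging data into a partitioned EDW table
    hive -e "
      INSERT OVERWRITE TABLE edw.daily_trends PARTITION (ds='2014-06-01')
      SELECT product_id, COUNT(*) AS views
      FROM staging.web_events
      WHERE ds='2014-06-01'
      GROUP BY product_id;
    "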

System Administrator

Confidential

Responsibilities:
  • Installation, configuration, upgrade and administration of Sun Solaris and Red Hat Linux.
  • User account management and support.
  • JumpStart & Kickstart OS installation, DDNS, DHCP, SMTP, Samba, NFS, FTP, SSH and LDAP integration.
  • Network traffic control, IPsec, QoS, VLAN, proxy and RADIUS integration on Cisco hardware via Red Hat Linux software.
  • Responsible for configuring and managing Squid server in Linux.
  • Configuration and Administration of NIS environment.
  • Managing file systems and disks using Solstice DiskSuite.
  • Involved in installing and configuring NFS (see the NFS sketch below).
  • Worked on Solaris Volume Manager to create file systems as per user and database requirements.
  • Troubleshooting of system and end-user issues.
  • Responsible for configuring real time backup of web servers.
  • Managed log files for troubleshooting and probable errors.
  • Responsible for reviewing all open tickets, resolve and close any existing tickets.
  • Document solutions for any issues that have not been discovered previously.

Environment: Confidential 3.5, Solaris 2.6/2.7/8, Oracle 10g, Weblogic10.x, Veritas NetBackup, Veritas Volume Manager, Samba, NFS, NIS, LVM, Linux, Shell Programming.
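
Illustrative sketch for the NFS configuration described above, on a Red Hat-style Linux server: exporting a directory to a lab subnet and mounting it from a client. Paths, hostnames and the subnet are placeholders.

    # On the server: export the directory read-write to the subnet and reload exports
    echo "/export/home 192.168.10.0/24(rw,sync,no_root_squash)" >> /etc/exports
    exportfs -ra

    # On a client: mount the share and make it persistent across reboots
    mount -t nfs nfs-server.example.com:/export/home /mnt/home
    echo "nfs-server.example.com:/export/home /mnt/home nfs defaults 0 0" >> /etc/fstab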
