
Hadoop Engineer/administrator Resume


Austin, TX

SUMMARY

  • AWS certified cloud engineer with around 7 years of experience in the design, development, and implementation of robust technology systems, with specialized expertise in Hadoop administration, Big Data/cloud, and Linux administration.
  • Experience using various Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, YARN, Spark, Kafka, Oozie, and Flume for data storage and analysis.
  • Expertise in commissioning and decommissioning nodes in Hadoop clusters. Collected and aggregated large amounts of log data using Apache Flume and stored it in HDFS for further analysis.
  • Experience with job/workflow scheduling and monitoring tools such as Oozie, including designing both time-driven and data-driven automated workflows.
  • Worked through the complete Software Development Life Cycle (analysis, design, development, testing, implementation, and support) using Agile methodologies.
  • Installed and configured various Hadoop distributions such as CDH 5.7 and HDP 2.6.5 and higher.
  • Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
  • Extensive experience in installing, configuring and administering Hadoop cluster for major Hadoop distributions like CDH5 and HDP.
  • Experience configuring Sentry, Ranger, and Knox to secure Hadoop components.
  • Good experience designing, configuring, and managing backup and disaster recovery for Hadoop data.
  • Experience in setting up of Hadoop cluster in cloud services like AWS and Azure.
  • Knowledge of AWS services such as EC2, S3, Glacier, IAM, EBS, SNS, SQS, RDS, VPC, Load Balancers, Auto Scaling, CloudFormation, CloudFront, and CloudWatch.
  • Experience in Linux System Administration, Linux System Security, Project Management and Risk Management in Information Systems.
  • Involved in the functional usage and deployment of applications to Oracle WebLogic, JBoss, Apache Tomcat, Nginx, and WebSphere servers.
  • Experience working with VMware Workstation and VirtualBox.
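The 24x7 Nagios/Ganglia monitoring mentioned above can be sketched as a minimal Nagios-style check. This is an illustrative sketch only: the function name, messages, and warn/crit thresholds are assumptions, not the actual production plugin.

```python
# Nagios-style exit codes used by monitoring frameworks.
OK, WARNING, CRITICAL = 0, 1, 2

def check_hdfs_usage(used_pct, warn=75.0, crit=90.0):
    """Return a Nagios-style (exit_code, message) pair for HDFS disk usage.

    In a real plugin, `used_pct` would come from parsing the output of
    `hdfs dfsadmin -report`; the warn/crit thresholds here are
    illustrative defaults, not production values.
    """
    if used_pct >= crit:
        return CRITICAL, f"CRITICAL - HDFS {used_pct:.1f}% used (>= {crit}%)"
    if used_pct >= warn:
        return WARNING, f"WARNING - HDFS {used_pct:.1f}% used (>= {warn}%)"
    return OK, f"OK - HDFS {used_pct:.1f}% used"
```

A scheduler such as cron or the Nagios NRPE agent would run a check like this periodically and escalate on a non-zero exit code.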

PROFESSIONAL EXPERIENCE

Confidential, Austin, TX

Hadoop Engineer/Administrator

Responsibilities:

  • Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters across the MapR, Hortonworks, and Cloudera distributions.
  • Administered big data Hadoop clusters of around 10,000 nodes (about 1,000 petabytes) across Prod, Dev, and UAT environments on Hortonworks Data Platform 3.1, 3.0, and 2.6.5.
  • Currently working on an enterprise-wide development team for Puppet-based application, middleware, patching, and configuration deployment engineering.
  • Hands-on experience configuring network architecture on AWS with VPC, subnets, internet gateways, NAT, and route tables.
  • Built and administered 26 clusters as part of day-to-day operations, including standalone HBase and Kafka clusters.
  • Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, SOLR, and the HBase indexer for ingestion into SOLR, with HBase for real-time querying.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing. Mentored the application team in creating Hive queries to test use cases.
  • All clusters are authenticated against a centralized Kerberos environment, and a few clusters are additionally secured with Ranger authorization.
  • Performed Requirement Analysis, Planning, Architecture Design and Installation of the Hadoop cluster
  • Built S3 buckets, managed their policies, and used S3 and Glacier for storage and backup on AWS.
  • Worked with other teams to help develop the Puppet infrastructure to conform to various requirements, including security and compliance of managed servers.
  • Experience with upgrades, patches, and installation of ecosystem products through Ambari.
  • Automated the configuration management for several servers using Chef and Puppet.
  • Monitored job performance, file system/disk-space usage, cluster and database connectivity, and log files; managed backup/security and troubleshot various user issues.
  • Responsible for day-to-day activities which include HDFS support and maintenance, Cluster maintenance, creation/removal of nodes, Cluster Monitoring/Troubleshooting, Manage and review Hadoop log files, Backup restoring and capacity planning.
  • Designed and deployed clustered HPC monitoring systems, including a dedicated monitoring cluster.
  • Developed and documented best practices, provided HDFS support and maintenance, and set up new Hadoop users.
  • Responsible for the administration of new and existing Hadoop infrastructure.
  • Performed DBA responsibilities including data modeling, design and implementation, software installation and configuration, database backup and recovery, and database connectivity and security.
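The Spark Streaming bullet above describes dividing a stream into batches for the Spark engine. A pure-Python analogue of that idea looks roughly like this; note it is a count-based sketch for illustration (Spark Streaming itself batches by time interval), and the function name and batch size are assumptions:

```python
def micro_batches(events, batch_size):
    """Group an event stream into fixed-size micro-batches.

    Count-based stand-in for Spark Streaming's time-interval batching,
    kept self-contained for illustration.
    """
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch      # hand a full batch to the "engine"
            batch = []
    if batch:                # flush the final partial batch
        yield batch
```

Each yielded batch would then be processed as a small batch job, which is the core idea behind Spark Streaming's DStream model.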

ENVIRONMENT: Hadoop, MapReduce, Cassandra, HDFS, Pig, Git, Jenkins, Kafka, Puppet, Ansible, Maven, Spark, YARN, HBase, Oozie, MapR, NoSQL, ETL, MySQL, Agile, Windows, UNIX Shell Scripting

Confidential, PRINCETON, NJ

Hadoop Engineer

Responsibilities:

  • Managed Hadoop clusters of 84 nodes across Prod and Dev environments on the Hortonworks distribution.
  • Built and maintained HDP and HDF/NiFi (Hortonworks Data Platform and Hortonworks DataFlow) clusters in all four environments: Dev, Prod, UAT1, and UAT2.
  • Installed and maintained Ambari to monitor and manage the HDP and HDF clusters.
  • Installed MySQL and Postgres as backend databases.
  • Installed all required services in all environments per business and developer requirements.
  • Implemented row-level security across the environment through Ranger.
  • Enabled Kerberos for authentication as part of securing the clusters.
  • Added nodes to the clusters for better performance.
  • Installed all the clusters in cloud on Microsoft Azure.
  • Experienced in shell scripting with good UNIX and Linux knowledge; installed services such as Hive, Sqoop, and SmartSense.
  • Monitored HDFS file system/disk-space management, cluster & database connectivity, log files, management of backup/security and troubleshooting various user issues.
  • Implemented various scripts for backing up HDFS daily/weekly/monthly with retention period.
  • Responsible for day-to-day activities which include HDFS support and maintenance, Cluster maintenance, creation/removal of nodes, Cluster Monitoring/Troubleshooting, Manage and review Hadoop log files, Backup restoring and capacity planning.
  • Implemented YARN capacity scheduler queues to guarantee resources to specific groups, with application management on top of Hadoop.
  • Experience in methodologies such as Agile, Scrum, and Test-driven development
  • Installed and maintained the AtScale application on all platforms through the edge node for business usage.
  • Hands-on experience installing AtScale 6, AtScale 7, and above.
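The daily/weekly/monthly HDFS backup scripts with retention periods described above hinge on deciding which snapshots to keep. A minimal sketch of that decision logic follows; the retention counts, function name, and date handling are illustrative assumptions, not the actual production policy:

```python
from datetime import date

def backups_to_keep(snapshot_dates, today, daily=7, weekly=4, monthly=6):
    """Decide which backup snapshots to retain.

    Keeps everything from the last `daily` days, the newest snapshot from
    each of the last `weekly` ISO weeks, and the newest from each of the
    last `monthly` months. Counts are illustrative defaults.
    """
    keep = set()
    snaps = sorted(snapshot_dates, reverse=True)  # newest first
    weeks, months = set(), set()
    for d in snaps:
        # Daily tier: anything within the last `daily` days.
        if (today - d).days < daily:
            keep.add(d)
        # Weekly tier: newest snapshot per ISO week, up to `weekly` weeks.
        wk = d.isocalendar()[:2]
        if wk not in weeks and len(weeks) < weekly:
            weeks.add(wk)
            keep.add(d)
        # Monthly tier: newest snapshot per month, up to `monthly` months.
        mo = (d.year, d.month)
        if mo not in months and len(months) < monthly:
            months.add(mo)
            keep.add(d)
    return keep
```

A nightly cron job would compute this keep-set and delete (or `hdfs dfs -rm -r`) every snapshot that falls outside it.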

Environment: HDP 2.6.5, HDP 2.6.3, HDF 2.2, HDF 2.3, Ambari 2.6.1, Ambari 2.6.2, Microsoft Azure cloud nodes, Hive, Sqoop, Kafka, Spark, Spark2, YARN, HBase, ZooKeeper, SmartSense, and Slider

Confidential, SAN ANTONIO, TX

Hadoop Engineer/Administrator

Responsibilities:

  • Installed and configured Hadoop and Ecosystem components in Cloudera and Hortonworks environments.
  • Configured Hadoop, Hive, and Pig on Amazon EC2 servers; analyzed system failures, identified root causes, and recommended courses of action. Documented system processes and procedures for future reference.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the HDP 2.2 cluster and implemented Sentry for the Dev cluster.
  • Supported setup of the QA environment and updated configurations for implementing scripts with Pig and Sqoop. Worked on tuning the performance of Pig queries.
  • Converted ETL operations to the Hadoop system using Pig Latin operations, transformations, and functions.
  • Implemented business logic using Pig scripts and UDFs; captured data from existing databases that provide SQL interfaces using Sqoop.
  • Worked on YARN capacity scheduler by creating queues to allocate resource guarantee to specific groups.
  • Implemented the Hadoop stack and various big data analytics tools; migrated data from different databases to Hadoop (HDFS).
  • Developed backup policies for Hadoop systems and action plans for network failures.
  • Involved in the User/Group Management in Hadoop with AD/LDAP integration.
  • Managed resources and load using capacity scheduling, appending changes according to requirements.
  • Implemented a strategy to upgrade the OS on all cluster nodes from RHEL 5 to RHEL 6 while keeping the cluster up and running.
  • Developed shell and Python scripts to automate many day-to-day admin activities.
  • Implemented HCatalog to make partitions available to Pig and Java MapReduce, and established a remote Hive metastore using MySQL.
  • Installed several projects on Hadoop servers and configured each project to run jobs and scripts successfully.
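The YARN capacity scheduler work above allocates resource guarantees by assigning each queue a percentage of its parent; a queue's absolute cluster share is the product of the percentages down its path. A small sketch of that arithmetic, using a hypothetical queue layout (the names and percentages are illustrative, not from a real cluster):

```python
def absolute_capacity(queues, name):
    """Compute a queue's absolute cluster share from per-level percentages.

    `queues` maps a dotted queue path to its percent of the parent queue,
    mirroring how capacity-scheduler.xml expresses capacities.
    """
    share = 1.0
    path = name.split(".")
    for i in range(1, len(path) + 1):
        q = ".".join(path[:i])
        if q == "root":          # root always owns 100% of the cluster
            continue
        share *= queues[q] / 100.0
    return share

# Hypothetical queue layout for illustration only.
queues = {
    "root.analytics": 60,        # 60% of the cluster
    "root.analytics.adhoc": 25,  # 25% of analytics => 15% of the cluster
    "root.etl": 40,              # 40% of the cluster
}
```

This is the same calculation YARN performs when it reports a queue's "absolute capacity" in the ResourceManager UI.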

ENVIRONMENT: Cloudera Manager 4 & 5, Ganglia, Tableau, Shell Scripting, Oozie, Pig, Hive, Flume, Bash Scripting, Teradata, Kafka, Impala, Sentry, CentOS

Confidential

System Administrator

Responsibilities:

  • Administered and maintained the Windows 2003 and 2008 Active Directory infrastructure for production.
  • Migrated multiple application and print servers, including data, shares, and printers, from Windows 2003 to Windows 2008.
  • Created multiple device groups depending on the application and the requirements of the environment.
  • Provide user account administration for the distributed server environment and infrastructure Applications.
  • Knowledge of installing and configuring ESX 3.0/3.5 servers, configuring DRS and HA in vSphere, fault tolerance, and migrating virtual machines using vMotion.
  • Performed Storage vMotion, installed Virtual Center, and managed ESX hosts through VC.
  • Deployed VMs with clones and templates; hot-added devices to virtual machines.
  • Provided high availability to VMs.
  • Responsible for the general operation of the distributed server environment, including performance, reliability and efficient use of network resources.
  • Oversaw the storage systems and allocated storage on EMC DMX-3, DMX-4, DMX 3000/2000, and CX600/700 arrays.
  • Performed data replication (BCV, SRDF) and used array-based migration techniques such as SRDF/Open Replicator to migrate data from old DMX 3000/DMX 2000 to DMX-4/DMX-3 storage systems in UNIX, Windows, Linux, and AIX environments for online/offline data migration.
  • Implemented business continuance features such as EMC TimeFinder on Symmetrix/DMX arrays.
  • Worked on NetApp SnapMirror, FlexVol, Snapshots, NetApp FilerView, and NetApp Management Console, and implemented aggregates on FAS 3200 and V-Series 3200.
  • Analyzed and maintained performance data to ensure optimal usage of the storage resources available.
  • Created RAID groups and storage groups and bound CLARiiON LUNs to hosts using Navisphere Manager and NaviCLI.
  • Created larger LUNs (metas) to support application needs using SYMCLI.
  • Planned, configured, and implemented file systems with CIFS and NFS protocols in a multiprotocol environment.
  • Configured CIFS servers and VDMs for a Windows-only environment; protected data through DPM 2006, 2007 SP1, and 2012.
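The RAID-group and LUN sizing work in the bullets above comes down to simple capacity arithmetic. A simplified sketch of that calculation follows; it ignores vendor formatting overhead and hot spares, and the RAID levels are generic illustrations rather than array-specific behavior:

```python
def usable_gb(disk_count, disk_gb, raid):
    """Approximate usable capacity of a RAID group, as sized during LUN planning.

    Simplified: ignores vendor formatting overhead and hot spares.
    """
    if raid == "raid5":          # one disk's worth of parity
        return (disk_count - 1) * disk_gb
    if raid == "raid6":          # two disks' worth of parity
        return (disk_count - 2) * disk_gb
    if raid == "raid10":         # mirrored pairs: half the raw capacity
        return disk_count // 2 * disk_gb
    raise ValueError(f"unknown RAID level: {raid}")
```

For example, a 5-disk RAID 5 group of 300 GB drives yields roughly the same usable capacity as an 8-disk RAID 10 group of the same drives, which is the kind of trade-off weighed when binding LUNs to hosts.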

ENVIRONMENT: EMC Symmetrix DMX 3000, VNX, CLARiiON CX3-80, CX3-20, CX3-10c, CX700, CX300, CX500; NetApp FAS 270, 960, and 3040 series; Brocade 5300 and 4800
