
Hadoop Admin Resume

New York, New York

SUMMARY:

  • Over 9 years of IT experience as a developer, designer, and quality tester, with cross-platform integration experience using the Hadoop ecosystem, Java, and software functional testing.
  • Hands-on experience in installing, configuring, and using Hadoop ecosystem components: HDFS, MapReduce, Pig, Hive, Oozie.
  • Experienced in installation, configuration, supporting and monitoring 300+ node Hadoop cluster using Cloudera manager and Hortonworks distributions.
  • Troubleshooting, debugging, and fixing Talend-specific issues while maintaining the health and performance of the ETL environment.
  • Involved in cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
  • Administering and maintaining Cloudera Hadoop clusters; provisioning, patching, and maintaining physical Linux systems.
  • Experience in HDFS data storage and support for running map-reduce jobs.
  • Experience in Chef, Puppet or related tools for configuration management.
  • Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, the HBase database, and Sqoop.
  • Involved in Infrastructure set up and installation of HDP stack on Amazon Cloud.
  • Experience with ingesting data from RDBMS sources like - Oracle, SQL and Teradata into HDFS using Sqoop.
  • Experience in big data technologies: Hadoop HDFS, Map-reduce, Pig, Hive, Oozie, Sqoop, Zookeeper and NoSQL.
  • Adding/installation of new components and removal of them through Cloudera Manager.
  • Experience in benchmarking, performing backup and disaster recovery of Name Node metadata and important sensitive data residing on cluster.
  • Worked on evaluating, architecting, installation/setup of Hortonworks 2.6/1.8 Big Data ecosystem which includes Hadoop, Pig, Hive, Sqoop etc.
  • Experience in designing and implementing HDFS access controls, directory and file permissions, and user authorization that facilitate stable, secure access for multiple users in a large multi-tenant cluster.
  • Experienced in setting up Hortonworks (HDP2.4) cluster with and without using Ambari 2.2.
  • Experience in using Ambari for Installation and management of Hadoop clusters.
  • Extensive experience in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka messaging system.
  • Implemented Capacity schedulers on the Job tracker to share the resources of the Cluster for the Map Reduce jobs given by the users.
  • Responsible for provisioning, installing, configuring, monitoring, and maintaining HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive, Ranger, Falcon, SmartSense, Storm, and Kafka.
  • Experience in AWS CloudFront, including creating and managing distributions to provide access to S3 bucket or HTTP server running on EC2 instances.
  • Good working knowledge of Vertica DB architecture, column orientation and High Availability.
  • Configured Informatica environment to connect to different databases using DB config, Input Table, Output Table, Update table Components.
  • Performed systems analysis for several information systems documenting and identifying performance and administrative bottlenecks.
  • Involved in implementing security on HDP and HDF Hadoop clusters, with Kerberos for authentication, Ranger for authorization, and LDAP integration for Ambari, Ranger, and NiFi.
  • Responsible for support of Hadoop Production environment which includes Hive, YARN, Spark, Impala, Kafka, SOLR, Oozie, Sentry, Encryption, HBase, etc.
  • Migrating applications from existing systems like MySQL, Oracle, DB2 and Teradata to Hadoop.
  • Benchmarking Hadoop clusters to validate the hardware before and after installation to tweak the configurations to obtain better performance.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Experience in administering the Linux systems to deploy Hadoop cluster and monitoring the cluster.
  • Experience on Commissioning, Decommissioning, Balancing, and Managing Nodes and tuning server for optimal performance of the cluster.

PROFESSIONAL EXPERIENCE:

Confidential, New York, New York

Hadoop Admin

  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure, including KDC server setup and management.
  • Management and support of Hadoop services including HDFS, Hive, Impala, and Spark.
  • Installing, Upgrading and Managing Hadoop Cluster on Cloudera.
  • Troubleshooting many cloud-related issues such as DataNode down, network failure, login issues, and missing data blocks.
  • Worked as Hadoop Admin responsible for all cluster operations across a total of 100 nodes, ranging from POC (proof-of-concept) to PROD clusters, on the Cloudera (CDH 5.5.2) distribution.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing log files.
  • Day-to-day responsibilities include solving developer issues, moving code deployments from one environment to another, providing access to new users, providing immediate solutions to reduce impact, and documenting them to prevent future issues.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades.
  • Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
  • Migrated from Flume to Spark for real-time data and developed a Spark Streaming application in Java to consume data from Kafka and push it into Hive.
  • Configured Kafka for efficiently collecting, aggregating and moving large amounts of click stream data from many different sources to HDFS. Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per the recommendations.
  • Imported logs from web servers with Flume to ingest the data into HDFS.
  • Loaded data from the local file system to HDFS using Flume with a spooling directory.
  • Retrieved data from HDFS into relational databases with Sqoop.
  • Parsed, cleansed, and mined useful and meaningful data in HDFS using MapReduce for further analysis.
  • Fine-tuned Hive jobs for optimized performance.
  • Scripting Hadoop package installation and configuration to support fully-automated deployments.
  • Experience in Ansible and related tools for configuration management.
  • Used Apache NiFi to copy data from the local file system to HDP.
  • Involved in chef-infra maintenance including backup/security fix on Chef Server.
  • Deployed application updates using Jenkins. Installed, configured, and managed Jenkins.
  • Triggered the client's SIT environment build remotely through Jenkins.
  • Deployed and configured Git repositories with branching, forks, tagging, and notifications.
  • Experienced and proficient in deploying and administering GitHub.
  • Deploy builds to production and work with the teams to identify and troubleshoot any issues.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, Sharding, replication, schema design.
  • Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
  • Reviewed selected issues through the SonarQube web interface.
  • Developed a fully functional login page for the company's user facing website with complete UI and validations.
  • Installed, configured, and utilized AppDynamics (an application performance management tool) across the whole JBoss environment (Prod and Non-Prod).
  • Reviewed OpenShift PaaS product architecture and suggested improvement features after conducting research on Competitors products.
  • Migrated data source passwords to encrypted passwords using the Vault tool on all JBoss application servers.
  • Participated in migrations from JBoss 4 to WebLogic and from JBoss 4 to JBoss 6, including the respective POCs.
  • Responsible for upgrading SonarQube using the upgrade center.
  • Resolved tickets submitted by users, including P1 issues; troubleshot, documented, and resolved the errors.
  • Installed and configured Hive in the Hadoop cluster and helped business users and application teams fine-tune their HiveQL for optimized performance and efficient use of cluster resources.
  • Conducted performance tuning of the Hadoop cluster and MapReduce jobs, as well as real-time applications, applying best practices to fix design flaws.
  • Implemented Oozie work-flow for ETL Process for critical data feeds across the platform.
  • Configured Ethernet bonding for all Nodes to double the network bandwidth
  • Implementing Kerberos Security Authentication protocol for existing cluster.
  • Built high availability for major production cluster and designed automatic failover control using Zookeeper Failover Controller (ZKFC) and Quorum Journal nodes.

Environment: HDFS, Map Reduce, Hive 1.1.0, Kafka, Hue 3.9.0, Pig, Flume, Oozie, Sqoop, Apache Hadoop 2.6, Spark, SOLR, Storm, Knox, Cloudera Manager, Red Hat, MySQL, Prometheus, Docker, Puppet.

Confidential, New York, New York

Hadoop Admin

Responsibilities:

  • Worked on developing architecture document and proper guidelines
  • Worked on installing Kafka on Virtual Machine.
  • Designed and implemented end to end big data platform solution on AWS.
  • Manage Hadoop clusters in production, development, Disaster Recovery environments.
  • Implemented SignalHub a data science tool and configured it on top of HDFS.
  • Provisioning, installing, configuring, monitoring, and maintaining HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive, Ranger, Falcon, SmartSense, Storm, and Kafka.
  • Configured Spark streaming to get streaming information from the Kafka and store them in HDFS.
  • Populated HDFS with huge amounts of data using Apache Kafka.
  • Recovering from node failures and troubleshooting common Hadoop cluster issues.
  • Scripting Hadoop package installation and configuration to support fully-automated deployments.
  • Automated Hadoop deployment using Ambari blueprints and Ambari REST APIs.
  • Automated Hadoop and cloud deployment using Ansible.
  • Contributed to building hands-on tutorials for the community to learn how to set up Hortonworks Data Platform (powered by Hadoop) and Hortonworks DataFlow (powered by NiFi).
  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Configured Ranger for policy-based authorization and fine-grained access control to the Hadoop cluster.
  • Implemented HDFS encryption by creating encryption zones in HDFS.
  • Implemented a multi-tenant Hadoop cluster and onboarded tenants to the cluster.
  • Achieved data isolation through Ranger policy-based access control.
  • Used YARN capacity scheduler to define compute capacity. Responsible for building a cluster on HDP 2.5.
  • Worked closely with developers to investigate problems and make changes to the Hadoop environment and associated applications.
  • Expertise in recommending hardware configuration for Hadoop cluster. Managing and reviewing Hadoop log files.
  • Proven results-oriented person with a focus on delivery.
  • Performed Importing and exporting data into HDFS and Hive using Sqoop.
  • Managed cluster coordination services through Zookeeper. System/cluster configuration and health check-up.
  • Continuous monitoring and managing the Hadoop cluster through Ambari.
  • Created user accounts and gave users access to the Hadoop cluster.
  • Resolved tickets submitted by users; troubleshot, documented, and resolved the errors.
  • Performed HDFS cluster support and maintenance tasks such as adding and removing nodes without any effect on running jobs and data.
  • Optimization and Tuning the application
  • Created User Guide Development and Training overviews for supporting teams
  • Design monitoring solutions and baseline statistics reporting to support the implementation
  • Experience with designing and building solutions for data ingestion both real time & batch using Sqoop/PIG/Impala/Kafka.
  • Extremely good knowledge and experience with Map Reduce, Spark Streaming, SparkSQL for data processing and reporting.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
  • Used Apache Kafka for importing real time network log data into HDFS.
  • Developed business-specific custom UDFs in Hive and Pig.
  • Configured Oozie workflow to run multiple Hive and Pig jobs which run independently with time and data availability.
  • Optimized Map Reduce code by writing Pig Latin scripts.
  • Imported data from external tables into Hive using the load command.
  • Created tables in Hive and used static and dynamic partitions as a data slicing mechanism.
  • Working experience with monitoring cluster, identifying risks, establishing good practices to be followed in shared environment
  • Good understanding on cluster configurations and resource management using YARN

Environment: HDFS, Hortonworks, Map Reduce, Hive, Kafka, Pig, Flume, Oozie, Sqoop, NiFi, HDP 2.5, Ambari 2.4, Spark, SOLR, Storm, Knox, CentOS 7 and MySQL

Confidential

Hadoop Admin

Responsibilities:

  • Launching Amazon EC2 Cloud Instances using Amazon Web Services (Linux/ Ubuntu/RHEL) and Configuring launched instances with respect to specific applications.
  • Installed application on AWS EC2 instances and also configured the storage on S3 buckets.
  • Created S3 buckets and bucket policies, worked on IAM role-based policies, and customized the JSON templates.
  • Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Managed servers on the Amazon Web Services (AWS) platform instances using Puppet, Chef Configuration management.
  • Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).
  • Expertise in designing and deploying Hadoop clusters and different big data analytic tools including Pig, Hive, Oozie, Zookeeper, Sqoop, Flume, Spark, Impala, and Cassandra with the Hortonworks distribution.
  • Involved in creating Hive tables, Pig tables, and loading data and writing hive queries and pig scripts
  • Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frames, Pair RDDs, and Spark on YARN.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data. Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Worked on tuning Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Import the data from different sources like HDFS/HBase into Spark RDD.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Performed real time analysis on the incoming data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBASE, Oozie, Scala, Spark, Linux.

Confidential

Hadoop Admin

Responsibilities:

  • Responsible for implementation and ongoing administration of Hadoop infrastructure and setting up infrastructure
  • Cluster maintenance as well as creation and removal of nodes.
  • Evaluation of Hadoop infrastructure requirements and design/deployment of solutions (high availability, big data clusters).
  • Cluster Monitoring and Troubleshooting Hadoop issues
  • Manage and review Hadoop log files
  • Works with application teams to install operating system and Hadoop updates, patches, version upgrades as required
  • Created NRF documents that explain the flow of the architecture and measure performance, security, memory usage, and dependencies.
  • Setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
  • Help maintain and troubleshoot UNIX and Linux environment.
  • Experience analyzing and evaluating system security threats and safeguards.
  • Experience in Importing and exporting data into HDFS and Hive using Sqoop.
  • Developed Pig program for loading and filtering the streaming data into HDFS using Flume.
  • Experienced in handling data from different data sets, joining them, and preprocessing them using Pig join operations.
  • Developed Map-Reduce programs to clean and aggregate the data
  • Developed HBase data model on top of HDFS data to perform real time analytics using Java API.
  • Developed different kind of custom filters and handled pre-defined filters on HBase data using API.
  • Imported and exported data from Teradata to HDFS and vice-versa.
  • Strong understanding of Hadoop eco system such as HDFS, MapReduce, HBase, Zookeeper, Pig, Hadoop streaming, Sqoop, Oozie and Hive
  • Implement counters on HBase data to count total records on different tables.
  • Experienced in handling Avro data files by passing schema into HDFS using Avro tools and Map Reduce.
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
  • Used Amazon Web Services to perform big data analytics.
  • Implemented Secondary sorting to sort reducer output globally in map reduce.
  • Implemented data pipeline by chaining multiple mappers by using Chained Mapper.
  • Created Hive Dynamic partitions to load time series data
  • Experienced in handling different types of joins in Hive, such as map joins, bucket map joins, and sorted bucket map joins.
  • Created tables, partitions, and buckets and performed analytics using Hive ad-hoc queries.
  • Experienced in importing and exporting data into HDFS/Hive from relational databases and Teradata using Sqoop.
  • Handled continuous streaming data coming from different sources using Flume, with HDFS set as the destination.
  • Integrated spring schedulers with Oozie client as beans to handle cron jobs.
  • Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters
  • Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Worked on spring framework for multi-threading.

Environment: Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, RDBMS/DB, Flat files, Teradata, MySQL, CSV, Avro data files, Java, J2EE.

Confidential

SDET

Responsibilities:

  • Involved in almost all the phases of SDLC.
  • Executed test cases manually and logged defects using ClearQuest.
  • Automated the functionality and interface testing of the application using Quick Test Professional (QTP)
  • Design, Develop and maintain automation framework (Hybrid Framework).
  • Analyze the requirements and prepare automation scripts scenario
  • Develop test data for Regression testing using QTP
  • Wrote test cases in IBM Rational Manual Tester.
  • Conducted Cross Browser testing on Different Platform
  • Performed client application testing and web-based application performance, stress, volume, and load testing of the system using LoadRunner 9.5.
  • Analyzed the performance of the application itself under various test loads of many simultaneous users.
  • Analyzed the impact on server performance, including CPU usage and server memory usage, for the applications under varied numbers of simultaneous users.
  • Inserted transactions and rendezvous points into web users.
  • Created user scripts using VuGen and used Controller to generate and execute LoadRunner scenarios.
  • Complete involvement in Requirement Analysis and documentation on Requirement Specification.
  • Prepared use-case diagrams, class diagrams and sequence diagrams as part of requirement specification documentation.
  • Involved in design of the core implementation logic using MVC architecture.
  • Used Apache Maven to build and configure the application.
  • Developed JAX-WS web services to provide services to the other systems.
  • Developed JAX-WS client to utilize few of the services provided by the other systems.
  • Involved in developing EJB 3.0 stateless session beans for the business tier to expose business logic to the services component as well as the web tier.
  • Implemented Hibernate at DAO layer by configuring hibernate configuration file for different databases.
  • Developed business services to utilize Hibernate service classes that connect to the database and perform the required action.
  • Developed JavaScript validations to validate form fields.
  • Performed unit testing for the developed code using JUnit.
  • Developed design documents for the code developed.
  • Used SVN repository for version control of the developed code.

Environment: SQL, Oracle 10g, Apache Tomcat, HP LoadRunner, IBM Rational Robot, ClearQuest, Java, J2EE, HTML, DHTML, XML, JavaScript, Eclipse, WebLogic and PL/SQL.
