Hadoop Administrator Resume


New York, NY

SUMMARY

8+ years of experience in the IT industry, including AWS and Big Data Hadoop administration for banking, telecom, and financial clients.

TECHNICAL SKILLS

  • Big Data Hadoop
  • Cloudera CDH
  • Hortonworks HDP
  • AWS
  • CDP (Unity release)
  • Spark
  • Linux Scripting
  • JAVA

PROFESSIONAL EXPERIENCE

Confidential, New York, NY

Hadoop Administrator

Responsibilities:

  • Installed, configured and maintained Hadoop clusters for Enterprise Analytics and Data science teams.
  • Implemented Hadoop ecosystem tools such as Hive, Pig, HBase, Oozie, Flume, ZooKeeper, Sqoop, Kafka, and Spark.
  • Installed and upgraded Cloudera CDH on production clusters and Hortonworks HDP versions on test clusters.
  • Redistributed services from one host to another within the cluster to help secure the cluster and ensure high availability of services.
  • Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Configured AWS IAM and Security Groups.
  • Hands-on experience provisioning and managing multi-node clusters on the Amazon Web Services (AWS) public cloud (EC2) and on private cloud infrastructure.
  • Installed and configured Hue for high availability and pointed it at the Hadoop cluster through Cloudera Manager.
  • Deep and thorough understanding of ETL tools and how they apply in a Big Data environment while supporting and managing Hadoop clusters.
  • Installed and configured MapReduce and HDFS and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Used Kafka for building real-time data pipelines between clusters.
  • Ran log aggregation, website activity tracking, and commit logs for distributed systems using Apache Kafka (a command-line sketch follows this role's Environment line).
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Extensively worked on Informatica tool to extract data from flat files, Oracle and Teradata and to load the data into the target database.
  • Responsible for developing data pipeline using HDInsight, Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Experience in Python and Shell scripts.
  • Used Sqoop to move data between RDBMS systems and HDFS in both directions (a command-line sketch follows this list).
  • Created Hive tables and involved in data loading and writing Hive UDFs.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
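
A minimal command-line sketch of the Sqoop import/export flow referenced above; the connection string, table names, and HDFS paths are hypothetical placeholders, not the actual environment.

    # Import an Oracle table into HDFS (placeholder connection string and table)
    sqoop import \
      --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMERS \
      --target-dir /data/raw/customers \
      --num-mappers 4

    # Export processed results back to the relational database
    sqoop export \
      --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMERS_SUMMARY \
      --export-dir /data/curated/customers_summary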

Environment: Hadoop, AWS, EC2, S3, EMR, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Flume, HBase, Zookeeper, Ranger, Knox, NoSQL and Unix/Linux.
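
As a rough illustration of the Kafka log-aggregation pipeline described in this role, a topic can be created and smoke-tested from the command line; the broker and ZooKeeper hosts, topic name, and sizing are assumptions, and script names vary slightly by distribution.

    # Create a topic for aggregated web logs (placeholder hosts and sizing)
    kafka-topics.sh --create --zookeeper zk-host:2181 \
      --topic weblogs --partitions 6 --replication-factor 3

    # Quick smoke test: publish one sample record and read it back
    echo '{"page":"/home","ts":1633024800}' | \
      kafka-console-producer.sh --broker-list broker-host:9092 --topic weblogs
    kafka-console-consumer.sh --bootstrap-server broker-host:9092 \
      --topic weblogs --from-beginning --max-messages 1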

Confidential, Austin, TX

Hadoop Administrator

Responsibilities:

  • Launched Amazon EC2 cloud instances (Linux/Ubuntu/RHEL) using Amazon Web Services and configured the launched instances for specific applications (see the AWS CLI sketch after this list).
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Created S3 buckets and bucket policies, worked on IAM role-based policies, and customized the JSON policy templates.
  • Implemented and maintained monitoring and alerting for production and corporate servers/storage using AWS CloudWatch.
  • Managed server instances on the Amazon Web Services (AWS) platform using Puppet and Chef configuration management.
  • Developed Pig scripts to transform raw data into curated data sets as specified by business users.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few of the jobs).
  • Expertise in designing and deploying Hadoop clusters and Big Data analytics tools including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra with the Hortonworks distribution.
  • Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
  • Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Tuned Hive and Pig to improve performance and resolve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Imported data from sources such as HDFS and HBase into Spark RDDs.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data (a spark-submit sketch follows this role's Environment line).
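
A hedged sketch of the EC2/S3 provisioning work described in the bullets above, using the AWS CLI; the AMI ID, instance type, key pair, security group, bucket name, and policy file are placeholders.

    # Launch a Linux instance (placeholder AMI, key pair, and security group)
    aws ec2 run-instances \
      --image-id ami-0123456789abcdef0 \
      --instance-type m5.xlarge \
      --key-name hadoop-admin-key \
      --security-group-ids sg-0abc123 \
      --count 1

    # Create a bucket and apply a bucket policy from a JSON template
    aws s3 mb s3://example-hadoop-landing-zone
    aws s3api put-bucket-policy \
      --bucket example-hadoop-landing-zone \
      --policy file://bucket-policy.json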

Environment: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Linux.
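
A minimal spark-submit sketch for running a Scala/Spark SQL job on YARN, along the lines of the Spark work in this role; the application class, jar path, input path, and resource sizing are hypothetical.

    # Placeholder application class, jar, input path, and resource sizing
    spark-submit \
      --class com.example.etl.SessionizeLogs \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 10 \
      --executor-memory 4g \
      /opt/jobs/sessionize-logs-assembly.jar hdfs:///data/raw/weblogs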

Confidential, Great Neck, NY.

Hadoop Admin

Responsibilities:

  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Extensively used Cloudera Manager to manage multiple clusters with petabytes of data.
  • Knowledgeable in documenting processes, creating server diagrams, and preparing server requisition documents.
  • Set up machines with network controls, static IPs, disabled firewalls, and swap memory.
  • Managed cluster configuration to meet the needs of analysis workloads, whether I/O-bound or CPU-bound.
  • Worked on setting up high availability for major production cluster.
  • Performed Hadoop version updates using automation tools.
  • Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
  • Importing and exporting structured data from different relational databases into HDFS and Hive using Sqoop.
  • Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
  • Involved in setting up Hive, HiveServer2, and Hive authorization, and in testing the environment (see the Beeline sketch after this role's Environment line).
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Managed load balancers, firewalls in a production environment.
  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume, and defined channel selectors to multiplex data into different sinks (a Flume agent sketch follows this list).
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
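
A sketch of the Flume log-collection setup mentioned above, assuming a hypothetical agent name and configuration file; the sources, sinks, and multiplexing channel-selector rules would live in the referenced properties file.

    # Start a Flume agent that ships web logs into HDFS
    # (agent name and config paths are assumptions)
    flume-ng agent \
      --name weblog-agent \
      --conf /etc/flume-ng/conf \
      --conf-file /etc/flume-ng/conf/weblog-agent.properties \
      -Dflume.root.logger=INFO,console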

Environment: Hadoop, HDFS, Sqoop, Pig, Zookeeper, MapReduce, Hive, Oozie, Java (jdk1.6), Cloudera, Erwin.
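
To illustrate testing the HiveServer2 environment noted above, a quick connectivity check with Beeline; the host, port, user, and table name are placeholders.

    # Connect to HiveServer2 and run a sanity query (placeholder host, user, and table)
    beeline -u jdbc:hive2://hs2-host:10000/default -n hive_admin \
      -e "SHOW DATABASES; SELECT COUNT(*) FROM sample_table;"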

Confidential, NY

Hadoop Admin

Responsibilities:

  • Installed and configured a CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
  • Extensively used Cloudera Manager to manage multiple clusters with petabytes of data.
  • Implemented Oracle Big Data Appliance for the production environment.
  • Worked with big data developers, designers, and scientists to troubleshoot MapReduce job failures and issues with Hive, RStudio, and Teradata.
  • Conducted root cause analysis and resolved production problems and data issues.
  • Proactively involved in ongoing maintenance, support, and improvements of the Hadoop cluster.
  • Executed cluster upgrade tasks on the staging platform before performing them on the production cluster.
  • Monitored cluster stability and used tools to gather statistics and improve performance.
  • Keeping current with latest technologies to help automate tasks and implement tools and processes to manage the environment.
  • Implemented security for the Hadoop cluster with Kerberos authentication.
  • Experience in LDAP integration with Hadoop and access provisioning for the secured cluster.
  • Set up machines with network controls, static IPs, disabled firewalls, and swap memory.
  • Regularly commissioned and decommissioned nodes depending on data volume (a decommissioning sketch follows this role's Environment line).
  • Worked on setting up high availability for major production cluster.
  • Performed Hadoop version updates using automation tools.
  • Implemented rack aware topology on the Hadoop cluster.
  • Importing and exporting structured data from different relational databases into HDFS and Hive using Sqoop.
  • Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Good experience troubleshooting production-level issues in the cluster and its functionality.
  • Backed up data on a regular basis to a remote cluster using DistCp (see the DistCp sketch after this list).
  • Managing and scheduling Jobs on a Hadoop cluster.
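
A hedged sketch of the DistCp-based backup to a remote cluster referenced above; the NameNode addresses and paths are placeholders.

    # Copy a production directory to the remote backup cluster
    # (placeholder NameNode hosts and paths)
    hadoop distcp -update -p \
      hdfs://prod-nn:8020/data/warehouse \
      hdfs://backup-nn:8020/backups/warehouse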

Environment: Hadoop, AWS, LDAP, Teradata, Sentry, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Flume, HBase, Zookeeper, Cloudera Distributed Hadoop, Cloudera Manager.
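
A rough sketch of the node decommissioning workflow mentioned in this role, assuming an HDFS exclude file configured via dfs.hosts.exclude; the file path and hostname are placeholders.

    # Add the host to the exclude file referenced by dfs.hosts.exclude
    echo "worker-node-17.example.com" >> /etc/hadoop/conf/dfs.exclude

    # Tell the NameNode and ResourceManager to re-read their host lists
    hdfs dfsadmin -refreshNodes
    yarn rmadmin -refreshNodes

    # Watch until the DataNode reports Decommissioned before stopping it
    hdfs dfsadmin -report | grep -A5 "worker-node-17"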
