
Hadoop Administrator Resume


Roseville, CA

SUMMARY

  • 8+ years of experience in software development, building scalable, high-performance Big Data applications, with specialization in the Hadoop stack, NoSQL databases, distributed computing, and Java/J2EE technologies.
  • Extensive experience in Hadoop MapReduce programming, Spark, Scala, Pig, NoSQL, and Hive.
  • Experience with Hortonworks and Cloudera Manager administration, including installing and upgrading Hadoop and its related components in both single-node and multi-node cluster environments using Apache, Cloudera, and Hortonworks distributions.
  • Good experience in UNIX/Linux administration, along with SQL development in designing and implementing relational database models per business needs in different domains.
  • Hands-on experience with the major components of the Hadoop ecosystem, including HDFS, the MapReduce framework, YARN, HBase, Hive, Pig, Sqoop, and ZooKeeper.
  • Experience in managing and handling Linux platform servers (especially Ubuntu), plus hands-on experience with Red Hat Linux.
  • Installation, configuration, and administration of Hadoop clusters on the major distributions, including Cloudera Enterprise (CDH3 and CDH4) and Hortonworks Data Platform (HDP 1 and HDP 2).
  • Hadoop cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
  • Used network monitoring daemons like Ganglia and service monitoring tools like Nagios.
  • Backup configuration and recovery from a NameNode failure.
  • Installation of various Hadoop ecosystem components and Hadoop daemons.
  • Installation and configuration of Sqoop and Flume.
  • Good experience designing, configuring, and managing backup and disaster recovery for Hadoop data.
  • Hands-on experience in analyzing log files for Hadoop and ecosystem services and finding the root cause.
  • Experience in commissioning, decommissioning, balancing, and managing nodes, and in tuning servers for optimal cluster performance (see the decommissioning sketch after this list).
  • Experience in copying files within a cluster or between clusters using the DistCp command-line utility (see the DistCp sketch below).
  • Experience in HDFS data storage and support for running MapReduce jobs.
  • Installing and configuring Hadoop ecosystem components such as Sqoop, Pig, and Hive.
  • Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes and vice versa (see the Sqoop sketch below).
  • Hands-on experience installing Kerberos security and setting up permissions, and in setting up standards and processes for Hadoop-based application design and implementation.
  • Experience with the cloud: Hadoop on Azure, AWS/EMR, and Cloudera Manager (also direct Hadoop on EC2, non-EMR).
  • Brief exposure to implementing and maintaining Hadoop security and Hive security.
  • Experience in database administration, performance tuning, backup and recovery, and troubleshooting in large-scale customer-facing environments.
  • Expertise in deploying Hadoop, YARN, Spark, and Storm integrated with Cassandra, Ignite, RabbitMQ, and Kafka.
  • Good working knowledge of importing and exporting data from different databases, namely MySQL, into HDFS and Hive using Sqoop.
  • Strong knowledge of YARN (Hadoop 2.x) terminology and high-availability Hadoop clusters.
  • Developed various MapReduce applications to perform ETL workloads on terabytes of data.
  • Very good experience with high-volume transactional systems running on Unix/Linux and Windows.
  • Involved in all phases of the Software Development Life Cycle (SDLC) for large-scale enterprise software, using object-oriented analysis and design.
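
A minimal sketch of the decommissioning flow mentioned above, assuming a stock HDFS setup where dfs.hosts.exclude points at an excludes file; the hostname and file path are placeholders:

    # Add the host to the excludes file referenced by dfs.hosts.exclude,
    # then ask the NameNode to re-read it:
    echo "worker05.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes

    # Watch the node drain to "Decommissioned", then rebalance the rest:
    hdfs dfsadmin -report | grep -A 2 worker05
    hdfs balancer -threshold 10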
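
The DistCp work above boils down to commands like the following; the NameNode URIs and paths are examples only:

    # One-off copy between clusters:
    hadoop distcp hdfs://nn1:8020/data/events hdfs://nn2:8020/backup/events

    # Incremental re-run: -update copies only changed files, -p preserves
    # file attributes such as ownership and permissions:
    hadoop distcp -update -p hdfs://nn1:8020/data/events hdfs://nn2:8020/backup/events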
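
And a hedged sketch of the Sqoop import/export flow; the JDBC URL, credentials, tables, and HDFS paths are placeholders, not values from any engagement:

    # Pull a MySQL table into HDFS with four parallel mappers
    # (-P prompts for the password instead of putting it on the command line):
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /user/etl/orders \
      --num-mappers 4

    # Push aggregated results from HDFS back to the relational side:
    sqoop export \
      --connect jdbc:mysql://db.example.com/sales \
      --username etl_user -P \
      --table order_summary \
      --export-dir /user/etl/order_summary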

TECHNICAL SKILLS

Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, ZooKeeper, and Oozie

Languages: XML, R/R Studio, SAS, Schemas, JSON, Ajax, Java, Scala, Python, Shell Scripting

Hadoop Distributions: Cloudera CDH, Hortonworks HDP, MapR.

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Business Intelligence Tools: Tableau Server, Tableau Reader, Tableau, Splunk, Amazon Redshift.

Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans.

Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall.

Reporting Tools: MS Office (Word/Excel/PowerPoint/ Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

PROFESSIONAL EXPERIENCE

Confidential, Roseville, CA

Hadoop Administrator

Responsibilities:

  • Worked on the Hadoop stack, ETL tools like Talend, reporting tools like Tableau, security with Kerberos, user provisioning with LDAP, and a lot of other Big Data technologies for multiple use cases.
  • Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, cluster planning, managing and reviewing data backups, and managing and reviewing log files.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Installed 5 Hadoop clusters for different teams and developed a data lake that serves as a base layer to store and run analytics for developers. Provided services to those developers: installed their custom software, upgraded Hadoop components, resolved their issues, and helped them troubleshoot long-running jobs. Served as L3 and L4 support for the data lake, and also managed clusters for other teams.
  • Built automation frameworks for data ingestion and processing in Python and Scala with NoSQL and SQL databases, using Chef, Puppet, Kibana, Elasticsearch, Tableau, GoCD, and Red Hat infrastructure for data ingestion, processing, and storage.
  • Worked in a mix of DevOps and Hadoop administration: handled L3 issues, installed new components as requirements came in, automated as much as possible, and implemented a CI/CD model.
  • Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working along with the operations team to move the non-secured cluster to a secured cluster.
  • Responsible for upgrading Hortonworks HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node cluster environment. Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS. Set up Hadoop security using MIT Kerberos, AD integration (LDAP), and Sentry authorization (see the Kerberos sketch after this list).
  • Migrated services from a managed hosting environment to AWS, including service design, network layout, data migration, automation, monitoring, deployment and cutover, documentation, the overall plan, cost analysis, and timeline.
  • Used R for an effective data handling and storage facility.
  • Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, or custom-built tooling; designed cloud-hosted solutions with specific AWS product suite experience.
  • Performed a major upgrade of the production environment from HDP 1.3 to HDP 2.2; as an admin, followed standard backup policies to ensure high availability of the cluster.
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios; monitored workload, job performance, and capacity planning using Ambari; installed and configured Hortonworks and Cloudera distributions on single-node clusters for POCs.
  • Created Teradata database macros for application developers to assist them in conducting performance and space analysis, as well as object dependency analysis, on the Teradata database platforms.
  • Implemented a continuous delivery framework using Jenkins, Puppet, Maven, and Nexus in a Linux environment; integrated Maven/Nexus, Jenkins, UrbanCode Deploy with Patterns/Release, Git, Confluence, Jira, and Cloud Foundry.
  • Involved in running Hadoop jobs that process millions of records of text data; troubleshot build issues during the Jenkins build process; implemented Docker to create containers for Tomcat servers and Jenkins.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
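
A minimal sketch of the MIT Kerberos setup work referenced above, assuming kadmin access on the KDC; the realm, principal, and keytab path are illustrative, not values from this engagement:

    # Create a service principal with a random key and export it to a keytab:
    kadmin.local -q "addprinc -randkey dn/worker01.example.com@EXAMPLE.COM"
    kadmin.local -q "ktadd -k /etc/security/keytabs/dn.service.keytab dn/worker01.example.com@EXAMPLE.COM"

    # Verify the keytab contents and authenticate as the service principal:
    klist -kt /etc/security/keytabs/dn.service.keytab
    kinit -kt /etc/security/keytabs/dn.service.keytab dn/worker01.example.com@EXAMPLE.COM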

Environment: Hortonworks Hadoop, Cassandra, flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, Sqoop, Hive, Oozie, Ambari, SAS, SPSS, UNIX shell scripts, ZooKeeper, SQL, MapReduce, Pig.

Confidential, Peoria, IL

Hadoop Administrator

Responsibilities:

  • Involved in the start-to-end process of Hadoop cluster setup: installation, configuration, and monitoring of the Hadoop cluster.
  • Automated Hadoop cluster setup and implemented Kerberos security for various Hadoop services using Hortonworks.
  • Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Installation of various Hadoop ecosystem components and Hadoop daemons.
  • Responsible for Installation and configuration of Hive, Pig, HBase, and Sqoop on the Hadoop cluster.
  • Configured various property files, such as core-site.xml, hdfs-site.xml, and mapred-site.xml, based on the job requirements (see the sketch after this list).
  • Involved in loading data from the UNIX file system to HDFS and importing and exporting data into HDFS using Sqoop; experienced in managing and reviewing Hadoop log files.
  • Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
  • Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
  • Extracted meaningful data from dealer CSV files, text files, and mainframe files, and generated Python pandas reports for data analysis.
  • Developed Python code using version control tools like GitHub and SVN on Vagrant machines.
  • Performed data analysis, feature selection, and feature extraction using Apache Spark machine learning and streaming libraries in Python.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented the system processes and procedures for future reference.
  • Worked with the systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters. Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.
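
A brief sketch of the property-file and data-loading tasks above; the fs.defaultFS value and file paths are illustrative examples, not values from this cluster:

    # A typical entry inside <configuration> in core-site.xml
    # (hostname and port are placeholders):
    #   <property>
    #     <name>fs.defaultFS</name>
    #     <value>hdfs://namenode.example.com:8020</value>
    #   </property>

    # Load a file from the local UNIX file system into HDFS and confirm:
    hdfs dfs -mkdir -p /user/etl/raw
    hdfs dfs -put /data/exports/dealer_feed.csv /user/etl/raw/
    hdfs dfs -ls /user/etl/raw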

Environment: Hortonworks, Hadoop, HDFS, Pig, Hive, Sqoop, Flume, Kafka, Storm, UNIX, Cloudera Manager, ZooKeeper, HBase, Python, Spark, Apache, SQL, ETL.

Confidential, Minneapolis, MN

Hadoop Administrator

Responsibilities:

  • Responsible for cluster maintenance, commissioning and decommissioning data nodes, cluster monitoring, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Monitoring systems and services; architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Performed both major and minor upgrades to the existing Hortonworks Hadoop cluster.
  • Built an automated setup for cluster monitoring and the issue escalation process.
  • Administration, installation, upgrading, and managing of distributions and tuning of Hadoop clusters (Cloudera Manager), HBase, and Hive.
  • Created a self-managed Python script to deploy tests of the technologies and calculate statistics.
  • Worked on the Hadoop stack, reporting tools like Tableau, and security like Kerberos; user provisioning with LDAP and a lot of other Big Data technologies for multiple use cases.
  • Expertise in the Hadoop stack: MapReduce, Sqoop, Pig, Hive, HBase, Kafka, and Spark.
  • Planned and executed system upgrades for existing Hadoop clusters.
  • Ability to work with incomplete or imperfect data and experience with real-time transactional data; strong collaborator and team player with agile, hands-on experience in Impala.
  • Installed, managed, and configured the Hadoop clusters; utilized Python to run scripts and generate tables and reports.
  • Monitored Hadoop jobs and performance.
  • Built Docker images for the applications and ran them on specified ports in Docker containers (see the Docker sketch after this list).
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into a Hive schema for analysis.
  • Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
  • Participated in the development and implementation of the Cloudera Hadoop environment.
  • Developed Spark SQL to load tables into HDFS and run select queries on top of them.
  • Developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing (see the spark-submit sketch after this list).
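
A minimal sketch of the Docker build-and-run flow above; the image name and port mapping are placeholders:

    # Build the application image from the Dockerfile in the current directory:
    docker build -t myapp:latest .

    # Run it detached, publishing container port 8080 on host port 8080:
    docker run -d --name myapp -p 8080:8080 myapp:latest
    docker ps --filter name=myapp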
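
And a hedged sketch of how Spark and Spark Streaming jobs like those above are typically submitted to YARN; the class name, jar, and executor sizing are illustrative assumptions:

    # Submit a Spark job to YARN in cluster mode:
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 8 \
      --executor-memory 4g \
      --class com.example.StreamingJob \
      streaming-job.jar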

Environment: Hadoop, AWS, Java, HDFS, MapReduce, Spark, Pig, Hive, Impala, Sqoop, Flume, Docker, Kafka, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse, and Cloudera.
