Sr. Hadoop Administrator Resume

San Jose, CA


  • 7+ years of extensive IT experience, including 4+ years as a Hadoop Administrator and 4 years in Python development and UNIX/Linux administration, along with SQL development experience designing and implementing relational database models per business needs across different domains.
  • Expertise in Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
  • Hands-on experience in installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH5.x) distributions with YARN.
  • Hands-on experience with backup configuration and recovery from a NameNode failure.
  • Decommissioning and commissioning nodes on a running Hadoop cluster.
  • Extensive experience in installation, configuration, management and deployment of Big Data components and the underlying infrastructure of Hadoop Cluster.
  • Involved in benchmarking Hadoop/HBase cluster file systems with various batch jobs and workloads.
  • Experience monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
  • Good experience designing, configuring and managing backup and disaster recovery for Hadoop data.
  • Experience in commissioning, decommissioning, balancing and managing nodes and tuning servers for optimal cluster performance.
  • As an admin, involved in cluster maintenance, troubleshooting and monitoring, and followed proper backup and recovery strategies.
  • Experience in HDFS data storage and support for running MapReduce jobs.
  • Good working knowledge on importing and exporting data from different databases namely MySQL, PostgreSQL, Oracle into HDFS and Hive using Sqoop.
  • Extensive experience in NoSQL and real time analytics.
  • Strong knowledge of YARN terminology and high-availability Hadoop clusters.
  • Hands-on experience analyzing log files for Hadoop and ecosystem services and finding root causes.
  • Experience with Chef, Puppet or related tools for configuration management.
  • Expertise in installing, configuring and managing Red Hat Linux 5 and 6.
  • Good experience on scheduling cron jobs in Linux.
  • Proactively maintain and develop all Linux infrastructure technology to maintain a 24x7x365 uptime service
  • Maintain best practices on managing systems and services across all environments
  • Fault finding and analysis of logging information for reporting performance exceptions
  • Manage, coordinate, and implement software upgrades, patches, hot fixes on servers, workstations, and network hardware
  • Provide input on ways to improve the stability, security, efficiency, and scalability of the environment
  • Install and maintain all server hardware and software systems and administer all server performance and ensure availability for same.
  • Stabilized system by disk replacement, firmware upgrade in SAN storage, Solaris Volume Management, clustering environment on scheduled maintenance hours.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
  • Enhanced the business continuity procedure by adding a critical middleware server identified through power-down test activity.
  • Assist in configuring, deploying and installing all virtual machines and provide backup for all configuration procedures.
  • Responsible for scheduling and upgrading these servers throughout the year to the latest versions of software
  • Communicated and worked with the individual application development groups, DBAs and Operations.
  • Created custom monitoring plugins for Nagios using UNIX shell scripting and Perl.
  • Troubleshoot all tools, maintain multiple servers and provide backup for all file and script management servers.
  • Perform tests on all new software and maintain patches for management services and perform audit on all security processes
  • Collaborate with other teams and team members to develop automation strategies and deployment processes
  • Provided root cause analysis of incident reports during any downtime issues
  • Provided customer with administrative support on a UNIX based platform historical query database serving many users.
  • Authorized to work in United States for any employer


Sr. Hadoop Administrator

Confidential, San Jose, CA


  • Loaded data from different data sources (such as Teradata and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.
  • Installed various Hadoop ecosystem components and Hadoop daemons.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Imported data from MySQL to HDFS using Sqoop on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration and monitoring of the Hadoop cluster.
  • Built an automated setup for cluster monitoring and the issue escalation process.
  • Administered, installed, upgraded and managed Hadoop distributions (CDH5, Cloudera Manager), Hive and HBase.
  • Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files
  • Monitoring systems and services, architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload and job performance and performed capacity planning.
  • Tuning performance parameters in Hive to handle huge amounts of data being processed.
  • Implementation of Partition/Bucketing schemes in Hive for easier access of data.
  • Hive tables are partitioned for different runs to store the data and compare the results after metadata and business logic changes.
  • Installing, Upgrading and Managing Hadoop Cluster on Cloudera distribution
  • Managing and reviewing Hadoop and HBase log files.
  • Expertise in writing Hive and Pig scripts to perform data analysis on large data sets.
  • Leveraged the feature of custom UDFs in Pig and Hive to process data.
  • Experience with Unix or Linux, including shell scripting
  • Developed bash scripts to fetch the Tlog files from the FTP server and process them for loading into Hive tables.
  • Used a test-driven development (TDD) approach for developing services required for the application, implemented integration test cases, and developed predictive analytics using the Apache Spark Scala APIs.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design, and used Git to manage the Python and portlet code.
  • Involved in optimizing Joins in Hive queries
  • Worked on the Core, Spark SQL and Spark Streaming modules of Spark extensively.
  • Used Scala to write code for all Spark use cases.
  • Assigned names to each of the columns using the case class option in Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Built real time pipeline for streaming data using Kafka/Microsoft Azure Queue and Spark Streaming.
  • Spark Streaming collects this data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in a Cassandra cluster.
  • Tuned Spark Streaming performance, e.g. setting the right batch interval, the correct level of parallelism, the correct serialization, and memory tuning.
  • Used several Python libraries such as wxPython, NumPy and Matplotlib; was involved in environment and code installation as well as the SVN implementation; gained hands-on experience with Hadoop applications (such as administration, configuration management, monitoring, debugging and performance tuning).
  • Familiarity with real-time streaming data with Spark for fast, large-scale in-memory MapReduce.
  • Implemented a continuous delivery pipeline with Jenkins and GitHub to build a new Docker container automatically.
  • Used Docker to implement a high-level API to provide lightweight containers that run processes in isolation, and worked on creating customized Docker container images, tagging and pushing the images to the Docker repository.
  • Worked in development of applications especially in UNIX environment and familiar with all of its commands.
  • Built all database mapping classes using Django models and Apache Cassandra; experienced in developing RESTful applications with HATEOAS and documenting the API in Swagger.
  • Worked with data formats such as TextFile, SequenceFile, Row Columnar (RCFile) and Optimized Row Columnar (ORC) in Hive.
  • Developed a wrapper in Python for instantiating a multi-threaded application, and deployed and monitored scalable infrastructure on Amazon Web Services (AWS).
  • Partitioned and Bucketed data sets in Apache Hive to improve performance.
  • Used the Pandas API to put the data in time series and tabular format for easy timestamp data manipulation and retrieval, and automated various administrative tasks on multiple servers using DevOps practices.

Environment: Python 3, Django, Hadoop, HDFS, MapReduce, Shell Scripting, Spark, Solr, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, cluster health, monitoring security, Red Hat Linux, Impala, Cloudera Manager, Golang, Bitbucket, pdb, AWS, Jira, Jenkins, Docker, PySpark, REST, Virtual Machine, Ajax, jQuery, JavaScript, Linux.

Hadoop Administrator

Confidential, Austin, TX


  • Involved in integrating Git with Puppet to ensure the integrity of applications by creating Production, Development, Test and Release branches.
  • Responsible for loading the customer's data and event logs from Oracle database, Teradata into HDFS using Sqoop
  • End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
  • Performance tuning of Hadoop clusters and Hadoop MapReduce routines
  • Monitor Hadoop cluster job performance and perform capacity planning
  • Monitor Hadoop cluster connectivity and security
  • Manage and review Hadoop log files
  • HDFS support and maintenance
  • Worked diligently with the infrastructure, network, database, application and business intelligence teams to ensure high data quality and availability
  • Installed and configured various components of Hadoop ecosystem and maintained their integrity
  • Designed, configured and managed the backup and disaster recovery for HDFS data.
  • Experience with Unix or Linux, including shell scripting
  • Installing, Upgrading and Managing Hadoop Cluster on Cloudera distribution.
  • Commissioned DataNodes when data grew and decommissioned them when the hardware degraded
  • Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement
  • Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload and job performance and performed capacity planning.
  • Expertise in recommending hardware configuration for Hadoop cluster
  • Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Exported analyzed data to HDFS using Sqoop for generating reports.
  • Managing and reviewing Hadoop and HBase log files.
  • Scheduled all Hadoop, Hive, Sqoop and HBase jobs using Oozie.

Environment: Hadoop, MapReduce, Shell Scripting, Spark, Pig, Hive, Cloudera Manager, CDH 5.4.3, HDFS, YARN, Hue, Sentry, Oozie, ZooKeeper, Impala, Solr, Kerberos, cluster health, Puppet, Ganglia, Nagios, Flume, Sqoop, Storm, Kafka, KMS

Python/Java Consultant

Confidential, Bentonville, AR


  • Implemented user interface guidelines and standards throughout the development and maintenance of the website using HTML, CSS, JavaScript and jQuery.
  • Developed GUI using Django and Python for dynamically displaying the test block documentation and other features with Python code for a web browser.
  • Implemented AJAX for dynamic functionality of webpages in front-end applications.
  • Developed and tested many dashboard features created using Bootstrap, CSS and JavaScript.
  • Used Python and Django to interface with the jQuery UI and manage the storage and deletion of content.
  • Wrote scripts to parse JSON documents and load the data into the database.
  • Worked with front-end technologies such as CSS and Bootstrap for responsive webpages.
  • Used python libraries like Beautiful Soup and matplotlib.
  • Used Pandas for data alignment and data manipulation.
  • Developed the front end using AngularJS, React.js, Node.js, Bootstrap.js, Backbone.js and JavaScript, with a Java back end exposing REST web services.
  • Involved in application development using Spring Core and MVC modules and Java web-based technologies such as Servlets, JSP, Java Web Services (REST/SOAP based) and WSDL.
  • Utilized standard Python modules such as csv, robotparser, itertools and pickle for development.
  • Developed views and templates with Python and Django's view controller and templating language to create user-friendly website interface.
  • Worked on Python OpenStack APIs and used NumPy for Numerical analysis.
  • Developed Wrapper in Python for instantiating multi-threaded application.
  • Managed datasets using pandas DataFrames and MySQL; queried the MySQL database from Python using the Python-MySQL connector and the MySQLdb package to retrieve information.
  • Used Ajax and jQuery for transmitting JSON data objects between frontend and controllers.
  • Used Angular MVC and two-way data binding. Worked on automation scripts using Selenium in Java.
  • Developed entire frontend and backend modules on Django, including the Tastypie web framework, using Git for version control.
  • Designed, coded and tested key modules of the project using Java OOP concepts.
  • Installed Kafka on the Hadoop cluster and coded the producer and consumer in Java to stream data for popular hashtags from a Twitter source to HDFS.
  • Developed Splunk infrastructure and related solution for application toolsets.
  • Helped the team onboard data, create various knowledge objects, and install and maintain Splunk apps.
  • Created an application on Splunk to analyze the data.
  • Managed Splunk configuration files such as inputs, props, transforms and lookups.
  • Configured Maven for Java automation projects and developed Maven Project Object Model (POM).
  • Set up automated cron jobs to upload data into the database, generate graphs and bar charts, upload these charts to the wiki and back up the database.
  • Followed AGILE development methodology to develop the application.
  • Used core java concepts like Collections, Generics, Exception handling, IO, Concurrency to develop business logic.
  • Developed the business tier using Core Java and the HTTP interfaces using Servlets.
  • Developed software in a Linux environment and utilized XShell to build and deploy Java applications.
  • Developed a functional package with Java, Erlang and Python.
  • Added several options to the application to choose a particular algorithm for data and address generation.
  • Maintained versions using Git and sent release notes for each release.
  • Supported the issues seen in the tool across all the teams for several projects.

Environment: Python 3.2, Django, HTML5/CSS, MS SQL Server 2013, MySQL, JavaScript, Eclipse, Linux, Shell Scripting, PyCharm, urllib, jQuery, GitHub, AngularJS, Jira

Software Developer

Confidential, Greenville, SC


  • Design, develop, test, deploy and maintain the website.
  • Interaction with client to understand expectations and requirements.
  • Designed and developed the UI of the website using HTML, AJAX, CSS and JavaScript.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Designed and developed data management system using MySQL.
  • Rewrote an existing Java application in Python.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Performed testing using Django's Test Module.
  • Assisted junior Perl developers in the development of scripts.
  • Developed a fully automated continuous integration system using GIT, Gerrit, Jenkins, MySQL and custom tools developed in Python and Bash.
  • Managed, developed, and designed a dashboard control panel for customers and Administrators using Django, Oracle DB, PostgreSQL.
  • Extensively used python modules such as requests, urllib, urllib2 for web crawling.
  • Implemented configuration changes for data models.
  • Used the pandas library for statistical analysis and NumPy for numerical analysis.
  • Managed large datasets using pandas DataFrames and MySQL.
  • Exported/Imported data between different data sources using SQL Server Management Studio.
  • Maintained program libraries, users' manuals and technical documentation. Responsible for debugging and troubleshooting the web application.
  • Successfully migrated all the data to the database while the site was in production.
  • Developed GUI using webapp2 for dynamically displaying the test block documentation and other features of python code using a web browser.

Environment: Python 2.7, Django 1.8, CSS, HTML, JavaScript, jQuery, webapp2, AJAX, MySQL, Linux, Heroku, Git, urllib, urllib2, Oracle DB, PostgreSQL, and VMware API