
Hadoop Consultant Resume

Overland Park, KS


  • 7+ years of expertise in Hadoop, Big Data analytics, and Linux, including design, installation, configuration, and management of Apache Hadoop clusters and the MapR, Hortonworks, and Cloudera Hadoop distributions.
  • Experienced in Hadoop Big Data integration with ETL, performing data extraction, transformation, and loading for ERP data.
  • Set up disks for MapR, handled disk failures, configured storage pools, and worked with Logical Volume Manager.
  • Experience in AWS CloudFront, including creating and managing distributions to provide access to an S3 bucket or an HTTP server running on EC2 instances.
  • Designed and developed a data management system on PostgreSQL, optimizing queries to improve performance.
  • Good understanding of cluster capacity planning and of configuring cluster components based on requirements.
  • Collected and aggregated large amounts of log data using Apache Flume and stored the data in HDFS for further analysis.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
  • Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
  • Experience in setting up automatic failover control and manual failover control using ZooKeeper and quorum journal nodes.
  • Implemented the Capacity Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Comprehensive knowledge of Linux kernel tuning and patching, and extensive knowledge of Linux system imaging/mirroring using SystemImager.
  • Hands-on experience in data extraction, transformation, loading, and visualization using the Hortonworks platform: HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, JSON, and MS SQL.
  • Experience in writing shell scripts using ksh, bash, and Perl for process automation of databases, applications, backup, and scheduling.
  • Experienced in setting up Hortonworks clusters and installing all the ecosystem components through Ambari.
  • Involved in Hadoop cluster environment administration that includes adding and removing cluster nodes, cluster monitoring and troubleshooting.
  • Worked with Apache Flink to implement transformations on data streams for filtering, aggregating, and updating state.
  • Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
  • Excellent knowledge of standard FlowFile processors in NiFi that are used for data routing, transformation, and mediation between systems, e.g. GetFile, PutKafka, PutFile, and PutHDFS.
  • Experience in performance tuning of Hadoop clusters using various JVM metrics.
  • Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet, and Ansible; responsible for CI/CD infrastructure, process, and deployment strategy.
  • Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
  • Hands-on experience in installation, configuration, management, and support of full-stack Hadoop clusters, both on premises and in the cloud, using Hortonworks and Cloudera bundles.
  • Working knowledge of Kafka for real-time data streaming and event based architecture.
  • Knowledge of minor and major upgrades of Hadoop and eco-system components.
  • Hands-on experience in the AWS cloud landscape, including EC2, Identity and Access Management (IAM), S3, CloudWatch, VPC, and RDS.
  • Managed Data with Volumes and worked with Snapshots, Mirror Volumes, Data protection and Scheduling in MapR.
  • Provided security and authentication with Ranger, where Ranger Admin handles administration and Ranger Usersync adds new users to the cluster.
  • Expertise in Writing, Reviewing and Executing Test Cases.
  • Conceptual knowledge of load testing, black-box testing, performance testing, and stress testing.
  • Analyzed and tuned the performance of Spark jobs on EMR, taking into account the type and size of the input processed on specific instance types.
  • Hands on experience in deploying AWS services using Terraform.
  • Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
  • Experienced in DNS, NIS, NIS+, NFS, Solr, FTP, Samba Server, LDAP/AD remote access, security management, and system troubleshooting.


Big Data Ecosystem: Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Apache Flink, Docker, Solr, Hue, Knox, NiFi, HDFS, MapReduce

Cloud Technologies: Amazon Web Services (Amazon RedShift, S3), Microsoft Azure Insight

BIG Data Security: Redaction, Sentry, Ranger, Navencrypt, Kerberos, AD, LDAP, KTS, KMS, SSL/TLS, Cloudera Navigator, Hortonworks

Programming Languages: Scala, Python, Java, SQL, PL/SQL, Hive-QL, Pig Latin

NoSQL Databases: Cassandra, MongoDB


Web/Application servers: Apache Tomcat, WebLogic, JBoss

Version control: SVN, CVS, GIT

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Databases: Oracle, DB2, SQL Server, MySQL, Teradata, PostgreSQL, Cassandra

Tools: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT

Operating Systems: Unix, Linux and Windows

Other Tools: GitHub, Maven, Puppet, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, NewRelic


Hadoop Consultant

Confidential, Overland Park, KS


  • Involved in requirements analysis, design, development, data design and mapping, extraction, validation, and defining complex business requirements.
  • Generated Python Django Forms to record data of online users.
  • Provided L2 technical support, a higher and costlier level than Tier I because its technicians are more experienced and knowledgeable about a particular product or service.
  • Knowledgeable in continuous deployment using Heroku, Jenkins, and Ansible.
  • Installed Kafka on the Hadoop cluster and wrote producer and consumer code in Java to stream data for popular hashtags from a Twitter source to HDFS.
  • Installed, configured, and optimized Hadoop infrastructure on the Cloudera CDH5 distribution using Puppet.
  • Worked on cloud-based architecture for Big Data with Azure.
  • Good understanding of the architecture, configuration, and security of Cassandra with Falcon, including the Cassandra read and write paths.
  • Implemented partitioning, dynamic partitions, and buckets in Hive; wrote Pig Latin scripts for the analysis of semi-structured data.
  • Worked on a 100-node Hadoop cluster running Cloudera distribution 5.6.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts and visualized the streaming data in Tableau dashboards.
  • Managed and reviewed Hadoop log files, and handled file system management and Hadoop cluster capacity planning.
  • Monitored workload and job performance and carried out capacity planning using Cloudera Manager.
  • Wrote Flume configuration files to store streaming data in HDFS and upgraded Kafka to
  • As an admin, performed cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
  • Used Python scripts to update content in the database and manipulate files.
  • Supported 200+ servers and 50+ users on the Hadoop platform: resolved the tickets and issues they ran into, trained users to make Hadoop easier to use, and kept them updated on best practices.
  • Ran analytics using MapReduce, Hive, and Pig on HDFS, sent the results back to MongoDB databases, and updated the information in collections.
  • Imported data in various formats, such as JSON, sequential, text, CSV, XML flat files, Avro, and Parquet, into the HDFS cluster with compression, using MapReduce programs.
  • Used a RESTful web services API to connect with MapR tables.
  • Currently working as a Hadoop admin responsible for everything related to the clusters, 100 nodes in total, ranging from POC to PROD.
  • Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, and monitoring.
  • Installed and configured OpenStack Juno on Red Hat 6 with multiple compute nodes.
  • Involved in loading data from the UNIX file system to HDFS, and created custom Solr query components to enable optimum search matching.
  • Created multiple groups and set permission policies for them in AWS.
  • Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
  • Worked on deployment on an AWS EC2 instance with Postgres RDS and S3 file storage.
  • Installed and configured the CDH Hadoop environment; responsible for maintaining the cluster and for managing and reviewing Hadoop log files.
  • Used Jira for project tracking, bug tracking, and project management. Loaded data from various data sources into HDFS using Kafka.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.

Environment: MapR 5.2 & 6.0, Hue, MongoDB-2.6, Hive-0.10, Sqoop-1.4.3, Oozie-3.3.4, AWS, Jira, Zeppelin, WebLogic 8.1, Kafka, Ranger, CentOS and other UNIX utilities, Cloudera Manager, YARN, Falcon, Kerberos, Ansible, Cloudera 5.6 & 6.0, Azure, HDFS, CDH4.7, Hadoop-2.0.0 HDFS, Impala, Pig, Python Scripting

Hadoop Admin

Confidential, Tyson, VA


  • Implemented Cluster Security using Kerberos and HDFS ACLs.
  • Involved in loading data from the UNIX file system to HDFS. Carried out root cause analysis (RCA) for high-severity incidents.
  • Used Spark with Python and Spark SQL for faster testing and processing of data.
  • Maintained GIT repositories for DevOps environment: automation code and configuration.
  • Translated high-level design specifications into simple ETL coding and mapping standards, and handled cluster coordination services through ZooKeeper.
  • Used DevOps tools such as Jenkins, Nexus, Chef, and Ansible to build and deploy applications.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Experience in setup, configuration, and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an enterprise level.
  • Created Hive tables and loaded data from the local file system to HDFS.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Responsible for installing, configuring, supporting and managing of Hadoop Clusters.
  • Set up a test cluster with new services such as Grafana and integrated it with Kafka and HBase for intensive monitoring.
  • Involved in cluster-level security: perimeter (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
  • Built and implemented ETL processes using Big Data tools such as Spark (Scala/Python) and NiFi.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Successfully migrated the Elasticsearch database from SQLite to MySQL to PostgreSQL with complete data integrity.
  • Worked on Cloudera to analyze data present on top of HDFS.
  • Responsible for copying 800 TB of HDFS snapshot from Production cluster to DR cluster.
  • Knowledge of HDFS data storage and its support for Azure Data Lake.
  • Responsible for copying 210 TB of HBase tables from the Production cluster to the DR cluster.
  • Production experience in large environments using configuration management tools such as Chef and Puppet, supporting a Chef environment with 170+ servers, and involved in developing manifests.
  • Installed and configured Cloudera on RHEL and was responsible for maintaining the cluster.
  • Used Flink Streaming on the pipelined Flink engine to process data streams and to deploy the new API, including the definition of flexible windows.
  • Configured AWS IAM and Security Groups.
  • Administered 150+ Hadoop servers, handling Java version updates, the latest security patches, OS-related upgrades, and hardware-related outages.
  • Upgraded Ambari 2.2.0 and updated Ambari Solr from 4.10.3 to Ambari Infra, which is Solr 5.5.2.
  • Installed application on AWS EC2 instances and also configured the storage on S3 buckets.

Environment: Cloudera, HDFS, MapR 5.1, YARN, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, Pig, Spark, Hive, Docker, HBase, NiFi, Python, Puppet/Chef, LDAP/AD, Oracle Enterprise Manager (OEM), MySQL, PostgreSQL, Teradata, Azure, Kafka, Apache Hadoop, Apache Flink, Ansible.

Hadoop Admin

Confidential, Irvine, CA


  • Involved in improving the performance and optimization of the existing algorithms using Spark.
  • Migrated HiveQL queries on structured data to Spark SQL to improve performance.
  • Configured Kerberos and installed the MIT ticketing system.
  • Hands-on experience with Docker, Puppet, Chef, Ansible, AWS CloudFormation, and AWS CloudFront.
  • Secured the Hadoop cluster from unauthorized access by Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
  • Installing and configuring CDAP, an ETL tool in the development and Production clusters.
  • Integrated CDAP with Ambari for easy operations monitoring and management.
  • Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
  • Played a key role in migrating production and development Hortonworks Hadoop clusters to a new cloud-based cluster solution.
  • Connected to HDFS through third-party tools such as Teradata SQL Assistant using an ODBC driver.
  • Upgraded HDP (Hortonworks Data Platform) multiple times, starting from HDP 1.7 and moving to HDP 2.3 and HDP 2.4.3.
  • Performed cluster maintenance, including commissioning and decommissioning of nodes.
  • Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
  • Implemented secondary sorting to sort reducer output globally in MapReduce.
  • Set up Hortonworks infrastructure, from cluster configuration down to individual nodes.
  • Installed, configured, and administered the Jenkins CI tool on AWS EC2 instances.
  • Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Hands-on experience working with ecosystem components such as Hive, Pig scripts, Sqoop, MapReduce, Hortonworks YARN, and ZooKeeper. Strong knowledge of Hive's analytical functions.
  • As a Hadoop admin, monitored cluster health status on a daily basis, tuned performance-related configuration parameters, and backed up configuration XML files.
  • Introduced SmartSense to obtain optimization recommendations from the vendor and to help troubleshoot issues.
  • Experience with automated configuration management; maintained a CI/CD pipeline using deployment tools such as Chef, Puppet, and Ansible.
  • Deployed a data lake cluster with Hortonworks Ambari on AWS using EC2 and S3.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on error-handling techniques and tuned the ETL flow for better performance; worked extensively with TAC (Admin Console), scheduling jobs in Job Conductor.

Environment: Hive, Pig, Java (JDK 1.6), SQL, Sqoop, NiFi, Flume, Oozie, CDH3, MongoDB, Cassandra, HBase, Eclipse, Hortonworks, Oracle and Unix/Linux, Hadoop, MapR 5.0, HDFS.

Linux / Hadoop Admin



  • Implemented and maintained the monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Installed single-node machines for stakeholders with the Hortonworks HDP distribution.
  • Configured ZooKeeper to implement node coordination in support of clustering.
  • Experienced in providing security for Hadoop Cluster with Kerberos.
  • Dumped data from MySQL databases to HDFS and vice versa using Sqoop.
  • Used Ganglia and Nagios to monitor the cluster around the clock.
  • Resolved tickets submitted by users, including P1 issues; troubleshot, documented, and resolved errors.
  • Monitored multiple cluster environments using Metrics and Nagios.
  • Copied data from one cluster to another using DistCp and automated the procedure using shell scripts.
  • Configured and troubleshot various services such as NFS, SSH, Telnet, and FTP on UNIX platforms.
  • Experience in system authentication on RHEL servers using Kerberos and LDAP.
  • Installed and configured DNS on RHEL servers.
  • Managed disk file systems, server performance, user creation, and file access permissions.
  • Troubleshot backup issues by analyzing NetBackup logs.
  • Configured and troubleshot DHCP on RHEL servers.
  • Installed, configured, built, troubleshot, and maintained RHEL servers.
  • Installed Enterprise Security Manager software on RHEL servers.
  • Wrote shell, Python, and Perl scripts to automate crucial tasks.

Environment: Linux, Java, AWS, Hortonworks, Sqoop, Oozie, Pig, Hive, HBase, Flume, Eclipse, Cassandra, HDP 2.0, Hadoop, Big Data, HDFS, MapReduce, WebSphere, NFS, DNS, Red Hat Linux servers, Oracle RAC, VMware, DHCP, RHEL, Solaris, Apache.
