
Hadoop Consultant Resume

Sunnyvale, CA


  • 7+ years of experience in IT, specializing in Hadoop administration, AWS infrastructure setup, DevOps, and software testing.
  • Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
  • Working knowledge of Kafka for real-time data streaming and event-driven architectures.
  • Knowledge of minor and major upgrades of Hadoop and eco-system components.
  • Hands-on experience across the AWS cloud landscape, including EC2, Identity and Access Management (IAM), S3, CloudWatch, VPC, and RDS.
  • Managed Data with Volumes and worked with Snapshots, Mirror Volumes, Data protection and Scheduling in MapR.
  • Implemented and managed DevOps infrastructure with Terraform, Jenkins, Puppet, and Ansible; responsible for CI/CD infrastructure, processes, and deployment strategy.
  • Comprehensive knowledge of Linux kernel tuning and patching, and extensive knowledge of Linux system imaging/mirroring using SystemImager.
  • Experienced in deploying, maintaining, and troubleshooting applications on Microsoft Azure cloud infrastructure. Excellent knowledge of NoSQL databases such as HBase and Cassandra.
  • Hands-on experience in data extraction, transformation, loading, and visualization using the Hortonworks platform: HDFS, Hive, Sqoop, HBase, Oozie, Beeline, Docker, YARN, Impala, Spark, Scala, Vertica, Oracle, JSON, MS SQL.
  • Experience writing shell scripts in ksh, bash, and Perl for process automation of databases, applications, backups, and scheduling.
  • Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
  • Provided security and authentication with Apache Ranger, where Ranger Admin handles policy administration and Ranger Usersync adds new users to the cluster.
  • Experience working on Big Data with Azure.
  • Hands on experience in installation, configuration, management and support of full stack Hadoop Cluster both on premise and cloud using Hortonworks and Cloudera bundles.
  • Conceptual knowledge of load testing, black-box testing, performance testing, and stress testing.
  • Analyzed and tuned performance of Spark jobs in EMR, matching specific instance types to the type and size of the input being processed.
  • Hands on experience in deploying AWS services using Terraform.
  • Expert in setting up SSH, SCP, SFTP connectivity between UNIX hosts.
  • Experienced in DNS, NIS, NFS, FTP, NIS+, Samba Server, LDAP/AD remote access, security management, and system troubleshooting skills.
  • Hands-on experience in Azure Cloud Services (PaaS & IaaS)
  • Experienced in Hadoop Big Data Integration with ETL on performing data extract, loading and transformation process for ERP data.
  • Set up disks for MapR, handled disk failures, configured storage pools, and worked with the Logical Volume Manager.
  • Experience in AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
  • Good understanding of cluster capacity planning and configuring cluster components based on requirements.
  • Collected and aggregated large amounts of log data using Apache Flume and stored it in HDFS for further analysis.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Knowledge of securing the Hadoop cluster using Kerberos and Sentry.
  • Implemented the Capacity Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Experienced in setting up Hortonworks clusters and installing all ecosystem components through Ambari.
  • Involved in Hadoop cluster environment administration that includes adding and removing cluster nodes, cluster monitoring and troubleshooting.
  • Worked on Apache Flink to implement transformations on data streams, including filtering, aggregation, and state updates.
  • Excellent knowledge of the standard FlowFile processors in NiFi used for data routing, transformation, and mediation between systems, e.g. GetFile, PutFile, PutKafka, and PutHDFS.
  • Experience in performance tuning of Hadoop cluster using various JVM metrics.
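
The Sqoop import/export work noted above typically comes down to commands like the following sketch; the host, database, table names, and paths here are hypothetical, and the commands assume a live Hadoop cluster with Sqoop installed:

```shell
# Import a MySQL table into HDFS (hypothetical host/database/credentials).
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table customers \
  --target-dir /data/raw/customers \
  --num-mappers 4

# Export aggregated results from HDFS back to the relational side.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table customer_summary \
  --export-dir /data/out/customer_summary
```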


BIG Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Apache Flink, Docker, Hue, Knox, NiFi

BIG Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navencrypt, SSL/TLS, Cloudera Navigator, Hortonworks

NoSQL Databases: HBase, Cassandra, MongoDB

Programming Languages: Java, Scala, Python, SQL, PL/SQL, Hive-QL, Pig Latin

Frameworks: MVC, Struts, Spring, Hibernate


Web/Application servers: Apache Tomcat, WebLogic, JBoss

Version control: SVN, CVS, GIT

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Business Intelligence Tools: Talend, Informatica, Tableau

Databases: Oracle …, DB2, SQL Server, MySQL, Teradata, PostgreSQL, Cassandra

Tools and IDEs: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT

Cloud Technologies: Amazon Web Services (Amazon RedShift, S3), Microsoft Azure HDInsight

Operating Systems: Unix, Linux and Windows

Other Tools: GitHub, Maven, Puppet, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, NewRelic


Hadoop Consultant

Confidential, Sunnyvale, CA


  • Implemented partitioning, dynamic partitions, and buckets in Hive; wrote Pig Latin scripts for the analysis of semi-structured data.
  • Worked on a Hadoop cluster with 450 nodes on Cloudera distribution 7.7.0.
  • Supported 200+ servers and 50+ users of the Hadoop platform: resolved the tickets and issues they ran into, trained users to keep Hadoop usage simple, and kept them updated on best practices.
  • Performed analytics using MapReduce, Hive, and Pig on HDFS data, then wrote the results back to MongoDB and updated information in collections.
  • Troubleshooting the Azure Development, configuration and Performance issues.
  • Imported data in various formats (JSON, SequenceFile, text, CSV, XML flat files, Avro, and Parquet) into the HDFS cluster, applying compression via MapReduce programs.
  • Used Restful Web Services API to connect with the MapR table.
  • Currently working as Hadoop admin, responsible for everything related to clusters totaling 100 nodes, ranging from POC to PROD.
  • Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, and monitoring.
  • Installed and configured OpenStack Juno on RHEL 6 with multiple compute nodes.
  • Involved in loading data from the UNIX file system to HDFS, and created custom Solr query components to enable optimum search matching.
  • Created multiple groups and set permission policies for various groups in AWS.
  • Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
  • Worked on deployment on AWS EC2 instance with Postgres RDS and S3 file storage.
  • Installed and configured CDH-Hadoop environment and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Used Jira for project tracking, Bug tracking and Project Management. Load data from various data sources into HDFS using Kafka.
  • Utilized Azure HDInsight to monitor and manage the Hadoop Cluster.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
  • Handled L2 escalations: issues that could not be resolved at L1 were forwarded to L2.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts, and visualized the streaming data in Tableau dashboards.
  • Managed and reviewed Hadoop log files, file system management and monitoring Hadoop cluster capacity planning.
  • Monitoring workload, job performance and capacity planning using Cloudera Manager.
  • Wrote Flume configuration files to store streaming data in HDFS, and upgraded Kafka.
  • As an admin involved in Cluster maintenance, troubleshooting, Monitoring and followed proper backup & Recovery strategies.
  • Used Python scripts to update content in the database and manipulate files.
  • Generated Python Django Forms to record data of online users.
  • Provided L2 technical support, a higher level than Tier 1 that costs more because its technicians are more experienced and knowledgeable about the particular product or service.
  • Installed SonarQube as a Docker container on OpenStack, Azure, and AWS EC2, and integrated it with Jenkins.
  • Knowledgeable with continuous deployment using Heroku, Jenkins and Ansible
  • Installed Kafka on the Hadoop cluster and wrote the producer and consumer code in Java to stream tweets with popular hashtags from the Twitter source into HDFS.
  • Worked on cloud based Architecture for Big Data with Azure.
  • Good understanding of the architecture, configuration, and security of Cassandra with Falcon, including the Cassandra read and write paths.
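
The Flume configuration work mentioned above usually looks something like the sketch below, which tails an application log into HDFS. Agent name, file paths, and the sink directory are hypothetical, and starting the agent requires a live Flume/Hadoop install:

```shell
# Write a minimal single-agent Flume config (hypothetical paths).
cat > flume-hdfs.conf <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source: tail an application log.
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log

# Channel: in-memory buffer between source and sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events into date-partitioned HDFS directories.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /data/streaming/app/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF

# Start the agent against this config.
flume-ng agent --name a1 --conf-file flume-hdfs.conf
```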

Environment: Ansible, Cloudera 5.6, Azure, HDFS, CDH 4.7, Hadoop 2.0.0, MapReduce, Hue, MongoDB 2.6, Hive 0.10, Sqoop 1.4.3, Oozie 3.3.4, AWS, Jira, Zeppelin, WebLogic 8.1, Kafka, Ranger, YARN, Falcon, Kerberos, Impala, Pig, Python scripting, MySQL, Perl, Red Hat Linux, CentOS and other UNIX utilities, Cloudera Manager

Sr. Hadoop Admin

Confidential, New York, NY


  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Managed and reviewed Hadoop log files as part of administration, for troubleshooting purposes.
  • Set up a test cluster with new services such as Grafana, integrating with Kafka and HBase for intensive monitoring.
  • Involved in cluster-level security: perimeter security (authentication via Cloudera Manager, Active Directory, and Kerberos), access (authorization and permissions via Sentry), visibility (audit and lineage via Navigator), and data (encryption at rest).
  • Analyzed raw data that was scheduled to be dumped into Azure Blob Storage (HDInsight cluster).
  • Imported data from AWS S3 into Spark RDD, performed transformations and actions on RDD's.
  • Built and implemented ETL processes developed with big data tools such as Spark (Scala/Python) and NiFi.
  • Worked with Spark for improving performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark-SQL, Data Frames, Pair RDD's.
  • Drive end to end deployment of various Components on the Azure Platform
  • Worked on Cloudera to analyze data present on top of HDFS.
  • Responsible for copying 800 TB of HDFS snapshot from Production cluster to DR cluster.
  • Responsible for copying 210 TB of Hbase table from Production to DR cluster.
  • Production experience in large environments using configuration management tools like Chef and Puppet supporting Chef Environment with 170+ servers and involved in developing manifests.
  • Installed and configured Cloudera on RHEL and was responsible for maintaining the cluster.
  • Used Flink Streaming's pipelined engine to process data streams and deployed the new API, including the definition of flexible windows.
  • Configured AWS IAM and Security Groups.
  • Administering 150+ Hadoop servers which need java version updates, latest security patches, OS related upgrades and taking care of hardware related outages.
  • Upgraded Ambari to 2.2.0 and migrated Ambari SOLR from 4.10.3 to Ambari Infra (SOLR 5.5.2).
  • Installed application on AWS EC2 instances and also configured the storage on S3 buckets.
  • Performed transformations, cleaning and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.
  • Knowledge of HDFS data storage and support for Azure Data Lake.
  • Implemented Cluster Security using Kerberos and HDFS ACLs.
  • Involved in loading data from UNIX file system to HDFS. Created root cause analysis (RCA) efforts for the high severity incidents.
  • Implemented Spark with Python and Spark SQL for faster testing and processing of data.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and Extracted the data from MySQL into HDFS using Sqoop.
  • Experience in setup, configuration and management of security for Hadoop clusters using Kerberos and integration with LDAP/AD at an Enterprise level.
  • Used Hive and created Hive tables, loaded data from Local file system to HDFS.
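
The production-to-DR copies described above (800 TB of HDFS data, 210 TB of HBase tables) are typically driven by DistCp against an HDFS snapshot so the copy source stays immutable. A command sketch, with cluster hostnames, paths, and the snapshot name all hypothetical:

```shell
# Allow and take a snapshot of the source path on the production cluster.
hdfs dfsadmin -allowSnapshot /data/warehouse
hdfs dfs -createSnapshot /data/warehouse dr-copy-1

# Copy the snapshot contents to the DR cluster.
# -update makes reruns incremental; -p preserves permissions/ownership.
hadoop distcp -update -p \
  hdfs://prod-nn:8020/data/warehouse/.snapshot/dr-copy-1 \
  hdfs://dr-nn:8020/data/warehouse
```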

Environment: Cloudera, Kafka, Apache Hadoop, Apache Flink, Ansible, HDFS, MapR, YARN, Sqoop, Flume, Oozie, ZooKeeper, Kerberos, Sentry, Pig, Spark, Hive, Docker, HBase, NiFi, Python, Puppet/Chef, LDAP/AD, Oracle Enterprise Manager (OEM), MySQL, PostgreSQL, Teradata, Azure

Hadoop Admin

Confidential, San Jose, CA


  • Responsible for Cluster maintenance. Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, Manage and review data backups, Manage and review Hadoop log files.
  • Hands-on experience with ecosystem components such as Hive, Pig scripts, Sqoop, MapReduce, Hortonworks YARN, and ZooKeeper. Strong knowledge of Hive's analytical functions.
  • Experience in dealing with Windows Azure IAAS - Virtual Networks, Virtual Machines, Cloud Services.
  • As a Hadoop admin, monitoring cluster health status on daily basis, tuning system performance related configuration parameters, backing up configuration xml files.
  • Introduced SmartSense, obtained optimal recommendations from the vendor, and used it for troubleshooting issues.
  • Experience with automated configuration management and maintaining a CI/CD pipeline using deployment tools such as Chef, Puppet, or Ansible.
  • Deployed Datalake cluster with Hortonworks Ambari on AWS using EC2 and S3.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Integration of HDInsight with Azure Data Lake store (ADLS) and blob storage.
  • Cluster maintenance as well as commission and decommission of nodes.
  • Collaborate with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
  • Implemented secondary sorting to sort reducer output globally in MapReduce.
  • Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
  • Installing, configuring and administering Jenkins CI tool on AWS EC2 instances.
  • Worked on error-handling techniques, tuned the ETL flow for better performance, and worked extensively with TAC (Admin Console), scheduling jobs in the Job Conductor.
  • Migrated HiveQL queries on structured data to Spark SQL to improve performance.
  • Involved in improving the performance and optimization of the existing algorithms using Spark.
  • Configured Kerberos and installed the MIT ticketing system.
  • Hands-on experience with Docker, Puppet, Chef, Ansible, AWS CloudFormation, and AWS CloudFront.
  • Secured the Hadoop cluster from unauthorized access by Kerberos, LDAP integration and TLS for data transfer among the cluster nodes.
  • Installing and configuring CDAP, an ETL tool in the development and Production clusters.
  • Integrated CDAP with Ambari for easy operations monitoring and management.
  • Used CDAP to monitor the datasets and workflows to ensure smooth data flow.
  • Key role in migrating production and development Hortonworks Hadoop clusters to a new cloud based cluster solution.
  • Monitor Hadoop cluster and proactively optimize and tune cluster for performance.
  • Connected to the HDFS using the third-party tools like Teradata SQL assistant using ODBC driver.
  • Upgraded HDP (Hortonworks Data Platform) multiple times: from HDP 1.7 to HDP 2.3 and then HDP 2.4.3.
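
Day-to-day access on the Kerberized clusters described above follows the usual MIT Kerberos flow. A command sketch, where the principal, realm, and keytab path are hypothetical:

```shell
# Obtain a ticket non-interactively from a keytab (hypothetical principal/keytab).
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM

# Inspect the ticket cache to confirm the ticket was granted.
klist

# Run HDFS commands under that authenticated identity.
hdfs dfs -ls /user
```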

Environment: Hadoop, MapR, HDFS, Hive, Pig, Java, SQL, Sqoop, Nifi, Flume, Oozie, CDH3, MongoDB, Cassandra, HBase, Java (jdk 1.6), Eclipse, Hortonworks, Oracle and Unix/Linux.

Hadoop Admin



  • Implemented Kerberos for authenticating all the services in Hadoop Cluster.
  • Experienced in handling Avro data files by passing schema into HDFS using Avro tools and Map Reduce.
  • Worked on installing cluster, commissioning & decommissioning of Data Nodes, Name node recovery, Capacity planning, Cassandra and slots configuration.
  • Implemented Security in Web Applications using Azure and Deployed Web Applications to Azure.
  • Used Apache Spark API over Hortonworks Hadoop YARN cluster to perform analytics on data in Hive.
  • Monitored multiple clusters environments using Metrics and Nagios.
  • Dumped data from one cluster to another using DistCp, and automated the dumping procedure with shell scripts.
  • Implemented and maintained monitoring and alerting of production and corporate servers/storage using AWS CloudWatch.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Enabling the monitoring solution for HDInsight, Interactive query, Spark & HBase clusters.
  • Deployed a Hadoop cluster and integrated with Nagios and Ganglia.
  • Installed single Node machines for stake holders with Hortonworks HDP Distribution.
  • Extensively involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
  • Used Windows Azure to deploy the application on the cloud and managed the session.
  • Configured ZooKeeper to implement node coordination in clustering support.
  • Experienced in providing security for Hadoop Cluster with Kerberos.
  • Dumped data from the MySQL database to HDFS and vice versa using Sqoop.
  • Used Ganglia and Nagios to monitor the cluster around the clock.
  • Resolved tickets submitted by users, including P1 issues; troubleshot errors and documented resolutions.
  • Worked on a live 52 node Hadoop Cluster running Hortonworks Data Platform.
  • Converted MapReduce jobs into Spark transformations and actions using Spark.
  • Adding new Data Nodes when needed and running balancer.
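
Adding new DataNodes and running the balancer (last bullet) follows a standard sequence; a sketch assuming the new host has already been added to the NameNode's include file:

```shell
# Tell the NameNode to re-read its include/exclude files
# so the new DataNode is admitted to the cluster.
hdfs dfsadmin -refreshNodes

# Confirm the new node shows up as live.
hdfs dfsadmin -report

# Rebalance blocks; -threshold is the allowed per-node
# utilization deviation from the cluster average, in percent.
hdfs balancer -threshold 10
```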

Environment: Hadoop, Big Data, HDFS, MapReduce, Hortonworks, Sqoop, Oozie, Pig, Hive, HBase, Flume, Linux, Java, AWS, HDInsight, Eclipse, Cassandra, HDP 2.0.

Linux Admin



  • Configuring and troubleshooting DHCP on RHEL servers.
  • Installation, configuration and building RHEL servers troubleshooting and maintaining.
  • Software installation of Enterprise Security Manager on RHEL servers.
  • Performed shell, Python, and Perl scripting to automate crucial tasks.
  • Patching RHEL servers for security, OS patches and upgrades.
  • Experience in system authentication on RHEL servers using Kerberos and LDAP.
  • Installing and Configuring DNS on RHEL servers.
  • Managing Disk File Systems, Server Performance, Users Creation and Granting File Access Permissions.
  • Troubleshooting the backup issues by analyzing the NetBackup logs.
  • Configuring and Troubleshooting of various services like NFS, SSH, Telnet, FTP on UNIX platform.
  • Monitored disk, CPU, memory, and overall performance of servers; configured LVMs.
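
Configuring LVMs (last bullet) typically follows the physical-volume → volume-group → logical-volume sequence; a root-privileged command sketch in which the device names, sizes, and mount point are hypothetical:

```shell
# Initialize a disk as an LVM physical volume (hypothetical device).
pvcreate /dev/sdb

# Create a volume group on it, then carve out a 50G logical volume.
vgcreate datavg /dev/sdb
lvcreate -n datalv -L 50G datavg

# Make a filesystem on the logical volume and mount it.
mkfs.xfs /dev/datavg/datalv
mkdir -p /data && mount /dev/datavg/datalv /data
```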

Environment: RHEL, Solaris, VMware, Apache, JBoss, WebLogic, system authentication, WebSphere, NFS, DNS, Samba, Red Hat Linux servers, Oracle RAC, DHCP.
