
Hadoop Consultant Resume

Chicago, IL


  • Over 7 years of professional IT experience in analysis, design, and development using Hadoop, Java/J2EE, and SQL.
  • 5+ years of experience developing large-scale applications using Hadoop and other big data tools.
  • Created a complete processing engine based on the Cloudera distribution.
  • Hands-on experience with production Hadoop applications, including administration, configuration management, debugging, and performance tuning.
  • Experience developing solutions to analyze large data sets efficiently while securing them with Kerberos.
  • Experience with the Hadoop 2.0 YARN (MRv2) architecture and with developing YARN applications on it.
  • Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Excellent hands-on experience importing and exporting data between relational database systems such as MySQL and Oracle and HDFS/Hive using Sqoop.
  • Knowledge of Kafka installation and integration with Spark Streaming.
  • Hands-on experience in writing Pig Latin scripts, working with grunt shells and job scheduling with Oozie.
  • Excellent understanding/knowledge of Hadoop Distributed system architecture and design principles.
  • Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5 using Puppet.
  • Interacted with Cloudera support, logged issues in the Cloudera portal, and fixed them per the recommendations.
  • Experience in converting MapReduce applications to Spark.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Good knowledge in using job scheduling and workflow designing tools like Oozie.
  • Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Cloudera Manager, Hortonworks, Presto and Ambari.
  • Have good experience creating real time data streaming solutions using Spark/Storm, Kafka and Flume.
  • Very good understanding on NOSQL databases like MongoDB and HBase.
  • Extensive experience in creating Class Diagrams, Activity Diagrams, and Sequence Diagrams using Unified Modeling Language (UML).
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Installed and configured Hadoop, MapReduce, and HDFS; developed multiple MapReduce jobs in Java for data cleaning; and upgraded Cloudera from version 5.5 to 6.0.
  • Good understanding of Data Mining and Machine Learning techniques.
  • Experience in handling messaging services using Apache Kafka.
  • Experience in fine-tuning MapReduce jobs for better scalability and performance.
  • Created custom Python/shell scripts to import data via Sqoop from various SQL databases such as Teradata, SQL Server, and Oracle.
  • Experience on NoSQL Databases such as HBase and Cassandra.
  • Experience in writing SQL and PL/SQL queries and stored procedures for accessing and managing databases such as Oracle, SQL Server, and MySQL.
  • Working experience in Development, Production and QA Environments.
  • Experienced in SDLC, Agile (Scrum) methodology, and iterative Waterfall.
  • Well experienced in building servers such as DHCP, PXE with Kickstart, DNS, and NFS, using them to build infrastructure in a Linux environment, and working with Puppet for application deployment.
  • Experienced in Linux administration tasks like IP management (IP addressing, subnetting, Ethernet bonding, and static IP).
  • Good communication and interpersonal skills, a committed team player and a quick learner.
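The Sqoop import work above can be sketched as a small Python helper that assembles a `sqoop import` command line; the connection string, table, and paths below are hypothetical examples, not taken from any actual project.

```python
import shlex

def build_sqoop_import(jdbc_url, table, target_dir,
                       num_mappers=4, username=None, password_file=None):
    """Assemble a `sqoop import` command for pulling an RDBMS table into HDFS."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]
    if username:
        cmd += ["--username", username]
    if password_file:
        # --password-file keeps the credential out of `ps` output
        cmd += ["--password-file", password_file]
    return cmd

# Hypothetical MySQL source and HDFS target, for illustration only.
cmd = build_sqoop_import(
    "jdbc:mysql://db-host:3306/sales", "orders", "/user/etl/orders",
    num_mappers=8, username="etl", password_file="/user/etl/.pw",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Wrapping the flag assembly in a function like this makes it easy to loop over many tables from one driver script, which is typically why such scripts exist.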


Big Data Technologies: Apache Hadoop, MapReduce, Cloudera 4.3.2, HDFS, Cloudera Impala, Hortonworks, Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Hue, Presto, Ranger, Zeppelin, Oozie, Kerberos.

Languages: Core Java, J2EE, SQL, PL/SQL, Unix Shell Scripting, Perl, Python.

Web Technologies: JSP, EJB 2.0, JNDI, JMS, JDBC, HTML, JavaScript

Web/Application servers: Tomcat, JBoss 5.1.0

Databases: Oracle 11G/10G, SQL Server, DB2, Sybase, Teradata

Frame Works: Hadoop, MapReduce, MVC, Struts 2.x/1.x

IDE: IntelliJ IDEA 7.2, EditPlus 3, Eclipse 3.5, NetBeans 6.5, TOAD, PL/SQL, Teradata

Version Control: VSS (Visual SourceSafe), Subversion, CVS

Testing Technologies: JUnit 4/3.8

Office Packages: MS Office 2010, 2007, 2003 and Visio

Operating Systems: MS-DOS, Windows XP, Windows 7, UNIX and Linux


Hadoop Consultant

Confidential, Chicago, IL


  • Supported 200+ servers and 50+ users on the Hadoop platform, resolved tickets and issues they ran into, provided training to make Hadoop usage simple, and kept users updated on best practices.
  • Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5 using Puppet.
  • Good understanding of the architecture, configuration, and security of Cassandra with Falcon, including Cassandra's data read and write paths.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Developed simple and complex MapReduce programs in Java for Data Analysis.
  • Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts and visualized the streaming data in Tableau dashboards.
  • Managed and reviewed Hadoop log files, file system management and monitoring Hadoop cluster capacity planning.
  • Monitoring workload, job performance and capacity planning using Cloudera Manager.
  • Hands-on experience working with ecosystem components like Hive, Pig scripts, Sqoop, MapReduce, YARN, and ZooKeeper. Strong knowledge of Hive's analytical functions.
  • Wrote Flume configuration files to store streaming data in HDFS.
  • Upgraded Kafka.
  • As an admin involved in Cluster maintenance, troubleshooting, Monitoring and followed proper backup & Recovery strategies.
  • Performed analytics using MapReduce, Hive, and Pig in HDFS, sent the results back to MongoDB, and updated information in the collections.
  • Used RESTful web service APIs to connect to MapR tables and to create connections to the database.
  • Worked as a Hadoop admin responsible for everything related to clusters totaling 100 nodes, ranging from POC to PROD clusters.
  • Built Cassandra clusters on both physical machines and AWS; automated Cassandra builds, installation, and monitoring.
  • Installed and configured OpenStack Juno on Red Hat 6 with multiple compute nodes.
  • Loaded data from the UNIX file system into HDFS and created custom Solr query components to enable optimum search matching.
  • Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
  • Installed and configured CDH-Hadoop environment and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Used Jira for project tracking, bug tracking, and project management. Loaded data from various data sources into HDFS using Kafka.
  • Used Python scripts to update content in the database and manipulate files.
  • Generated Python Django Forms to record data of online users.
  • Installed Kafka on the Hadoop cluster and wrote producer and consumer code in Java to establish a connection from a Twitter source to HDFS, filtering on popular hashtags.
  • Implemented partitioning, dynamic partitions, and buckets in Hive; wrote Pig Latin scripts for the analysis of semi-structured data.
  • Involved in requirement analysis, design, development, data design and mapping, extraction, validation, and the creation of complex business requirements.
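A minimal sketch of the Hive partitioning and bucketing mentioned above: a helper that emits the DDL for a partitioned, bucketed table. Table and column names are hypothetical; in practice, dynamic-partition inserts also require `SET hive.exec.dynamic.partition.mode=nonstrict;` in the session.

```python
def partitioned_table_ddl(table, cols, partition_cols, bucket_col, num_buckets):
    """Emit HiveQL for a partitioned, bucketed table (dynamic-partition friendly)."""
    col_list = ", ".join(f"{name} {typ}" for name, typ in cols)
    part_list = ", ".join(f"{name} {typ}" for name, typ in partition_cols)
    return (
        f"CREATE TABLE {table} ({col_list}) "
        f"PARTITIONED BY ({part_list}) "            # partition columns live outside the column list
        f"CLUSTERED BY ({bucket_col}) INTO {num_buckets} BUCKETS "
        f"STORED AS ORC"
    )

# Hypothetical web-log table partitioned by date and bucketed by user id.
ddl = partitioned_table_ddl(
    "web_logs",
    [("user_id", "BIGINT"), ("url", "STRING")],
    [("dt", "STRING")],
    "user_id", 32,
)
print(ddl)
```

Bucketing by a high-cardinality key such as a user id spreads rows evenly across buckets, which is what makes sampling and bucketed joins effective.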

Environment: Cloudera 4.3.2, HDFS, CDH 4.7, Hadoop 2.0.0, MapReduce, Hue, MongoDB 2.6, Hive 0.10, Sqoop 1.4.3, Oozie 3.3.4, Jira, Zeppelin, WebLogic 8.1, Kafka, Ranger, YARN, Falcon, Kerberos, Impala, Pig, Python scripting, MySQL, Perl, Red Hat Linux, CentOS and other UNIX utilities, Cloudera Manager.

Hadoop Administrator

Confidential, Minneapolis, MN


  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Installed and Configured Sqoop to import and export the data into MapR-FS, HBase and Hive from Relational databases.
  • Administered large MapR Hadoop environments; built and supported cluster setup, performance tuning, and monitoring in an enterprise environment.
  • Installed and configured MapR-ZooKeeper, MapR-CLDB, MapR-JobTracker, MapR-TaskTracker, MapR-ResourceManager, MapR-NodeManager, MapR-Fileserver, and MapR-Webserver.
  • Worked independently with Cloudera support for any issue/concerns with Hadoop cluster.
  • Served as point of contact for vendor escalation; upgraded Cloudera Manager from version 5.3 to 5.5.
  • Installed and configured the Knox gateway to secure Hive through ODBC, WebHCat, and Oozie services.
  • Loaded data from relational databases into the MapR-FS filesystem and HBase using Sqoop.
  • Set up MapR metrics with a NoSQL database to log metrics data.
  • Closely monitored and analyzed MapReduce job executions on the cluster at the task level.
  • Optimized Hadoop clusters components to achieve high performance.
  • Monitored multiple Hadoop clusters environments using Ganglia and Nagios.
  • Integrated HDP clusters with Active Directory and enabled Kerberos for Authentication.
  • Worked on commissioning & decommissioning of Data Nodes, NameNode recovery, capacity planning.
  • Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
  • Worked on creating the Data Model for HBase from the current Oracle Data model.
  • Implemented a high-availability and automatic-failover infrastructure to overcome the NameNode single point of failure, utilizing ZooKeeper services.
  • Leveraged Chef to manage and maintain builds in various environments.
  • Planned for hardware and software installation on production cluster and communicated with multiple teams to get it done.
  • Troubleshot, fixed, and deployed many Python bug fixes for the applications, and was involved in fine-tuning existing processes following advanced patterns and methodologies.
  • Monitoring the Hadoop cluster functioning through MCS.
  • Worked on NoSQL databases including HBase.
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Worked with Linux server admin team in administering the server hardware and operating system.
  • Worked closely with data analysts to construct creative solutions for their analysis tasks.
  • Managed and reviewed Hadoop and HBase log files.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
  • Worked on importing and exporting data from Oracle into HDFS and HIVE using Sqoop.
  • Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
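Decommissioning DataNodes, as mentioned above, hinges on the excludes file referenced by the `dfs.hosts.exclude` property: the NameNode drains nodes listed there after a `hdfs dfsadmin -refreshNodes`. A minimal sketch, with hypothetical hostnames and a temp-file path standing in for the real excludes file:

```python
import os
import tempfile

def write_exclude_file(path, hosts):
    """Write DataNode hostnames to the file referenced by dfs.hosts.exclude."""
    with open(path, "w") as f:
        f.write("\n".join(sorted(hosts)) + "\n")
    # After updating the excludes file, the NameNode must re-read it:
    return ["hdfs", "dfsadmin", "-refreshNodes"]

# Hypothetical nodes flagged for hardware replacement.
exclude_path = os.path.join(tempfile.gettempdir(), "dfs.exclude")
refresh_cmd = write_exclude_file(exclude_path, {"dn12.example.com", "dn07.example.com"})
print(open(exclude_path).read().strip())
print(" ".join(refresh_cmd))
```

The script only prepares the file and the refresh command; actually running `hdfs dfsadmin -refreshNodes` requires HDFS admin privileges on the cluster.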

Environment: Hadoop 1.2.1, MapReduce, Hive 0.10.0, Perl, Pig 0.11.1, Kerberos, Ranger, Presto, Hue, Oozie 3.3.0, HBase 0.94.11, Sqoop 1.4.4, Flume 1.4.0, Zeppelin, Java, Python, SQL, PL/SQL, Oracle 10g, Eclipse

Hadoop Administrator/ Tester



  • Experienced in setting up Hortonworks clusters and installing all ecosystem components, both through Ambari and manually from the command line.
  • Cluster maintenance, monitoring, commissioning and decommissioning of data nodes, troubleshooting, and managing and reviewing log files.
  • Actively involved in installation, performance tuning, patching, regular backups, user account administration, upgrades, and documentation.
  • Installation of new components and removal of them through Ambari.
  • Configured Zookeeper to coordinate the servers in clusters to maintain the data consistency.
  • Used Cloudera Navigator for data governance: audit and lineage.
  • Periodically reviewed Hadoop related logs and fixed errors.
  • Commissioned new cluster nodes for increased capacity and decommissioned servers with hardware problems.
  • Responsible for adding new ecosystem components such as Storm, Flume, and Knox, with the required custom configurations based on the requirements and Hadoop daemons.
  • Developed Python, shell, and PowerShell scripts for automation purposes.
  • Implemented Kerberos Security Authentication protocol for existing cluster.
  • Worked with Ranger, Knox configuration to provide centralized security to Hadoop services.
  • Created independent libraries in Python which can be used by multiple projects which have common functionalities.
  • Hands on experience with NoSQL databases like Hbase, Cassandra and MongoDB.
  • Working experience maintaining MySQL databases: creating databases, setting up users, and maintaining backups.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Performing Linux systems administration on production and development servers (Red Hat Linux, CentOS and other UNIX utilities).
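Reviewing Hadoop logs for errors, as described above, is easy to script. A minimal sketch that tallies WARN/ERROR/FATAL lines per logging class; the log format assumed here is the common Log4j layout (`date time LEVEL class: message`), and the sample lines are made up for illustration:

```python
import re
from collections import Counter

# date, time, level, then the logging class up to the trailing colon
LOG_LINE = re.compile(r"^\S+ \S+ (?P<level>INFO|WARN|ERROR|FATAL)\s+(?P<cls>[^\s:]+)")

def summarize(lines):
    """Count WARN/ERROR/FATAL occurrences per logging class in Hadoop-style logs."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") != "INFO":
            counts[(m.group("level"), m.group("cls"))] += 1
    return counts

# Fabricated sample lines mimicking a DataNode log.
sample = [
    "2016-04-01 12:00:01 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: ok",
    "2016-04-01 12:00:02 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: disk failure",
    "2016-04-01 12:00:03 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: disk failure",
]
print(summarize(sample))
```

A summary like this turns thousands of log lines into a short ranked list of which daemons are complaining, which is usually the first question during triage.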

Environment: Hadoop HDFS, MapReduce, Hortonworks, Falcon, Cloudera, Ambari, Ranger, Knox, Puppet, Hive, Pig, Kafka, Oozie, Sqoop, Shell, Python, MongoDB, Apache HBase.

Hadoop Admin



  • Solid Understanding of Hadoop HDFS, Map-Reduce and other Eco-System Projects.
  • Installation and Configuration of Hadoop Cluster.
  • The plugin allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly; it also provides data locality for Hadoop across host nodes and virtual machines.
  • Developed MapReduce jobs to analyze data and provide heuristics reports.
  • Adding, decommissioning, and rebalancing nodes.
  • Created POC to store Server Log data into Cassandra to identify System Alert Metrics.
  • Rack Aware Configuration.
  • Configuring Client Machines.
  • Configuring, Monitoring and Management Tools.
  • HDFS Support and Maintenance.
  • Cluster HA Setup.
  • Applying Patches and Perform Version Upgrades.
  • Incident Management, Problem Management and Change Management.
  • Performance Management and Reporting.
  • Contributed to building hands-on tutorials for the community to learn how to setup Hortonworks Data Platform (powered by Hadoop) and Hortonworks Data flow.
  • Developed and designed automation framework using Python and Shell scripting.
  • Recover from Name Node failures.
  • Scheduled MapReduce jobs using the FIFO and Fair schedulers.
  • Installation and Configuration of other Open Source Software like Pig, Hive, HBASE, Flume and Sqoop.
  • Integration with RDBMS using Sqoop and JDBC connectors.
  • Worked with the dev team to tune jobs; knowledgeable in writing Hive jobs.
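The rack-aware configuration mentioned above is wired in through a topology script (the `net.topology.script.file.name` property): Hadoop invokes the script with hostnames or IPs as arguments and reads one rack path per line from stdout. A minimal Python sketch, with a hypothetical rack map:

```python
import sys

# Hypothetical host-to-rack map; in production this would come from a
# CMDB export or a config file, not a hard-coded dict.
RACK_MAP = {
    "dn01.example.com": "/dc1/rack1",
    "dn02.example.com": "/dc1/rack1",
    "dn03.example.com": "/dc1/rack2",
}

def resolve_racks(hosts):
    """Map each host/IP to a rack path; unknown hosts fall back to /default-rack."""
    return [RACK_MAP.get(h, "/default-rack") for h in hosts]

if __name__ == "__main__":
    # Hadoop passes hosts as argv and expects one rack path per line on stdout.
    print("\n".join(resolve_racks(sys.argv[1:])))
```

With this in place, HDFS can keep one block replica on a different rack, so a single rack failure never loses all copies of a block.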

Environment: HDP 2.2, Python, HBase, Kafka, HDFS, YARN, Hortonworks, MongoDB, Hive, Oozie, Pig, Sqoop, Shell Scripting, MySQL, RHEL, CentOS, Ambari.

Linux Administrator



  • Configuring and tuning system and network parameters for optimum performance.
  • Gained troubleshooting and problem-solving skills, including application- and network-level troubleshooting ability.
  • Gained knowledge and experience writing shell scripts to automate tasks.
  • Identifying and triaging outages; monitoring and remediating systems and network performance.
  • Installing, Upgrading and Managing Hadoop Cluster on Hortonworks.
  • Developing tools to automate the deployment, administration, and monitoring of a large-scale Linux environment.
  • Performing server tuning, operating system upgrades.
  • Participating in the planning phase for system requirements on various projects for deployment of business functions.
  • Participating in 24x7 on-call rotation and maintenance windows.
  • Communicating and coordinating with internal/external groups and operations.
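Monitoring and automation scripts like those described above often start with simple resource checks. A minimal disk-usage sketch using only the standard library; the 85% alert threshold is an arbitrary example, not a value from any actual project:

```python
import shutil

def disk_usage_pct(path="/"):
    """Return the percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def check_disk(path="/", threshold=85.0):
    """True when usage is below the alert threshold (percent)."""
    return disk_usage_pct(path) < threshold

# Report the root filesystem; an on-call script would page instead of print.
pct = disk_usage_pct("/")
print(f"/ is {pct:.1f}% full; ok={check_disk('/', threshold=85.0)}")
```

Run from cron across a fleet, a check like this is the cheapest way to catch the full-disk outages that otherwise surface as mysterious daemon failures.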

Environment: Windows 2008/2007 server, Unix Shell Scripting, SQL Manager Studio, Red Hat Linux, Microsoft SQL Server 2000/2005/2008, MS Access, Hortonworks, NoSQL, Linux/Unix, Putty Connection Manager, Putty, SSH.
