Hadoop Administrator Resume
Chicago, IL
PROFESSIONAL SUMMARY:
- 8+ years of IT experience in analysis, design, development, implementation and testing of enterprise-wide applications, Data Warehouse, Client/Server technologies and web-based applications.
- Over 4 years of experience working with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Nagios, Spark, Impala, Oozie and Flume for Big Data and Big Data Analytics.
- Experienced in administrative tasks such as Hadoop installation in pseudo-distributed mode and multi-node clusters, and installation of Apache Ambari on the Hortonworks Data Platform (HDP 2.5).
- Installation, configuration, support and management of Hortonworks Hadoop clusters.
- In-depth understanding of Hadoop architecture and its components such as HDFS, NameNode, DataNode, JobTracker, TaskTracker and MapReduce concepts.
- Good experience in installation, configuration and management of clusters on Cloudera (CDH4) and Hortonworks (HDP 2.2) distributions using Cloudera Manager and Ambari 2.2.
- Good understanding and hands-on experience of Hadoop cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
- Extensive work on Pivotal HD (3.0), Hortonworks (HDP 2.3), MapR, EMR and Cloudera (CDH5) distributions.
- Experience with various Hadoop ecosystem components such as MapReduce, Pig, Hive, Zookeeper, HBase, Sqoop, YARN 2.0, Scala, Spark, Kafka, Storm, Impala, Oozie and Flume for data storage and analysis.
- Experience with the Oozie scheduler in setting up workflows with MapReduce and Pig jobs.
- Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra and MongoDB.
- Experience in configuration and management of security for Hadoop cluster using Kerberos.
- Hands-on experience configuring a Hadoop cluster on Amazon Web Services (AWS).
- Experience in troubleshooting errors in HBase Shell/API, Pig, Hive, Sqoop, Flume, Spark and MapReduce.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes (see the sketch after this list).
- Collected log data from various sources and integrated it into HDFS using Flume.
- Provisioning and managing multi-tenant Hadoop clusters on a public cloud environment - Amazon Web Services (AWS) EC2 - and on private cloud infrastructure - the OpenStack cloud platform.
- Implemented a Continuous Integration and Continuous Delivery framework using Jenkins, Puppet, Maven and Nexus in a Linux environment. Integrated Maven/Nexus, Jenkins, UrbanCode Deploy/Release, Git, Confluence, Jira and Cloud Foundry.
- Experience in designing data models for databases and Data Warehouse/Data Mart/ODS for OLAP and OLTP environments.
- Strong experience in Linux/UNIX administration, with expertise in Red Hat Enterprise Linux 4, 5 and 6.
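A minimal sketch of the Sqoop import/export pattern referenced above; the JDBC connection string, table names and HDFS paths are placeholders rather than actual project values.

```bash
# Import a table from a relational database into HDFS (placeholder host/db/table)
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table transactions \
  --target-dir /data/raw/transactions \
  --num-mappers 4

# Export curated results from HDFS back to the relational database
sqoop export \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table transaction_summary \
  --export-dir /data/curated/transaction_summary
```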
PROFESSIONAL EXPERIENCE
Hadoop Administrator
Confidential - Chicago, IL
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Installed and configured a Hortonworks HDP 2.2 cluster using Ambari and manually through the command line.
- Implemented NiFi configuration in a Kerberized cluster.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Apache Pig, Apache HBase and Apache Sqoop.
- Configured Docker containers for branching purposes.
- Installed, configured and tested a DataStax Enterprise Cassandra multi-node cluster with 4 datacenters of 5 nodes each.
- Successfully upgraded the Hadoop stack on the Hortonworks distribution from 2.7.1 to 2.7.2.
- Created event-processing data pipelines and handled messaging services using Apache Kafka.
- Used AWS S3 and local disk as the underlying file system for Hadoop.
- Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Created AWS instances and built a working multi-node cluster in the cloud environment.
- Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
- Designed and implemented Azure cloud infrastructure using ARM templates.
- Implemented a Continuous Delivery pipeline with Docker, Jenkins and GitHub.
- Created Hadoop-powered big data solutions and services through Azure HDInsight.
- Experience working with the Hortonworks Sandbox distribution, versions HDP 2.4.0 and HDP 2.5.0.
- Created MapR-DB tables and loaded data into them.
- Successfully secured the Kafka cluster with Kerberos.
- Generated real time reports by using Elasticsearch and Kibana.
- Maintained operations, installation and configuration of 100+ node clusters on the MapR distribution.
- Designed Azure storage for the Kafka topics, and merged and loaded the data into Couchbase with constant-query components.
- Collected data from AWS S3 buckets in near-real time using Spark Streaming and performed the necessary transformations and aggregations to build the data model.
- Worked with developer teams on NiFi workflows to pick up data from a REST API server, the data lake and an SFTP server and send it to the Kafka broker.
- Started, stopped and restarted Cloudera Manager servers whenever changes or errors occurred.
- Responsible for commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and Hadoop log files.
- Presented a demo on Microsoft Azure giving an overview of cloud computing with Azure.
- Integrated Kerberos into Hadoop to secure the cluster from unauthorized access; performed performance tuning of the Hadoop cluster.
- Regularly commissioned and decommissioned nodes as disk failures occurred on the MapR File System.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Created a POC to store server log data in Cassandra to identify system alert metrics, and implemented the Cassandra connector for Spark in Java.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Monitored and configured a Test Cluster on Amazon Web Services with EMR, EC2 instances for further testing process and gradual migration.
- Configured ZooKeeper to implement node coordination in clustering support.
- Automated the configuration management for several servers using Chef and Puppet.
- Successfully generated consumer-group lag reports from Kafka using its API (see the sketch after this section).
- Installation and configuration of Hortonworks distribution HDP 2.2.x/2.3.x with Ambari.
- Managing and reviewing Hadoop and HBase log files.
- Worked with NiFi for managing the flow of data from source to HDFS.
- Hadoop security setup using MIT Kerberos, AD integration (LDAP) and Sentry authorization.
- Monitored multiple Hadoop clusters environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari. Installed and configured Hortonworks and Cloudera distributions on single node clusters for POCs.
- Set up automated monitoring for Hadoop cluster using Ganglia, which helped figure out the load distribution, memory usage and provided an indication for more space.
- Implemented FIFO scheduling on the JobTracker to share cluster resources among the users' MapReduce jobs.
- Involved in data audits through Cloudera Manager and engaged with its service and monitoring information.
- Developed a web service using Windows Communication Foundation and .NET to receive and process XML files, and deployed it as a Cloud Service on Microsoft Azure.
Environment: HDFS, HBase, Sqoop, Flume, ZooKeeper, Kerberos, cluster health, RedHat Linux, Impala, Cloudera Manager, Azure, Elasticsearch, Kibana, Hortonworks 2.5, Chef, Puppet, Ambari, NiFi, Kafka, Cassandra, Ganglia, Agile/Scrum.
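A minimal sketch of checking consumer-group lag on the Kerberized Kafka cluster described above; the CLI shown here stands in for equivalent AdminClient API calls, and the principal, keytab, broker host, group name and client properties file are placeholder values.

```bash
# Authenticate against the Kerberized cluster first (placeholder principal/keytab)
kinit -kt /etc/security/keytabs/kafka-client.keytab kafka-client@EXAMPLE.COM

# Describe a consumer group: per-partition current offset, log-end offset and lag
kafka-consumer-groups.sh \
  --bootstrap-server broker1.example.com:9092 \
  --command-config /etc/kafka/client-sasl.properties \
  --describe --group clickstream-consumers
```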
Hadoop/Cloudera Admin
Confidential - San Jose, CA
Responsibilities:
- Worked on installing and configuring of CDH 5.8, 5.9 and 5.10 Hadoop Cluster on AWS using Cloudera Director.
- Involved in the end-to-end process of Hadoop cluster setup, including configuring and monitoring the Hadoop cluster.
- Used Kafka to allow a single cluster to serve as the central data backbone for a large organization.
- Developed and designed a system to collect data from multiple portals using Kafka and then process it using Spark.
- Used cron jobs to back up Hadoop service databases to S3 buckets (see the sketch after this section).
- Experienced in provisioning and managing multi-datacenter Cassandra cluster on public cloud environment Amazon Web Services(AWS) - EC2.
- Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
- Worked with cloud services such as Amazon Web Services (AWS) and involved in ETL, data integration and migration, and installation of Kafka.
- Created the AWS VPC network for the installed instances and configured the Security Groups and Elastic IPs accordingly.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from PiggyBank and other sources.
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Deployed a Hadoop cluster using CDH4 integrated with Nagios and Ganglia.
- Performed installation and configuration of a 90-node Hadoop cluster with the Cloudera CDH4 distribution.
- Implemented MapR token-based security.
- Worked on creation of custom Docker container images, tagging and pushing the images.
- Configured Kafka for efficiently collecting, aggregating and moving large amounts of click stream data from many different sources to MapR.
- Built quick reports/dashboards from internal MSTR data sources like Intelligent Cubes using MicroStrategy Visual HD Insights.
- Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5 using Puppet.
- Efficient in Scala programming on top of the Spark framework.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Monitored workload, job performance and capacity planning using the Cloudera Manager Interface.
- Wrote Lambda functions in Python for AWS Lambda that invoke Python scripts to perform various transformations and analytics on large data sets in EMR clusters.
- Commissioned and decommissioned Hadoop cluster nodes, including load balancing of HDFS block data.
- Good knowledge in adding security to the cluster using Kerberos and Sentry.
- Secured Hadoop clusters and CDH applications for user authentication and authorization using a Kerberos deployment.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Implemented Apache Impala for data processing on top of Hive.
- Worked with Kafka for a proof of concept carrying out log processing on a distributed system. Worked with the NoSQL database HBase to create tables and store data.
- Scheduled jobs using Oozie workflows.
- Installed and configured RHEL6 EC2 instances for Production, QA and Development environment.
- Installed MIT Kerberos for authentication of application and Hadoop service users.
- Completed a proof of concept with Apache NiFi workflows in place of Oozie to automate data-loading tasks.
- Supported technical team in management and review of Hadoop logs.
- Experience in managing the Hadoop MapR infrastructure with MCS.
- Used AWS S3 and local disk as the underlying file system for Hadoop.
- Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
- Created Cluster utilization reports for capacity planning and tuning resource allocation for YARN Jobs.
- Used Cloudera Navigator for data governance: auditing and lineage.
- Managed Kubernetes charts using Helm, created reproducible builds of Kubernetes applications, managed Kubernetes manifest files and managed releases of Helm packages. Monitored performance and tuned configuration of services in the Hadoop cluster.
- Designed and configured the cluster with the required services (HDFS, Hive, HBase, Oozie, ZooKeeper).
- Experience in managing and analyzing Hadoop log files to troubleshoot issues.
- Responsible for architecting Hadoop clusters with Hortonworks distribution platform HDP 1.3.2 and Cloudera CDH4.
Environment: Hadoop, Cloudera, Spark, Hive, HBase, SQL, NiFi, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, Teradata, Splunk, Kubernetes, MapR, Java, Jenkins, Azure, GitHub, MySQL, Hortonworks, NoSQL, MongoDB, Shell Script, Python.
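A minimal sketch of the cron-driven backup of a Hadoop service database to S3 mentioned above; the metastore database name, S3 bucket, paths and schedule are placeholders, and credentials are assumed to come from ~/.my.cnf and the instance's IAM role.

```bash
#!/bin/bash
# backup_metastore.sh - nightly dump of the Hive metastore DB to S3 (placeholder names)
set -euo pipefail

STAMP=$(date +%Y%m%d)
DUMP=/var/backups/hive_metastore_${STAMP}.sql.gz

# MySQL credentials are read from ~/.my.cnf; AWS credentials from the instance role
mysqldump --single-transaction hive_metastore | gzip > "${DUMP}"
aws s3 cp "${DUMP}" "s3://example-hadoop-backups/hive-metastore/${STAMP}.sql.gz"

# Example crontab entry: run at 01:30 every night
# 30 1 * * * /opt/scripts/backup_metastore.sh >> /var/log/backup_metastore.log 2>&1
```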
Hadoop Administrator
Confidential - St. Louis, MO
Responsibilities:
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs.
- Deployed a Hadoop cluster and integrated with Nagios and Ganglia.
- Optimized the Cassandra cluster by making changes in Cassandra properties and Linux (Red Hat) OS configurations.
- Extensively involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
- Created an HDInsight cluster in Azure (a Microsoft-specific tool) as part of the deployment, and performed component unit testing using the Azure Emulator.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
- Involved in loading and transmitting data into HDFS and Hive using Sqoop and Kafka.
- Highly capable in scheduling jobs with the Oozie scheduler.
- Knowledge of bootstrapping, removing and replicating nodes in Cassandra and Solr clusters.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slots configuration.
- Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
- Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
- Monitored multiple clusters environments using Metrics and Nagios.
- Worked on the MapR clusters and fine-tuned them to run Spark jobs efficiently.
- Involved in creating the Azure Services with Azure Virtual Machine.
- Experienced in providing security for Hadoop Cluster with Kerberos.
- Moved data from the MySQL database to HDFS and vice versa using Sqoop.
- Used Ganglia and Nagios to monitor the cluster around the clock.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala; initially done using Python.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Developed Pig UDFs to pre-process data for analysis.
- Worked with business teams and created Hive queries for ad hoc access.
- Responsible for creating Hive tables and partitions, loading data and writing Hive queries.
- Created Pig Latin scripts to sort, group, join and filter the enterprise wise data.
- Configured ZooKeeper to implement node coordination in clustering support.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS (see the sketch after this section).
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Worked on analyzing data with Hive and Pig.
Environment: Hadoop, Hive, AWS, Flume, HDFS, Spark, Kafka, Sqoop, Oozie, Hortonworks Hadoop distribution, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting, Cassandra.
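A minimal sketch of the kind of Flume agent configuration used to move log data into HDFS, along the lines of the bullets above; the agent name, source log path and HDFS landing directory are placeholders.

```bash
# Write a simple Flume agent config (placeholder paths) and start the agent
cat > /etc/flume/conf/weblogs.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Tail the web server access log (placeholder path)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Land events in HDFS, partitioned by day
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /data/raw/weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
EOF

flume-ng agent --conf /etc/flume/conf --conf-file /etc/flume/conf/weblogs.conf --name a1
```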
Hadoop Admin
Confidential - Sunnyvale, CA
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in Installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Hands-on experience in Hadoop administration and support activities, installing and configuring Apache big data tools and Hadoop clusters using Cloudera Manager.
- Capable of handling Hadoop cluster installations in various environments such as Unix, Linux and Windows; able to implement and execute Pig Latin scripts in the Grunt shell.
- Experienced with file manipulation, advanced research to resolve problems, and correcting data integrity for critical Big Data issues in the NoSQL/Hadoop HDFS data stores.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Implemented NameNode backup using NFS. This was done for High availability.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Involved in the installation of CDH3 and the upgrade from CDH3 to CDH4.
- Created Hive external tables, loaded data into them and queried the data using HQL (see the sketch after this section).
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Involved in installing the Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java, SQL, Cloudera Manager, Sqoop, Flume, Oozie, CDH3, MongoDB, Cassandra, HBase, Eclipse, Oracle and Unix/Linux.
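A minimal sketch of creating and querying a Hive external table over data landed in HDFS, along the lines of the bullets above; the table name, columns and HDFS location are placeholder values.

```bash
# Create an external table over ingested data and run a sample HQL query (placeholder schema/paths)
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS web_orders (
  order_id BIGINT,
  customer_id BIGINT,
  amount DOUBLE,
  order_ts STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/raw/web_orders';

SELECT customer_id, SUM(amount) AS total_spend
FROM web_orders
GROUP BY customer_id
ORDER BY total_spend DESC
LIMIT 10;
"
```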
System/Hadoop Admin
Confidential
Responsibilities:
- Installed, configured and maintained single-node and multi-node Hadoop clusters.
- Set up the cluster environment for highly available systems.
- Hadoop cluster configuration & deployment to integrate with systems hardware in the data center.
- Troubleshooting, diagnosing, tuning, and solving Hadoop issues.
- Monitored the Hadoop cluster and executed routine administration procedures.
- Managed Hadoop services such as NameNode, DataNode, JobTracker, TaskTracker, etc.
- Installed Apache Hadoop 2.5.2 and Apache Hadoop 2.3.0 on Linux development servers.
- Installed Pig and Hive on the multi-node cluster.
- Integrated Pig, Hive and Sqoop on Hadoop.
- Performed monthly Linux server maintenance, shutting down essential Hadoop NameNode and DataNode services.
- Secured the Hadoop cluster by implementing Kerberos.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Balanced the Hadoop cluster using the balancer utility to spread data evenly across the cluster (see the sketch after this section).
- Implemented data ingestion techniques such as Pig and Hive in the production environment.
- Commissioning and decommissioning of Hadoop nodes.
- Involved in Cluster Capacity planning along with expansion of the existing environment.
- Performed regular scripted health checks of the system using Hadoop metrics.
- Provided 24x7 support for the Hadoop environment.
Environment: Hadoop 1.x and 2.x, MapReduce, HDFS, Hive, SQL, Cloudera Manager, Pig, Sqoop, Oozie, CDH3 and CDH4.
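A minimal sketch of the kind of routine balancing, health checks and node decommissioning described above; the threshold and exclude-file workflow are illustrative rather than the cluster's actual settings.

```bash
# Rebalance HDFS so no DataNode deviates more than 10% from average utilization
hdfs balancer -threshold 10

# Quick health check: cluster capacity and live/dead DataNodes
hdfs dfsadmin -report | head -40

# Check for missing or corrupt blocks under the root path
hdfs fsck /

# After listing a host in the configured exclude file, trigger decommissioning
hdfs dfsadmin -refreshNodes
```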
Linux/System Admin
Confidential
Responsibilities:
- Worked on system administration, maintenance and monitoring of various day-to-day operations.
- Well trained and worked primarily on RHEL 5.x operating systems.
- Experienced in installing Linux operating systems, applying read, write and execute file permissions, and handling file system issues and disk management.
- Created, managed and modified user accounts, groups and access levels on Linux; worked on package management using RPM and YUM.
- Provided technical support by troubleshooting issues with various Servers on different platforms.
- Notified server owners of failovers or crashes, and also notified Unix/Linux Server Support L3.
- Monitored CPU load, restarted processes and checked file systems.
- Installed, upgraded and applied patches for UNIX, Red Hat Linux and Windows servers in clustered and non-clustered environments.
- Worked on planning and configuring storage using LVM and applying patches on Linux machines (see the sketch after this section).
- Experienced in creating volume groups and logical volumes on Linux.
- Worked on installation and configuration of Samba, DNS and Apache servers.
- Used the tar command for data compression, backup and recovery.
- Experienced in developing Perl and shell scripts to automate processes, such as operational testing scripts for log checks, backup and recovery, and failover.
- Monitored server and application performance, tuned I/O and memory, and installed SSH and configured key-based authentication.
Environment: Linux, Red Hat 5.x, DNS, YUM, RPM, LVM, Perl, Shell, Samba, Apache, Tomcat, WebSphere.
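A minimal sketch of the LVM storage configuration work described above; the device name, volume group, logical volume size and mount point are placeholders.

```bash
# Initialize a new disk for LVM (placeholder device)
pvcreate /dev/sdb

# Create a volume group and a 50G logical volume for application data
vgcreate vg_data /dev/sdb
lvcreate -L 50G -n lv_app vg_data

# Create a filesystem, mount it, and persist the mount in /etc/fstab
mkfs.ext3 /dev/vg_data/lv_app
mkdir -p /srv/app
mount /dev/vg_data/lv_app /srv/app
echo '/dev/vg_data/lv_app /srv/app ext3 defaults 0 0' >> /etc/fstab
```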
TECHNICAL SKILLS:
Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Spark, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.
Hadoop Distribution: Cloudera Distribution of Hadoop (CDH), Chef, Nagios, NiFi.
Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server
Servers: WebLogic, WebSphere and JBoss.
Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.
Tools: Interwoven TeamSite, GMS, Elasticsearch, Kibana, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Jira, Ranger, TestNG, JUnit.
Databases: MySQL, NoSQL, JDBC/ODBC, Couchbase, InfluxDB, Teradata, HBase, MongoDB, Cassandra, Oracle.
Processes: Incident Management, Release Management, Change Management.