Hadoop Administrator Resume
St Louis, MO
SUMMARY:
- 8+ years of IT experience in analysis, design, development, implementation and testing of enterprise-wide applications, data warehouses, client-server technologies and web-based applications.
- Over 4 years of experience with Apache Hadoop components such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Nagios, Spark, Impala, Oozie and Flume for Big Data and Big Data analytics.
- Experienced in administrative tasks such as installing Hadoop in pseudo-distributed mode and on multi-node clusters, and installing Apache Ambari on the Hortonworks Data Platform (HDP 2.5).
- Installation, configuration, support and management of Hortonworks Hadoop clusters.
- In-depth understanding of Hadoop architecture and components such as HDFS, NameNode, JobTracker, DataNode, TaskTracker and MapReduce concepts.
- Good experience in installation, configuration and management of clusters in Cloudera (CDH4) & Hortonworks (HDP 2.2) distributions using Cloudera Manager and Ambari 2.2.
- Good understanding of, and hands-on experience with, Hadoop cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
- Have worked extensively on the Pivotal HD (3.0), Hortonworks (HDP 2.3), MapR, EMR and Cloudera (CDH5) distributions.
- Experience using Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, YARN 2.0, Scala, Spark, Kafka, Storm, Impala, Oozie and Flume for data storage and analysis.
- Experience in deployment of Hadoop Cluster using Puppet tool.
- Experience with Oozie Scheduler in setting up workflow jobs with MapReduce and Pig jobs.
- Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra and MongoDB.
- Experience in configuration and management of security for Hadoop cluster using Kerberos.
- Hands on experience on configuring a Hadoop cluster on Amazon Web Services (AWS).
- Experience in troubleshooting errors in HBase Shell/API, Pig, Hive, Sqoop, Flume, Spark and MapReduce.
- Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems/mainframe and vice-versa.
- Collected log data from various sources and integrated it into HDFS using Flume.
- Provisioned and managed multi-tenant Hadoop clusters on a public cloud environment (Amazon Web Services EC2) and on private cloud infrastructure (the OpenStack cloud platform).
- Implemented a Continuous Integration and Continuous Delivery framework using Jenkins, Puppet, Maven and Nexus in a Linux environment. Integrated Maven/Nexus, Jenkins, UrbanCode Deploy with Patterns/Release, Git, Confluence, Jira and Cloud Foundry.
- Strong experience in Linux/UNIX administration, with expertise in Red Hat Enterprise Linux 4, 5 and 6.
TECHNICAL SKILLS:
Big Data Tools: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Spark, Hortonworks, Ambari, Knox, Phoenix, Impala, Storm.
Hadoop Distribution & Cluster Tools: Cloudera Distribution of Hadoop (CDH), Chef, Nagios, NiFi.
Operating Systems: UNIX, Linux, Windows XP, Windows Vista, Windows 2003 Server
Servers: WebLogic, WebSphere and JBoss.
Programming Languages: Java, PL/SQL, Shell Script, Perl, Python.
Tools: Interwoven TeamSite, GMS, Elasticsearch, Kibana, BMC Remedy, Eclipse, Toad, SQL Server Management Studio, Jenkins, GitHub, Jira, Ranger, TestNG, JUnit.
Databases: MySQL, Oracle, Teradata, InfluxDB, Couchbase, HBase, MongoDB, Cassandra (NoSQL); connectivity via JDBC/ODBC.
Processes: Incident Management, Release Management, Change Management.
WORK EXPERIENCE:
Hadoop Administrator
Confidential, St. Louis, MO
Responsibilities:
- Primary tasks and responsibilities center on O&M support of a secure (Kerberized) Cloudera distribution of Hadoop systems.
- Installing and configuring systems for use with the Cloudera distribution of Hadoop (with consideration given to other Hadoop variants such as Apache, MapR, Hortonworks and Pivotal).
- Administering and maintaining Cloudera Hadoop clusters; provisioning, patching and maintaining physical Linux systems. Implemented NiFi configuration in a Kerberized cluster.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Apache Pig, Apache HBase and Apache Sqoop.
- Working on a 450-node Hadoop cluster on Cloudera distribution 7.7.0.
- Installed, configured and tested a DataStax Enterprise Cassandra multi-node cluster with 4 datacenters of 5 nodes each.
- Creating event-processing data pipelines and handling messaging services using Apache Kafka.
- Used AWS S3 and local hard disk as the underlying file system (HDFS) for Hadoop.
- Worked on installing and configuring CDH 5.8, 5.9 and 5.10 Hadoop clusters on AWS using Cloudera Director.
- Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Used Oozie workflows to automate jobs on Amazon EMR.
- Worked with Couchbase support team for sizing the Couchbase cluster.
- Created AWS instances and built a working multi-node cluster in the cloud environment.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Designed and implemented Azure cloud infrastructure using ARM templates.
- Implemented a continuous delivery pipeline with Docker, Jenkins and GitHub.
- Created Hadoop-powered big data solutions and services through Azure HDInsight.
- Created MapR DB tables and involved in loading data into those tables.
- Successfully secured the Kafka cluster with Kerberos (see the broker-config sketch after this list).
- Generated real-time reports using Elasticsearch and Kibana.
- Maintained operations, installation and configuration of 100+ node clusters with the MapR distribution.
- Managed mission-critical Hadoop clusters and Kafka at production scale, particularly on the Cloudera distribution.
- Designed Azure storage for the Kafka topics, then merged and loaded the data into Couchbase with constant-query components.
- Experience with AWS CloudFront, including creating and managing distributions to provide access to S3 buckets or HTTP servers running on EC2 instances.
- Collected data from an AWS S3 bucket in near real time using Spark Streaming and performed the necessary transformations and aggregations to build the data model.
- Worked with developer teams on NiFi workflows to pick up data from a REST API server, the data lake and an SFTP server and send it to the Kafka broker.
- Started, stopped and restarted Cloudera Manager servers whenever there were changes or errors.
- Coordinated with Hortonworks to fix unresolved issues on the platform.
- Responsible for commissioning and decommissioning DataNodes, troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files (see the decommissioning sketch after this list).
- Presented Demo on Microsoft Azure, an overview of cloud computing with Azure.
- Integrated Kerberos into Hadoop to strengthen cluster security against unauthorized access. Performed performance tuning of the Hadoop cluster.
- Regularly maintained and commissioned/decommissioned nodes as disk failures occurred, using the MapR File System.
- Strong exposure to automating maintenance tasks in the Big Data environment through the Cloudera Manager API (see the CM API sketch after this list).
- Responsible for developing data pipelines using HDInsight, Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Created instances in AWS and migrated data from the data center to AWS using Snowball and AWS migration services.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Created a POC to store server log data in Cassandra to identify system alert metrics, and implemented the Cassandra connector for Spark in Java.
- Managed and reviewed Hadoop log files as part of administration, for troubleshooting purposes.
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Monitored and configured a Test Cluster on Amazon Web Services with EMR, EC2 instances for further testing process and gradual migration.
- Configured ZooKeeper to implement node coordination in clustering support.
- Automated the configuration management for several servers using Chef and Puppet.
- Successfully generated consumer group lags from Kafka using its API.
- Worked in an Agile/Scrum environment and used Jenkins and GitHub for continuous integration and deployment. Projects also included application integration with BI-DARTT.
- Managing and reviewing Hadoop and HBase log files.
- Worked with NiFi for managing the flow of data from source to HDFS.
- Set up Hadoop security using MIT Kerberos, AD integration (LDAP) and Sentry authorization.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari.
- Installed and configured Hortonworks and Cloudera distributions on single-node clusters for POCs.
- Implemented FIFO schedulers on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Involved in data audits via Cloudera Manager, engaging with services and monitoring information.
- Developed a web service using Windows Communication Foundation and .NET to receive and process XML files, and deployed it as a Cloud Service on Microsoft Azure.
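A minimal sketch of the broker-side Kerberos configuration referenced above; the hostname, keytab path, realm and file locations are placeholders, not details from the engagement.

    # --- /etc/kafka/kafka_server_jaas.conf (placeholder keytab/principal) ---
    # KafkaServer {
    #   com.sun.security.auth.module.Krb5LoginModule required
    #   useKeyTab=true
    #   storeKey=true
    #   keyTab="/etc/security/keytabs/kafka.service.keytab"
    #   principal="kafka/broker01.example.com@EXAMPLE.COM";
    # };

    # --- additions to /etc/kafka/server.properties ---
    # listeners=SASL_PLAINTEXT://broker01.example.com:9092
    # security.inter.broker.protocol=SASL_PLAINTEXT
    # sasl.mechanism.inter.broker.protocol=GSSAPI
    # sasl.enabled.mechanisms=GSSAPI
    # sasl.kerberos.service.name=kafka

    # Point the broker JVM at the JAAS file before restarting it:
    export KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/kafka_server_jaas.conf"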
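The DataNode decommissioning flow referenced above, as a minimal sketch on a vanilla HDFS setup; the hostname and exclude-file path are illustrative.

    # 1. Add the host to the excludes file referenced by dfs.hosts.exclude:
    echo "datanode042.example.com" >> /etc/hadoop/conf/dfs.exclude

    # 2. Tell the NameNode to re-read its include/exclude lists:
    hdfs dfsadmin -refreshNodes

    # 3. Wait until the node reports "Decommissioned" before stopping it:
    hdfs dfsadmin -report | grep -A 2 datanode042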
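Maintenance automation via the Cloudera Manager REST API, sketched as two illustrative calls; the host, credentials, API version and cluster/service names are assumptions.

    # List the services in a cluster (read-only check):
    curl -s -u admin:changeme \
      "http://cm-host.example.com:7180/api/v19/clusters/Cluster1/services"

    # Restart one service (POST issues an asynchronous CM command):
    curl -s -u admin:changeme -X POST \
      "http://cm-host.example.com:7180/api/v19/clusters/Cluster1/services/hive/commands/restart"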
Environment: Cloudera 7.7, HDFS, HBase, Sqoop, Flume, ZooKeeper, Kerberos, RedHat Linux, Impala, Azure, Elasticsearch, Kibana, Hortonworks 2.5, Chef, Puppet, Ambari, NiFi, Kafka, Cassandra, Ganglia and Agile/Scrum.
Hadoop Admin
Confidential, San Jose, CA
Responsibilities:
- Involved in projects dealing with administration, maintenance and support of database, BI and data warehousing systems, along with day-to-day Hadoop administration and DevOps activities.
- Involved in the end-to-end Hadoop cluster setup process, including configuring and monitoring the cluster.
- Experience working with the Hortonworks Sandbox distribution and its versions HDP 2.4.0 and HDP 2.5.0.
- Installed and configured Hortonworks distribution HDP 2.2.x/2.3.x with Ambari.
- Used Kafka to allow a single cluster to serve as the central data backbone for a large organization.
- Developed and designed a system to collect data from multiple portals using Kafka and then process it using Spark.
- Maintained operations, installation and configuration of 150+ node clusters with the MapR distribution.
- Set up Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Used cron jobs to back up Hadoop service databases to S3 buckets (see the cron sketch after this list).
- Installed and configured Drill, Fuse and Impala on MapR-5.1.
- Created Hive external tables, loaded data into them and queried the data using HQL (see the HQL sketch after this list).
- Experienced in provisioning and managing multi-datacenter Cassandra cluster on public cloud environment Amazon Web Services (AWS) - EC2.
- Spun up EMR clusters with the required EC2 instance types based on job type and data size.
- Involved in upgrading Hadoop Cluster from HDP 2.5 to HDP 2.6.
- Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
- Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration and migration, and the installation of Kafka.
- Created system security supporting a multi-tier software delivery system by utilizing Active Directory and Kerberos.
- Created the AWS VPC network for the installed instances and configured the security groups and Elastic IPs accordingly.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Involved in configuration and installation of Couchbase 2.5.1 NoSQL instances on AWS EC2 instances running RHEL 6/7: creation of buckets and documents, loading documents, backups and recovery, and sizing the Couchbase cluster.
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Deployed a Hadoop cluster using CDH4, integrated with Nagios and Ganglia.
- Performed installation and configuration of a 90-node Hadoop cluster with the Cloudera distribution (CDH4).
- Implemented MapR token-based security.
- Worked on creation of custom Docker container images, tagging and pushing the images.
- Configured Kafka to efficiently collect, aggregate and move large amounts of clickstream data from many different sources to MapR.
- Worked with NiFi for managing the flow of data from source to HDFS.
- Built quick reports/dashboards from internal MSTR data sources like Intelligent Cubes using MicroStrategy Visual HD Insights.
- Installed, configured and optimized Hadoop infrastructure using Cloudera Hadoop distributions CDH5 using Puppet.
- Efficient in Scala programming on top of the Spark framework.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Monitored workload, job performance and capacity planning using the Cloudera Manager Interface.
- Wrote Lambda functions in Python for AWS Lambda that invoke Python scripts to perform various transformations and analytics on large data sets in EMR clusters.
- Created and maintained technical documentation for launching Hadoop clusters and for monitoring Hive, YARN and Impala queries.
- Commissioned and decommissioned Hadoop cluster nodes, including load-balancing of HDFS block data.
- Good knowledge of adding security to the cluster using Kerberos and Sentry.
- Secured Hadoop clusters and CDH applications for user authentication and authorization using a Kerberos deployment.
- Involved in creating a Spark cluster in HDInsight by creating Azure compute resources with Spark installed and configured.
- Implemented Apache Impala for data processing on top of Hive.
- Worked with Kafka on a proof of concept for carrying out log processing on a distributed system. Worked with the NoSQL database HBase to create tables and store data.
- Scheduled jobs using Oozie workflows.
- Performed day-to-day monitoring and maintenance of Couchbase clusters in production.
- Installed and configured RHEL6 EC2 instances for Production, QA and Development environment.
- Installed MIT Kerberos for authentication of application and Hadoop service users.
- Completed a proof of concept on Apache NiFi workflows in place of Oozie to automate data-loading tasks.
- Supported technical team in management and review of Hadoop logs.
- Experience in managing the Hadoop MapR infrastructure with MCS.
- Used AWS S3 and local hard disk as the underlying file system (HDFS) for Hadoop.
- Configured CDH Dynamic Resource Pools to schedule and allocate resources to YARN applications.
- Created Cluster utilization reports for capacity planning and tuning resource allocation for YARN Jobs.
- Used Cloudera Navigator for data governance: audit and lineage.
- Managed Kubernetes charts using Helm: created reproducible builds of Kubernetes applications, managed Kubernetes manifest files and managed releases of Helm packages. Monitored performance and tuned configuration of services in the Hadoop cluster.
- Designed and configured the cluster with the required services (HDFS, Hive, HBase, Oozie and ZooKeeper).
- Followed Agile (Scrum) practices, including sprint planning, daily Scrum meetings and sprint retrospectives, to produce quality deliverables on time.
- Experience managing and analyzing Hadoop log files to troubleshoot issues.
- Responsible for architecting Hadoop clusters with Hortonworks distribution platform HDP 1.3.2 and Cloudera CDH4.
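A minimal cron sketch for the S3 backups mentioned above; the database name, paths and bucket are placeholders.

    #!/bin/bash
    # /usr/local/bin/backup_metastore.sh -- nightly dump of a service DB to S3
    # (database name, paths and bucket are placeholders)
    set -euo pipefail
    f="/backups/hive_metastore_$(date +%F).sql.gz"
    mysqldump --single-transaction hive_metastore | gzip > "$f"
    aws s3 cp "$f" "s3://example-hadoop-backups/metastore/"

    # crontab entry invoking it nightly at 02:00:
    # 0 2 * * * /usr/local/bin/backup_metastore.sh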
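The external-table pattern referenced above, sketched with an illustrative connection string, columns and HDFS path.

    # Create an external table over files already in HDFS, then query it:
    beeline -u jdbc:hive2://hiveserver2.example.com:10000 -e "
      CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        ts      STRING,
        user_id BIGINT,
        url     STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION '/data/raw/web_logs';

      SELECT url, COUNT(*) AS hits FROM web_logs GROUP BY url LIMIT 10;
    "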
Environment: Hadoop, Cloudera, Spark, Hive, HBase, SQL, NiFi, Flume, Kafka, Oozie, Sqoop, Linux, MapReduce, HDFS, Teradata, Splunk, Kubernetes, Agile/Scrum, MapR, Java, Jenkins, Azure, GitHub, MySQL, Hortonworks, NoSQL, MongoDB, Shell Script, Python.
Hadoop Administrator
Confidential, Philadelphia, PA
Responsibilities:
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs.
- Deployed a Hadoop cluster and integrated with Nagios and Ganglia.
- Optimized the Cassandra cluster by making changes in Cassandra properties and Linux (Red Hat) OS configurations.
- Extensively involved in cluster capacity planning, Hardware planning, Installation, Performance tuning of the Hadoop cluster.
- Created an HDInsight cluster in Azure (a Microsoft-specific tool) as part of the deployment, and performed component unit testing using the Azure Emulator.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Extracted files from CouchDB through Sqoop, placed them in HDFS and processed them.
- Loaded and transmitted data into HDFS and Hive using Sqoop and Kafka.
- Highly capable in scheduling jobs with the Oozie scheduler.
- Knowledge of bootstrapping, removing and replicating nodes in Cassandra and Solr clusters.
- Worked on cluster installation, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and Cassandra and slot configuration.
- Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
- Hands on experience in provisioning and managing multi-node Hadoop Clusters on public cloud environment Amazon Web Services (AWS) - EC2 and on private cloud infrastructure.
- Monitored multiple cluster environments using Metrics and Nagios.
- Worked on MapR clusters and fine-tuned them to run Spark jobs efficiently.
- Involved in creating Azure services with Azure Virtual Machines.
- Experienced in providing security for Hadoop Cluster with Kerberos.
- Dumped data from the MySQL database to HDFS and vice versa using Sqoop (see the Sqoop sketch after this list).
- Used Ganglia and Nagios to monitor the cluster around the clock.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala; initial versions were done in Python.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Developed Pig UDFs to pre-process data for analysis.
- Worked with business teams and created Hive queries for ad hoc access.
- Responsible for creating Hive tables, partitions, loading data and writing hive queries.
- Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
- Configured ZooKeeper to implement node coordination in clustering support.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Worked on analyzing data with Hive and Pig.
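The Sqoop import/export pattern referenced above, as a minimal sketch; the connection string, credentials and table names are placeholders.

    # Import a MySQL table into HDFS:
    sqoop import \
      --connect jdbc:mysql://db01.example.com/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/sales/orders \
      --num-mappers 4

    # Export processed results back to MySQL:
    sqoop export \
      --connect jdbc:mysql://db01.example.com/sales \
      --username etl_user -P \
      --table order_summary \
      --export-dir /data/sales/order_summary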
Environment: Hadoop, Hive, AWS, Flume, HDFS, Spark, Kafka, Sqoop, Oozie, Hadoop Distribution of HortonWorks, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting, Cassandra.
Linux/Hadoop Admin
Confidential, New York, NY
Responsibilities:
- Designed and implemented complete end-to-end Hadoop Infrastructure including Pig, Hive, Sqoop, Oozie and Zookeeper.
- Used Sqoop to migrate data between HDFS and MySQL or Oracle, and deployed Hive and HBase integration to perform OLAP operations on HBase data.
- Designed, planned and delivered a proof of concept and business function/division based implementation of a Big Data roadmap and strategy project.
- Designed and configured the cluster with the required services (HDFS, Hive, HBase, Oozie and ZooKeeper).
- Involved in running Hadoop jobs for processing millions of records of text data.
- Created Hive tables per requirements as internal or external tables, defined with appropriate static and dynamic partitions for efficiency (see the HQL sketch after this list).
- Analyzed Cassandra database and compared with other open-source NoSQL databases to determine which one best suited the current requirements.
- Transformed data using Hive and Pig for the BI team to perform visual analytics, according to the client requirements. Managed and maintained the existing Informatica interfaces: Tidal, PowerCenter, ODBC connections, etc.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Responsible for managing data coming from different sources.
- Experience using and setting up Scribe, Flume and Sqoop for data transfer between data centers.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
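The static/dynamic partitioning pattern referenced above, sketched with illustrative table and column names.

    # Load a partitioned table using Hive dynamic partitions:
    beeline -u jdbc:hive2://hiveserver2.example.com:10000 -e "
      SET hive.exec.dynamic.partition=true;
      SET hive.exec.dynamic.partition.mode=nonstrict;

      CREATE TABLE IF NOT EXISTS events_by_day (
        event_id BIGINT,
        payload  STRING
      )
      PARTITIONED BY (event_date STRING);

      -- the partition column must come last in the SELECT list
      INSERT OVERWRITE TABLE events_by_day PARTITION (event_date)
      SELECT event_id, payload, to_date(event_ts) AS event_date
      FROM staging_events;
    "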
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, MySQL, Linux, Java (JDK 1.7), HBase and Big Data.
System/Hadoop Admin
Confidential
Responsibilities:
- Installed, configured and maintained single-node and multi-node Hadoop clusters.
- Set up the cluster environment for highly available systems.
- Hadoop cluster configuration & deployment to integrate with systems hardware in the data center.
- Troubleshooting, diagnosing, tuning, and solving Hadoop issues.
- Monitor a Hadoop cluster and execute routine administration procedures.
- Managed Hadoop services such as NameNode, DataNode, JobTracker and TaskTracker.
- Installed Apache Hadoop 2.5.2 and Apache Hadoop 2.3.0 on Linux dev servers.
- Installed Pig and Hive on the multi-node cluster.
- Integrated Pig, Hive and Sqoop on Hadoop.
- Performed monthly Linux server maintenance, shutting down essential Hadoop NameNode and DataNode services as needed.
- Secured the Hadoop cluster by implementing Kerberos.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Balanced the Hadoop cluster using the balancer utility to spread data equally across the cluster (see the balancer sketch after this list).
- Implemented data ingestion techniques with Pig and Hive in the production environment.
- Commissioning and decommissioning of Hadoop nodes.
- Involved in Cluster Capacity planning along with expansion of the existing environment.
- Performed regular scripted health checks of the system using Hadoop metrics (see the health-check sketch after this list).
- Provided 24x7 support for the Hadoop environment.
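The balancer invocation referenced above, in its simplest form; the threshold (allowed percentage deviation from average DataNode utilization) is illustrative.

    # Rebalance HDFS blocks across DataNodes:
    hdfs balancer -threshold 10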
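A minimal health-check sketch of the kind referenced above; exact report fields vary by Hadoop version, so the grep patterns are assumptions.

    #!/bin/bash
    # Summarize HDFS capacity and dead nodes:
    hdfs dfsadmin -report | grep -E 'DFS Used%|Dead datanodes'

    # Check block-level integrity of the filesystem root:
    hdfs fsck / | grep -E 'Status|Corrupt blocks'

    # Count live YARN NodeManagers:
    yarn node -list -all | grep -c RUNNING || true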
Environment: Hadoop 1.x and 2.x, MapReduce, HDFS, Hive, SQL, Cloudera Manager, Pig, Sqoop, Oozie, CDH3 and CDH4, Apache Hadoop.
Linux/System Admin
Confidential
Responsibilities:
- Performed system administration, maintenance and monitoring of various day-to-day operations.
- Well trained and worked primarily on RHEL 5.x operating systems.
- Experienced in installing Linux operating systems, applying read, write and execute file permissions, and working on file system issues and disk management.
- Created, managed and modified user accounts, groups and access levels on Linux. Worked on package management using RPM and YUM.
- Provided technical support by troubleshooting issues with various servers on different platforms.
- Notified server owners of failovers or crashes, and also notified UNIX/Linux Server Support L3.
- Monitored CPU load, restarted processes and checked file systems.
- Installed, upgraded and applied patches for UNIX, Red Hat Linux and Windows servers in clustered and non-clustered environments.
- Worked on planning and configuring storage using LVM and applying patches on Linux machines.
- Experienced in creating volume groups and logical volumes on Linux (see the LVM sketch after this list).
- Worked on installation and configuration of Samba, DNS and Apache servers.
- Used the tar command for data compression, backup and recovery.
- Experienced in developing Perl and shell scripts to automate processes, such as operational testing scripts for log checks, backup and recovery, and failover.
- Monitored server and application performance, tuned I/O and memory, and installed SSH and configured key-based authentication.
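The volume-group/logical-volume workflow referenced above, as a minimal sketch; device names, sizes and mount points are illustrative.

    # Carve a logical volume out of a fresh disk:
    pvcreate /dev/sdb                   # initialize the physical volume
    vgcreate data_vg /dev/sdb           # create a volume group on it
    lvcreate -L 50G -n app_lv data_vg   # create a 50 GB logical volume
    mkfs.ext4 /dev/data_vg/app_lv       # build a filesystem
    mkdir -p /apps && mount /dev/data_vg/app_lv /apps

    # Grow it later without unmounting (ext4 supports online grow):
    lvextend -L +20G /dev/data_vg/app_lv
    resize2fs /dev/data_vg/app_lv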
Environment: Linux, Red Hat 5.x, DNS, YUM, RPM, LVM, Perl, Shell, Samba, Apache, Tomcat, WebSphere.
