
Hadoop (cloudera) Admin Resume


St. Louis, MO

SUMMARY:

  • 8 years of experience with proven expertise in system development activities including requirement analysis, design, implementation, and support, with emphasis on Hadoop (HDFS, MapReduce, Pig, Hive, HBase, Oozie, Flume, Sqoop, Solr, Storm, ActiveMQ, Kafka, and Zookeeper) technologies, object-oriented programming, and SQL.
  • Strong exposure to Big Data architecture; effectively managed and monitored Hadoop ecosystems.
  • Built, deployed, and managed large-scale Hadoop-based data infrastructure.
  • Capacity planning and architecture setup for Big Data applications.
  • Strong exposure to automation of maintenance tasks in Big Data environments through the Cloudera Manager API (see the sketch after this list).
  • Good knowledge of Oracle 9i, 10g, and 11g databases; excellent at writing SQL queries and scripts.
  • Set up and configured AWS EMR clusters and used Amazon IAM to grant fine-grained access to AWS resources.
  • Experience building S3 buckets, managing S3 bucket policies, and using S3 and Glacier for storage and backup on AWS.
  • Ability to lead a team of developers and coordinate smooth delivery of the project.
  • Cloudera Certified Administrator for Apache Hadoop (CCAH).
  • Experience in troubleshooting errors in HBase Shell/API, Pig, Hive and MapReduce job failures.
  • Extensive hands on experience in writing complex MapReduce jobs, Pig Scripts and Hive data modeling.
  • Expertise in troubleshooting complex system issues such as high load, memory, and CPU usage, and providing solutions based on the root cause.
  • Configured resource management in Hadoop through dynamic resource allocation.
  • Maintained and managed a 300+ node Hadoop environment with 24x7 on-call support.
  • Experienced in installing, configuring, and administering Hadoop clusters of major distributions.
  • Excellent experience with job schedulers such as Control-M and Tidal.
  • Hands on experience on ActiveMQ, SQS and Kafka messaging queues.
  • Very good knowledge of Ansible using YAML playbooks.
  • Working experience with Hortonworks (HDP) and Cloudera distributions.
  • Experience building operations dashboards from the HDFS FsImage to project existing and forecasted data growth.
  • Built various automation plans from an operations standpoint.
  • Worked with the Tableau team to build dashboards over Hive data.
  • Good working experience with Hadoop architecture, HDFS, MapReduce, and other components in the Cloudera Hadoop ecosystem.
  • Experience importing and exporting data between database tables (MySQL, Oracle) and HDFS using Sqoop.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components such as HDFS, MapReduce, YARN, Zookeeper, Sentry, Sqoop, Flume, Hive, HBase, Pig, and Oozie.
  • Hands-on experience configuring a Hadoop cluster in a professional environment and on Amazon Web Services (AWS) using EC2 instances.
  • Good experience writing complex SQL queries against databases such as DB2, Oracle 10g, MySQL, and MS SQL Server 2005/2008.
  • Extensive Experience in developing test cases, performing Unit Testing and Integration Testing using source code management tools such as GIT, SVN and Perforce.
  • Strong team player with the ability to work independently as well as in a team, adapt to a rapidly changing environment, and stay committed to learning.
  • Ability to blend technical expertise with strong conceptual, business, and analytical skills to provide quality solutions, results-oriented problem solving, and leadership.
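
A minimal sketch of the kind of maintenance automation described above, driven from a shell script against the Cloudera Manager REST API. The CM host, credentials, and the cluster/service names are placeholders, and the exact API version prefix depends on the CM release:

```bash
#!/usr/bin/env bash
# Hedged sketch: automate a maintenance window via the Cloudera Manager REST API.
# cm.example.com, the admin credentials, and "ProdCluster"/"hdfs" are placeholders.
set -euo pipefail

CM="https://cm.example.com:7183"
AUTH="admin:admin-password"   # use a dedicated CM account; -k skips TLS verification for this sketch

# Discover the highest API version this CM server supports (e.g. "v19").
V=$(curl -sk -u "$AUTH" "$CM/api/version")

# Put the cluster into maintenance mode before patching, so alerts are suppressed.
curl -sk -u "$AUTH" -X POST "$CM/api/$V/clusters/ProdCluster/commands/enterMaintenanceMode"

# ... apply OS patches, rolling restarts, etc. ...

# Restart one service, then leave maintenance mode.
curl -sk -u "$AUTH" -X POST "$CM/api/$V/clusters/ProdCluster/services/hdfs/commands/restart"
curl -sk -u "$AUTH" -X POST "$CM/api/$V/clusters/ProdCluster/commands/exitMaintenanceMode"
```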

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Spark, Pig, Hive, HBase, Sqoop, ZooKeeper, Sentry, Ranger, Storm, Kafka, Oozie, Flume, Apache Flink, Docker, Hue, Knox, NiFi

Big Data Security: Kerberos, AD, LDAP, KTS, KMS, Redaction, Sentry, Ranger, Navigator Encrypt, SSL/TLS, Cloudera Navigator, Hortonworks

NoSQL Databases: HBase, Cassandra, MongoDB

Programming Languages: Java, Scala, Python, SQL, PL/SQL, Hive-QL, Pig Latin

Frameworks: MVC, Struts, Spring, Hibernate

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Version control: SVN, CVS, GIT

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

BI and ETL Tools: Talend, Informatica, Tableau

Databases: Oracle, DB2, SQL Server, MySQL, Teradata

Tools: Eclipse, IntelliJ, NetBeans, Toad, Maven, Jenkins, ANT, SBT

Cloud Technologies: Amazon Web Services (Amazon Redshift, S3), Microsoft Azure HDInsight

Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux, Windows XP/Vista/7/8/10

Other Tools: GitHub, Maven, Puppet, Chef, Clarify, JIRA, Quality Center, Rational Suite of Products, MS Test Manager, TFS, Jenkins, Confluence, Splunk, NewRelic

WORK EXPERIENCE:

Hadoop (Cloudera) Admin

Confidential, St. Louis, MO

Responsibilities:

  • Hadoop installation and configuration of multiple nodes using the Cloudera platform.
  • Worked on setting up a new CDH Hadoop cluster for POC purposes and installed third-party tools.
  • Strong exposure to configuration management tools like Ansible for configuration deployment.
  • Strong exposure to automation of maintenance tasks in the Big Data environment through the Cloudera Manager API.
  • Exposure to cloud-based Hadoop deployment using AWS; built Hadoop clusters using EC2 and EMR.
  • Benchmarked AWS i3 and i4 instance families for an assessment of our Cassandra clusters.
  • Worked on an Apache Cassandra upgrade from 3.0.10 to 3.0.14.
  • Added and decommissioned nodes for cluster scaling.
  • Knowledge of Splunk architecture and its components (indexer, forwarder, search head, deployment server), including Universal and Heavy Forwarders.
  • Created dashboards for various types of business users in the organization and worked on creating Splunk knowledge objects such as macros, IFX, calculated fields, tags, event types, and lookups.
  • Created Talend Spark jobs that collect data from relational databases and load it into HBase.
  • Created sample flows in Talend and StreamSets with custom-coded JARs and analyzed the performance of StreamSets and Kafka Streams.
  • Migrated the dev cluster from on-premises to AWS EMR and benchmarked its performance in EMR.
  • Developed Spark and Scala transformation jobs to process data based on downstream requirements.
  • Developed WTX transformation logic in Spark, saving licensing costs for the client.
  • Configured MySQL replication for high availability and used it as the external database for CDH services.
  • Used Talend to create workflows for processing data from multiple source systems.
  • Installed and configured tools such as SAS Viya and Securonix in the Hadoop environment.
  • Configured high availability for Hadoop services and set up load balancers for Big Data services.
  • Worked on installing and configuring CDH 5.8, 5.9, and 5.10 Hadoop clusters on AWS using Cloudera Director.
  • Worked on installing Universal Forwarders and Heavy Forwarders to bring any kind of data fields into Splunk.
  • Planning, Installing and Configuring Hadoop Cluster in Cloudera Distributions.
  • Administration, installing, upgrading and managing distributions of Hadoop (CDH5, Cloudera manager), HBase. Managing, monitoring and troubleshooting Hadoop Cluster using Cloudera Manager.
  • Performed installation and configuration of Hadoop Cluster of 90 Nodes with Cloudera distribution with CDH4.
  • Responsible for maintaining 24x7 production CDH Hadoop clusters running Spark, HBase, Hive, and MapReduce, with over 300 nodes and multiple petabytes of data storage.
  • Commissioned and decommissioned nodes on the CDH5 Hadoop cluster on Red Hat Linux.
  • Changed configurations based on user requirements to improve job performance.
  • Installed and configured an application performance management tool (Unravel) and integrated it with the CDH Hadoop cluster.
  • Set up automation scripts to spin up and add new virtual edge nodes to the Hadoop cluster for customers.
  • Deployed a Hadoop cluster using CDH4 integrated with Nagios and Ganglia.
  • Management of CDH cluster with LDAP and Kerberos integrated.
  • Automated scripts to onboard new users to Hadoop applications and set up Sentry authorization (see the sketch after this list).
  • Expertise in troubleshooting complex Hadoop job failures and providing solutions.
  • Worked with application teams to install Hadoop updates, patches, version upgrades as required.
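
A hedged sketch of the user-onboarding automation mentioned above: an HDFS home directory plus a Sentry grant issued through beeline. The username, group, database, and HiveServer2 URL are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: onboard a new user — create an HDFS home directory and grant read access
# through Sentry. NEWUSER, the "analysts" group, the "sales" database, and the
# HiveServer2 JDBC URL are hypothetical values for illustration only.
set -euo pipefail
NEWUSER="$1"

# Create the home directory as the HDFS superuser.
sudo -u hdfs hdfs dfs -mkdir -p "/user/$NEWUSER"
sudo -u hdfs hdfs dfs -chown "$NEWUSER:$NEWUSER" "/user/$NEWUSER"

# Sentry grants are plain SQL statements executed against HiveServer2.
# (CREATE ROLE is a one-time step; rerunning it on an existing role fails.)
URL="jdbc:hive2://hiveserver.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM"
beeline -u "$URL" \
  -e "CREATE ROLE analyst_role" \
  -e "GRANT ROLE analyst_role TO GROUP analysts" \
  -e "GRANT SELECT ON DATABASE sales TO ROLE analyst_role"
```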

Environment: Hadoop, CDH, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, Zookeeper, AWS, Control-M, HBase, Shell Scripting.

Hadoop (AWS) Administrator

Confidential, St. Louis, MO

Responsibilities:

  • Managed critical data pipelines that power analytics for various business units.
  • Responsible for installing, configuring, supporting, and managing Hadoop clusters.
  • Worked on performance tuning of Hive SQL queries.
  • Created external tables with proper partitions for efficiency and loaded structured data resulting from MR jobs into HDFS.
  • Managed and reviewed Hadoop Log files as a part of administration for troubleshooting purposes.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Maintained Hortonworks cluster with HDP Stack 2.4.2 managed by Ambari 2.2.
  • Built a Production and QA Cluster with the latest distribution of Hortonworks - HDP stack 2.6.1 managed by Ambari 2.5.1 on AWS Cloud
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked on a Kerberized Hadoop cluster with 250 nodes.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (see the sketch after this list).
  • Created Hive tables and loaded data from the local file system into HDFS.
  • Experience working on Spark and Scala.
  • Developed Spark SQL jobs that read data from the data lake, transform it using Hive, and save it in HBase.
  • Created user accounts and granted users access to the Hadoop cluster.
  • Performed HDFS cluster support and maintenance tasks such as adding and removing nodes without affecting running nodes or data.
  • Monitored and controlled local file system disk space usage and log files, cleaning log files with automated scripts.
  • Loaded data from different sources (databases and files) into Hive using Talend.
  • Pushed data as delimited files into HDFS using Talend Big Data Studio.
  • Loaded and transformed data into HDFS from large sets of structured data in Oracle/SQL Server using Talend Big Data Studio.
  • As a Hadoop admin, monitored cluster health status on a daily basis, tuned system performance-related configuration parameters, and backed up configuration XML files.
  • Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured they were able to read data from HDFS without any issues.
  • Involved in moving all log files generated from various sources to HDFS for further processing.
  • Involved in collecting metrics for Hadoop clusters using Ganglia.
  • Supported Data Analysts in running MapReduce Programs.
  • Developed Hive queries to process the data and generate the data cubes for visualizing.
  • Responsible for deploying patches and remediating vulnerabilities.
  • Experience in setting up Test, QA, and Prod environment.
  • Involved in loading data from UNIX file system to HDFS.
  • Led root cause analysis (RCA) efforts for high-severity incidents.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Worked hands-on with the ETL process; handled importing data from various data sources and performed transformations.
  • Coordinated with on-call support when human intervention was required for problem solving.
  • Ensured analytics data was available on time for customers, providing them insight and helping them make key business decisions.
  • Aimed at providing a delightful data experience to our customers, the different business groups across the organization.
  • Worked on alerting mechanisms to support production clusters and workflows effectively and to keep daily jobs running within SLA.
  • Involved in providing operational support to the platform and following best practices to optimize the performance of the environment.
  • Involved in release management process to deploy the code to production.
  • Worked with various onshore and offshore teams to understand the data imported from their sources.
  • Provided updates in daily Scrum, planned work at the start of each sprint, and tracked planned tasks using JIRA; synced with the team to pick priority tasks and updated necessary documentation in the wiki.
  • Held weekly meetings with business partners and actively participated in review sessions with other developers and the manager.
  • Documented the procedures performed for project development.
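
A minimal sketch of the Sqoop import flow referenced above; the MySQL host, database, table, credentials, and HDFS paths are placeholders:

```bash
#!/usr/bin/env bash
# Sketch: land a MySQL table into HDFS with Sqoop. Host, database, table, and
# paths are placeholders; --password-file points at a restricted HDFS file.
set -euo pipefail

sqoop import \
  --connect jdbc:mysql://mysql.example.com:3306/ordersdb \
  --username etl_user \
  --password-file /user/etl/.mysql.password \
  --table orders \
  --target-dir /data/raw/orders \
  --fields-terminated-by '\t' \
  --num-mappers 4

# The same source could be landed straight into a Hive table instead:
#   sqoop import ... --hive-import --hive-table staging.orders
```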

Environment: Hadoop, Hive, Pig, Tableau, Netezza, Oracle, HDFS, MapReduce, Yarn, Sqoop, Oozie, Zookeeper, Tidal, Checkmk, Grafana, Vertica

Hadoop Admin

Confidential, Hoffman Estates, IL

Responsibilities:

  • Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Pig, Zookeeper, and Sqoop.
  • Wrote Pig scripts to load and aggregate the data.
  • Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, the HBase database, and Sqoop.
  • Performed Splunk administration tasks such as installing, configuring, monitoring, and tuning.
  • Extensively involved in Installation and configuration of Cloudera distribution Hadoop NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
  • Worked on installing the cluster and adding and removing DataNodes.
  • Responsible for operational support of the production system.
  • Administer and configure Splunk components like Indexer, Search Head, Heavy forwarder etc.; deploy Splunk across the UNIX and Windows environment; Optimized Splunk for peak performance by splitting Splunk indexing and search activities across different machines.
  • Setup Splunk forwarders for new application tiers introduced into an existing application.
  • Worked with Splunk authentication and permissions; significant experience supporting large-scale Splunk deployments.
  • Onboarded new data into Splunk; troubleshot Splunk and optimized performance.
  • Actively involved in standardizing Splunk Forwarder deployment, configuration, and maintenance across various Operating Systems.
  • Analyzed source data to assess data quality using Talend Data Quality.
  • Used Talend's debug mode to debug jobs and fix errors.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Installed and configured Hive.
  • Loaded data into Spark RDDs and DataFrames, joined segmented RDDs to produce logical data, applied dedupe logic, stored the final output in HDFS, and exposed it using a Hive external table.
  • Environment for the above: MapR, Hadoop, HDFS, Sqoop, HBase, Hive, SQL, Oracle, Talend, TAC, bash shell, Spark, Scala.
  • Configured various property files such as core-site.xml, hdfs-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements.
  • Troubleshot MapReduce jobs.
  • Designed and built a non-vnode Cassandra ring for a service assurance application, on VMs for non-prod and on physical machines for the production ring.
  • Administered and maintained the Kafka cluster as part of the Cassandra integration.
  • Built bash scripts for proactive monitoring of the Cassandra cluster by exporting Cassandra MBeans to the monitoring tool Cacti, and set up alerts for thread pools, read/write latencies, and compaction statistics.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions (see the sketch after this list).
  • Loaded log data directly into HDFS using Flume.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Balanced the cluster after adding/removing nodes or after major data cleanups.
  • Created and modified scripts (mainly bash) to accommodate the administration of daily duties.
  • Generated datasets and loaded them into the Hadoop ecosystem.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Provided cluster coordination services through ZooKeeper.
  • Used Hive and Pig to analyze data from HDFS
  • Used Sqoop to import the data into SQL Database.
  • Used Java to develop User Defined Functions (UDF) for Pig Scripts.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Zookeeper, Oozie

Hadoop Admin

Confidential, San Francisco, CA

Responsibilities:

  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using CDH4.
  • Good understanding of and experience with Hadoop stack internals, Hive, Pig, and MapReduce; involved in defining job flows.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop streaming jobs to process terabytes of text data (see the sketch after this list).
  • Loaded large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Implemented HBase Co-processors to notify Support team when inserting data into HBase Tables.
  • Solved the small-file problem using SequenceFile processing in MapReduce.
  • Monitored system health and logs and responded to any warning or failure conditions.
  • Performed cluster coordination through Zookeeper.
  • Involved in supporting and monitoring production Linux systems.
  • Expertise in archiving logs and monitoring jobs.
  • Monitored daily Linux jobs and the log management system.
  • Expertise in troubleshooting and able to work with a team to fix large production issues.
  • Expertise in creating and managing DB tables, indexes, and views.
  • Created and managed user accounts and permissions at both the Linux and DB levels.
  • Extracted large data sets from different sources with different data-source formats, including relational databases, XML, and flat files, using ETL processing.
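
A minimal sketch of a Hadoop streaming job of the kind referenced above, using shell utilities as mapper and reducer; the streaming JAR path, the HDFS paths, and the assumption of tab-separated input are illustrative:

```bash
#!/usr/bin/env bash
# Sketch: count occurrences of the first field of tab-separated text with Hadoop
# streaming. The JAR location varies by distribution (CDH ships it in its parcels),
# and the HDFS paths are placeholders.
set -euo pipefail

STREAMING_JAR=/usr/lib/hadoop-mapreduce/hadoop-streaming.jar

hadoop jar "$STREAMING_JAR" \
  -input  /data/raw/weblogs \
  -output /data/processed/weblog_counts \
  -mapper  "cut -f1" \
  -reducer "uniq -c"
# The shuffle sorts map output keys, so "uniq -c" in the reducer yields a count per key.
```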

Environment: Cloudera Distribution CDH 4.1, Apache Hadoop, Hive, MapReduce, HDFS, PIG, ETL, HBase, Zookeeper

Hadoop(Linux) Administrator

Confidential

Responsibilities:

  • Worked as Administrator for Monsanto's Hadoop Cluster (120 nodes).
  • Performed Requirement Analysis, Planning, Architecture Design and Installation of the Hadoop cluster.
  • Updated kernel and security patches in the Amazon Linux environment; handled out-of-memory issues in the Linux kernel during Kafka cluster rebalances.
  • Suggested and implemented best practices to optimize performance and user experience.
  • Implemented cluster security using Kerberos and HDFS ACLs (see the sketch after this list).
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Monitoring Hadoop cluster using tools like Nagios, Ganglia and Cloudera Manager.
  • Managed and reviewed Hadoop Log Files.
  • Setup data authorization roles for Hive and Impala using Apache Sentry.
  • Improved the Hive Query performance through Distributed Cache Management and converting tables to ORC format.
  • Configured TLS/SSL based data transport encryption.
  • Monitored job performance and performed analysis.
  • Monitored, troubleshot, and reviewed Hadoop log files.
  • Added and decommissioned nodes from the cluster.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts, and visualized the streaming data in Tableau dashboards.
  • Managed and reviewed Hadoop log files, handled file system management, and monitored Hadoop cluster capacity planning.
  • Hands-on experience working with ecosystem components such as Hive, Pig scripts, Sqoop, MapReduce, YARN, and ZooKeeper; strong knowledge of Hive's analytical functions.
  • Wrote Flume configuration files to store streaming data in HDFS.
  • As an admin, involved in cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
  • Performed analytics using MapReduce, Hive, and Pig on HDFS data, sent the results back to MongoDB databases, and updated information in collections.
  • Used a RESTful web services API to connect to MapR tables; database connections were developed through the RESTful web services API.
  • Worked as a Hadoop admin responsible for everything related to clusters totaling 100 nodes, ranging from POC to PROD.
  • Involved in loading data from the UNIX file system to HDFS, and created custom Solr query components to enable optimal search matching.
  • Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
  • Wrote MapReduce code to process and parse data from various sources and store the parsed data into HBase and Hive using HBase-Hive integration; worked on YUM configuration and package installation through YUM.
  • Installed and configured the CDH Hadoop environment; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
  • Installed Kafka on the Hadoop cluster and configured producers and consumers in Java to stream data from a Twitter source to HDFS, filtered by popular hashtags.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive; wrote Pig Latin scripts for the analysis of semi-structured data.
  • Involved in Requirement analysis, Design, Development, Data Design and Mapping, extraction, validation and creating complex business requirements.
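
A hedged sketch of combining Kerberos authentication with HDFS ACLs as described above; the keytab path, principal, directory, and group names are illustrative, and ACLs require dfs.namenode.acls.enabled=true in hdfs-site.xml:

```bash
#!/usr/bin/env bash
# Sketch: lock down an HDFS data set with Kerberos auth plus HDFS ACLs.
# Keytab path, principal, directory, and group names are illustrative only.
set -euo pipefail

# Authenticate as the HDFS superuser on a Kerberized cluster.
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM

# Base permissions: owning team only.
hdfs dfs -chown -R etl:etl /data/secure/finance
hdfs dfs -chmod -R 750     /data/secure/finance

# Grant an additional analyst group read/execute via an ACL entry, plus a
# default ACL so new sub-directories inherit it.
hdfs dfs -setfacl -R -m group:finance_analysts:r-x         /data/secure/finance
hdfs dfs -setfacl -R -m default:group:finance_analysts:r-x /data/secure/finance

# Verify the resulting ACLs.
hdfs dfs -getfacl /data/secure/finance
```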

Environment: CDH 4.7, Hadoop 2.0.0, HDFS, MapReduce, MongoDB 2.6, Hive 0.10, Sqoop 1.4.3, Oozie 3.3.4, Zookeeper 3.4.5, Hue 2.5.0, Jira, WebLogic 8.1, Kafka, Yarn, Impala, Chef, RHEL, Pig, scripting, MySQL, Red Hat Linux, CentOS, and other UNIX utilities.

Linux Administrator

Confidential

Responsibilities:

  • Worked on a daily basis on user access and permissions, installations, and maintenance of Linux servers.
  • Installed and upgraded packages and patches on Red Hat servers using YUM, RPM, and third-party application software.
  • Configured, troubleshot, and managed TCP/IP networking on systems.
  • Experience creating, cloning, and deleting virtual machines in VMware.
  • Installed, configured, and maintained network services including NFS, FTP, HTTPD, Tomcat, and SSH.
  • Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers; performed remote installation of Linux using PXE boot.
  • Monitored System activity, Performance and Resource utilization.
  • Maintained Raid-Groups and LUN Assignments as per agreed design documents.
  • Performed all system administration tasks such as cron jobs, package installation, and patching.
  • Used LVM extensively; created volume groups and logical volumes (see the sketch after this list).
  • Performed RPM and YUM package installations, patching, and other server management.
  • Configured Linux guests in a VMware ESX environment.
  • Built, implemented and maintained system-level software packages such as OS, Clustering, disk, file management, backup, web applications, DNS.
  • Performed scheduled backup and necessary restoration.
  • Configured Domain Name System (DNS) for hostname to IP resolution.
  • Troubleshot and fixed issues at the user, system, and network levels using various tools and utilities; scheduled backup jobs by implementing cron schedules during non-business hours.
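
A minimal LVM sketch for the volume group and logical volume work mentioned above; the device name, sizes, and mount point are illustrative:

```bash
#!/usr/bin/env bash
# Sketch: carve out application storage with LVM. The device name, sizes, and
# mount point are illustrative for this example.
set -euo pipefail

pvcreate /dev/sdb                       # initialize the new disk as a physical volume
vgcreate vg_app /dev/sdb                # create a volume group on it
lvcreate -n lv_data -L 50G vg_app       # 50 GB logical volume for application data

mkfs.ext4 /dev/vg_app/lv_data           # build a filesystem and mount it
mkdir -p /app/data
mount /dev/vg_app/lv_data /app/data
echo '/dev/vg_app/lv_data /app/data ext4 defaults 0 2' >> /etc/fstab

lvextend -r -L +20G /dev/vg_app/lv_data # later: grow the LV and resize the fs in one step
```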

Environment: Hadoop, HDFS, Yarn, Pig, Hive, Sqoop, Oozie, Control-M, HBase, Shell Scripting, AWS, Ubuntu, Linux Red Hat.
