Sr. Hadoop Administrator Resume
Irving, TX
SUMMARY
- Around 9 years of overall IT experience, including 5 years of solid Hadoop Administration experience building, operationalizing, and managing small to medium Hadoop clusters on distributions such as CDH 5.x/4.x, HDP 2.x, and EMR
- Expertise building Cloudera and Hortonworks Hadoop clusters on bare metal and on Amazon EC2.
- Experience in using various Hadoop services such as MapReduce, HDFS, Hive, Pig, Zookeeper, HBase, Sqoop, YARN, Spark, Kafka, Oozie, and Flume.
- Experience in capacity planning, validating hardware and software requirements, building and configuring small and medium-sized clusters, smoke testing, and managing and performance tuning Hadoop clusters
- Expertise in cluster upgrades (Rolling and Express) using Ambari and Cloudera Manager
- Expertise in implementing Kerberos Security to Hadoop clusters.
- Experienced in installation and configuration of Knox.
- Successfully implemented Apache Ranger and used Sentry, centralized authorization frameworks for security, auditing, and management of Hadoop clusters
- Enabled HDFS, Hive, and HBase plugins in Ranger and defined policies for core services as part of Ranger policy management
- Expertise in using Talend, Pentaho, and Sqoop for ETL operations
- Expertise in collecting log data from various sources and integrating it into HDFS using Flume.
- Expertise working on use cases involving data ingestion, data analytics, data retrieval, real-time streaming data, and sensor data over large data sets.
- Hands-on experience in shell scripting.
- Extensive knowledge of data warehousing concepts.
- Hands-on experience analyzing log files for Hadoop and ecosystem services to find root causes.
- Experience writing automation scripts and using automation tools such as Puppet and Chef.
- Experience working in Agile/Waterfall methodologies.
- Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Good communication and interpersonal skills, strong team coordination, and well versed in software development processes.
- Diverse background with fast learning and creative analytical skills.
TECHNICAL SKILLS
- Hadoop Ecosystem: HDFS, YARN, Apache Spark, MapReduce, Hive, Impala, Pig, Kafka, NiFi, Phoenix, ZooKeeper
- NoSQL Databases: Cassandra, MongoDB, HBase
- Cluster Management & Monitoring: Cloudera Manager, Ambari, Nagios
- Cloud Platforms: AWS (EC2, S3 & EMR), Rackspace
- Configuration Management: Chef, Puppet
- Search: Solr, Elasticsearch
- Operating Systems: Linux, Unix, Windows
- Databases & Analytics: Oracle, MS SQL Server, Netezza, SAS
- Programming & Scripting: Shell Scripting, Python, Java, Scala
- Version Control: GitHub, SVN
PROFESSIONAL EXPERIENCE
Confidential, Irving, TX
Sr. Hadoop Administrator
Responsibilities:
- Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters on the Hortonworks and Cloudera distributions.
- Responsible for installing, configuring, supporting, and managing Cloudera Hadoop clusters.
- Installed a Kerberos-secured Kafka cluster (without wire encryption) for a POC and set up Kafka ACLs (sample commands follow this list).
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, with Solr and HBase for real-time querying.
- Experienced in administering, installing, upgrading, and managing Hadoop distributions on 200+ node clusters across Development, Test, and Production (Operational & Analytics) environments.
- Troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Extensively worked on Elasticsearch querying and indexing to retrieve documents at high speed.
- Installed, configured, and maintained several Hadoop clusters including HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, ZooKeeper, and NiFi in Kerberized environments.
- Involved in deploying a Hadoop cluster using Ambari on HDP 2.2, integrated with SiteScope for monitoring and alerting.
- Installed the OS and administered the Hadoop stack on the CDH 5.9 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Installed and configured Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Java for data cleaning.
- Experience managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Worked on installing Cloudera Manager and CDH, installing the JCE policy files, creating a Kerberos principal for the Cloudera Manager Server, and enabling Kerberos using the wizard.
- Experience with innovative and, where possible, automated approaches to system administration tasks.
- Experience with Ambari (Hortonworks) for management of the Hadoop ecosystem.
- Used Sqoop to import and export data between HDFS and relational databases.
- Worked on setting up high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
- Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
- Used the Oracle Big Data Appliance for Hadoop and NoSQL processing, and integrated data in Hadoop and NoSQL with data in Oracle Database.
- Worked on setting up the Hadoop ecosystem and a Kafka cluster on AWS EC2 instances.
- Worked with different relational database systems such as Oracle (PL/SQL); used Unix shell scripting and Python, with experience working on AWS EMR instances.
- Designed a workflow to help cloud-transition management decide which queries to run in Google BigQuery, since each query executed in BigQuery incurs a cost.
- Designed and implemented MongoDB Cloud Manager for Google Cloud.
- Migrated data from Oracle and SQL Server databases to MongoDB by reverse engineering.
- Experience in automating code deployment across multiple cloud providers such as Amazon Web Services, Google Cloud, VMware, and OpenStack.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing. Mentored the EQM team in creating Hive queries to test use cases.
- Deployed Kubernetes on both AWS and Google Cloud; set up clusters and replication and deployed multiple containers.
- Configured Sqoop: JDBC drivers for the respective relational databases, parallelism, distributed cache, import process control, compression codecs, imports into Hive and HBase, incremental imports, saved jobs and password handling, free-form query imports, and troubleshooting (a sample import is sketched after this list).
- Collected and aggregated large amounts of streaming data into HDFS using Flume: configured multiple agents, sources, sinks, channels, and interceptors, defined channel selectors to multiplex data into different sinks, and tuned log4j properties.
- Extensively worked on ETL mappings and on the analysis and documentation of OLAP reports.
- Monitoring the health of the cluster and setting up alert scripts for memory usage on the edge nodes.
- Experience in Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX utilities). Worked on NoSQL databases such as HBase and created Hive tables on top.
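For illustration, a minimal sketch of the Kafka ACL setup referenced above, assuming a Kerberized broker with the default ZooKeeper-backed authorizer; the ZooKeeper host, principal, topic, and consumer-group names are placeholders:

```bash
# Grant a hypothetical application principal produce rights on a topic
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --add --allow-principal User:etl-app \
  --producer --topic clickstream

# Grant the same principal consume rights on the topic via a consumer group
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --add --allow-principal User:etl-app \
  --consumer --topic clickstream --group etl-consumers
```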
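Likewise, a hedged sketch of the kind of Sqoop incremental import described above; the JDBC URL, credentials file, table, check column, and target directory are placeholders:

```bash
# Incremental append import from a relational source into HDFS,
# with parallelism and Snappy compression (values are illustrative)
sqoop import \
  --connect jdbc:oracle:thin:@db-host.example.com:1521/ORCL \
  --username etl_user --password-file /user/etl/.db_pass \
  --table ORDERS \
  --target-dir /data/raw/orders \
  --num-mappers 8 \
  --incremental append --check-column ORDER_ID --last-value 0 \
  --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec
```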
Environment: HBase, Hadoop 2.2.4, Hive, Kerberos, Kafka, YARN, Spark, Impala, Solr, Java, Hadoop cluster, HDFS, Ambari, Ganglia, CentOS, Red Hat, Windows, Sqoop, Cassandra.
Confidential, Chattanooga, TN
Hadoop/BigData Administrator
Responsibilities:
- Installation, configuration, and administration of CDH 5.x Hadoop clusters
- Ran benchmark tools to test cluster performance (sample runs are sketched after this list)
- Configured Hadoop properties based on the benchmark results
- Tuned the cluster based on the benchmark results
- Monitored system metrics and logs for any problems
- Worked as an administrator on the Hortonworks (HDP) distribution for 4 clusters ranging from POC to PROD.
- Installed and configured Hadoop ecosystem components such as HBase, Hive, Pig, Sqoop, Spark, and ZooKeeper as required.
- Provided support to users for diagnosing, reproducing, and fixing Hadoop-related issues.
- Worked on CDH environments, took ownership of problem isolation and resolution, and reported bugs whenever a case arose.
- Experienced in installation, configuration, troubleshooting and maintenance of Kafka & Spark clusters.
- Experience in setting up Kafka cluster on AWS EC2 Instances.
- Expertise building Cloudera and Hortonworks Hadoop clusters on bare metal and on Amazon EC2.
- Ensure that critical user issues are addressed quickly and effectively.
- Apply troubleshooting techniques to provide solutions to our user's individual needs.
- Troubleshoot, diagnose and potentially escalate user inquiries during their engineering and operations efforts.
- Investigate product related issues both for individual customers and for common trends that may arise.
- Set up new Hadoop users, provided HDFS maintenance and support, and kept track of Hadoop cluster connectivity and security.
- Worked on setting up Apache NiFi and used NiFi to orchestrate data pipelines.
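As an illustration of the benchmarking mentioned above, typical TestDFSIO and TeraSort runs might look like the following; the jar paths, file counts, and data sizes vary by CDH release and are assumptions here:

```bash
# HDFS I/O throughput: write 10 files of 1000 MB each
hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-jobclient-*-tests.jar \
  TestDFSIO -write -nrFiles 10 -fileSize 1000

# MapReduce/shuffle benchmark: generate ~100 GB of rows, then sort them
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  teragen 1000000000 /benchmarks/teragen
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  terasort /benchmarks/teragen /benchmarks/terasort
```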
Environment: Cloudera Manager, HDFS, YARN, Spark, Kafka, Hive, Pig, Sqoop, Kerberos, Centri, NiFi, Hortonworks HDP, Oracle, Netezza, Tableau, Python, Java 8, Log4j, Git, AWS, S3, EC2, JIRA
Confidential, Dallas, TX
Hadoop/BigData Administrator
Responsibilities:
- Worked on capacity planning, configuration, operationalization, and management of small to medium-sized Big Data Hadoop clusters
- Built 80-node and 20-node HDP Hadoop clusters using Ambari
- Supported the administration of a ~450-node HDP Hadoop cluster using Ambari
- Defined hardware and software prerequisites for small to medium Hadoop clusters
- Expertise building Cloudera and Hortonworks Hadoop clusters on bare metal and on Amazon EC2
- Worked as an administrator on the Hortonworks (HDP) distribution for 4 clusters ranging from POC to PROD
- Performed performance tuning of Hadoop cluster components: HDFS, MapReduce2, YARN, Hive, and HBase
- Performed cluster upgrades (HDP 2.2.4.2 to HDP 2.3, Ambari 2.0 to Ambari 2.2.1.x)
- Performed backup and monitoring of the Postgres (Ambari) and MySQL (Hive, Oozie, Ranger) databases
- Upgraded clusters using both Rolling Upgrade and Express Upgrade
- Enabled services (Kafka, Spark) to suit the real-time data processing requirements for Adworks data sets
- Monitored Hadoop cluster job performance and performed capacity planning
- Configured Hadoop security, including Kerberos setup and RBAC authorization using Ranger
- Installed and configured Knox Gateway to simplify access and enhance security of Hadoop Cluster.
- Worked with the business team to understand and implement various Big Data Hadoop use cases and POCs
- Diagnosed, troubleshot, and fixed Hadoop infrastructure issues
- Configured rack awareness on HDP clusters (a sample topology script is sketched after this list)
- Extensively used DistCp to transfer terabytes of data between secured and unsecured clusters (a sample invocation also follows this list)
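A minimal sketch of the rack-awareness setup mentioned above: a topology script of the kind referenced by net.topology.script.file.name in core-site.xml. The mapping-file path and rack names are placeholders:

```bash
#!/bin/bash
# Hadoop calls this script with one or more host names/IPs and expects
# one rack path per argument on stdout.
MAP=/etc/hadoop/conf/rack-map.txt   # lines of "<host-or-ip> /dc1/rackNN"
DEFAULT_RACK="/default-rack"
for host in "$@"; do
  rack=$(awk -v h="$host" '$1 == h {print $2}' "$MAP")
  echo "${rack:-$DEFAULT_RACK}"
done
```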
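And a hedged example of a DistCp transfer between secured and unsecured clusters; the NameNode hosts and paths are placeholders, and the fallback property lets the Kerberized client talk to the unsecured cluster:

```bash
# Run from the secure (Kerberized) cluster after kinit; copies with 50 mappers
hadoop distcp \
  -D ipc.client.fallback-to-simple-auth-allowed=true \
  -m 50 -update \
  hdfs://insecure-nn.example.com:8020/data/adworks \
  hdfs://secure-nn.example.com:8020/data/adworks
```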
Environment: HDP 2.3, 2.2.4.2 / Ambari 2.1.1.x, 2.0, Bash, Ansible, Python, CentOS 6.x, Nagios, Graphite, HDFS, YARN, Kafka, Spark, Hive, Pig, Sqoop, Oozie, Hortonworks HDP, Knox, Kerberos, Ranger, Cassandra, Java 1.7, Log4j, JUnit, MRUnit, Git, JIRA
Confidential
Hadoop Administrator
Responsibilities:
- Installation, configuration, and management of Hortonworks HDP Hadoop clusters
- Involved in user creation and management on the IBD platform for various GE business units.
- Expertise in validating Hive, HBase, Sqoop, Flume, and Oozie jobs
- Worked on Kerberos security deployment on the Hadoop cluster
- Enabled NameNode high availability in IBD environments.
- Used ETL systems as staging systems for user management, data loading, data analytics, and Hadoop cluster access for end users
- Enforced web user authentication for the NameNode and Oozie web consoles
- Coordinate with operational Hadoop support team.
- Manage and Review Hadoop log files.
- Used GitLab UI with Puppet as the platform to manage the Hadoop users
- Defined and implemented the logging retention period for Hadoop components (a sample retention job is sketched after this list)
- Addressed performance tuning of Hadoop ETL processes against very large data sets and worked directly with statisticians on implementing solutions involving predictive analytics.
- Developed Hadoop monitoring processes (capacity, performance, consistency) to ensure processing issues are identified and resolved swiftly.
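A minimal sketch of the logging-retention implementation mentioned above, assuming a 30-day window and typical HDP log locations (both are assumptions):

```bash
#!/bin/bash
# Cron-driven cleanup of rolled component logs older than the retention window
RETENTION_DAYS=30
for dir in /var/log/hadoop/hdfs /var/log/hadoop-yarn /var/log/hive /var/log/oozie; do
  find "$dir" -type f -name '*.log.*' -mtime +"$RETENTION_DAYS" -delete
done
# Aggregated YARN application logs in HDFS are capped separately via
# yarn.log-aggregation.retain-seconds in yarn-site.xml.
```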
Environment: Ambari, Hadoop, Nagios, Zabbix, Spark, Kafka, Storm, Shark, Hive, Pig, Sqoop, MapReduce, Kerberos, Ranger, Kibana, Talend, Oracle, Teradata, SAS, Tableau, Java 7.0, Log4J, Junit, MRUnit, SVN, JIRA
Confidential
System Admin/DevOps
Responsibilities:
- Responsible for the implementation and ongoing administration of Hadoop infrastructure, including the installation, configuration, and upgrading of the Cloudera and Hortonworks distributions of Hadoop.
- File system, cluster monitoring, and performance tuning of Hadoop ecosystem
- Resolve issues involving MapReduce, YARN, and Sqoop job failures
- Analyze and resolve multi-tenancy job execution issues
- Design and manage backup and disaster recovery solution for Hadoop clusters
- Work on Unix operating systems to efficiently handle system administration tasks related to Hadoop clusters
- Manage the Apache Kafka environment
- Participate in and manage data lake data movements involving Hadoop and NoSQL databases such as HBase, Cassandra, and MongoDB
- Evaluate administration and operational practices and evolve automation procedures (using scripting languages such as Shell, Python, and Ruby, and tools such as Chef and Puppet)
- Work with data delivery teams to set up new Hadoop users, including setting up Linux users, creating Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (a sample onboarding sketch follows this list)
- Enable security/authentication using Kerberos and ACL authorization using Apache Sentry
- Create and document best practices for Hadoop and big data environment
- Participate in new data product or new technology evaluations; manage the certification process and evaluate and implement new initiatives in technology and process improvements
- Interact with Security Engineering to design solutions, tools, testing and validation for controls
- Advance the cloud architecture for data stores
- Work with the engineering team on automation
- Help operationalize Cloud usage for databases and for the Hadoop platform
- Engage vendors for feasibility of new tools, concepts and features, understand their pros and cons and prepare the team for rollout
- Analyze vendor suggestions/recommendations for applicability to environment and design implementation details
- Perform short and long term system/database planning and analysis as well as capacity planning
- Integrate/collaborate with application development and support teams on various IT projects
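As a hedged illustration of the new-user onboarding described above, assuming MIT Kerberos and placeholder principal, group, and quota values:

```bash
# Create the Kerberos principal and export a keytab (realm is a placeholder)
kadmin -q "addprinc -randkey analyst1@EXAMPLE.COM"
kadmin -q "xst -k /etc/security/keytabs/analyst1.keytab analyst1@EXAMPLE.COM"

# Create the HDFS home directory, set ownership, and apply a space quota
sudo -u hdfs hdfs dfs -mkdir /user/analyst1
sudo -u hdfs hdfs dfs -chown analyst1:analysts /user/analyst1
sudo -u hdfs hdfs dfsadmin -setSpaceQuota 1t /user/analyst1

# Smoke-test access as the new user
kinit -kt /etc/security/keytabs/analyst1.keytab analyst1@EXAMPLE.COM
hdfs dfs -touchz /user/analyst1/_access_check
```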
Environment: Cloudera CDH, Hortonworks HDP, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Storm, Kerberos, DataGuise, Tableau.