Sr. Hadoop Administrator Resume
Irving, TX
SUMMARY
- Around 9 years of overall IT experience, including 5 years of solid Hadoop Administration experience building, operationalizing, and managing small to medium Hadoop clusters on distributions such as CDH 5.x/4.x, HDP 2.x, and EMR
- Expertise building Cloudera and Hortonworks Hadoop clusters on bare metal and on Amazon EC2.
- Experience in using various Hadoop services such as MapReduce, HDFS, Hive, Pig, Zookeeper, HBase, Sqoop, YARN, Spark, Kafka, Oozie, and Flume.
- Experience in capacity planning, validating hardware and software requirements, building and configuring small and medium-sized clusters, smoke testing, and managing and performance tuning Hadoop clusters
- Expertise in cluster upgrades (Rolling and Express) using Ambari and Cloudera Manager
- Expertise in implementing Kerberos Security to Hadoop clusters.
- Experienced in installation and configuration of Knox.
- Successfully implemented Apache Ranger and used Sentry, centralized authorization frameworks for security, auditing, and management of Hadoop clusters
- Enabled HDFS, Hive, and HBase plugins in Ranger and defined policies for core services as part of Ranger policy management
- Expertise in using Talend, Pentaho, and Sqoop for ETL operations
- Expertise in collecting log data from various sources and integrating it into HDFS using Flume.
- Expertise working on use cases involving data ingestion, data analytics, data retrieval, real-time streaming data, and sensor data over large data sets.
- Hands-on experience in shell scripting.
- Extensive knowledge of data warehousing concepts.
- Hands-on experience analyzing log files for Hadoop and ecosystem services to find root causes.
- Experience writing automation scripts and using automation tools such as Puppet and Chef.
- Experience working in Agile/Waterfall methodologies.
- Experience in monitoring and troubleshooting issues with Linux memory, CPU, OS, storage and network.
- Good communication and interpersonal skills, strong team coordination, and well versed in software development processes.
- Diverse background with fast learning and creative analytical skills.
TECHNICAL SKILLS
- Hadoop Ecosystem: HDFS, YARN, Apache Spark, MapReduce, Hive, Impala, Pig, Kafka, NiFi, Phoenix, ZooKeeper
- NoSQL Databases: Cassandra, MongoDB, HBase
- Cluster Management & Monitoring: Cloudera Manager, Ambari, Nagios
- Cloud Platforms: AWS (EC2, S3 & EMR), Rackspace
- Configuration Management: Chef, Puppet
- Search: Solr, Elasticsearch
- Operating Systems: Linux, Unix, Windows
- Databases & Analytics: Oracle, MS SQL Server, Netezza, SAS
- Programming & Scripting: Shell Scripting, Python, Java, Scala
- Version Control: GitHub, SVN
PROFESSIONAL EXPERIENCE
Confidential, Irving, TX
Sr. Hadoop Administrator
Responsibilities:
- Experience in architecting, designing, installing, configuring, and managing Apache Hadoop clusters on the Hortonworks and Cloudera distributions.
- Responsible for installing, configuring, supporting, and managing Cloudera Hadoop clusters.
- Installed a Kerberos-secured Kafka cluster (without wire encryption) for a POC and set up Kafka ACLs (sample commands follow this list).
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, with Solr and HBase for real-time querying.
- Experienced in administering, installing, upgrading, and managing Hadoop distributions on 200+ node clusters across Development, Test, and Production (Operational & Analytics) environments.
- Troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Extensively worked on Elasticsearch querying and indexing to retrieve documents at high speed.
- Installed, configured, and maintained several Hadoop clusters including HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, ZooKeeper, and NiFi in Kerberized environments.
- Involved in deploying a Hadoop cluster using Ambari on HDP 2.2, integrated with SiteScope for monitoring and alerting.
- Installed the OS and administered the Hadoop stack on the CDH 5.9 (with YARN) Cloudera distribution, including configuration management, monitoring, debugging, and performance tuning.
- Installed and configured Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Java for data cleaning.
- Experience managing Hadoop clusters with IBM BigInsights and the Hortonworks Data Platform.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Worked on installing Cloudera Manager and CDH, installing the JCE policy files, creating a Kerberos principal for the Cloudera Manager Server, and enabling Kerberos using the wizard.
- Experience with innovative and, where possible, automated approaches to system administration tasks.
- Experience with Ambari (Hortonworks) for management of the Hadoop ecosystem.
- Used Sqoop to import and export data between HDFS and relational databases.
- Worked on setting up high availability for a major production cluster and designed automatic failover using ZooKeeper and quorum journal nodes.
- Worked on Oracle Big Data SQL to integrate big data analysis into existing applications.
- Used the Oracle Big Data Appliance for Hadoop and NoSQL processing, and integrated data in Hadoop and NoSQL with data in Oracle Database.
- Worked on setting up the Hadoop ecosystem and a Kafka cluster on AWS EC2 instances.
- Worked with different relational database systems such as Oracle (PL/SQL); used Unix shell scripting and Python, with experience working on AWS EMR instances.
- Designed a workflow to help cloud-transition management decide which queries to run in Google BigQuery, since each query executed in BigQuery incurs a cost.
- Designed and implemented MongoDB Cloud Manager for Google Cloud.
- Migrated data from Oracle and SQL Server databases to MongoDB by reverse engineering.
- Experience in automating code deployment across multiple cloud providers such as Amazon Web Services, Google Cloud, VMware, and OpenStack.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing. Mentored the EQM team in creating Hive queries to test use cases.
- Deployed Kubernetes on both AWS and Google Cloud; set up clusters and replication and deployed multiple containers.
- Configured Sqoop: JDBC drivers for the respective relational databases, parallelism, distributed cache, import process control, compression codecs, imports into Hive and HBase, incremental imports, saved jobs and password handling, free-form query imports, and troubleshooting (a sample import is sketched after this list).
- Collected and aggregated large amounts of streaming data into HDFS using Flume: configured multiple agents, sources, sinks, channels, and interceptors, defined channel selectors to multiplex data into different sinks, and tuned log4j properties.
- Extensively worked on ETL mappings and on the analysis and documentation of OLAP reports.
- Monitoring the health of the cluster and setting up alert scripts for memory usage on the edge nodes.
- Experience in Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX utilities). Worked on NoSQL databases such as HBase and created Hive tables on top.
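For illustration, a minimal sketch of the Kafka ACL setup referenced above, assuming a Kerberized broker with the default ZooKeeper-backed authorizer; the ZooKeeper host, principal, topic, and consumer-group names are placeholders:

```bash
# Grant a hypothetical application principal produce rights on a topic
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --add --allow-principal User:etl-app \
  --producer --topic clickstream

# Grant the same principal consume rights on the topic via a consumer group
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1.example.com:2181 \
  --add --allow-principal User:etl-app \
  --consumer --topic clickstream --group etl-consumers
```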
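Likewise, a hedged sketch of the kind of Sqoop incremental import described above; the JDBC URL, credentials file, table, check column, and target directory are placeholders:

```bash
# Incremental append import from a relational source into HDFS,
# with parallelism and Snappy compression (values are illustrative)
sqoop import \
  --connect jdbc:oracle:thin:@db-host.example.com:1521/ORCL \
  --username etl_user --password-file /user/etl/.db_pass \
  --table ORDERS \
  --target-dir /data/raw/orders \
  --num-mappers 8 \
  --incremental append --check-column ORDER_ID --last-value 0 \
  --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec
```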
Environment: HBase, Hadoop 2.2.4, Hive, Kerberos, Kafka, YARN, Spark, Impala, Solr, Java, Hadoop cluster, HDFS, Ambari, Ganglia, CentOS, Red Hat, Windows, Sqoop, Cassandra.
Confidential, Chattanooga, TN
Hadoop/BigData Administrator
Responsibilities:
- Installation, configuration, and administration of CDH 5.x Hadoop clusters
- Ran benchmark tools to test cluster performance (sample runs are sketched after this list)
- Configured Hadoop properties based on the benchmark results
- Tuned the cluster based on the benchmark results
- Monitored system metrics and logs for any problems
- Worked as an administrator on the Hortonworks (HDP) distribution for 4 clusters ranging from POC to PROD.
- Installed and configured Hadoop ecosystem components such as HBase, Hive, Pig, Sqoop, Spark, and ZooKeeper as required.
- Provided support to users for diagnosing, reproducing, and fixing Hadoop-related issues.
- Worked on CDH environments, took ownership of problem isolation and resolution, and reported bugs whenever a case arose.
- Experienced in installation, configuration, troubleshooting and maintenance of Kafka & Spark clusters.
- Experience in setting up Kafka cluster on AWS EC2 Instances.
- Expertise building Cloudera and Hortonworks Hadoop clusters on bare metal and on Amazon EC2.
- Ensure that critical user issues are addressed quickly and effectively.
- Apply troubleshooting techniques to provide solutions to our user's individual needs.
- Troubleshoot, diagnose and potentially escalate user inquiries during their engineering and operations efforts.
- Investigate product related issues both for individual customers and for common trends that may arise.
- Set up new Hadoop users, provided HDFS maintenance and support, and kept track of Hadoop cluster connectivity and security.
- Worked on setting up Apache NiFi and used NiFi to orchestrate data pipelines.
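As an illustration of the benchmarking mentioned above, typical TestDFSIO and TeraSort runs might look like the following; the jar paths, file counts, and data sizes vary by CDH release and are assumptions here:

```bash
# HDFS I/O throughput: write 10 files of 1000 MB each
hadoop jar /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-jobclient-*-tests.jar \
  TestDFSIO -write -nrFiles 10 -fileSize 1000

# MapReduce/shuffle benchmark: generate ~100 GB of rows, then sort them
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  teragen 1000000000 /benchmarks/teragen
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  terasort /benchmarks/teragen /benchmarks/terasort
```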
Environment: Cloudera Manager, HDFS, YARN, Spark, Kafka, Hive, Pig, Sqoop, Kerberos, Centri, NiFi, Hortonworks HDP, Oracle, Netezza, Tableau, Python, Java 8, Log4j, Git, AWS, S3, EC2, JIRA
Confidential, Dallas, TX
Hadoop/BigData Administrator
Responsibilities:
- Worked on capacity planning, configuration, operationalization, and management of small to medium-sized Big Data Hadoop clusters
- Built 80-node and 20-node HDP Hadoop clusters using Ambari
- Supported the administration of a ~450-node HDP Hadoop cluster using Ambari
- Defined hardware and software prerequisites for small to medium Hadoop clusters
- Expertise building Cloudera and Hortonworks Hadoop clusters on bare metal and on Amazon EC2
- Worked as an administrator on the Hortonworks (HDP) distribution for 4 clusters ranging from POC to PROD
- Performed performance tuning of Hadoop cluster components: HDFS, MapReduce2, YARN, Hive, and HBase
- Performed cluster upgrades (HDP 2.2.4.2 to HDP 2.3, Ambari 2.0 to Ambari 2.2.1.x)
- Performed backup and monitoring of the Postgres (Ambari) and MySQL (Hive, Oozie, Ranger) databases
- Upgraded clusters using both Rolling Upgrade and Express Upgrade
- Enabled services (Kafka, Spark) to suit the real-time data processing requirements for Adworks data sets
- Monitored Hadoop cluster job performance and performed capacity planning
- Configured Hadoop security, including Kerberos setup and RBAC authorization using Ranger
- Installed and configured Knox Gateway to simplify access and enhance security of Hadoop Cluster.
- Worked with the business team to understand and implement various Big Data Hadoop use cases and POCs
- Diagnosed, troubleshot, and fixed Hadoop infrastructure issues
- Configured rack awareness on HDP clusters (a sample topology script is sketched after this list)
- Extensively used DistCp to transfer terabytes of data between secured and unsecured clusters (a sample invocation also follows this list)
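A minimal sketch of the rack-awareness setup mentioned above: a topology script of the kind referenced by net.topology.script.file.name in core-site.xml. The mapping-file path and rack names are placeholders:

```bash
#!/bin/bash
# Hadoop calls this script with one or more host names/IPs and expects
# one rack path per argument on stdout.
MAP=/etc/hadoop/conf/rack-map.txt   # lines of "<host-or-ip> /dc1/rackNN"
DEFAULT_RACK="/default-rack"
for host in "$@"; do
  rack=$(awk -v h="$host" '$1 == h {print $2}' "$MAP")
  echo "${rack:-$DEFAULT_RACK}"
done
```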
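And a hedged example of a DistCp transfer between secured and unsecured clusters; the NameNode hosts and paths are placeholders, and the fallback property lets the Kerberized client talk to the unsecured cluster:

```bash
# Run from the secure (Kerberized) cluster after kinit; copies with 50 mappers
hadoop distcp \
  -D ipc.client.fallback-to-simple-auth-allowed=true \
  -m 50 -update \
  hdfs://insecure-nn.example.com:8020/data/adworks \
  hdfs://secure-nn.example.com:8020/data/adworks
```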
Environment: HDP 2.3, 2.2.4.2 / Ambari 2.1.1.x, 2.0, Bash, Ansible, Python, CentOS 6.x, Nagios, Graphite, HDFS, YARN, Kafka, Spark, Hive, Pig, Sqoop, Oozie, Hortonworks HDP, Knox, Kerberos, Ranger, Cassandra, Java 1.7, Log4j, JUnit, MRUnit, Git, JIRA
Confidential
Hadoop Administrator
Responsibilities:
- Installation, configuration, and management of Hortonworks HDP Hadoop clusters
- Involved in user creation and management on the IBD platform for various GE business units.
- Expertise in validating Hive, HBase, Sqoop, Flume, and Oozie jobs
- Worked on Kerberos security deployment on the Hadoop cluster
- Enabled NameNode high availability in IBD environments.
- Used ETL systems as staging systems for user management, data loading, data analytics, and Hadoop cluster access for end users
- Enforced web user authentication for the NameNode and Oozie web consoles
- Coordinate with operational Hadoop support team.
- Manage and Review Hadoop log files.
- Used GitLab UI with Puppet as the platform to manage the Hadoop users
- Defined and implemented the logging retention period for Hadoop components (a sample retention job is sketched after this list)
- Addressed performance tuning of Hadoop ETL processes against very large data sets and worked directly with statisticians on implementing solutions involving predictive analytics.
- Developed Hadoop monitoring processes (capacity, performance, consistency) to ensure processing issues are identified and resolved swiftly.
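A minimal sketch of the logging-retention implementation mentioned above, assuming a 30-day window and typical HDP log locations (both are assumptions):

```bash
#!/bin/bash
# Cron-driven cleanup of rolled component logs older than the retention window
RETENTION_DAYS=30
for dir in /var/log/hadoop/hdfs /var/log/hadoop-yarn /var/log/hive /var/log/oozie; do
  find "$dir" -type f -name '*.log.*' -mtime +"$RETENTION_DAYS" -delete
done
# Aggregated YARN application logs in HDFS are capped separately via
# yarn.log-aggregation.retain-seconds in yarn-site.xml.
```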
Environment: Ambari, Hadoop, Nagios, Zabbix, Spark, Kafka, Storm, Shark, Hive, Pig, Sqoop, MapReduce, Kerberos, Ranger, Kibana, Talend, Oracle, Teradata, SAS, Tableau, Java 7.0, Log4J, Junit, MRUnit, SVN, JIRA
Confidential
System Admin/DevOps
Responsibilities:
- Responsible for the implementation and ongoing administration of Hadoop infrastructure, including the installation, configuration, and upgrading of the Cloudera and Hortonworks distributions of Hadoop.
- File system, cluster monitoring, and performance tuning of Hadoop ecosystem
- Resolve issues involving MapReduce, YARN, and Sqoop job failures
- Analyze and resolve multi-tenancy job execution issues
- Design and manage backup and disaster recovery solution for Hadoop clusters
- Work on Unix operating systems to efficiently handle system administration tasks related to Hadoop clusters
- Manage the Apache Kafka environment
- Participate in and manage data lake data movements involving Hadoop and NoSQL databases such as HBase, Cassandra, and MongoDB
- Evaluate administration and operational practices and evolve automation procedures (using scripting languages such as Shell, Python, and Ruby, and tools such as Chef and Puppet)
- Work with data delivery teams to set up new Hadoop users, including setting up Linux users, creating Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (a sample onboarding sketch follows this list)
- Enable security/authentication using Kerberos and ACL authorization using Apache Sentry
- Create and document best practices for Hadoop and big data environment
- Participate in new data product or new technology evaluations; manage the certification process and evaluate and implement new initiatives in technology and process improvements
- Interact with Security Engineering to design solutions, tools, testing and validation for controls
- Advance the cloud architecture for data stores
- Work with the engineering team on automation
- Help operationalize Cloud usage for databases and for the Hadoop platform
- Engage vendors for feasibility of new tools, concepts and features, understand their pros and cons and prepare the team for rollout
- Analyze vendor suggestions/recommendations for applicability to environment and design implementation details
- Perform short and long term system/database planning and analysis as well as capacity planning
- Integrate/collaborate with application development and support teams on various IT projects
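As a hedged illustration of the new-user onboarding described above, assuming MIT Kerberos and placeholder principal, group, and quota values:

```bash
# Create the Kerberos principal and export a keytab (realm is a placeholder)
kadmin -q "addprinc -randkey analyst1@EXAMPLE.COM"
kadmin -q "xst -k /etc/security/keytabs/analyst1.keytab analyst1@EXAMPLE.COM"

# Create the HDFS home directory, set ownership, and apply a space quota
sudo -u hdfs hdfs dfs -mkdir /user/analyst1
sudo -u hdfs hdfs dfs -chown analyst1:analysts /user/analyst1
sudo -u hdfs hdfs dfsadmin -setSpaceQuota 1t /user/analyst1

# Smoke-test access as the new user
kinit -kt /etc/security/keytabs/analyst1.keytab analyst1@EXAMPLE.COM
hdfs dfs -touchz /user/analyst1/_access_check
```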
Environment: Cloudera CDH, Hortonworks HDP, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Storm, Kerberos, DataGuise, Tableau.