- Senior Hadoop Administrator with extensive experience in MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, YARN, Scala, Spark, Kafka, Storm, Impala, Oozie, and Flume.
- Experience with the Oozie scheduler, setting up workflows that coordinate MapReduce and Pig jobs.
- Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Extended Hive and Pig core functionality with custom UDFs.
- Experience in troubleshooting errors in HBase Shell/API, Pig, Hive, Sqoop, Flume, Spark and MapReduce.
- Experience importing and exporting data using Sqoop between HDFS and relational database/mainframe systems.
- Experienced in running MapReduce and Spark jobs over YARN.
- Worked on Hadoop security with MIT Kerberos and Ranger integrated with LDAP.
- Excellent communication and leadership skills.
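The Sqoop import/export experience above typically comes down to invocations like the following sketch. This is a command fragment requiring a live cluster, not a claim about any specific environment in this resume; the hostnames, database, tables, and HDFS paths are hypothetical placeholders.

```shell
# Hypothetical example: pull a MySQL table into HDFS, then push results back.
sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

sqoop export \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_user -P \
  --table order_summary \
  --export-dir /data/processed/order_summary
```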
- Hortonworks
- MapReduce
- Apache Spark
- Shell Script
- Oracle 10g
- Web services
- LoadRunner
- MS Office
Sr. Hadoop Administrator
- Working on the Datalake project, a multitenant platform for analytics.
- Responsible for Hadoop cluster maintenance, monitoring, commissioning/decommissioning data nodes, troubleshooting, cluster planning.
- Work with the Data Science team to gather requirements for various data mining projects.
- Installed 5 Hadoop clusters for different teams; developed a data lake that serves as a base layer for storage and developer analytics.
- Built automation frameworks in Python and Scala for data ingestion and processing against NoSQL stores.
- Involved in implementing security on Hortonworks Hadoop Cluster.
- Responsible for upgrading Hortonworks HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node clustered environment. Handled importing of data from various data sources; performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS.
- Manage Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, or custom-built tooling; design cloud-hosted solutions drawing on specific AWS product-suite experience.
- Performed a major production upgrade from HDP 1.3 to HDP 2.2; followed standard backup policies to ensure high availability of the cluster.
- Monitor multiple Hadoop cluster environments using Ganglia and Nagios. Monitor workload, job performance, and capacity planning using Ambari. Install and configure Hortonworks and Cloudera distributions on single-node clusters for POCs.
- Involved in running Hadoop jobs that process millions of text records. Troubleshoot build issues during the Jenkins build process. Implement Docker to create containers for Tomcat servers and Jenkins.
- Work with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, HBase, DevOps, ANT, Maven, Chef, Puppet, Jenkins, ClearCase.
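The commissioning/decommissioning responsibility above usually follows the standard HDFS exclude-file procedure, sketched below. This assumes a vanilla HDP-style layout; the hostname and config path are hypothetical, and the exact exclude-file location is whatever `dfs.hosts.exclude` points to in a given cluster.

```shell
# Hypothetical sketch: gracefully decommission a DataNode.
# 1. Add the host to the exclude file referenced by dfs.hosts.exclude.
echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the NameNode to re-read its include/exclude lists.
hdfs dfsadmin -refreshNodes

# 3. Watch the node's status; stop it only after it reports "Decommissioned".
hdfs dfsadmin -report | grep -A 1 datanode07.example.com
```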
- Worked on a live Big Data Hadoop production environment with 200 nodes.
- Worked on implementing Hadoop with AWS EC2, using several instances to gather and analyze data log files.
- Developed use cases and technical prototypes for implementing Pig, HDP, Hive, and HBase.
- Analyzed alternatives for NoSQL data stores and produced detailed documentation comparing HBase and Accumulo.
- Communicated with developers using in-depth knowledge of Cassandra Data Modeling for converting some of the applications to use Cassandra instead of Oracle.
- Responsible for design and development of Big Data applications using Hortonworks Hadoop.
- Transferred data between HDFS and relational database systems using Sqoop.
- Maintained and troubleshot Hadoop core and ecosystem components (HDFS, MapReduce, NameNode, DataNode, JobTracker, TaskTracker, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase, and the Fair Scheduler).
- Hands-on experience installing, configuring, administering, debugging, and troubleshooting Apache and DataStax Cassandra clusters.
- Led the evaluation of Big Data software like Splunk.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Tuned Hadoop clusters and monitored memory management for MapReduce jobs, keeping jobs healthy while pushing data from SQL to NoSQL stores.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
Environment: Hadoop, MapReduce, HDFS, Pig, Git, Jenkins, Puppet, Chef, Maven, Spark, YARN, HBase, CDH 5.4, Oozie, MapR, NoSQL, ETL, MySQL, Windows, Shell Scripting, Teradata.
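Scheduling Hive and Pig jobs through Oozie, as described above, is typically driven from the Oozie CLI with a small properties file; the sketch below shows the general shape. All URIs, paths, and the application name are hypothetical placeholders, and the workflow XML itself would live at the application path.

```shell
# Hypothetical sketch: submit an Oozie workflow that chains Hive and Pig actions.
cat > job.properties <<'EOF'
nameNode=hdfs://nn.example.com:8020
jobTracker=rm.example.com:8050
oozie.wf.application.path=${nameNode}/apps/ingest-wf
EOF

oozie job -oozie http://oozie.example.com:11000/oozie \
  -config job.properties -run
```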
- Created a data pipeline of MapReduce programs using chained mappers.
- Visualized HDFS data for customers in a BI tool via the Hive ODBC driver.
- Implemented optimized joins across different data sets to compute top claims by state using MapReduce.
- Worked on big data processing of clinical and non-clinical data using MapR.
- Implemented complex MapReduce programs to perform map-side joins using the Distributed Cache in Java.
- Responsible for importing log files from various sources into HDFS using Flume.
- Created a customized BI tool for the management team to perform query analytics using HiveQL.
- Used Hive and Pig to generate BI reports.
- Imported data using Sqoop from MySQL to HDFS on a regular basis.
- Created partitions and buckets based on state for further processing with bucket-based Hive joins.
- Created Hive generic UDFs to process business logic that varies by policy.
- Moved relational database data into Hive dynamic-partition tables using Sqoop and staging tables.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Experienced with compression techniques such as LZO, GZIP, and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
Environment: Hadoop, HDFS, HBase, MongoDB, MapReduce, Java, Hive, Pig, Sqoop, Flume, Oozie, Hue, SQL, ETL, Cloudera Manager, MySQL.
- Installed, configured, and maintained Veritas NetBackup 6.0/5.x.
- Installed and customized Windows 2003 servers.
- Configured and Administered NFS, NIS, NIS+, LDAP, DNS, Samba and Sendmail Servers.
- Working knowledge of VMware (Virtualization).
- Upgraded VMware server 2.x to 3.x.
- Provided Oracle installation and system-level support to clients.
- Installed and configured the iPlanet (Sun One) Web servers & setup firewall filtering with Squid Proxy server for web caching on Sun Solaris.
- Wrote GWT code to create presentation layer using GWT widgets and event handlers.
- Used SVN, CVS, and ClearCase as version control tools.
- Automated build process by writing ANT build scripts.
- Followed Agile (Scrum) practices and participated in sprint planning.
Environment: Red Hat Linux, VxFS, Confidential P Series AIX servers, Veritas Volume Manager, NetBackup.
- Developed project user-guide documents to aid knowledge transfer to new testers, and a solution repository document that enables quick resolution of recurring issues.
- Wrote, optimized, and troubleshot dynamically created SQL within procedures.
- Created database objects such as Tables, Indexes, Views, Sequences, Primary and Foreign keys, Constraints and Triggers.
- Responsible for creating virtual environments for rapid development.
- Managed systems operations with final accountability for smooth installation, networking, operation, and troubleshooting of hardware and software in a Linux environment.
- Identified operational needs of various departments and developed customized software to enhance system productivity.
- Established/implemented firewall rules, Validated rules with vulnerability scanning tools.
- Implemented a database-enabled intranet web site using Linux, Apache, and a MySQL backend.
- Maintained the MySQL server and authentication.
- Documented various administrative and technical issues as appropriate.