- Over 7 years of experience in IT, including 3 years of Hadoop Administration across diverse industries, with hands-on experience in Big Data ecosystem technologies.
- Excellent understanding of Hadoop architecture and underlying framework including storage management.
- Good experience with the design, management, configuration and troubleshooting of distributed production environments based on Apache Hadoop, HBase and related technologies.
- Experience in the Hadoop ecosystem including HDFS, Hive, Pig, HBase and Sqoop, and knowledge of the MapReduce framework.
- Working experience on designing and implementing complete end to end Hadoop Infrastructure.
- Good experience in Hadoop cluster capacity planning and in designing NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker layouts.
- Hands-on experience in installation, configuration, management and development of big data solutions using Apache, Cloudera (CDH3, CDH4) and Hortonworks distributions.
- Good experience designing, configuring and managing backup and disaster recovery for Hadoop data.
- In-depth knowledge of modifications required in static IP (interfaces), hosts, setting up password-less SSH and Hadoop configuration for Cluster setup and maintenance.
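A minimal sketch of the passwordless-SSH step described above; the hostnames (nn1, dn1, dn2) and the hdfs account are hypothetical examples:

```shell
# Passwordless SSH setup for a small cluster -- a sketch, not a production script.
setup_passwordless_ssh() {
  # Generate a key pair with an empty passphrase (skip if one already exists)
  [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
  # Push the public key to every node in the cluster host list
  for host in nn1 dn1 dn2; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "hdfs@${host}"
  done
}
# Run once from the node that orchestrates the cluster:
# setup_passwordless_ssh
```

Each node's `/etc/hosts` (or DNS) must also resolve every cluster hostname to its static IP before the Hadoop daemons are started.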
- Experienced using Sqoop to import data into HDFS from RDBMS and vice-versa.
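The Sqoop round trip above can be sketched as follows; the JDBC URL, credentials, table names and HDFS paths are hypothetical placeholders:

```shell
# Sqoop import (RDBMS -> HDFS) and export (HDFS -> RDBMS) -- a minimal sketch.
sqoop_roundtrip() {
  # Pull one table into HDFS with 4 parallel mappers
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/sales \
    --username etl_user --password-file /user/etl/.dbpass \
    --table orders \
    --target-dir /data/raw/orders \
    --num-mappers 4

  # Push processed results back to the RDBMS (the "vice versa" direction)
  sqoop export \
    --connect jdbc:mysql://dbhost:3306/sales \
    --username etl_user --password-file /user/etl/.dbpass \
    --table orders_summary \
    --export-dir /data/out/orders_summary
}
```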
- Responsible for collecting information from, and configuring, network devices such as servers, printers, hubs, switches and routers on an Internet Protocol (IP) network.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains.
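The KDC/realm setup mentioned above can be sketched like this; the realm name, hostnames and keytab path are hypothetical:

```shell
# Kerberos KDC bootstrap for a Hadoop cluster -- a sketch (run on the KDC host).
bootstrap_kdc() {
  # Create the KDC database for the realm, stashing the master key
  kdb5_util create -r EXAMPLE.COM -s
  # Add an admin principal and a service principal for the NameNode
  kadmin.local -q "addprinc admin/admin@EXAMPLE.COM"
  kadmin.local -q "addprinc -randkey nn/nn1.example.com@EXAMPLE.COM"
  # Export the service key to a keytab the NameNode daemon can read
  kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/nn1.example.com@EXAMPLE.COM"
}
```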
- Extensive experience in data analysis using tools like Syncsort and HZ along with Shell Scripting and UNIX.
Big Data Technologies: HDFS, MapReduce, Hive, Pig, HBase, Cassandra, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Oozie, Avro, Storm, CDH 5.3, CDH 5.4
Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia
Scripting Languages: Shell, Bash, CSH, Python, Ruby, PHP, Puppet
Programming Languages: C, Java, SQL, and PL/SQL.
Front End Technologies: HTML, XHTML, XML.
Application Servers: Apache Tomcat, Weblogic Server, Websphere
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2.
NoSQL Databases: Hbase, Cassandra, MongoDB
Operating Systems: Linux, UNIX, Mac OS, Windows NT/98/2000/XP/Vista/7/8.
Network Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP.
Confidential - San Ramon, CA
- Worked on setting up Hadoop cluster for the Production Environment.
- Supported 200+ servers and 50+ users on the Hadoop platform: resolved tickets and issues, trained users to make Hadoop usage simple, and kept them updated on best practices.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Installed, configured and deployed a 50-node MapR Hadoop cluster for development and production.
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Responsible for Cluster maintenance, commissioning and decommissioning Data nodes, Cluster Monitoring, Troubleshooting, Manage and review data backups, Manage & review Hadoop log files.
- Involved in implementing security on the Hortonworks Hadoop cluster with Kerberos, working along with the operations team to move the non-secured cluster to a secured cluster.
- Configured, installed and monitored MapR Hadoop on 10 AWS EC2 instances, and configured MapR on Amazon EMR with Amazon S3 as the default filesystem for the cluster.
- Involved in architecting Hadoop clusters using major Hadoop Distributions - CDH3 & CDH4.
- Monitored systems and services; handled architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Working with data delivery teams to setup new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
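The user-onboarding steps above can be sketched as one script; the username argument, realm and paths are hypothetical:

```shell
# Onboarding a new Hadoop user: Linux account, Kerberos principal, HDFS home,
# then a smoke test of HDFS and Hive access -- a sketch.
onboard_hadoop_user() {
  user="$1"
  # 1. Linux account on the gateway node
  useradd -m "$user"
  # 2. Kerberos principal and keytab for the user
  kadmin.local -q "addprinc -randkey ${user}@EXAMPLE.COM"
  kadmin.local -q "ktadd -k /home/${user}/${user}.keytab ${user}@EXAMPLE.COM"
  # 3. HDFS home directory owned by the new user
  hdfs dfs -mkdir -p "/user/${user}"
  hdfs dfs -chown "${user}:${user}" "/user/${user}"
  # 4. Verify the user can authenticate and reach HDFS and Hive
  su - "$user" -c "kinit -kt ~/${user}.keytab ${user}@EXAMPLE.COM \
    && hdfs dfs -ls /user/${user} && hive -e 'show databases;'"
}
```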
- Used Informatica Power Center to create mappings, mapplets, User defined functions, workflows, worklets, sessions and tasks.
- Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
- Used Informatica Data Explorer (IDE) to find hidden data problems.
- Utilized Informatica Data Explorer (IDE) to analyze legacy data for data profiling.
- Development of Informatica mappings and workflows using Informatica 7.1.1.
- Worked on identifying and eliminating duplicates in datasets through IDQ 8.6.1 components.
- Optimized the full text search function by connecting MongoDB and ElasticSearch.
- Utilized AWS framework for content storage and ElasticSearch for document search.
- Developed a framework for automated testing of ElasticSearch index validation, using Java and MySQL.
- Created User defined types to store specialized data structures in Cloudera.
- Wrote a technical paper and created slideshow outlining the project and showing how Cloudera can be potentially used to improve performance.
- Set up monitoring tools for Hadoop monitoring and alerting; monitored and maintained the Hadoop/HBase/ZooKeeper clusters.
- Wrote scripts to automate application deployments and configurations; performed Hadoop cluster performance tuning and monitoring; troubleshot and resolved Hadoop cluster related system problems.
- As an admin, followed standard backup policies to ensure high availability of the cluster.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented systems processes and procedures for future reference.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Screened Hadoop cluster job performance and performed capacity planning.
- Monitored Hadoop cluster connectivity and security, and managed and monitored Hadoop log files.
- Assembled Puppet Master, Agent and Database servers on Red Hat Enterprise Linux Platforms.
Environment: Hortonworks Hadoop, Cassandra, Flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, Sqoop, Hive, Oozie, Cloudera, SAS, SPSS, Unix Shell Scripts, ZooKeeper, SQL, MapReduce, Pig.
- Working on multiple projects spanning Hadoop cluster architecture, installation, configuration and management.
- Designed and developed Hadoop system to analyze the SIEM (Security Information and Event Management) data using MapReduce, HBase, Hive, Sqoop and Flume.
- Developed custom Writable MapReduce Java programs to load web server logs into HBase using Flume.
- Worked on the Hadoop CDH upgrade from CDH 3.x to CDH 4.x.
- Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (like MapReduce, Pig, Hive, Sqoop) as well as system specific jobs.
- Developed entire data transfer model using Sqoop framework.
- Integrated Kafka with Flume in a sandbox environment using a Kafka source and Kafka sink.
- Configured flume agent with flume syslog source to receive the data from syslog servers.
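A minimal Flume agent definition with a syslog source, as described above; the agent name, port and HDFS path are hypothetical examples:

```shell
# Write a minimal Flume agent config: syslog TCP source -> memory channel -> HDFS sink.
cat > /tmp/syslog-agent.conf <<'EOF'
a1.sources  = syslog-src
a1.channels = mem-ch
a1.sinks    = hdfs-sink

# Listen for syslog traffic over TCP from the syslog servers
a1.sources.syslog-src.type = syslogtcp
a1.sources.syslog-src.host = 0.0.0.0
a1.sources.syslog-src.port = 5140
a1.sources.syslog-src.channels = mem-ch

a1.channels.mem-ch.type = memory
a1.channels.mem-ch.capacity = 10000

# Land events in date-partitioned HDFS directories
a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.hdfs.path = /data/syslog/%Y-%m-%d
a1.sinks.hdfs-sink.channel = mem-ch
EOF
# Started on the agent host with:
# flume-ng agent -n a1 -f /tmp/syslog-agent.conf
```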
- Implemented Hadoop NameNode HA services to make the Hadoop services highly available.
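The core of a NameNode HA setup is a pair of NameNodes behind one logical nameservice, e.g. in hdfs-site.xml; the nameservice id and hostnames below are hypothetical:

```xml
<!-- hdfs-site.xml excerpt: two NameNodes behind one logical nameservice -->
<property><name>dfs.nameservices</name><value>mycluster</value></property>
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>nn1.example.com:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>nn2.example.com:8020</value></property>
<property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
```

Automatic failover additionally requires a ZooKeeper quorum and ZKFC daemons on both NameNode hosts.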
- Moved data from RDBMS to Hive/HDFS and from Hive/HDFS back to RDBMS using Sqoop.
- Installed and managed multiple Hadoop clusters: production, stage and development.
- Installed and managed a 150-node production cluster with 4+ PB of storage.
- Performance tuning for infrastructure and Hadoop settings for optimal performance of jobs and their throughput.
- Involved in analyzing system failures on production and lab clusters, identifying root causes, and recommending courses of action.
- Designed the Cluster tests before and after upgrades to validate the cluster status.
- Performed regular maintenance, commissioning and decommissioning nodes as disk failures occurred, using Cloudera Manager.
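Outside Cloudera Manager, the same decommissioning step looks roughly like this; the hostname and exclude-file path are hypothetical:

```shell
# Decommission a failing DataNode via the HDFS exclude file -- a sketch.
decommission_node() {
  # Add the node to the exclude file referenced by dfs.hosts.exclude
  echo "dn7.example.com" >> /etc/hadoop/conf/dfs.exclude
  # Tell the NameNode to re-read the include/exclude lists; HDFS
  # re-replicates the node's blocks before marking it Decommissioned
  hdfs dfsadmin -refreshNodes
  # Watch the node's status in the cluster report
  hdfs dfsadmin -report | grep -A2 "dn7.example.com"
}
```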
- Documented and prepared run books of systems processes and procedures for future references.
- Performed Benchmarking and performance tuning on the Hadoop infrastructure.
- Automated data loading between production and disaster recovery cluster.
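One common way to automate prod-to-DR data loading is DistCp on a schedule; the cluster addresses and path below are hypothetical:

```shell
# Incremental prod -> DR copy with DistCp -- a sketch.
sync_to_dr() {
  # -update copies only new/changed files; -p preserves permissions and times
  hadoop distcp -update -p \
    hdfs://prod-nn:8020/data/warehouse \
    hdfs://dr-nn:8020/data/warehouse
}
# Typically driven from cron or an Oozie coordinator.
```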
- Migrated hive schema from production cluster to DR cluster.
- Worked on migrating applications from relational database systems by doing PoCs.
- Helping users and teams with incidents related to administration and development.
- Onboarding and training on best practices for new users who are migrated to our clusters.
- Guide users in development and work with developers closely for preparing a data lake.
- Migrated data from SQL Server to HBase using Sqoop.
- Log data stored in HBase was processed and analyzed, then imported into the Hive warehouse, enabling business analysts to write HQL queries.
- Built reusable Hive UDF libraries that enabled various business analysts to use these UDFs in their Hive queries.
- Created Hive external tables with partitions for loading the parsed data.
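A sketch of such a partitioned external table; the table name, columns and HDFS locations are hypothetical:

```shell
# Hive external table over parsed log data, partitioned by date -- a sketch.
create_parsed_logs_table() {
  hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS parsed_logs (
      host   STRING,
      url    STRING,
      status INT
    )
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/parsed/logs';
    -- Register one day's directory as a partition
    ALTER TABLE parsed_logs ADD PARTITION (dt='2015-06-01')
      LOCATION '/data/parsed/logs/dt=2015-06-01';
  "
}
```

Because the table is external, dropping it removes only the metadata; the files under `/data/parsed/logs` stay in HDFS.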
- Developed various workflows using custom MapReduce, Pig, Hive and scheduled them using Oozie.
- Responsible for installing, setting up and configuring Apache Kafka and Apache ZooKeeper.
- Extensive knowledge in troubleshooting code related issues.
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MRUnit testing library.
- Auto-populated HBase tables with data.
- Designed and coded application components in an agile environment utilizing test driven development approach.
Environment: Hadoop, HDFS, MapReduce, Shell Scripting, Spark, Splunk, Solr, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, cluster health monitoring and security, Red Hat Linux, Impala, Cloudera Manager, Hortonworks.
- Day-to-day administration on Sun Solaris and RHEL 4/5, including installation, upgrades, patch management and package loading.
- Responsible for monitoring overall project and reporting status to stakeholders.
- Developed project user guide documents to support knowledge transfer to new testers, plus a solution repository document that gives quick resolution of issues seen in the past, thereby reducing the number of invalid defects.
- Identified repeated production issues by analyzing production tickets after each release, and strengthened the system testing process to keep those issues out of production and enhance customer satisfaction.
- Designed and coordinated creation of Manual Test cases according to requirement and executed them to verify the functionality of the application.
- Manually tested the various navigation steps and basic functionality of the Web based applications.
- Experience interpreting physical database models and understanding relational database concepts such as indexes, primary and foreign keys, and constraints using Oracle.
- Writing, optimizing, and troubleshooting dynamically created SQL within stored procedures.
- Creating database objects such as Tables, Indexes, Views, Sequences, Primary and Foreign keys, Constraints and Triggers.
- Responsible for creating virtual environments for the rapid development.
- Responsible for handling tickets raised by end users, including package installation, login issues, access issues, and user management (adding, modifying, deleting, grouping).
- Responsible for preventive maintenance of the servers on a monthly basis, configuration of RAID for the servers, and resource management using disk quotas.
- Responsible for change management releases scheduled by service providers.
- Generated weekly and monthly reports on the tickets worked and sent them to management.
- Managed systems operations with final accountability for smooth installation, networking, operation, and troubleshooting of hardware and software in a Linux environment.
- Identifying operational needs of various departments and developing customized software to enhance System's productivity.
- Established/implemented firewall rules, Validated rules with vulnerability scanning tools.
- Proactively detected computer security violations, collected evidence and presented results to management.
- Accomplished system/e-mail authentication using an enterprise LDAP database.
- Implemented a database-enabled intranet web site using Linux, Apache, and a MySQL database backend.
- Installed CentOS on multiple servers using Pre-Execution Environment (PXE) boot and the Kickstart method; monitored system metrics and logs for problems.
- Ran cron jobs to back up data; applied operating system updates, patches and configuration changes.
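A minimal sketch of a cron-driven backup like the one above; the schedule, script path and directories are hypothetical examples (demonstrated here against a temp directory):

```shell
# Crontab entry (crontab -e): run the backup script at 02:30 every night:
#   30 2 * * * /usr/local/bin/nightly-backup.sh
# The script body, sketched against a demo directory:
SRC=/tmp/backup_demo_src
DEST=/tmp/backup_demo
mkdir -p "$SRC" "$DEST"
echo "sample config" > "$SRC/app.conf"
# Archive the source tree into a date-stamped tarball
tar -czf "$DEST/backup-$(date +%F).tar.gz" -C "$SRC" .
```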
Environment: Windows 2008/2007 server, Unix Shell Scripting, SQL Manager Studio, Red Hat Linux, Microsoft SQL Server 2000/2005/2008, MS Access, NoSQL, Linux/Unix, Putty Connection Manager, Putty, SSH.