- Over 8 years of experience in information technology with a strong background in database development, database management, deployments, release management, high-availability implementation, management of very large environments, application development, and data warehousing.
- Experience in implementing data warehousing/ETL solutions for verticals such as finance, telecom, loyalty, retail, and insurance.
- Experience in operating and managing large clusters with 650+ nodes and 4+ petabytes of storage.
- Hands-on experience in setting up and configuring Hadoop ecosystem components such as MapReduce, HDFS, HBase, Impala, Oozie, Hive, Sqoop, Pig, Spark, Flume, Kafka, and Sentry.
- Experience in planning, implementing, testing, and documenting performance benchmarks for the Hadoop platform.
- Helped with the planning, development, and architecture of the Hadoop ecosystem.
- Experience in both on-premises and cloud environments: AWS, GCP, and Azure.
- Experience with securing Hadoop clusters by implementing Kerberos KDC installation, LDAP Integration, data transport encryption with TLS, and data-at-rest encryption with Cloudera Navigator Encrypt.
- Experience in designing, configuring, and managing backup and disaster recovery using data replication, snapshots, and Cloudera BDR utilities.
- Implemented role-based authorization for HDFS, Hive, and Impala using Apache Sentry.
- Good knowledge of implementing and using cluster monitoring tools such as Cloudera Manager, Ganglia, and Nagios.
- Experienced in implementing and supporting auditing tools like Cloudera Navigator.
- Knowledge of implementing external authentication with identity providers (IdPs) such as Okta using SAML.
- Experience in implementing high-availability features for services such as NameNode, Hue, and Impala.
- Hands-on experience in deploying and using automation tools such as Puppet for cluster configuration management.
- Experience in creating cookbooks/playbooks and documentation for installation, upgrade, and support projects.
- Participated in application onboarding meetings with application owners and architects, helping them identify and review the technology stack and use case and estimate resource requirements.
- Experience in documenting standard practices and compliance policies.
- Participated in and led upgrades of CDH 4 and CDH 5.
- Resolved issues by working with dependent and support teams and logged cases according to priority.
- Assisted in tuning the performance of the Hadoop ecosystem as well as monitoring.
- Hands-on experience in performing functional testing and helping application teams and users incorporate third-party tools into the Hadoop environment.
- Experience in analyzing log files and failures, identifying root causes, and taking or recommending corrective actions.
- Experience in Data Warehousing and ETL processes.
- Knowledge of integration with reporting tools such as Tableau, MicroStrategy, and Datameer.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing job counters and application log files.
- Experienced in tuning performance for various services
- Experienced in job scheduling and monitoring tools such as Control-M, Nagios, and Ganglia.
- Additional responsibilities include interacting with offshore team on a daily basis, communicating the requirement and delegating the tasks to offshore/on-site team members and reviewing their delivery.
- Good Experience in managing Linux platform servers
- Effective problem solving skills and ability to learn and use new technologies/tools quickly
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, Flume, ZooKeeper, Mahout, Oozie, Avro, HBase, Storm, CDH 5.3, CDH 5.5.
Programming Languages: C, Core Java, SQL, and PL/SQL
Data Ingestion and ETL Tools: Flume, Sqoop, Storm, Kafka
Business Intelligence Tools: MSBI Stack (SSIS, SSRS), Visual Studio 2013/2011/2008/2005
Databases: Oracle 11g, MySQL, MS SQL Server, Confidential DB2.
Relational Databases: MySQL
NoSQL Databases: HBase, MongoDB, Cassandra.
Operating Systems: Linux, UNIX, Mac OS, Windows NT/98/2000/XP, Windows 8.
Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia
Scripting Languages: Shell Scripting, Puppet, Python, Bash, CSH.
R Programming: R Scripting, RStudio, R Markdown, tidyr, dplyr, ggplot2, Apache Spark
Others: Shiny, IPython Notebook, Apache Zeppelin, VMware
Hadoop System Administrator
Confidential, Redmond, WA
- Created a new project solution based on the company's technology direction and ensured that infrastructure services were projected according to current standards.
- Upgraded the cluster from CDH 4.x to CDH 5.x and implemented HA for NameNode and Hue using Cloudera Manager.
- Created and configured the cluster monitoring services: Activity Monitor, Service Monitor, Reports Manager, Event Server, and Alert Publisher.
- Created cookbooks/playbooks and documentation for special tasks.
- Configured HAProxy for the Impala service.
- Wrote desktop scripts for synchronizing data within and across clusters.
- Created snapshots for in-cluster backup of the data.
- Created Sqoop scripts for ingesting data from transactional systems into Hadoop.
- Regularly used JIRA, ServiceNow, and other internal issue trackers for project development.
- Conducted Technology Evaluation sessions for Big Data, Data Governance, Hadoop and Amazon Web Services, Tableau and R, Data Analysis, Statistical Analysis, Data Driven Business Decision
- Integrated Tableau, Teradata, DB2, and Oracle with Hadoop via ODBC/JDBC drivers.
- Worked with application teams to install the operating system, Hadoop updates, patches, version upgrades as required.
- Created scripts to automate balancing data across the cluster using the HDFS balancer utility.
- Created a POC implementing a streaming use case with Kafka and HBase.
- Created and maintained MySQL databases, set up users, and maintained database backups.
- Implemented Kerberos authentication for the existing cluster.
- Integrated the existing LLE and production clusters with LDAP.
- Implemented TLS for CDH Services and for Cloudera Manager.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, creating Kerberos principals, and testing HDFS and Hive access.
- Managed the backup and disaster recovery for Hadoop data. Coordinated root cause analysis efforts to minimize future system issues
- Served as lead technical infrastructure Architect and Big Data subject matter expert.
- Deployed Big Data solutions in the cloud. Built, configured, monitored and managed end to end Big Data applications on Amazon Web Services (AWS)
- Monitored Hadoop cluster job performance and performed capacity planning.
- Spun up clusters in Azure using Cloudera Director as a POC for the cloud migration project.
- Leveraged AWS cloud services such as EC2, auto-scaling and VPC to build secure, highly scalable and flexible systems that handled expected and unexpected load bursts
- Defined the migration strategy to move the application to the cloud. Developed architecture blueprints and detailed documentation. Created a bill of materials, including required cloud services (such as EMR, EC2, and S3) and tools. Scheduled cron jobs on EMR.
- Created bash scripts frequently, depending on the project requirements
- Worked on GCP cloud architecture design patterns.
- Guided application teams on choosing the right file formats in Hadoop (Text, Avro, Parquet) and compression codecs such as Snappy, bzip2, and LZO.
- Improved communication between teams in the matrix environment, which led to an increase in the number of simultaneous projects and average billable hours.
- Substantially improved all areas of the software development life cycle for the company products, introducing frameworks, methodologies, reusable components and best practices to the team
- Implemented VPC, Auto Scaling, S3, EBS, ELB, CloudFormation templates, and CloudWatch services from AWS.
Technical Environment: Over 1500 nodes, approximately 5 PB of data, Cloudera Distribution of Hadoop (CDH) 5.5, HA NameNode, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Flume, Cloudera Navigator, Control-M, Oozie, Hue, White Elephant, Ganglia, Nagios, HBase, Cassandra, Kafka, Storm, Cobbler, Puppet
Confidential, Bellevue, WA
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
- Used Sqoop to migrate data between HDFS and MySQL or Oracle, and deployed Hive and HBase integration to perform OLAP operations on HBase data.
- Designed, planned and delivered a proof of concept and business function/division based implementation of a Big Data roadmap and strategy project
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Involved in MapReduce Converged Data Platform was built with the idea of data movement in mind, with a real-time
- Involved in exporting the analyzed data to databases such as Teradata, MySQL, and Oracle using Sqoop, for visualization and to generate reports for the BI team.
- Worked on an Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive, and Pig jobs that extract the data in a timely manner.
- Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Designed and implemented complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie, and ZooKeeper.
- Ran periodic MapReduce jobs to load data from Cassandra into Hadoop.
- Created internal and external Hive tables as required, defined with appropriate static and dynamic partitions for efficiency.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
- Transformed the data using Hive and Pig so the BI team could perform visual analytics according to client requirements.
- Developed scripts to automate end-to-end data management and synchronization between all clusters.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
Environment: Cloudera CDH 3/4 Distribution, HDFS, MapReduce, Cassandra, Hive, Oozie, Pig, Shell Scripting, MySQL
Hadoop Infrastructure Administrator
Confidential, San Jose, CA
- Installed, configured, and maintained a 70-node Hadoop cluster for application development, along with Hadoop ecosystem components such as Hive, Pig, HBase, and Sqoop.
- Extensively worked with Cloudera Distribution Hadoop CDH 5.x
- Extensively involved in cluster capacity planning, hardware planning, installation, performance tuning of the Hadoop cluster.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, etc.
- Installed and configured the Hue interface for UI access to Hadoop components such as Hive, Pig, Oozie, Sqoop, HBase, the file browser, etc.
- Installed Cloudera Navigator to configure, collect and view audit events such as timestamp, operation, users.
- Provided timely and reliable support for all production and development environments: deployment, upgrades, operations, and troubleshooting.
- Refactored existing Opscode Chef automation code.
- Built and deployed a Chef Server in AWS for infrastructure automation.
- Helped tune Hive queries for performance gains.
- Configured a data lake that serves as a base layer for storing and analyzing data flowing from multiple sources into the Hadoop platform.
- Provided support to developers: installed their custom software, upgraded Hadoop components, resolved their platform issues, and helped them troubleshoot long-running jobs.
- Performed daily status checks of Oozie workflows, monitored Cloudera Manager, and checked DataNode status to ensure nodes were up and running.
- Expertise in performing Hadoop cluster tasks such as commissioning and decommissioning nodes without affecting running jobs or data.
- Used Sqoop import and export extensively.
- Scheduled production batch jobs using Control-M.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service, and managing keytabs using keytab tools.
- Configured NameNode high availability and ResourceManager high availability.
- Resolved various platform-related issues faced by users.
- Acted as the point of contact for workflow failures and hitches.
- Worked round the clock especially during deployments.
- Monitored and maintained the Hadoop cluster (Hadoop, HBase, ZooKeeper) using Ganglia and Nagios.
Technical Environment: CDH 5.4.3 and 4.x, Cloudera Manager (CM) 5.1.1, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, Chef, Red Hat/CentOS 6.5, Control-M
Linux System Administrator
- Administered RHEL 5/6, including installation, testing, tuning, upgrading, applying patches, and troubleshooting server issues.
- Configured and automated the deployment of Linux and VMware infrastructure through the existing Kickstart infrastructure.
- Configured Linux guests in a VMware ESX environment.
- Understanding of server virtualization technologies such as VMware.
- Worked on Cisco UCS, virtual infrastructure on VMware, storage migration, and installations.
- Installed, configured, and custom-built Oracle 10g and prepared servers for database installation, including adding kernel parameters, installing software, and setting permissions.
- Implemented multi-tier application provisioning in OpenStack cloud, integrating it with Puppet.
- Involved in integrating the vSphere hypervisor with OpenStack.
- Configured and maintained FTP, DNS, NFS, and DHCP servers.
- Configured, maintained, and troubleshot local development servers.
- Configured standard Linux services and network protocols such as SMTP, DHCP, DNS, LDAP, NFS, HTTP, SNMP, and others.
- Wrote shell scripts for automation.
- Developed Puppet manifests to automate Hadoop installation and node configuration.
- Worked on decommissioning virtual and physical Linux hosts.
- Administered Tomcat servers serving dynamic servlet and JSP requests.
- Managed cron jobs, batch processing, and job scheduling.
- Worked on planning for the recovery of critical IT systems and services in a fallback situation following a disaster that overwhelms the resilience arrangements.
- Monitored system activity such as CPU, memory, disk, and swap space usage to avoid performance issues.
- Tuned kernel parameters for better performance of applications such as Oracle.
- Provided 24x7 on-call production and customer support, including troubleshooting.
Technical Environment: LINUX, FTP, Shell, UNIX, VMware, NFS, TCP/IP, Puppet, Oracle, Red Hat Linux.