- 10+ years of experience deploying and managing multi-node development, testing and production Hadoop clusters with various Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, ZooKeeper) using Cloudera Manager and Hortonworks Ambari.
- Skilled in cloud computing (AWS, Azure), information management, application development, virtualization, business process management, business rules management and big data management. Demonstrated mastery in evaluating requirements for business application integration and service activation. Proven mentor and trainer with expertise in communicating across organizational levels and with cross-functional teams to drive shared vision and foster a culture of excellence.
- Strong knowledge of Hadoop HDFS architecture and the MapReduce framework.
- Experienced in providing cloud-based, highly available solution environments for customer-driven businesses.
- Widely experienced in cloud deployment tools such as Cloudbreak, Cloudera Navigator, Databricks, Apache NiFi, StreamSets and BlueData, along with containerization (Docker) of clusters for data science workbenches.
- DevOps mastery in automating major day-to-day tasks using Ansible, Jenkins and Git for configuration management.
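As an illustration, configuration pushes of this kind are often driven with Ansible; a minimal sketch (the inventory group, hostnames and file paths are hypothetical, not from this resume):

```
# Push an updated hdfs-site.xml to all DataNodes with an ad-hoc copy task
# ("datanodes" group and paths are illustrative)
ansible datanodes -i hosts.ini -m copy \
  -a "src=conf/hdfs-site.xml dest=/etc/hadoop/conf/hdfs-site.xml" --become

# Apply the full node configuration from a playbook, dry-running first
ansible-playbook -i hosts.ini site.yml --limit datanodes --check
ansible-playbook -i hosts.ini site.yml --limit datanodes
```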
- Extensive experience in monitoring clusters using cloud and Apache-based monitoring tools, along with ticketing systems such as ServiceNow and Salesforce, to honor customer SLAs.
- Experience in monitoring, capacity planning, backup and recovery, and security integration.
- Experience in benchmarking, and in performing backup and disaster recovery of NameNode metadata and sensitive data residing on the cluster.
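A NameNode metadata backup of the kind mentioned above is commonly taken with `hdfs dfsadmin`; a hedged sketch assuming a live cluster (the backup path is illustrative):

```
# Force a checkpoint so the on-disk fsimage is current
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave

# Pull the latest fsimage from the NameNode to a local backup directory
hdfs dfsadmin -fetchImage /backup/namenode/$(date +%F)
```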
Sr. Hadoop / Big Data Architect
Confidential, San Diego
- Applied advanced systems/infrastructure concepts to define, design and implement highly complex Hadoop systems, services and technology solutions.
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Worked with data delivery teams to set up new Hadoop users and perform cluster maintenance.
- Screened Hadoop cluster job performance and capacity planning. Monitored Hadoop cluster connectivity and security. Managed and reviewed Hadoop log files.
- HDFS support and maintenance. Diligently teamed with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Led a team of systems/infrastructure professionals. System and application security:
- Maintained complex security systems.
- Interpreted and adopted campus, medical center and Office of the President system- and regulation-based security policies to control access to networked resources.
- Provided recommendations and requirements on network access controls.
- Coordinated with Linux support, developers and key data owners to maintain a HIPAA-compliant Hadoop environment through access controls such as data encryption at rest and in transit, HDFS access control via Sentry, Kerberos and SSSD, and access control to other Hadoop applications such as Cloudera Manager, Cloudera Navigator and Hue via LDAP. Responsible for guaranteeing users receive least-privileged access.
- Specified, wrote and executed highly complex software and scripts to support systems management, log analysis and other system administration duties for multiple, highly integrated systems. Designed Oozie scheduler workflows to manage Apache Hadoop jobs supporting ELT ingest activities (using Hive, Sqoop and other tools), HDFS activities (using HDFS and distcp) and development activities (using Java and other tools).
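Oozie workflows of this kind are driven from the Oozie CLI; a minimal sketch, assuming a running Oozie server (the host, port and properties file are illustrative):

```
# Submit and start an ingest workflow defined in job.properties
# (job.properties points at the workflow.xml in HDFS)
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run

# Check status and logs for a running job
oozie job -oozie http://oozie-host:11000/oozie -info <job-id>
oozie job -oozie http://oozie-host:11000/oozie -log <job-id>
```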
Sr Devops / Big Data Architect
- Built an Apache Kafka multi-node cluster and implemented Kafka Manager to monitor multiple clusters. Built an Apache Storm multi-node cluster.
- Built Apache Cassandra and DataStax Enterprise clusters; enabled and configured Solr. Built an Apache Spark multi-node cluster; worked on replacing Storm with Spark Streaming. Installed Exhibitor for ZooKeeper monitoring, and Burrow.
- Performance tuning of Kafka and Storm clusters; benchmarking real-time streams.
- In-depth knowledge of the architecture, read/write paths, cluster management and cluster upgrades.
- Automation using Puppet.
- Developed Kafka producers/consumers and Storm topologies and deployments. Real-time production monitoring and issue resolution.
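Smoke tests of such Kafka pipelines are commonly done with the console tools that ship with Kafka; a hedged sketch, assuming a live cluster (hosts, topic name and flags are illustrative, and flag names vary by Kafka version):

```
# Create a test topic (older Kafka versions address ZooKeeper directly)
kafka-topics.sh --create --zookeeper zk1:2181 --topic events \
  --partitions 6 --replication-factor 3

# Produce one message, then consume it back
echo '{"id":1,"type":"click"}' | kafka-console-producer.sh \
  --broker-list broker1:9092 --topic events

kafka-console-consumer.sh --bootstrap-server broker1:9092 \
  --topic events --from-beginning --max-messages 1
```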
- POC and evaluation of Blue Prism Automation
- Monitored server and application performance using tools like Hubble, Splunk, Epic and NmSys.
Lead BigData Consultant
- Managed multiple clusters using Pivotal HD; managed and operated a petabyte-scale cluster. Automated installation of Hadoop using Chef, Puppet and icm-client.
- Performance tuning of HDFS, YARN.
- Managed and monitored the cluster using Nagios and Check_MK. Managed the PCC command center.
- Wrote utilities for the big data platform; code review, defining standards and best practices, operations, regular maintenance and software upgrades.
- Work on the assigned ServiceNow tickets to resolution with the client
- Responsible for managing support cases on a daily basis including triage, isolating and diagnosing the problem, ensuring issues are reproducible, and subsequent resolution of the issue.
- Coordination between different teams involving release management, configuration management and Change Management.
- Responsible for building the knowledge base to prevent recurrence of escalations for previously resolved cases.
- Continuous improvement of incident handle times, first-contact resolution, escalation rates and the self-service/community experience.
- Helped customers set up the ecosystem per their use cases and debugged ecosystem issues (HDFS, MapReduce, YARN, Talend, Kerberos, Sqoop, GemFire XD, HAWQ, Tableau, Hive, HBase, Oozie, Flume, Hue, Pig, Drill).
- Built an Apache Kafka multi-node cluster and implemented Kafka Manager to monitor multiple clusters. Built a Storm multi-node cluster.
- Built DataStax Enterprise clusters.
- Built an Apache Spark multi-node cluster; worked on replacing Storm with Spark Streaming. Ensured high availability of all applications on the production cluster with 24x7 technical support; participated in a 24x7 on-call rotation and downtime management.
- Participate and provide feedback for capacity planning. Ensuring Support SLAs are met.
- Defining processes around Change Management, Release Management, Application Transition across the Hadoop Platform.
- Process definitions and ensuring best practices / guidelines are met.
Big Data/Systems Architect
- Provided hands-on subject matter expertise to build and implement Hadoop-based big data solutions. Researched, evaluated, architected and deployed new tools, frameworks and patterns to build sustainable big data platforms for our clients.
- Designed and implemented complex highly scalable statistical models and solutions that comply with security requirements.
- Identified gaps and opportunities for improving existing client solutions. Interacted with, collaborated with and guided clients, including at the executive level. Defined and developed APIs for integration with various data sources in the enterprise.
- Actively collaborated with other architects and developers in developing client solutions
- Strong focus on implementing security in the project, covering the different ways of getting inside the cluster, from the host level to the web application level. Primarily focused on authentication and authorization. Also implemented data-at-rest and data-in-motion encryption, providing security and encryption at all levels.
- End-to-End Clusters Implementations for Development, QA and Production
- Teamed up with Cloud, Security, Release and Database Architects in designing the company's big data cluster architecture, with security integration of the different components involved in the Hadoop cluster.
- Implemented end-to-end security to comply with the company's security policies.
- Used Gazzang for data-at-rest encryption: implemented the zTrustee server and zNcrypt, and enabled process-based encryption.
- Designed user access authorization using SSSD integrated with Active Directory. Integrated Kerberos on all clusters with the company's LDAP/Active Directory, and created user groups and permissions for authorized access into the cluster.
- Created junction files in Tivoli Access Manager. Integrated Hue/Cloudera Manager with SAML. Collaborated with and guided different teams for successful deployment into the production cluster. Involved in analyzing system failures, identifying root causes and recommending courses of action. Involved in performance testing of the production cluster using TeraGen, TeraSort and TeraValidate.
- Used TestDFSIO to validate read/write throughput and to compare execution times for different file sizes and block sizes.
- Performed performance testing by measuring memory resource usage; tuned performance on a regular basis depending on cluster load.
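Benchmark runs of this kind typically look as follows; a hedged sketch assuming a live cluster (jar names, sizes and HDFS paths are illustrative and vary by Hadoop distribution and version):

```
# TeraGen/TeraSort/TeraValidate round trip
hadoop jar hadoop-mapreduce-examples.jar teragen 10000000000 /bench/teragen
hadoop jar hadoop-mapreduce-examples.jar terasort /bench/teragen /bench/terasort
hadoop jar hadoop-mapreduce-examples.jar teravalidate /bench/terasort /bench/report

# TestDFSIO write then read throughput, then clean up
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -size 1GB
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read -nrFiles 10 -size 1GB
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -clean
```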
- Used MySQL to hold all cluster databases; implemented high availability for the database.
Senior Hadoop Consultant
- Monitored workload, job performance and capacity planning using Cloudera Manager
- Involved in analyzing system failures, identifying root causes and recommending courses of action. Retrieved data from HDFS into relational databases with Sqoop. Parsed, cleansed and mined useful and meaningful data in HDFS using MapReduce for further analysis.
- Fine-tuned Hive jobs for optimized performance.
- Partitioned and queried the data in Hive for further analysis by the BI team. Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Involved in extracting data from various sources into Hadoop HDFS for processing. Wrote Pig scripts for advanced analytics on the data for recommendations.
- Effectively used Sqoop to transfer data between databases and HDFS. Worked on streaming data into HDFS from web servers using Flume.
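A typical Sqoop transfer of the kind described above; a hedged sketch assuming a live cluster (the JDBC URL, credentials, table names and HDFS paths are illustrative):

```
# Import an RDBMS table into HDFS with four parallel mappers
sqoop import --connect jdbc:mysql://db1:3306/sales --username etl -P \
  --table orders --target-dir /data/raw/orders --num-mappers 4

# Export processed results back to the database
sqoop export --connect jdbc:mysql://db1:3306/sales --username etl -P \
  --table order_summary --export-dir /data/out/order_summary
```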
- Implemented custom interceptors for Flume to filter data, and defined channel selectors to multiplex the data into different sinks.
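An interceptor plus multiplexing selector of this kind is wired up in the Flume agent configuration; a partial sketch (agent, component and header names are illustrative, and the channel/sink definitions are omitted):

```
# flume.conf fragment: filter noise, then route events by an "env" header
agent.sources = weblog
agent.channels = c_hdfs c_spill
agent.sinks = s_hdfs s_spill

agent.sources.weblog.type = exec
agent.sources.weblog.command = tail -F /var/log/httpd/access_log

# Interceptor: drop health-check noise before it hits any channel
agent.sources.weblog.interceptors = i1
agent.sources.weblog.interceptors.i1.type = regex_filter
agent.sources.weblog.interceptors.i1.regex = ^.*healthcheck.*$
agent.sources.weblog.interceptors.i1.excludeEvents = true

# Multiplexing selector: route on the "env" header, with a default channel
agent.sources.weblog.selector.type = multiplexing
agent.sources.weblog.selector.header = env
agent.sources.weblog.selector.mapping.prod = c_hdfs
agent.sources.weblog.selector.default = c_spill
```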
- Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into Hive schemas for analysis.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache. Designed and implemented custom writables, custom input formats, custom partitioners and custom comparators.
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
- The Hive tables created as per requirement were internal or external tables defined with appropriate static and dynamic partitions, intended for efficiency.
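The internal/external tables with static and dynamic partitions described above can be sketched in Hive DDL (table, column and path names are illustrative):

```
-- External table with date partitions
CREATE EXTERNAL TABLE web_events (
  user_id BIGINT,
  action  STRING,
  ts      TIMESTAMP
)
PARTITIONED BY (event_date STRING)
STORED AS ORC
LOCATION '/data/warehouse/web_events';

-- Static partition load
ALTER TABLE web_events ADD PARTITION (event_date = '2016-01-01');

-- Dynamic partition insert from a staging table
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE web_events PARTITION (event_date)
SELECT user_id, action, ts, to_date(ts) FROM staging_events;
```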
- Implemented UDFs, UDAFs and UDTFs in Java for Hive to process data that can't be handled with Hive's built-in functions.
- Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and deserialization, to parse the contents of streamed log data, and implemented custom Hive UDFs.
- Designed and implemented Pig UDFs for evaluating, filtering, loading and storing data.
- Responsible for building a cluster storing 380 TB of transactional data with an inflow of 10 GB of data every day. Performed various configurations including networking and iptables, resolving hostnames, user accounts and file permissions, HTTP, FTP and passwordless SSH login. Implemented the authentication service using the Kerberos authentication protocol.
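A Kerberos service setup of the kind mentioned above usually involves creating principals and keytabs on the KDC; a hedged sketch (the realm, principal and keytab path are illustrative):

```
# On/against the KDC: create a service principal and export its keytab
kadmin -q "addprinc -randkey hdfs/node1.example.com@EXAMPLE.COM"
kadmin -q "ktadd -k /etc/security/keytabs/hdfs.keytab hdfs/node1.example.com@EXAMPLE.COM"

# On the service host: authenticate with the keytab and verify the ticket
kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/node1.example.com@EXAMPLE.COM
klist
```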
- Performed benchmarking on the Hadoop cluster using different benchmarking mechanisms. Tuned the cluster by commissioning and decommissioning DataNodes.
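Decommissioning a DataNode, as mentioned above, is typically done via the exclude file; a hedged sketch assuming a live cluster (hostname and exclude-file path are illustrative):

```
# Add the node to the exclude file referenced by dfs.hosts.exclude
echo "node7.example.com" >> /etc/hadoop/conf/dfs.exclude

# Tell the NameNode to re-read host lists; it begins re-replicating blocks
hdfs dfsadmin -refreshNodes

# Watch until the node reports "Decommissioned" before shutting it down
hdfs dfsadmin -report | grep -A3 node7.example.com
```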
- Performed a minor upgrade from CDH3u4 to CDH3u6, and a major upgrade of the Hadoop cluster from CDH3 to CDH4.
- Deployed high availability on the Hadoop cluster using quorum journal nodes. Implemented automatic failover with ZooKeeper and the ZooKeeper Failover Controller (ZKFC).
- Configured Ganglia, including installing the gmond and gmetad daemons, which collect all the metrics running on the distributed cluster and present them in real-time dynamic web pages to help with debugging and maintenance. Deployed a network file system for NameNode metadata backup. Performed cluster backups using DistCp, Cloudera Manager BDR and parallel ingestion.
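A DistCp backup run of the kind described above can be sketched as follows, assuming two live clusters (NameNode hosts and paths are illustrative):

```
# Incremental inter-cluster copy: sync changes, remove deleted files,
# cap parallelism at 50 map tasks
hadoop distcp -update -delete -m 50 \
  hdfs://prod-nn:8020/data/warehouse \
  hdfs://dr-nn:8020/backup/warehouse
```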
- Designed and allocated HDFS quotas for multiple groups.
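Group quotas of this kind are set with `hdfs dfsadmin`; a hedged sketch assuming a live cluster (paths and sizes are illustrative):

```
# Cap the analytics group's directory at 10 TB of raw space
hdfs dfsadmin -setSpaceQuota 10t /data/groups/analytics

# Cap the total number of files and directories
hdfs dfsadmin -setQuota 1000000 /data/groups/analytics

# Verify both quotas and current usage
hdfs dfs -count -q /data/groups/analytics
```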
- Configured and deployed the Hive metastore using MySQL and the Thrift server.
- Used Hive schemas to create relations in Pig using HCatalog. Developed Pig scripts for handling raw data for analysis. Deployed a Sqoop server to perform imports from heterogeneous data sources into HDFS. Configured Flume agents to stream log events into HDFS for analysis.
- Configured Oozie for workflow automation and coordination.
- Wrote custom monitoring scripts for Nagios to monitor the daemons and cluster status, and custom shell scripts to automate redundant tasks on the cluster.
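A minimal Nagios-style check in the spirit of those scripts: it reads a mount point's usage and classifies it against thresholds, using the Nagios exit-code convention (0=OK, 1=WARNING, 2=CRITICAL). The mount point and thresholds are illustrative defaults, not from the resume.

```shell
#!/bin/sh
# check_disk: classify a usage percentage against warning/critical thresholds.
check_disk() {
  used=$1; warn=$2; crit=$3
  if [ "$used" -ge "$crit" ]; then
    echo "CRITICAL"; return 2
  elif [ "$used" -ge "$warn" ]; then
    echo "WARNING"; return 1
  else
    echo "OK"; return 0
  fi
}

# Current usage percentage of a mount point (default: /)
usage=$(df -P "${1:-/}" | awk 'NR==2 { gsub("%",""); print $5 }')

# Report without aborting the enclosing shell on WARNING/CRITICAL
check_disk "$usage" "${2:-80}" "${3:-90}" || true
```

In a real deployment the script would be registered as a Nagios plugin command so the return code, not the text, drives alerting.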
- Involved in requirements gathering, designing and developing the applications.
- Prepared UML diagrams for the project use case.
- Worked with Java string manipulation to parse CSV data for applications.
- Worked with Java database connections to read and write data in Java applications.
- Developed static and dynamic user interface web pages using JSP, HTML and CSS.
- Involved in structuring Wiki and Forums for product documentation
- Involved in R&D, setup and design of MediaWiki, phpBB and Joomla content management systems.
- Worked on incorporating LDAP services and single sign-on for the CMS web portal.
- Maintained the customer support portal.
- Installed CentOS using Preboot Execution Environment (PXE) boot and the Kickstart method on multiple servers, including remote installation of Linux via PXE boot. Day-to-day user access and permissions; installing and maintaining Linux servers.
- Monitored system activity, performance and resource utilization.
- Responsible for maintaining RAID groups and LUN assignments per agreed design documents. Performed all system administration tasks such as cron jobs, installing packages and applying patches.
- Extensive use of LVM, creating volume groups and logical volumes.
- Performed RPM and YUM package installations, patch and other server management.
- Performed scheduled backup and necessary restoration.
- Configured the Domain Name System (DNS) for hostname-to-IP resolution.
- Troubleshot and fixed issues at the user, system and network levels using various tools and utilities.
- Scheduled backup jobs by implementing cron schedules during non-business hours.
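Off-hours backup scheduling of this kind is expressed as crontab entries; a sketch (script paths and times are illustrative):

```
# m  h  dom mon dow  command
30 1  *   *   *   /usr/local/bin/backup_home.sh >> /var/log/backup_home.log 2>&1
0  2  *   *   0   /usr/local/bin/full_backup.sh >> /var/log/full_backup.log 2>&1
```

The first entry runs a nightly backup at 01:30; the second runs a full backup at 02:00 every Sunday, both well outside business hours.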