- 13 years of professional IT experience, including 6 years in Hadoop technologies and extensive experience with Linux distributions.
- Experience implementing Data Warehousing/ETL solutions for the financial, telecom, loyalty, retail and insurance verticals.
- Experience operating and managing large clusters with 650+ nodes and 4+ petabytes of storage.
- Hands-on experience setting up and configuring Hadoop ecosystem components such as MapReduce, HDFS, HBase, Impala, Oozie, Hive, Sqoop, Pig, Spark, Flume, Kafka and Sentry.
- Experience planning, implementing, testing and documenting performance benchmarks for the Hadoop platform.
- Helped plan, develop and architect the Hadoop ecosystem.
- Experience in both on-premises and cloud environments: AWS, GCP and Azure.
- Experience securing Hadoop clusters through Kerberos KDC installation, LDAP integration, data-in-transit encryption with TLS, and data-at-rest encryption with Cloudera Navigator Encrypt.
- Experience designing, configuring and managing backup and disaster recovery using data replication, snapshots and Cloudera BDR utilities.
- Implemented role-based authorization for HDFS, Hive and Impala using Apache Sentry.
- Good knowledge of implementing and using cluster monitoring tools such as Cloudera Manager, Ganglia and Nagios.
- Experienced in implementing and supporting auditing tools like Cloudera Navigator.
- Knowledge of implementing external authentication with SAML identity providers (IdPs) such as Okta.
- Experience implementing High Availability for services such as NameNode, Hue and Impala.
- Hands-on experience deploying and using automation tools such as Puppet for cluster configuration management.
- Experience creating cookbooks/playbooks and documentation for installation, upgrade and support projects.
- Participated in application onboarding meetings with application owners and architects, helping them identify and review the technology stack and use cases and estimate resource requirements.
- Experience in documenting standard practices and compliance policies.
- Participated in and led the CDH4 and CDH5 upgrades.
- Resolved issues by working with dependent and support teams, and logged cases according to priority.
- Assisted with performance tuning and monitoring of the Hadoop ecosystem.
- Hands-on experience performing functional testing and helping application teams/users incorporate third-party tools into the Hadoop environment.
- Experience analyzing log files to identify the root causes of failures, and taking or recommending corrective action.
- Experience in Data Warehousing and ETL processes.
- Knowledge of integrating reporting tools such as Tableau, MicroStrategy and Datameer.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing job counters and application log files.
- Experienced in tuning performance for various services
- Experienced with job scheduling and monitoring tools such as Control-M, Nagios and Ganglia.
- Additional responsibilities included interacting with the offshore team daily, communicating requirements, delegating tasks to offshore/on-site team members and reviewing their deliverables.
- Good experience managing Linux servers.
- Effective problem-solving skills and ability to learn and use new technologies/tools quickly
- Good knowledge of Bash shell scripting.
- Experience working with ITIL tools such as JIRA, SUCCEED and ServiceNow for change management and support processes.
- Excellent communication and interpersonal skills that contribute to completing project deliverables well ahead of schedule.
- Experience providing 24x7x365 on-call and weekend production support.
Operating Systems/Platforms: UNIX & Linux (CentOS 6, RHEL 6), Ubuntu 14.x, AIX, Windows
Programming Languages: C, C++, Java, Pig Latin, SQL, HQL
Cloud Computing Services: VMware, AWS, Google Cloud, Microsoft Azure
CRM Package: Siebel 7.x, Siebel 8.x
SQL & NoSQL Data Storage: PostgreSQL, MySQL, Cassandra, MongoDB, Teradata, Oracle
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, YARN, Flume, Impala, Ganglia, Storm, Cassandra, Sentry, Kafka
Management Tool: Cloudera Manager, Ambari
Application Servers: WebLogic 11g and 12c, Tomcat 5.x and 6.x
ETL Tools: Informatica 8.x and 9.x, BODS 4.0/4.1, Talend
Reporting Tools: BI Publisher, Web Intelligence, Tableau, MicroStrategy, Datameer
SCM Tools: Perforce, TeamTrack, VSS, Harvest, SVN, HP Quality Center, Jira
Methodology: Agile SDLC, UML
Scripting Languages: Bash, Perl, Pig, Python, Puppet
Security: Kerberos, Sentry, LDAP, AD, SSL/TLS, data-at-rest encryption
Protocols: TCP/IP, UDP, SNMP, Socket Programming, Routing Protocol
Hadoop Administrator / Hadoop Infrastructure Architect
Confidential - Hoffman Estates, IL
- Created a new project solution aligned with the company's technology direction, and ensured that infrastructure services were planned according to current standards.
- Upgraded the cluster from CDH 4.x to CDH 5.x.
- Implemented HA for the NameNode and Hue using Cloudera Manager.
- Created and configured the Cloudera Manager monitoring services: Activity Monitor, Service Monitor, Reports Manager, Event Server and Alert Publisher.
- Created cookbooks/playbooks and documentation for special tasks.
- Configured HAProxy for the Impala service.
- Wrote DistCp scripts for synchronizing data within and across clusters.
- Created HDFS snapshots for in-cluster backup of data.
- Created Sqoop scripts for ingesting data from transactional systems into Hadoop.
- Regularly used JIRA, ServiceNow and other internal issue trackers for project development.
- Conducted technology evaluation sessions on Big Data, data governance, Hadoop, Amazon Web Services, Tableau and R, data analysis, statistical analysis and data-driven business decisions.
- Integrated Tableau, Teradata, DB2 and Oracle with Hadoop via ODBC/JDBC drivers.
- Worked with application teams to install operating systems, Hadoop updates, patches and version upgrades as required.
- Created scripts to automate balancing data across the cluster using the HDFS Balancer utility.
- Created a POC implementing a streaming use case with Kafka and HBase.
- Created and maintained MySQL databases, set up users and maintained database backups.
- Implemented Kerberos authentication for the existing cluster.
- Integrated the existing LLE and production clusters with LDAP.
- Implemented TLS for CDH Services and for Cloudera Manager.
- Worked with data delivery teams to set up new Hadoop users, which included creating Linux users, setting up Kerberos principals and testing HDFS and Hive access.
- Managed the backup and disaster recovery for Hadoop data. Coordinated root cause analysis efforts to minimize future system issues
- Served as lead technical infrastructure Architect and Big Data subject matter expert.
- Deployed Big Data solutions in the cloud. Built, configured, monitored and managed end-to-end Big Data applications on Amazon Web Services (AWS).
- Monitored Hadoop cluster job performance and performed capacity planning.
- Spun up clusters in Azure using Cloudera Director as a POC for the cloud migration project.
- Leveraged AWS cloud services such as EC2, auto-scaling and VPC to build secure, highly scalable and flexible systems that handled expected and unexpected load bursts
- Defined the migration strategy to move the application to the cloud. Developed architecture blueprints and detailed documentation. Created a bill of materials, including the required cloud services (such as EMR, EC2 and S3) and tools, and scheduled cron jobs on EMR.
- Frequently created Bash scripts as project requirements dictated.
- Worked on GCP cloud architecture design patterns.
- Guided application teams in choosing the right file formats for HDFS (Text, Avro, Parquet) and compression codecs (Snappy, bzip2, LZO).
- Improved communication between teams in the matrix environment, which led to an increase in the number of simultaneous projects and average billable hours.
- Substantially improved all areas of the software development life cycle for the company products, introducing frameworks, methodologies, reusable components and best practices to the team
- Implemented VPC, Auto Scaling, S3, EBS, ELB, CloudFormation templates and CloudWatch services from AWS.
Environment: 1,500+ nodes, approximately 5 PB of data, Cloudera's Distribution including Apache Hadoop (CDH) 5.5, HA NameNode, MapReduce, YARN, Hive, Impala, Pig, Sqoop, Flume, Cloudera Navigator, Control-M, Oozie, Hue, White Elephant, Ganglia, Nagios, HBase, Cassandra, Kafka, Storm, Cobbler, Puppet
Sr. Hadoop Consultant
Confidential - Chicago, IL
- Designed and implemented complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie and ZooKeeper.
- Used Sqoop to migrate data between HDFS and MySQL or Oracle, and deployed Hive-HBase integration to perform OLAP operations on HBase data.
- Designed, planned and delivered a proof of concept and business function/division based implementation of a Big Data roadmap and strategy project
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Involved with the MapR Converged Data Platform, which was built with data movement in mind, with a real-time
- Exported the analyzed data to databases such as Teradata, MySQL and Oracle using Sqoop, for visualization and report generation by the BI team.
- Used the Oozie scheduler to automate pipeline workflows and orchestrate the Sqoop, Hive and Pig jobs that extract data in a timely manner.
- Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Ran periodic MapReduce jobs to load data from Cassandra into Hadoop.
- Created Hive tables as required, either internal or external, defined with appropriate static and dynamic partitions for efficiency.
- Analyzed Cassandra and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
- Transformed data using Hive and Pig for the BI team to perform visual analytics, according to client requirements.
- Developed scripts to automate data management end to end and to keep all clusters in sync.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
Environment: Cloudera CDH 3/4 Distribution, HDFS, MapReduce, Cassandra, Hive, Oozie, Pig, Shell Scripting, MySQL
Sr. Hadoop Administrator
Confidential, New York, NY
- Strong working experience with open-source technologies.
- Stored unstructured data in semi-structured form in HDFS using HBase.
- Followed change management and incident management processes according to company standards.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Demonstrated live proofs of concept to clients.
- Supported technical team members in the management and review of Hadoop log files and data backups.
- Suggested process improvements for all automation scripts and tasks.
Environment: Hadoop, MapReduce, Hive, Pig, Oozie, HBase, Sqoop, Flume, Java, MySQL, Eclipse, UNIX shell scripting
Confidential, Los Angeles, CA
- Prepared test packs, release notes and deployment checklists.
- Maintained Siebel Server components and parameters.
- Deployed new releases to different environments.
- Remote administration: managed and monitored all Siebel Remote components.
- Attended P1/P2 calls to resolve issues.
Environment: Siebel 8.x, Oracle, AIX, Harvest, OBIEE
- Performed Siebel upgrades and patch installations.
- Analyzed and resolved production incident tickets in adherence to SLAs.
- Prepared deployment plans for major upgrade releases
- Monitored components and server utilization, and troubleshot server issues.
Environment: Siebel 8.x, Oracle, Linux
- Compiled the SRF in both English and Spanish.
- Responsible for repository migration.
- Siebel Installations, Upgrades, Applying patches & Repository Migrations
- Performed health checks on the production servers.
Environment: Siebel 8.x, Oracle, Linux
- Undertook Siebel Server administration responsibilities.
- Installation and configuration of third party products
- Non-Repository Migrations to various testing environments and troubleshooting Migration Issues
- Set up and maintained offshore developers' data (data extracts and synchronization).
- Created users at the application and database levels.
Environment: Siebel 7.x, UNIX, Solaris servers, Windows 2000, Oracle 10g
- Involved in maintenance and enhancement of Wyse Linux V6.
- Tailored Linux to suit the hardware requirements of Wyse thin clients.
- Involved in bug fixing and improvement of the overall Linux product.
- Fixed problems arising from window manager issues (qvwm window manager).
Environment: Linux, GTK, C++, Java, GNU C Compiler