- Over 7 years of experience in large-scale Information Technology projects.
- Around 3 years of experience implementing Big Data analytics on the Hadoop ecosystem.
- Maintained Hadoop clusters for Development/Quality/Production environments.
- Experience with Hortonworks & Cloudera Manager Administration.
- Excellent knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN, Resource Manager, Node Manager & MapReduce.
- Hands-on experience installing and configuring Apache Hadoop clusters on Linux platforms.
- Adept in installing, configuring and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Pig, Hive, Sqoop, Zookeeper, Flume and Kafka.
- Good experience in implementing Kerberos & Ranger in Hadoop Ecosystem.
- Provided level 1, 2 & 3 technical support.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database systems and vice-versa.
- Strong knowledge of Hadoop platforms and other distributed data processing platforms.
- Monitor Hadoop cluster using tools like Nagios, Ganglia, Ambari & Cloudera Manager.
- Investigated new technologies like Spark to keep up with industry developments.
- Good Knowledge of Informatica and SAS.
- Experience creating reports using Tableau.
- Strong knowledge in writing Shell Scripts.
- Demonstrated leadership capability and ability to effectively communicate with all levels of management.
- Worked with business users to extract clear requirements to create business value.
- Worked with big data teams to move ETL tasks to Hadoop.
- Exceptionally well organized, self-motivated, and dedicated, with demonstrated creativity, initiative, and the ability to learn new technologies within a short span of time.
- Achieved SLA/SLO standards.
- Strong experience in 24x7 environments.
- Good experience in both Waterfall & Agile methodologies.
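As a sketch of the Sqoop import/export work summarized above, a typical command pair might look like this (the host, database, table and path names here are hypothetical placeholders, not values from any actual engagement):

```shell
# Import an RDBMS table into HDFS as comma-delimited text
# (connection string, credentials file and paths are illustrative)
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user --password-file /user/etl/.dbpass \
  --table orders \
  --target-dir /data/raw/orders \
  --fields-terminated-by ',' \
  --num-mappers 4

# Export processed results from HDFS back to a reporting table
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user --password-file /user/etl/.dbpass \
  --table order_summary \
  --export-dir /data/processed/order_summary \
  --input-fields-terminated-by ','
```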
Hadoop: HDFS, MapReduce, Yarn, Hive, HBase, Impala, Hue, HCatalog, Pig, Sqoop, Flume, Oozie, Zookeeper, Ranger, Kerberos, Knox, Cassandra
Operating System: UNIX, Linux, Windows 95/98/2000/XP/Vista/7/8, MS-DOS
Database Languages: SQL, PL/SQL
NoSQL Databases: HBase, Cassandra
Hadoop Cloud: AWS, Rackspace, S3
Hadoop Monitoring & Management: Ganglia, Ambari, Nagios, Cloudera Manager
Hadoop Distributions: Hortonworks HDP, AWS, Cloudera CDH
Other Tools: MS Office (Excel, Word, PowerPoint), Adobe Photoshop, Macromedia Flash
Confidential, Oaks, PA
- Helped the team to increase cluster size from 35 nodes to 80+ nodes. The configuration for additional data nodes was managed using Puppet.
- Performed both major and minor upgrades to the existing Hortonworks Hadoop cluster.
- Integrated Hadoop with Active Directory and enabled Kerberos for Authentication.
- Applied patches and bug fixes on Hadoop Clusters.
- Worked on setting up High Availability for major production cluster and designed automatic failover.
- Performance-tuned the Hadoop cluster.
- Experience setting up a new cluster from an existing cluster using Ambari blueprints.
- Configured Flume for efficiently collecting, aggregating and moving large amounts of log data.
- Configured Hive metastore with MySQL, which stores the metadata for Hive tables.
- Set up YARN queues to meet application teams' SLAs and ensure robust capacity scheduling.
- Worked on Apache Ranger for HDFS, HBase, Hive access and permissions to the users through active directory.
- Worked on different compression codecs and ORC file format.
- Worked on integration of Hiveserver2 with Tableau.
- Benchmarked the Hadoop cluster using DFSIO, TeraGen and TeraSort.
- Installed and configured Disaster Recovery using Falcon.
- Enabled Kerberos authentication for the Hadoop cluster and integrated it with Active Directory for managing users and application groups.
- Moved data from HDFS to RDBMS and vice versa using Sqoop.
- Analyzed and transformed data with Hive and Pig.
- Mentored and supported team members in managing Hadoop clusters.
- Reviewing and updating all configuration & documentation of Hadoop clusters as part of continuous improvement processes.
- Performed data completeness, correctness, data transformation & data quality testing using available tools & techniques.
- Wrote Shell Scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Developed MapReduce jobs in Java for data cleaning and preprocessing.
- Provided rotational on-call support and educated users on new implementations during maintenance.
Environment: Hortonworks Data Platform, Linux (RHEL, CentOS), HBase, Hive, Pig, Kerberos, Ranger, Knox, Sqoop, Oozie, Zookeeper, Capacity Scheduler, Fair Scheduler, Shell Scripting, MySQL, Ganglia, Nagios, Git, Jenkins.
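A minimal sketch of the daemon health-check scripting mentioned above, assuming the node is expected to run a NameNode and a ResourceManager (the daemon list is illustrative; a real node's list depends on its role):

```shell
#!/bin/sh
# Compare a `jps` listing against the Hadoop daemons expected on this node.
EXPECTED="NameNode ResourceManager"

# check_daemons JPS_OUTPUT -> prints "MISSING: <daemon>" for each expected
# daemon absent from the listing; prints "OK" when all are running.
check_daemons() {
    jps_out="$1"
    missing=0
    for d in $EXPECTED; do
        if ! printf '%s\n' "$jps_out" | grep -qw "$d"; then
            echo "MISSING: $d"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ] && echo "OK"
}

# On a live node one would call: check_daemons "$(jps)"
# Canned output is used here so the sketch is self-contained.
check_daemons "12345 NameNode
23456 Jps"
```

A cron entry would run this periodically and page or restart on any `MISSING` line.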
Confidential, Sacramento, CA
Hadoop Technical Consultant
- Installation and Administration of a Hadoop cluster using Cloudera Distribution.
- Debugging and troubleshooting the issues in development and Test environments.
- Commissioned data nodes as per the cluster capacity and decommissioned when the hardware degraded or failed.
- Designed, configured and managed the backup and disaster recovery for HDFS data.
- Designed workflows scheduling Hive processes for file data that was streamed into HDFS using Flume.
- Designed and developed ETL workflows using Oozie for business requirements, including automating the extraction of data from a MySQL database into HDFS using Sqoop scripts.
- Performed systems monitoring, upgrades, performance tuning.
- Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
- Performed migration of data across clusters using DistCp.
- Implemented schedulers on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Used Cloudera Navigator for Hadoop audit files and data lineage.
- Monitored Hadoop clusters using the Cloudera Manager UI and Ganglia, and provided 24x7 on-call support.
- Involved in implementing High Availability and automatic failover infrastructure to overcome single point of failure for Name node utilizing zookeeper services.
- Worked on Impala performance tuning with different workloads and file formats.
- Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Sqoop and Pig Latin.
- Provided user support and application support on the Hadoop infrastructure.
- Worked with big data developers, designers and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig and Flume.
- Performed Linux server administration for the server hardware and operating system.
Environment: CDH, Cloudera Manager, RHEL, Hive, Pig Latin, Oozie, SQOOP, HBase, Cassandra, AWS, S3, Ganglia.
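The cross-cluster data migration noted above can be sketched with a single DistCp invocation of this shape (NameNode hosts, paths and mapper count are placeholders):

```shell
# Sync a warehouse directory from one cluster to another with DistCp;
# -update copies only changed files, -delete removes extras on the target
hadoop distcp \
  -update -delete \
  -m 20 \
  hdfs://source-nn:8020/data/warehouse \
  hdfs://target-nn:8020/data/warehouse
```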
Confidential, Dallas, TX
- Installed, configured and deployed a 30 node Cloudera Hadoop cluster for development and production.
- Performance-tuned the Hadoop cluster.
- Loaded data from different sources (Teradata and DB2) into HDFS using Sqoop and loaded it into Hive tables.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
- Developed Hive queries for Analysis across different banners.
- Pulled data from relational databases into Hive on the Hadoop cluster using Sqoop import for visualization and analysis.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Used Ganglia and Nagios for monitoring the cluster around the clock.
- Used Nagios plugins to monitor NameNode health status and the number of TaskTrackers and DataNodes running.
- Designed and implemented a distributed network monitoring solution based on Nagios and Ganglia using Puppet.
- Automated backups and recurring jobs for other applications using cron.
- Provided 24x7 on-call support on a rotational basis.
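The cron-driven backup automation described above typically reduces to crontab entries of this form (scripts, paths and schedules here are hypothetical):

```shell
# m h dom mon dow  command
# Nightly MySQL backup at 01:30, log appended for audit
30 1 * * * /opt/scripts/backup_mysql.sh >> /var/log/backup_mysql.log 2>&1
# Weekly log rotation, Sundays at 03:00
0  3 * * 0 /opt/scripts/rotate_logs.sh  >> /var/log/rotate_logs.log  2>&1
```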
Confidential, New York
LINUX System Administrator
- Installed and configured Red Hat Linux, Solaris, Fedora and CentOS on new server builds as well as during upgrades.
- Performed log management, such as monitoring and cleaning old log files.
- Produced system audit reports covering number of logins, successes & failures, and running cron jobs.
- Reported on system performance on an hourly or daily basis.
- Remotely copied files using SFTP, FTP, SCP, WinSCP and FileZilla.
- Created user roles and groups for securing the resources using local OS authentication.
- Configuring & monitoring DHCP server.
- Took backups using tar and performed recovery after data loss.
- Experience in writing bash scripts for job automation.
- Maintained relations with project managers, DBAs, developers, application support teams and operational support teams to facilitate effective project deployment.
- Manage system installation, troubleshooting, maintenance, performance tuning, managing storage resources, network configuration to fit application and database requirements.
- Responsible for modifying and optimizing backup schedules and developing shell scripts for it.
- Performed regular installation of patches using RPM and YUM.
- Managed various file systems using LVM and SVM; configured file systems over the network using NFS, NAS and SAN, and installed RAID devices.
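The tar-based backup and recovery routine above can be illustrated with a self-contained sketch; a temporary directory stands in for a real configuration tree:

```shell
#!/bin/sh
# Minimal backup-and-restore cycle with tar, using a scratch directory
# so the example runs anywhere. Paths are placeholders.
set -e

workdir=$(mktemp -d)
mkdir -p "$workdir/etc_app"
echo "key=value" > "$workdir/etc_app/app.conf"

# Backup: archive the directory with gzip compression
tar -czf "$workdir/etc_app.tar.gz" -C "$workdir" etc_app

# Simulate data loss, then recover from the archive
rm -rf "$workdir/etc_app"
tar -xzf "$workdir/etc_app.tar.gz" -C "$workdir"

cat "$workdir/etc_app/app.conf"   # restored file is intact
rm -rf "$workdir"
```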
Confidential, Boston, MA
- Installed and maintained Linux servers.
- Good knowledge of Apache Web Server, mail servers, database servers, DNS servers, backup management, server security management, live migration between servers remotely, virtualization (OpenVZ) and RAID devices.
- MySQL Database - tuned MySQL server parameters, repaired crashed tables, handled error logging and database maintenance, and ensured remote MySQL availability.
- Monitoring System Metrics and logs for any problems.
- Ran crontab & autosys jobs to back up MySQL data.
- Installing and updating packages using YUM.
- Performance Monitoring - DDoS monitoring, load balancing, remote desktops, VNC, etc.
- Application Protocols - excellent understanding of SMTP, HTTP, FTP, PPTP and OpenVPN. Systems/Hardware - IPMI, KVMs.
- Security - iptables, CSF, SSH. Prevented DDoS attacks using iptables and other security tools.
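The iptables-based DDoS mitigation mentioned above usually combines per-source connection throttling with an overall SYN-rate cap; a sketch follows (thresholds are placeholders and depend on the traffic profile):

```shell
# Throttle new SSH connections: drop a source making 5+ attempts in 60s
iptables -A INPUT -p tcp --dport 22 -m state --state NEW \
         -m recent --set --name SSH
iptables -A INPUT -p tcp --dport 22 -m state --state NEW \
         -m recent --update --seconds 60 --hitcount 5 --name SSH -j DROP

# Cap the overall rate of new HTTP connections to dampen SYN floods
iptables -A INPUT -p tcp --dport 80 --syn \
         -m limit --limit 50/s --limit-burst 100 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 --syn -j DROP
```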