- 10 years of professional IT experience, including 7+ years of Hadoop administration on Hortonworks (HDP) and Cloudera distributions.
- Expertise in HDFS Architecture and Cluster concepts.
- Strong in shell scripting and programming.
- Experienced in installing, configuring, supporting, and monitoring 30+ node Hadoop clusters using Hortonworks HDP 2.6 and Cloudera CDH5 distributions.
- Experience in upgrading Hortonworks Hadoop HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-node clustered environment.
- Maintained and optimized AWS infrastructure (EMR, EC2, AMI, IAM roles for users and systems).
- Knowledge of Hadoop HDFS architecture and the MRv1 and MRv2 (YARN) frameworks.
- Created Hive tables, loaded data into them, and queried the data using HQL.
- Managed and reviewed Hadoop log files.
- Experienced in securing Hadoop clusters with Kerberos and integrating them with LDAP/AD at the enterprise level.
- Experience in using Pig, Hive, Sqoop, Flume, Zookeeper and Kafka.
- Experience configuring high availability for the Resource Manager, Name Node, and Hive Server 2.
- Expertise in Hadoop Security and Hive Security and validation.
- Expertise in Hive Query Language and in debugging Hive metastore issues.
- Moved RHAS machines from physical to virtual, including VMware ESX 3.5 and vSphere 4.0.
- Installed and upgraded VMware Tools on client machines.
- Monitored Hadoop cluster using tools like Cloudera Manager 5.5.1, Nagios, Ganglia and Ambari.
- Managed Patches, Upgrades and Licensed Products for System software on all flavors of UNIX and Linux Servers.
- Investigated new technologies such as Spark to keep up with industry developments.
- Excellent understanding and knowledge of NoSQL databases such as HBase.
Programming Languages: Java, Scala
Shell Scripting: Bash, Python
Hadoop Distributions: Hortonworks, Cloudera
Virtualization: VMware ESXi 6
NoSQL Databases: HBase
Big Data Ecosystems: YARN, MapReduce, HDFS, Hive, Pig, Sqoop, Kafka, Zookeeper, Oozie, Spark.
Operating Systems: Windows variants, Linux (Ubuntu, CentOS).
Cloud Technologies: Amazon AWS - EC2
- Supported 200+ servers and 50+ users on the Hadoop platform: resolved the tickets and issues they ran into, trained users to make Hadoop easier to use, and kept them updated on best practices.
- Installed, configured, and optimized Hadoop infrastructure on the Cloudera CDH5 distribution using Puppet.
- Good understanding of the architecture, configuration, and security of Cassandra, including its data read and write paths.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Developed simple and complex MapReduce programs in Java for Data Analysis.
- Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts, and visualized the streaming data in Tableau dashboards.
- Managed and reviewed Hadoop log files, handled file system management, and monitored the Hadoop cluster for capacity planning.
- Hands-on experience with ecosystem tools such as Hive, Pig, Sqoop, MapReduce, YARN, and ZooKeeper. Strong knowledge of Hive's analytical functions.
- Wrote Flume configuration files to store streaming data in HDFS.
- Upgraded Kafka from 0.8.2.2 to 0.9.0.0.
- As an admin, handled cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
- Ran analytics using MapReduce, Hive, and Pig on data in HDFS, sent the results back to MongoDB databases, and updated information in collections.
- Used a RESTful web services API to connect to the MapR table. Helped create the database connection, which was developed through the RESTful web services API.
- Currently working as a Hadoop admin responsible for everything related to the clusters, a total of 100 nodes ranging from POC to PROD.
- Built Cassandra clusters on both physical machines and AWS. Automated Cassandra builds, installation, and monitoring.
- Installed and configured OpenStack Juno on Red Hat 6 with multiple compute nodes.
- Loaded data from the UNIX file system into HDFS and created custom Solr query components to enable optimum search matching.
- Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Worked on YUM configuration and package installation through YUM.
- Installed and configured the CDH Hadoop environment; responsible for maintaining the cluster and for managing and reviewing Hadoop log files.
- Used Jira for project tracking, bug tracking, and project management.
- Loaded data from various data sources into HDFS using Kafka.
- Installed Kafka on the Hadoop cluster and wrote the producer and consumer code in Java to stream data for popular hashtags from Twitter into HDFS.
- Implemented partitioning, dynamic partitions, and buckets in Hive. Wrote Pig Latin scripts for the analysis of semi-structured data.
- Involved in requirement analysis, design, development, data design and mapping, extraction, validation, and creation of complex business requirements.
Environment: CDH4.7, Hadoop-2.0.0 HDFS, MapReduce, MongoDB-2.6, Hive-0.10, Sqoop-1.4.3, Oozie-3.3.4, Zookeeper-3.4.5, Hue-2.5.0, Jira, WebLogic 8.1, Kafka, YARN, Impala, Pig, Scripting, MySQL, Red Hat Linux, CentOS and other UNIX utilities, Cloudera Manager.
Confidential, New Brunswick, NJ
- Good knowledge of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Zookeeper, and Sqoop.
- Involved in migrating a Java test framework to Python Flask.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Involved in migrating ETL processes from Oracle to Hive to test ease of data manipulation.
- Imported and exported data between Oracle and DB2 databases and HDFS and Hive using Sqoop.
- Experience working with Ranger to enable metadata management, governance, and audit.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Debugged and tuned PL/SQL code and optimized queries for Oracle, Teradata, and DB2 databases.
- Experience in methodologies such as Agile, Scrum, and test-driven development.
- Worked on cluster installation, commissioning and decommissioning of Data Nodes, Name Node recovery, capacity planning, Cassandra, and slots configuration.
- Installed, configured, and administered a small 10-node Hadoop cluster. Monitored the cluster for performance, networking, and data integrity issues.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Architected and designed a 30-node Hadoop Innovation Cluster with Sqrrl, Spark, Puppet, and HDP 2.2.4.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users and Kerberos principals and testing HDFS and Hive.
- Performance-tuned a Cassandra cluster to optimize writes and reads. Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
- Installed the OS and administered the Hadoop stack on the Cloudera CDH5 distribution (with YARN), including configuration management, monitoring, debugging, and performance tuning.
- Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
- Scripted Hadoop package installation and configuration to support fully automated deployments.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
Environment: Hive, Pig, HBase, Zookeeper, Sqoop, ETL, Ambari 2.0, CentOS Linux, MongoDB, Cassandra, Ganglia and Cloudera Manager.
Confidential, Norwalk, CT
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files on Hortonworks, MapR, and Cloudera clusters.
- Responsible for architecting Hadoop clusters with Hortonworks distribution platform HDP 1.3.2 and Cloudera CDH4.
- Experience in setting up data sources and configuring servlet engines and session managers, including planning, installation, and configuration of WebLogic application servers.
- Used the Configuration Wizard and WLST scripts to create and manage WebLogic domains.
- Involved in setting up a cluster environment for WebLogic Server integrated with multiple workflows.
- Handled mainframe batch job abends and critical batches through OPC/TWS on a priority basis to ensure production cycles were not delayed.
- Responsible for onboarding new users to the Hadoop cluster (adding a user home directory and providing access to the datasets).
- Wrote Pig scripts to load and aggregate the data.
- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, the HBase database, and Sqoop.
- Experience in commissioning, decommissioning, balancing, and managing nodes and in tuning servers for optimal cluster performance.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Installed and configured Hive.
- Worked extensively in UNIX environments and with shell scripting.
- Helped the users in production deployments throughout the process.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes. Communicated and escalated issues appropriately.
- Added new Data Nodes when needed and ran balancer. Responsible for building scalable distributed data solutions using Hadoop.
- Worked with the Cassandra database to analyze how data is stored.
- Continuously monitored and managed the Hadoop cluster through Ganglia and Nagios.
- Wrote complex Hive queries and UDFs in Java and Python.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability. Also performed major and minor upgrades to the Hadoop cluster.
- Upgraded the Cloudera Hadoop ecosystem in the cluster using Cloudera distribution packages. Performed stress testing, performance testing, and benchmarking for the cluster.
- Commissioned and decommissioned Data Nodes in the cluster when problems arose.
- Debugged and resolved major issues with Cloudera Manager by working with the Cloudera support team.
Environment: Flume, Oozie, Cassandra, WebLogic, Pig, Sqoop, MongoDB, HBase, Hive, MapReduce, YARN, Hortonworks and Cloudera Manager.
- Managing resolution of all ICT enquiries escalated to the Helpdesk in a timely and efficient manner for remote and onsite users, maintaining a log of software and hardware issues detected for future investigation.
- Administration of Active Directory, DNS and DHCP technologies on Microsoft Windows Server 2008.
- Installed new servers and rebuilt and upgraded existing ones; configured hardware, peripherals, services, settings, directories, HP StorageWorks, and printers.
- Perform ongoing performance tuning, hardware upgrades and resource optimization as required and regularly monitor security to identify any possible intrusions.
- Identify approaches that leverage ICT resources and provide economies of scale.
- Performed daily backup operations using Symantec Backup Exec: ensured all required file systems and system data were backed up to the appropriate media, created recovery tapes or disks, recycled media and sent it off site as necessary, and verified completion of scheduled jobs such as daily and weekly backups, including those at remote sites.
- Exploring opportunities for improvement and innovation of ICT systems, and assessing future ICT needs in consultation with users and stakeholders.
- Documented all ICT policies, processes, and work instructions in line with organization requirements.
Environment: Windows Server 2008.