- Cloudera Certified Administrator for Apache Hadoop.
- Multi-talented, cross-functional technical professional with progressive experience in Hadoop cluster administration, Linux systems, security and technical support within large-scale IT portfolios and service projects.
- 10 years of extensive IT experience, including application development, data warehousing, systems administration, monitoring and troubleshooting on UNIX, Linux, Hadoop and Teradata environments.
- 3+ years of experience in Hadoop cluster administration with Cloudera and Hortonworks distributions.
- Adding and removing nodes with Cloudera Manager, data rebalancing, and maintaining NameNode backups.
- Skilled in: Hadoop Cluster Installation, Administration and maintenance, Shell/Bash Scripting automation.
- Cluster maintenance as well as creation and removal of nodes using tools like Cloudera Manager Enterprise, iDRAC (Integrated Dell Remote Access Controller), Dell OpenManage and other tools.
- Excellent troubleshooting skills in Hardware, Software, Application and Network.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, ResourceManager, NodeManager, NameNode and DataNode, and of MapReduce concepts.
- Extensively worked on commissioning and decommissioning of cluster nodes, replacing failed disks, file system integrity checks and maintaining cluster data replication.
- Installation and administration of various Hadoop distributions such as Cloudera and Hortonworks.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Hadoop integration with Business intelligence tools
- Namenode and Resource Manager High Availability
- General system performance monitoring for machines running a Hadoop cluster with Nagios (for parameters like disk space, disk partitions, etc.); managing Kerberos authentication for Hadoop.
- Management tools: Ambari, Cloudera Manager.
- Participate in the research, design, and implementation of new technologies for scaling our large and growing data sets, for performance improvement, and for analyst workload reduction.
- End-to-end performance tuning of Hadoop clusters.
- Monitoring Hadoop cluster job performance and capacity planning; HDFS support and maintenance.
- Grew and shrank Hadoop clusters by adding and removing nodes, tuning servers for optimal cluster performance, and rebalancing load using the NameNode and ResourceManager UIs.
- Hadoop Cluster capacity planning, performance tuning, cluster Monitoring, Troubleshooting.
- Involved in benchmarking Hadoop cluster file systems with various batch jobs and workloads.
- Experience in HDFS data storage and support for running map-reduce jobs.
- Support the integration needs of Hadoop with ETL & Reporting tools.
- Extensive knowledge of data warehousing concepts, reporting, and relational databases.
- Hadoop upgrades: CM and CDH upgrades (CDH 5.4, CDH 5.8, CDH 5.9).
Hadoop/Big Data: HDFS, HBase, Hive, Sqoop, Oozie, Flume, Pig, Python, MapReduce and Spark.
BI tools: Ab Initio.
NoSQL Databases: HBase
Database: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata.
Operating Systems: HP-UX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Scheduling tools: Autosys, Tivoli, Control-M, CA7.
Languages: SQL, C, C++, Core Java, AWK, Shell Scripting.
- Installed, configured and maintained a Hadoop cluster for application development, along with Hadoop ecosystem components like Hive, Pig, HBase, ZooKeeper and Sqoop.
- Extensively worked with Cloudera Distribution Hadoop CDH 5.x
- Extensively involved in Cluster Capacity planning, Hardware planning, Installation, Performance Tuning of the Hadoop Cluster.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, etc.
- Installed and configured the Hue interface for UI access to Hadoop components like Hive, Pig, Oozie, Sqoop, HBase, the file browser, etc.
- Provided timely and reliable support for all production and development environments: deploy, upgrade, operate and troubleshoot.
- Helped tune Hive queries for better performance.
- Configured a data lake that serves as a base layer for storing and analyzing data flowing from multiple sources into the Hadoop platform.
- Provided support to developers: installed their custom software, upgraded Hadoop components, resolved their platform issues, and helped them troubleshoot long-running jobs.
- Implemented both major and minor version upgrades on the existing cluster, rolling back to the previous version when needed.
- Performed daily status checks on Oozie workflows, monitored Cloudera Manager, and checked DataNode status to ensure nodes were up and running.
- Expertise in performing Hadoop cluster tasks like commissioning and decommissioning of nodes without affecting running jobs or data.
- Used Sqoop import and export extensively.
- Worked extensively with Spark using Python and Scala.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating keytab files for each service, and managing keytabs using keytab tools.
- Configured NameNode high availability and ResourceManager high availability.
- Resolved various platform-related issues faced by users.
- Worked directly with vendors, partners and internal clients on gathering and refining technical requirements and designs in order to develop a working solution that addressed needs.
- Acted as point of contact for workflow failures and hitches.
- Worked round the clock especially during deployments.
- Monitored and maintained the Hadoop cluster (Hadoop, HBase, ZooKeeper) using Ganglia.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, Flume, Java (jdk1.6), Cloudera CDH 5.4, CDH 5.8, CDH 5.9, CentOS 6.6, UNIX Shell Scripting.
Confidential, Hartford, CT
- Responsible for setting up 24/7 Support on Big Data Clusters.
- Involved in log file management: Hadoop logs older than 7 days were removed from the log folder, loaded into HDFS, and retained for 2 years for audit purposes.
- Responsible for availability of clusters and onboarding of projects into Big Data clusters.
- Automate common maintenance and BAU activities.
- Collaborate with cross-functional teams to ensure that applications are properly tested, configured, and deployed.
- Cluster maintenance including adding and removing cluster nodes; cluster Monitoring and Troubleshooting
- Extensively worked on cluster configuration files like hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, etc.
- Strong knowledge of open source system monitoring and event handling tools like Nagios and Ganglia.
- Experience in troubleshooting cluster problems by analyzing logs and setting log levels, and in fixing misconfiguration and resource exhaustion problems.
- Worked on troubleshooting performance issues and tuning Hadoop cluster.
- Managed the cluster and troubleshot issues using Cloudera Manager.
- Involved in the Complete Software development life cycle (SDLC) to develop the application.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, HBase and Sqoop.
- Involved in loading data from the Linux file system to HDFS.
- Experience in managing and reviewing Hadoop log files.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented test scripts to support test-driven development and continuous integration.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Analyzed large data sets by running Hive queries.
- Mentored the analyst and test teams in writing Hive queries.
- Installed the Oozie workflow engine to run multiple MapReduce jobs.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
Confidential, Hartford, CT
ETL - Ab Initio Administrator.
- Extensively worked on UNIX shell scripting & Autosys for automation.
- Administration of Ab Initio project directories, data directories and the standard environment on Red Hat Linux servers.
- Maintenance of UNIX shell scripting for creating project directories on the UNIX System Services side of Mainframe.
- Gathering business requirements from the Business Partners and Subject Matter Experts.
- Experience on UNIX commands and Shell Scripting.
- Understanding the business data model and customer requirements.
- Preparing high and low level designs of the system along with build strategy.
- Building various components i.e. Graphs, Scripts and Oracle Queries.
- Testing the changed and impacted components.
- Communicating timely status updates to the client on the progress of enhancements and code reviews.
- Checking end-to-end functionality of code, considering performance and the relevance of output to the requirements.
- Unit testing for all the solutions to ensure minimum defects.
- Support for troubleshooting and fixing defects.
- Performing smooth cycle execution from source to target to provide test/production data.
- Production batch monitoring and fixing job failures.
- Resolving the incidents raised by users.
- Execution of IA cycle for every release.
- Responding to defects raised by the QA team and supporting QA queries.
- Providing analysis reports in response to BA queries for upcoming releases.