- Certified administrator with 8+ years of experience in the IT industry in software development and integration of various applications, including experience with Hadoop, HDFS, MapReduce, and the Hadoop ecosystem (Pig, Hive, HBase).
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper, and Flume.
- Good understanding of the Hadoop Distributed File System and ecosystem (MapReduce, Pig, Hive, Sqoop, and HBase).
- Technical expertise in Big Data/Hadoop: HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Oozie, NoSQL databases (HBase), SQL, and Unix scripting.
- Experience with leveraging Hadoop ecosystem components including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling and HBase as a NoSQL data store.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems.
- Experience in installing Hadoop clusters using different distributions, including Apache Hadoop, Cloudera, and Confidential.
- Around 4 years of working experience in setting up, configuring, and monitoring Hadoop clusters on Cloudera and Confidential distributions.
- Monitored Hadoop clusters using tools such as Nagios, Ganglia, Ambari, and Cloudera Manager.
- Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
- Installation, configuration, and administration of Hadoop clusters on major distributions such as Cloudera Enterprise (CDH3 and CDH4) and Hortonworks Data Platform (HDP1 and HDP2).
- Extensively used Pig for data cleansing. Proficient work experience with NoSQL databases such as MongoDB.
- Worked on implementing and integrating NoSQL databases such as HBase.
- Proficient in configuring Zookeeper and Flume on existing Hadoop clusters.
- Strong knowledge of Spark concepts such as RDD operations, caching, and persistence.
- Experience in designing and implementing secure Hadoop clusters using Kerberos.
- In-depth knowledge of JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Created Hive tables to store data in HDFS.
- Worked with NoSQL databases such as HBase and MongoDB for POC purposes.
- Good Knowledge on Hadoop Cluster architecture and monitoring the cluster.
- Exceptional organizational, multi-tasking, problem-solving, and leadership skills with a result-oriented attitude.
- Work effectively both independently and in teams with users, project managers, business analysts, and developers.
- Ability to interact with developers and product analysts regarding issues raised and following up with them closely.
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented with problem solving and leadership skills.
Hadoop/Big Data Technologies: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Oozie, Zookeeper, YARN, Kerberos, Ambari, and Talend.
Programming Languages: SQL, Java, Python, Pig Latin
Web Technologies: HTML, XML, Ajax, SOAP
Databases and Tools: Eclipse, MySQL, MS SQL Server, Oracle 10g, DB2, HBase (NoSQL), HDFS, Cassandra, MongoDB.
Operating Systems: Linux, Unix, Windows, Mac, CentOS
Confidential - San Ramon, CA
- Installed and configured Hadoop ecosystem components such as HBase and Flume.
- Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs and data.
- Managed and reviewed Hadoop Log files.
- Loaded log data into HDFS using Flume. Worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Deep knowledge and practical experience with Kerberos/LDAP and Hadoop security tools such as Falcon, Ranger, Knox, and Ambari.
- Experience using the Confidential platform and its ecosystem. Hands-on experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, Hive, and Flume.
- Experience on Apache Knox Gateway security for Hadoop Clusters.
- Developed Shell and Python scripts to automate the jobs
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Hive.
- Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments
- Involved in installing Hadoop Ecosystem components.
- Experienced in providing security to Hadoop cluster with Kerberos and integration with LDAP/AD at Enterprise level.
- Responsible for managing data coming from different sources.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Built massively scalable, multi-threaded applications for bulk data processing, primarily with Apache Spark and Pig on Hadoop.
- Developed scripts and batch jobs to schedule various Hadoop programs.
Environment: Hadoop, MapReduce, HBase, Tez, Hive, Pig, Sqoop, HDP 2.6, HDFS, Talend.
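The load-and-preprocess automation described in this role (Oozie workflows loading data into HDFS, then preprocessing with Hive) can be sketched as a small Python wrapper. This is a minimal sketch only; the file paths, HDFS directory, and Hive script name are hypothetical placeholders, not taken from the actual project.

```python
import shlex

def build_ingest_commands(local_path, hdfs_dir, hive_script):
    """Build the two commands an Oozie-style workflow would chain:
    load a local file into HDFS, then preprocess it with a Hive script.
    All paths here are hypothetical examples."""
    put_cmd = ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]
    hive_cmd = ["hive", "-f", hive_script]
    return [put_cmd, hive_cmd]

if __name__ == "__main__":
    # Print the commands instead of running them (no cluster needed here).
    for cmd in build_ingest_commands("/data/events.log", "/raw/events", "clean_events.hql"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

In a real deployment these two steps would be actions in an Oozie workflow.xml rather than a standalone script, so failures and retries are handled by the workflow engine.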
Confidential, Waukegan, IL
- Worked on SSL/TLS implementation.
- Configured SSL and performed troubleshooting in Hue.
- Responsible for building scalable distributed data solutions using Cloudera Hadoop.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms
- Enabled Kerberos for authentication and authorization.
- Enabled HA for NameNode, ResourceManager (YARN configuration), and the Hive Metastore.
- Configured JournalNodes and Zookeeper services for the cluster using Cloudera Manager.
- Monitored Hadoop cluster job performance and capacity planning.
- Monitored and reviewed Hadoop log files.
- Performed Cloudera Manager and CDH upgrades
- Took backups of critical data and Hive data, and created snapshots.
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and reviewing Hadoop log files.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Extracted data using Flume. Imported/exported data between HDFS and RDBMS using Sqoop.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Good knowledge of NoSQL databases such as HBase.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Performance tuning of Impala jobs and resource management in cluster.
Environment: MapReduce, HDFS, Hive, SQL, Oozie, Sqoop, UNIX Shell Scripting, Yarn, Talend.
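The Sqoop import/export pattern used in this role can be sketched with a small helper that assembles the import invocation. The JDBC URL, table, and target directory below are invented for illustration; only standard Sqoop flags (`--connect`, `--table`, `--target-dir`, `--num-mappers`) are used.

```python
def sqoop_import_cmd(jdbc_url, table, target_dir, num_mappers=4):
    """Assemble a Sqoop command that imports one relational table
    into an HDFS directory. Arguments are illustrative placeholders."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # e.g. jdbc:mysql://dbhost/sales
        "--table", table,                   # source RDBMS table
        "--target-dir", target_dir,         # destination HDFS directory
        "--num-mappers", str(num_mappers),  # parallel map tasks for the copy
    ]

if __name__ == "__main__":
    print(" ".join(sqoop_import_cmd("jdbc:mysql://dbhost/sales", "orders", "/user/etl/orders")))
```

The reverse direction (`sqoop export`) takes the same connect string with `--export-dir` pointing at the HDFS data to push back to the RDBMS.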
Confidential - Houston, TX
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper, Cassandra and Sqoop.
- Implemented High Availability Name Nodes using Quorum Journal Managers and Zookeeper Failover Controllers.
- Managed a 350+ node HDP 2.3 cluster with 4 petabytes of data using Ambari 2.0 on CentOS 7.
- Familiar with Hadoop security involving LDAP, Kerberos, and Ranger.
- Strong experience using Ambari to administer large Hadoop clusters of more than 100 nodes.
- After data transformation is complete, the transformed data is moved to a Spark cluster, where it goes live to the application using Spark Streaming and Kafka.
- Configured LDAP user management access.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Installed and configured Hadoop, MapReduce, HDFS (Hadoop Distributed File System)
- Collected the logs data from web servers and integrated in to HDFS using Flume.
- Set up Kerberos locally on a 5-node POC cluster using Ambari, evaluated cluster performance, and performed impact analysis of Kerberos enablement.
- Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
- Tuned complex SQL queries and debugged Talend job code for performance enhancement.
- Installed Oozie workflow engine to run multiple Hive and Pig Jobs.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Worked on importing and exporting data between Oracle/DB2 and HDFS/Hive using Sqoop.
- Created Hive external tables, loaded data into the tables, and queried data using HQL.
- Performed data analysis by running Hive queries.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users, creating Kerberos principals, and testing HDFS and Hive access.
Environment: Hive, Pig, HBase, Zookeeper, Cassandra, Hortonworks HDP 2.3, Python, Scala, Kafka, Spark, Talend, shell scripts, Flume, Sqoop, Oracle, BODS, and HDFS.
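Creating Hive external tables over HDFS data, as in this role, amounts to generating a `CREATE EXTERNAL TABLE` statement pointing at an HDFS location. A minimal sketch follows; the table name, columns, and location are invented for illustration.

```python
def external_table_ddl(table, columns, location):
    """Render a CREATE EXTERNAL TABLE statement of the kind used to
    expose HDFS files to HQL queries. Inputs are illustrative only."""
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        f"LOCATION '{location}';"
    )

if __name__ == "__main__":
    print(external_table_ddl("web_logs",
                             [("ts", "STRING"), ("url", "STRING")],
                             "/raw/web_logs"))
```

Because the table is EXTERNAL, dropping it in Hive removes only the metadata; the underlying HDFS files stay in place, which is usually what an administrator wants for ingested raw data.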
Confidential - El Segundo, CA
- Handle the installation and configuration of a Hadoop cluster.
- Build and maintain scalable data pipelines using the Hadoop ecosystem and other open source components like Hive, and HBase.
- Handle the data exchange between HDFS and different web sources using Flume and Sqoop.
- Monitor the data streaming between web sources and HDFS.
- Monitor the Hadoop cluster functioning through monitoring tools.
- Close monitoring and analysis of the MapReduce job executions on cluster at task level.
- Provide inputs to development regarding efficient utilization of resources such as memory and CPU, based on the running statistics of Map and Reduce tasks.
- Change cluster configuration properties based on the volume of data being processed and the performance of the cluster.
- Handle upgrades and patch updates.
- Set up automated processes to analyze the System and Hadoop log files for predefined errors and send alerts to appropriate groups.
- Responsible for building scalable distributed data solutions using Hadoop.
- Commission or decommission data nodes from the cluster in case of problems.
- Set up automated processes to archive/clean unwanted data on the cluster, in particular on the NameNode and Secondary NameNode.
- Set up and manage HA NameNode and NameNode federation using Apache Hadoop 2.0 to avoid single points of failure in large clusters.
- Set up checkpoints to gather system statistics for critical setups.
- Hold regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Cloudera.
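The automated log analysis described above (scanning system and Hadoop logs for predefined errors before alerting) can be sketched as a simple pattern scan. The error patterns below are examples; a real deployment would maintain its own list and wire the hits into an alerting step.

```python
import re

# Example patterns only -- a real deployment would use its own list.
ERROR_PATTERNS = [
    re.compile(r"FATAL"),
    re.compile(r"OutOfMemoryError"),
    re.compile(r"Connection refused"),
]

def scan_log(lines):
    """Return (line_number, line) pairs matching any predefined error
    pattern, suitable for feeding into an alerting step."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if any(p.search(line) for p in ERROR_PATTERNS):
            hits.append((lineno, line.rstrip()))
    return hits
```

A cron job would run this over the day's log files and mail the collected hits to the appropriate group when the list is non-empty.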
- Install and maintain all server hardware and software systems, administer server performance, and ensure availability.
- Monitor systems daily, evaluate availability of all server resources, and perform all activities for Linux servers.
- Maintain and monitor all system frameworks, provide on-call support to all systems, and maintain optimal Linux knowledge.
- Perform tests on all new software, maintain patches for management services, and perform audits on all security processes.
- Gathered test data requirements for data conditioning from Business Units to test total application functionality.
- Developed automation scripts using shell scripting to check log file sizes and report on the application.
- Responsible for writing cron jobs to start processes at regular intervals.
- Involved in database testing using SQL to pull data from the database and verify that it matches the GUI.
Environment: RedHat Linux 6.3, MySQL, VMware, Shell, Perl.
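The cron-driven log-size check described in this role can be sketched as a short script. This is a minimal sketch only; the threshold and log paths are caller-supplied examples, not the actual production values.

```python
import os

def oversized_logs(paths, max_bytes):
    """Return the paths whose on-disk size exceeds max_bytes --
    the check a scheduled cron job would run before reporting.
    The threshold and paths are illustrative examples."""
    return [
        p for p in paths
        if os.path.exists(p) and os.path.getsize(p) > max_bytes
    ]

if __name__ == "__main__":
    # Hypothetical invocation; real paths depend on the deployment.
    print(oversized_logs(["/var/log/app.log"], 50 * 1024 * 1024))
```

Scheduled from crontab at a regular interval, a non-empty result would trigger the report described above.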