- Experience deploying and managing multi-node Hadoop clusters with various Hadoop components (Hive, Pig, Sqoop, Oozie, Tez, Flume, HCatalog, HBase, ZooKeeper) on Cloudera and Hortonworks distributions.
- Hands-on experience provisioning and managing multi-tenant Hadoop clusters on public cloud environments (Amazon Web Services (AWS) EC2, Rackspace) and on private cloud infrastructure (OpenStack cloud platform)
- Maintained Hadoop clusters for dev/staging/production
- Created volumes, security group rules, key pairs, floating IPs, images and snapshots, and deployed instances on OpenStack
- Hands-on experience productionizing Hadoop applications, including administration, configuration management, debugging and performance tuning
- Worked with application team via scrum to provide operational support, install Hadoop updates, patches and version upgrades as required
- Experience in performance tuning for MapReduce, Hive and Sqoop
- Monitor Hadoop cluster using tools like Nagios, Ganglia, Ambari and Cloudera Manager
- Worked with system engineering team to plan and deploy Hadoop hardware and software environments
- Experience with cluster maintenance tasks such as commissioning and decommissioning nodes, cluster monitoring and troubleshooting. Managed and reviewed Hadoop log files
- Worked on Disaster management with Hadoop cluster
- Experience securing Hadoop clusters with Kerberos authentication and SSL encryption
- Strong knowledge of configuring NameNode high availability and NameNode federation
- Designed an ingestion framework using Flume for streaming logs and aggregating data into HDFS. Built a data transformation framework using MapReduce and Pig
- Designed, delivered and helped manage device data analytics at a very large storage vendor
- Experience in benchmarking and in performing backup and recovery of NameNode metadata and data residing on the cluster
- Experienced in managing DHCP, PXE with Kickstart, DNS and NFS, using them to build infrastructure in Linux environments, and in working with Puppet for application deployment
- Strong knowledge of work automation using Chef and Puppet
- Hands-on experience developing Sqoop jobs to import data from RDBMS sources such as MySQL and PostgreSQL into HDFS, and to export data from HDFS back to those databases.
- Experience writing workflows using Apache Oozie with job controllers such as Hive and Sqoop.
- Excellent working knowledge of HBase and data pre-processing using Flume NG.
- Strong knowledge of NoSQL databases such as HBase, Cassandra and MongoDB
- Strong knowledge of implementing Kafka.
- Knowledge of Apache Tez, an alternative to MapReduce as the Hive execution engine, for faster execution of HiveQL queries.
- Worked with big data teams to move ETL tasks to Hadoop
- Hands-on experience with AWS-based deployments using S3, EC2 Auto Scaling, Elasticsearch, etc. in a highly available production environment.
- Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, JSON, web services and XML files.
- Strong knowledge of newer technologies such as Spark, Shark, Docker and heavy containers, keeping up with industry developments
- Effective problem solving skills and outstanding interpersonal skills. Ability to work independently as well as within a team environment. Driven to meet deadlines
Languages: C, C++, Java
Big Data Ecosystem: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Tez, Oozie, Flume, ZooKeeper
Hadoop management tools: Cloudera Manager, Ambari, Ganglia, Nagios
Databases: Oracle, MySQL, SQL Server
Scripting language: shell, Python
Software Development Tools: Rational Rose, Eclipse, NetBeans
Web Development: PHP, HTML, AJAX, JavaScript, CSS
Web Designing Tools: Adobe Photoshop, Adobe Dreamweaver
NoSQL Databases: HBase, Cassandra
Operating Systems: Windows, Linux (Red Hat, CentOS)
Build Tools: Maven, ANT
Confidential, Houston, TX
- Actively participated with the development team to meet the specific customer requirements and proposed effective Hadoop solutions
- Worked closely with application developers and end-users for analyzing requirements
- Responsible for architecting both Hortonworks and Cloudera clusters.
- Installed, configured and monitored Hortonworks Ambari and Cloudera Manager
- Configured and deployed cluster monitoring tools Nagios and Ganglia.
- Responsible for Batch architecture and cluster migration planning
- Excellent knowledge of VPC security configuration and network access for AWS
- Installed Ambari 2.0, CDH 4.7 and CDH 5.1.3 with HA.
- Implemented high-availability for Management Services using NFS and Heartbeat
- Provided high availability for Management Services components such as Host Monitor, Service Monitor, Activity Monitor, Event Server and Reports Manager
- Tested the high-availability architecture with SSL enabled and disabled, Kerberos enabled and disabled, and Management Services directories on NFS mounts
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Sqoop and a few system-specific jobs
- Developed Pig scripts to process raw data for analysis
- Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns
- Wrote Sqoop scripts extensively to import data from and export data to RDBMS
- Responsible for extracting data from various sources into HDFS for processing.
- Worked on streaming real-time log data from web servers into HDFS using Flume.
- Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
- Created Hive internal and external tables as required, defined with appropriate static and dynamic partitions for efficiency.
- Implemented Hive queries using indexes and buckets for time efficiency
- Designed and implemented Pig UDFs for evaluation, filtering, loading and storing of data.
- Hands-on experience installing HBase and HBase RegionServers and creating HBase tables.
- Integrated Cognos and Informatica to access Hive tables from the cluster.
Environment: JDK 1.6, CentOS, Flume, HBase, HDFS, Maven, Spark, MapReduce, Hive, Oozie, Red Hat, ZooKeeper, Sqoop, Pig
Confidential, Santa Clara, CA
- Responsible for architecting Hadoop clusters with CDH4 on CentOS, managing with Cloudera Manager.
- Involved in installing Hadoop, Hive, HBase, ZooKeeper, Flume and Oozie.
- Worked closely with data analysts to construct creative solutions for their analysis tasks
- Led end-to-end efforts to design, develop, and implement data warehousing and business intelligence solutions.
- Performed a major cluster upgrade from CDH3u6 to CDH4.4.0
- Implemented Cloudera Manager on existing cluster
- Optimized our Hadoop infrastructure at both the software and hardware level
- Ensured our Hadoop clusters were built and tuned optimally to support the activities of our Big Data teams
- Developed MapReduce programs to extract and transform data sets, and exported the results back to RDBMS using Sqoop
- Installed, configured and managed Flume infrastructure
- Created tables in Hive and loaded the structured data produced by MapReduce jobs
- Configured the Hive metastore to use a MySQL database so that all tables created in Hive are available to different users simultaneously
- Developed many HiveQL queries to extract the information required by the business
- Exported the business-required information to RDBMS using Sqoop so the BI team could generate reports from it
- Installed and configured DataStax Cassandra cluster
Environment: JDK 1.6, CentOS, Flume, HBase, HDFS, Maven, Redis, Cassandra, MapReduce, Hive, Oozie, ZooKeeper.
Confidential, Kansas City, MO
- Installed and configured Cloudera Hadoop on a 24-node cluster.
- Loaded data from Oracle database into HDFS.
- Analyzed Big Data business requirements and translated them into Hadoop-centric solutions.
- Provided ad-hoc queries and data metrics to business users using Hive and Pig.
- Developed MapReduce pipeline jobs in Apache Crunch to process the data and create necessary HFiles.
- Loaded the generated HFiles into HBase for faster access to a large customer base without a performance hit.
- Used Apache Maven for project builds.
- Performed unit testing of MapReduce jobs on cluster using MRUnit.
- Used Oozie scheduler system to automate the pipeline workflow.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.
- Implemented data serialization using Apache Avro.
- Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
- Analyzed Business Requirement Document written in JIRA and participated in peer code reviews in Crucible.
Environment: Cloudera Hadoop, MapReduce, HDFS, Crunch, HBase, Avro, Oozie, Java (JDK 1.6), JIRA, Crucible, GitHub, Maven.
Linux/Database Administrator
- Installing and maintaining the Linux servers
- Installing CentOS on multiple servers using Preboot Execution Environment (PXE) boot and the Kickstart method
- Monitoring System Metrics and logs for any problems.
- Scheduling data backups with crontab
- Adding, removing, or updating user account information, resetting passwords, etc
- Creating and managing logical volumes. Using Java JDBC to load data into MySQL
- Maintaining the MySQL server and granting database access to required users
- Installing and updating packages using YUM