- Over 7 years of IT experience, including 5 years with the Hadoop ecosystem, installing and configuring Hadoop ecosystem components in existing clusters.
- Experience in Hadoop administration (HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Storm, Oozie, Impala, and HBase) and NoSQL administration.
- Expertise in Red Hat Satellite Server installation, Red Hat Linux Kickstart, SystemImager, SUSE AutoYaST, and Solaris JumpStart.
- Expertise in the Hadoop stack, reporting tools such as Tableau, security with Kerberos, user provisioning with LDAP, and many other Big Data technologies across multiple use cases.
- Experience installing and configuring ZooKeeper to coordinate the Hadoop daemons.
- Expertise in installing, configuring, supporting, and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4, and CDH5), Hortonworks (HDP), and MapR distributions.
- Extensive experience testing, debugging, and deploying MapReduce jobs on Hadoop platforms.
- Experience with web server configuration, including Apache and WebLogic, on both Linux and UNIX.
- Expertise in commissioning and decommissioning nodes in a Hadoop cluster.
- Experience working on the Apache Hadoop open-source distribution with technologies such as HDFS, MapReduce, Python, Pig, Hive, Hue, HBase, Sqoop, Oozie, ZooKeeper, Spark, Spark Streaming, Storm, Kafka, Cassandra, Impala, Snappy, Greenplum, and MongoDB.
- Expertise in SQL Server Analysis Services.
- Experience in upgrading SQL server software to new versions and applying service packs and patches.
- Installed and configured Apache Hadoop, Hive, and Pig environments on Amazon EC2 and assisted in the design, development, and architecture of Hadoop and HBase systems.
- Improved system performance by tuning SQL queries and stored procedures using SQL Profiler and Database Engine Tuning Advisor.
- Supported WebSphere Application Server (WPS) and IBM HTTP/Apache web servers in Linux environments for various projects.
- Expertise and interests include administration, database design, performance analysis, and production support for large (VLDB) and complex databases.
- Supported geographically diverse customers and teams in a 24/7 environment.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Hands on experience in administering Linux systems to deploy Hadoop clusters and monitor using Ambari.
- Well versed in importing structured data from RDBMSs such as MySQL and Oracle into HDFS and Hive using Sqoop.
- Familiar with Oozie job controllers for job automation.
- Experience in system administration of Red Hat Enterprise Linux and SUSE Linux.
- Knowledge of shell scripting, Python, and Ansible for automation.
- Hardware installation and maintenance, software integration and packaging.
- Expertise in performance analysis, troubleshooting, and debugging.
- Experienced in writing automated scripts to monitor file systems and key MapR services.
- Strong knowledge of the Hadoop file formats Avro and Parquet.
- Team player with strong analytical, technical, negotiation, and client relationship management skills.
Big Data Technologies: Hadoop, HDFS, Hive, Cassandra, Pig, Sqoop, Falcon, Flume, ZooKeeper, YARN, Mahout, Oozie, Avro, HBase, MapReduce, Storm, CDH 5.3, CDH 5.4.
Distributions: Cloudera, Hortonworks, MapR
Monitoring Tools: Cloudera Manager, Ambari, Nagios, Ganglia
Testing: Capybara, WebDriver, RSpec, Cucumber, JUnit; version control: SVN
Server: WEBrick, Thin, Unicorn, Apache, AWS
Operating Systems: Linux RHEL/Ubuntu/CentOS, Windows (XP/7/8/10)
Databases & NoSQL: Oracle 11g/10g, DB2, SQL Server, MySQL, HBase, MongoDB, Cassandra
Scripting & Security: Shell Scripting, HTML, Python, Kerberos, Docker
Security: Kerberos, Ranger, Sentry
Other tools: Redmine, Bugzilla, JIRA, Agile SCRUM, SDLC Waterfall.
Confidential, San Francisco, CA
- Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, Sqoop, Spark, Oozie, and Impala, with the Cloudera distribution.
- Installed and configured Big Data clusters on an OpenStack tenant.
- Worked on both classic and YARN distributions of Hadoop, including Apache Hadoop 2.0.0, Cloudera CDH4, and CDH5.
- Led a DevOps initiative to create automated infrastructure and deployment capabilities for cloud- and Hadoop-based solutions.
- Administered a 140-node Hadoop cluster using Cloudera Manager and various Hadoop ecosystem technologies.
- Built an end-to-end solution to store unstructured data such as images and PDFs in Hadoop and HBase and render the data back to different web applications using a REST API.
- Used HIPI, an image-processing library designed for the Apache Hadoop MapReduce parallel programming framework, to efficiently process claim-related images with MapReduce-style parallel programming.
- Assisted in creation of ETL process for transformation of data sources from existing RDBMS systems.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Involved in managing and reviewing Hadoop log files.
- Built a POD utilizing Informatica to S3, ETL via Apache Spark (PySpark)/Hive/EMR, orchestration via Data Pipeline/Lambda, and landing in Redshift.
- Migrated data from databases such as Oracle, DB2, and Teradata to Hadoop.
- Transformed Ab Initio processes into Hadoop using Pig and Hive.
- Used Sqoop to import data from RDBMSs into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Automated jobs that pull data from an FTP server and load it into Hive tables using Oozie workflows; also wrote Pig scripts to run ETL jobs on the data in HDFS.
- Commissioned and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
- Managing and scheduling Jobs on a Hadoop cluster using Oozie.
- Handled day-to-day user access and permissions, and installed and maintained Linux servers.
- Worked on automation for provisioning system resources using Puppet.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Worked with highly transactional merchandise and investment SQL databases with PCI and HIPAA compliance, involving data encryption with certificates and security keys at various levels.
- Used Spark SQL (with Hive for metastore in AWS Aurora RDS database) to process large data sets.
- Managed and reviewed Hadoop log files as part of administration for troubleshooting purposes; communicated and escalated issues appropriately.
- Added new DataNodes when needed and re-balanced the cluster.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data to HDFS from various sources.
- Used Pig as ETL tool to do transformations, event joins, filter and some pre-aggregations.
- Responsible for onboarding new users to the Hadoop cluster (adding a home directory for each user and providing access to datasets).
- Deep understanding of monitoring and troubleshooting mission critical Linux machines.
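The Sqoop ingestion and HDFS rebalancing work described above can be sketched as follows; this is a minimal example, and the JDBC URL, credentials file, and table names are hypothetical placeholders, not details from the original projects:

```shell
#!/usr/bin/env bash
# Sketch only: import an RDBMS table into Hive with Sqoop, then rebalance
# HDFS block data after commissioning/decommissioning nodes.
# Assumes a live cluster with sqoop and the HDFS client on the PATH.

sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table orders \
  --hive-import --hive-table staging.orders \
  --num-mappers 4

# Rebalance HDFS; -threshold is the allowed % deviation of each
# DataNode's utilization from the cluster-wide average.
hdfs balancer -threshold 10
```

Keeping `--num-mappers` modest limits concurrent connections against the source database during business hours.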
Confidential, Nashua, NH
- Performed Hadoop installation and configuration of multiple nodes on AWS EC2 using the Hortonworks platform.
- Installed a Kerberos-secured Kafka cluster (without encryption) on POC VMs and set up Kafka ACLs.
- Involved in Cassandra data modeling and building efficient data structures.
- Worked on ~350x2 and ~200x2 node production clusters, as well as various environments such as DEV and QA across data centers, which form the backend infrastructure for storing transactional data.
- Maintained and Monitored Hadoop and Linux Servers.
- Created VM's in Oracle Virtual Manager.
- Created a POC on Hortonworks and suggested best practices for the HDP and HDF platforms and NiFi.
- Provisioned, installed, configured, monitored, and maintained HDFS, YARN, HBase, Flume, Sqoop, Oozie, Pig, Hive, Ranger, Falcon, SmartSense, Storm, and Kafka.
- Setup and optimize Standalone Clusters, Pseudo-Distributed Clusters, and Distributed Clusters.
- Familiar with MongoDB write concern to avoid loss of data during system failures.
- Responsible for cluster maintenance: adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files on Hortonworks, MapR, and Cloudera clusters.
- Implemented NameNode backup using NFS.
- Performed various configurations, including networking and IP tables, resolving hostnames, user accounts and file permissions, and SSH key-based (passwordless) login.
- Built Hadoop-based big data enterprise platforms, coding in Python, with DevOps via Chef and Ansible.
- Worked with the Linux administration team to prepare and configure the systems to support Hadoop deployment.
- Evaluated Hortonworks NiFi (HDF 2.0) and recommended solution to inject data from multiple data sources to HDFS & Hive using NiFi.
- Created volume groups, logical volumes and partitions on the Linux servers and mounted file systems on the created partitions
- Used MongoDB third-party tools (Robomongo, MongoOwl, MongoVUE) and MongoDB built-in binaries to monitor and analyze MongoDB performance.
- Worked with different HBase copying mechanisms, including export/import, snapshots, and CopyTable.
- Optimized the full-text search function by connecting MongoDB and Elasticsearch.
- Utilized AWS framework for content storage and Elasticsearch for document search.
- Implemented the Capacity Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Helped develop MapReduce programs and define job flows.
- Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
- Developed a custom file system plug-in for Hadoop so it can access files on the data platform.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Managed and reviewed Hadoop log files.
- Supported/Troubleshoot MapReduce programs running on the cluster.
- Loaded data from Linux/UNIX file system into HDFS.
- Installed and Configured Hive and wrote Hive UDFs.
- Set up Cassandra backups using snapshots.
- Created tables, loaded data, and wrote queries in Hive.
- Worked with different storage engines in MongoDB; managed MongoDB databases using the MMS monitoring tool.
- Monitored the cluster using Ambari and optimized the system based on job performance criteria.
- Managed cluster through performance tuning and enhancements.
- Worked with developer teams to move data into HDFS through HDF (NiFi).
- Worked on cluster maintenance, including commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, slots configuration, YARN tuning, Hue high availability, and load balancing of both Hue and HiveServer2.
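As a sketch of the Kafka ACL setup described above; the ZooKeeper address, principal, topic, and consumer-group names are hypothetical stand-ins for the project's actual values:

```shell
# Grant a consumer principal read access to a topic on a SASL/Kerberos-secured
# Kafka cluster, then list the resulting ACLs. All names are placeholders.
kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
  --add --allow-principal User:claims_reader \
  --operation Read --topic claims-events --group claims-consumers

kafka-acls.sh --authorizer-properties zookeeper.connect=zk1:2181 \
  --list --topic claims-events
```

Granting the consumer group alongside the topic matters: without Read on the group, a Kerberos-authenticated consumer is still denied.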
Environment: Hortonworks HDP 2.3.4, Ambari 2.2, HDFS, MapReduce, YARN, Hive, Pig, ZooKeeper, Tez, MongoDB, MySQL, and CentOS 6.6.
Confidential, Los Angeles, CA
- Configuring various components such as HDFS, YARN, MapReduce (MR1 & MR2), Sqoop, Hive, Zookeeper, Sentry.
- Mastered major Hadoop distributions such as Hortonworks and Cloudera along with numerous open-source projects, and prototyped various applications that utilize modern Big Data tools.
- Built an Apache Kafka multi-node cluster and used Kafka Manager to monitor multiple clusters.
- Decommissioned and commissioned nodes on the running cluster, including balancing HDFS block data.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters. Involved in Installing and configuring Kerberos for the authentication of users and Hadoop daemons.
- Developed and maintained Hive queries (HQL), Pig Latin, and HBase queries in the CLI (Opsware) and GUI (Hue).
- Working knowledge of configuring Apache NiFi on a Kerberized cluster.
- Worked on deploying, managing, and developing MongoDB clusters in Linux and Windows environments.
- Worked with Big Data team responsible for building Hadoop stack and different big data analytic tools, migration from RDBMS to Hadoop using Sqoop.
- Involved in Hadoop Cluster environment administration that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster Monitoring and Troubleshooting.
- Managed and reviewed Hadoop log files and logged support cases via Cloudera Manager.
- Configured Fair Scheduler to provide service-level agreements for multiple users of a cluster.
- Installed, configured, and maintained Hortonworks DataFlow (HDF) tools such as NiFi.
- Experience managing Hadoop clusters, Hive, HBase, and Solr.
- Added and removed users from group roles and granted sets of permissions to the specified group roles.
- Set up a no-authentication Kafka listener in parallel with the Kerberos (SASL) listener, and tested non-authenticated (anonymous) users alongside Kerberos users.
- Implemented Apache NiFi processors for the end-to-end ETL process: extraction, transformation, and loading of data files.
- Configured Data Disaster recovery for cluster.
- Experienced in designing Hadoop architecture on the AWS cloud with production-ready features such as high availability, scalability, and security.
- Configured Sqoop to import RDBMS data into the cluster.
- Troubleshot, diagnosed, and resolved Hadoop issues, and took steps to ensure they did not recur.
- Experienced in installing Hadoop in the Amazon cloud using products such as EC2, S3, and EBS.
- Hands-on experience installing Kerberos security and setting up permissions; set up standards and processes for Hadoop-based application design and implementation.
- Generated reports on running nodes using various benchmarking operations.
- Performed backup and recovery tasks by creating snapshot policies and backup schedules, and recovered from node failures.
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to migrate the non-secured cluster to a secured cluster.
- Performed performance tuning of Kafka and Storm clusters; benchmarked real-time streams.
- Worked with Big Data distributions like Cloudera (CDH 3 and 4) with Cloudera Manager.
- Created a pipeline between Cloudera (CDH) and AWS S3.
- Integrated the Hue browser with Active Directory.
- Configured trash and recovery, and set quotas for users.
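The trash-recovery and user-quota work above can be sketched with standard HDFS commands; the user directory, file, and limits below are hypothetical:

```shell
# Cap a user's home directory at 100,000 names and 500 GB of raw space.
hdfs dfsadmin -setQuota 100000 /user/jdoe
hdfs dfsadmin -setSpaceQuota 500g /user/jdoe
hdfs dfs -count -q /user/jdoe          # verify the quotas

# Recover an accidentally deleted file from the trash
# (requires fs.trash.interval > 0 in core-site.xml).
hdfs dfs -mv /user/jdoe/.Trash/Current/user/jdoe/data.csv /user/jdoe/
```

Note that the space quota counts raw bytes, so with 3x replication a 500 GB quota holds roughly 166 GB of user data.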
Environment: Hadoop, Hive, MapReduce, Kafka, MapR, Cloudera, Amazon Web Services (AWS), Ansible, S3, NoSQL, HDFS, UNIX, Red Hat, and CentOS.
Confidential, San Jose, CA
- Importing and exporting data into HDFS and Hive using Sqoop.
- Worked on Hadoop architecture, strategy and the design of the system based on requirement.
- Manage the planning, installations and regular maintenance of clusters and nodes (Maintenance of Name node(s), Resource Manager and Job History Server and their daemons as well as Data nodes and Node managers).
- Cluster configurations (including HA for Name nodes), security, job monitoring and troubleshooting.
- Designed system security using Kerberos authentication in alignment with existing AD/LDAP and all other services, including role-based user authorization using tools such as Apache Sentry and ACL setup.
- Performed data integration, moving data into and out of HDFS (data ingestion) using Flume from disparate sources, streaming data, and Sqoop import/export between HDFS and traditional RDBMSs; stored and retrieved data in various formats (text, JSON, Avro datasets, Parquet, and sequence files).
- Worked on the Hadoop stack, ETL tools such as Talend, reporting tools such as Tableau, security with Kerberos, user provisioning with LDAP, and many other Big Data technologies for multiple use cases.
- Expertise in Hadoop, MapReduce and Yarn (MR2) resource management.
- Perform analysis and review of system and memory dumps, log files and performance tuning. Monitor issues with Linux memory, CPU, OS, storage and network, cluster monitoring and troubleshooting.
- Used various NoSQL platforms and managed a wide range of the Hadoop ecosystem on Cloudera (CDH5).
- Design and implementation of backup strategies.
- Participated in formulating procedures for installing Hadoop patches, updates, and version upgrades, as well as capacity planning.
- Used shell scripts to perform optimization, automation, and other tasks.
- Commissioned and decommissioned DataNodes and performed admin tasks such as cluster rebalancing, running fsck and dfsadmin/rmadmin commands, and scheduling regular block scanner runs.
- Used the necessary tools to perform ETL into the Hadoop systems.
- Participated in evaluation and selection of new technologies to support system efficiency.
- Configured various property files like core-site.xml, hdfs-site.xml, mapred-site.xml based upon the job requirement.
- Performed data completeness, correctness, data transformation and data quality testing.
- Analyzed system failures, identified root causes, and recommended courses of action.
- Documented system processes and procedures for future reference.
- Resolved trouble tickets submitted by users, troubleshot the errors, and documented solutions.
- Use Ansible for cluster-wide OS deployments and configurations.
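The routine admin tasks above (fsck, dfsadmin/rmadmin, DataNode decommissioning) can be sketched as follows; the hostname and excludes-file path are hypothetical and depend on the cluster's hdfs-site.xml:

```shell
# Routine HDFS health checks, run as the hdfs superuser.
hdfs fsck / -files -blocks -locations   # block-level health report
hdfs dfsadmin -report                   # capacity and DataNode status

# Decommission a DataNode: add its hostname to the excludes file named by
# dfs.hosts.exclude in hdfs-site.xml, then ask both masters to re-read it.
echo "worker42.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes
yarn rmadmin -refreshNodes
```

Decommissioning finishes only after the node's blocks are re-replicated elsewhere, so `-report` should show the node as "Decommissioned" before it is powered off.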
Environment: Hive, Pig, HBase, ZooKeeper, Sqoop, ETL, AWS, Ansible, Data warehousing, Impala, Ambari 2.0, Linux CentOS, MR, Cloudera, Puppet, Kafka, Ganglia, Agile/Scrum.
- Responsible for designing, implementing, troubleshooting, and administering servers, desktops, peripherals, and systems running Windows NT and Linux on local area networks.
- Set up and administered new and existing network infrastructure, including LAN, extended LAN, and wireless networks.
- Installed and managed server operating systems: RHEL 5, CentOS 5, and Ubuntu.
- Provided desktop support for Dell, Compaq, and IBM laptops and PCs in a Windows environment.
- Active Directory account creations/deletions and permissions management.
- Exchange mailbox creations.
- Involved in installation and configuration of Linux/UNIX servers and workstations; distributions included Red Hat.
- Responsible for installation, configuration, and administration of Red Hat Linux and Solaris systems.
- Monitored and administered Sun Solaris and Red Hat Linux servers.
- Provided VPN connectivity support.
- Asset management and redeployment.
- Worked with the Puppet configuration management tool.
- Configured and troubleshot Microsoft Exchange and Outlook user accounts.
- Configured routers, switches, Wi-Fi connections, etc.
- Installed network printers, switches, and routers, and handled related troubleshooting.
- Installed Windows operating systems: XP, Vista, Windows 7, Server 2003, and Server 2008 R2.
- Troubleshot network sharing, laptops, SMPS units, monitors, hard disks, RAM, etc.
- Recovered passwords and scheduled automatic data backups.
- Used various virus-removal techniques to clean infected computers at users' request.
- Assisted with queries on internet services and Microsoft Outlook email configurations.
- Set up and maintained NFS, NIS, and TCP/IP networking; configured systems for TCP/IP networking with the existing LAN; and set up SSH and SCP between Sun systems and other Red Hat/UNIX hosts.
- Assisted users in setting up various models of network devices such as modems and routers.
- Reset Microsoft Outlook user account passwords.
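The NFS and SSH/SCP setup described above can be sketched as follows; the server name, export path, and subnet are hypothetical, and the init-script syntax matches RHEL 5-era systems:

```shell
# On the NFS server: export a directory to the local subnet.
echo "/export/data 192.168.1.0/24(rw,sync,no_root_squash)" >> /etc/exports
exportfs -ra                 # re-read /etc/exports
service nfs restart          # RHEL 5-style service management

# On the client: mount the export.
mkdir -p /mnt/data
mount -t nfs nfsserver:/export/data /mnt/data

# Passwordless SSH/SCP between hosts.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
ssh-copy-id user@remotehost
scp /etc/hosts user@remotehost:/tmp/
```

`no_root_squash` is shown only for completeness; on shared networks the default root-squashing behavior is usually the safer choice.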
Environment: Red Hat Enterprise Linux 4.x/5.x, Sun Solaris 8/9/10, VERITAS Volume Manager, Oracle 11g, Samba, Oracle RAC/ASM, EMC PowerPath, Dell PowerEdge 6650, HP ProLiant DL 385/585/580, Sun Fire V440, Sun Blade X6250/X6270.