- 7+ years of IT experience, including 4+ years of hands-on experience as a Hadoop Administrator and 4+ years of Linux administration, with hands-on experience in the following areas:
- Hands-on experience "productionizing" Hadoop applications (i.e., administration, configuration, management, monitoring, debugging, and performance tuning).
- Experience in software configuration, build, release, deployment, and DevOps on Windows and UNIX-based operating systems.
- Installing, configuring, supporting, and managing Hadoop clusters using Hortonworks, Cloudera, and MapR.
- Hadoop cluster capacity planning, performance tuning, monitoring, and troubleshooting.
- Planning, installing, and configuring Hadoop clusters on Cloudera and Hortonworks distributions.
- Excellent understanding of Hadoop architecture and the underlying framework, including storage management.
- Experience building new OpenStack deployments with Puppet and managing them in production.
- Worked extensively with Pivotal HD (3.0), Hortonworks (HDP 2.3), MapR, EMR, and Cloudera (CDH5) distributions.
- Hands-on experience creating and upgrading Cassandra clusters.
- Experience using Hadoop ecosystem components such as MapReduce, Pig, Hive, Zookeeper, HBase, Sqoop, YARN 2.0, Scala, Spark, Kafka, Storm, Impala, Oozie, and Flume for data storage and analysis.
- Experience with the Oozie scheduler, setting up workflow jobs with MapReduce and Pig jobs.
- Knowledge of the architecture and functionality of NoSQL databases such as HBase, Cassandra, and MongoDB.
- Extending Hive and Pig core functionality with custom UDFs.
- Experience troubleshooting errors in HBase Shell/API, Pig, Hive, Sqoop, Flume, Spark, and MapReduce.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems/mainframes, in both directions.
- Collected log data from various sources and ingested it into HDFS using Flume.
- Experienced in running MapReduce and Spark jobs over YARN.
- Good understanding of HDFS design, daemons, and HDFS high availability (HA).
- Experience with big data analytics tools such as Tableau and Trifacta.
- Setting up Linux environments: passwordless SSH, creating file systems, disabling firewalls and SELinux, tuning swappiness, and installing Java.
- Provisioning and managing multi-tenant Hadoop clusters on public cloud (Amazon Web Services EC2) and on private cloud infrastructure (OpenStack).
- Implementing a Continuous Integration and Continuous Delivery framework using Jenkins, Puppet, Maven, and Nexus in a Linux environment. Integrated Maven/Nexus, Jenkins, UrbanCode Deploy with Patterns/Release, Git, Confluence, Jira, and Cloud Foundry.
- Hands-on experience configuring and managing Zookeeper for NameNode failure scenarios.
- Worked on Hadoop security with MIT Kerberos and Ranger with LDAP.
- Experience understanding Hadoop security requirements and integrating with Kerberos authentication infrastructure: KDC server setup and creating realms/domains.
- Extensive experience in data analysis using tools such as Syncsort and HZ, along with shell scripting and UNIX.
- Experience with writing Oozie workflows and Job Controllers for job automation.
- Researching new technologies and alternative methods for improving efficiency.
Programming Languages: Java, Scala
Shell Scripting: Bash, Python, Standby/Failover Administration
Hadoop Distributions: Hortonworks, Cloudera, Virtualization in VMware ESXi 6.
NoSQL Databases: HBase.
Big Data Ecosystems: YARN, MapReduce, HDFS, Hive, Mahout, Pig, Sqoop, Kafka, Zookeeper, Oozie, Spark.
Sr. Hadoop Admin
Confidential, Sunnyvale, CA
- Created Hive tables and worked with them using HiveQL.
- Developed Spark scripts using Scala shell commands, as required, to read and write JSON files.
- Analyzed data by running Hive queries and Pig scripts to understand client behavior.
- Worked with COBOL workloads while migrating and offloading them from the mainframe to Hadoop.
- Strong experience working with Apache Hadoop, including creating and debugging production-level jobs.
- Analyzed complex distributed production deployments and made recommendations to optimize performance.
- Successfully drove HDP POCs with various lines of business.
- Migrated the Cloudera distribution from MR1 to MR2.
- Configured memory settings for YARN and MRv2.
- Designed and developed an automated data archival system using Hadoop HDFS, with a configurable archive data limit for efficient use of disk space in HDFS.
- Configured Apache Hive tables for analytic jobs and created HiveQL scripts for offline jobs.
- Designed Hive tables with partitioning and bucketing based on different use cases.
- Developed UDFs to extend Apache Pig and Hive for client-specific data filtering.
- Designed and implemented a stream filtering system on top of Apache Kafka to reduce stream size.
- Wrote a Kafka REST API to collect events from the front end.
- Implemented Apache Ranger configurations in the Hortonworks distribution.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Involved in migrating ETL processes from Oracle to Hive to test ease of data manipulation.
- Managed log files, backups and capacity.
- Identified and troubleshot Hadoop errors.
- Created Ambari Views for Tez, Hive and HDFS.
- Architected and designed a 30-node Hadoop innovation cluster with Sqrrl, Spark, Puppet, and HDP 2.2.4.
- Worked with data delivery teams to set up new Hadoop users, including setting up Linux users and Kerberos principals and testing HDFS and Hive access.
- Managed a 350+ node HDP 2.2.4 cluster with 4 petabytes of data using Ambari 2.0 on Linux CentOS 6.5.
- Completed end-to-end design and development of an Apache NiFi flow that acts as the agent between the middleware team and the EBI team and executes all the actions mentioned above.
- Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
- Upgraded the Hadoop cluster from CDH4.7 to CDH5.2.
- Supported MapReduce programs and distributed applications running on the Hadoop cluster.
- Continuously monitored and managed the EMR cluster through the AWS Console.
Environment: Hive, MR1, MR2, YARN, Pig, HBase, Apache NiFi, PL/SQL, Mahout, Java, Unix shell scripting, Sqoop, ETL, Ambari 2.0, Linux CentOS, MongoDB, Cassandra, Chef, RHEL, Ganglia, and Cloudera Manager.
Sr. Hadoop/Cassandra Administrator
Confidential - San Jose, CA
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL. Coordinated with business customers to gather business requirements.
- Migrated large volumes of data from various semi-structured sources, RDBMSs, and COBOL mainframe workloads to Hadoop.
- Wrote Scala user-defined functions (UDFs) to meet business requirements.
- Created case classes.
- Worked with DataFrames and RDDs.
- Installed and maintained the Hadoop cluster and the Cloudera Manager cluster.
- Performed manual upgrades and MRv1 installation with Cloudera Manager.
- Imported and exported data between databases and HDFS using Sqoop.
- Responsible for managing data coming from different sources.
- Analyzed the Hadoop cluster and various big data analytics tools, including Pig, HBase, and Sqoop.
- Loaded and transformed large sets of structured and semi-structured data.
- Collecting and aggregating large amounts of log data using Apache and staging data in HDFS for further analysis.
- Analyzed data using Hadoop components Hive and Pig.
- Involved in running Hadoop streaming jobs to process terabytes of data.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in writing Hive/Impala queries for data analysis to meet the business requirements.
- Exported the analyzed data to existing relational databases using Sqoop, making it available to the BI team for visualization and report generation.
- Created workflows to run multiple Hive and Pig jobs that are triggered independently based on time and data availability.
- Developed Spark scripts in Python as required.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Implemented Hadoop on AWS EC2, using a few instances to gather and analyze log files.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Assembled Puppet Master, Agent and Database servers on Red Hat Enterprise Linux Platforms.
- Implemented Hive and its components and troubleshot Hive issues as they arose. Deployed Hive LLAP in the development environment.
- Built, stood up, and delivered a Hadoop cluster in pseudo-distributed mode with the NameNode, Secondary NameNode, JobTracker, and TaskTracker running, Zookeeper installed and configured, and Apache Accumulo (a NoSQL store modeled on Google's Bigtable) stood up in a single-VM environment.
- Scheduled Oozie workflows to run multiple Hive and Pig jobs.
- Addressed Data Quality Using Informatica Data Quality (IDQ) tool.
Environment: Hadoop, MRv1, YARN, Cloudera Manager, HDFS, Hive, Chef, RHEL, Pig, HBase, Sqoop, SQL, Java (JDK 1.6), Eclipse, Python.
Confidential, Dallas, TX
- Experience in managing scalable Hadoop cluster environments.
- Involved in managing, administering and monitoring clusters in Hadoop Infrastructure.
- Worked closely with the infrastructure, network, database, application, and business intelligence teams to ensure high data quality and availability.
- Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
- Experience in HDFS maintenance and administration.
- Managing node connectivity and security on the Hadoop cluster.
- Experience commissioning and decommissioning nodes in the cluster.
- Experience in Name Node HA implementation.
- Working on architected solutions that process massive amounts of data on corporate and AWS cloud-based servers.
- Working with data delivery teams to setup new Hadoop users.
- Installed Oozie workflow engine to run multiple Map Reduce, Hive and pig jobs.
- Configured Metastore for Hadoop ecosystem and management tools.
- Hands-on experience in Nagios and Ganglia monitoring tools.
- Experience in HDFS data storage and support for running Map Reduce jobs.
- Performing tuning and troubleshooting of MR jobs by analyzing and reviewing Hadoop log files.
- Installing and configuring Hadoop ecosystem components such as Sqoop, Pig, Flume, and Hive.
- Maintaining and monitoring clusters. Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Experience using distcp to migrate data within and between clusters.
- Installed and configured Zookeeper.
- Hands-on experience analyzing log files for Hadoop ecosystem services.
- Coordinate root cause analysis efforts to minimize future system issues.
- Troubleshot hardware issues and worked closely with various vendors on hardware/OS and Hadoop issues.
Environment: Cloudera 4.2, HDFS, Hive, Pig, Sqoop, HBase, Chef, RHEL, Mahout, Tableau, MicroStrategy, shell scripting, Red Hat Linux.
Confidential - Miami, FL
- Worked as Administrator for Monsanto's Hadoop Cluster (180 nodes).
- Performed Requirement Analysis, Planning, Architecture Design and Installation of the Hadoop cluster.
- Tested and Benchmarked Cloudera and Hortonworks distributions for efficiency.
- Suggested and implemented best practices to optimize performance and user experience.
- Implemented Cluster Security using Kerberos and HDFS ACLs.
- Imported and exported data into HDFS and Hive using Sqoop.
- Monitoring Hadoop cluster using tools like Nagios, Ganglia and Cloudera Manager.
- Managed and reviewed Hadoop Log Files.
- Setup data authorization roles for Hive and Impala using Apache Sentry.
- Improved Hive query performance through distributed cache management and by converting tables to ORC format.
- Configured TLS/SSL based data transport encryption.
- Monitoring Job performance and doing analysis.
- Monitoring, troubleshooting and reviewing of Hadoop log files.
- Added nodes to and decommissioned nodes from the cluster.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Transferred data from HDFS to MongoDB using Pig, Hive, and MapReduce scripts, and visualized the streaming data in a Tableau dashboard.
- Managed and reviewed Hadoop log files, file system management and monitoring Hadoop cluster capacity planning.
- Hands-on experience with ecosystem components such as Hive, Pig scripts, Sqoop, MapReduce, YARN, and Zookeeper. Strong knowledge of Hive's analytical functions.
- Wrote Flume configuration files to store streaming data in HDFS.
- As an admin, handled cluster maintenance, troubleshooting, and monitoring, and followed proper backup and recovery strategies.
- Performed analytics using MapReduce, Hive, and Pig on HDFS data, sent the results back to MongoDB databases, and updated information in collections.
- Used a RESTful web services API to connect to the MapR table, and was involved in developing the database connection through that API.
- Currently working as Hadoop Admin, responsible for everything related to the clusters: 100 nodes in total, ranging from POC to PROD clusters.
- Built Cassandra clusters on both physical machines and AWS, and automated Cassandra builds, installation, monitoring, etc.
- Loaded data from the UNIX file system into HDFS, and created custom Solr query components to enable optimal search matching.
- Consulted with the operations team on deploying, migrating data, monitoring, analyzing, and tuning MongoDB applications.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration. Worked on YUM configuration and package installation through YUM.
- Installed and configured the CDH Hadoop environment; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Installed Kafka on the Hadoop cluster and wrote the producer and consumer code in Java to bring data with popular hashtags from a Twitter source into HDFS.
- Implemented partitioning, dynamic partitions, and buckets in Hive. Wrote Pig Latin scripts for the analysis of semi-structured data.
- Involved in requirements analysis, design, development, data design and mapping, extraction, validation, and creation of complex business requirements.
Environment: CDH4.7, Hadoop-2.0.0 HDFS, MapReduce, MongoDB-2.6, Hive-0.10, Sqoop-1.4.3, Oozie-3.3.4, Zookeeper-3.4.5, Hue-2.5.0, Jira, WebLogic 8.1, Kafka, YARN, Impala, Chef, RHEL, Pig, Scripting, MySQL, Red Hat Linux, CentOS and other UNIX utilities.
- Responsible for the implementation and on-going administration of Hadoop infrastructure including the installation, configuration and upgrading of Cloudera & Hortonworks distribution of Hadoop.
- File system, cluster monitoring, and performance tuning of Hadoop ecosystem.
- Planning and controlling change.
- Resolve issues involving MapReduce, YARN, and Sqoop job failures.
- Analyze and resolve multi-tenancy job execution issues.
- Design and manage backup and disaster recovery solution for Hadoop clusters.
- Work on Unix operating systems to efficiently handle system administration tasks related to Hadoop clusters.
- Manage the Apache Kafka environment.
- Participate in and manage data lake data movements involving Hadoop and NoSQL databases such as HBase, Cassandra, and MongoDB.
- Evaluate administration and operational practices and evolve automation procedures (using scripting languages such as Shell, Python, and Ruby, and tools such as Chef and Puppet on RHEL).
- Worked with data delivery teams to setup new Hadoop users. Includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and Map Reduce access for the new users.
- Enabling security/authentication using Kerberos and ACL authorizations using Apache Sentry.
- Create and document best practices for Hadoop and big data environment.
- Participate in new data product or new technology evaluations; manage the certification process and evaluate and implement new initiatives in technology and process improvements.
- Interact with Security Engineering to design solutions, tools, testing and validation for controls.
- Advance the cloud architecture for data stores.
- Work with the engineering team on automation.
- Help operationalize Cloud usage for databases and for the Hadoop platform.
- Engage vendors for feasibility of new tools, concepts and features, understand their pros and cons and prepare the team for rollout.
- Analyze vendor suggestions/recommendations for applicability to environment and design implementation details.
- Perform short and long-term system/database planning and analysis as well as capacity planning.
- Integrate/collaborate with application development and support teams on various IT projects.
Environment: Cloudera CDH, Hortonworks HDP, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Kafka, Storm, Kerberos, DataGuise, Tableau.
Linux System Admin
- Installing and maintaining the Linux servers.
- Responsible for managing RedHat Linux Servers and Workstations.
- Create, modify, disable, and delete UNIX user accounts and email accounts per the FGI standard process.
- Quickly arrange hardware repairs in the event of a hardware failure.
- Patch management, with patch updates on a quarterly basis.
- Set up security for users and groups, as well as firewall and intrusion detection systems.
- Add, delete, and modify UNIX groups using the standard processes; reset user passwords; lock/unlock user accounts.
- Effective management of hosts and automount maps in NIS, DNS, and Nagios.
- Monitoring System Metrics and logs for any problems.
- Security Management, providing/restricting login and sudo access on business specific and Infrastructure servers & workstations.
- Running cron jobs to back up data and troubleshooting hardware/OS issues.
- Involved in Adding, removing, or updating user account information, resetting passwords etc.
- Maintaining the RDBMS server and Authentication to required users for databases.
- Handling and debugging Escalations from L1 Team.
- Took backups at regular intervals and planned a sound disaster recovery strategy.
- Corresponded with customers to suggest changes and configurations for their servers.
- Maintained server, network, and support documentation including application diagrams.
Environment: Oracle, Shell, PL/SQL, DNS, TCP/IP, Apache Tomcat, HTML and UNIX/Linux.