- Around 8 years of experience in the administration and implementation of robust technology systems, with specialized expertise in Hadoop administration, Big Data, and Linux administration.
- Experience using Hadoop ecosystem components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, YARN, Spark, Kafka, Oozie, and Flume for data storage and analysis.
- Experience deploying and managing Hadoop clusters using Cloudera Manager and Apache Ambari.
- Installed and configured Hadoop distributions including CDH 5.7, 5.9, and 5.10, and HDP 2.2 and later versions.
- Involved in Hadoop cluster administration, including adding and removing cluster nodes, capacity planning, performance tuning, monitoring, and troubleshooting.
- Supported Hadoop developers in optimizing MapReduce jobs, Pig Latin scripts, Hive scripts, and HBase ingestion.
- Collected log and other data from various sources and ingested it into HDFS using Flume and Sqoop.
- Experienced in running MapReduce and Spark jobs on YARN.
- Excellent understanding of Hadoop cluster security; implemented secure Hadoop clusters using Kerberos.
- Set up automated 24x7 monitoring and escalation infrastructure for Hadoop clusters using Nagios and Ganglia.
- Extensive experience installing, configuring, and administering Hadoop clusters for major distributions such as CDH5 and HDP.
- Experience configuring Sentry, Ranger, and Knox to secure Hadoop components.
- Experience designing, configuring, and managing backup and disaster recovery for Hadoop data.
- Hands-on experience analyzing log files for Hadoop and ecosystem services to find root causes.
- Experience gathering Hadoop security requirements and integrating with Kerberos authentication infrastructure: KDC server setup, realm/domain creation, and ongoing management.
- Experience writing automation scripts and using automation tools such as Puppet and Chef.
- Experience setting up Hadoop clusters on cloud platforms such as AWS and Azure.
- Knowledge of AWS services such as EC2, S3, Glacier, IAM, EBS, SNS, SQS, RDS, VPC, Elastic Load Balancing, Auto Scaling, CloudFormation, CloudFront, and CloudWatch.
- Experience in Linux system administration, Linux system security, project management, and risk management in information systems.
- Capable of managing multiple projects simultaneously, comfortable troubleshooting and debugging, and able to work under pressure.
- Involved in cluster maintenance, bug fixing, troubleshooting, and monitoring, following proper backup and recovery strategies.
- In-depth knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Managed security in Hadoop clusters using Kerberos, Ranger, Knox, and ACLs.
- Excellent experience in shell scripting.
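The 24x7 Nagios monitoring listed above is built from check plugins that follow the Nagios plugin convention: exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, plus one status line with optional performance data after a `|`. The following is a minimal sketch of such a check for HDFS capacity, assuming the used and total byte counts are already available (for example, parsed from `hdfs dfsadmin -report`). The function names and the 80%/90% thresholds are hypothetical, chosen only for illustration.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_hdfs_capacity(used_bytes, total_bytes, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, status_line) for HDFS usage.

    Hypothetical helper; thresholds are illustrative defaults.
    """
    if total_bytes <= 0:
        return UNKNOWN, "UNKNOWN - HDFS total capacity is not positive"
    used_pct = 100.0 * used_bytes / total_bytes
    # Performance data after '|' ('label'=value[UOM];warn;crit;min;max)
    # lets graphing add-ons plot the value over time.
    perf = f"used_pct={used_pct:.1f}%;{warn_pct};{crit_pct};0;100"
    if used_pct >= crit_pct:
        return CRITICAL, f"CRITICAL - HDFS {used_pct:.1f}% used | {perf}"
    if used_pct >= warn_pct:
        return WARNING, f"WARNING - HDFS {used_pct:.1f}% used | {perf}"
    return OK, f"OK - HDFS {used_pct:.1f}% used | {perf}"


def main(argv):
    """Plugin entry point: print the status line and return the exit code."""
    try:
        used, total = int(argv[0]), int(argv[1])
    except (IndexError, ValueError):
        print("UNKNOWN - usage: check_hdfs_capacity <used_bytes> <total_bytes>")
        return UNKNOWN
    code, line = check_hdfs_capacity(used, total)
    print(line)
    return code
```

In an actual plugin script, the entry point would typically be wired up as `sys.exit(main(sys.argv[1:]))` so that Nagios sees the returned exit code.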
Big Data Technologies: HDFS, Hive, MapReduce, Cassandra, Pig, HCatalog, Phoenix, Falcon, Sqoop, ZooKeeper, Mahout, Flume, Oozie, Avro, HBase, Storm, CDH 5.3, CDH 5.4
Scripting Languages: Shell Scripting, Puppet, Python, Bash, CSH, Ruby, PHP
Databases: Oracle 11g, MySQL, MS SQL Server, HBase, Cassandra, MongoDB
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP
Monitoring Tools: Cloudera Manager, Solr, Ambari, Nagios, Ganglia
Application Servers: Apache Tomcat, WebLogic Server, WebSphere
Security: Kerberos, Knox
Reporting Tools: Cognos, Hyperion Analyzer, OBIEE & BI+
Sr. Hadoop Operations Administrator
Confidential, Charlotte, NC
- Performed Cloudera Manager/CDH upgrades from 5.7 to 5.9 and from 5.9.0 to 5.10.1, and resolved the issues encountered during the upgrades.
- Supported the Red Hat Linux OS upgrade to 6.8, an Oracle Database switchover for DR testing, and the Oracle Database upgrade to 12c.
- Developed and automated benchmark tests for services such as Hive, Impala, Oozie actions, and Spark, run before and after each upgrade.
- Analyzed PySpark code for Avro-to-ORC conversion and tuned the job's performance based on the input data, setting driver/executor memory, the number of executors, and related parameters.
- Wrote shell scripts to migrate data from on-premises clusters to AWS EMR (S3), and helped application teams copy data from on-premises to the AWS cloud by spinning up an EMR cluster and then running sync/DistCp jobs to S3.
- Updated CloudFormation templates to retrieve public/private SSH keys from Password Vault, and updated the AWS role ARN for S3 with specific policies granting the EMR cluster access to S3 buckets. Modified security groups, subnet IDs, EC2 instance types, ports, and AWS tags. Used Bitbucket, Git, and Bamboo to deploy EMR clusters.
- Supported end-to-end CloudFormation template development and platform setup for several business units moving a Data Transmission project to AWS, in which vendor files landing in on-premises HDFS are also sent in parallel to S3.
- Configured and enabled SSL for the Hadoop web UIs (HDFS, YARN, JobHistory, Spark, Tez, and Hue) in AWS.
- Generated certificates in PEM and JKS formats as required by the different services, and created truststores to establish mutual handshakes between services as part of the SSL implementation.
- Currently working on creating DNS records for core nodes in an AWS Auto Scaling group using Lambda functions and Route 53.
- Tuned the Hadoop ecosystem for performance and monitored for performance degradation.
- Assisted development teams in loading data from sources such as DB2, Oracle, and SQL Server into HDFS/Hive tables using Sqoop, and flat files from various vendors via FTP.
- Provided on-premises user support by troubleshooting issues at the Hadoop service and job levels.
- Supported a POC that streamed data with Flume to a Kafka sink, transformed the data from the Kafka topics using Spark, and stored the results in HBase.
- Worked with application teams onboarding to the cloud to tune cluster-level settings for testing their jobs, estimating memory requirements and the type and number of core nodes based on application load, and helped troubleshoot jobs that failed due to improper tuning.
- Automated installation of the Ranger Hive and HDFS plugins whenever a new cluster spins up in AWS.
- Installed MySQL on an RDS instance and externalized the Hue and Hive Metastore databases.
- Onboarded new users to Hadoop, manually synced users newly added in LDAP to Hue, and granted the necessary permissions on Hue objects.
- Supported the ID Vault environment setup on on-premises Hadoop servers.
- Worked with UNIX teams to set up a staging/landing server where business teams land data before pushing it to AWS.
- Provided Tableau integration and supported Tableau users' issues.
Environment: Hadoop, MapReduce, Hive, HDFS, Sqoop, Oozie, Cloudera, AWS, EMR, Cloud Formation, Flume, HBase, ZooKeeper, CDH5, Oracle, MySQL, NoSQL and Unix/Linux.
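The executor sizing mentioned in the PySpark tuning and cloud onboarding bullets above is often first estimated with a widely used rule of thumb: about five cores per executor, one core and 1 GB per node reserved for the OS and Hadoop daemons, roughly 10% of executor memory set aside for off-heap overhead, and one executor slot left for the YARN ApplicationMaster. The sketch below assumes exactly those heuristics; they are a starting point, not values taken from the projects described here, and `plan_executors` is a hypothetical helper rather than a Spark API.

```python
import math
from dataclasses import dataclass


@dataclass
class ExecutorPlan:
    num_executors: int
    executor_cores: int
    executor_memory_gb: int

    def spark_submit_args(self):
        # Standard spark-submit flags for executor sizing on YARN.
        return [
            "--num-executors", str(self.num_executors),
            "--executor-cores", str(self.executor_cores),
            "--executor-memory", f"{self.executor_memory_gb}g",
        ]


def plan_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, overhead_fraction=0.10):
    """Rule-of-thumb executor sizing (hypothetical helper, not a Spark API)."""
    if nodes < 1:
        raise ValueError("need at least one worker node")
    # Reserve 1 core and 1 GB on each node for the OS and Hadoop daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_gb_per_node - 1
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")
    # Split node memory among executors, then remove the off-heap overhead
    # (approximating Spark's default of about 10% of executor memory).
    container_gb = usable_mem_gb / executors_per_node
    heap_gb = math.floor(container_gb / (1 + overhead_fraction))
    if heap_gb < 1:
        raise ValueError("not enough memory per executor")
    # Leave one executor slot free for the YARN ApplicationMaster.
    num_executors = executors_per_node * nodes - 1
    if num_executors < 1:
        raise ValueError("cluster too small after reserving the ApplicationMaster")
    return ExecutorPlan(num_executors, cores_per_executor, heap_gb)
```

For a hypothetical 6-node cluster with 16 cores and 64 GB per node, `plan_executors(6, 16, 64)` gives 17 executors with 5 cores and 19 GB each. Such numbers are only an initial estimate; the final driver/executor settings still depend on the input data and on how the job actually behaves.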
Confidential, Oak Brook, IL
Environment: Hadoop, HDFS, Cloudera CDH, Spark, MapReduce, YARN, Pig, Hive, Sqoop, Oozie, Kafka, Linux, AWS, HBase, Cassandra, Kerberos, Scala, Python, Shell Scripting.
Confidential - Atlanta, GA
Environment: Hadoop, Scala, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, Flume, PostgreSQL, Spark, Spark Streaming, MapR, Storm, Kafka, Nagios, Python, Oracle, Git, UNIX Shell Scripting, Cassandra, Azure, MySQL, Kerberos, PowerShell, Tableau, SAS.
Linux Hadoop Administrator
Environment: Oracle Data Integrator (ODI), Tableau, OBIEE (reporting), Oracle, Oracle EPM with FSCM integrations, Talend, MySQL, Amazon Web Services (AWS), S3, Hortonworks Hadoop, Spark, Hive, Sqoop, Redshift, PostgreSQL and Unix
Linux System Administrator
Environment: Red Hat Linux 5.x, HP & Dell Servers, Oracle/DB2, VMware ESX 4.x, VMware vSphere, Bash, Shell Scripting, Nagios.
- Installed and configured the Hadoop ecosystem (HDFS/Spark/Hive/Oozie/YARN) using Cloudera Manager and CDH.
- Administered and managed large-scale Hadoop clusters (50 nodes).
- Monitored job performance, file system/disk space usage, cluster and database connectivity, and log files; managed backup and security; and troubleshot various user issues.
- Responsible for day-to-day activities including HDFS support and maintenance, cluster maintenance, adding/removing nodes, cluster monitoring and troubleshooting, managing and reviewing Hadoop log files, backup and restore, and capacity planning.
- Imported and exported analyzed data between HDFS and relational databases using Sqoop for visualization and BI team reporting.
- Managed jobs using the Fair Scheduler and developed job-processing scripts as Oozie workflows.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Used Spark Streaming (Scala) to receive real-time data from Kafka and store the stream data in HDFS and in NoSQL databases such as HBase and Cassandra.
- Responsible for performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory settings.
- Installed and configured Hive and Pig, and worked with developers on various Hive and Pig Latin scripts.
- Supported HDFS data storage and the running of MapReduce jobs.
- Involved in implementing NameNode high availability with ZooKeeper-based automatic failover to eliminate the NameNode as a single point of failure.
- Experience setting up Hadoop clusters on cloud platforms such as AWS.
- Worked with teams to set up AWS EC2 instances using services such as S3, EBS, Elastic Load Balancing, Auto Scaling groups, IAM roles, VPC subnets, and CloudWatch.
- Used Nagios to monitor the cluster and receive alerts around the clock.
- Experience with Splunk administration, add-ons, dashboards, clustering, and forwarder management.
- Secured the cluster with Kerberos and integrated clusters with LDAP/AD at the enterprise level.
- Extensively worked on Linux systems (RHEL/CentOS).
- Created and maintained shell and Python scripts to automate various processes.
- Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
- Installed and configured Hortonworks Data Platform (HDP) and Apache Ambari.
- Installed and configured Hadoop ecosystem components (MapReduce, Pig, Sqoop, Hive, Kafka) both manually and using Ambari Server.
- Supported setting up the QA environment and updating configurations for Pig and Sqoop scripts. Tuned the performance of Pig queries.
- Converted ETL operations to Hadoop using Pig Latin operations, transformations, and functions.
- Implemented best income logic using Pig scripts and UDFs
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Configured the YARN Capacity Scheduler, creating queues that guarantee resources to specific groups.
- Implemented the Hadoop stack and various Big Data analytics tools, and migrated data from different databases to Hadoop (HDFS).
- Responsible for adding new ecosystem components such as Spark, Storm, Flume, and Knox, with custom configurations based on requirements.
- Installed and configured Kafka Cluster.
- Integrated Apache Storm with Kafka to perform web analytics, loading clickstream data from Kafka into HDFS, HBase, and Hive through Storm.
- Helped the team increase cluster size; the configuration for additional DataNodes was managed using Puppet manifests.
- Strong knowledge of open source system monitoring and event handling tools like Nagios and Ganglia.
- Integrated BI and analytics tools such as Tableau, Business Objects, and SAS with the Hadoop cluster.
- Planned and implemented data migration from the existing staging cluster to the production cluster, and migrated data from existing databases to the cloud (S3 and AWS RDS).
- Performed component unit testing using the Azure Emulator.
- Analyzed escalated incidents within the Azure SQL database. Implemented test scripts to support test-driven development and continuous integration.
- Installed and configured Apache Ranger and Apache Knox to secure HDFS, Hive, and HBase.
- Developed Python, shell/Perl, and PowerShell scripts for automation.
- Set up a Big Data platform on Amazon Web Services (AWS) using Hadoop with Elastic MapReduce, Hive, and Sqoop.
- Used Python to upload data into Amazon S3 buckets and copy it to Redshift.
- Involved in migrating 25 TB of Oracle database data into Amazon S3, which serves as the storage layer for Amazon EMR in place of HDFS.
- Extensively used Amazon DynamoDB (NoSQL) and Redshift to store parsed data for multidimensional reporting and cubes. Good understanding of HBase as well.
- Experience processing streaming data using Kafka and Amazon Kinesis.
- Used internally developed custom ETL jobs where complex data processing was required.
- Set up and configured Pig and Hive to source data from HDFS.
- Configured and scheduled Sqoop scripts to import data from Oracle.
- Managed the data pipeline that captures both streaming web data and RDBMS source data.
- Worked closely with project managers, business users, data producer owners, developers, and business analysts to define project requirements for BI multidimensional reporting and dashboard development.
- Performed analysis, design, and development on multiple projects, focusing mainly on Citrix Financials and Ecommerce & Marketing data.
- Worked extensively with Flume to capture streaming semi-structured data into HDFS.
- Managed ETL jobs with Oracle Data Integrator to build the Commerce DataMarts (CDM) and Ecom & Marketing data.
- Improved ETL job performance using Oracle Data Integrator.
- Provided production support for the daily and nightly ODI ETL schedules.
- Created technical design documents based on functional design documents.
- Experience with various versions of Red Hat Linux (5, 6, 7), SUSE (10, 11), AIX (5, 6, 7), TSM (5, 6) servers, and VMware virtualization.
- Racked and installed HP, IBM, and Dell servers, loaded Red Hat Enterprise Linux using Kickstart, and maintained the servers.
- Helped develop Kickstart system for building Linux-based developer workstations; trained users on same.
- Trained and mentored junior sysadmins and software developers, improving productivity and keeping costs to a minimum.
- Good understanding of the error logging subsystem and performance monitoring tools such as vmstat, iostat, and netstat on AIX and Linux systems.
- Managed disk space using Logical Volume Manager (LVM).
- Changed permissions, ownership, and groups of files and directories.
- Provided on-call support for technical issues on UNIX servers.
- Monitored virtual memory, disk utilization, and CPU utilization, managed swap space, and monitored system performance using Nagios.
- Added, removed, and modified startup scripts.
- Planned daily, weekly, and monthly activities per the POA.
- Generated monthly performance reports.
- Updated procedural and process documents.
- Experience in system and server builds, installs, upgrades, patches, migration, troubleshooting, security, backup, disaster recovery, performance monitoring, and system fine-tuning.
- Good knowledge of Unix shell scripting and Python scripting.
- Experience deploying virtual machines using templates and cloning, taking backups with snapshots, and moving VMs and datastores using vMotion and Storage vMotion in VMware environments.
- Set up, configured, and maintained UNIX and Linux servers, RAID subsystems, and desktop/laptop machines, including installation and maintenance of all operating system and application software.
- Installed, configured, and maintained Ethernet hubs/switches/cables, new machines, hard drives, memory, and network interface cards.
- Managed software licenses, monitored network performance and application usage, and made software purchases.
- Provided user support, including troubleshooting, repairs, and documentation, and developed a support website.
- Configured and upgraded large disk volumes (SAS/SATA).
- Planned and executed network security and emergency contingency programs.
- Met with clients to gather business requirements for projects.
- Created, configured, and managed standard virtual switches.
- Created and configured switch port groups and NIC teaming.
- Knowledge of resource handling, memory management techniques, Fault Tolerance, and Update Manager.
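The YARN Capacity Scheduler queue work listed above (queues that guarantee resources to specific groups) is expressed as properties in `capacity-scheduler.xml`. The sketch below is a hypothetical helper, with made-up queue names, that generates the `yarn.scheduler.capacity.root.queues`, `.capacity`, and `.maximum-capacity` properties for queues directly under `root`. It also checks that the child capacities sum to 100%, which the scheduler requires for sibling queues.

```python
import xml.etree.ElementTree as ET


def capacity_scheduler_xml(queues):
    """Build capacity-scheduler.xml properties for queues directly under root.

    `queues` maps a (hypothetical) queue name to (capacity %, maximum-capacity %).
    """
    total = sum(cap for cap, _ in queues.values())
    if abs(total - 100) > 1e-6:
        raise ValueError(f"child capacities must sum to 100, got {total}")
    props = {"yarn.scheduler.capacity.root.queues": ",".join(queues)}
    for name, (cap, max_cap) in queues.items():
        # The guaranteed share must not exceed the elastic ceiling.
        if not 0 <= cap <= max_cap <= 100:
            raise ValueError(f"invalid capacities for queue {name!r}")
        prefix = f"yarn.scheduler.capacity.root.{name}"
        props[f"{prefix}.capacity"] = str(cap)
        props[f"{prefix}.maximum-capacity"] = str(max_cap)
    # Emit Hadoop-style <configuration><property><name/><value/></property>...
    config = ET.Element("configuration")
    for key, value in props.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")
```

For example, `capacity_scheduler_xml({"etl": (50, 70), "analytics": (30, 50), "default": (20, 100)})` guarantees half the cluster to a hypothetical `etl` queue while letting it borrow up to 70% when others are idle. After updating `capacity-scheduler.xml`, queue changes are typically applied with `yarn rmadmin -refreshQueues`.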