
Sr. Hadoop Engineer/ Administrator Resume


San Francisco, CA

SUMMARY:

  • Over 12 years of experience in Information Technology with a strong background as a Technical Administrator/Architect/Engineer for Big Data, Data Warehouse, Data Analytics, and very large cluster environments.
  • Around 7 years of experience across the three major Hadoop distributions (Cloudera, Hortonworks, MapR) in administration activities such as cluster builds/upgrades, configuration management, POCs, installation and maintenance of Hadoop ecosystems including Cloudera Manager, Ambari, HDFS, YARN, Spark, Machine Learning, Hive, HBase (NoSQL), Hue, Impala, Kafka, MapReduce, ZooKeeper, Oozie, Solr, Sqoop, Flume, Pig, Chef, Puppet, Knox, and Cloudera Navigator; metadata (MySQL) backup and recovery; job scheduling and maintenance; code and data migration; debugging and troubleshooting (connectivity/alerts/system/data issues); performance tuning; backup and disaster recovery (BDR); monitoring a Hadoop system using Nagios and Ganglia; Python scripting; and security setup and configuration including Kerberos, Sentry, and LDAP.
  • Led complete projects end to end: Cloudera POC, Hortonworks POC, MapR POC, Hadoop integration with Teradata, cluster builds, high availability setup and configuration for multiple components, security design and setup, and upgrades.
  • Served as a DevOps Systems Administrator, supporting deployment and operations of various applications in Production and lower environments using Git, GitHub, Jenkins, Chef, Puppet, Ansible, Salt, and Docker.
  • Good working experience in Amazon Web Services (AWS) provisioning and services, in-depth knowledge of application deployment and data migration on AWS, and expertise in monitoring, logging, and cost management tools that integrate with AWS. Developed CloudFormation scripts for AWS orchestration with Chef and Puppet.
  • Strong working experience taking a high-level machine learning modeling concept and implementing it efficiently and quickly at scale in production; machine learning techniques include information retrieval, data mining, statistics, and alerting mechanisms.
  • Experience in the successful implementation of ETL solutions for data extraction, transformation, and loading using Sqoop, Hive, Pig, Spark, and HBase (NoSQL database).
  • Developed/migrated numerous applications on Spark using the Spark Core and Spark Streaming APIs in Java. Optimized MapReduce jobs and SQL queries into Spark transformations using Spark RDDs. Developed and integrated the Spark framework with Kafka topics for real-time streaming applications.
  • Designed and documented Big Data best practices and standards, an EDL (Enterprise Data Lake) overview, step-by-step instructions on cluster setup/upgrades and adding/decommissioning nodes, the onboarding process, the security design model, failure scenarios and remedies, PROD to COB discrepancies, and EDL migration.
  • Advanced expertise in SQL, Python, Scala, Shell, Java, and Perl scripting
  • Advanced expertise in Teradata DBA activities, including DBMS upgrades, migrations, Unity ecosystem activities, scripting and automation, setting up/tweaking TASM workloads, handling node-level & OS issues, Axeda setup, BAR activities, capacity planning, audit and best practices, performance monitoring, analysis and pivot-chart report generation for complete system health and performance metrics, PDCR maintenance, monitoring crash dumps & snapshot dumps, load and unload utility issues, coordination with the GSC/SSM/CSR teams on Sev 1 issues, and debugging security/network related issues.
  • Experience in different ticketing systems such as ServiceNow, Infoweb, CMR, HPSM, ManageNow, TechXL, Access Request Tool (ART), PAC2000, and HPSD.

TECHNICAL SKILLS:

Operating Systems: Unix, Linux, Solaris, AIX, Windows XP/NT/2000

Hadoop Distribution: Cloudera, MapR and Hortonworks

Big Data: HDFS, NFS, HBase, MapReduce, Cloudera Manager, Ambari, MapR Control System, Cloudera Navigator, Machine Learning, YARN, HUE, Hive, Impala, Pig, Sqoop, Flume, Kafka, Oozie, Zookeeper, Spark, PySpark, Storm, Ganglia, Nagios, Avro, AWS, DevOps Tools (Chef, Puppet), Kerberos, Knox, Ranger, NiFi, Tez, BoKS, Sentry, LDAP, AD, Cassandra, MongoDB and Fabric

Teradata: Viewpoint, Data Mover, TMSM, Teradata Administrator, Teradata SQL Assistant, TASM, TSET, Teradata Visual Explain, Teradata Statistics Wizard, Teradata Index Wizard, Schmon, Tdwm, CTL/XCT, Lockdisp, Showlock, Qrysessn, TSTSQL, Vprocmanager, gsc tools, DBSControl, Ferret, rcvmanager, Update Space, Gateway Global, TPT, BTEQ, Fast Export, Fast Load, Multi Load and TPump

DevOps: Git, GitHub, Bitbucket, Jenkins, Chef, Puppet, Docker

Databases: HBase, Cassandra, MongoDB, PostgreSQL, Teradata DBMS, Oracle 9i, DB2, SQL Server

Languages & Others: SQL, Python, Ruby, Scala, Unix Shell Scripts, C, C++, Java, Perl Scripts, Kerberos, APT, Sentry, Anaconda, JSON, XML and ACL

Ticket Management: Infoweb, ManageNow, mainframe, TechXL, IMR, MIS, HPSM, HPSD, ServiceNow

Backup Tools: Tivoli Storage Manager, NetVault, NetBackup, Teradata ARC

Data Integration Tools: Ab-Initio, Informatica, Talend

Others: Protegrity tools, GitHub, EventEngine, Autosys, ERwin, Clear Case, MicroStrategy, Eclipse, Control-M, Clear Quest, AtanaSuite, WinMerge, UltraEdit, SecureShell, Tectia Client

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco, CA

Sr. Hadoop Engineer/ Administrator

Responsibilities:

  • Involved in setting up and designing the HDP 2.5 Big Data cluster on AWS and documented an Ambari Technical User Guide covering Hadoop component basics, step-by-step instructions to install and manage a Hadoop cluster using Ambari, troubleshooting Ambari deployments, and specific issues such as HDFS smoke test failures, cluster install failures, and HCatalog daemon Metastore smoke test failures.
  • Produced Hadoop sizing estimates and a node calculator, built clusters on multiple hosts/virtual hosts, and performed cluster evaluation against standard Big Data best practices
  • Active participant in defining user standards and best practices, including coding schemas, objects, system-level schemas, dynamic tables in functions, leveraging truncate instead of delete, compression levels, partitions, table transformations, subscription backups, etc.
  • Involved in various phases of the Software Development Life Cycle (SDLC), such as requirement gathering, data modeling, analysis, architecture design, and development
  • Set up a Java garbage collection monitoring alert that fires when a process's memory usage exceeds 85% of its allocated memory, so issues can be fixed to improve application performance and throughput.
  • Delivered Big Data lunch-and-learn demos on Big Data administration, including a Hadoop ecosystem deep dive covering a Hadoop overview, Ambari, Hadoop features and components, MapReduce, Hive and HBase architecture, Storm, Tez, Hue, Flume, cluster concepts, Sqoop, Kafka, Oozie, ZooKeeper, Pig, Hadoop encryption, and Pepperdata.
  • Provided general guidelines to users for long-running Impala queries, Impala COMPUTE STATS failures, and issues when using the LIMIT clause; set up load balancing and a VIP
  • Documented a master document for developers covering Spark, Hive, Impala, Ambari, authentication and authorization, and proxy server usage; also documented failure scenarios and remedies, including hardware and software failures such as NN/DN failures and logs filling up disk space. Created a best practices document for Hive, Spark, and Sqoop file formats and compression, query optimization, and troubleshooting Spark applications.
  • Monitored critical parts of the cluster using Ambari, including monitoring, troubleshooting, performance tuning, OOM issues, hundreds of configuration checks for different components, and backup and disaster recovery.
  • Created and managed Apache Web and application servers using Chef recipes and supported and managed Chef-based site-specific and application-specific cookbooks
  • Supported and handled issues related to the GitHub version control system
  • Extensive experience with Docker container infrastructure and continuous integration for building and deploying Docker containers; wrote new plugin checks for Nagios to monitor Docker containers.
  • Involved in adding Kerberos authentication, including enabling Hadoop security, ZooKeeper security, HBase, Solr security, and Kafka, and deploying client configurations; implemented Hadoop cross-realm replication when the clusters are members of different Kerberos realms, and troubleshot Kerberos issues
  • Resolved complex security issues
  • Set up Kafka, including installing Kafka, enabling SSL and High Availability (HA) on Kafka brokers, and ingesting Kafka topics from multiple clusters into one cluster using the MirrorMaker service (see the MirrorMaker sketch after this list).
  • Resolved complex Kafka issues such as the MirrorMaker service being unable to start, trouble adding Kafka to the cluster, Flume setup not working with Kafka due to a security compatibility issue, and the Kafka producer-consumer not working with SOEID.
  • Handled HBase project onboarding, including HBase installation, HBase table replication, HBase core access control, improving SQL access to HBase, and an HBase replication monitoring script; created HBase tables to load large sets of semi-structured data coming from various sources.
  • Extended Hive and Pig core functionality using custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) for Hive and Pig.
  • Enabled SSL for HiveServer2, documented best practice guidelines for building and maintaining Hive applications, and assisted users in troubleshooting HS2 and Hive query issues (a sample SSL connection string follows this list).
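
A minimal sketch of the MirrorMaker invocation referenced above, assuming an HDP-style Kafka install path; the config file names, topic whitelist, and stream count are placeholders, not the actual production values.

# Mirror selected topics from a remote cluster into the local cluster.
/usr/hdp/current/kafka-broker/bin/kafka-mirror-maker.sh \
  --consumer.config /etc/kafka/conf/source-consumer.properties \
  --producer.config /etc/kafka/conf/target-producer.properties \
  --whitelist 'app_events|clickstream' \
  --num.streams 4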
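
A hedged example of connecting to the SSL-enabled HiveServer2 with Beeline; the host, port, truststore path, and password are illustrative placeholders.

# Connect to an SSL-enabled HiveServer2 and run a quick sanity query.
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default;ssl=true;sslTrustStore=/etc/security/clientKeys/truststore.jks;trustStorePassword=changeit" \
        -n app_user -e "SHOW DATABASES;"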

Environment: RHEL 6.5, HDP 1.3.1/2.1/2.5, Ambari, Hive, Kafka, Sqoop, Tez, Storm, Python, HBase, Teradata Query Grid, ZooKeeper, Oozie, Kerberos, Knox, Ranger, Pig, Avro

Confidential, Irving, TX

Sr.Hadoop Infrastructure Administrator

Responsibilities:

  • Primary participant in certifying the Big Data product for use within the Confidential group, new cluster evaluation recommendations, RAM estimates, and the Hadoop cluster upgrade from CDH 5.4.8/5.5.3 to CDH 5.7
  • Active participant in performing COB (Continuity of Business) switchovers, including COB cluster checkout and COB cluster testing of all components such as the Hadoop cluster, MySQL, Flume, Datameer, Platfora, data ingestion, Talend, and data center failover.
  • Setup/Configured/Documented Hive Metastore high availability (HA)
  • Primary participant in Sqoop setup and securing the Sqoop2 server; used Sqoop on a Sentry-enabled cluster (see the Kerberized import sketch after this list).
  • Built the BDR requirement template and the COB business recovery plan template.
  • Installed Kafka, enabled SSL and High Availability on Kafka brokers, ingested Kafka topics from multiple clusters into one cluster using the MirrorMaker service, and handled multiple Kafka failure issues
  • Configured HBase to use HDFS High Availability, cleaned up split logs, and added a Kafka topic to the 'mk consumer config' entity in HBase
  • Secured Sqoop2 Server by enabling Kerberos Authentication and SSL Encryption.
  • Upgraded Cloudera Manager patch 5.4.8 and documented the whole process.
  • Certified Spark SQL on CDH 5.7 by testing all of its functionality.
  • Enabled Spark encryption using Cloudera Manager for encrypting Spark data at rest and data in transit.
  • Recommended property values for YARN baseline configurations instead of defaults for components such as the ResourceManager, NodeManager, and Gateway. Prepared a YARN tuning worksheet for each cluster to aid in calculating YARN configurations from a careful assessment of the given cluster's workload and a comprehensive tuning exercise (a sizing sketch follows this list).
  • Worked on Big Data Issues and documented in Big Data Issue Tracker
  • Resolved Hue-related issues such as users unable to log in to Hue, Hue not starting in Cloudera Manager, the Hue daemon not starting, and changing the time zone for Hue logs.
  • Tuned YARN/Impala/Hive/Hue/Spark for performance, was involved in translating MapReduce jobs to Spark and in memory tuning, and set Hive parameters to use the Spark engine
  • Involved in security setup for Big Data, including Unix security setup, Cloudera Manager security, HDFS security, MapReduce security, HBase security, Hue security, Sqoop and Flume security, port/URL security, Sentry setup, ports used, and Cloudera Search authorization with Sentry.
  • SPNEGO Web Authentication Configuration and Kerberos setup for Cloudera Manager and CDH.
  • Prepared best practice documentation covering Hive, Impala, Spark, Sqoop, and Hadoop file formats and compression.
  • Primary participant in documenting the whole application life cycle process, including application onboarding, technical assessment, performance review, and reporting; involved in day-to-day administrative activities such as monitoring/troubleshooting user issues and user access management.
  • Gathered and documented failure scenarios and remedies, including software failures, manually moving the NameNode to a new node, adding new data disks to a worker node, NameNode failures, DataNode failures, Cloudera Manager failures, backing up the PostgreSQL database, data ingestion issues, Hue database failures, Hive Metastore database failures, hardware/rack/master server/proxy server/switch/disk failures, CPU/memory/IO spikes, and Kerberos issues, etc.
  • Documented PROD to COB discrepancies covering all components.
  • Raised a case and coordinated with the supporting engineer to resolve a CPU saturation (100%) issue caused by a Python process, cluster and Cloudera Manager issues, and OOM issues.
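
A hedged sketch of running a Sqoop import on the Kerberized, Sentry-enabled cluster mentioned above; the keytab, principal, JDBC URL, table, and paths are hypothetical placeholders.

# Obtain a Kerberos ticket for the service account, then run the import.
kinit -kt /etc/security/keytabs/etl_svc.keytab etl_svc@EXAMPLE.COM

sqoop import \
  --connect jdbc:mysql://dbhost.example.com:3306/sales \
  --username etl_svc \
  --password-file hdfs:///user/etl_svc/.db_password \
  --table transactions \
  --target-dir /data/raw/sales/transactions \
  --num-mappers 4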
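
A minimal sketch of the kind of arithmetic the YARN tuning worksheet captures, assuming hypothetical node specs and a rough reservation for the OS and Hadoop daemons; the actual worksheet values were derived from each cluster's measured workload.

#!/usr/bin/env bash
# Rough per-node YARN sizing sketch; every number below is an illustrative assumption.
NODE_RAM_GB=128      # physical memory on a worker node
NODE_CORES=32        # physical cores on a worker node
RESERVED_GB=24       # reserved for OS, DataNode, NodeManager, and other agents

YARN_MEM_MB=$(( (NODE_RAM_GB - RESERVED_GB) * 1024 ))

echo "yarn.nodemanager.resource.memory-mb  = ${YARN_MEM_MB}"
echo "yarn.nodemanager.resource.cpu-vcores = $(( NODE_CORES - 2 ))"
echo "yarn.scheduler.maximum-allocation-mb = ${YARN_MEM_MB}"
echo "yarn.scheduler.minimum-allocation-mb = 1024"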

Environment: CDH 5.7, RHEL, Cloudera Manager, Cloudera Navigator, Yarn, Impala, Hive, HUE, Kafka, Sqoop, Pig, HBase, Avro

Confidential, Raritan, NJ

Sr.Hadoop Administrator/ Architect

Responsibilities:

  • Primary participant in cluster installation and maintenance, cluster upgrades, patch management, and manual installation of Cloudera Manager; set up and configured High Availability; handled day-to-day operational activities such as monitoring critical parts of the cluster, adding/decommissioning data nodes, service-related issues, tuning multiple services such as YARN, Kafka, Impala, Spark, and Hive, configuration checks, user access management, HDFS support and maintenance, role addition, code and data migration, metadata backup, and capacity planning.
  • Hadoop evaluation POC to evaluate Hadoop capabilities as a better, faster, cost-efficient, and deeper way to source data and support discovery analytics
  • Hadoop POC development engagement
  • Performance Benchmarking Report
  • Functional Design Deliverable
  • POC Space Sizing, Software & Hardware Requirements
  • Used Teradata Grid to move data between Teradata and Hadoop and Aster platforms.
  • Runbook and Load Routines
  • Performed a Pepperdata POC to get real-time reallocation of resources beyond what YARN can provide, allowing the creation of additional YARN containers; it throttles non-high-priority jobs to ensure that high-priority job SLAs are met.
  • Worked on setting up high availability for major production cluster
  • Worked on installing and configuring the cluster, adding new nodes to an existing cluster, safely decommissioning nodes (e.g., DataNode hang issues), recovering from NameNode failures such as NameNode down issues and memory issues, monitoring cluster health using Teradata Viewpoint and Ganglia, capacity planning, and tuning MapReduce job parameters and slot configuration
  • Installed and configured CDH5.5.2 on AWS (Amazon Web Services)
  • Moved (redistributed) services from one host to another within the cluster to facilitate securing the cluster and to ensure high availability of the services.
  • Primary participant in planning the Hadoop cluster, including planning, identifying the right hardware, network considerations, and configuring nodes.
  • Participated in upgrading cluster, which involved coordination with all ETL application teams, working with System Engineer and performed pre and post checks.
  • Responsible for cluster maintenance, including copying data between clusters, adding and removing cluster nodes, checking HDFS status, and rebalancing the cluster (see the maintenance command sketch after this list)
  • Worked on installation of Cloudera Manager and used all of its features, such as configuration management, service management, resource management, reports, alerts, and aggregated logging; also involved in Hadoop (CDH) installation.
  • Handled Python errors while upgrading the CM 5.5.2 agent
  • Replaced a Standby NameNode in a Running HDFS HA cluster
  • Created multiple instances of the Hue service for HA
  • Handled Oozie/Hue HA issues and changed NameService ID for HDFS HA
  • Setup multiple HiveServer2 (HS2) instances behind a Load Balancer for HA
  • Enabled Hue HA and load balanced Hue queries between both servers
  • Experience in managing system configuration with Puppet and developed Puppet modules for hotfixes after major and minor releases.
  • Set up and configured a Git server for the code repository, code merges, and quality checks with various tools.
  • Configured Jenkins with Git for appropriate release builds and scheduled automated builds.
  • Set up and configured various Jenkins jobs for build and test automation; created branches and tags and performed merges in Git
  • Provided Best Practice document for Docker, Jenkins, Puppet and GIT
  • Primary participant in integrating the Enterprise Data Lake (EDL) with LDAP/Kerberos for authentication and setting up authorization via Sentry roles and ACL permissions specific to the business application to which the user/service account belongs.
  • Used Cloudera Manager to configure and enable Kerberos and set up Kafka in the Kerberized cluster. Fixed multiple Kerberos issues, such as Kerberos errors in Hue logs, a Kerberos role error after an Impala upgrade, and Kerberos errors when connecting to the Hive metastore
  • Designed and developed a scalable statistical machine learning framework using localized linear/logistic regression, auto encoders, and decision trees on top of topological data analysis.
  • Handled many Kafka issues, including Kafka and ZooKeeper services not coming up after the patch upgrade to CDH 5.7.1 and being unable to produce messages from the kafka-console-producer tool due to a Kafka exception error.
  • Used Hue applications to browse HDFS and jobs, manage the Hive metastore, run Hive and Cloudera Impala queries and Pig scripts, browse HBase, export data with Sqoop, submit MapReduce programs, build custom search engines with Solr, and schedule repetitive workflows with Oozie; tested Oozie workflows via Hue.
  • Enabled Django Debug mode for Hue to handle different Hue related issues
  • Resolved Hue-related issues, including Hue being slow, hanging, login timeouts, and poor concurrent performance with the Hue web server.
  • Scheduled the Hue history cron.sh script via a cron job to clean up old data in the Oozie and Beeswax Hue tables
  • Primary participant in setting up the MySQL in a Master-Slave format on each of the EDL environment.
  • Set up a backup job to back up master data on a daily basis via a housekeeping job (Python scripting) scheduled in crontab, which sends an email notification on success/failure of the job to the EDL team. Once the backup completes, the backup copy is replicated to one of the data nodes for increased availability (a simplified sketch follows this list).
  • Participated in evaluation of the severity of the patch by working with the Cloudera System Engineer (SE)
  • Decision making in necessity of the requirement of patch.
  • Based on the patch severity (critical/non-critical), deployed the necessary software within the change management window deadlines
  • Worked on setting up an alerting mechanism integrated with Cloudera Manager, with alert emails for any changes in the health status of cluster-related services or configuration/security-related changes, scheduled to be sent to the EDL administration team.
  • Added new alerts in a timely manner based on business requirements
  • Monthly log review: performed a monthly review of audit logs and security-related events from the Navigator audit logs and other security/audit-related logs to look for any suspicious activity on the cluster from both user and admin activities, and reported to management so action could be taken accordingly.
  • Co-ordination with Application Leads: Worked closely with App Leads in addressing the impacts reported by the alert emails
  • Built a ‘Transporter’ migration tool to migrate files and new objects from DEV to QA and QA to PRD environments.
  • Automated Housekeeping/Audit jobs using Python Scripting
  • Worked on HDFS management, the upgrade process, and rack management; factored in NameNode memory considerations; and was involved in HDFS security enhancements.
  • Configured Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS
  • Installed and configured Hive, Impala, MapReduce, and HDFS
  • Installed Oozie workflow engine to run multiple Hive and Pig jobs and responsible for Oozie job scheduling and maintenance
  • Resolved Classloader issues with parquet-scala
  • Used Hive and Python to clean and transform geographical event data.
  • Used Pig and a Pig user-defined filter function (UDF) to remove all non-human traffic from a sample web server log dataset and used Pig to sessionize web server log data.
  • Troubleshooting user issues on services such as Spark, Kafka, Hue, Solr, HBase, Hive, and Impala
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
  • Configured Spark with the Python APIs
  • Implemented UDFs, UDAFs, and UDTFs in Java for Hive to process data in ways that cannot be handled with Hive's built-in functions
  • Designed and implemented Pig UDFs for evaluating, filtering, loading, and storing data
  • Used the Oozie workflow engine to manage and automate several interdependent Hadoop jobs
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Used Ganglia and Nagios to monitor system health and give reports to management
  • Diagnosing and fixing performance problems on HBase Cluster.
  • Troubleshooting HBase problems and fine-tuned HBase configurations
  • Security enhancements, Authentication and Access Control to HBase
  • Checked configuration files, error messages and java exceptions to troubleshoot any Cluster startup issues, data node issues and task tracker issues
  • Used fsck (file system checker) to check for corruption in data node blocks and to clean unwanted blocks
  • Tweaked MAPRED configuration file to enhance the performance of the job.
  • Tuned Hadoop Configuration files to fix performance issues such as swapping and CPU Saturation issues
  • Assisted Application developer in fixing the application code using Python scripting.
  • Addressed multiple execution problems of Python based streaming jobs
  • Documented EDL (Enterprise Data Lake) best practices and standards, including Data Management (file formats (Avro, Parquet), compression, partitioning, bucketing, de-normalizing) and Data Movement (data ingestion, file transfers, RDBMS, streaming data/log files, data extraction, data processing, MapReduce, Hive, Impala, Pig, HBase)
  • Documented the EDL overview and services offered on the EDL platform, including environments, data storage, data ingestion, data access, security, and indexing
  • Primary participant in designing EDL Migration form
  • Worked with the management in defining and documenting the EDL onboarding process.
  • Tested and documented the step by step process for Application users on how to download/Install/Run/ Testing Cloudera Impala ODBC Driver and connectivity to Tableau.
  • Tested and documented the step by step process in creating a HDFS connection, Teradata Connection, Hive Connection, HDFS file object, Hive Table Object, Teradata object and pushdown (load balancing) optimization using Hive.
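
A short sketch of routine HDFS maintenance commands behind the cluster-maintenance bullet above; hostnames and paths are placeholders.

# DataNode capacity and status summary
hdfs dfsadmin -report | head -n 25

# Check the namespace for missing or corrupt blocks
hdfs fsck / -list-corruptfileblocks

# Rebalance data across DataNodes to within a 10% utilization spread
hdfs balancer -threshold 10

# Copy a dataset between clusters (e.g., PROD to COB); NameNode hosts are placeholders
hadoop distcp hdfs://prod-nn.example.com:8020/data/warehouse \
              hdfs://cob-nn.example.com:8020/data/warehouse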
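
A simplified cron-style sketch of the nightly master-data backup and notification flow described above (the production housekeeping job was written in Python); hosts, paths, and the mail alias are placeholders.

#!/usr/bin/env bash
# Example crontab entry: 0 2 * * * /opt/edl/bin/mysql_backup.sh
set -euo pipefail
STAMP=$(date +%Y%m%d)
BACKUP=/backup/mysql/edl_master_${STAMP}.sql.gz

if mysqldump --single-transaction --all-databases | gzip > "${BACKUP}"; then
    # Keep a second copy on a data node for increased availability
    scp "${BACKUP}" datanode01.example.com:/backup/mysql/
    echo "MySQL backup ${BACKUP} completed" | mailx -s "EDL MySQL backup OK" edl-admins@example.com
else
    echo "MySQL backup failed on $(hostname)" | mailx -s "EDL MySQL backup FAILED" edl-admins@example.com
fi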

Environment: CDH 4.5 to 5.5.2, RHEL, Cloudera Manager 5.5.3, Cloudera Navigator, Yarn, Impala 2.1.2, Hive, HUE, Kafka, Sqoop, Pig, HBase, Avro.

Confidential, Phoenix, AZ

Sr.Hadoop/ Teradata Administrator

Responsibilities:

  • Key participant in Big Data regular activities like patch management, major upgrades, cluster services, monitor critical parts of the cluster, account management, NFS support and maintenance, code release management, data extraction/ import, Backup and restore, and capacity planning.
  • Involved in all kinds of P1 issues for the Teradata system
  • Installed/configured/tweaked MapR Hadoop cluster nodes and MySQL
  • Primary participant in preparation of reports and dashboard development including ETL and ELT of data from source systems to NFS/HBase/Hive
  • Set up the Hadoop cluster, Teradata-Hadoop connectivity, BI connectivity, backup & restore and network capability testing (IO throughput), clusters on virtual machines, integration with AD, administration and monitoring, in-memory data grid capabilities, and schema-on-read/agile data discovery capabilities; constructed a test instance of HBase replication using Python APIs
  • Developed both HBase and non-HBase warehouse solutions for BDR (Backup and Disaster Recovery)
  • Documented Best Practices and implemented Security and Governance policies for big data applications
  • Designed and implemented Hadoop platform capabilities such as Oozie, Rev R, YARN, and Storm
  • System validation testing to verify basic system functionality, including running a MapReduce job, Hue testing, Spark functionality, and a Hive connectivity test (a smoke-test sketch follows this list).
  • Installed Hadoop and configured with Teradata database for data loading process
  • Involved in Hadoop Cluster environment administration using MapR Control System that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
  • Performed Hadoop backups, added libraries and JARs, and successfully migrated from the existing infrastructure to the latest releases.
  • Enabled debug logging in Python scripts as part of installing and configuring the cluster
  • Used Avro to store multiple small files
  • Diagnosed and tuned performance problems includes identifying map-side data skew problems, reduce-side data skew problems, CPU contention, Memory swapping, Disk health issues, Networking issues and discovering hardware failures
  • Used native DataFrame API to query Avro files in Python.
  • Improved performance for HBase data by serializing the data using Avro
  • Used custom formatters in HBase shell to visualize data stored in different data serialization formats.
  • Resolved performance issues while loading bulk data into HBase
  • Setup Audit and Housekeeping jobs using Python scripting
  • Documented Standards and Best Practices that includes File Formats, Compression, Partitioning, Bucketing, and De-Normalizing.
  • Developed a standardized approach to use the right data load method for the right task, considering factors such as timeliness of data ingestion (batch, near-real-time, real-time), data extraction and processing needs (such as random row access and required data transformations), and source system and data structure formats.
  • Picked the right compression codec for the data
  • Primary participant in setting up Audit process in monitoring the alerts generated during the events and also probing and analyzing the logged activity. The alerts have been configured in the Cloudera Manager to notify the DBA Team comprising of but not limited to the below mentioned changes:
  • Security related activities
  • Health status of the Servers configured in the Cluster
  • Health status of the configured services such as HDFS, Hive, HBase, HUE etc
  • Any configuration changes happening on the cluster
  • Involved in setting up security across Hadoop clusters which involved user support for LDAP as well as Linux owners and groups
  • Setup and maintaining SSL and Kerberos
  • Managed load balancers, firewalls in production environment
  • Worked on different ticketing system like Manage Now, Infoweb, CMR, Tech XL and Service Now to handle different planned/unplanned maintenance activities and user issues
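
A minimal smoke-test sketch along the lines of the system validation testing described above; the examples-JAR path varies by distribution and version, so treat it and the file paths as placeholders.

#!/usr/bin/env bash
EXAMPLES_JAR=/path/to/hadoop-mapreduce-examples.jar   # placeholder: location varies by distribution

# 1. Filesystem round trip
echo "smoke test $(date)" | hadoop fs -put - /tmp/smoke_test.txt
hadoop fs -cat /tmp/smoke_test.txt

# 2. MapReduce job
yarn jar "${EXAMPLES_JAR}" pi 4 100

# 3. Hive connectivity
hive -e "SHOW DATABASES;"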

Environment: MapR 4.0, RHEL, HBase, Hive, Impala, Flume, Spark Streaming, Kerberos and Sentry

Confidential, Hartford, CT

Sr Teradata DBA

Responsibilities:

  • Deployed LDAP in all environments for three Lines of Business (LOBs) (Personal Insurance, Business Insurance, and Claims) per audit requirements
  • IP Filter Setup in Production for three LOBs using ipxml2bin utility as per Audit
  • Cleaned up redundant roles and erroneous privileges in the roles, direct grant privileges as per Teradata Best Practices.
  • Teradata Client Installation on Linux, AIX, Solaris and Windows Servers and resolved the Product Support Issues
  • Backup and Restore using Tivoli Storage Management System
  • Debugged multiple issues with PDCR collection on daily jobs. Using PDCR tool kit analyzed the reports on system health
  • Migrated the data from one environment to another environment using Perl Scripts
  • Created Macros and Stored Procedures to Manage User Provision process, Role Management, Profile Management and Move the Space between databases
  • Performed the Offshore Team lead role & 24X7 Support for production system
  • House Keeping Activities: Dropping and recreating the Objects, fixing the failures while applying the DDL version changes, Role Management & Profile Management
  • Worked on changing the DBSControl parameters and coordinated with the testing team
  • Monthly Database Cleanup & Suggestions to the users on cleanup initiatives
  • Created Database Automation Script to create databases in different Environments using Stored Procedure.
  • Generated different Space Reports in Teradata Manager to analyze different kind of issues.
  • For better filter setup, analyzed the data logged by the TASM filters in warning mode and recommended the solution for the work on the system.
  • Worked on Database Synch up across different Servers (Dev, Test, MO and Prod) using AtanaSuite SyncTool
  • Used the AtanaSuite Delta Tool on comparing the objects through different environments.
  • Suggestions for collecting the statistics during the migrations.
  • Responsible to improve the performance of the queries using Performance Tuning
  • Monitoring database and handling security as per business requirement
  • Granting & revoking the access to the users through Roles.
  • Wrote shell scripts for automating common tasks (see the space-report sketch after this list)
  • Design and Document monitoring processes of production databases
  • Handling and Coordination with Global Support Center (GSC).
  • Worked on MIS and HPSM Ticket Management Systems
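
A hedged shell/BTEQ sketch of the kind of space report these automation scripts produced; the TDPID, logon credentials, and column formatting are placeholders.

#!/usr/bin/env bash
# Database space report via BTEQ; the logon string below is a placeholder.
bteq <<'EOF'
.LOGON tdprod/dbadmin,dbadmin_password
SELECT  DatabaseName
     ,  CAST(SUM(MaxPerm)     / 1024**3 AS DECIMAL(12,2)) AS MaxPermGB
     ,  CAST(SUM(CurrentPerm) / 1024**3 AS DECIMAL(12,2)) AS CurrPermGB
     ,  CAST(SUM(CurrentPerm) * 100.0 / NULLIFZERO(SUM(MaxPerm)) AS DECIMAL(5,2)) AS PctUsed
FROM    DBC.DiskSpace
GROUP BY 1
ORDER BY PctUsed DESC;
.LOGOFF
.QUIT
EOF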

Environment: Solaris10, Linux, AIX, Windows 32/64 bit, Teradata Database V12, Teradata Database V13, Teradata Administrator, Teradata Manager 12.0.0.5, Teradata SQL Assistant, BTEQ, TSET, Tivoli Storage Management, Teradata Arcmain.

Confidential

Teradata DBA

Responsibilities:

  • Installation of Teradata Client on Windows, and Unix Servers
  • Creation of database and users
  • Enrolling, Controlling and monitoring users access to the database
  • Build tables, views, UPI, NUPI, USI and NUSI
  • Database performance tuning
  • Regular database health check reports for Capacity planning
  • Fixed different NetBackup drive-down issues (e.g., frozen drives in the configuration) and debugged issues in the TARA GUI
  • Database reorganization
  • Installed and setup Teradata Manager
  • Implemented alert policy for both development and test system
  • Database Hardening
  • Upgrading the Teradata Versions
  • Patch upgrades of the Teradata systems
  • NetBackup server maintenance

Environment: MP-RAS Unix, Windows XP, Teradata V2R5, Teradata Manager, Teradata Administrator, Teradata SQL Assistant, BTEQ, DBQL, load/unload utilities, TSET, NetBackup.
