Sr. Big Data Engineer Resume
Raritan, NJ
SUMMARY:
- 10+ years of experience in Information Technology with a strong background in Database development and Data warehousing
- 6 years of experience in Cloudera Hadoop administration activities such as cluster upgrades/services, configuration management, installation and maintenance of Hadoop ecosystem components including Cloudera Manager, HDFS, YARN, Hive, Impala, Kafka, MapReduce, ZooKeeper, Oozie, Sqoop, Flume, Pig and Spark, metadata (MySQL) backup and recovery, job scheduling and maintenance, code and data migration, debugging and troubleshooting (connectivity/system/data issues), performance tuning, and monitoring Hadoop systems using Nagios, Ganglia, and alerting mechanisms.
- Sound knowledge of administering Hadoop clusters using Cloudera Manager, including activities such as deploying the Hadoop cluster, adding or removing services and nodes, keeping track of jobs, monitoring critical parts of the cluster, configuring NameNode high availability, and scheduling.
- Led the complete project end to end: Cloudera POC, Hortonworks POC, Hadoop integration with Teradata, Hadoop cluster build, security enhancements, and upgrades
- Good working experience importing and exporting data to and from the Hadoop Distributed File System (HDFS).
- Managed backup and disaster recovery for Hadoop data.
- Designed and documented Big Data best practices and standards, EDL (Enterprise Data Lake) Overview, onboarding process, and EDL Migration.
- Advanced expertise in SQL, shell, Java, Perl and Python scripting
- Good expertise in Teradata DBA activities such as database maintenance, Teradata user and database creation/maintenance, space allocation/maintenance, performance monitoring using PMON and Teradata Manager, PDCR maintenance, DDL change management, cleanup activities, crash dump and snapshot dump monitoring, upgrades, clustering, and IP Filter and LDAP implementation.
- Experience with ticketing systems such as ServiceNow, Infoweb, CMR, HPSM, MIS, ManageNow, TechXL, and HPSD.
PROFESSIONAL EXPERIENCE:
Confidential, Raritan, NJ
Sr. Big Data Engineer
Responsibilities:
- Primary participant in installation, cluster upgrades/services, patch management, and day-to-day activities such as monitoring critical parts of the cluster, user access management, HDFS support and maintenance, role addition, code and data migration, backup and restore, and capacity planning.
- Hadoop Cluster Installation and Maintenance:
- Worked on setting up high availability for major production cluster
- Worked on installing the cluster, starting Hadoop in pseudo-distributed and fully distributed modes, adding new nodes to an existing cluster, safely decommissioning nodes, recovering from NameNode failures, monitoring cluster health using Ganglia, capacity planning, and tuning MapReduce job parameters and slot configuration
- Moved services (re-distribution) from one host to another within the cluster to help secure the cluster and ensure high availability of the services.
- Primary participant in Hadoop cluster planning, which included identifying the right hardware, network considerations, and node configuration.
- Participated in upgrading cluster, which involved coordination with all ETL application teams, working with System Engineer and performing pre and post checks.
- Responsible for cluster maintenance, including copying data between clusters, adding and removing cluster nodes, checking HDFS status, and rebalancing the cluster (see the HDFS maintenance command sketch after this list)
- Worked on installation of Cloudera Manager and used its features for configuration management, service management, resource management, reports, alerts, and aggregated logging; also involved in Hadoop (CDH) installation.
- Primary participant in integrating the Enterprise Data Lake (EDL) with LDAP and setting up authorization via Sentry roles and ACL permissions, specific to the business application to which each user/service account belongs. Configured a standard set of Sentry roles per application (Developer, ETL, Report, and Support roles), each leveraged for a specific purpose; a role-setup sketch appears after this list.
- Worked with the pre-defined Cloudera Manager roles (Full Admin, User Admin, Navigator Admin, BDR Admin, Cluster Admin, Operator, Configurator, Read-Only, Auditor), each of which comes with a pre-defined set of access rights.
- Worked with the pre-defined Cloudera Navigator roles (Full Administrator, Lineage Viewer, Auditing Viewer, Policy Viewer, Metadata Administrator, Policy Administrator, User Administrator)
- Used HUE applications to browse HDFS and jobs, manage a Hive metastore, run Hive and Cloudera Impala queries and Pig scripts, browse HBase, export data with Sqoop, submit MapReduce programs, build custom search engines with Solr, and schedule repetitive workflows with Oozie.
- Defined only two distinct sets of users (Administrators and Individual Users) with respect to HUE
- Participated in decision making on whether a given patch was required.
- Based on patch severity (critical/non-critical), deployed the necessary software within change-management window deadlines
- Worked on setting up an alerting mechanism integrated with Cloudera Manager, with alert emails sent to the EDL Administration team for any changes in the health status of cluster-related services or for configuration/security-related changes.
- Added new alerts in a timely manner based on business requirements
- Monthly Log Review: Performed a monthly review of audit logs and security-related events from the Navigator audit logs and other security/audit logs to look for suspicious user or admin activity on the cluster, and reported findings to management for appropriate action.
- Co-ordination with Application Leads: Worked closely with App Leads in addressing the impacts reported by the alert emails
- Worked on HDFS management, the upgrade process, and rack management; accounted for NameNode memory considerations; and was involved in HDFS security enhancements.
- Configured Flume for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to HDFS (a sample agent configuration sketch follows this list)
- Installed and configured Hive, Impala, MapReduce, and HDFS
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs and was responsible for Oozie job scheduling and maintenance (typical commands are sketched after this list)
- Used Hive and Python to clean and transform geographical event data.
- Used Pig and a Pig user-defined filter function (UDF) to remove all non-human traffic from a sample web server log dataset and used Pig to sessionize web server log data.
- Troubleshot user issues on services such as Spark, Kafka, Solr, Hive, and Impala
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team (see the Sqoop export sketch after this list)
- Implemented UDFs, UDAFs, and UDTFs in Java for Hive to handle processing that cannot be performed using Hive built-in functions
- Designed and implemented Pig UDFs for evaluating, filtering, loading, and storing data
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several recurring workflows
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Used Ganglia and Nagios to monitor system health and give reports to management
- Checked configuration files, error messages, and Java exceptions to troubleshoot cluster startup, DataNode, and TaskTracker issues
- Used fsck (file system checker) to check for corruption in data node blocks and to clean unwanted blocks
- Tweaked the mapred-site.xml configuration file to enhance job performance.
- Tuned Hadoop Configuration files to fix performance issues such as swapping and CPU Saturation issues
- Documented EDL (Enterprise Data Lake) best practices and standards, including Data Management (file formats (Avro, Parquet), compression, partitioning, bucketing, de-normalizing) and Data Movement (data ingestion, file transfers, RDBMS, streaming data/log files, data extraction, data processing, MapReduce, Hive, Impala, Pig, HBase)
- Documented the EDL overview and the services offered on the EDL platform, including environments, data storage, data ingestion, data access, security, and indexing
- Primary participant in designing the EDL Migration form to capture technical information about a proposed migration, including application name, ticket number, source environment, source objects (folders/scripts/database/table/view/UDF/HBase/Solr etc.), target environment, target objects, migration date, functionality, etc.
- Defined a folder structure template to create directories (in both HDFS and the Linux file system) for structured as well as unstructured data, to be used by the project teams.
- Worked with management in defining the EDL onboarding process and built the intake form for application owners, used to capture technical information about a proposed project for planning and implementation, including project/POC name, start date, manager details, team members, Cloudera EDL-supported technologies such as data ingestion (Sqoop, SFTP, Spark Streaming, Flume), data transformation (Pig, MapReduce, Spark, Python scripts, Informatica) and other Hadoop services (Solr, Hive, Impala, HBase, and Spark), data size, etc.
- Tested and documented the step-by-step process for application users on how to download, install, run, and test the Cloudera Impala ODBC driver and connect it to Tableau.
- Tested and documented the step-by-step process for creating an HDFS connection, Teradata connection, Hive connection, HDFS file object, Hive table object, and Teradata object, and for pushdown (load balancing) optimization using Hive.
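As referenced in the cluster-maintenance bullet above, this work mostly came down to a handful of standard HDFS administration commands. The sketch below is illustrative only; host names, paths, and the exclude-file location are placeholders, not actual cluster details.

```sh
# Illustrative HDFS maintenance commands (host names and paths are placeholders)

# Check overall HDFS status and per-DataNode capacity
hdfs dfsadmin -report

# Check for corrupt or missing blocks, listing the affected files
hdfs fsck / -files -blocks -locations

# Safely decommission a node: list it in the excludes file, then refresh
echo "datanode05.example.com" >> /etc/hadoop/conf/dfs.hosts.exclude
hdfs dfsadmin -refreshNodes

# Rebalance data across DataNodes (10% utilization spread threshold)
hdfs balancer -threshold 10

# Copy data between clusters with DistCp
hadoop distcp hdfs://prod-nn.example.com:8020/data/app1 hdfs://dr-nn.example.com:8020/data/app1
```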
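For the Sentry-based authorization noted above, role setup was done per application. The following is a minimal sketch, assuming HiveServer2 with Sentry and Kerberos enabled; the role, group, database, and path names are hypothetical.

```sh
# Hypothetical per-application "Developer" role in Sentry
# (role/group/database/path names are illustrative, not actual application details)
beeline -u "jdbc:hive2://hiveserver2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM" -e "
  CREATE ROLE app1_developer_role;
  GRANT ROLE app1_developer_role TO GROUP app1_developers;
  GRANT ALL ON DATABASE app1_db TO ROLE app1_developer_role;
"

# Matching HDFS ACL so the same LDAP group can reach the application's data directory
hdfs dfs -setfacl -m group:app1_developers:rwx /data/app1/raw
hdfs dfs -getfacl /data/app1/raw
```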
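The Flume configuration mentioned above follows the usual source/channel/sink pattern. Below is a minimal, hypothetical agent that tails an application log into HDFS; file paths, host names, and the agent name are placeholders.

```sh
# Minimal, illustrative Flume agent: tail a log file into HDFS (paths/hosts are placeholders)
cat > /etc/flume-ng/conf/app1-agent.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app1/app1.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://prod-nn.example.com:8020/data/app1/logs/%Y%m%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

# Start the agent
flume-ng agent --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/app1-agent.conf --name a1
```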
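Day-to-day Oozie scheduling and maintenance referenced above was driven from the Oozie CLI. The commands below are a sketch; the server URL, properties file path, and job ID are placeholder values.

```sh
# Illustrative Oozie commands (URL, paths, and job IDs are placeholders)
export OOZIE_URL=http://oozie.example.com:11000/oozie

# Submit and start a workflow or coordinator defined by a job.properties file
oozie job -config /home/etl/app1/job.properties -run

# Check status and logs of a running job
oozie job -info 0000123-200101000000001-oozie-oozi-W
oozie job -log  0000123-200101000000001-oozie-oozi-W

# Suspend, resume, or kill a job during maintenance windows
oozie job -suspend 0000123-200101000000001-oozie-oozi-W
oozie job -resume  0000123-200101000000001-oozie-oozi-W
```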
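The Sqoop exports for BI reporting referenced above typically looked like the sketch below; the JDBC URL, credentials, table, and HDFS directory are hypothetical, and the field delimiter assumes Hive's default Ctrl-A separator.

```sh
# Illustrative Sqoop export of analyzed Hive/HDFS output to an RDBMS table
# (connection string, credentials, table, and directory are placeholders)
sqoop export \
  --connect jdbc:mysql://reports-db.example.com:3306/bi_reports \
  --username bi_user \
  --password-file /user/etl/.bi_user.password \
  --table daily_metrics \
  --export-dir /user/hive/warehouse/app1.db/daily_metrics \
  --input-fields-terminated-by '\001' \
  -m 4
```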
Confidential, Phoenix, AZ
Big Data Administrator
Responsibilities:
- Key participant in regular Big Data activities such as patch management, cluster services, monitoring critical parts of the cluster, account management, HDFS support and maintenance, code release management, data extraction/import, backup and restore, and capacity planning.
- Involved in all kinds of SYSDBA activities for the complete Teradata system
- Cloudera POC
- Set up the Hadoop cluster, Teradata-Hadoop connectivity, and BI connectivity; tested backup & restore, network capabilities (IO throughput), clusters on virtual machines, integration with AD, administration and monitoring, in-memory data grid capabilities, and schema-on-read/agile data discovery capabilities
- Installed Hadoop and configured with Teradata database for data loading process
- Involved in Hadoop Cluster environment administration using Cloudera Manager that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
- Performed Hadoop backups, added libraries and JARs, and successfully migrated from the existing infrastructure to the latest releases.
- Pushed system log messages into HDFS with Flume
- Used Avro to store multiple small files
- Diagnosed and tuned performance problems, including identifying map-side and reduce-side data skew, CPU contention, memory swapping, disk health issues, and networking issues, and discovering hardware failures
- Documented Standards and Best Practices that includes File Formats, Compression, Partitioning, Bucketing, and De-Normalizing.
- Data Movement:
- Developed a standardized approach to using the right data load method for each task, based on factors such as timeliness of data ingestion (batch, near-real-time, real-time), data extraction and processing requirements (such as random row access and required data transformations), and source system and data structure formats.
- MapReduce:
- Used a Combiner function to reduce data transfer between map and reduce tasks, thereby improving job throughput
- Used splittable compression algorithms on the data set as well as on the intermediate output to reduce the space needed for storage and to speed up data transfer across the network (see the job-submission sketch after this list)
- Used Distributed Cache for the lookup data distribution when the files are small.
- Picked the right compression codec for the data
- Hadoop Security:
- Primary participant in setting up the audit process for monitoring alerts generated during events, as well as probing and analyzing the logged activity. Alerts were configured in Cloudera Manager to notify the DBA team of changes including, but not limited to, the following:
- Any configuration changes happening on the cluster
- Health status of the configured services such as HDFS, Hive, HBase, HUE etc
- Health status of the Servers configured in the Cluster
- Security related activities
- Involved in setting up security across Hadoop clusters which involved user support for LDAP as well as Linux owners and groups
- Set up and maintained SSL and Kerberos (a verification sketch follows this list)
- Managed load balancers, firewalls in production environment
- Worked on different ticketing systems such as ManageNow, Infoweb, CMR, TechXL, and ServiceNow to handle planned/unplanned maintenance activities and user issues
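Compression of intermediate and final MapReduce output, referenced above, can be switched on per job through generic options. The sketch below assumes a driver that uses ToolRunner; the jar, main class, and paths are hypothetical, with Snappy for map output and the splittable bzip2 codec for the final output.

```sh
# Illustrative job submission enabling compression of intermediate (map) output
# and of the final job output (jar name, main class, and paths are placeholders)
hadoop jar app1-analytics.jar com.example.SessionizeJob \
  -D mapreduce.map.output.compress=true \
  -D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec \
  /data/app1/weblogs /data/app1/sessions
```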
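Kerberos maintenance referenced above was verified with the standard Kerberos client tools; the commands below are a minimal sketch, with the principal, realm, and keytab path as placeholders.

```sh
# Illustrative Kerberos checks after enabling security on the cluster
# (principal, realm, and keytab names are placeholders)

# Obtain a ticket from a service keytab and confirm it
kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/edge01.example.com@EXAMPLE.COM
klist

# Confirm that HDFS access works with the Kerberos ticket
hdfs dfs -ls /

# Destroy the ticket when finished
kdestroy
```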
Confidential, Hartford, CT
Sr Teradata DBA
Responsibilities:
- Deployed LDAP in all environments for three Lines of Business (LOBs) (Personnel Insurance, Business Insurance, and Claims) as per audit requirements
- IP Filter Setup in Production for three LOBs using ipxml2bin utility as per Audit
- Cleaned up redundant roles and erroneous privileges in the roles, direct grant privileges as per Teradata Best Practices.
- Implemented Strong Password rule set for all the Generic Profiles in Three LOBs
- Teradata Client Installation on Linux, AIX, Solaris and Windows Servers and resolved the Product Support Issues
- Backup and Restore using Tivoli Storage Management System
- Debugged multiple issues with PDCR collection in daily jobs; analyzed system health reports using the PDCR toolkit
- Generated weekly audit reports to identify and stop direct-grant users, unshielded non-adhoc profiles, and insecure adhoc users, along with other reports, in production for three LOBs
- Migrated the data from one environment to another environment using Perl Scripts
- Created macros and stored procedures to manage the user provisioning process, role management, profile management, and moving space between databases
- Performed the Offshore Team lead role
- Active Participant in Teradata Upgrade from V12 to V13
- Created the user provisioning process to create and drop users and to grant and revoke roles using a Perl script and an Autosys sweep job (a sample provisioning step follows this list)
- Trained the App DAs on Teradata Viewpoint
- Developed the code to unlock/lock DBC, dbadmin and SU IDs
- Tweaked TASM workload definitions to demote offending queries for better performance
- Defined exception criteria on several important metrics, such as CPU percentage with qualification criteria and IO skew percentage, including demotion to the penalty box
- Worked closely on defining and modifying events for TASM states, and introduced event-combination events for several requirements to achieve better control over the system
- Recommended the best practice settings for the TASM intervals, especially Exception, Dashboard and Logging intervals.
- Worked on different TASM reporting and trending aspects to evaluate how well the workload management settings are helping to achieve goals, mining the workload trends using the tables TDWMsummarylog, TDWMexceptionlog, DBQL data etc.
- Released the locks on databases using Show locks
- Weekend Support on Critical Production Issues
- Aborted the blocked sessions in PMON
- Implemented Version DDL changes for all Environments in three LOBs using Perl Script.
- Creation of Database Groups, Database, and Schemas using Perl Script
- Scheduled the jobs using Autosys
- Worked on MIS and HPSM Ticket Management Systems
- 24X7 Support for production system
- Handling and Coordination with Global Support Center
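The user provisioning process above was script-driven; the fragment below is a minimal, hypothetical BTEQ step of the kind such a Perl/Autosys job would run. The TDPID, credentials, user, profile, and role names are placeholders.

```sh
# Illustrative BTEQ step for user provisioning (TDPID, credentials, user, and role are placeholders)
bteq <<'EOF'
.LOGON tdprod/dbadmin,secret

/* Create the user with space allocations and a profile */
CREATE USER etl_user01 FROM app_users AS
  PASSWORD = TempPass123,
  PERM = 0,
  SPOOL = 5000000000,
  PROFILE = etl_profile,
  DEFAULT DATABASE = app_db;

/* Grant the application role and make it the default */
GRANT app_etl_role TO etl_user01;
MODIFY USER etl_user01 AS DEFAULT ROLE = app_etl_role;

.LOGOFF
.QUIT
EOF
```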
Environment: Solaris10, Linux, AIX, Windows 32/64 bit, Teradata Database V12, Teradata Database V13, Teradata Administrator, Teradata Manager 12.0.0.5, Teradata SQL Assistant, BTEQ, TSET, Tivoli Storage Management, Teradata Arcmain.
Confidential, Bensalem, PA
Teradata DBA
Responsibilities:
- Housekeeping activities: dropping and recreating objects, fixing failures while applying DDL version changes, role management, and profile management
- Worked on changing the DBSControl parameters and coordinated with the Testing Team
- Monthly Database Cleanup & Suggestions to the users on cleanup initiatives
- Created Database Automation Script to create databases in different Environments using Stored Procedure.
- Generated different Space Reports in Teradata Manager to analyze different kind of issues.
- For better filter setup, analyzed the data logged by the TASM filters in warning mode and recommended solutions for the workload on the system.
- Worked on Database Synch up across different Servers (Dev, Test, MO and Prod) using AtanaSuite SyncTool
- Used the AtanaSuite Delta tool to compare objects across different environments.
- Migrated the data from one environment to another environment using Perl Scripts
- Made suggestions for collecting statistics during migrations.
- Performed clean-up activities and worked on different kinds of alerts.
- Responsible for improving query performance through performance tuning
- Monitoring database and handling security as per business requirement
- Granting & revoking the access to the users through Roles.
- Teradata Installation on Unix Box, Linux, Solaris and Windows Servers
- Released the locks on databases
- Upgrading the Teradata Versions
- Patch upgrades of the Teradata systems
- Set up Teradata Manager policies
- Implemented multi-value compression (see the compression sketch after this list)
- Managing resource and CPU utilization using Priority scheduler
- Review disk space management and monitoring procedure
- Shell scripts for automating common tasks
- Review current security setup, re-define user groups
- Backup and restore process using NetVault
- Design and Document monitoring processes of production databases
- Aborted the blocked sessions in PMON
- Worked on the ClearQuest ticket management system
- Offshore Team Coordination.
- Handling and Coordination with Global Support Center (GSC).
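Multi-value compression referenced above is declared per column at table-creation time; the table, columns, and compressed values below are purely illustrative.

```sh
# Illustrative multi-value compression (MVC) on frequently repeated column values
# (database, table, and compressed values are placeholders)
bteq <<'EOF'
.LOGON tdprod/dbadmin,secret

CREATE TABLE app_db.policy_status
(
  policy_id   INTEGER NOT NULL,
  status_cd   CHAR(10) COMPRESS ('ACTIVE', 'LAPSED', 'CANCELLED'),
  state_cd    CHAR(2)  COMPRESS ('PA', 'NJ', 'NY', 'CT'),
  updated_dt  DATE
)
UNIQUE PRIMARY INDEX (policy_id);

.LOGOFF
.QUIT
EOF
```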
Environment: Teradata V12.0, Teradata Manager, Teradata Administrator, PMON, BTEQ, SQL Assistant, AtanaSuite, ClearQuest, NetVault, DataStage.
Confidential
Teradata DBA
Responsibilities:
- Installation of Teradata Client on Windows, and Unix Servers
- Creation of database and users
- Enrolling, controlling, and monitoring user access to the database
- Built tables, views, and UPI, NUPI, USI, and NUSI indexes (see the index sketch after this list)
- Database performance tuning
- Preparing and implementing backup strategy
- Created backup and restore jobs via NetBackup
- Regular database health check reports for Capacity planning
- Fixed different NetBackup drive-down issues (frozen drives from configuration, etc.) and debugged issues in the TARA GUI
- Set up Teradata Manager policies
- Implemented multi-value compression
- Database reorganization
- Installed and setup Teradata Manager
- Implemented alert policy for both development and test system
- Database Hardening
- Upgrading the Teradata Versions
- Patch upgrades of the Teradata systems
- NetBackup server maintenance
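Table, view, and index builds (UPI, NUPI, USI, NUSI) referenced above follow standard Teradata DDL; the sketch below is illustrative, with database, table, and column names as placeholders.

```sh
# Illustrative Teradata DDL showing primary and secondary index choices
# (database, table, and column names are placeholders)
bteq <<'EOF'
.LOGON tddev/dbadmin,secret

/* Unique Primary Index (UPI) on the customer key */
CREATE TABLE app_db.customer
(
  customer_id  INTEGER NOT NULL,
  last_name    VARCHAR(50),
  region_cd    CHAR(2)
)
UNIQUE PRIMARY INDEX (customer_id);

/* Unique Secondary Index (USI) and Non-Unique Secondary Index (NUSI) */
CREATE UNIQUE INDEX (last_name, customer_id) ON app_db.customer;
CREATE INDEX region_nusi (region_cd) ON app_db.customer;

/* View over the base table */
CREATE VIEW app_db_v.customer AS
SELECT customer_id, last_name, region_cd
FROM app_db.customer;

.LOGOFF
.QUIT
EOF
```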