Sr. Big Data Engineer Resume
Raritan, NJ
SUMMARY:
- 10+ years of experience in Information Technology with a strong background in Database development and Data warehousing
- 6 years of experience in Cloudera Hadoop administration activities such as cluster upgrades/services, configuration management, installation and maintenance of Hadoop ecosystem components including Cloudera Manager, HDFS, YARN, Hive, Impala, Kafka, MapReduce, ZooKeeper, Oozie, Sqoop, Flume, Pig and Spark, metadata (MySQL) backup and recovery, job scheduling and maintenance, code and data migration, debugging and troubleshooting (connectivity/system/data issues), performance tuning, and monitoring Hadoop systems using Nagios, Ganglia, and alerting mechanisms.
- Sound knowledge of administering Hadoop clusters using Cloudera Manager, including activities such as deploying the Hadoop cluster, adding or removing services and nodes, keeping track of jobs, monitoring critical parts of the cluster, configuring NameNode high availability, and scheduling.
- Led the complete project end to end: Cloudera POC, Hortonworks POC, Hadoop integration with Teradata, Hadoop cluster build, security enhancements, and upgrades
- Good working experience importing and exporting data to and from the Hadoop Distributed File System (HDFS).
- Managed backup and disaster recovery for Hadoop data.
- Designed and documented Big Data best practices and standards, EDL (Enterprise Data Lake) Overview, onboarding process, and EDL Migration.
- Advanced expertise in SQL, shell, Java, Perl and Python scripting
- Good expertise in Teradata DBA activities such as database maintenance, Teradata user and database creation/maintenance, space allocation/maintenance, performance monitoring using PMON and Teradata Manager, PDCR maintenance, DDL change management, cleanup activities, crash dump and snapshot dump monitoring, upgrades, clustering, and IP Filter and LDAP implementation.
- Experience with ticketing systems such as ServiceNow, Infoweb, CMR, HPSM, MIS, ManageNow, TechXL, and HPSD.
PROFESSIONAL EXPERIENCE:
Confidential, Raritan, NJ
Sr. Big Data Engineer
Responsibilities:
- Primary participant in installation, cluster upgrades/services, patch management, and day-to-day activities such as monitoring critical parts of the cluster, user access management, HDFS support and maintenance, role addition, code and data migration, backup and restore, and capacity planning.
- Hadoop Cluster Installation and Maintenance:
- Worked on setting up high availability for major production cluster
- Worked on installing the cluster, starting Hadoop in pseudo-distributed and fully distributed modes, adding new nodes to an existing cluster, safely decommissioning nodes, recovering from NameNode failures, monitoring cluster health using Ganglia, capacity planning, and tuning MapReduce job parameters and slot configuration
- Moved services (re-distribution) from one host to another within the cluster to help secure the cluster and ensure high availability of the services.
- Primary participant in Hadoop cluster planning, which included identifying the right hardware, network considerations, and node configuration.
- Participated in upgrading cluster, which involved coordination with all ETL application teams, working with System Engineer and performing pre and post checks.
- Responsible for cluster maintenance, including copying data between clusters, adding and removing cluster nodes, checking HDFS status, and rebalancing the cluster (see the HDFS maintenance command sketch after this list)
- Worked on installation of Cloudera Manager and used its features for configuration management, service management, resource management, reports, alerts, and aggregated logging; also involved in Hadoop (CDH) installation.
- Primary participant in integrating the Enterprise Data Lake (EDL) with LDAP and setting up authorization via Sentry roles and ACL permissions, specific to the business application to which each user/service account belongs. Configured a standard set of Sentry roles per application (Developer, ETL, Report, and Support roles), each leveraged for a specific purpose; a role-setup sketch appears after this list.
- Worked with the pre-defined Cloudera Manager roles (Full Admin, User Admin, Navigator Admin, BDR Admin, Cluster Admin, Operator, Configurator, Read-Only, Auditor), each of which comes with a pre-defined set of access rights.
- Worked with the pre-defined Cloudera Navigator roles (Full Administrator, Lineage Viewer, Auditing Viewer, Policy Viewer, Metadata Administrator, Policy Administrator, User Administrator)
- Used HUE applications to browse HDFS and jobs, manage a Hive metastore, run Hive and Cloudera Impala queries and Pig scripts, browse HBase, export data with Sqoop, submit MapReduce programs, build custom search engines with Solr, and schedule repetitive workflows with Oozie.
- Defined only two distinct sets of users (Administrators and Individual Users) with respect to HUE
- Participated in decision making on whether a given patch was required.
- Based on patch severity (critical/non-critical), deployed the necessary software within change-management window deadlines
- Worked on setting up an alerting mechanism integrated with Cloudera Manager, with alert emails sent to the EDL Administration team for any changes in the health status of cluster-related services or for configuration/security-related changes.
- Added new alerts in a timely manner based on business requirements
- Monthly Log Review: Performed a monthly review of audit logs and security-related events from the Navigator audit logs and other security/audit logs to look for suspicious user or admin activity on the cluster, and reported findings to management for appropriate action.
- Co-ordination with Application Leads: Worked closely with App Leads in addressing the impacts reported by the alert emails
- Worked on HDFS management, the upgrade process, and rack management; accounted for NameNode memory considerations; and was involved in HDFS security enhancements.
- Configured Flume for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to HDFS (a sample agent configuration sketch follows this list)
- Installed and configured Hive, Impala, MapReduce, and HDFS
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs and was responsible for Oozie job scheduling and maintenance (typical commands are sketched after this list)
- Used Hive and Python to clean and transform geographical event data.
- Used Pig and a Pig user-defined filter function (UDF) to remove all non-human traffic from a sample web server log dataset and used Pig to sessionize web server log data.
- Troubleshot user issues on services such as Spark, Kafka, Solr, Hive, and Impala
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team (see the Sqoop export sketch after this list)
- Implemented UDFs, UDAFs, and UDTFs in Java for Hive to handle processing that cannot be performed using Hive built-in functions
- Designed and implemented Pig UDFs for evaluating, filtering, loading, and storing data
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several recurring workflows
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Used Ganglia and Nagios to monitor system health and give reports to management
- Checked configuration files, error messages, and Java exceptions to troubleshoot cluster startup, DataNode, and TaskTracker issues
- Used fsck (file system checker) to check for corruption in data node blocks and to clean unwanted blocks
- Tweaked the mapred-site.xml configuration file to enhance job performance.
- Tuned Hadoop Configuration files to fix performance issues such as swapping and CPU Saturation issues
- Documented EDL (Enterprise Data Lake) best practices and standards, including Data Management (file formats (Avro, Parquet), compression, partitioning, bucketing, de-normalizing) and Data Movement (data ingestion, file transfers, RDBMS, streaming data/log files, data extraction, data processing, MapReduce, Hive, Impala, Pig, HBase)
- Documented the EDL overview and the services offered on the EDL platform, including environments, data storage, data ingestion, data access, security, and indexing
- Primary participant in designing the EDL Migration form to capture technical information about a proposed migration, including application name, ticket number, source environment, source objects (folders/scripts/database/table/view/UDF/HBase/Solr etc.), target environment, target objects, migration date, functionality, etc.
- Defined a folder structure template to create directories (in both HDFS and the Linux file system) for structured as well as unstructured data, to be used by the project teams.
- Worked with management in defining the EDL onboarding process and built the intake form for application owners, used to capture technical information about a proposed project for planning and implementation, including project/POC name, start date, manager details, team members, Cloudera EDL-supported technologies such as data ingestion (Sqoop, SFTP, Spark Streaming, Flume), data transformation (Pig, MapReduce, Spark, Python scripts, Informatica) and other Hadoop services (Solr, Hive, Impala, HBase, and Spark), data size, etc.
- Tested and documented the step-by-step process for application users on how to download, install, run, and test the Cloudera Impala ODBC driver and connect it to Tableau.
- Tested and documented the step-by-step process for creating an HDFS connection, Teradata connection, Hive connection, HDFS file object, Hive table object, and Teradata object, and for pushdown (load balancing) optimization using Hive.
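As referenced in the cluster-maintenance bullet above, this work mostly came down to a handful of standard HDFS administration commands. The sketch below is illustrative only; host names, paths, and the exclude-file location are placeholders, not actual cluster details.

```sh
# Illustrative HDFS maintenance commands (host names and paths are placeholders)

# Check overall HDFS status and per-DataNode capacity
hdfs dfsadmin -report

# Check for corrupt or missing blocks, listing the affected files
hdfs fsck / -files -blocks -locations

# Safely decommission a node: list it in the excludes file, then refresh
echo "datanode05.example.com" >> /etc/hadoop/conf/dfs.hosts.exclude
hdfs dfsadmin -refreshNodes

# Rebalance data across DataNodes (10% utilization spread threshold)
hdfs balancer -threshold 10

# Copy data between clusters with DistCp
hadoop distcp hdfs://prod-nn.example.com:8020/data/app1 hdfs://dr-nn.example.com:8020/data/app1
```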
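For the Sentry-based authorization noted above, role setup was done per application. The following is a minimal sketch, assuming HiveServer2 with Sentry and Kerberos enabled; the role, group, database, and path names are hypothetical.

```sh
# Hypothetical per-application "Developer" role in Sentry
# (role/group/database/path names are illustrative, not actual application details)
beeline -u "jdbc:hive2://hiveserver2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM" -e "
  CREATE ROLE app1_developer_role;
  GRANT ROLE app1_developer_role TO GROUP app1_developers;
  GRANT ALL ON DATABASE app1_db TO ROLE app1_developer_role;
"

# Matching HDFS ACL so the same LDAP group can reach the application's data directory
hdfs dfs -setfacl -m group:app1_developers:rwx /data/app1/raw
hdfs dfs -getfacl /data/app1/raw
```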
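The Flume configuration mentioned above follows the usual source/channel/sink pattern. Below is a minimal, hypothetical agent that tails an application log into HDFS; file paths, host names, and the agent name are placeholders.

```sh
# Minimal, illustrative Flume agent: tail a log file into HDFS (paths/hosts are placeholders)
cat > /etc/flume-ng/conf/app1-agent.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app1/app1.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://prod-nn.example.com:8020/data/app1/logs/%Y%m%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

# Start the agent
flume-ng agent --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/app1-agent.conf --name a1
```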
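Day-to-day Oozie scheduling and maintenance referenced above was driven from the Oozie CLI. The commands below are a sketch; the server URL, properties file path, and job ID are placeholder values.

```sh
# Illustrative Oozie commands (URL, paths, and job IDs are placeholders)
export OOZIE_URL=http://oozie.example.com:11000/oozie

# Submit and start a workflow or coordinator defined by a job.properties file
oozie job -config /home/etl/app1/job.properties -run

# Check status and logs of a running job
oozie job -info 0000123-200101000000001-oozie-oozi-W
oozie job -log  0000123-200101000000001-oozie-oozi-W

# Suspend, resume, or kill a job during maintenance windows
oozie job -suspend 0000123-200101000000001-oozie-oozi-W
oozie job -resume  0000123-200101000000001-oozie-oozi-W
```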
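The Sqoop exports for BI reporting referenced above typically looked like the sketch below; the JDBC URL, credentials, table, and HDFS directory are hypothetical, and the field delimiter assumes Hive's default Ctrl-A separator.

```sh
# Illustrative Sqoop export of analyzed Hive/HDFS output to an RDBMS table
# (connection string, credentials, table, and directory are placeholders)
sqoop export \
  --connect jdbc:mysql://reports-db.example.com:3306/bi_reports \
  --username bi_user \
  --password-file /user/etl/.bi_user.password \
  --table daily_metrics \
  --export-dir /user/hive/warehouse/app1.db/daily_metrics \
  --input-fields-terminated-by '\001' \
  -m 4
```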
Confidential, Phoenix, AZ
Big Data Administrator
Responsibilities:
- Key participant in regular Big Data activities such as patch management, cluster services, monitoring critical parts of the cluster, account management, HDFS support and maintenance, code release management, data extraction/import, backup and restore, and capacity planning.
- Involved in all kinds of SYSDBA activities for the complete Teradata system
- Cloudera POC
- Set up the Hadoop cluster, Teradata-Hadoop connectivity, and BI connectivity; tested backup & restore, network capabilities (IO throughput), clusters on virtual machines, integration with AD, administration and monitoring, in-memory data grid capabilities, and schema-on-read/agile data discovery capabilities
- Installed Hadoop and configured with Teradata database for data loading process
- Involved in Hadoop Cluster environment administration using Cloudera Manager that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
- Performed Hadoop backups, added libraries and JARs, and successfully migrated from the existing infrastructure to the latest releases.
- Pushed system log messages into HDFS with Flume
- Used Avro to store multiple small files
- Diagnosed and tuned performance problems, including identifying map-side and reduce-side data skew, CPU contention, memory swapping, disk health issues, and networking issues, and discovering hardware failures
- Documented Standards and Best Practices that includes File Formats, Compression, Partitioning, Bucketing, and De-Normalizing.
- Data Movement:
- Developed a standardized approach to using the right data load method for each task, based on factors such as timeliness of data ingestion (batch, near-real-time, real-time), data extraction and processing requirements (such as random row access and required data transformations), and source system and data structure formats.
- MapReduce:
- Used a Combiner function to reduce data transfer between map and reduce tasks, thereby improving job throughput
- Used splittable compression algorithms on the data set as well as on the intermediate output to reduce the space needed for storage and to speed up data transfer across the network (see the job-submission sketch after this list)
- Used Distributed Cache for the lookup data distribution when the files are small.
- Picked the right compression codec for the data
- Hadoop Security:
- Primary participant in setting up the audit process for monitoring alerts generated during events, as well as probing and analyzing the logged activity. Alerts were configured in Cloudera Manager to notify the DBA team of changes including, but not limited to, the following:
- Any configuration changes happening on the cluster
- Health status of the configured services such as HDFS, Hive, HBase, HUE etc
- Health status of the Servers configured in the Cluster
- Security related activities
- Involved in setting up security across Hadoop clusters which involved user support for LDAP as well as Linux owners and groups
- Set up and maintained SSL and Kerberos (a verification sketch follows this list)
- Managed load balancers, firewalls in production environment
- Worked on different ticketing systems such as ManageNow, Infoweb, CMR, TechXL, and ServiceNow to handle planned/unplanned maintenance activities and user issues
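Compression of intermediate and final MapReduce output, referenced above, can be switched on per job through generic options. The sketch below assumes a driver that uses ToolRunner; the jar, main class, and paths are hypothetical, with Snappy for map output and the splittable bzip2 codec for the final output.

```sh
# Illustrative job submission enabling compression of intermediate (map) output
# and of the final job output (jar name, main class, and paths are placeholders)
hadoop jar app1-analytics.jar com.example.SessionizeJob \
  -D mapreduce.map.output.compress=true \
  -D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec \
  /data/app1/weblogs /data/app1/sessions
```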
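Kerberos maintenance referenced above was verified with the standard Kerberos client tools; the commands below are a minimal sketch, with the principal, realm, and keytab path as placeholders.

```sh
# Illustrative Kerberos checks after enabling security on the cluster
# (principal, realm, and keytab names are placeholders)

# Obtain a ticket from a service keytab and confirm it
kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/edge01.example.com@EXAMPLE.COM
klist

# Confirm that HDFS access works with the Kerberos ticket
hdfs dfs -ls /

# Destroy the ticket when finished
kdestroy
```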
Confidential, Hartford, CT
Sr Teradata DBA
Responsibilities:
- Deployed LDAP in all environments for three Lines of Business (LOBs) (Personnel Insurance, Business Insurance, and Claims) as per audit requirements
- IP Filter Setup in Production for three LOBs using ipxml2bin utility as per Audit
- Cleaned up redundant roles and erroneous privileges in the roles, direct grant privileges as per Teradata Best Practices.
- Implemented Strong Password rule set for all the Generic Profiles in Three LOBs
- Teradata Client Installation on Linux, AIX, Solaris and Windows Servers and resolved the Product Support Issues
- Backup and Restore using Tivoli Storage Management System
- Debugged multiple issues with PDCR collection in daily jobs; analyzed system health reports using the PDCR toolkit
- Generated weekly audit reports to identify and stop direct-grant users, unshielded non-adhoc profiles, and insecure adhoc users, along with other reports, in production for three LOBs
- Migrated the data from one environment to another environment using Perl Scripts
- Created macros and stored procedures to manage the user provisioning process, role management, profile management, and moving space between databases
- Performed the Offshore Team lead role
- Active Participant in Teradata Upgrade from V12 to V13
- Created the user provisioning process to create and drop users and to grant and revoke roles using a Perl script and an Autosys sweep job (a sample provisioning step follows this list)
- Trained the App DAs on Teradata Viewpoint
- Developed the code to unlock/lock DBC, dbadmin and SU IDs
- Tweaked TASM workload definitions to demote offending queries for better performance
- Defined exception criteria on several important metrics, such as CPU percentage with qualification criteria and IO skew percentage, including demotion to the penalty box
- Worked closely on defining and modifying events for TASM states, and introduced event-combination events for several requirements to achieve better control over the system
- Recommended the best practice settings for the TASM intervals, especially Exception, Dashboard and Logging intervals.
- Worked on different TASM reporting and trending aspects to evaluate how well the workload management settings are helping to achieve goals, mining the workload trends using the tables TDWMsummarylog, TDWMexceptionlog, DBQL data etc.
- Released the locks on databases using Show locks
- Weekend Support on Critical Production Issues
- Aborted the blocked sessions in PMON
- Implemented Version DDL changes for all Environments in three LOBs using Perl Script.
- Creation of Database Groups, Database, and Schemas using Perl Script
- Scheduled the jobs using Autosys
- Worked on MIS and HPSM Ticket Management Systems
- 24X7 Support for production system
- Handling and Coordination with Global Support Center
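The user provisioning process above was script-driven; the fragment below is a minimal, hypothetical BTEQ step of the kind such a Perl/Autosys job would run. The TDPID, credentials, user, profile, and role names are placeholders.

```sh
# Illustrative BTEQ step for user provisioning (TDPID, credentials, user, and role are placeholders)
bteq <<'EOF'
.LOGON tdprod/dbadmin,secret

/* Create the user with space allocations and a profile */
CREATE USER etl_user01 FROM app_users AS
  PASSWORD = TempPass123,
  PERM = 0,
  SPOOL = 5000000000,
  PROFILE = etl_profile,
  DEFAULT DATABASE = app_db;

/* Grant the application role and make it the default */
GRANT app_etl_role TO etl_user01;
MODIFY USER etl_user01 AS DEFAULT ROLE = app_etl_role;

.LOGOFF
.QUIT
EOF
```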
Environment: Solaris10, Linux, AIX, Windows 32/64 bit, Teradata Database V12, Teradata Database V13, Teradata Administrator, Teradata Manager 12.0.0.5, Teradata SQL Assistant, BTEQ, TSET, Tivoli Storage Management, Teradata Arcmain.
Confidential, Bensalem, PA
Teradata DBA
Responsibilities:
- Housekeeping activities: dropping and recreating objects, fixing failures while applying DDL version changes, role management, and profile management
- Worked on changing the DBSControl parameters and coordinated with the Testing Team
- Monthly Database Cleanup & Suggestions to the users on cleanup initiatives
- Created Database Automation Script to create databases in different Environments using Stored Procedure.
- Generated different Space Reports in Teradata Manager to analyze different kind of issues.
- For better filter setup, analyzed the data logged by the TASM filters in warning mode and recommended solutions for the workload on the system.
- Worked on Database Synch up across different Servers (Dev, Test, MO and Prod) using AtanaSuite SyncTool
- Used the AtanaSuite Delta tool to compare objects across different environments.
- Migrated the data from one environment to another environment using Perl Scripts
- Made suggestions for collecting statistics during migrations.
- Performed clean-up activities and worked on different kinds of alerts.
- Responsible for improving query performance through performance tuning
- Monitoring database and handling security as per business requirement
- Granting & revoking the access to the users through Roles.
- Teradata Installation on Unix Box, Linux, Solaris and Windows Servers
- Released the locks on databases
- Upgrading the Teradata Versions
- Patch upgrades of the Teradata systems
- Set up Teradata Manager policies
- Implemented multi-value compression (see the compression sketch after this list)
- Managing resource and CPU utilization using Priority scheduler
- Review disk space management and monitoring procedure
- Shell scripts for automating common tasks
- Review current security setup, re-define user groups
- Backup and restore process using NetVault
- Design and Document monitoring processes of production databases
- Aborted the blocked sessions in PMON
- Worked on the ClearQuest ticket management system
- Offshore Team Coordination.
- Handling and Coordination with Global Support Center (GSC).
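Multi-value compression referenced above is declared per column at table-creation time; the table, columns, and compressed values below are purely illustrative.

```sh
# Illustrative multi-value compression (MVC) on frequently repeated column values
# (database, table, and compressed values are placeholders)
bteq <<'EOF'
.LOGON tdprod/dbadmin,secret

CREATE TABLE app_db.policy_status
(
  policy_id   INTEGER NOT NULL,
  status_cd   CHAR(10) COMPRESS ('ACTIVE', 'LAPSED', 'CANCELLED'),
  state_cd    CHAR(2)  COMPRESS ('PA', 'NJ', 'NY', 'CT'),
  updated_dt  DATE
)
UNIQUE PRIMARY INDEX (policy_id);

.LOGOFF
.QUIT
EOF
```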
Environment: Teradata V12.0, Teradata Manager, Teradata Administrator, PMON, BTEQ, SQL Assistant, AtanaSuite, ClearQuest, NetVault, DataStage.
Confidential
Teradata DBA
Responsibilities:
- Installation of Teradata Client on Windows, and Unix Servers
- Creation of database and users
- Enrolling, controlling, and monitoring user access to the database
- Built tables, views, and UPI, NUPI, USI, and NUSI indexes (see the index sketch after this list)
- Database performance tuning
- Preparing and implementing backup strategy
- Created backup and restore jobs via NetBackup
- Regular database health check reports for Capacity planning
- Fixed different NetBackup drive-down issues (frozen drives from configuration, etc.) and debugged issues in the TARA GUI
- Set up Teradata Manager policies
- Implemented multi-value compression
- Database reorganization
- Installed and setup Teradata Manager
- Implemented alert policy for both development and test system
- Database Hardening
- Upgrading the Teradata Versions
- Patch upgrades of the Teradata systems
- NetBackup server maintenance
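Table, view, and index builds (UPI, NUPI, USI, NUSI) referenced above follow standard Teradata DDL; the sketch below is illustrative, with database, table, and column names as placeholders.

```sh
# Illustrative Teradata DDL showing primary and secondary index choices
# (database, table, and column names are placeholders)
bteq <<'EOF'
.LOGON tddev/dbadmin,secret

/* Unique Primary Index (UPI) on the customer key */
CREATE TABLE app_db.customer
(
  customer_id  INTEGER NOT NULL,
  last_name    VARCHAR(50),
  region_cd    CHAR(2)
)
UNIQUE PRIMARY INDEX (customer_id);

/* Unique Secondary Index (USI) and Non-Unique Secondary Index (NUSI) */
CREATE UNIQUE INDEX (last_name, customer_id) ON app_db.customer;
CREATE INDEX region_nusi (region_cd) ON app_db.customer;

/* View over the base table */
CREATE VIEW app_db_v.customer AS
SELECT customer_id, last_name, region_cd
FROM app_db.customer;

.LOGOFF
.QUIT
EOF
```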