Big Data Engineer/Administrator Resume
Manhattan, NY
SUMMARY
- 14+ years of solid experience in Information Technology with a strong background as a technical administrator/architect/engineer of Big Data, data warehouse, and data analytics platforms, managing very large cluster environments
- Around 7.5 years of experience administering multiple Hadoop distributions (CDP, Cloudera, Hortonworks, Azure HDInsight, MapR), covering cluster builds/upgrades, configuration management, POCs, and installation and maintenance of Hadoop ecosystem components including Cloudera Manager, Ambari, HDFS, YARN, Spark, Machine Learning, Hive, HBase (NoSQL), Hue, Impala, Kafka, MapReduce, ZooKeeper, Oozie, Solr, Sqoop, Flume, Pig, Chef, Puppet, Knox, and Cloudera Navigator; metadata (MySQL) backup and recovery; job scheduling and maintenance; code and data migration; debugging and troubleshooting (connectivity/alert/system/data issues); performance tuning; backup and recovery (BDR); monitoring Hadoop systems with Nagios and Ganglia; Python scripting; and security setup and configuration including Kerberos, Sentry, and LDAP
- Led complete projects end to end: CDP POC, Cloudera POC, Hortonworks POC, HDInsight POC, Hive LLAP POC, Waterline POC, StreamSets POC, Databricks POC, ADF (Data Factory) POC, Unravel POC, Trifacta POC, whole-cluster builds, high-availability setup and configuration for multiple components, security design and setup, and upgrades
- Served as a DevOps systems admin, supporting deployment and operations of various applications in production and lower environments using Git, GitHub, Jenkins, Chef, Puppet, Ansible, Salt, and Docker
- Good working experience in Azure HDInsight provisioning/services and in-depth knowledge of HDInsight cluster deployment, including Spark, HBase, Kafka, Hive LLAP, etc., plus expertise in the monitoring, logging, and cost management tools that integrate with HDInsight
- Experience in successful implementation of ETL solutions covering data extraction, transformation, and load with Sqoop, Hive, Pig, Spark, and HBase (NoSQL database); see the Sqoop sketch after this list
- Developed/migrated numerous applications on Spark using the Spark Core and Spark Streaming APIs in Java; optimized MapReduce jobs and SQL queries into Spark transformations using RDDs; developed and integrated Spark with Kafka topics for real-time streaming applications
- Designed and Documented Big Data Best Practices and Standards, EDL (Enterprise Data Lake) Overview, Step by Step Instructions on Cluster setup/upgrade/Adding/Decommission Nodes, Onboarding Process, Security Design Model, Performance Tuning, Failure Scenarios and Remedy, PROD to COB Discrepancies, and EDL Migration.
- Advanced expertise in SQL, Python, Scala, Shell, Java, and Perl scripting
- Innovative risk-taker with a proven track record of resolving issues that help steer the organization toward success
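Below is a minimal, illustrative sketch of the Sqoop-to-Hive ETL pattern referenced above. The JDBC URL, credentials, table, and Hive target are placeholder values, not details from any actual engagement.

#!/usr/bin/env bash
# Illustrative Sqoop import: pull one RDBMS table into Hive in parallel.
# Host, database, credentials, and table names are all placeholders.
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table orders \
  --split-by order_id \
  --num-mappers 4 \
  --hive-import \
  --hive-table raw_orders

The same pattern extends to incremental loads with --incremental append and --check-column on a monotonically increasing key.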
TECHNICAL SKILLS
Operating Systems: Unix, Linux, Solaris-UNIX, AIX, Windows XP/NT/2000
Hadoop Distribution: Cloudera, Azure HDInsight, MapR and Hortonworks
Big Data: HDFS, NFS, HBase, MapReduce, Cloudera Manager, Ambari, MapR Control System, Cloudera Navigator, Machine Learning, YARN, HUE, Hive, Impala, Pig, Sqoop, Flume, Kafka, ADF (Data Factory), Oozie, Zookeeper, Spark, PySpark, Storm, Ganglia, Nagios, Avro, AWS, DevOps Tools (Chef, Puppet), Kerberos, Knox, Ranger, NiFi, Tez, BoKS, Sentry, LDAP, AD, Cassandra, MongoDB, REST APIs and Fabric
Teradata: Viewpoint, Data Mover, TMSM, Teradata Administrator, Teradata SQL Assistant, TASM, TSET, Teradata Visual Explain, Teradata Statistics Wizard, Teradata Index Wizard, Schmon, Tdwm, CTL/XCT, Lockdisp, Showlock, Qrysessn, TSTSQL, Vprocmanager, gsc tools, DBSControl, Ferret, rcvmanager, Update Space, Gateway Global, TPT, BTEQ, Fast Export, Fast Load, Multi Load and TPump
DevOps: Git, GitHub, Bitbucket, Jenkins, Chef, Puppet, Docker, CI/CD, Kubernetes (K8S), Splunk
Databases: HBase, Cassandra, MongoDB, PostgreSQL, Teradata DBMS, Oracle 9i, DB2, SQL Server
Languages & Others: SQL, Python, Ruby, Scala, Unix shell scripts, C, C++, Java, Perl scripts, Kerberos, APT, Sentry, Anaconda, JSON, XML and ACL
Ticket Management: Infoweb, ManageNow, mainframe, TechXL, IMR, MIS, HPSM, HPSD, ServiceNow
Backup Tools: Tivoli Storage Manager, NetVault, NetBackup, Teradata ARC
Data Integration Tools: Ab-Initio, Informatica, Talend
Others: Protegrity tools, GitHub, EventEngine, Autosys, ERwin, Clear Case, MicroStrategy, Eclipse, Control-M, Clear Quest, AtanaSuite, WinMerge, UltraEdit, SecureShell, Tectia Client
PROFESSIONAL EXPERIENCE
Confidential, Manhattan, NY
Big Data Engineer/Administrator
Responsibilities:
- Primary participant in administrative activities including HDInsight Spark/HBase/Kafka/Interactive Query (LLAP) cluster deployment using ARM templates and the Azure Portal; creating resource groups and NSG rules; scaling clusters; creating Blob and ADLS storage accounts; day-to-day operations such as monitoring jobs, making recommendations for skewed jobs, and handling service issues; tuning YARN, Kafka, Impala, Spark, and Hive; performance tuning and configuration checks; setting up the user onboarding process; and metadata backup (see the deployment sketch after this list)
- Played an architect role in setting up processes including user onboarding, application onboarding, and selection of optimal SKUs for Spark/HBase/Kafka clusters from lower to higher environments, and recommended non-default configuration changes for components such as Spark2, MapReduce2, Hive, LLAP, and Queue Manager to enhance cluster performance
- Handled coordination with the Microsoft/Hortonworks support teams and various production support teams
- Set up the Nagios alert system to collect all reasonable metrics while alerting only on conditions that require action
- Set up Kafka performance flags and alerts to catch lag between producer and consumer (see the lag-check sketch after this list)
- Set up OpsGenie notifications to page the on-call infrastructure admin
- Installed and configured Cloudbreak on a VM using Azure cloud resources
- Set up multiple Cloudbreak blueprints, recipes, and management packs, and configured an external database for Ranger
- Involved in setting up the Azure subscription, interactive credential, and VNet/subnet
- Set up an ADLS Gen2 storage account with two filesystems, storage-fs and logs-fs (see the ADLS sketch after this list)
- Created Managed Identities for Data Lake Admin, Assumer, Ranger Audit Logger and Logger
- Registered an Azure Environment
- Created a cluster template (blueprint) for our application based on an existing cluster template
- Used the Management Console to register existing clusters and build new ones
- Used Data Hub to configure cluster topology (master, worker, and compute) and cloud storage for HDFS, YARN, and Zeppelin
- Tested autoscaling based on performance metrics
- Used Replication Manager to register the existing clusters and copy the HDFS data
- Scheduled cluster stop/start while maintaining optimal performance
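As a companion to the cluster-deployment bullet above, here is a minimal Azure CLI sketch of provisioning an HDInsight Spark cluster. The production work described used ARM templates; this CLI form is an assumed equivalent, and all names, versions, and counts are placeholders.

#!/usr/bin/env bash
# Illustrative HDInsight Spark cluster provisioning with the Azure CLI.
# Resource names, location, versions, and node counts are placeholders.
RG=bigdata-rg
az group create --name "$RG" --location eastus

az storage account create --name bigdatastore123 \
  --resource-group "$RG" --location eastus --sku Standard_LRS

az hdinsight create \
  --name spark-prod-01 \
  --resource-group "$RG" \
  --type spark \
  --component-version Spark=2.4 \
  --http-user admin --http-password "$HTTP_PASS" \
  --ssh-user sshadmin --ssh-password "$SSH_PASS" \
  --workernode-count 4 \
  --storage-account bigdatastore123

Scaling of the kind mentioned above maps to the az hdinsight resize command.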
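A sketch of the producer/consumer lag check behind the Kafka alerting mentioned above, written as a Nagios-style plugin. The broker address, group name, and threshold are illustrative.

#!/usr/bin/env bash
# Illustrative consumer-lag check: sums the LAG column reported by
# kafka-consumer-groups and exits with Nagios status codes.
BOOTSTRAP=broker1:9092          # placeholder broker
GROUP=clickstream-consumers     # placeholder consumer group
THRESHOLD=10000                 # placeholder lag threshold

# Find the LAG column from the header row (its position varies across
# Kafka versions), then sum it across all partitions of the group.
LAG=$(kafka-consumer-groups.sh --bootstrap-server "$BOOTSTRAP" \
        --describe --group "$GROUP" 2>/dev/null |
      awk '/LAG/ {for (i = 1; i <= NF; i++) if ($i == "LAG") c = i; next}
           c && $c ~ /^[0-9]+$/ {sum += $c}
           END {print sum + 0}')

if [ "$LAG" -gt "$THRESHOLD" ]; then
  echo "CRITICAL: group $GROUP lag=$LAG exceeds $THRESHOLD"
  exit 2    # Nagios CRITICAL
fi
echo "OK: group $GROUP lag=$LAG"
exit 0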
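The ADLS Gen2 bullet above corresponds roughly to the following CLI sketch. The account and resource-group names are placeholders; storage-fs and logs-fs are the filesystem names from the bullet.

#!/usr/bin/env bash
# Illustrative ADLS Gen2 setup: a hierarchical-namespace storage account
# plus the two filesystems noted above. Names other than the filesystems
# are placeholders.
RG=bigdata-rg
ACCOUNT=bigdatadlsgen2

az storage account create \
  --name "$ACCOUNT" \
  --resource-group "$RG" \
  --location eastus \
  --sku Standard_LRS \
  --kind StorageV2 \
  --enable-hierarchical-namespace true

az storage fs create --name storage-fs --account-name "$ACCOUNT" --auth-mode login
az storage fs create --name logs-fs    --account-name "$ACCOUNT" --auth-mode login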
Environment: RHEL 6.5, CDP, HDP 1.3.1/2.1/2.5/3.0, Ambari, Hive, Kafka, Sqoop, Tez, Storm, Python, HBase, Teradata Query Grid, ZooKeeper, Oozie, Kerberos, Knox, Ranger, Pig, Avro
Confidential, Phoenix, AZ
Sr Big Data Analyst/Administrator
Responsibilities:
- Primary participant in certifying Big Data products for use within a multi-tenancy group, new-cluster evaluation recommendations, RAM estimates, and the Hadoop cluster upgrade from CDH 5.4.8/5.5.3 to CDH 5.7
- Active participant in performing COB (Continuity of Business) switchovers, including COB cluster checkout and COB cluster testing for all components: Hadoop cluster, MySQL, Flume, Datameer, Platfora, data ingestion, Talend, and data center failover
- Setup/Configured/Documented Hive Metastore high availability (HA)
- Primary participant in Sqoop setup, securing the Sqoop2 server, and using Sqoop on a Sentry-enabled cluster
- Built the BDR requirement template and the COB business recovery plan template
- Installed Kafka, enabled SSL and high availability on the Kafka brokers, ingested Kafka topics from multiple clusters into one cluster using the MirrorMaker service, and handled multiple Kafka failure issues (see the MirrorMaker sketch after this list)
- Tuned the Kafka production cluster by adjusting configuration parameters such as num.partitions
- Configured HBase to use HDFS high availability, cleaned up split logs, and added a Kafka topic to the ‘mk consumer config’ entity in HBase
- Secured the Sqoop2 server by enabling Kerberos authentication and SSL encryption, and resolved firewall issues when connecting to the RDBMS to extract data via Sqoop2
- Applied the Cloudera Manager 5.4.8 patch upgrade and documented the whole process
- Certified Spark SQL on CDH 5.7 by testing all of its functionality
- Enabled Spark encryption using Cloudera Manager to encrypt Spark data at rest and in transit
- Worked on Big Data issues and documented them in the Big Data issue tracker
- Resolved Hue issues such as users unable to log in, Hue not starting in Cloudera Manager, the Hue daemon failing to start, and changing the timezone for Hue logs
- Tuned YARN/Impala/Hive/Hue/Spark for performance; involved in translating MapReduce jobs to Spark, memory tuning, and setting Hive to use the Spark execution engine (see the spark-submit sketch after this list)
- Resolved Spark performance issues including SparkContext failing to initialize, queries running very slowly, Spark History Server logs not being created, NoClassDefFoundError, "Failed to get Spark client" errors, long-running Spark executors for Hive sessions, aborted Spark jobs, and Spark bugs and version compatibility issues
- Involved in configuring Spark, including Spark with Python 2.7, container and overhead settings on YARN, and RollingFileAppender
- Involved in Big Data security setup including Unix security, Cloudera Manager security, HDFS, MapReduce, HBase, and Hue security, Sqoop and Flume security, port/URL security, Sentry setup, ports used, and Cloudera Search authorization with Sentry
- Configured SPNEGO web authentication and Kerberos for Cloudera Manager and CDH (see the curl sketch after this list)
- Prepared best-practice documentation covering Hive, Impala, Spark, Sqoop, and Hadoop file formats and compression
- Primary participant in documenting the whole application life cycle, including application onboarding, technical assessment, performance review, and reporting; involved in day-to-day administration such as monitoring/troubleshooting user issues and user access management
- Gathered and documented failure scenarios and remedies, including software failures, manually moving the NameNode to a new node, adding new data disks to a worker node, NameNode failures, DataNode failures, Cloudera Manager failures, backing up the PostgreSQL database, data ingestion issues, Hue database failures, Hive Metastore database failures, hardware/rack/master server/proxy server/switch/disk failures, CPU/memory/IO spikes, and Kerberos issues
- Documented PROD-to-COB discrepancies across all components
- Raised cases and coordinated with support engineers to resolve a CPU saturation (100%) issue caused by a Python process, cluster and Cloudera Manager issues, and OOM issues
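A sketch of the MirrorMaker aggregation referenced above (legacy MirrorMaker, as shipped with the CDH 5.x line). The property files, topic regex, and stream count are illustrative; the SSL settings mentioned above live inside the two config files.

#!/usr/bin/env bash
# Illustrative MirrorMaker run: mirror matching topics from a source
# cluster (consumer.properties) into the aggregate target cluster
# (producer.properties). Topic regex and stream count are placeholders.
kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist 'orders.*|payments.*' \
  --num.streams 4

# Partition tuning of the num.partitions kind mentioned above; note
# that partition counts can be raised but never lowered.
kafka-topics.sh --zookeeper zk1:2181 --alter --topic orders --partitions 12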
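The Spark/YARN tuning bullets above touch a handful of concrete settings; the sketch below shows where they land on a spark-submit command line. All values are illustrative, and job.py is a placeholder application.

#!/usr/bin/env bash
# Illustrative Spark-on-YARN submission showing the tuning knobs noted
# above: Python 2.7 selection, container sizing, and memory overhead.
export PYSPARK_PYTHON=/usr/bin/python2.7   # pin the Python runtime

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  job.py

# Hive was pointed at the Spark engine with:
#   SET hive.execution.engine=spark;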
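A quick sketch of verifying SPNEGO-protected web endpoints after the Kerberos setup described above. The principal, keytab path, and URL are placeholders.

#!/usr/bin/env bash
# Illustrative SPNEGO check: obtain a Kerberos ticket, then hit a
# protected web UI. The empty "-u :" forces curl to use Negotiate auth.
kinit -kt /etc/security/keytabs/svc_hadoop.keytab svc_hadoop@EXAMPLE.COM

curl --negotiate -u : "http://namenode.example.com:50070/jmx" | head

klist   # confirm the HTTP service ticket was issued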
Environment: CDH 5.7/5.8.3, RHEL, Cloudera Manager, Cloudera Navigator, YARN, Impala, Hive, HUE, Kafka, Sqoop, Pig, HBase, Avro
Confidential, Manhattan, NY
Sr. Teradata System DBA/Big Data DBA
Responsibilities:
- Involved in everyday DBA activities: system monitoring, housekeeping, and change implementation
- Resolved various issues with the LDAP mechanism across different tools and prepared documentation on LDAP login procedures and issues
- Produced weekly audit reports on production databases to find direct-grant users, functional IDs used for logon from user workstations, non-LDAP users, users in non-LDAP profiles, users with rights greater than read, etc. (see the BTEQ audit sketch after this list)
- Initiated and led cleanup of extraneous privileges in roles, direct-grant privileges, and profiles per best practices
- Worked on system maintenance activities such as restarts, node-down issues, and performance issues, using the appropriate utilities: CNS utilities, Checktable (level 2), Packdisk, Scandisk, rcvmanager, qrysessn, DBSControl, etc.
- Raised issues to the GSC on various Viewpoint portlets (System Health, Query Spotlight, TASM workload designer/health/monitor), resolved portal issues, modified the system TDPID, and promoted the passive Viewpoint server to active during disasters (Hurricane Irene) to provide uninterrupted system monitoring
- Changed states, events, and event actions per client requirements
- Generated TASM reports based on exception logs and DBQL information
- Involved in the Sybase-to-Teradata migration project, including performance analysis, overnight watch on system alerts (using Netcool), RFB calls, team coordination, and regular status reporting and documentation
- Worked on the Morgan Stanley Halsey data center power-down project, involved in pre- and post-power-down tasks and issues
- Created backup and restore jobs via NetVault and NetBackup, resolved various drive-down issues (e.g., frozen drives in the configuration), debugged issues in the TARA GUI, added new policies, and handled NetVault device management issues
- Experienced with the AWS and SWS for window setup before a forced TPA restart of the system
- Led the JVC (production mirror image for testing) flush project to refresh it with PRODUCTION data
- Recovered PDCR jobs that failed for various reasons and documented the process
- Generated performance reports on poorly tuned and highly resource-consuming queries and analyzed them with the ETL teams
- Played a significant role in the full upgrade from DBMS V12 to V13
- Implemented a strong password rule set for all generic IDs
- Provided Teradata client installation packages to the Techconnect team for the rollout of Teradata on all Windows servers and resolved various installation issues
- Conducted presentations for end users on Viewpoint portlets such as Query Monitor, Query Spotlight, System Health, and Capacity Heatmap
- Cooperated with the Teradata DBS team and GSC on numerous database and performance issues
- Migrated data between environments and implemented versioned DDL changes using Perl scripts
- Performed the offshore team lead role
- Provided 24x7 support for the production system
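A minimal BTEQ sketch of the direct-grant portion of the weekly audit reports mentioned above. The TDPID, credentials, and database name are placeholders; DBC.AllRights is the standard dictionary view for rights granted directly to users (role-based rights sit in DBC.AllRoleRights instead).

#!/usr/bin/env bash
# Illustrative weekly audit extract via BTEQ; logon values are placeholders.
bteq <<'EOF'
.LOGON tdprod/audit_dba,password

/* Rights granted directly to users on production databases */
SELECT UserName, DatabaseName, TableName, AccessRight
FROM DBC.AllRights
WHERE DatabaseName = 'PROD_DB'   /* placeholder database */
ORDER BY UserName;

.LOGOFF
.QUIT
EOF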
Software/Platform: Teradata Database V12 & V13, Linux, AIX, Windows 32/64-bit, Viewpoint V12, 13.11/12, Teradata Administrator, Teradata Manager, Teradata SQL Assistant, BTEQ, TSET, NetBackup, NetVault
Confidential, Hartford, CT
Teradata DBA
Responsibilities:
- Involved in the LDAP mechanism for the PERSONAL INSURANCE (PI) lines of business (LOBs) per audit rules
- Cleaned up redundant roles, erroneous privileges within roles, and direct-grant privileges per Teradata best practices
- Implemented versioned DDL changes in all environments for the three LOBs using a Perl script
- Modified DBSControl internal parameter 142 (DeleteLeftOverSpool) so the system automatically deletes leftover spool when it is detected
- Involved ETL teams in debugging performance issues with long-running queries via DBQL and ResUsage reports (see the DBQL sketch after this list)
- Participated in IP Filter Setup in Production PI LOB using ipxml2bin utility as per Audit
- Coordinated with Teradata performance analysis representative
- Used different CNS utils, vprocmanager, qrysession, rcv to debug the issues and for maintenance activities
- Managed the crashdumps accordingly with the coordination of Teradata
- Using Viewpoint monitored the system health and query monitor portlet for blocking and other issues
- Released HUT locks on databases using Showlocks, released MLOAD locks, and resolved 7446 and 7449 errors
- Assisted the DEV team on resolving the optimization issues with LOAD Utilities
- Worked on weekly audit reports to stop direct-grant users, insecure non-adhoc profiles, and insecure adhoc users, and produced other reports in production for the three LOBs
- Worked with the Atana sync tool for database sync-up across environments (DEV/TEST/PROD)
- Aborted the blocked sessions in PMON
- Product Support on ODBC connection issues
- Worked on User Provision process (using macros & stored procedures) to manage Role Management, Profile Management, Space Allocation and User Management
- Worked on the SU ID for DDL Version Changes in PRODUCTION ENVIRONMENT
- Provided DB2 support for various version changes
- Responsible for documentation of the LDAP login process and issues, per audit
- Created the tables, macros, procedures on adhoc basis
- Weekend Support on Critical Production Issues
- Implemented Strong Password rule set
- Documented access rights and space allocation to track and streamline user requirements
- Worked on MIS and HPSM Ticket Management Systems
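A sketch of the DBQL side of the long-running-query analysis referenced above. The TDPID, credentials, and CPU threshold are placeholders; DBC.DBQLogTbl is the main DBQL log table.

#!/usr/bin/env bash
# Illustrative DBQL report: heaviest queries by AMP CPU time.
bteq <<'EOF'
.LOGON tdprod/dba_user,password

SELECT UserName,
       AMPCPUTime,
       TotalIOCount,
       StartTime,
       FirstRespTime
FROM DBC.DBQLogTbl
WHERE AMPCPUTime > 1000          /* placeholder CPU-seconds threshold */
ORDER BY AMPCPUTime DESC;

.LOGOFF
.QUIT
EOF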
Software/Platform: Solaris 10, Linux, AIX, Windows 32/64-bit, Teradata Database V12, Teradata Administrator, Viewpoint 12, Teradata Manager 12.0.0.5, Teradata SQL Assistant, BTEQ, TSET, Tivoli Storage Manager, Teradata Arcmain