Cassandra Data Architect Resume

SUMMARY:

  • Passionate technologist seeking challenging work at the intersection of databases, Unix, storage, and automation. With over 20 years of enterprise experience as a Senior Big Data Engineer, Database Architect/Engineer/Administrator, and Unix Engineer/Administrator, I want to design and build the next generation of high-performance systems. I am seeking an interesting environment that will allow me to work on the latest technology while continuing to grow and develop my skills.
  • Over 3 years of experience with big data tools (Hadoop, Spark, etc.)
  • Over 5 years of experience with Cassandra (DataStax)
  • Over 2 years of experience on AWS
  • Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Flume, Spark, Kafka, YARN, Oozie, and ZooKeeper.
  • Extensive working experience with a broad range of AWS cloud services, including EC2, ELB, Auto Scaling, VPC, Route 53, RDS, S3, IAM, Elasticsearch, and CloudWatch. Exposure to DynamoDB.
  • Experience in migrating data between HDFS and relational database systems using Sqoop, in both directions, per business requirements.
  • Extensive experience importing and exporting data using stream-processing platforms like Flume and Kafka.
  • Extensively used Docker containers for testing Cassandra.
  • Good working experience using Spark SQL to manipulate DataFrames in Python.
  • Managed structured data designed to scale to a very large size across many commodity servers, with no single point of failure, using Cassandra.
  • Experience in handling messaging services using Apache Kafka.
  • Experienced in handling large datasets during the ingestion process using partitions, Spark in-memory capabilities, broadcast variables in Spark, and effective and efficient joins and transformations.
  • Exposure to the Python libraries Pydoop, PySpark, NumPy, Pandas, and Matplotlib, and to algorithms such as kNN and k-means, for writing programs to handle big data.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets its data from Kafka in near real time and persists it to Cassandra; used the Cassandra connector to load data to and from Cassandra (see the sketch after this list).
  • Experienced in providing security to Hadoop cluster with Kerberos and integration with LDAP/AD at Enterprise level.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis and meet business requirements.
  • Familiarity and experience with data warehousing and ETL tools.
  • Extensively used DevOps tools like Puppet, Ansible, Git, Jenkins, SBT, Packer, and Terraform.
  • Extensively worked on Unix/Linux administration and on EMC XtremIO, VMAX 2/3/4, and Oracle FS1 administration.
  • Over 15 years of experience as an Oracle DBA
  • Over 20 years of experience with Unix, Linux, storage, scripting, and automation
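
A minimal sketch of the Kafka-to-Cassandra streaming pattern described above, in PySpark. The broker, topic, event schema, keyspace, and table names are illustrative assumptions, and it presumes the spark-sql-kafka and spark-cassandra-connector packages are on the Spark classpath:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType

spark = (SparkSession.builder
         .appName("kafka-to-cassandra")
         .config("spark.cassandra.connection.host", "10.0.0.1")  # assumed contact point
         .getOrCreate())

# Assumed event schema for the learner data model
schema = StructType().add("learner_id", StringType()).add("event", StringType())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
          .option("subscribe", "learner_events")              # assumed topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

def write_to_cassandra(batch_df, epoch_id):
    # spark-cassandra-connector DataFrame sink; keyspace/table are assumptions
    (batch_df.write.format("org.apache.spark.sql.cassandra")
     .options(keyspace="learner", table="events")
     .mode("append")
     .save())

events.writeStream.foreachBatch(write_to_cassandra).start().awaitTermination()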

TECHNICAL SKILLS:

DB Admin: Oracle 12.x/11.x/10.x/9.x/8.x, Cassandra, HBase, GoldenGate, OEM 12.x

Programming: Python, Scala, Shell scripting (KSH), Perl, PHP, PL/SQL, HTML, C, Java

Big Data Tools: Hadoop, Spark, Hive, Pig, HDFS, PySpark, Kafka, Sqoop, Flume, Solr

AWS: EC2, ELB, Auto Scaling, VPC, Route 53, RDS, S3, IAM, Elasticsearch, CloudWatch, DynamoDB

DevOps Tools: Puppet, Ansible, Git, Jenkins, SBT, Packer, Terraform

OS Admin: Red Hat, OEL, Solaris, AIX, Windows, Mac OS X, VMware ESX, CentOS, Ubuntu

Storage Admin: EMC XtremIO, VMAX AFA/3/2, VNX, Oracle FS1, Celerra

Backup Tools: RMAN, Simpana, Veritas NetBackup, Legato Networker, EMC DD Boost

PROFESSIONAL EXPERIENCE:

Cassandra Data Architect

Confidential

Responsibilities:
  • Working as a Cassandra Admin/Architect (DataStax DSE NoSQL DB) on 54 nodes in production and QA, with an equal number of nodes in the DR cluster.
  • Validating the DSE Graph database; implemented TDE for data at rest on DSE Cassandra.
  • Cassandra cluster planning, including data sizing estimation and identifying hardware requirements based on the estimated data size and transaction volume.
  • Designing, creating, running, and monitoring Spark jobs on the Cassandra cluster for validation and for loading from Oracle RDBMS.
  • Working closely with the application team to resolve issues related to Spark, CQL, data loading, etc.
  • Analyzing Solr indexing requirements and evaluating their impact on the overall system.
  • DataStax Enterprise 4.8 installations and multi-data-center DSE cluster setup across multiple environments with Solr and Spark enabled.
  • Writing CQL queries to support integration with the Cassandra cluster.
  • Authoring design documents, data dictionaries, implementation guides, operational strategies, standards, and policies.
  • Administering, maintaining, and supporting the cluster using OpsCenter, DevCenter, Linux, Spark, nodetool, etc.
  • Data migration from Oracle to Cassandra.
  • Working on OpsCenter (monitoring), DevCenter, nodetool, and Spark; validating data using Spark.
  • In-memory data comparison between Exadata and Cassandra using Spark; resolving Spark validation issues, creating Solr and secondary indexes for faster response, and tuning.
  • Cassandra data modeling, NoSQL architecture, and DSE Cassandra database administration: keyspace creation, table creation, secondary and Solr index creation, user creation, and access administration (see the sketch after this list).
  • Working closely with DataStax through their ticketing system to resolve cluster issues, including nodetool repair, compaction, and secondary index issues.
  • Query tuning and performance tuning on the cluster, and suggesting best practices for developers.
  • Automation and configuration using Puppet.
  • Working closely on Cassandra loading activity for history and incremental loads from Oracle databases, resolving loading issues, and tuning the loader for optimal performance.
  • Working closely with the DataStax support team to resolve user query, loader, and Spark query issues.
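
A minimal sketch of the keyspace, table, and secondary index administration described above, using the DataStax Python driver; the contact point, keyspace, table, and replication settings are illustrative assumptions:

from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])  # assumed contact point
session = cluster.connect()

# Multi-data-center keyspace; DC names and replication factors are assumptions
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app_ks
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3}
""")

session.execute("""
    CREATE TABLE IF NOT EXISTS app_ks.customer (
        customer_id uuid PRIMARY KEY,
        name text,
        region text
    )
""")

# Secondary index for lookups on a non-key column
session.execute("CREATE INDEX IF NOT EXISTS ON app_ks.customer (region)")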

Expert Big Data Architect

Confidential

Responsibilities:
  • Completely automated Cassandra cluster deployments on AWS using Terraform templates, CloudFormation templates, and Puppet.
  • Involved in data modeling for applications on Cassandra and migrating databases from Oracle to Cassandra.
  • Wrote CQL and Python scripts on DataStax for service implementation.
  • Cassandra cluster planning, including data sizing estimation and identifying hardware requirements based on the estimated data size and transaction volume; Cassandra deployments in single and multi-data-center configurations with Spark and Solr enabled.
  • Designed, created, executed, and monitored Spark jobs on the Cassandra cluster for validation and for loading from the Oracle database.
  • Good experience in using DataStax Enterprise (DSE) and OpsCenter for cluster administration and monitoring
  • Worked closely with the application team to resolve issues related to Spark, CQL, and data loading; wrote CQL queries per business requirements.
  • Analyzed Solr indexing requirements and evaluated their impact on the overall system.
  • Performed query tuning and performance tuning on the cluster and suggested best practices for developers.
  • Handled ingestion into Kafka using Spark on Cassandra for micro-batching.
  • Worked closely on Cassandra loading activity for history and incremental loads from Oracle databases, resolving loading issues and tuning the loader for optimal performance.
  • Ran a center of excellence (COE) and delivered training on DataStax Cassandra, data modeling, Spark, and Solr.
  • Involved in defining and documenting best practices for Cassandra; migrated an application for Choice from the legacy platform to Cassandra, and upgraded Cassandra from 2.0 to 2.2.x.
  • Architected the database migration from managed services to Confidential owned data center in Virginia.
  • Completely automated Oracle installation for standalone and RAC with patch bundles using Oracle cloning.
  • Developed a toolset to automatically create ASM disks and disk groups, set up RAC, and create databases.
  • Developed a toolset to deploy Oracle Data Guard automatically.
  • Implemented Transparent Data Encryption (TDE) for Oracle databases at the tablespace level to encrypt data at rest; implemented Oracle listener SSL to encrypt data in transit.
  • Implemented GoldenGate trail file encryption for securely transferring GG trail files across data centers.
  • Architected the migration plan for SAP database migrations from Oracle 11.x on HP-UX to Oracle 12c on OEL 7.x.
  • Developed a fully automated toolset for migrating underlying storage for Oracle ASM diskgroups.
  • Designed and developed the backup toolset for Oracle database backups; the tool is fully customizable via a config file and provides centralized scheduling, use of the Oracle catalog, web-based reporting, centralized logging, error reporting, backup performance monitoring, and daily backup status reporting. Some multi-terabyte DBs are backed up from the DR side using the DG-replicated data.
  • RMAN backups are automatically synced to an AWS S3 bucket for long-term retention and automatically expire after the retention period (see the sketch after this list).
  • Implemented EMC DD Boost for Oracle RMAN backups.
  • Architected and implemented Delphix VDBs for deploying 600 databases in dev.
  • Scripted a toolset that alerts in real time on any file changes across the database landscape; the alerts include what changed, who changed it, and where they logged in from.
  • Implemented Oracle auditing across the Oracle DB environments, and wrote customized triggers that audit and alert based on certain business requirements.
  • Wrote a PHP/Python web-based tool that business users can use to upload an encrypted zip file, which is loaded into an Oracle database once the user has been authenticated against the corporate LDAP servers.
  • Wrote Oracle triggers and procedures that capture change data, convert it to JSON, and transfer it to a NoSQL database.
  • Wrote PHP and shell scripts that gather usage/stats data through REST calls from EMC XtremIO clusters and publish it as web pages.
  • Wrote a web-based tool to reset application passwords selectively or in bulk.
  • Wrote a modular web-based tool for managing and monitoring Oracle RAC SCAN IPs, RMAN backups, ASM DG usage, DB growth patterns, tablespaces, filesystems, inodes, DB logins, DG lag, RMAN backups to S3, DB locks, and a few others.
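
An illustrative sketch of syncing RMAN backup pieces to S3 for long-term retention with boto3; the bucket name, key prefix, and backup directory are assumptions, and expiry after the retention period would be handled by an S3 lifecycle rule on the prefix:

import os
import boto3

backup_dir = "/backup/rman"    # assumed local RMAN backup location
bucket = "corp-rman-archive"   # assumed bucket with a lifecycle expiration rule

s3 = boto3.client("s3")
for name in os.listdir(backup_dir):
    if name.endswith(".bkp"):  # RMAN backup pieces
        s3.upload_file(os.path.join(backup_dir, name), bucket, f"rman/{name}")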

Senior Bigdata Architect

Confidential

Responsibilities:
  • Involved in capacity planning, data modeling, hardware design, and initial configuration of Apache and DataStax Cassandra clusters in a two-data-center configuration.
  • Configured SSL communication between Cassandra nodes, as well as between Cassandra nodes and clients.
  • Configured Cassandra user authentication against LDAP.
  • Performed rolling upgrades, adding and dropping nodes, decommissioning nodes, manual compactions, and repairs.
  • Hands on experience in configuring snapshot and incremental backups on Cassandra databases.
  • Migrated high-volume OLTP transactions from Oracle to Cassandra to reduce the Oracle licensing footprint.
  • Fine-tuned config files to get better read performance, consistency levels, and quorums.
  • In-depth knowledge of the architecture and core concepts of Cassandra and of creating Cassandra database systems.
  • Monitored Cassandra system logs and OpsCenter/OpsCenter agent logs for errors; managed and set up alerts in the OpsCenter monitoring tool.
  • Expertise in Hadoop cluster administration, including HBase: setup, installation, monitoring, maintenance, and operational support for Cloudera distributions CDH3, CDH4, and CDH5.
  • Involved in setting up high-availability solutions for the Hadoop cluster and HBase.
  • Set up standards and processes for Hadoop based application design and implementation.
  • Involved in loading data from the Unix file system to HDFS; imported and exported data between HDFS and Hive using Sqoop.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after this list).
  • Created HBase tables to store various data formats of incoming data from different portfolios.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile devices and pushed to HDFS.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
  • Implemented NameNode HA in all environments to provide high availability of clusters.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
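
A brief sketch of the Hive partitioning and bucketing pattern noted above, issued through a Hive-enabled SparkSession; the table and column names are illustrative, and a staging table is assumed to exist:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Required for dynamic-partition inserts
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_part (
        order_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (sale_date STRING)
    CLUSTERED BY (order_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Dynamic-partition insert from an assumed staging table; the partition
# column must come last in the SELECT list
spark.sql("""
    INSERT INTO TABLE sales_part PARTITION (sale_date)
    SELECT order_id, amount, sale_date FROM sales_staging
""")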

Senior Database High Availability Engineer

Confidential

Responsibilities:
  • Worked on database provisioning and automation using OEM 12c; extracted custom information using MGMT views and EMCLI; developed deployment procedures.
  • Certified and fully automated a backup solution for Large Oracle databases (Multi Terabytes in size), backing up the databases on the DR side using SRDF, TimeFinder Snaps, RMAN and Veritas Netbackup.
  • Developed scripts automating silent RAC installs on new hardware clusters.
  • Designed and built Linux RPMs for Oracle software distribution covering Oracle Database, ASM, CRS, and Grid software with patches applied; developed and executed test plans; wrote certification documentation.
  • Automated processes and wrote custom tools for database support in shell and Perl; built reporting tools with Perl and SQL.
  • Worked with Oracle Support (MOS/MetaLink) to report bugs, and provided third-level support for critical escalations from production.
  • Worked with Storage and Linux engineering to build multi-site HA/DR solution using EMC SRDF SAN snapshot replication.
  • Review quarterly Oracle CPU releases with security team to address critical bugs.
  • Developed a fully automated solution based on Oracle transportable tablespaces, TimeFinder clones, and Veritas VxVM for making large chunks of data available to the BW database on a daily basis for reporting.
  • Extensively worked on Active Data Guard, snapshot standby databases, and FSFO.
  • Developed an automated tool for managing Data Guard standbys prior to 11gR1 that can be synced up to a certain SCN, opened read-only for reporting, and returned to standby per business requirements.
  • Developed a fully automated tool that deploys the GoldenGate replication with a single command.
  • Deployed GoldenGate in multi-master, cascaded, and cross-database replication configurations (only worked on customizing the Oracle side); used Oracle Data Pump for the initial sync.
  • Developed a fully automated GoldenGate configuration tool to deploy GoldenGate on CRS for high availability; the solution works similarly to Veritas Cluster, with custom CRS agents developed for GoldenGate and the filesystem.
  • Developed a GoldenGate monitoring tool for alerting on lags, failures, and issues.
  • Set up and configured OEM 12c, upgraded OEM from 12.1.0.1 to 12.1.0.3 on Oracle ODAs, and imported custom deployment procedures using partool.
  • Developed an OEM 12c batch job that collects details on newly added databases and syncs them with the legacy user authorization system; the job also collects any local IDs that exist on the databases and updates those details in the legacy user authorization system.
  • Developed a partool-based tool that deploys the DPs from the engineering OEM 12c environment to other OEM 12c environments across the bank.
  • Automated database binary deployment across DB hosts managed by OEM 12c.
  • Automated patch deployment, monitoring and alerting, DB provisioning, DG provisioning, and RAC provisioning; extensively used EMCLI.
  • Participated in analysis and designing of Oracle Exadata Server for infrastructure and related projects, worked on migrations to Exadata platform.
  • Well versed in creating Linux RPMs, Solaris packages, and AIX packages.
  • Good experience working in an Exadata X3-2 environment and managing databases on Exadata X3-2; familiar with Exadata features.
  • Involved in space management, rectifying lock problems, and managing quotas.
  • Performance tuning of Oracle databases using OEM GC, the cost optimizer, SQL analyzer, Performance Manager, Statspack, AWR, ADDM, ASH, RDA, explain plan, 10046 tracing, TKPROF, and SQL trace.
  • Responsible for optimizing database performance by analyzing database objects, generating statistics, creating indexes, creating materialized views, and using AWR, ADDM, and ASH.
  • Planned and implemented cross-platform upgrades from 11g to 12c using GoldenGate.
  • Extensively used GoldenGate during migrations and while troubleshooting issues.
  • Configured, automated, and implemented physical RMAN full and incremental backups (differential and cumulative) for all databases to recover from failures.
  • Responsible for capacity planning, recovery planning, and installing patches.
  • Responsible for installing and setting up GoldenGate from source to target to provide active data replication.
  • Configured multiple scripts to start GoldenGate processes during server reboots, and automated scripts to auto-restart OGG processes on servers.
  • Worked on X3-2; implemented Data Guard between Exadata machines, set up database parallelism, and configured scripts for monitoring standby databases (see the sketch after this list).
  • Responsible for switching over and switching back to the physical standby database.
  • Extensively used Puppet for automating routine tasks and enforcing configurations.
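
A hedged sketch of the kind of standby-lag check used in the monitoring scripts above, querying v$dataguard_stats on the standby with cx_Oracle; the credentials, DSN, and alerting step are assumptions:

import cx_Oracle

# Assumed monitoring account and standby DSN
conn = cx_Oracle.connect("monitor", "secret", "standby-host/ORCLSTBY")
cur = conn.cursor()
cur.execute("""
    SELECT name, value
    FROM v$dataguard_stats
    WHERE name IN ('apply lag', 'transport lag')
""")
for name, value in cur:
    # value is an interval string such as '+00 00:05:23'; real alerting
    # logic would parse it and page when it exceeds a threshold
    print(f"{name}: {value}")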
