Cassandra Database Analyst Resume

NC

SUMMARY

  • 9 years of experience in all aspects of software development, including requirement analysis, development, implementation, documentation, and maintenance of web applications using Cassandra and Big Data.
  • Strong knowledge and understanding of Cassandra, Hadoop HDFS and MapReduce concepts, and the Hadoop ecosystem.
  • Knowledge of both development and administration of the Cassandra framework.
  • Experience in installation, configuration, supporting and managing Cassandra clusters.
  • In-depth knowledge of JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Experience in Big Data analysis using Pig and Hive, and an understanding of Sqoop and Puppet.
  • Experience in analyzing data using HiveQL and Pig Latin.
  • Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
  • Experience in managing Hadoop clusters using the Cloudera Manager tool.
  • Good experience in data analysis using Pig and Hive and an understanding of Sqoop.
  • Experience working on NoSQL databases including HBase.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
  • Experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience in writing complex queries for Oracle 8i/9i/10g.
  • Good experience in database performance tuning.
  • Hands-on experience with shell scripting and UNIX.
  • Experience with deployment automation using Jenkins, Git, and Maven.
  • Experience in production and application support, including bug fixing.
  • Very good communication and interpersonal skills; capable of working with and supporting a team.
  • Project management skills including schedule planning, offshore team management, and design presentation.
  • Experience working with Agile methodology.

TECHNICAL SKILLS

Big Data / Hadoop: Apache Hadoop, MapReduce, HDFS, HBase, Hive, Oozie, Sqoop, Cloudera Distribution of Apache Hadoop (CDH), IBM InfoSphere, IBM BigInsights, Cassandra

Other Technology: XML, XSLT, Maven, Jenkins

Languages: Java, C, C++, SQL, PL/SQL

Databases: MYSQL, MS Access, Oracle

Testing: REST Client, Postman

PROFESSIONAL EXPERIENCE

Confidential

Cassandra Database Analyst

Responsibilities:

  • Configured and maintained the Cassandra clusters.
  • Added and removed nodes from the cluster using an ATT-specific client.
  • Regularly monitored the Cassandra cluster to verify that all nodes were up and normal.
  • Started, shut down, and bounced the applications and APIs of the cluster.
  • Used the nodetool utility for cluster monitoring and maintenance.
  • Backed up data before any upgrades and restored it afterwards.
  • Upgraded servers from Cassandra 2.0.8 to Cassandra 2.1.2.
  • Checked logs for errors that occurred during application launch and debugged them.
  • Checked connectivity to nodes using cqlsh.
  • Used cqlsh commands to retrieve data.
  • Programmed in Java to connect to Cassandra for fixes and to build access layers (a minimal sketch follows this list).
  • Created Maven projects for connecting to Cassandra.
  • Tested the APIs using REST Client and Postman.
  • Created documentation using Markdown and generated HTML files using Aglio.
  • Uploaded the code using Git.
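
Below is a minimal sketch of the kind of Java connection layer described above, assuming the DataStax Java driver 2.x; the contact point, keyspace, and query are placeholders rather than actual project values:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class CassandraConnectionCheck {
        public static void main(String[] args) {
            // Contact point and keyspace are placeholders for the real cluster settings.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("10.0.0.1")
                    .build();
            Session session = cluster.connect("demo_keyspace");
            try {
                // The same kind of read that would otherwise be verified from cqlsh.
                ResultSet rs = session.execute("SELECT * FROM users LIMIT 10");
                for (Row row : rs) {
                    System.out.println(row);
                }
            } finally {
                session.close();
                cluster.close();
            }
        }
    }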

Environment: Cassandra, Java, Eclipse, Maven, Git, Markdown, HTML, Aglio, PuTTY, Linux, Oracle, MySQL, Hive, Pig, Sqoop, Oozie.

Confidential, NC

Consultant Hadoop Developer

Responsibilities:

  • Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
  • Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning (a minimal sketch follows this list).
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
  • Collected the logs data from web servers and integrated in to HDFS using Flume.
  • Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Implemented NameNode backup using NFS for high availability.
  • Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Used Sqoop to import and export data between HDFS and RDBMS.
  • Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
  • Involved in migration of ETL processes from Oracle to Hive to enable easier data manipulation.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
  • Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
  • Created Hive External tables and loaded the data in to tables and query data using HQL.
  • Wrote shell scripts to automate rolling day-to-day processes.
  • Automated workflows using shell scripts to pull data from various databases into Hadoop.
  • Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Experience with AWS (Amazon Web Services).
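
A minimal sketch of a map-only cleaning job of the kind described in this role, assuming the Hadoop 2.x mapreduce API; the pipe delimiter and expected field count are illustrative only:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LogCleaningJob {

        // Map-only pass: keep well-formed, pipe-delimited records and drop the rest.
        public static class CleanMapper
                extends Mapper<LongWritable, Text, NullWritable, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\|");
                if (fields.length == 5 && !fields[0].isEmpty()) {
                    context.write(NullWritable.get(), value);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "log-cleaning");
            job.setJarByClass(LogCleaningJob.class);
            job.setMapperClass(CleanMapper.class);
            job.setNumReduceTasks(0); // no reduce phase needed for a cleaning pass
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }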

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, MongoDB, Cassandra, Oracle, NoSQL, and Unix/Linux.

Confidential, Dallas, TX

Big Data Hadoop Developer/Administrator

Responsibilities:

  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Applied MapReduce framework jobs in Java for data processing by installing and configuring Hadoop and HDFS.
  • Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Developed MapReduce programs for applying business rules to the data.
  • Used Sqoop to import and export data from HDFS to RDBMS and vice-versa.
  • Created HBase tables to store various formats of PII data coming from different portfolios (a minimal table-creation sketch follows this list).
  • Implemented MapReduce for loading data from the Oracle database into the NoSQL database.
  • Exported data from DB2 to HDFS using Sqoop and NFS mount approach.
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Wrote Pig scripts to run ETL jobs on the data in HDFS.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Moved data from Hadoop to Cassandra using the BulkOutputFormat class.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Developed and executed Hive queries for denormalizing the data.
  • Involved in the regular Hadoop Cluster maintenance such as patching security holes and updating system packages.
  • Automated the work flow using shell scripts.
  • Tuned the performance of Hive queries written by other developers.
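
A minimal sketch of the kind of HBase table creation described above, assuming the HBase 0.96-era Java client API; the table and column family names are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreatePortfolioTable {
        public static void main(String[] args) throws Exception {
            // hbase-site.xml on the classpath supplies the cluster connection details.
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            try {
                // Table and column family names are placeholders, not the actual schema.
                HTableDescriptor table =
                        new HTableDescriptor(TableName.valueOf("portfolio_pii"));
                table.addFamily(new HColumnDescriptor("profile"));
                table.addFamily(new HColumnDescriptor("activity"));
                if (!admin.tableExists(table.getTableName())) {
                    admin.createTable(table);
                }
            } finally {
                admin.close();
            }
        }
    }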

Environment: Hadoop, MapReduce, Pig, Hive, HBase, Oozie, HDFS, Sqoop, Cloudera, Cassandra, NoSQL, DB2 and UNIX.

Confidential, TX

SAS Programmer/Analyst

Responsibilities:

  • Worked closely with research team of scientists and Biostatisticians to analyze data, summarize tables and listing specifications.
  • Developed project analysis plans, including table specifications, statistical analyses and report formats using BI Solutions
  • Reviewed study Protocol, Annotated Case Report Form (ACRF), and performed validation of clinical trial data to identify illogical data entries.
  • Review of specifications, mock tables and listings.
  • Performed statistical analysis and generated reports using SAS/MACRO, SAS/ODS, Proc Report, Proc Print, Proc Summary, Proc Freq, Proc Means, Proc Tabulate, and Proc SQL.
  • Used Base SAS to perform sorting, indexing, merging of datasets and generated reports.
  • Modified existing SAS programs using SAS macro variables to improve the ease, speed and consistency of the results.
  • Extensively used SAS DICTIONARY tables to get updated information on datasets.
  • Extensively used BASE SAS functions like ROUND, SCAN, INDEX, SUBSTR, TRIM, LENGTH, PUT, INPUT, DATE, MAX, MIN and MEAN.
  • Used SAS/ODS to generate the Statistical reports in HTML format.
  • Used SAS macros functions to simplify the process and to get consistent results.
  • Developed routine SAS macros to create tables, graphs, and listings for inclusion in clinical study reports and regulatory submissions, and maintained existing ones.

Environment: SAS, SQL, SAS MACROS, STAT

Confidential

SAS Programmer/Analyst

Environment: SAS/BASE, SAS/MACROS, SAS/ACCESS, SAS/STAT, SAS/ODS, Windows.

Responsibilities:

  • Analyzed Phase II and III Clinical Trials through SAS programming and by providing statistical support to statisticians and Biostatisticians.
  • Created CRT (Case Report Tabulations) datasets using ODM model of CDISC standards for submissions to the FDA.
  • Successfully created Tables, Listings and Graphs using various procedures like Proc Report, Proc Tabulate, Proc Plot, and Proc Gplot.
  • Extensively used SAS BI tool for generating BI solutions
  • Created SAS Macros and modified the existing ones relating to multiple studies.
  • Produced Tables, Listings and Graphs from Integrated Summaries of Efficacy (ISE) and Safety (ISS).
  • Contacted the data management head of the respective study in an effective and timely manner about various data issues and resolved the queries through meetings.
  • Maintained appropriate study application documentation.
  • Performed Program Documentation on all programs, files and variables for accurate historical record and for future reference.
  • Optimized performance using Data Validation and Data Cleaning on Clinical Trial Data.
  • Involved in writing SAS code to support quality control by implementing various statistical procedures such as Proc Freq, Proc Means, and Proc Univariate, and other procedures such as Proc Summary, Proc Transpose, Proc SQL, and Proc Print.
  • Successfully validated study TLGs and CRTs through independent validation using Proc Compare and departmental standard macros.

Environment: SAS, BASE SAS, MACROS, STAT, GRAPH

Confidential

Jr. Java Developer

Responsibilities:

  • Gathered requirements for the project and involved in analysis phase.
  • Developed a quick prototype for the project to help the business decide on the necessary changes to the requirements.
  • Created UML class and sequence diagrams using Rational Rose.
  • Designed and created user-interactive front-end screens using JavaScript, HTML, and JSPs.

Environment: Java, HTML, Oracle, SQL
