Cassandra Database Analyst Resume
NC
SUMMARY
- 9 years of experience in all aspects of software development, including requirements analysis, development, implementation, documentation, and maintenance of web applications using Cassandra and Big Data technologies.
- Strong knowledge and understanding of Cassandra, Hadoop HDFS and MapReduce concepts, and the Hadoop ecosystem.
- Knowledge of both development and administration of the Cassandra framework.
- Experience installing, configuring, supporting, and managing Cassandra clusters.
- In-depth knowledge of JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experience in Big Data analysis using Pig and Hive, and understanding of Sqoop and Puppet.
- Experience analyzing data using HiveQL and Pig Latin.
- Experienced in developing MapReduce programs using Apache Hadoop for working with Big Data.
- Experience in managing Hadoop clusters using Cloudera Manager Tool.
- Experience working on NoSQL databases including HBase.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems.
- Experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex queries for Oracle 8i/9i/10g.
- Good experience in database performance tuning.
- Hands on experience with Shell Scripting and UNIX.
- Experience with deployment automation using Jenkins, Git, and Maven.
- Experience in production and application support, including bug fixing.
- Very good communication and interpersonal skills; capable of working well within a team.
- Project management skills, including schedule planning, offshore team management, and design presentations.
- Experience working with Agile methodology.
TECHNICAL SKILLS
Big Data / Hadoop: Apache Hadoop, MapReduce, HDFS, HBase, Hive, Oozie, Sqoop, Cloudera Distribution of Apache Hadoop, IBM InfoSphere, IBM BigInsights, Cassandra
Other Technology: XML, XSLT, Maven, Jenkins
Languages: Java, C, C++, SQL, PL/SQL
Databases: MySQL, MS Access, Oracle
Testing: REST Client, Postman
PROFESSIONAL EXPERIENCE
Confidential
Cassandra Database Analyst
Responsibilities:
- Configured and maintained Cassandra clusters.
- Added and removed nodes from the cluster using an ATT-specific client.
- Regular monitoring of the Cassandra cluster to verify that all nodes are up and in a normal state.
- Starting, shutting down, and bouncing the applications and APIs of the cluster.
- Experience using the nodetool utility.
- Backing up data before any upgrades and restoring it when needed.
- Upgrading servers from Cassandra 2.0.8 to Cassandra 2.1.2.
- Checking logs for errors that occurred during application launch and debugging them.
- Checking connectivity to a node using cqlsh.
- Using cqlsh commands to retrieve data.
- Programming in Java to connect to Cassandra for fixes and to create access layers (a minimal sketch follows this job entry).
- Creating Maven projects for connecting to Cassandra.
- Testing the APIs using REST Client and Postman.
- Creating documentation in Markdown and generating HTML files using Aglio.
- Uploading code using Git.
Environment: Cassandra, Java, Eclipse, Maven, Git, Markdown, HTML, Aglio, PuTTY, Linux, Oracle, MySQL, Hive, Pig, Sqoop, Oozie.
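A minimal sketch of the kind of Java code used to connect to Cassandra and retrieve data over CQL, assuming the DataStax Java driver 2.x (in line with the Cassandra 2.0.8/2.1.2 versions above); the contact point, keyspace, table, and column names are hypothetical placeholders, not the actual project code.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraReader {
    public static void main(String[] args) {
        // Contact point and keyspace are hypothetical placeholders.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .build();
        try {
            Session session = cluster.connect("demo_keyspace");
            // The same query could first be verified interactively in cqlsh.
            ResultSet rs = session.execute("SELECT id, name FROM users LIMIT 10");
            for (Row row : rs) {
                System.out.println(row.getUUID("id") + " " + row.getString("name"));
            }
        } finally {
            cluster.close();
        }
    }
}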
Confidential, NC
Consultant Hadoop Developer
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Installed and configured Hadoop, MapReduce, and HDFS (Hadoop Distributed File System); developed multiple MapReduce jobs in Java for data cleaning.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Worked on installing cluster, commissioning & decommissioning of DataNodes, NameNode recovery, capacity planning, and slots configuration.
- Implemented NameNode backup using NFS for high availability.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Responsible for developing a data pipeline using HDInsight, Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Used Hive, created Hive tables, and was involved in data loading and writing Hive UDFs (a minimal UDF sketch follows this job entry).
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports.
- Involved in migrating ETL processes from Oracle to Hive to test ease of data manipulation.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and HIVE using Sqoop.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Created Hive external tables, loaded data into them, and queried the data using HQL.
- Wrote shell scripts to automate rolling day-to-day processes.
- Automated workflows using shell scripts that pull data from various databases into Hadoop.
- Deployed Hadoop Cluster in Fully Distributed and Pseudo-distributed modes.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Experience with AWS (Amazon Web Services).
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, MongoDB, Cassandra, Oracle, NoSQL, and Unix/Linux.
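A minimal sketch of a Hive UDF of the kind referenced above, assuming the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name, normalization logic, and registration commands are illustrative assumptions rather than the actual project code.

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative UDF: trims and lower-cases a string column.
// Hypothetical registration in Hive:
//   ADD JAR my-udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeString';
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}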
Confidential, Dallas, TX
Big Data Hadoop Developer/Administrator
Responsibilities:
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase NoSQL database, and Sqoop.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing (a minimal mapper sketch follows this job entry).
- Applied MapReduce framework jobs in Java for data processing after installing and configuring Hadoop and HDFS.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Developed MapReduce programs to apply business rules to the data.
- Used Sqoop to import and export data between HDFS and RDBMS.
- Created HBase tables to store various formats of PII data coming from different portfolios; implemented MapReduce for loading data from an Oracle database into the NoSQL database.
- Exported data from DB2 to HDFS using Sqoop and NFS mount approach.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Used Cloudera Manager for installation and management of Hadoop Cluster.
- Wrote Pig scripts to run ETL jobs on the data in HDFS.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Moved data from Hadoop to Cassandra using the bulk output format class.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Developed and executed Hive queries for denormalizing the data.
- Involved in regular Hadoop cluster maintenance such as patching security holes and updating system packages.
- Automated the workflow using shell scripts.
- Performance tuning of Hive queries written by other developers.
Environment: Hadoop, MapReduce, Pig, Hive, HBase, Oozie, HDFS, Sqoop, Cloudera, Cassandra, NoSQL, DB2, and UNIX.
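A minimal sketch of a map-only data cleaning step of the kind described above, using the standard Hadoop org.apache.hadoop.mapreduce API; the field layout, delimiter, and validation rule are illustrative assumptions, not the actual project logic.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative cleaning mapper: drops malformed records and re-emits valid ones.
public class CleaningMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        // Keep only records with the expected field count
        // and a non-empty identifier in the first column (assumed layout).
        if (fields.length == 5 && !fields[0].trim().isEmpty()) {
            context.write(NullWritable.get(), value);
        }
    }
}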
Confidential, TX
SAS Programmer/Analyst
Responsibilities:
- Worked closely with a research team of scientists and biostatisticians to analyze data and summarize table and listing specifications.
- Developed project analysis plans, including table specifications, statistical analyses and report formats using BI Solutions
- Reviewed study Protocol, Annotated Case Report Form (ACRF), and performed validation of clinical trial data to identify illogical data entries.
- Reviewed specifications, mock tables, and listings.
- Performed statistical analysis and generated reports using SAS/MACRO, SAS/ODS, Proc Report, Proc Print, Proc Summary, Proc Freq, Proc Means, Proc Tabulate, and Proc SQL.
- Used Base SAS to perform sorting, indexing, merging of datasets and generated reports.
- Modified existing SAS programs using SAS macro variables to improve the ease, speed and consistency of the results.
- Extensively used SAS DICTIONARY tables to get updated information on datasets.
- Extensively used BASE SAS functions like ROUND, SCAN, INDEX, SUBSTR, TRIM, LENGTH, PUT, INPUT, DATE, MAX, MIN and MEAN.
- Used SAS/ODS to generate the Statistical reports in HTML format.
- Used SAS macros functions to simplify the process and to get consistent results.
- Developed routine SAS macros to create tables, graphs, and listings for inclusion in clinical study reports and regulatory submissions, and maintained existing ones.
Environment: SAS, SQL, SAS MACROS, STAT
Confidential
SAS Programmer/Analyst
Responsibilities:
- Analyzed Phase II and III Clinical Trials through SAS programming and by providing statistical support to statisticians and Biostatisticians.
- Created CRT (Case Report Tabulations) datasets using ODM model of CDISC standards for submissions to the FDA.
- Successfully created Tables, Listings and Graphs using various procedures like Proc Report, Proc Tabulate, Proc Plot, and Proc Gplot.
- Extensively used the SAS BI tool for generating BI solutions.
- Created SAS Macros and modified the existing ones relating to multiple studies.
- Produced Tables, Listings and Graphs from Integrated Summaries of Efficacy (ISE) and Safety (ISS).
- Contacted the data management head of the respective study in a timely and effective manner regarding various data issues and resolved queries through meetings.
- Maintained appropriate study application documentation.
- Performed Program Documentation on all programs, files and variables for accurate historical record and for future reference.
- Optimized performance using Data Validation and Data Cleaning on Clinical Trial Data.
- Involved in writing SAS code to support the quality control process by implementing statistical procedures such as Proc Freq, Proc Means, and Proc Univariate, and other procedures such as Proc Summary, Proc Transpose, Proc SQL, and Proc Print.
- Successfully validated study TLGs and CRTs through independent validation using Proc Compare and departmental standard macros.
Environment: SAS/BASE, SAS/MACROS, SAS/ACCESS, SAS/STAT, SAS/GRAPH, SAS ODS, Windows.
Confidential
Jr. Java Developer
Responsibilities:
- Gathered requirements for the project and involved in analysis phase.
- Developed a quick prototype for the project to help the business decide on the necessary changes to the requirements.
- Created UML class and sequence diagrams using Rational Rose.
- Designed and created user-interactive front-end screens using JavaScript, HTML, and JSPs.
Environment: Java, HTML, Oracle, SQL