Data Analyst Resume Profile
MA
Objective
Seeking a SAS programmer or data analyst position.
Summary
- Expertise in SAS DATA step and SQL, Oracle PL/SQL and SQL, and R, including validating SQL code.
- Strong ability to write flexible and reusable code using SAS macros.
- Proficient in writing, testing, debugging, documenting, and implementing complex SAS programs that extract data from a variety of data sources and file formats, such as simple ASCII flat files, flat files with multiple layouts, XML, and Excel files.
- Expertise in predictive modeling using logistic and linear regression and SAS Enterprise Miner.
- Ability to handle large data sets of up to 150 GB and 100 million observations.
- Experience with queries, inner and outer joins, and merging and updating large data sets.
- Data mining and text mining to retrieve keywords and improve service.
Technical Skills:
- SAS Skills: SAS BASE, STAT, ACCESS, MACROS, CONNECT, GRAPH.
- Languages and Tools: R, Matlab, PL/SQL, SQL, Toad.
- Databases: Oracle database, MS Access.
Professional Experience
Confidential
Data Analyst
- Worked on the MVP (Most Valuable Population) project in database management and as a PL/SQL developer.
- Managed the Oracle data warehouse and database. Created a PL/SQL search engine using Oracle Text and regular expressions for data matching.
- Wrote complex PL/SQL. Designed and modified database tables, and used collections and bulk binds to improve performance by minimizing the number of context switches between the PL/SQL and SQL engines. Loaded and exported data using PL/SQL within Toad.
Project 1: Database Normalization and Standardization
SAS datasets were pulled from the Oracle data warehouse and contained information such as street addresses, email addresses, and phone numbers. The goal was to standardize the database, which was complicated and dirty. I wrote PL/SQL functions that extract substrings to obtain typical properties, using Oracle analytic functions, lookup tables, and regular expression functions such as REGEXP_REPLACE, REGEXP_SUBSTR, and REGEXP_INSTR, to standardize the street, email address, and phone number fields.
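The standardization approach described above can be sketched as follows. This is a minimal Python illustration, not the original PL/SQL: `re.sub` stands in for REGEXP_REPLACE, and the lookup table, function names, and formatting rules (10-digit US phone numbers, a suffix lookup on the last word of the street) are all assumptions made for the example.

```python
import re

# Hypothetical lookup table of street-suffix abbreviations (illustrative only).
STREET_SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}


def standardize_phone(raw):
    """Strip non-digits and format a 10-digit number as NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", raw)
    # Assumed rule: drop a leading country code of 1.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # cannot be standardized under the assumed rule
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def standardize_email(raw):
    """Trim, lowercase, and apply a loose shape check (not full validation)."""
    email = raw.strip().lower()
    if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        return email
    return None


def standardize_street(raw):
    """Remove punctuation, uppercase, and abbreviate the trailing suffix."""
    words = re.sub(r"[.,]", "", raw).upper().split()
    if words:
        words[-1] = STREET_SUFFIXES.get(words[-1], words[-1])
    return " ".join(words)
```

Values that do not fit the assumed rules return `None` so they can be routed to manual review rather than silently altered.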
Confidential
SAS Programmer
- Collected online questions and loaded them into an Oracle database. Used SQL and PL/SQL to retrieve keywords and find the most common questions in order to improve service.
- Techniques: dataset preparation and predictive modeling with binary multiple logistic regression and multiple linear regression, using PROC LOGISTIC, PROC MIXED, and PROC GLM.
- SAS dataset: retail banking datasets were pulled from the Oracle data warehouse, with about 35,000 observations (rows) and 50 variables (columns) describing the bank's other products. Most inputs are interval or binary, but 2 of the inputs are nominally scaled.
- Binary target: indicates whether the customer has the insurance product (1 = yes, 0 = no).
- Process: Redundant and irrelevant variables were eliminated using the VARCLUS and CORR procedures, respectively, before applying variable selection in the LOGISTIC procedure; only 12 of the 50 variables were selected by PROC LOGISTIC. Interactions among continuous and categorical predictors were included where necessary. After model validation, visualization techniques such as ROC and lift charts were used to present and compare models. Results expressed as odds and odds ratios were translated into probabilities for interpretation.
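The step of interpreting odds and odds ratios as probabilities follows standard logistic regression arithmetic. A minimal Python sketch, with function names chosen only for this example, might look like:

```python
import math


def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


def logit_to_probability(log_odds):
    """Convert a logistic model's linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    return odds_to_probability(baseline_odds * odds_ratio)
```

An odds ratio alone does not give a probability; it needs a baseline probability, which is why `apply_odds_ratio` takes both.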
Confidential
Data Analyst
- For a clinical trial data set, created a summary of demographics report and a summary of adverse events by maximum severity table. Used the DATA step, DATA _NULL_, PROC SQL, PROC REPORT, and macro array variables to combine p-values and descriptive statistics. Tables were exported to PDF or HTML format using ODS.
- Calculated change from baseline and applied LOCF (Last Observation Carried Forward).
- Created clinical trial graphs such as scatter plots, bar charts, and box plots using PROC GPLOT. The SG graph procedures, including SGPANEL, SGPLOT, and SGSCATTER, were chosen to present results.
- Mapped data to the SDTM standard.
- Ran multiple projects. Studied the correlations between autism and brain neural activity using a GLM model.
- Produced descriptive statistics such as mean, median, and standard deviation using SAS. One-way ANOVA with Dunnett's post hoc test was used to compare multiple group means. Overlaid ROC graphs presented the differences among multiple groups using the Kolmogorov-Smirnov (KS) test. Unbalanced two-way ANOVA with Bonferroni's post hoc test was used to compare multiple group variances.
- Migrated data among SAS, Oracle, and MS Access.
- Integrated data using the SAS DATA step and PROC SQL.
- Developed macros, such as writing observations into macro arrays and writing variable names into macro arrays.
- Developed SAS programs for data cleaning and validation.
- Developed and initiated moderate to complex SAS programs using DATA steps, macros, merging, DO loops, arrays, PROC FREQ, PROC MEANS, PROC REPORT, IF/THEN/ELSE logic, and other SAS procedures to manipulate, summarize, and validate the data for output.
- Attended team meetings and presented reports to internal and cross-functional teams using MS PowerPoint.
- Communicated well with other team members, including statisticians and non-statisticians.
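The change-from-baseline and LOCF calculations mentioned in the list above can be illustrated with a short sketch. It is written in Python rather than SAS, and it assumes one subject's visits arrive in time order, with `None` marking a missing value and the first visit used as baseline; these conventions are assumptions for the example, not details from the original programs.

```python
def locf(values):
    """Replace each missing value (None) with the most recent observed value."""
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        # Values before the first observation stay None: nothing to carry forward.
        filled.append(last_seen)
    return filled


def change_from_baseline(values, baseline_index=0):
    """Subtract the baseline visit's value from every visit's value."""
    baseline = values[baseline_index]
    return [
        None if value is None or baseline is None else value - baseline
        for value in values
    ]


# Usage: impute first, then compute the change from baseline.
visits = [120, None, 115, None, None]
imputed = locf(visits)
changes = change_from_baseline(imputed)
```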
Confidential
Data Analyst
- Ran multiple projects. Five projects were completed ahead of schedule. Save over 80 research time.
- Managed the data flow and integrated the data into a database. Contributed to the cross-functional, cross-department, and cross-institute raw data handling process.
- Generated summary statistics on data, analyzing the content to identify and address data issues, anomalies, inconsistencies, outliers, and limitations such as overlapping admissions, outlier lab values, and apparent data entry errors.
- Analyzed research and medical data using SAS and SAS Enterprise Guide. Accessed and extracted MS Excel files using PROC IMPORT. Queried databases using SAS SQL.
- Conducted ad hoc and post hoc analyses of medical and scientific data and provided accurate and appropriate interpretation of the data. Created and maintained reporting processes using SAS.
- Developed demographics/adverse events listings, tables and graphs.
- Prepared tables, figures, and listings for presentations. Presented reports to functional team members and the whole department using MS PowerPoint.
- Built a relational database using MS Access.
- Earned SQL-based certifications: SAS Advanced certificate and Oracle OCA certificate. Skilled in multi-table join manipulation.
Confidential
Data Analyst
- Alzheimer's disease data were manipulated in MS Excel and used to build a relational database in MS Access.
- A correlation was found in neuron synchronization; the correlation analysis was performed using Matlab. Both t-tests and ANOVA were conducted using Matlab, with the Tukey test chosen for post hoc analysis.
- Presented reports to the research team and the whole department using PowerPoint.
Confidential
Data Analyst
- Selected the appropriate statistical techniques for the analysis.
- Multiple-group comparisons were performed using ANOVA; significant differences existed among the groups. The Tukey test was used for post hoc analysis.
- Figures and tables were prepared based on the research and presented to the functional team and department using PowerPoint. The results were written up in scientific papers for peer-reviewed publication.
- Presented reports to the research team and at an international meeting using PowerPoint.
