Data Scientist Resume
Columbus, GA
SUMMARY:
- About 10 years of IT experience in data analytics and data science, focused on processing and analyzing large amounts of data using Hadoop (Mahout, Hive, Pig), R, MS Excel 2010, MS Access 2010, MS SQL 2010, SAS, and Matlab.
- Proficient in Machine Learning, Data/Text Mining, Statistical Analysis & Predictive Modeling.
- Skilled in data acquisition, storage, analysis, integration, predictive modeling, logistic regression, decision trees, data mining methods, forecasting, factor analysis, cluster analysis, ANOVA, and other advanced statistical techniques.
- Strong experience in data visualization with QlikView and Tableau, creating line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, and subplots.
- Adept in data quality management: acquiring, cleaning, processing, and cross-verifying data across multiple sources. Skilled R user with knowledge of other statistical programming languages such as SAS and SPSS.
- Good understanding of data mining techniques such as classification, clustering, regression, and random forests.
- Experience creating MapReduce programs, running SQL on Hadoop using Hive, and building ETL with Pig scripts. Skilled in R, Java, Python, C#, SQL Server, Matlab, and SAS.
- Willing to relocate: Anywhere
TECHNICAL SKILLS:
Statistical Software: R, Matlab, SAS Studio, MS Excel 2010
Statistical Techniques: Linear Regression, Logistic Regression, Random Forests
Big Data Ecosystems: HDFS, MapReduce, Mahout, Hive, Pig, Sqoop, Flume
Database: SQL Server, MS Access 2010, MySQL
Languages: Java, Python, C#, SQL, HTML, JavaScript
WORK EXPERIENCE:
Data Scientist
Confidential - Columbus, GA
Responsibilities:
- Analyzed individual customer behavior.
- Segmented customers based on spending activities.
- Categorized risky customers based on the days past due parameter.
- Categorized active and inactive customers based on their utilization.
- Designed, developed, and deployed statistical data models in R.
- Utilized machine learning techniques for predictions & forecasting based on the data.
- Developed data mining algorithms using machine learning (random forest, regression, clustering) to support decision making, using R and Mahout on Hadoop.
- Partnered with ETL team to extract data from Hadoop environment.
- Prepared dashboards using calculations and parameters in QlikView.
- Created Tableau dashboards and assisted users with dashboard development.
Environment: R, Hadoop, Mahout, QlikView, Excel.
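A minimal sketch of the customer categorization described above (risk by days past due, activity by utilization). It is written in Python for illustration only and is not the original R code; the `Customer` fields, the 30/90-day cutoffs, and the 5% utilization threshold are all hypothetical assumptions.

```python
# Illustrative sketch only: field names and thresholds are hypothetical.
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    days_past_due: int   # days since the oldest missed due date
    utilization: float   # balance / credit limit, expected in 0.0-1.0


def risk_category(days_past_due: int) -> str:
    """Bucket a customer by delinquency (assumed cutoffs: 30 and 90 days)."""
    if days_past_due >= 90:
        return "high risk"
    if days_past_due >= 30:
        return "medium risk"
    return "low risk"


def activity_status(utilization: float, active_threshold: float = 0.05) -> str:
    """Label a customer active when utilization exceeds an assumed threshold."""
    return "active" if utilization > active_threshold else "inactive"


def categorize(customers):
    """Return {customer_id: (risk category, activity status)}."""
    return {
        c.customer_id: (risk_category(c.days_past_due), activity_status(c.utilization))
        for c in customers
    }


if __name__ == "__main__":
    sample = [
        Customer("A", 0, 0.40),
        Customer("B", 45, 0.00),
        Customer("C", 120, 0.85),
    ]
    for customer_id, labels in categorize(sample).items():
        print(customer_id, labels)
```

In practice the cutoffs would come from business rules or from the data itself rather than from fixed constants.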
Data Scientist
Confidential - Basking Ridge, NJ
Responsibilities:
- Utilized machine learning techniques for predictions and forecasting based on sales data.
- Executed overall data aggregation/alignment and process improvement reporting within the sales department.
- Managed data quality and integrity using skills in data warehousing, databases, and ETL.
- Monitored and maintained high levels of data analytic quality, accuracy, and process consistency.
- Assisted sales management in data modeling.
- Ensured on-time execution and implementation of sales planning analysis and reporting objectives.
- Worked with the sales management team to refine predictive methods and the sales planning analytical process.
- Executed and monitored the accuracy and efficiency of sales forecasts and reporting.
- Prepared dashboards using calculations and parameters in QlikView.
- Supported consistent implementation of company reporting and sales process initiatives.
Environment: R, Excel, SAS, QlikView, MS SQL Server 2010.
Data Scientist
Confidential - Nashville, TN
Responsibilities:
- Responsible for predictive analysis of credit scoring to predict whether credit extended to a new or existing applicant is likely to result in a profit or a loss. Primarily used R packages for the data mining tasks.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization.
- Data for modeling was collected using SQL by querying several tables. The extracted tables were further appended or merged to create tables for modeling using R.
- Applied principal component analysis. Where applicable, replaced missing values with the group average using PROC MEANS.
- Computed credit risk parameters such as Probability of Default (PD), Loss Given Default (LGD), and Exposure at Default (EAD).
- Used logistic regression, clustering and multivariate modeling to provide valuable analytical insights.
- Used R for generating various graphs and charts for analyzing the different features.
- Used k-fold cross-validation to guard against overfitting.
- Used the Kolmogorov-Smirnov test to measure the quality of the models.
Environment: R, MS SQL, Hadoop, Hive, Pig, Mahout.
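Two of the steps above, group-average imputation and the Kolmogorov-Smirnov check, can be sketched as follows. This is a Python illustration under stated assumptions, not the original R/SAS code: the row layout (a list of dicts), the function names, and the sample scores are all hypothetical. The KS statistic here is the two-sample form, measuring how far apart the score distributions of defaulters and non-defaulters are.

```python
# Illustrative sketch only: data layout and function names are hypothetical.
from bisect import bisect_right
from collections import defaultdict


def impute_group_mean(rows, group_key, value_key):
    """Return copies of rows with missing (None) values set to the group mean."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for row in rows:
        if row[value_key] is not None:
            totals[row[group_key]][0] += row[value_key]
            totals[row[group_key]][1] += 1
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        total, count = totals[row[group_key]]
        if new_row[value_key] is None and count > 0:
            new_row[value_key] = total / count
        filled.append(new_row)
    return filled


def ks_statistic(scores_bad, scores_good):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not scores_bad or not scores_good:
        raise ValueError("both score samples must be non-empty")
    bad, good = sorted(scores_bad), sorted(scores_good)
    max_gap = 0.0
    for threshold in sorted(set(bad) | set(good)):
        cdf_bad = bisect_right(bad, threshold) / len(bad)
        cdf_good = bisect_right(good, threshold) / len(good)
        max_gap = max(max_gap, abs(cdf_bad - cdf_good))
    return max_gap


if __name__ == "__main__":
    applicants = [
        {"segment": "retail", "income": 40.0},
        {"segment": "retail", "income": None},
        {"segment": "retail", "income": 60.0},
    ]
    print(impute_group_mean(applicants, "segment", "income"))
    print(ks_statistic([0.2, 0.6, 0.8], [0.1, 0.3, 0.7]))
```

A KS value near 1 means the model scores separate the two groups almost completely; a value near 0 means the score distributions are nearly identical.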
Data Analyst/Data Scientist
Confidential - Nashville, TN
Responsibilities:
- Worked on large data sets of structured, semi-structured, and unstructured data.
- Developed data analysis solutions based on predictive, behavioral or other models via statistical analysis and use of relevant modeling techniques.
- Identified cost-saving opportunities for customers using data models.
- Developed data mining algorithms using machine learning (decision tree, regression, clustering) on historical big data to support decision making, using R and Mahout on Hadoop.
- Designed, developed, and deployed statistical data models in R and Python.
Environment: R, Hadoop, Mahout, Python, MS SQL Server 2010.
BI Analyst
Confidential
Responsibilities:
- Utilized software applications such as R, Excel, and SAS. Used decision tree analysis and regression analysis.
- As an ETL team member, participated in business analysis, business process, and technical design sessions.
- Worked with business and technical staff to develop requirements document and specifications.
- Responsible for extracting data from MS SQL Server, MS Access and Flat files, Data Warehousing & Database Design, ETL, Data reporting and query, preparing data for analytics and project management.
- Created database from representative project data and modified procedures, tables, views and constraints.
- Generated reports for managers using dimensional modeling and reporting tools.
- Edited raw data and created R data sets for statistical analysis for project/business decisions. Used R package for Statistical Analysis. Created and maintained bulk data load & extract processes.
Environment: R, Excel, SAS, MS SQL Server 2008.
Business Analyst - Mobile portal commercials
Confidential
Responsibilities:
- Worked as both developer and problem solver to develop client-side and server-side code. Worked closely with business analysts to understand the requirements.
- Wrote core Java classes, JSP, and HTML files.
- Worked with the team to develop interactive and user-friendly web pages using JSP, CSS, HTML, and JavaScript.
- Involved in injecting dependencies into code using Spring core module.
- Involved in developing code for obtaining beans in the Spring framework using Dependency Injection (DI) or Inversion of Control (IoC).
Environment: Java/ J2EE, JSP, MySQL, HTML, CSS, JavaScript, Spring.