Data Scientist Resume
Hartford, CT
CORE COMPETENCIES:
- Able to influence the strategic direction of the company by identifying opportunities in large, rich data sets and creating and implementing data-driven strategies that fuel growth in revenue and profit.
- Design and implement statistical / predictive models and algorithms utilizing diverse sources of data.
- Utilize analytical applications like R to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into business terms for management.
- Develop tools and reports that help users access and analyze data to make fact-based decisions.
- Capable of turning dry analysis into an exciting story that influences the direction of the business and communicating with diverse teams to take a project from start to finish.
- Collaborate with product teams to develop and support our internal data platform and to support ongoing analyses.
TECHNICAL SKILLS:
Programming Languages: R, Python, SAS, SPSS, Java, VB.Net
Internet Technologies: HTML/CSS
Operating Systems: Windows, Unix
Database Systems: MS SQL, Access, Oracle, Hadoop, Hive
Tools: H2O, Tableau, Spotfire, RStudio, Weka, Eclipse, MS Visual Studio, MS Office, Excel, GitHub, Shell
Source Control: Git
Machine Learning Algorithms: Naïve Bayes, Linear Regression, Logistic Regression, Decision Tree, Random Forest, K-means, K-Nearest Neighbour, SVM, GLM, GBM
Big Data Technologies: Spark, Power BI, Microsoft Azure, ADF
PROJECT EXPERIENCE:
Confidential, Hartford, CT
Data Scientist
Responsibilities:
- Built an elastic net GLM in H2O and R to predict the frequency and severity of non-cat losses such as fire, non-weather water, theft, and electronic vandalism by coverage (building, contents, and time element).
- Refreshed the non-cat pricing model with updated data and measured the resulting changes to validate their consistency with changes in the business.
- Built a Shiny analytics dashboard for the large property book by aggregating data from multiple sources to create a holistic view of policy exposure and loss data.
- Built a Shiny application with a Leaflet map visualization to show catastrophic events and the locations of policies that fall within the risk area.
- Collaborated with third-party data teams to evaluate third-party data sets such as Verisk, ISO, and Demyst, to find the sources that match best and provide lift to our current model, in order to build an unconstrained pricing model.
- Executed overall data extraction and aggregation of exposure and loss data from claims and P-stage databases for policies using SQL and stored the data in Hadoop.
- Collected modeling data by querying several tables with SQL and merged data from various sources (Excel, Oracle, and SAP HANA) in R.
- Built a POC using GBM to understand the drivers of non-cat perils, and built a surrogate decision tree to identify the driving factors.
- Created a plan for a model monitoring application that tracks changes in the data and helps decide whether a model refresh or tuning is required.
Confidential
Data Scientist
Responsibilities:
- Built an SVM regression model using Weka to predict the competitor’s peanut butter price for product types such as crunchy and creamy, using current and predicted commodity prices of the ingredients, company stock prices, and other market data.
- Gathered user requirements from the sales and marketing team to define the project objective and prepared a project plan.
- Performed data aggregation, cleaning, exploratory analysis, correlation analysis, and variable selection in R as preprocessing steps before modeling.
- Built and tuned multiple models with various combinations of variables and compared the results to pick the best-performing model.
- Collaborated with the business team to get constant feedback on model results and keep them aligned with business needs and project objectives. Documented the project to hand the model over to the data engineering team for production.
Confidential, New Brunswick, NJ
Data Scientist
Responsibilities:
- Managed strategic initiatives, creating and applying analytic frameworks to solve difficult business problems and strengthen decision-making at all levels.
- Gathered user requirements for Business Continuity Management (BCM) to analyze direct and indirect supplier risk, site risk, brand risk, and revenue at risk for various biologics brands.
- Executed overall data aggregation/alignment & process improvement reporting within the RE team.
- Collected modeling data by querying several tables with SQL and merged data from various sources (Excel, Oracle, and SAP HANA) in R.
- Performed exploratory data analysis (EDA) and trained and tested models such as logistic regression, decision tree, K-means, and Naïve Bayes.
- Created features by decomposing variables such as product category, reframing quantities and units of dosage, and categorizing suppliers as direct or indirect.
- Trained the models with Naïve Bayes and Decision Tree and tested them with 10-fold cross-validation.
- Compared the performance of the different models using error rates, ROC curves, and confusion matrices.
- Worked with management to refine predictive methods & sales planning analytical process.
- Used data visualization to encourage stakeholders to look at the data from different perspectives, ask more intelligent questions, and make data-driven decisions.
- Built ad hoc dashboards with KPIs using Spotfire and Tableau. Integrated Spotfire with JSViz and D3 visualizations and added time series timelines to show forecasts.
- Interacted and collaborated with offshore teams for productive teamwork.
- Developed a training program to teach management to use the Spotfire Web Player and the dashboard.
Confidential, Clinton, NJ
Data Scientist
Responsibilities:
- Defined and identified the target variable and the data used to predict it, drawing on internal systems and new external data sources, by collaborating with various teams, business owners, system owners, and SMEs to meet the business objective.
- Performed univariate and multivariate exploratory data analysis (EDA) to check distributions and missingness, identify patterns, understand data quantity in subpopulations, and investigate problematic dependencies.
- Transformed and created variables (feature engineering) by bucketing/binning continuous variables in coordination with SMEs.
- Used correlation analysis of each predictor with the dependent variable as a preliminary exploration of potential model relationships.
- Built decision trees to help identify useful breaks for continuous variable transformations and to find subpopulations on which models behave differently. Built a Random Forest to estimate the upper limit on predictive performance that we could expect from an overall model. Built the final model with Logistic Regression because of its high interpretability and easy implementation.
- Evaluated the model using the test data set and repeated random sub-sampling validation. Measured goodness of fit using AIC/BIC and tested performance on subpopulations by product, target market, agent, etc.
Confidential, New York
Data Analyst/ Tableau Developer
Responsibilities:
- Created views in Tableau Desktop that were published to the internal team for review, further data analysis, and customization using filters and actions.
- Created a heat map in Tableau showing current customers by color, broken into regions, allowing business users to see where we have the most and the fewest users.
- Projected future growth in the number of customers in various classes by developing area maps showing which states were connected the most, and published them on Tableau Server.
- Converted charts into Crosstabs for further underlying data analysis in MS Excel.
- Created bullet graphs to determine profit generation using measures and dimensions from Oracle, SQL Server, and Excel.
- Blended data from multiple databases into one report by selecting the primary key from each database for data validation.
- Combined views and reports into interactive dashboards in Tableau Desktop that were presented to Business Users, Program Managers, and End Users.
- Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly by using quick filters for on-demand information.
- Tested dashboards to ensure the data matched the business requirements and to detect any changes in the underlying data.
- Rewrote various business processes and tested the results in MS Excel using various functions and subqueries with NOT EXISTS.
- Updated the functional requirements document after development and created documentation for the deployment team.
- Performed variable-level data quality checks, including missing values, unique values, and frequency tables.
- Obtained data from a variety of sources such as databases, CSV files, and flat files.
- Wrote complex SQL join queries to extract and load data.
Confidential, FL
Data Analyst
Responsibilities:
- Performed data query, extraction, compilation, and reporting tasks.
- Managed, updated, and manipulated report orientation and structures using advanced Excel functions, including pivot tables and VLOOKUPs.
- Generated the weekly, monthly, and quarterly reports necessary for maintaining a sound and balanced financial statement.
- Researched new means of qualifying and obtaining data and methods of using analytical tools effectively for systems development and improvement.
- Created pivot tables to parameterize data analysis of earned vs. actual premium amounts.
- Effectively used pivot filters to extract specific sales data for the analysis.
- Performed data cleansing and analysis using pivot tables, formulas, and VLOOKUPs.