Data Scientist Resume
Schenectady, NY
SUMMARY:
- SQL DBA (8 years); Data Scientist (6 years); Linux Admin; Project Management Professional (8 years); AWS Solutions Architect - Associate (4 years); Certified Life & Health Insurance Agent (1 year).
- 5+ years as a Data Scientist with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
- Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python and Tableau.
- Hands-on experience implementing LDA and Naïve Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks and Principal Component Analysis, with good knowledge of Recommender Systems in R.
- Experience in writing Procedures, Triggers and Packages in SQL/PL SQL.
- Experience on advanced SAS programming techniques, such as PROC SQL (JOIN/ UNION), PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
- Implemented Deep Learning models and numerical computation with the help of data flow graphs using TensorFlow.
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
- Worked with and extracted data from various database sources such as Oracle, SQL Server and DB2, regularly using JIRA and other internal issue trackers during project development.
- Experience with data analytics, data reporting, Ad-hoc reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Strong understanding of advanced Tableau features including calculated fields, parameters, table calculations, joins, data blending, and dashboard actions.
- Enthusiastic to learn and solve open-ended business problems.
- Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering and dimensionality reduction.
- Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors) in Forecasting/Predictive Analytics, segmentation methodologies, regression-based models, hypothesis testing, Factor Analysis/PCA, and Ensembles.
- Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiment), machine learning, algorithms, data structures and data infrastructure.
- Extensive hands-on experience and high proficiency with structured, semi-structured and unstructured data, using a broad range of data science programming languages and big data tools including R, Python, Spark, SQL, scikit-learn and Hadoop MapReduce.
- Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis, design specification and testing, in both Waterfall and Agile methodologies.
- Technical proficiency in designing and data modeling online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
- Proficient in SQL, Database, Data Modeling, Data Warehousing, ETL and reporting tools.
- Proficient in data science programming in R, Python and SQL.
- Solid team player, team builder, and an excellent communicator.
- Speak French and English fluently.
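As an illustrative sketch of the classification techniques listed above (PCA feeding a logistic regression), the following uses scikit-learn with the bundled Iris data; it is a minimal stand-in, not code from any project described here.

```python
# Minimal sketch: PCA + logistic regression pipeline (synthetic/demo data).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reduce to two principal components, then classify.
model = Pipeline([("pca", PCA(n_components=2)),
                  ("clf", LogisticRegression(max_iter=1000))])
model.fit(X_train, y_train)
score = model.score(X_test, y_test)  # held-out accuracy
```

Chaining PCA and the classifier in one Pipeline keeps the dimensionality reduction inside the fitted model, so the same transform is applied consistently at train and test time.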
TECHNICAL SKILLS:
SQL (8 years), MS SQL Server (8 years), MySQL (8 years), Hadoop (7 years), Linux (6 years), JavaScript (7 years), Python (8 years), R (8 years), Tableau (9 years), Business Intelligence (5 years), Unix Administration (7 years), Adobe (10+ years), Coding (9 years), Apache (7 years), C++ (7 years), CMS (9 years), Database Development (7 years), Database Administration (8 years), Data Entry (8 years), Data Analysis (9 years), Data Mining (8 years), Big Data (7 years), MongoDB (8 years), AWS (6 years), Deep Learning (5 years), Machine Learning (3 years), Project Management (8 years), Data Management (9 years), Data Warehousing (8 years), Excel (9 years), Essbase (7 years), FISMA (5 years), Hyperion (6 years)
WORK EXPERIENCE:
Data Scientist
Confidential - Schenectady, NY
- New Light Technologies Client Project: leveraging Social Media to Map Disasters
- Gathered data using APIs; modeled using NLP techniques; vectorized text with a Word2Vec model; mapped geolocation data using GeoPandas.
- Statistical Summaries and Inference of Standardized Testing: SAT vs ACT
- Data cleaning. Developed familiarity and proficiency with Python and data science libraries including scikit-learn, Pandas, NumPy, Statsmodels, SciPy and Matplotlib. Performed exploratory data analysis, including using NumPy to explore distributions of individual variables and relationships among pairs of variables.
- Regression Challenge: Predicted the sale price of homes in the Ames, Iowa housing dataset
- Generated and refined a regression model using the training data by making use of train-test split, cross-validation, and grid searching for hyperparameters. Used exploratory data analysis to examine correlations and relationships across predictor variables.
- Determining Price of Loose Diamonds: Predicted diamond price using machine learning
- EDA. Data preprocessing, cross-validation and parameter selection. Modeling: Random Forest, Ridge- and Lasso-optimized linear models, and XGBoost. Hyperparameter tuning and grid search.
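The regression workflow described in these projects (train-test split, cross-validation, grid search over regularization strength) can be sketched as follows; the data here are synthetic, not the Ames or diamond datasets.

```python
# Hedged sketch of the regression workflow on synthetic data.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid-search the Ridge regularization strength with 5-fold cross-validation.
grid = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)
r2 = grid.score(X_test, y_test)  # R^2 of the best model on held-out data
```

GridSearchCV refits the best estimator on the full training split, so `grid.score` evaluates the tuned model directly.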
Data Scientist
Confidential - San Jose, CA
- Continuously interacted with Marketing Strategists and Business leaders to identify their analytic needs.
- Data acquisition and manipulation using SQL in SQL Navigator and R from Oracle database
- Experience in SQL queries using Common Table Expressions, Case statements, Set operators, Date formats and other DML statements
- Delivered various R scripts that included:
- Importing data into R from Oracle database
- Performing data transformations using user-defined functions, string functions and date functions, loops and conditional expressions
- Data manipulation packages dplyr, reshape2, tidyr, readr, lubridate
- Initial exploratory analysis including descriptive statistics
- Data visualizations using the grammar-of-graphics ggplot2 package: histograms, scatterplots, boxplots, line graphs, heat maps, bar graphs, Pareto charts
- Statistical hypothesis testing like Independent t-test, Kolmogorov-Smirnov test (K-S test)
- Ran monthly automated R scripts and analyzed the results
- Performed chi-square test to study the independence of variables using scipy.stats
- Handled Python DB-API exceptions such as Warning, OperationalError and ProgrammingError.
- Carried out Recursive Feature Elimination (RFE) for selecting the best features for our analysis
- Analyzed output results through Confusion Matrix, Sensitivity, Specificity, Accuracy and Kappa
- Validated the machine learning classifiers using Accuracy, AUC, ROC Curves and Lift Charts
- Carried out various team meetings on project updates and presented business reports and presentations to non-business users
- Performed Exploratory Data Analysis including data profiling on descriptive statistics (unknown response values, imbalanced data), feature engineering, and data pre-processing functions such as transformations, imputation of missing data, capping skewed values, binning and duplicate handling using the Python Pandas library.
- Utilized SAP ERP and CRM systems for supply chain, sales, service and marketing to develop and execute marketing research analysis, customer behavior research and competitive analysis.
Environment: Oracle, Oracle R Enterprise, R, SQL, Python DB-API, DML; techniques: AUC, ROC curves, Kappa, Kolmogorov-Smirnov test (K-S test), boxplots.
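The chi-square independence test mentioned above (via scipy.stats) looks like the following in outline; the 2x2 contingency table here is made up for illustration, not the project's actual data.

```python
# Hedged sketch: chi-square test of independence on a hypothetical table.
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = customer group, columns = outcome.
table = np.array([[30, 10],
                  [20, 40]])
chi2, p, dof, expected = chi2_contingency(table)
# A small p-value suggests the two variables are not independent.
```

For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default; pass `correction=False` to disable it.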
Data Scientist
Confidential - Buffalo, NY
- Analyzed time series data related to credit card payments of customers in the USA using statistical techniques.
- Performed data pre-processing, feature engineering & attribute reduction techniques to prepare the dataset for modeling.
- Used Power BI, Excel Pivot and Tableau to visualize, understand and detect anomalies in the raw data.
- Performed Cleansing and Pre-processing of raw data using R programming language.
- Built Statistical models using Logistic Regression, Regression Trees and Artificial Neural Networks for classifying each customer as a default or a non-default.
- Built interactive and intuitive dashboards using Tableau, Power BI to communicate key metrics to the Risk Management team.
- Integrated large amounts of data from the DW using the Talend integration tool and SSIS, from Microsoft SQL Server 2012 to Oracle (Toad), involving complex mapping between fields and transformations between data.
- Developed user-interactive Dashboards and BI reports by performing different OLAP operations on DW using Tableau and Power BI reporting tool.
- Carried out Recursive Feature Elimination (RFE) for selecting the best features for our analysis
- Analyzed output results through Confusion Matrix, Sensitivity, Specificity, Accuracy and Kappa.
- Performed data munging in R, including transformations, merging, sorting, and detecting missing values, outliers, standard deviations and distributions of the data
- Documented and submitted reports on descriptive statistics and graphs of predictor variables
- Performed data balancing to balance out the ratio of subscribers for churn versus active subscribers
- Integrated different data sets like customer's base, call center inbound and outbound calls, campaigns, plans & prices and billing from various sources
- Worked with business and technology teams to design and supervise the implementation of data science and machine learning techniques
- Carried out logistic regression, analyzed coefficient estimates, probabilities of predicted and observed responses and concordance and discordant pairs
- Carried out forward, backward, subset and stepwise variable selection to obtain the best model giving high C-statistic/concordance percentage
Environment: Power BI, Excel Pivot, Tableau, RStudio, R, Microsoft SQL Server 2012, Oracle; techniques: RFE, confusion matrix, Kappa.
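The Recursive Feature Elimination step described above can be sketched with scikit-learn as follows; the dataset is a synthetic stand-in, not the credit card data.

```python
# Hedged sketch of RFE for feature selection on synthetic data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)

# Keep the four strongest features, dropping one per elimination round.
selector = RFE(LogisticRegression(max_iter=1000),
               n_features_to_select=4, step=1)
selector.fit(X, y)
kept = [i for i, keep in enumerate(selector.support_) if keep]
```

`selector.ranking_` assigns rank 1 to every retained feature, which is useful when reporting why a feature was kept or dropped.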
Jr. Data Scientist
Confidential - Washington, DC
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib and scikit-learn in Python to develop various machine learning algorithms, and utilized algorithms such as linear regression, decision trees, neural networks and random forests for data analysis.
- Validated the machine learning classifiers using Accuracy, AUC, ROC Curves and Lift Charts.
- Trained random forests and analyzed graphs of training and testing errors with respect to sample size.
- Trained neural networks via backpropagation, using 10-fold cross-validation to select the best parameters of the Ebola network. The tuning parameters included the number of layers, the convergence rate and the range of initial random weights.
- Compared the model accuracies before and after applying PCA.
- Created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps and Gantt charts via the Show Me functionality. Created dashboards giving a clear view of descriptive statistics for all variables, Ebola country-wise trend analysis, and predicted vs. actual response rates for each country. Worked extensively with advanced analysis actions, calculations, parameters, background images and maps. Effectively used the data blending feature in Tableau.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in R.
- Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER/Studio 9.7
- Processed huge datasets (over a billion data points, over 1 TB of data) for data association pairing and provided insights into meaningful data associations and trends
- Developed cross-validation pipelines for testing the accuracy of predictions
- Enhanced statistical models (linear mixed models) for predicting the best products for commercialization using Machine Learning Linear regression models, KNN and K-means clustering algorithms
- Participated in all phases of data mining, data collection, data cleaning, developing models, validation, visualization and performed Gap Analysis.
Environment: Oracle, Python, Tableau, SQL, Apache Sqoop, Apache ZooKeeper, Apache Oozie, RStudio, Theano, MS Excel 2016, Windows/Linux platform
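Validating a classifier with accuracy and ROC AUC, as in the bullets above, can be sketched like this; the data are synthetic and the random forest settings are illustrative, not the project's actual configuration.

```python
# Hedged sketch: classifier validation with accuracy and ROC AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clf = RandomForestClassifier(n_estimators=100, random_state=1)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
# AUC needs scores, not labels, so use the positive-class probability.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
```

AUC complements accuracy because it is insensitive to the classification threshold and less misleading on imbalanced classes.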
Data Reporting Analyst
Confidential
- Designed Excel and Tableau dashboards for higher management reports, and other data visualization reports as per the needs of the user to measure KPI. Frequently prepared ad-hoc reports using Microsoft Access and Excel.
- Performed data analysis using QlikView and Tableau to analyze financial data.
- Created BRDs and PRDs in collaboration with the project manager, QA and systems analysts. Performed data analysis with SQL queries against DB2 and Oracle databases, OBIEE, Excel and Access.
- Developed UML diagrams showing a high-level map of business interactions.
- Work experience with Project management & Collaboration tools: MS Project, MS PowerPoint, MS Excel, Atlassian Confluence, JIRA.
- Worked with the business on UAT; acted as an SME and worked with the testing team.
- Interpreted and broke down accumulated data in a straightforward & concise manner to facilitate efficient use by end users; reconciled the accuracy of data in the system
- Provided efficient technical support during and after implementation for EDI configuration
- Effectively utilized CRM systems such as SAP CRM and Salesforce, supplier-related systems such as SAP SCM and SRM, and SAP ERP, with experience in core ERP modules such as HR (Human Resources), MM (Materials Management) and FI-CO (Financial Accounting and Controlling), among others, as well as SAP HCM and SAP Basis.
- Experience with Microsoft SharePoint and Atlassian Confluence to collaborate with other team members and clients.
- Combined structured and unstructured information for data warehousing and analyzed claims data to help identify cost and recovery; worked with formats such as CSV, XML and Excel
- Wrote SQL queries in Microsoft Access based on management's requirements/expectations.
Environment: R, Python, Microsoft Excel and Access, Query Analyzer, SharePoint, Office 365, Tableau, MySQL, SQL Server 2005/2008, PL/SQL, Oracle, UNIX, Windows
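The ad-hoc SQL reporting described in this role (management requests answered with aggregate queries) can be sketched as follows; an in-memory SQLite database stands in for Access/SQL Server, and the table and figures are hypothetical.

```python
# Hedged sketch: ad-hoc aggregate SQL report over a hypothetical claims table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (region TEXT, amount REAL)")
conn.executemany("INSERT INTO claims VALUES (?, ?)",
                 [("NE", 120.0), ("NE", 80.0), ("SW", 50.0)])

# Total claim amounts per region, as in a management ad-hoc report.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM claims GROUP BY region ORDER BY region"
).fetchall()
# rows == [('NE', 200.0), ('SW', 50.0)]
```

The same GROUP BY pattern carries over directly to Access or SQL Server; only the connection layer changes.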