Data Scientist Resume
Columbus, GA
PROFESSIONAL SUMMARY:
- Around 5 years of IT experience as a Machine Learning Engineer, with hands-on experience interpreting and analyzing data through ML and statistical techniques and deriving meaningful insights to implement solutions in a fast-paced environment.
- Experience in Statistical Modelling, Predictive Modelling, Data Analytics, Data Modelling, Data Analysis, Data Mining, Text Mining and Natural Language Processing (NLP) algorithms.
- Experience in utilizing analytical applications like R and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions and translate analytical findings into risk management and marketing strategies that drive value.
- Experience using Artificial Intelligence in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
- Proficient in the data science life cycle, i.e., “Source • Clean • Explore • Communicate”.
- Experience in machine learning techniques like Regression, Classification, Random Forest, Clustering Analysis, Market Basket Analysis, Association Rules, Naïve Bayes, Recommendation Systems, Dimensionality Reduction and Neural Networks.
- Proven expertise in employing Supervised and Unsupervised learning techniques (Clustering, Classification, PCA, Decision Trees, KNN, SVM), Predictive Analytics, Optimization Methods, Natural Language Processing (NLP) and Time Series Analysis.
- Extensive experience in Text Analytics, developing different Statistical Machine Learning solutions to various business problems and generating data visualizations using R and Python.
- Hands-on experience with R packages and libraries like caret, ggplot2, dplyr, magrittr, Hmisc, e1071, ROSE, epiR, ggvis etc.
- Used Pandas, NumPy, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
- Working knowledge of building prediction models using Linear Regression Analysis, Logistic Regression, Correlation Coefficient, and Coefficient of Determination techniques.
- Good knowledge of statistical analysis techniques like Confidence Intervals, Hypothesis Testing and ANOVA.
- Rich industry experience in Banking and Healthcare.
- Adept at, and with a deep understanding of, statistical modelling, multivariate analysis, model testing, problem analysis, model comparison, optimization and validation.
- Experience working on both Windows and Linux platforms.
- Performed Sentiment Analysis and Opinion Tracking of the presidential debates using Twitter analysis and created visualizations using R and Python.
TECHNICAL SKILLS:
Statistical Software: R, Python
Databases: SQL Server 2014/2012/2008/2005/2000, MS-Access, Oracle 12c/11g.
Programming and Scripting Languages: R (shiny, ggplot2, dplyr, tidyr, Tidyverse), C, C++, Java, HTML, JavaScript, Python (NumPy, SciPy, scikit-learn, NLTK, Matplotlib, Keras), Linux commands.
Statistical Methods: Hypothesis Testing, Time Series, Regression Models, Confidence Intervals, Dimensionality Reduction, Correlation, Covariance, Bootstrapping, Paired and Unpaired Sample Tests, Principal Component Analysis (PCA) and LDA.
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, Jupyter Notebook, RStudio, Microsoft Office, AWS, Tableau.
Machine Learning Algorithms: Classification, Regression, Decision Trees, Random Forest, KNN, Clustering (K-means), Neural Nets, SVM, Bayesian Algorithms, Social Media Analytics, Sentiment Analysis, Market Basket Analysis, Bagging, Boosting.
PROFESSIONAL EXPERIENCE:
Confidential, Columbus, GA
Data Scientist
Responsibilities:
- Working closely with the marketing team to deliver actionable insights from large volumes of data coming from different marketing campaigns and customer interaction metrics such as web portal usage, email campaign responses, public site interaction, and other customer-specific parameters.
- Extensively researched and approached subject matter experts to identify wire fraud indicators. Developed features based on past activity, online activity, updates to personal data, travel, card transactions etc.
- Acquiring, cleaning and structuring data from multiple sources and maintaining databases/data systems.
- Outlier detection using high-dimensional historical data.
- Identifying, analyzing, and interpreting trends or patterns in complex data sets.
- Characterizing false positives and false negatives to improve a model for predicting customer fraud rate.
- Implemented a real-time scoring system for the Fraud Operations team to manage, review and process transactions.
- Work independently and collaboratively throughout the complete Artificial Intelligence and analytics project lifecycle, including data extraction/preparation, design and implementation of scalable machine learning analyses and solutions, and documentation of results.
- Identified root causes of problems and facilitated the implementation of cost-effective solutions with all levels of management.
- Applied various machine learning algorithms and statistical modelling techniques such as decision trees, regression models, clustering and SVM to identify volume using the scikit-learn package in Python.
- Performed K-means clustering, Regression and Decision Trees in R; also worked with Naïve Bayes, Random Forests, Logistic Regression, SVM, Clustering and Principal Component Analysis.
- Pro-actively analyzed data to uncover insights that increase business value and impact.
- Hold a point-of-view on the strengths and limitations of statistical models and analyses in various business contexts and can evaluate and effectively communicate the uncertainty in the results.
Environment: R, Python 3.2/2.7, MATLAB, Decision Tree, Random Forest, Logistic Regression, Naïve Bayes, Scala NLP, Linux, Git, NumPy, Pandas and Tableau.
Confidential, Harrisburg, PA
Machine Learning Data Scientist
Responsibilities:
- Acquiring, cleaning and structuring data from multiple sources and maintaining databases/data systems.
- Developing and implementing data collection systems and other strategies that optimize statistical efficiency and data quality.
- Filtering and "cleaning" data and reviewing computer reports, printouts, and performance indicators to locate and correct code problems.
- Implemented end-to-end systems for Data Analytics, Data Automation and Integration.
- Responsible for data identification, collection, exploration and cleaning for modelling; participated in model development.
- Segmented users based on their health and demographic attributes such as location, sex, age and glucose levels.
- Used K-means to cluster users based on their insulin usage. Usage was recorded as the blood sugar (glucose) level (mg/dl) and the dosage in the regimen, with data across different times of day, days of the week, months etc.
- Scored user activity relative to the cluster each user belonged to, using Z-scores to identify deviation from cluster behaviour.
- Working towards improving the anomaly scoring algorithm by using a multivariate Gaussian distribution to determine the probability of occurrence of a particular activity.
- Built standard reports for company presentation, provided ad-hoc query and analysis support, and created requirements document by interacting with customers.
- Used Python to implement different Artificial Intelligence and machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting and Neural Network.
- Implemented various statistical techniques to manipulate data (missing data imputation, principal component analysis and sampling) and build predictive models.
- Writing detailed analysis plans and descriptions of analyses and findings for research protocols, regulatory reports and healthcare manuscripts.
- Implemented a Python-based distributed random forest via Python streaming.
- Recorded and maintained meta-analyses and analyses of systematic reviews of medical literature.
- Successfully built models to predict glucose levels based on the meal plan followed by the patient, using logistic regression.
- Created various types of data visualizations using Python and Tableau.
Environment: R, Python 2.7/3.2, Clustering, Regression Analysis, Singular Value Decomposition (SVD), Minitab, Oracle, and Tableau.
Confidential, Batavia, IL
Data Analyst
Responsibilities:
- Participated in data acquisition with the Data Engineering team to extract historical and real-time data.
- Provided statistical solutions for clients based on retail chains for Fast-Moving Consumer Goods (FMCG) and Consumer Packaged Goods (CPG).
- Conducted Exploratory Data Analysis using R and carried out visualizations with Tableau reporting.
- Analyzed high volume, high dimensional client and survey data from different sources using R.
- Designed and implemented cross-validation and continuous statistical tests.
- Performed data analysis and statistical analysis, and generated Tables, Listings and Graphs using R.
- Managed projects through their lifecycle: research, planning, development, and execution.
- Participated in data acquisition, modelling, processing, manipulation, visualization, and product development. Engineered and implemented machine learning algorithms.
- Performed K-means clustering, Regression and Decision Trees in R and also worked with Naïve Bayes, Random Forests, Linear and Logistic Regression, SVM, Clustering and Principal Component Analysis.
- Developed, reviewed, tested and documented work in R, and created templates from it for existing reports to reduce manual intervention.
- Created reports and dashboards to explain and communicate data insights, significant features, and model scores and performance of the new recommendation system to both technical and business teams.
- Applied machine learning algorithms to automate the portfolio collection and aggregation process, access to appropriate market information and utilization of different pricing methodologies to estimate fair value.
Analytical Implementation:
- Operations Analytics - Store Scorecard, Cluster-based analysis, Store and Resource productivity analysis, Growth-Trend analysis, Like-for-Like Store analysis.
- Merchandising and Inventory Analytics - Merchandise Plan Performance, Category scorecard, Category tactics including Assortment planning, OTB planning, Buying Plan, Allocation planning and Promotion planning.
- Customer Analytics - Demographic and purchase behaviour segmentation, Customer Churn-Acquisition-Retention, Market Basket Analysis, Loyalty-based analysis, RFM Scoring, Campaign Analysis, Customer Concentration Analysis and Customer Purchase Behaviour Analysis.
- Documentation - provide functional documents to support Sales and Marketing teams.
Environment: R, Minitab, Multi-Class Logistic Regression Classifier, Boosted Regression Tree, Random Forest, Association Rules, Support Vector Machine, Clustering Analysis, Collaborative Recommender System, Time-series Analysis, Tableau, Excel-Miner.
Confidential
QE Tester
Responsibilities:
- Planned and managed the appropriate testing effort for any given module.
- Composed accurate and detailed Test Approach, scenarios and test cases (functional, usability and regression).
- Conducted end-to-end test executions on the entire application.
- Defined and improved the processes used to test and deliver a good-quality product.
- Monitored test execution progress and managed issues during test execution.
- Identified software defects and interacted with developers to resolve them and provided full support to the Test team using the QC.
- Shared test reports (daily/weekly/monthly) with all stakeholders and management.
- Performed Integration Testing, Regression Testing and System Testing.
- Wrote test plans and test cases for functional, firmware and integration testing.
- Worked with the development and support teams to fix environmental issues encountered during test executions.
- Prepared and shared the daily status report on the test execution.
- Communicated defects using Quality Center (QC) with proper Severity and Priority.
- Executed System Integration Testing and User Acceptance Testing.
- Analyzed Test cases and Test Scenarios based on the Requirements.
- Created Test Matrix, Test Summary Report, UAT Summary Report and UAT Sign-off Report.
Environment: TestLink, Java, Selenium WebDriver, MS Office, Microsoft Excel, MS Word, Windows XP/7/8, Unix, Chrome, Firefox, Internet Explorer, PL/SQL, GitHub, Oracle 10g.