Lead Data Scientist Resume
FL
SUMMARY
- Data Scientist with 9 years of experience in Machine Learning, Data Mining on large structured and unstructured data sets, Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, Web Crawling, and Web Scraping, using programming languages such as R and Python and Big Data technologies such as Hadoop and Spark.
- Deep understanding of Statistical Analysis & Modeling, Algorithms and Multivariate Analysis. Familiar with model selection, testing, comparison and validation.
- Experienced in building models with supervised, unsupervised, and reinforcement Machine Learning (ML) techniques.
- Worked on large sets of structured, semi-structured, and unstructured data.
- Statistical methods: Descriptive statistics, Distance measures, Hypothesis testing, Chi-Square, ANOVA.
- Experienced in Linear/Logistic Regression, Random Forest, Decision Trees, CART, Naive Bayes, Association Mining, K-Means, Hierarchical Clustering, and Gradient Boosting.
- Experienced in implementing dimensionality reduction techniques such as Factor Analysis and Principal Component Analysis (PCA).
- Built models using Support Vector Machines (SVM) and Artificial Neural Networks (ANN).
- Expertise in deep neural network topologies such as convolutional nets and recurrent nets.
- Good experience with deep learning frameworks like H2O and TensorFlow.
- Experience using Deep Learning to solve problems in Image or Video analysis.
- Good understanding of Apache Spark features and its advantages over MapReduce and traditional systems.
- Strong hands-on experience with Spark Core, Spark SQL, and Spark machine learning using R and Python.
- Working knowledge of current techniques and approaches in natural language processing (NLP).
- Developed time series analyses that guide business stakeholders in determining key strategies.
- Capable of solving problems and producing flexible solutions using analytical and creative skills.
- Implemented forward selection, backward elimination, and stepwise approaches to select the most significant independent variables.
- Evaluated model performance using RMSE, confusion matrices, ROC curves, cross-validation, and A/B testing in both simulated and real-world environments.
- Experienced in improving model accuracy using Boosting and Bagging techniques.
- Performed Exploratory Data Analysis and visualized data using R, Python, and Hadoop.
- Applied clustering algorithms to segment clients using social media data.
- Participate in daily agile meetings and weekly and monthly staff meetings, and collaborate with various teams to develop and support ongoing analyses.
- Conducted data accuracy analyses and supported stakeholders in decision-making.
- Sound knowledge of RDBMS concepts; worked extensively with DB2, SQL Server, and MySQL.
- Developed interactive dashboards and created various ad hoc reports for users by connecting various data sources, using Tableau, Plotly, ggplot2/Shiny in R, and matplotlib/seaborn/Bokeh in Python.
TECHNICAL SKILLS
Languages: R, SQL, Python, Shell scripting
IDE: RStudio, Jupyter, Atom
Databases: SQL Server, MS Access, MySQL, MongoDB (NoSQL)
Big Data Ecosystems: Hadoop, HDFS, Hive, Pig, Spark MLlib, ETL
Operating Systems: Windows XP/7/8/10, Unix, Linux
Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, wordcloud, kernlab, neuralnet, twitteR, NLP, reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, rpy2, TensorFlow, H2O
Data Analytics Tools: R console, Python (numpy, pandas, scikit-learn, scipy), SPSS
BI and Visualization: Tableau
Version Control: Git
PROFESSIONAL EXPERIENCE
Lead Data Scientist
Confidential, FL
Responsibilities:
- Applying statistical modeling and computational algorithms from supervised and unsupervised machine learning to address a specific societal need in a commercial setting, caring for the aging, using leading-edge hardware and software systems.
- Creating algorithms to clean and process ‘Tempo device’ data and make it readily available for future analysis and for building prediction models.
- Performing data mining to detect outliers, errors, and misclassified data, and preparing data for dashboard visualization for customers around the globe.
- Predominantly using R, Python, AWS (Amazon Web Services), MySQL, and NoSQL (MongoDB) databases to meet end requirements and build a scalable real-time system.
- Using Highcharts and Dygraphs with R for visualization, along with H2O (deep learning), TensorFlow (deep learning), Hadoop, parallelization, regularization methods, time series analysis, regression techniques, K-Means, and other supervised and unsupervised methods to build scalable machine learning algorithms that support clients in the early detection of symptoms among the elderly.
- Responsible for data pattern recognition and data cleaning: identifying missing values, invalid values, and outliers, and analyzing and categorizing dataset variables.
- Actively involved in designing and developing data ingestion, aggregation, integration and advanced analytics in Hadoop.
- Worked on large sets of unstructured and structured data.
- Involved in analyzing large data sets to develop multiple custom models and algorithms to drive innovative business solutions.
- Performed extensive data quality analyses to build and evaluate data processes, and recommended process fixes to accurately and effectively generate and measure data for analytics and reporting.
Director, Data Science and Machine Learning
Confidential, NY
Responsibilities:
- Collaborated with the development team to create a working model predicting the revenue of films, documentaries, and short films, to kick-start the company’s vision of succeeding in the advertising, marketing, newspaper, and media industries.
- Predominantly used Python, MySQL, and AWS to create and maintain advanced machine learning algorithms on a production-level system.
- Prepared pitch decks, reports, proof of concepts and visualizations for investment meetings.
- Applied various machine learning algorithms, including decision trees, regression models, neural networks, SVM, and clustering, using the scikit-learn package in Python to identify budget, revenue, and reach profiles.
- Used K-Means clustering to identify outliers and to group unlabeled data.
- Evaluated models using Cross Validation, Log loss function, ROC curves and used AUC for feature selection.
- Ensured that the model had a low false positive rate.
- Addressed overfitting by implementing regularization methods such as L1 and L2.
- Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
- Designed and created reports that used gathered metrics to infer and draw logical conclusions about past and future behavior.
- Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
Data Science Trainer
Confidential
Responsibilities:
- Trainer for online "Data Scientist Certification Training (R, SAS & Excel)."
- Provided live training to batches of trainees, created presentations, and conducted assessments and assignments/projects.
Data Scientist
Confidential, PA
Responsibilities:
- Automated data collection from several web sources for the quality council department and applied natural language processing (NLP) machine learning to quality observations.
- Worked with sensitive data on prescriptions, risk factors, related side effects, and profiles of drugs distributed by Confidential globally, and worked on responses from the FDA and other governing agencies.
- Independently managed the project for data collection from several web sources. Automated manual procedures and applied machine learning to natural language text to find patterns and insights, using regular expressions, TF-IDF, and word clouds in R and Python.
Research Assistant
Confidential, WV
Responsibilities:
- Assisted in formulating sampling methods for collecting data on laboratory animals’ behavior and determined the overall effect of light and sound exposure on their normal activities.
- Measured normal healthy conditions and calibrated them as baseline standards for preparing new facility rooms.
- Compared data, performed data analysis, set new levels of hearing tolerance in laboratory animals, and assisted throughout the research process using Excel, R (linear regression), and Raven for sound/light frequency analysis.
Research Analyst-Predictive Modeler
Confidential
Responsibilities:
- Studied consumer preferences and buying habits for tobacco consumption.
- Collected, mined and refined data from market surveys and produced reports for research and extensive data analysis.
- Developed and coordinated analytics and quantitative modeling for the research team. Prepared and compared multiple models to choose the final model, using Excel, R for neural networks, and Python for scraping and data manipulation.
- Used R for exploratory data analysis, A/B testing, ANOVA, and hypothesis testing to compare and identify the effectiveness of creative email campaigns.
- Created clusters to define control and test groups and conducted group campaigns.
- Performed market basket analysis and identified business rules that boosted sales revenue by 12%.
- Undertook extensive pro forma analysis, DCF valuation, and cost-benefit analysis to generate and optimize project and product pricing.
- Performed extensive market basket analysis using Machine Learning algorithms to evaluate and optimize existing models which generated 8-10% incremental sales.
- Created various types of data visualizations using R and Tableau.
- Promoted RDL reports to SQL Server Reporting Services (SSRS).
- Used R and SQL to create machine learning models involving multivariate regression, linear regression, logistic regression, PCA, random forests, matrix factorization, and Bayesian collaborative models to target users with mobile offer campaigns and native ads.
Business Data Analyst
Confidential
Responsibilities:
- Coordinated with end users on designing and implementing analytics solutions per project proposals, and formulated procedures for integrating R models and algorithms with database sources and delivery systems through Python, SPSS, Excel, and other traditional and advanced tools.
- Market research: conducted both primary and secondary research. Applied technical and business market analysis and wrote RFPs/RFIs and various proposals.
- Acted as a liaison between the client and technical teams as well as non-technical teams.
- Planned and monitored the project and carried out requirement management and communication.
- Effectively communicated with stakeholders to gather requirements for different projects.
- Used the MySQLdb package and MySQL Connector/Python to write and execute MySQL database queries from Python.
- Created functions, triggers, views, and stored procedures in MySQL.
- Worked closely with back-end developers to find ways to push the limits of existing web technology.
- Involved in the code review meetings.
- Identified inconsistencies in data collected from different sources.
- Worked with business owners/stakeholders to assess risk impact and provided solutions to business owners.
- Determined trends and significant data relationships by analyzing data with advanced statistical methods in Python.