
Sr. Data Scientist Resume


New York, NY

SUMMARY:

  • Over 6+ years of data analysis experience encompassing machine learning, data mining with large datasets of structured and unstructured data, data acquisition, data validation, predictive modeling, and data visualization.
  • Hands-on experience with machine learning algorithms such as regression analysis, clustering, boosting, classification, and principal component analysis, as well as data visualization tools.
  • Strong programming skills in a variety of languages such as Python, R, SAS, and SQL.
  • Proficient in machine learning techniques (decision trees, linear/logistic regression, random forests, SVM, Bayesian methods, k-nearest neighbors).
  • Statistical modeling in forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, and factor analysis/PCA.
  • Experience designing visualizations using Tableau and ggplot2 and storylines on web and desktop platforms, publishing and presenting dashboards.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in decision trees, random forests, linear and logistic regression, SVM, clustering, and neural networks, with good knowledge of recommender systems.
  • Adept in statistical programming languages such as R and Python, including Big Data technologies like Hadoop and Hive.
  • Experience developing SQL procedures on complex datasets for data cleaning and automating reports.
  • Experience developing SAS macros for ad-hoc reporting in SAS Enterprise Guide using Query Builder and SQL.
  • Knowledge of Teradata tools such as SQL Assistant and Microsoft SQL Server for accessing and manipulating data on ODBC-compliant database servers.
  • Expertise in transforming business requirements into models, algorithms, data mining, and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Good domain knowledge of retail, payment processing, supply chain, and healthcare.
  • Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
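The PCA-based dimensionality reduction mentioned above can be sketched as follows; this is a minimal illustration on a built-in scikit-learn dataset, not code from any of the projects below:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Small built-in dataset standing in for real project data
X, _ = load_iris(return_X_y=True)

# Standardize features first, since PCA is sensitive to scale
X_scaled = StandardScaler().fit_transform(X)

# Project the data onto the top two principal components
pca = PCA(n_components=2).fit(X_scaled)
X_2d = pca.transform(X_scaled)
```

On this dataset the first two components retain well over 90% of the variance, which is the usual criterion for choosing how many components to keep.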

PROFESSIONAL EXPERIENCE:

Confidential, New York

Sr. Data Scientist

Roles & Responsibilities:

  • Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
  • Built machine learning models to identify fraudulent applications for loan pre-approvals and to identify fraudulent credit card transactions using the history of customer transactions with supervised learning methods.
  • Extracted data from the database, copied it into the HDFS file system, and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Tackled a highly imbalanced fraud dataset using sampling techniques such as down-sampling, up-sampling, and SMOTE (Synthetic Minority Over-sampling Technique) with Scikit-learn in Python.
  • Used PCA and other feature engineering techniques to reduce the high-dimensional data, along with feature normalization and label encoding, using the Scikit-learn library in Python.
  • Used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn in Python for developing various machine learning models such as Logistic regression, Gradient Boost Decision Tree and Neural Network.
  • Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.
  • Experimented with Ensemble methods to increase the accuracy of the training model with different Bagging and Boosting methods.
  • Implemented a Python-based distributed random forest via PySpark and MLlib.
  • Used AWS S3, DynamoDB, AWS Lambda, and AWS EC2 for data storage and model deployment.
  • Created and maintained reports to display the status and performance of deployed model and algorithm with Tableau.
  • In the preprocessing phase, used Pandas to handle missing data, cast data types, and merge or group tables for the EDA process.
  • Used PCA and other feature engineering, feature normalization and label encoding Scikit-learn preprocessing techniques to reduce the high dimensional data (>150 features).
  • In data exploration stage used correlation analysis and graphical techniques in Matplotlib and Seaborn to get some insights about the patient admission and discharge data.
  • Experimented with predictive models including Logistic Regression, Support Vector Machine (SVC), and Random Forest provided by Scikit-learn, as well as XGBoost, LightGBM, and neural networks in Keras, to predict show-up probability and visit counts.
  • Designed and implemented Cross-validation and statistical tests including k-fold, stratified k-fold, hold-out scheme to test and verify the models' significance.
  • Implemented, tuned and tested the model on AWS Lambda with the best performing algorithm and parameters.
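The imbalanced-data handling and stratified cross-validation described above can be sketched as follows; this is an illustrative example on synthetic data (down-sampling shown; SMOTE would substitute an oversampler), not code from the project:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.utils import resample

# Synthetic imbalanced dataset standing in for the fraud data (~5% positives)
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)

# Down-sample the majority class to match the minority class size
maj, mino = X[y == 0], X[y == 1]
maj_down = resample(maj, n_samples=len(mino), replace=False, random_state=0)
X_bal = np.vstack([maj_down, mino])
y_bal = np.array([0] * len(maj_down) + [1] * len(mino))

# Stratified k-fold cross-validation guards against overfitting
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X_bal, y_bal, cv=cv)
```

Stratification keeps the class ratio constant in every fold, which matters once the classes have been rebalanced.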

Environment: Oracle 11g, Hadoop 2.x, HDFS, Hive, Pig Latin, Spark/PySpark/MLlib, Python 3.x (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), Jupyter Notebook, AWS, GitHub, Linux, machine learning algorithms, Tableau.

Confidential, Des Moines, Iowa

Sr. Data Scientist

Roles & Responsibilities:

  • Led the full machine learning system implementation process: collecting data, model design, feature selection, system implementation, and evaluation.
  • Worked with Machine learning algorithms like Regressions (linear, logistic), SVMs and Decision trees.
  • Developed a Machine Learning test-bed with different model learning and feature learning algorithms.
  • Through thorough systematic search, demonstrated performance surpassing the state of the art (deep learning).
  • Used text mining and NLP techniques to find the sentiment about the organization.
  • Developed unsupervised machine learning models in the Hadoop/Hive environment on AWS EC2 instance.
  • Used clustering technique K-Means to identify outliers and to classify unlabeled data.
  • Worked with data-sets of varying degrees of size and complexity including both structured and unstructured data.
  • Participated in all phases of Data mining, Data cleaning, Data collection, developing models, Validation, Visualization and Performed Gap analysis.
  • Used R programming language for graphically critiquing the datasets and to gain insights to interpret the nature of the data.
  • Implemented Predictive analytics and machine learning algorithms to forecast key metrics in the form of designed dashboards on to AWS (S3/EC2) and Django platform for the company's core business.
  • Performed data wrangling to clean, transform, and reshape the data using the NumPy and Pandas libraries.
  • Contributed to data mining architectures, modeling standards, reporting, and data analysis methodologies.
  • Conducted research and made recommendations on data mining products, services, protocols, and standards in support of procurement and development efforts.
  • Involved in defining the Source to Target data mappings, Business rules, data definitions.
  • Worked with different data science teams and provided the respective data as required on an ad-hoc request basis.
  • Assisted both application engineering and data scientist teams in mutual agreements/provisions of data.
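The K-Means-based outlier identification described above can be sketched as follows; this is an illustrative example on synthetic data (points far from their assigned centroid are flagged), not code from the project:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated clusters plus two injected extreme points
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=1.0, random_state=1)
X = np.vstack([X, [[30.0, 30.0], [-30.0, -30.0]]])

km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)

# Distance of each point to its assigned cluster centroid
dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Points far beyond the typical distance are treated as outliers
threshold = dists.mean() + 3 * dists.std()
outliers = np.where(dists > threshold)[0]
```

The same fitted model classifies unlabeled points via `km.predict`, which is the second use of K-Means the bullet mentions.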

Environment: RStudio 3.5.1, AWS S3, NLP, EC2, neural networks, SVM, decision trees, MLbase, ad-hoc, Mahout, NoSQL, PL/SQL, MDM, MLlib & Git.

Confidential, San Francisco, CA

Data Scientist

Roles & Responsibilities

  • Responsible for retrieving data from the database using SQL/Hive queries and performing analysis enhancements.
  • Used R, SAS and SQL to manipulate data, and develop and validate quantitative models.
  • Worked as a RLC (Regulatory and Legal Compliance) Team Member and undertook user stories (tasks) with critical deadlines in Agile Environment.
  • Applied regression to identify the probability of an agent's location with respect to the insurance policies sold.
  • Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP to analyze the data and prepare programs.
  • Performed various statistical tests to give the client a clear understanding of the results.
  • Actively involved in Analysis, Development and Unit testing of the data and delivery assurance of the user story in Agile Environment.
  • Cleaned data by analyzing and eliminating duplicate and inaccurate data using R.
  • Experience retrieving unstructured data from different sites in formats such as HTML and XML.
  • Worked with Data frames and other data interfaces in R for retrieving and storing the data.
  • Responsible for making sure that the data is accurate with no outliers.
  • Applied various machine learning algorithms such as Decision Trees, K-Means, Random Forests and Regression in R with the required packages installed.
  • Applied K-Means algorithm in determining the position of an Agent based on the data collected.
  • Read data from various files, including HTML, CSV, and sas7bdat files, using SAS/Python.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Coded, tested, debugged, implemented and documented data using R.
  • Researched multi-layer classification algorithms and built a Natural Language Processing model through ensembling.
  • Worked with Quality Control Teams to develop Test Plan and Test Cases.
  • Worked closely with data scientists to assist on feature engineering, model training frameworks, and model deployments implementing documentation discipline.
  • Involved with Data Analysis primarily Identifying Data Sets, Source Data, Source Meta Data, Data Definitions and Data Formats.
  • Worked with the ETL team to document the Transformation Rules for Data Migration from OLTP to Warehouse Environment for reporting purposes.
  • Performed data testing, tested ETL mappings (Transformation logic), tested stored procedures, and tested the XML messages.
  • Created Use cases, activity report, logical components to extract business process flows and workflows involved in the project using Rational Rose, UML and Microsoft Visio.
  • Involved in development and implementation of SSIS, SSRS and SSAS application solutions for various business units across the organization.
  • Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions and Incremental loading and unit tested the mappings.
  • Wrote test cases, developed Test scripts using SQL and PL/SQL for UAT.
  • Created or modified T-SQL queries per business requirements and worked on creating role-playing dimensions, factless fact tables, and snowflake and star schemas.
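The duplicate-elimination and outlier screening described above can be sketched as follows; the project used R, but the same workflow is shown here in Python with pandas for consistency with the other examples, on hypothetical toy records (column names are illustrative, not from the project):

```python
import pandas as pd

# Toy records standing in for the agent/policy data (names are hypothetical)
df = pd.DataFrame({
    "agent_id": [1, 1, 2, 3, 3, 4],
    "policies_sold": [10, 10, 12, 8, 8, 400],  # 400 is an implausible value
})

# Drop exact duplicate rows
df = df.drop_duplicates()

# Screen outliers with the interquartile-range rule
q1, q3 = df["policies_sold"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["policies_sold"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[mask]
```

The IQR rule is one common choice; domain-specific thresholds or z-scores work the same way through a boolean mask.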

Environment: R 3.5, Decision Trees, K-Means, Random Forests, Microsoft Excel, Agile, SAS, SQL, NLP

Confidential

Data Scientist

Roles & Responsibilities:

  • Responsible for data identification, collection, exploration, and cleaning for modeling, participate in biological model development.
  • Performed data analysis using industry leading text mining, data mining, and analytical tools and open source software.
  • Used Jira for defect tracking and project management.
  • Worked on writing data to, as well as reading data from, CSV and Excel file formats.
  • Visualized, interpreted, and reported findings and developed strategic uses of data with R libraries like ggplot2 and resources such as The Cancer Genome Atlas (TCGA) Data Portal, ClinVar, and ENCODE.
  • Responsible for loading, extracting and validation of client data.
  • Created statistical analyses using distributed and standalone models to build various diagnostic, predictive, and prescriptive solutions.
  • Performed missing-value treatment, outlier detection, and anomaly treatment using statistical methods, deriving customized key metrics with R packages.
  • Evaluated models using Cross Validation, Log loss function, ROC curves and used AUC for feature selection.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib libraries.
  • Experienced in parsing JSON within R and turning R data frames into JSON using MongoDB.
  • Experienced in using the robmixglm v1.0-2 package to implement robust generalized linear models (GLMs) using a mixture method.
  • Worked with several R packages, including knitr, dplyr, SparkR, CausalInfer, and spacetime.
  • Strong data visualization skills with ggplot2, Shiny, and Plotly, creating charts such as heat maps, bar charts, and line charts.
  • Responsible for creating/revising and implementing standard operating procedures (SOPs), laboratory records, and other related documentation.
  • Analyzed ClinVar data to propose the NonHotspot rule proposal.
  • Performed UAT testing for patient variant files.
  • Experienced in handling complex database objects like Stored Procedures, Functions, Packages and Triggers using SQL and PL/SQL.
  • Worked on production issues and resolving production tickets.
  • Involved in the integration of multiple layers in the application.
  • Knowledge of generating Hibernate mapping files and Java classes, creating the reverse engineering file, and creating Hibernate mapping files and POJOs from a database.
  • Basic knowledge of creating an XML configuration file for Hibernate database connectivity.
  • Responsible for reviewing PIK3CA novel aMOI reports and the sub-protocol document for ARM Z1F; summarized the evidence used for the sub-protocol variants and compared it to that used for the novel aMOIs.
  • Performed QA testing on the application.
  • Held meetings with the client and delivered the entire project with limited help from the client.
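The model-evaluation workflow above (log loss, ROC curves, AUC) can be sketched as follows; this is an illustrative example on synthetic data, not code from the project:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

# Log loss penalizes confident wrong predictions; AUC summarizes the ROC curve
ll = log_loss(y_te, proba)
auc = roc_auc_score(y_te, proba)
```

Both metrics use predicted probabilities rather than hard labels, which is what makes them suitable for comparing candidate models before thresholding.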

Environment: Java 1.8, Core Java, Eclipse, Tomcat, Apache Tomcat 5.0, JSP, XML, JIRA, RDBMS, SQL, JSON, JavaScript, HTML5, CSS3, GIT, PL/SQL, GRID, Linux.
