Sr. Data Scientist Resume
New York, NY
SUMMARY:
- Over 6+ years of data analysis experience encompassing machine learning, data mining with large datasets of structured and unstructured data, data acquisition, data validation, predictive modeling, and data visualization.
- Hands-on experience with machine learning algorithms such as regression analysis, clustering, boosting, classification, and principal component analysis, as well as data visualization tools.
- Strong programming skills in a variety of languages such as Python, R, SAS, and SQL.
- Proficient in machine learning techniques (decision trees, linear/logistic regression, random forests, SVM, Bayesian methods, k-nearest neighbors).
- Statistical modeling in forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, and factor analysis/PCA.
- Experience designing visualizations using Tableau and ggplot2, building storylines on web and desktop platforms, and publishing and presenting dashboards.
- Hands-on experience implementing LDA and Naive Bayes; skilled in decision trees, random forests, linear and logistic regression, SVM, clustering, and neural networks, with good knowledge of recommender systems.
- Adept in statistical programming languages such as R and Python, including Big Data technologies like Hadoop and Hive.
- Experience developing SQL procedures on complex datasets for data cleaning and automating reports.
- Experience developing SAS macros for ad-hoc reporting in SAS Enterprise Guide using Query Builder and SQL.
- Knowledge of Teradata tools such as SQL Assistant and Microsoft SQL Server for accessing and manipulating data on ODBC-compliant database servers.
- Expertise in transforming business requirements into models, designing algorithms, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Good domain knowledge of retail, payment processing, supply chain, and healthcare.
- Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
PROFESSIONAL EXPERIENCE:
Confidential, New York
Sr. Data Scientist
Roles & Responsibilities:
- Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
- Built machine learning models to identify fraudulent loan pre-approval applications and fraudulent credit card transactions from customer transaction history using supervised learning methods.
- Extracted data from the database, copied it into HDFS, and used Hadoop tools such as Hive and Pig to retrieve the data required for building models.
- Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
- Tackled a highly imbalanced fraud dataset using sampling techniques such as down-sampling, up-sampling, and SMOTE (Synthetic Minority Over-sampling Technique) with Python Scikit-learn.
- Used PCA and other feature engineering techniques to reduce the high-dimensional data, along with feature normalization and label encoding, using the Scikit-learn library in Python.
- Used Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn in Python to develop various machine learning models such as logistic regression, gradient-boosted decision trees, and neural networks.
- Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.
- Experimented with ensemble methods, including various bagging and boosting techniques, to increase the accuracy of the trained model.
- Implemented a Python-based distributed random forest via PySpark and MLlib.
- Used AWS S3, DynamoDB, AWS Lambda, and AWS EC2 for data storage and model deployment.
- Created and maintained reports to display the status and performance of deployed model and algorithm with Tableau.
- In the preprocessing phase, used Pandas to handle missing data, cast data types, and merge or group tables for the EDA process.
- Used Scikit-learn preprocessing techniques, including PCA, feature engineering, feature normalization, and label encoding, to reduce the high-dimensional data (>150 features).
- In the data exploration stage, used correlation analysis and graphical techniques in Matplotlib and Seaborn to gain insights into the patient admission and discharge data.
- Experimented with predictive models, including logistic regression, support vector machines (SVC), and random forests from Scikit-learn, XGBoost, LightGBM, and neural networks in Keras, to predict show-up probability and visit counts.
- Designed and implemented cross-validation and statistical tests, including k-fold, stratified k-fold, and hold-out schemes, to test and verify the models' significance.
- Implemented, tuned and tested the model on AWS Lambda with the best performing algorithm and parameters.
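The up-/down-sampling idea described above can be sketched with `sklearn.utils.resample` (SMOTE itself lives in the separate `imbalanced-learn` package, not Scikit-learn); the toy fraud-like dataset below is an illustrative assumption, not data from the original project:

```python
import numpy as np
from sklearn.utils import resample

# Toy imbalanced labels: 90 negatives, 10 positives (illustrative data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# Split by class, then up-sample the minority class with replacement
# until it matches the majority class size.
X_min, X_maj = X[y == 1], X[y == 0]
X_min_up, y_min_up = resample(
    X_min, np.ones(len(X_min), dtype=int),
    replace=True, n_samples=len(X_maj), random_state=0,
)

# Recombine into a balanced training set.
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([np.zeros(len(X_maj), dtype=int), y_min_up])
```

Down-sampling is the mirror image: resample the majority class with `replace=False` down to the minority class size.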
Environment: Oracle 11g, Hadoop 2.x, HDFS, Hive, Pig Latin, Spark/PySpark/MLlib, Python 3.x (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), Jupyter Notebook, AWS, GitHub, Linux, machine learning algorithms, Tableau.
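The dimensionality-reduction and stratified cross-validation steps listed for this role could look roughly like the following Scikit-learn sketch; the synthetic dataset, component count, and model choice are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a high-dimensional dataset (>150 features).
X, y = make_classification(n_samples=300, n_features=160,
                           n_informative=20, random_state=0)

# Scale, project down to 20 principal components, then classify.
model = make_pipeline(StandardScaler(), PCA(n_components=20),
                      LogisticRegression(max_iter=1000))

# Stratified k-fold keeps the class ratio constant in every fold,
# which matters on imbalanced data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)
```

Putting the scaler and PCA inside the pipeline ensures they are re-fit on each training fold, avoiding leakage from the held-out fold.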
Confidential, Des Moines, Iowa
Sr. Data Scientist
Roles & Responsibilities:
- Led the full machine learning system implementation process: collecting data, model design, feature selection, system implementation, and evaluation.
- Worked with machine learning algorithms such as regressions (linear, logistic), SVMs, and decision trees.
- Developed a Machine Learning test-bed with different model learning and feature learning algorithms.
- Through a thorough, systematic search, demonstrated performance surpassing the state of the art (deep learning).
- Used text mining and NLP techniques to find the sentiment about the organization.
- Developed unsupervised machine learning models in the Hadoop/Hive environment on AWS EC2 instance.
- Used the K-Means clustering technique to identify outliers and to classify unlabeled data.
- Worked with data-sets of varying degrees of size and complexity including both structured and unstructured data.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
- Used the R programming language to graphically examine the datasets and gain insights into the nature of the data.
- Implemented predictive analytics and machine learning algorithms to forecast key metrics, delivered as dashboards on AWS (S3/EC2) and the Django platform for the company's core business.
- Performed data wrangling to clean, transform, and reshape the data using the NumPy and Pandas libraries.
- Contribute to data mining architectures, modeling standards, reporting, and data analysis methodologies.
- Conduct research and make recommendations on data mining products, services, protocols, and standards in support of procurement and development efforts.
- Involved in defining the Source to Target data mappings, Business rules, data definitions.
- Worked with different data science teams and provided respective data as required on an ad-hoc basis.
- Assisted both application engineering and data scientist teams in mutual agreements/provisions of data.
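One common way to use K-Means for outlier detection, as mentioned above, is to flag points that sit unusually far from their assigned cluster centroid. This is a minimal sketch on synthetic data; the cluster count and the percentile threshold are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Two tight clusters plus one obvious outlier (illustrative data).
X = np.vstack([rng.normal(0, 0.2, size=(50, 2)),
               rng.normal(5, 0.2, size=(50, 2)),
               [[20.0, 20.0]]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance from each point to the centroid of its assigned cluster.
dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Flag points beyond the 99th-percentile distance as outliers.
outliers = np.where(dists > np.percentile(dists, 99))[0]
```

The same `km.labels_` array doubles as a classification of the unlabeled points, which matches the dual use described in the bullet above.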
Environment: RStudio 3.5.1, AWS S3, NLP, EC2, neural networks, SVM, decision trees, MLbase, ad hoc, Mahout, NoSQL, PL/SQL, MDM, MLlib & Git.
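A sentiment-analysis pipeline of the kind mentioned for this role (text mining plus NLP) is often built as TF-IDF features feeding a linear classifier. The tiny corpus and labels below are purely illustrative; real work would use a labeled dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus; 1 = positive sentiment, 0 = negative.
texts = ["great service and friendly staff",
         "terrible experience, very slow",
         "loved the support team",
         "awful response time",
         "excellent product",
         "poor quality and bad support"]
labels = [1, 0, 1, 0, 1, 0]

# TF-IDF turns each document into a weighted term vector;
# logistic regression then learns a sentiment boundary.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
pred = model.predict(["friendly and excellent support"])
```

At this toy scale the prediction is not meaningful; the point is the pipeline shape: vectorizer and classifier fit together so the vocabulary is learned only from training text.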
Confidential, San Francisco, CA
Data Scientist
Roles & Responsibilities
- Responsible for retrieving data from the database using SQL/Hive queries and performing analysis enhancements.
- Used R, SAS and SQL to manipulate data, and develop and validate quantitative models.
- Worked as a RLC (Regulatory and Legal Compliance) Team Member and undertook user stories (tasks) with critical deadlines in Agile Environment.
- Applied regression to identify the probability of the agent's location with respect to the insurance policies sold.
- Used advanced Microsoft Excel functions such as pivot tables and VLOOKUP to analyze the data and prepare programs.
- Performed various statistical tests to give the client a clear understanding of the data.
- Actively involved in Analysis, Development and Unit testing of the data and delivery assurance of the user story in Agile Environment.
- Cleaned data by analyzing and eliminating duplicate and inaccurate entries using R.
- Experienced in retrieving unstructured data from different sites in formats such as HTML and XML.
- Worked with data frames and other data interfaces in R for retrieving and storing the data.
- Responsible for ensuring that the data is accurate, with no outliers.
- Applied various machine learning algorithms such as Decision Trees, K-Means, Random Forests and Regression in R with the required packages installed.
- Applied K-Means algorithm in determining the position of an Agent based on the data collected.
- Read data from various file formats, including HTML, CSV, and sas7bdat, using SAS/Python.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Coded, tested, debugged, implemented and documented data using R.
- Researched multi-layer classification algorithms and built natural language processing models through ensembling.
- Worked with Quality Control Teams to develop Test Plan and Test Cases.
- Worked closely with data scientists to assist on feature engineering, model training frameworks, and model deployments implementing documentation discipline.
- Involved with Data Analysis primarily Identifying Data Sets, Source Data, Source Meta Data, Data Definitions and Data Formats.
- Worked with the ETL team to document the Transformation Rules for Data Migration from OLTP to Warehouse Environment for reporting purposes.
- Performed data testing, tested ETL mappings (Transformation logic), tested stored procedures, and tested the XML messages.
- Created Use cases, activity report, logical components to extract business process flows and workflows involved in the project using Rational Rose, UML and Microsoft Visio.
- Involved in development and implementation of SSIS, SSRS and SSAS application solutions for various business units across the organization.
- Developed mappings to load Fact and Dimension tables, SCD Type 1 and SCD Type 2 dimensions and Incremental loading and unit tested the mappings.
- Wrote test cases, developed Test scripts using SQL and PL/SQL for UAT.
- Created and modified T-SQL queries per business requirements, and worked on creating role-playing dimensions, factless fact tables, and snowflake and star schemas.
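The duplicate-elimination and outlier checks described above (done in R in this role) can be sketched in Python with Pandas; the dataset, the column names, and the 1.5 × IQR rule are illustrative assumptions:

```python
import pandas as pd

# Illustrative dataset with one exact-duplicate row and one extreme value.
df = pd.DataFrame({
    "agent_id": [1, 2, 2, 3, 4, 5],
    "premium":  [100.0, 110.0, 110.0, 95.0, 105.0, 10_000.0],
})

# Drop exact duplicate rows.
df = df.drop_duplicates()

# Flag outliers on the premium column with the 1.5 * IQR rule.
q1, q3 = df["premium"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["premium"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[mask]
```

In practice the IQR multiplier (and whether to drop or cap outliers) is a judgment call that depends on the downstream model.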
Environment: R 3.5, Decision Trees, K-Means, Random Forests, Microsoft Excel, Agile, SAS, SQL, NLP
Confidential
Data Scientist
Roles & Responsibilities:
- Responsible for data identification, collection, exploration, and cleaning for modeling; participated in biological model development.
- Performed data analysis using industry leading text mining, data mining, and analytical tools and open source software.
- Used Jira for defect tracking and project management.
- Worked on writing data to, as well as reading data from, CSV and Excel file formats.
- Visualized, interpreted, and reported findings, and developed strategic uses of data with R libraries such as ggplot2 and resources such as The Cancer Genome Atlas (TCGA) Data Portal, ClinVar, and ENCODE.
- Responsible for loading, extracting and validation of client data.
- Created statistical analyses using distributed and standalone models to build various diagnostic, predictive, and prescriptive solutions.
- Performed missing-value treatment, outlier detection, and anomaly treatment using statistical methods, deriving customized key metrics with R packages.
- Evaluated models using Cross Validation, Log loss function, ROC curves and used AUC for feature selection.
- Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib library.
- Experienced in parsing JSON within R and turning R data frames into JSON using MongoDB.
- Experienced in using the robmixglm v1.0-2 package to implement robust generalized linear models (GLMs) using a mixture method.
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and spacetime.
- Strong data visualization skills with ggplot2, Shiny, and Plotly, creating charts such as heat maps, bar charts, and line charts.
- Responsible for creating/revising and implementing standard operating procedures (SOPs), laboratory records, and other related documentation.
- Analyzed the ClinVar data for the NonHotspot rule proposal.
- Performed UAT testing for patient variant files.
- Experienced in handling complex database objects like Stored Procedures, Functions, Packages and Triggers using SQL and PL/SQL.
- Worked on production issues and resolving production tickets.
- Involved in the integration of multiple layers in the application.
- Knowledge of generating Hibernate mapping files and Java classes, including creating reverse-engineering files and generating Hibernate mapping files and POJOs from a database.
- Basic knowledge of creating an XML configuration file for Hibernate database connectivity.
- Responsible for reviewing PIK3CA novel aMOI reports and the sub-protocol document for Arm Z1F; summarized the evidence used for the sub-protocol variants and compared it to that used for the novel aMOIs.
- Performed QA testing on the application.
- Held meetings with the client and delivered the entire project with limited help from the client.
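The model-evaluation step listed earlier for this role (log loss, ROC curves, and AUC) might look like the following Scikit-learn sketch; the synthetic dataset and the logistic-regression model are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for real patient data.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

auc = roc_auc_score(y_te, proba)  # ranking quality: 0.5 = random, 1.0 = perfect
ll = log_loss(y_te, proba)        # penalizes confident wrong predictions
```

Both metrics consume predicted probabilities rather than hard labels, which is why `predict_proba` is used instead of `predict`.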
Environment: Java 1.8, Core Java, Eclipse, Apache Tomcat 5.0, JSP, XML, JIRA, RDBMS, SQL, JSON, JavaScript, HTML5, CSS3, Git, PL/SQL, GRID, Linux.