Data Scientist Resume NYC - Hire IT People

PROFESSIONAL SUMMARY:

Passionate Data Scientist with an advanced degree in Statistics and 6 years of industry experience in Data Science, Machine Learning and Natural Language Processing, with proven technical expertise in developing and implementing large-scale algorithms that significantly impact business revenues and guide businesses in making quality data-driven decisions.
Competent in end-to-end implementation of a data science project
Expertise in Statistical Analysis, Hypothesis Testing, Data Cleaning and Processing, Supervised and Unsupervised Machine Learning, Deep Learning and Natural Language Processing
Strong mathematical knowledge in Linear Algebra, Stochastic Theory, Game Theory, Markov Decision Process (MDP) and Non-Linear Dynamics
Experience in handling semi-structured, structured and sparse data
Proficient in balancing datasets by utilizing Resampling techniques such as SMOTE for Oversampling and Cluster Centroid Method for Undersampling
Experience in performing Normalization and Standardization for optimal performance in relational and dimensional databases
Adept at employing Feature Engineering and Feature Selection techniques for feature extraction
Hands-on experience in Dimensionality Reduction algorithms such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) and t-Distributed Stochastic Neighbor Embedding (t-SNE)
Expertise in Supervised Learning algorithms including, but not limited to, Regression, K Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Trees and Ensemble models: Bagging, Random Forests, AdaBoost, Gradient Boosting and XGBoost
Expertise in Unsupervised Machine Learning techniques including K-Means, Gaussian Mixture Models (GMM) and Hierarchical Clustering
Proficient in Deep Learning net families such as Multilayer Perceptrons, Artificial Neural Networks, and Recurrent Neural Networks with LSTM and GRU
Adept in hyperparameter tuning using Random Search, Grid Search and Bayesian Optimization
Experience employing TensorFlow and its high-level API Keras in Python for Deep Learning
Proficient in Natural Language methods for Sentiment Analysis using text representations such as Word2Vec, tf-idf and GloVe
Experience in Data Integration, Validation and Data Quality control for ETL processes and Data Warehousing using MS Visual Studio, SSAS, SSIS and SSRS
Adept in employing data visualization tools such as Tableau and Python libraries Matplotlib, Seaborn and Plotly to create visually appealing plots and interactive dashboards
Actively involved in all phases of the Data Science project life cycle including Data Extraction (ETL), Cleaning, Preprocessing, Visualizing, Modelling and Version Control (Git)
Experience with cluster computing frameworks such as the Hadoop ecosystem
Skilled at working in Windows and Linux platforms
Strong knowledge in all phases of the SDLC (Software Development Life Cycle) including analysis, design, development, testing, implementation and maintenance
Experience in Agile and SCRUM environments

TECHNICAL SKILLS:

Languages & Software: Python (numpy, scipy, pandas, matplotlib, seaborn, scikit-learn, tensorflow, NLTK, keras), R (dplyr, ggplot, caret, glmnet, e1071, nnet), MATLAB, WEKA, Minitab, Tableau

Statistical & Machine Learning Techniques: SVD, Standardization, Normalization, L1/L2 Regularization, Loss Minimization, Performance Measurement of Models, Featurization, Feature Engineering, Matrix Factorization, Model Calibration, A/B Testing, Randomization, Point and Interval Estimation, Confusion Matrix, ROC curve, Precision-Recall Curve, Graph Theory

Databases: SQL Server, MS-Access

Operating systems: Mac, Windows, Linux

PROFESSIONAL EXPERIENCE:

Confidential, NYC

Data Scientist

Responsibilities:

Worked alongside the data engineering and data science teams to build high-performance, low-latency systems to manage high-velocity data streams
Cleaned and processed the unstructured fraudulent wire extraction data via Tokenizing, Stemming and Part-of-Speech tagging to extract customer and bank information
Employed Regular Expressions and built Named Entity Recognition models using the Natural Language Toolkit (NLTK) and spaCy to pull out relevant customer information
Built a Conditional Random Fields (CRF) model in scikit-learn for pattern recognition and compared the model predictions against actual outputs
Employed Python’s visualization libraries to draw patterns from customer credit history and payment activities on historical data to analyze customer behavior and generate reports for the business team
Assigned probability scores to credit card applicants based on their feature attributes to help the client make better decisions regarding an applicant’s credibility
Built NLP pipelines and collaborated with the DevOps team to deploy code into production
Performed web scraping from Google Alerts links to pull out information for analysis
Utilized Random Forests and SVMs to classify whether the crime in question is financial in nature
Extracted fraudster information for financial crimes and wrote SQL queries to perform Teradata searches to determine whether the concerned person is a Confidential customer
Utilized results from Teradata searches and customer information from fraudulent wire transactions to generate automated narratives of potential threats to Amex
Conceptualized and implemented Artificial Neural Networks as well as LSTMs via dense Recurrent Neural Networks into the pipeline to process continuous data and gather information in sequence
Created distributed TensorFlow environments across multiple CPUs and GPUs to run in parallel

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, Linux, Git, Microsoft Excel, Hadoop, PCA, Logistic Regression, CRF, TensorFlow, Keras, Natural Language Toolkit, spaCy, Named Entity Recognition, Natural Language Generation

Confidential, Kansas City

Data Scientist

Responsibilities:

Performed Data Cleaning, Feature Scaling, Feature Engineering and Exploratory Data Analysis to maximize insight, detect outliers and extract important features for modelling
Implemented Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction algorithms to achieve reduced datasets
Implemented Clustering algorithms for market segmentation to analyze customer behavior
Trained several machine learning models on selected features to predict Customer churn
Utilized cross-validation techniques and LASSO regularization to avoid overfitting, then evaluated the models adopting performance metrics more robust to imbalanced classes
Tuned the model hyperparameters using Bayesian optimization and grid search to achieve higher levels of model performance
Improved model accuracy by 5% by introducing Ensemble techniques: Bagging, Gradient Boosting, Extreme Gradient Boosting (XGBoost) and Adaptive Boosting (AdaBoost)
Feature engineered email data by employing NLP techniques like Word2Vec, Bag of Words (BOW) and tf-idf
Performed Sentiment Analysis on email feedback to understand the emotional tone behind words
Utilized data visualization tools such as Tableau and Python’s vast data visualization libraries to communicate findings to the data science, marketing and engineering teams
Generated ad hoc reports in Tableau for the business teams to help the client make impactful data-driven decisions

Environment: Microsoft Excel, SQL, NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, PySpark, Random Forests, SVM, t-SNE, PCA, K-Means, TensorFlow, Keras, Natural Language Toolkit, Tableau, Git

Confidential

Decision Scientist

Responsibilities:

Wrote production-level code and built ETL pipelines in PostgreSQL and R to ingest large volumes of messy data
Regularly performed A/B testing for evaluating test vs control to tighten the production line
Created goal driver models for branches of a regional bank in the US to white-box the product level goal objectives provided by one of the leading goal setting vendors
Iterated over different variations of regression models to arrive at final linear mixed models
Assigned probability scores using logistic regression to candidates applying for a call center representative role at a leading insurance company, based on their application details, to help the client make decisions regarding employee survival and performance
Wrote code in R to scrape HTML and extract product data from e-commerce websites
Built a Shiny-based interactive scheduling tool in R to investigate workplace attrition, client file redeliveries and client file delays
Created documentation for muESP which enables operationalization of real time and batch analytics

Confidential

SQL Developer

Responsibilities:

Involved in the Software Development Life Cycle (SDLC) process, analyzed business requirements and functional work flow of information from source to destination
Developed Stored Procedures, functions and database triggers while maintaining referential integrity and implementing complex business logic
Created SSIS packages for transferring data from various data sources such as SAP R/3 systems, Oracle, MS Access, Excel, *.txt files and *.csv files
Wrote complex SQL Queries, Stored Procedures, Triggers, Views, Cursors, Joins, Constraints, DDL, DML and User Defined Functions to implement the business logic and created clustered and non-clustered indexes
Tuned Complex SQL Queries using complex T-SQL statements and implemented various types of Constraints and Triggers for Data Consistency
Performed tuning and optimization of long-running queries using SQL Profiler, Index Tuning Wizard and SQL Query Analyzer
Tested databases and fixed bugs
