Data Scientist Resume

Bronx, NY


  • Highly efficient, team-oriented Data Scientist with 6 years of experience in Data Modeling, Statistical Modeling, Data Mining, Machine Learning, and Data Visualization.
  • Rich domain knowledge and experience in Healthcare, Banking, and Retail industries.
  • Involved in the entire data science project life cycle, including data collection, data cleaning, data manipulation, visualization, data analysis, model building, testing, optimization, and deployment.
  • Expertise in transforming business resources and tasks into regularized data and analytical models, designing algorithms, developing data mining and reporting solutions across a massive volume of structured and unstructured data.
  • Proficient in Machine Learning algorithms and Predictive Modeling including Regression, Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, AdaBoost, Gradient Boosting, Stacking, KNN, Neural Network, K-means Clustering, Elastic Net regularization, and Lasso regularization.
  • Proficient in Statistical Methodologies including Hypothesis Testing, Time Series, Principal Component Analysis (PCA), ANOVA, Factor Analysis, Cluster Analysis, and Sentiment Analysis.
  • Expertise in developing time series forecasting models such as Exponential Smoothing model, Seasonal Exponential Smoothing model, Holt-Winters model, and ARIMA model.
  • Worked in large-scale data environments such as Hadoop and MapReduce, with working knowledge of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
  • Familiar with the Hadoop ecosystem and Apache Spark framework, including HDFS, MapReduce, HiveQL, Spark SQL, and PySpark.
  • Proficient in data visualization tools such as Tableau, Python Matplotlib/Seaborn, and R ggplot2/Shiny to create visually powerful and actionable interactive reports and dashboards.
  • Experienced in Cloud Services such as Amazon Web Services (AWS) EC2, EMR, RDS, and S3 to assist with big data tools, solve storage issues, and work on deployment solutions.
  • Experienced in RDBMS such as SQL Server and non-relational databases such as MongoDB 3.x.
  • Adept in developing and debugging Stored Procedures, User-defined Functions (UDFs), Triggers, Indexes, Constraints, Transactions, and Queries using Transact-SQL (T-SQL).
  • Excellent understanding of Systems Development Life Cycle (SDLC) methodologies such as Agile and Scrum.
  • Strong business sense and ability to communicate data insights to both technical and non-technical clients.


Databases: Microsoft SQL Server 2012/2016, MySQL, Oracle 11g/12c, MongoDB 3.x

Programming Languages: Python 2.x/3.x, SAS, R (RStudio), SQL, T-SQL

Machine Learning: Regression, Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, AdaBoost, Gradient Boosting, Stacking, KNN, Neural Network, K-means Clustering, Natural Language Processing (NLP), Market Basket Analysis, Elastic Net regularization, Lasso regularization

Hadoop Ecosystem: Hadoop 2.x, HDFS, MapReduce, Spark 2.x, Pig, Hive, Sqoop

BI Tools: Tableau 9.x/10.x, Microsoft Suite (SSIS/SSRS), SAS Enterprise Miner, Power BI

Cloud Services: Amazon Web Services (AWS) EC2/S3/Redshift, Microsoft Azure Machine Learning Studio

Project Management Tools: JIRA, GitHub, Slack, Google Shared Services

Operating Systems and Others: Microsoft Windows, Linux/Unix, Microsoft Office Suite (Word, PowerPoint, Excel)


Confidential, Bronx, NY

Data Scientist


  • Developed and updated SQL queries, stored procedures, clustered and non-clustered indexes, and functions that meet business requirements using SQL Server 2014.
  • Used SSIS to create ETL packages to Validate, Extract, Transform and Load data into Data Warehouse and Data Mart.
  • Participated in feature engineering, including generating feature intersections, feature normalization, and label encoding with Scikit-learn preprocessing.
  • Matched customers and products by using seasonal exponential smoothing methodology to predict the price of new products.
  • Implemented an ensemble of Ridge Regression, Lasso Regression, and XGBoost to predict potential sales.
  • Built classification models using Random Forest, Support Vector Machine and Stacking methods to find potential customers.
  • Implemented Word2Vec and TextBlob for Natural Language Processing (NLP) analysis of reviews to match new and best-selling products to different subscription box customers.
  • Developed Sentiment Analysis and Speech Analytics to improve the subscription box system.
  • Used various metrics (RMSE, MAE, F-Score, ROC, and AUC) to evaluate the performance of each model.
  • Used Spark (PySpark, Spark SQL, MLlib) to conduct real-time analysis of transactions on AWS Linux/Unix.
  • Designed rich data visualizations to model data into human-readable form with Tableau, Shiny App and Matplotlib.

Environment: MS SQL Server 2014, Spark (PySpark, MLlib, Spark SQL), Python (Scikit-Learn/Numpy/Pandas/Matplotlib/Seaborn), Tableau (Desktop 9.x/Server 9.x), AWS Redshift, Hadoop, S3, HDFS, JIRA, Linux/Unix

Confidential, New York, NY

Data Scientist


  • Gathered, analyzed, documented, and translated application requirements into data models, and supported standardization of documentation and the adoption of standards and practices related to data and applications.
  • Improved Anti-Money Laundering prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.
  • Developed Naive Bayes, KNN, Logistic Regression, Random Forest, and SVM models to detect rare events and suspicious activities.
  • Used F-Score, AUC/ROC, Confusion Matrix, MAE, and RMSE to evaluate the performance of different models.
  • Tackled a highly imbalanced fraud dataset using undersampling, oversampling with SMOTE, and cost-sensitive algorithms with Python Scikit-learn.
  • Explored optimized sampling methodologies for different types of datasets.
  • Explored and analyzed features of suspicious transactions using Spark SQL.
  • Participated in data acquisition with the Data Engineering team to extract historical and real-time data using Sqoop, Pig, Flume, Hive, MapReduce, and HDFS.
  • Deployed machine learning models on cloud services including AWS Lambda and EC2 with Linux/Unix.
  • Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server.

Environment: ETL, SSIS, Alteryx, Tableau (Desktop 9.x/Server 9.x), Python 3.x (Scikit-Learn/Scipy/Numpy/Pandas), AWS Redshift, Spark (PySpark, MLlib, Spark SQL), Hadoop, MapReduce, HDFS, SharePoint, SQL Server 2012, Linux/Unix

Confidential, Mt Laurel, NJ

Data Scientist


  • Performed data analysis using Spark to retrieve data from the Hadoop cluster and SQL to retrieve data from Azure Cloud.
  • Wrote complex Spark SQL queries for data analysis to meet business requirements.
  • Built an ETL process to ensure that data is well cleaned and the data warehouse is up to date for reporting purposes.
  • Applied algorithms such as Hierarchical Clustering, K-means, and KNN using Scikit-learn and SciPy to determine the drug combinations.
  • Performed complex pattern recognition on automotive time series data and forecasted demand using ARMA and ARIMA models and exponential smoothing for multivariate time series data.
  • Used Agile methodology and the Scrum process for project development.
  • Designed, developed and maintained daily and monthly summary, trending and benchmark reports repository in Tableau Desktop.
  • Performed report visualization using Python packages (Matplotlib, Seaborn).

Environment: Python 2.x (Scikit-Learn/Scipy/Numpy/Pandas), SAS, Tableau (Desktop 8.x/Server 8.x), Hadoop, MapReduce, SSIS, SQL Server 2012

Confidential, New York, NY

Data Analyst


  • Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Used SSIS to create ETL packages to Validate, Extract, Transform, and Load data into Data Warehouse and Data Mart.
  • Optimized query performance by modifying T-SQL queries, removing unnecessary columns and redundant data, normalizing tables, establishing joins, and creating indexes.
  • Developed and debugged Stored Procedures, User-defined Functions (UDFs), Triggers, Indexes, Constraints, Transactions, and Queries using Transact-SQL (T-SQL).
  • Built classification models based on Logistic Regression, Decision Trees, Random Forest, Support Vector Machine, and Ensemble algorithms for a marketing campaign.
  • Built a price forecasting model using gradient boosting, elastic net regularization, and lasso regularization with the Python Scikit-learn package.
  • Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.
  • Implemented business intelligence dashboards using Tableau and Shiny apps in R, producing different summary results based on requirements and role members.

Environment: Microsoft SQL Server 2012, SQL Server Management Studio, MS BI Suite (SSIS, SSRS), T-SQL, Visual Studio, Python 2.x, RStudio, Tableau

Confidential, Philadelphia, PA

BI Developer/Data Analyst


  • Designed SSIS packages to extract, transform, and load existing data into SQL Server, using Pivot Transformation, Fuzzy Lookup, Derived Column, Conditional Split, Aggregate, Execute SQL Task, Data Flow Task, and Execute Package Task.
  • Maintained and developed complex SQL queries, stored procedures, views, functions, and reports that meet customer requirements using Microsoft SQL Server 2008 R2.
  • Created views, table-valued functions, Common Table Expressions (CTEs), joins, and complex subqueries to provide reporting solutions.
  • Optimized query performance by modifying T-SQL queries, removing unnecessary columns and redundant data, normalizing tables, establishing joins, and creating indexes.
  • Migrated data from the SAS environment to SQL Server 2008 via SQL Server Integration Services (SSIS).
  • Analyzed customer behavior for different loan products, produced portfolio reports, and provided ad hoc analysis for upper management to drive critical business decisions.
  • Developed and implemented several types of Financial Reports (Income Statement, Profit & Loss Statement, EBIT, ROIC Reports) using SSRS.
  • Developed parameterized dynamic performance reports (Gross Margin, Revenue based on geographic regions, Profitability based on web sales and smartphone app sales).
  • Performed analyses such as regression analysis, logistic regression, discriminant analysis, and cluster analysis using SAS programming.

Environment: SQL Server 2008 R2, DB2, Oracle, SQL Server Management Studio, SAS/BASE, SAS/SQL, SAS/Enterprise Guide, MS BI Suite (SSIS/SSRS), T-SQL, SharePoint 2010, Visual Studio 2010
