
Data Scientist Resume


New York, NY

SUMMARY

  • A passionate and self-driven Data Scientist with extensive experience in Data Mining, Machine Learning, ETL Data Pipelines, Data Visualization, and SQL Development.
  • Expert in the entire Data Science Project Life Cycle including Data Acquisition, Data Wrangling, Data Manipulation, Feature Engineering, Machine Learning Modeling, Validation, Optimization, Deployment and Visualization.
  • Experienced in building machine learning solutions using PySpark for large data sets on the Hadoop ecosystem, and familiar with Big Data tools such as Spark SQL, PySpark, MapReduce, HDFS, HiveQL, and Pig.
  • Developed and Deployed Machine Learning Models for different Business Requirements in AWS SageMaker.
  • Experienced in building various machine learning and Deep Learning models using algorithms such as Linear Regression (Ridge, Lasso, Elastic Net), Logistic Regression, Clustering (K-means), Gradient Descent, Support Vector Machines (SVM), KNN, Decision Trees, Ensembles such as Random Forest, XGBoost, and LightGBM, and Neural Networks (RNN).
  • Experienced in performing Time Series Analysis with Seasonal ARIMAX and Recurrent Neural Network methods such as LSTM and GRU.
  • Experienced with Text Mining and Text Classification tasks (Natural Language Processing) using the Keras and PyTorch frameworks in Python.
  • Experienced in using cloud services such as Amazon Web Services (AWS) EC2 and S3 to work with different virtual machines.
  • Experienced in tuning algorithms using methods such as Grid Search, Randomized Search, Bayesian Optimization, and K-Fold Cross Validation.
  • Experienced in dimensionality reduction techniques such as PCA, LDA, and SVD.
  • Expert at Extracting, Transforming, and Loading (ETL) data from various sources into an Enterprise Data Warehouse and multiple Data Marts using T-SQL, SSIS (Visual Studio), and SSMS.
  • Extensive hands-on experience with T-SQL and MySQL in CRUD operations (Create, Insert, Delete, Update, Select) and a variety of SQL objects such as stored procedures, functions (Window Functions, UDFs), temp tables, CTEs (common table expressions), subqueries, views, indexes, constraints, etc.
  • Proficient in data visualization tools such as Tableau, Power BI, SSRS (Tabular Report), Python Matplotlib, Seaborn, and R ggplot2 to create interactive reports and dashboards.
  • Excellent performance in blending and merging data from multiple sources and visualizing them using Donut Charts, Sankey Flow Charts, Histograms, Stacked Bar Charts, Geographic Maps, Gantt Charts, Tree Maps, Heatmaps, etc. for reporting purposes using Tableau and Power BI (Data Model).
  • Strong knowledge of Software Development Life Cycle (SDLC) project management methodologies such as Agile/Scrum and Waterfall.
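The tuning workflow named above (Grid Search combined with K-Fold Cross Validation) might be sketched roughly as follows; the synthetic dataset, Ridge model, and parameter grid are illustrative stand-ins, not taken from any actual project:

```python
# Hypothetical sketch: tuning a Ridge regression's alpha with grid search
# and 5-fold cross-validation using scikit-learn.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for a real feature matrix.
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}  # candidate regularization strengths
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="r2")
search.fit(X, y)

print(search.best_params_)   # best alpha found by 5-fold CV
print(search.best_score_)    # mean cross-validated R^2
```

The same pattern extends to `RandomizedSearchCV` for randomized search; Bayesian optimization typically relies on a separate library.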

TECHNICAL SKILLS

Programming Languages: Python 2.x/3.x (NumPy, Pandas, Scikit-Learn, SciPy, Matplotlib, Seaborn, Keras, PyTorch, BeautifulSoup, Requests), R/RStudio (dplyr, caret, forecast, ggplot2)

Databases: MS SQL Server 2008/2012/2017 (SSMS), MySQL, PostgreSQL

BI Tools: Tableau 9.x/10.x/2018.x/2019.x, Tableau Prep, Power BI, MS Suite 2012 (SSIS/SSRS), MS Excel (Pivot Table, VLookUp)

ETL Tools: SSIS (Visual Studio), Informatica

Big Data Tools: Spark 2.0 (PySpark, Spark SQL), HDFS, HiveQL

Cloud Server: AWS (EC2, S3, RDS, EMR, Redshift, SageMaker, DynamoDB), Azure Databricks

ML Algorithms: Linear Regression, Ridge Regression, Lasso Regression, Logistic Regression, KNN, SVM, Decision Trees, Naive Bayes, Random Forest, XGBoost, LightGBM, PCA, LDA, K-Means, Seasonal ARIMAX, Neural Networks (LSTM, GRU), NLP, Ensemble Modeling

Other Tools: Jupyter, Visual Studio, Git/GitHub, JIRA/Confluence, Slack, Google Drive

Operating Systems: Windows 7/8/10, macOS

PROFESSIONAL EXPERIENCE

Confidential, NEW YORK, NY

DATA SCIENTIST

Responsibilities:

  • Prepared and analyzed data, identifying patterns in the dataset by applying models and statistical analysis.
  • Worked with stakeholders to collect business requirements for projects and identify opportunities for leveraging company data to drive business analytics solutions.
  • Performed data preparation, data manipulation, normalization, and predictive modeling; improved accuracy by evaluating models in Python.
  • Used classification techniques including Logistic Regression, Random Forest, and LightGBM to identify users who are more likely to churn.
  • Performed Sentiment Analysis and Topic Classification to automate analysis of customer reviews using BERT and LSTM with the Keras framework.
  • Performed customer segmentation using K-means, building predictive models and generating data products to support segmentation efforts.
  • Collected the data required for building models from AWS S3 using big data tools such as Spark SQL.
  • Worked on data cleaning and ensured data quality, consistency, integrity using Pandas and NumPy.
  • Designed and developed analytics machine learning models and deployed ML/DL models in AWS SageMaker.
  • Partnered with Sales and Marketing in a cross-functional team to frame and answer important data questions, prototyping and experimenting with ML/DL algorithms and integrating them into production systems for different business needs.
  • Built and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs.
  • Gained solid knowledge of the AWS environment by loading data files from cloud servers.
  • Presented Tableau dashboards to senior management to surface additional insights.
  • Improved model explainability using SHAP, feature importance scores, and decision trees.
  • Applied boosting methods to improve model performance.
  • Uploaded file folders to an FTP server using SSIS to support non-sensitive data exchange with business partners.
  • Collaborated with project managers and business owners to understand their organizational processes and help design the necessary reports.
  • Collaborated with data engineers to construct and publish Tableau reports on Tableau Server with row-level security and scheduled automatic extract refresh.
  • Maintained and developed complex SQL queries, stored procedures, views, and functions that meet customer requirements using Microsoft SQL Server 2017.
  • Presented information and analysis using data visualization techniques such as Python Matplotlib, Seaborn, and Tableau.
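A churn-style classifier of the kind described above could look roughly like the sketch below; it uses a Random Forest on synthetic data and the model's built-in feature importances for explainability (the resume also mentions SHAP, which builds per-prediction attributions on top of such a model). All names and data here are illustrative:

```python
# Hypothetical sketch of a churn classifier with feature-importance output.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a user-activity feature matrix; y marks churners.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"holdout accuracy: {acc:.2f}")
# Which feature drives predictions most (SHAP would refine this per user).
print("top feature index:", model.feature_importances_.argmax())
```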

Confidential, BROOK, IL

DATA SCIENTIST/ENGINEER

Responsibilities:

  • Participated in all phases of data mining, data collection, data cleaning, visualization, developing models, validation, and deployment to deliver data science solutions.
  • Built ETL Data Pipelines in SSIS to automate the collection and transformation of data from different sources, then loaded it into a SQL database in Azure.
  • Implemented classification models such as Random Forest and LightGBM to identify patients who are highly likely to be readmitted within 30 days, to better help with resource management.
  • Created Spark clusters in Azure Databricks for manipulating extremely large data sets.
  • Worked on fraud detection analysis of payment transactions using transaction history with supervised learning methods.
  • Forecasted sales using Time Series methods (Seasonal ARIMAX, LSTM, GRU) with the Keras framework in Python to help optimize cash flow management for the new fiscal year.
  • Handled the unbalanced fraud dataset using oversampling methods such as the SMOTE algorithm.
  • Worked on Azure Databricks cloud virtual machine to do machine learning on big data.
  • Implemented a Python-based distributed random forest via PySpark.
  • Used Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn in Python to develop various machine learning models, utilizing algorithms such as Ridge Regression, Logistic Regression, Gradient Boosting, SVM, KNN, Random Forest, XGBoost, etc.
  • Used cross-validation to test the models with different batches of data to optimize them and prevent overfitting.
  • Used PCA and other feature engineering techniques on high-dimensional datasets while maintaining the variance of the most important features.
  • Applied Ensemble methods to increase the accuracy of the training model with different Bagging and Boosting methods.
  • Designed, developed, and maintained daily and monthly summary, trending, and benchmark reports repository in Power BI Desktop.
  • Implemented visualizations and views like stacked bar charts, Donut charts, Geographic maps, Sankey flow chart, histogram, pivot table, etc.
  • Constructed and published Power BI reports on Power BI Server, configured security, and scheduled automatic extract refresh on a regular basis.
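The SMOTE-style oversampling mentioned above can be illustrated with a minimal NumPy re-implementation of the core idea: interpolating between a minority-class point and one of its nearest minority neighbours. This is a sketch only; a real project would use the `SMOTE` class from the imbalanced-learn library, and the fraud data here is synthetic:

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, rng=None):
    """Create synthetic minority-class samples by interpolating between a
    minority point and one of its k nearest minority neighbours (the core
    idea behind SMOTE; not a substitute for imbalanced-learn's version)."""
    if rng is None:
        rng = np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)  # distances to all points
        neighbours = np.argsort(d)[1:k + 1]           # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                            # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)

rng = np.random.default_rng(0)
minority = rng.normal(loc=2.0, size=(20, 5))  # e.g. 20 fraudulent transactions
new_samples = smote_like_oversample(minority, n_new=80, rng=rng)
print(new_samples.shape)  # (80, 5)
```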

Confidential, MORRIS, IL

DATA SCIENTIST/ BI ANALYST

Responsibilities:

  • Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server using Union All, Merge Join, Lookup, Derived Column, Conditional Split, Aggregate, Execute SQL Task, Data Flow Task, and Execute Package Task.
  • Predicted claim payment periods using machine learning models such as SVM, Random Forest, and XGBoost.
  • Built ML models to predict patient no-show probability and optimize the appointment management process.
  • Analyzed and processed large data sets using Spark SQL and PySpark.
  • Designed and developed ETL framework to support Data Migration project using SSIS and T-SQL.
  • Used multiprocessing and multithreading hands-on to speed up program execution.
  • Worked with high-dimensional datasets retrieved from users, media agencies, or third-party apps, and used methods such as PCA and LDA for dimensionality reduction.
  • Used different feature engineering methods in Python (Pandas, NumPy, Matplotlib, Seaborn) to cleanse high-dimensional datasets and prepare them for modeling.
  • Collaborated wif data engineers, wrote, and optimized SQL queries to perform data extraction from databases.
  • Performed data integrity checks, data cleaning, exploratory analysis and feature engineering using Python.
  • Created effective visualizations in Tableau, investigating datasets to surface trends and patterns in the data.
  • Implemented biweekly and monthly incremental data refreshes on Tableau Server based on business changes to ensure that the views and dashboards display the changed data accurately.
  • Designed business intelligence dashboards using Tableau Desktop and published them to Tableau Server, allowing executive management to view current and past sales performance trends, sales forecasts, and various KPIs: MoM/YoY/weekly/monthly sales amount and quantity, sell-through rate, gross margin, and items per purchase.
  • Published workbooks and extracted data sources to Tableau Server, implemented row-level security and scheduled automatic extract refresh.
  • Analyzed and visualized different segments of users with Tableau to better understand their advertisement behaviors.
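The PCA-based dimensionality reduction used in this role can be sketched with a small SVD-based implementation in NumPy (the same decomposition scikit-learn's `PCA` uses internally); the wide random dataset below is purely illustrative:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components via SVD and report the
    fraction of variance each kept component explains."""
    Xc = X - X.mean(axis=0)                 # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / (S ** 2).sum()   # variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))              # stand-in for a wide dataset
Z, ratios = pca_reduce(X, n_components=3)
print(Z.shape)  # (100, 3): same rows, far fewer columns
```

Keeping the components with the largest variance ratios is what "maintaining the variance of the most important features" means in practice.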

Confidential

DATA ANALYST

Responsibilities:

  • Collaborated with Business Analysts to understand business processes and user needs across projects, and translated them into functional and non-functional specifications.
  • Built ETL (Extraction, Transformation, and Loading) pipelines using T-SQL to extract data from data warehouses, transform raw data into the required formats, and load it into data platforms.
  • Segmented customers via RFM analysis using Python, selected high-value customers, and improved their retention rate by sending ads and coupons.
  • Predicted customer conversion likelihood using Classification models to improve marketing efficiency and reduce marketing cost.
  • Designed SSIS packages to extract, transform, and load existing data into SQL Server using Union All, Merge Join, Lookup, Derived Column, Conditional Split, Aggregate, Execute SQL Task, Data Flow Task, and Execute Package Task.
  • Built SQL queries to perform various CRUD operations (Create, Read, Update, and Delete).
  • Maintained and developed complex SQL queries, stored procedures, views, and functions that meet customer requirements using Microsoft SQL Server 2012.
  • Wrote complex SQL queries using inner join, left join and temp table to retrieve data from the database for reporting purposes.
  • Prepared parameterized stored procedures in SSMS and developed various tabular reports such as subreports, drill-through reports, and parameterized reports using SSRS.
  • Created technical reports and visualized data in Tableau to support marketing and project activities.
  • Created incremental refreshes for data sources on Tableau server.
  • Created views in Tableau Desktop that were published to the internal team for review and further data analysis and customization using filters and actions.
  • Published Tableau reports and dashboards on Tableau Server, encouraged business users to explore data, and implemented row-level security and scheduled automatic extract refresh.
  • Blended data from multiple databases into one report by selecting primary keys from each database.
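An RFM (Recency, Frequency, Monetary) segmentation of the kind described above could be sketched in pandas as follows; the toy transaction log, two-bin scoring, and column names are all illustrative assumptions, not from any real dataset:

```python
import pandas as pd

# Toy transaction log; a real analysis would pull this from the warehouse.
orders = pd.DataFrame({
    "customer": ["a", "a", "b", "c", "c", "c", "d"],
    "days_ago": [5, 40, 90, 2, 10, 30, 200],
    "amount":   [50, 20, 15, 200, 120, 80, 10],
})

rfm = orders.groupby("customer").agg(
    recency=("days_ago", "min"),    # days since last order (lower is better)
    frequency=("amount", "size"),   # number of orders
    monetary=("amount", "sum"),     # total spend
)

# Score each dimension 1-2 by rank; a full analysis would use more bins (e.g. 1-5).
rfm["r"] = pd.qcut(rfm["recency"].rank(method="first"), 2, labels=[2, 1]).astype(int)
rfm["f"] = pd.qcut(rfm["frequency"].rank(method="first"), 2, labels=[1, 2]).astype(int)
rfm["m"] = pd.qcut(rfm["monetary"].rank(method="first"), 2, labels=[1, 2]).astype(int)
rfm["score"] = rfm[["r", "f", "m"]].sum(axis=1)
print(rfm.sort_values("score", ascending=False))
```

Customers with the highest combined score (recent, frequent, high-spend) would then be targeted with ads and coupons.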
