Data Scientist Resume

New York City, NY


  • 6 years of IT and e-commerce industry experience with strong technical skills in Data Science, Data Warehouse, Business Intelligence, Data Visualization
  • Hands-on experience in converting business needs into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data
  • Experienced in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design, Functional Specification and Testing, in Waterfall, Scrum and Agile team environments
  • Hands-on experience with Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables and OLAP reporting using Business Intelligence tools and advanced features in Excel
  • Strong experience in working with and extracting data from various database sources such as Oracle, SQL Server 2008, MS Access and NoSQL databases such as MongoDB
  • Experienced in Data Integration Validation and Data Quality controls for ETL processes and Data Warehousing using MS Visual Studio SSIS, SSAS, SSRS
  • Ability to analyze raw data, extract data-driven insights and develop recommendations through Data Mining, Machine Learning, Predictive Analysis, Data Curation using R and Python
  • Experience working with A/B Testing, Statistical Analysis, Hypothesis Testing, Factor Analysis, Regression-based Models (Linear, Logistic), ANOVA, Sentiment Analysis, K-means Cluster Analysis, Time Series Analysis, SVM, Naïve Bayes Classification, Random Forest
  • Worked with Python libraries like NumPy, SciPy, Pandas, Scikit-learn, NLTK and Matplotlib
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features using SQL/MySQL
  • Working knowledge of Big Data concepts including Hadoop/HDFS and MapReduce with applications like Spark, HDInsight, Hive, HBase, Pig
  • Worked on Tableau, Power BI, Shiny, QlikView to create dashboards and visualizations
  • Worked with Amazon Web Services cloud computing and Google Analytics
  • Able to work as part of a connected team, exceptionally detail-oriented and self-motivated


Languages: Python 3.3/2.7, R, SQL, PL/SQL, T-SQL

Operating Systems: Windows 7/8/10, Linux, UNIX

Applications: MS Word, Excel, PowerPoint, Visio, MS Visual Studio, Google Analytics

Databases: Oracle 11g/12c, SQL Server 2012/2014, MS Access, MongoDB, Hive

Packages: Pandas, SciPy, NumPy, Scikit-learn, Matplotlib, plyr, ggplot2, cars

Big Data Tools: Hadoop, Hive, Spark, HDInsight, Pig, HBase

Statistics: Linear Regression, Logistic Regression, ANOVA, Time Series Analysis, Factor Analysis, K-means Clustering, Cross-Validation, Cluster Analysis

Reporting Tools: Tableau, Power BI, QlikView, SAP Business Objects, Crystal Reports


Confidential, New York City, NY

Data Scientist


  • Collected comments under posts on various social media sites like Facebook, Instagram
  • Completed data cleansing and integration using NumPy, SciPy, Pandas in Python
  • Collaborated with colleagues to manually tag positive and negative comments in the training set
  • Measured agreement between tagging results from different raters using Cohen’s Kappa
  • Implemented data cleansing with text mining techniques like tokenization in Python
  • Built Naïve Bayes Classifier, SVM for text classification using Scikit-learn and NLTK in Python
  • Performed Sentiment analysis of social media comments to evaluate customer satisfaction
  • Automated weekly Statistical analysis report of consumer interaction on social media
  • Oversaw customer relationship data for NYC residential market
  • Collected data from a complex internal database by writing SQL queries with Hive and Hadoop
  • Performed K-means Clustering to target optimal consumer segments
  • Created customer relationship dashboard for executives using Tableau
  • Supported A/B test with sentiment analysis findings using Google Analytics to optimize Confidential .com

Environment: Python, Tableau, Google Analytics, SQL, HDFS/Hadoop, Hive, Spark

Confidential, New York, NY

Data Analyst


  • Converted time lag problems in order fulfillment into Data mining tasks
  • Performed Data Profiling to assess data quality using SQL against a complex internal database
  • Improved sales and logistics data quality by data cleaning using NumPy, SciPy, Pandas in Python
  • Built Data warehouse to support end-user queries with Oracle and MS Visual Studio
  • Designed and implemented Dimensional Data modeling for order fulfillment process
  • Deployed SSIS packages to complete ETL and Data Mapping process
  • Transformed data through methods like Aggregation, Slowly Changing Dimension, Splitting
  • Derived business intelligence report for order fulfillment using MS SSAS and SSRS
  • Determined regression model predictors using Correlation matrix for Factor analysis in R
  • Built Regression model to understand order fulfillment time lag issue using Scikit-learn in Python
  • Optimized predictive model by reducing insignificant variables using Stepwise Regression
  • Empowered decision makers with data analysis dashboards using Tableau and Power BI

Environment: R, Python 2.7, MS Visual Studio, Tableau, Power BI, MS Excel, HDFS, Hive, Spark

Confidential, Syracuse, NY

Business Intelligence Analyst


  • Collected business requirements and translated into data modeling requirements
  • Designed and implemented Dimensional Data modeling for sales analysis
  • Built Data warehouse and Data marts for data reporting to supply end user queries
  • Created tables, triggers, views, indexes using T-SQL to store data and maintain database
  • Deployed SSIS packages to complete ETL process using MS Visual Studio
  • Transformed data through methods like Aggregation, Slowly Changing Dimension, Splitting
  • Built MOLAP cube using SSAS with dimensions and measures to support ad-hoc queries
  • Created Tabular reports for daily/weekly/monthly sales using Pivot tables in Excel and SSRS
  • Measured key performance indicators (KPIs) for Category to Product hierarchy based on total sales
  • Presented Time Series analysis using Drill down, Charts, Graphs, Maps using SSRS, Power BI
  • Wrote various complex queries for ad-hoc data report using SQL in MS SQL Server

Environment: MS SQL Server, MS Visual Studio, SSIS, SSAS, SSRS, Power BI, Excel


Product Analyst


  • Collected and cleaned up internet user usage data from online questionnaires using Python
  • Identified target user demographic information from internal database using SQL in Hive
  • Performed Exploratory Analysis on user demographic data using NumPy, SciPy, Pandas
  • Assessed demographic differences between general internet users and target users using T-test
  • Evaluated the user agreement level on new feature by calculating Cohen’s Kappa in R
  • Built Logistic Regression model to predict users’ willingness to use new technique
  • Forecasted usage of the new technique by building predictive model using Scikit-learn in Python
  • Validated predictive model with K-fold Cross Validation using Scikit-learn in Python
  • Reduced the number of workers needed to correct video captions by developing a statistical algorithm in R

Environment: R, Python, SQL, HDFS/Hadoop, Hive, Spark, Logistic Regression


Data Analyst


  • Oversaw sales data and customer data for the Greater China residential market
  • Collaborated with colleagues to validate and maintain customer information manually
  • Implemented data cleansing and reforming using Python to upload into internal application
  • Categorized data for customer segmentation using Lookup features and Pivot table in Excel
  • Built Random Forest model with Decision Trees to predict sales opportunities for the following month
  • Developed monthly/quarterly customer relationship reports using SAP Crystal
  • Supported the development of Ad-hoc analysis, reports and data extracts

Environment: Python, MS Excel, SAP Crystal, Random Forest, Decision Tree, Classification Analysis


HR Data Analyst


  • Collected human resource requirements from stakeholder partners across cross-functional groups
  • Documented requirements using UML diagrams and ERD graphics using MS Visio
  • Implemented intern data management system for Nanjing headquarters in MS Access
  • Oversaw intern data and created weekly and monthly timesheet report using Excel
  • Reduced operating costs by rearranging the interns’ timesheets for each department
  • Supported the development of Ad-hoc analysis, reports and data extracts

Environment: MS Visio, MS Access, MS Excel
