Data Scientist Resume
New York City, NY
SUMMARY
- 6 years of IT and e-commerce industry experience with strong technical skills in Data Science, Data Warehousing, Business Intelligence, and Data Visualization
- Hands-on experience in converting business needs into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data
- Experienced in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design, Functional Specification, and Testing, in Waterfall, Scrum, and Agile team environments
- Hands-on experience with Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables, and OLAP reporting using Business Intelligence tools and advanced features in Excel
- Strong experience in working with and extracting data from various database sources such as Oracle, SQL Server 2008, and MS Access, and NoSQL databases such as MongoDB
- Experienced in Data Integration Validation and Data Quality controls for ETL processes and Data Warehousing using MS Visual Studio with SSIS, SSAS, and SSRS
- Ability to analyze raw data, extract data-driven insights, and develop recommendations through Data Mining, Machine Learning, Predictive Analysis, and Data Curation using R and Python
- Experience working with A/B Testing, Statistical Analysis, Hypothesis Testing, Factor Analysis, Regression-based Models (Linear, Logistic), ANOVA, Sentiment Analysis, K-means Cluster Analysis, Time Series Analysis, SVM, Naïve Bayes Classification, and Random Forest
- Worked with Python libraries such as NumPy, SciPy, Pandas, Scikit-learn, NLTK, and Matplotlib
- Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design, and implementing RDBMS-specific features using SQL/MySQL
- Working knowledge of Big Data concepts including Hadoop/HDFS and MapReduce, with applications such as Spark, HDInsight, Hive, HBase, and Pig
- Worked on Tableau, Power BI, Shiny, QlikView to create dashboards and visualizations
- Worked with Amazon Web Services cloud computing and Google Analytics
- Able to work as part of a connected team, exceptionally detail-oriented and self-motivated
TECHNICAL SKILLS
Languages: Python 3.3/2.7, R, SQL, PL/SQL, T-SQL
Operating System: Windows 7/8/10, Linux, UNIX
Application: MS Word, Excel, PowerPoint, Visio, MS Visual Studio, Google Analytics
Databases: Oracle 11g/12c, SQL Server 2012/2014, MS Access, MongoDB, Hive
Packages: Pandas, SciPy, NumPy, Scikit-learn, Matplotlib, plyr, ggplot2, cars
Big Data Tools: Hadoop, Hive, Spark, HDInsight, Pig, HBase
Statistics: Linear Regression, Logistic Regression, ANOVA, Time Series Analysis, Factor Analysis, K-means Clustering, Cross-Validation, Cluster Analysis
Reporting Tools: Tableau, Power BI, QlikView, SAP Business Objects, Crystal Reports
PROFESSIONAL EXPERIENCE
Confidential, New York City, NY
Data Scientist
Responsibilities:
- Collected comments under posts on various social media sites such as Facebook and Instagram
- Completed data cleansing and integration using NumPy, SciPy, and Pandas in Python
- Collaborated with colleagues to manually tag positive and negative comments in the training set
- Measured inter-rater agreement on tagging results using Cohen’s Kappa
- Implemented data cleansing with text mining techniques such as tokenization in Python
- Built Naïve Bayes and SVM classifiers for text classification using Scikit-learn and NLTK in Python
- Performed Sentiment analysis of social media comments to evaluate customer satisfaction
- Automated the weekly statistical analysis report on consumer interaction on social media
- Oversaw customer relationship data for NYC residential market
- Collected data from a complex internal database by writing SQL queries with Hive and Hadoop
- Performed K-means Clustering to target optimal consumer segments
- Created customer relationship dashboard for executives using Tableau
- Supported A/B testing with sentiment analysis findings using Google Analytics to optimize Confidential.com
Environment: Python, Tableau, Google Analytics, SQL, HDFS/Hadoop, Hive, Spark
Confidential, New York, NY
Data Analyst
Responsibilities:
- Converted time lag problems in order fulfillment into Data mining tasks
- Performed Data Profiling to assess data quality using SQL against a complex internal database
- Improved sales and logistics data quality through data cleaning using NumPy, SciPy, and Pandas in Python
- Built a Data Warehouse to support end-user queries with Oracle and MS Visual Studio
- Designed and implemented Dimensional Data modeling for order fulfillment process
- Deployed SSIS packages to complete ETL and Data Mapping process
- Transformed data through methods such as Aggregation, Slowly Changing Dimension, and Splitting
- Derived business intelligence report for order fulfillment using MS SSAS and SSRS
- Determined regression model predictors using a correlation matrix for Factor Analysis in R
- Built a regression model to understand the order fulfillment time lag issue using Scikit-learn in Python
- Optimized predictive model by reducing insignificant variables using Stepwise Regression
- Empowered decision makers with data analysis dashboards using Tableau and Power BI
Environment: R, Python 2.7, MS Visual Studio, Tableau, Power BI, MS Excel, HDFS, Hive, Spark
Confidential, Syracuse, NY
Business Intelligence Analyst
Responsibilities:
- Collected business requirements and translated into data modeling requirements
- Designed and implemented Dimensional Data modeling for sales analysis
- Built a Data Warehouse and Data Marts for data reporting to support end-user queries
- Created tables, triggers, views, indexes using T-SQL to store data and maintain database
- Deployed SSIS packages to complete ETL process using MS Visual Studio
- Transformed data through methods such as Aggregation, Slowly Changing Dimension, and Splitting
- Built a MOLAP cube using SSAS with dimensions and measures to support ad-hoc queries
- Created Tabular reports for daily/weekly/monthly sales using Pivot table in Excel and SSRS
- Measured key performance indicators (KPIs) for the Category-to-Product hierarchy based on total sales
- Presented Time Series analysis using Drill down, Charts, Graphs, Maps using SSRS, Power BI
- Wrote various complex queries for ad-hoc data report using SQL in MS SQL Server
Environment: MS SQL Server, MS Visual Studio, SSIS, SSAS, SSRS, Power BI, Excel
Confidential
Product Analyst
Responsibilities:
- Collected and cleaned up internet user usage data from online questionnaires using Python
- Identified target user demographic information from internal database using SQL in Hive
- Performed Exploratory Analysis on user demographic data using NumPy, SciPy, and Pandas
- Assessed demographic differences between general internet users and target users using a t-test
- Evaluated the user agreement level on a new feature by calculating Cohen’s Kappa in R
- Built a Logistic Regression model to predict users’ willingness to use the new technique
- Forecasted usage of the new technique by building a predictive model using Scikit-learn in Python
- Validated the predictive model with K-fold Cross-Validation using Scikit-learn in Python
- Reduced the need for workers to correct video captions by developing a statistical algorithm in R
Environment: R, Python, SQL, HDFS/Hadoop, Hive, Spark, Logistic Regression
Confidential
Data Analyst
Responsibilities:
- Oversaw sales data and customer data for the Greater China residential market
- Collaborated with colleagues to manually validate and maintain customer information
- Implemented data cleansing and reforming using Python to upload into internal application
- Categorized data for customer segmentation using Lookup features and Pivot table in Excel
- Built a Random Forest model with Decision Trees to predict sales opportunities for the following month
- Developed monthly/quarterly customer relationship reports using SAP Crystal
- Supported the development of ad-hoc analyses, reports, and data extracts
Environment: Python, MS Excel, SAP Crystal, Random Forest, Decision Tree, Classification Analysis
Confidential
HR Data Analyst
Responsibilities:
- Collected human resource requirements from stakeholder partners across cross-functional groups
- Documented requirements using UML diagrams and ERD graphics using MS Visio
- Implemented an intern data management system for the Nanjing headquarters in MS Access
- Oversaw intern data and created weekly and monthly timesheet reports using Excel
- Reduced operating costs by rearranging the interns’ timesheets for each department
- Supported the development of ad-hoc analyses, reports, and data extracts
Environment: MS Visio, MS Access, MS Excel