Data Scientist Resume Stamford, CT - Hire IT People

SUMMARY:

Data scientist with 6 years of healthcare, technology, fashion, and e-commerce experience.
Over 4 years of experience across the entire data science project life cycle, including Data Acquisition, Data Cleaning, Data Manipulation, Data Mining, Machine Learning Algorithms, Data Validation, and Data Visualization.
Expertise in transforming business requirements into analytical models, applying algorithms, and reporting solutions that scale across massive volumes of structured and unstructured data.
Experienced with linear and logistic regression, Bayesian inference, SVM, neural networks, ANOVA, Gaussian mixture models, recommendation systems, and maximum likelihood estimation.
Strong skills in statistical methodologies and dimension reduction methods such as PCA, correspondence analysis, and variable clustering.
Experienced in model testing and validation using k-fold cross-validation and regularization.
Extensive experience in developing time series modeling, including but not limited to ARIMA and GARCH modeling, using SAS 9.4, SAS Enterprise Miner & SAS Enterprise Guide and SAS/JMP.
Worked with Python 3.3 to develop machine learning models such as decision trees, random forests, lasso regression, and k-means clustering, using the Numpy, Pandas, Scikit-learn, SFrame, Scipy, Matplotlib, and nltk packages.
Strong ability to write and optimize diverse SQL queries, working knowledge of RDBMS and NoSQL Database, such as MySQL, SQL Server, HBase, Cassandra, MongoDB.
Deep understanding of text mining, data visualization, and project delivery using R packages such as ggplot2, dplyr, caret, twitteR, NLP, rjson, openNLP, tm, GoogleVis, and Shiny.
Deep understanding of MapReduce with Hadoop and Spark. Good knowledge of the big data ecosystem, including Hadoop 2.0 (HDFS, Hive, Pig, Impala) and Spark (Spark SQL, Spark MLlib, Spark Streaming).
Excellent performance in building and publishing customized interactive reports and dashboards with custom parameters and user filters using Tableau 9.4/9.2.
Good understanding of web design based on HTML5, CSS3, and JavaScript.
Excellent understanding of SDLC, Agile, and Scrum.
Experience with version control tool - Git.
Effective team player with strong communication and interpersonal skills and a strong ability to adapt to new technologies and business lines rapidly.

TECHNICAL SKILLS:

BI Tools: Tableau 9.4/9.2, SharePoint 2016/2013, MS Office (Word/Excel/PowerPoint/Visio)

Languages: Python 3.3/2.7, R 3, SQL, SAS 9.4, VBA, HiveQL, Pig Latin

Big Data Tools: Hadoop 2 (Hive, HDFS, Pig, Impala), Spark 2.1 (Spark SQL, MLlib), MapReduce

Operating Systems: Windows 10/8/7, UNIX, Linux

Packages: Python (Numpy, Pandas, Scikit-learn, SFrame, Scipy, Matplotlib, nltk), R (ggplot2, dplyr, caret, twitteR, NLP, openNLP, rjson, tm, GoogleVis, Shiny)

Database: Oracle 11g, MS Access 2013, SQL Server 2014/2012, MySQL 5.5, HBase 1.2, MongoDB 3.2, Cassandra 3.0

PROFESSIONAL EXPERIENCE:

Confidential, Stamford, CT

Data Scientist

Responsibilities:

Continuously collected business requirements during the whole project life cycle
Wrote and optimized complex SQL queries involving multiple joins and advanced analytical functions to extract and merge large volumes of historical data stored in Oracle 11g, and validated the ETL-processed data in the target database
Pulled unstructured data from MongoDB and ensured data aggregation
Developed Python scripts to automate data sampling process. Ensured the data integrity by checking for completeness, duplication, accuracy, and consistency
Worked on data cleaning and reshaping, generated segmented subsets using Numpy and Pandas in Python
Applied machine learning algorithms and statistical models such as decision trees, logistic regression, and Gradient Boosting Machine to build predictive models using the scikit-learn package in Python
Identified the variables that significantly affect the target
Conducted model optimization and comparison using stepwise selection based on AIC
Selected models based on confusion matrices, minimizing Type II error
Produced a cost-benefit analysis quantifying the impact of implementing the model compared with the prior situation
Generated data analysis reports using Matplotlib and Tableau, and presented the results to C-level decision makers

Environment: Tableau 9.4, Python 3.3, Numpy, Pandas, Matplotlib, Scikit-Learn, Machine Learning, MongoDB, Oracle 11g, SQL
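
The completeness and duplication checks mentioned in the responsibilities above can be sketched roughly as follows. This is an illustrative example only, not the scripts used in this role: it uses plain Python rather than Pandas so that it runs without extra packages, and the function name `integrity_report`, the field names, and the sample rows are all hypothetical.

```python
def integrity_report(rows, key_fields, required_fields):
    """Count missing required values and duplicate keys in a list of dict rows."""
    missing = {field: 0 for field in required_fields}
    seen_keys = set()
    duplicates = 0
    for row in rows:
        # Completeness: a required field is missing if it is absent, None, or empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
        # Duplication: a row is a duplicate if its key was already seen.
        key = tuple(row.get(field) for field in key_fields)
        if key in seen_keys:
            duplicates += 1
        else:
            seen_keys.add(key)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# Hypothetical sample data, for illustration only.
sample_rows = [
    {"id": 1, "age": 34, "plan": "A"},
    {"id": 2, "age": None, "plan": "B"},
    {"id": 2, "age": 51, "plan": "B"},
    {"id": 3, "age": 47, "plan": ""},
]
report = integrity_report(sample_rows, key_fields=["id"], required_fields=["age", "plan"])
print(report)
```

Accuracy and consistency checks depend on business rules (valid ranges, cross-field agreement) and are not covered by this sketch.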

Confidential

Data Scientist

Responsibilities:

Collected business requirements and data analysis needs from other departments
Performed data parsing and data profiling from large volumes of varied data to learn about behavior with various features based on transactional data, call center history data and customer personal profile, etc.
Processed primary quantitative and qualitative market research and loaded the survey responses into the database in preparation for data exploration
Developed Python scripts to automate data sampling process. Ensured the data integrity by checking for duplication, completeness, accuracy, and validity
Worked on data cleaning and ensured data quality, consistency, integrity using Numpy, SFrame in Python
Used k-means clustering technique to identify outliers and to classify unlabeled data
Applied Principal Component Analysis method in feature engineering to analyze high dimensional data
Applied various machine learning algorithms and statistical models (decision tree, lasso regression, multivariate regression) to identify key features using the scikit-learn package in Python
Evaluated models using k-fold cross-validation and log loss
Ensured that the model had a low false positive rate and validated it by interpreting the ROC plot
Experimented with text mining on customer complaints using nltk in Python
Built repeatable processes in support of implementation of new features and other initiatives
Created various types of data visualizations using Tableau
Communicated and presented the results to the product development team to support decision making

Environment: Python 3.3, Hadoop 2, HiveQL, HBase, MapReduce, Tableau 9.4, Numpy, SFrame, Scikit-Learn, nltk
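
The evaluation step listed above (k-fold cross-validation scored with log loss) can be illustrated with a minimal sketch. It is written in plain Python rather than scikit-learn, and it is not the code used in this role: the helper names `k_fold_indices` and `log_loss`, the labels, and the baseline "model" that predicts the training-set positive rate are all assumptions made for illustration.

```python
import math


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical labels; the baseline predicts the positive rate seen in training.
labels = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
scores = []
for train_idx, test_idx in k_fold_indices(len(labels), k=5):
    rate = sum(labels[i] for i in train_idx) / len(train_idx)
    fold_truth = [labels[i] for i in test_idx]
    scores.append(log_loss(fold_truth, [rate] * len(test_idx)))
mean_score = sum(scores) / len(scores)
print(mean_score)
```

A real model would replace the constant-rate baseline; a lower mean log loss across folds than this baseline suggests the model adds predictive value.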

Confidential, Hartford, CT

Data Scientist

Responsibilities:

Implemented and delivered all requirements outlined in the contractual agreement between the company and the university
Prepared and executed complex Spark SQL queries involving multiple joins and advanced analytical functions to validate the ETL-processed data in the target database
Searched for and collected data from external sources and integrated it with the primary database. Created a Spark SQL context to load data from JSON files and ran SQL queries
Extracted and compiled data, conducted data manipulation to ensure data quality, consistency, and integrity using SFrame in Python
Built a time series model (ARIMA) to capture data patterns and traffic trends, and forecast the occupancy rate of different parking lots
Communicated effectively with the business development team to implement and complete initiatives that could increase business opportunities
Delivered data interpretation through interactive analysis reports built in Tableau to identify business solutions and support marketing and operations decisions

Environment: Hadoop 2, Spark, SparkSql, MS Office (Excel), Tableau 9.2, Python, SFrame

Confidential

Healthcare Data Analyst

Responsibilities:

Extracted and consolidated information from working data. Conducted primary and secondary competitive intelligence gathering for distribution in bi-weekly and monthly reports
Performed initial descriptive data analysis on datasets using SAS, generated statistical reports with PROC UNIVARIATE and PROC FREQ
Conducted hypothesis tests and analysis on the content of clinical datasets to assess quality, completeness, and volumes of data
Coordinated with the research team and system owners to understand the origins, contents, and structure of datasets and to ensure that research objectives could be met
Effectively communicated the results and reported to colleagues and partners
Created decision-driving competitive intelligence reporting from scientific conferences
Comprehensive knowledge of drug development and commercial landscapes

Environment: SAS 9.4, SAS Enterprise Guide, SQL Server 2012, MS Office 2013 (Access/PowerPoint/Word/Excel), SPSS

Confidential

Business Analyst

Responsibilities:

Prepared and executed complex SQL queries involving multiple joins and advanced analytical functions to validate the ETL processed data in target database
Completed client studies covering buying behaviors, client profiles, and segmentation
Analyzed the traffic and business performance of commercial and marketing operations with a view to continuous improvement of digital devices. Created data visualizations using Shiny in R to track the performance of business campaigns (newsletters, mailing)
Implemented strategic initiatives using historical data, built and tested predictive models to better estimate the impact of new campaigns
Developed a recommendation system applying collaborative filtering and content-based filtering to a large-scale data set, improving the accuracy and timeliness of customized recommendations
Created materials emphasizing product knowledge, brand heritage, website user experience, and luxury service to support CRM initiative and drive sales results
Assembled monthly product performance analysis for use by C-level executives

Environment: MySQL, R, dplyr, caret, mle2, Shiny

Confidential

Business Analyst

Responsibilities:

Participated in data entry and data extraction using SQL queries in MySQL
Identified the key parameters by clearly defining treatment and control groups and marking target audiences who would be incremental and profitable to business
Conducted A/B testing for new initiatives and prepared documentation in support of the web design team
Created different kinds of charts to visualize data analysis results
Successfully generated decision-driving reports

Environment: Python 2.7, MySQL, R, MS Office (Excel/PowerPoint/Word), Pandas
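
The A/B testing mentioned above compares a treatment group against a control group. One common way to evaluate such a test is a two-proportion z-test, sketched below in plain Python. The resume does not state which test was used, so this choice is an assumption, and the function name `two_proportion_z_test` and the conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control (A) vs. treatment (B).
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(round(z, 2), p_value)
```

The normal approximation used here assumes reasonably large samples in both groups; with small counts an exact test would be a safer choice.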
