Data Scientist Resume New York, NY - Hire IT People

SUMMARY

Highly efficient Data Scientist with 6+ years of experience in Statistical Analysis, Machine Learning, and Data Mining with large sets of structured and unstructured data in the banking, travel services, and manufacturing industries.
Experienced in the full Software Development Life Cycle (SDLC) under Agile and Scrum methodologies.
Skilled in Advanced Regression Modeling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
Proficient in Predictive Modeling, Data Mining Methods, Factor Analysis, ANOVA, Hypothesis Testing, the normal distribution and other advanced statistical and econometric techniques.
Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
Experienced in Machine Learning and Statistical Analysis with Python's scikit-learn.
Experienced in using Python to load, extract and manipulate data, working with libraries such as Matplotlib, NumPy, SciPy and Pandas for data analysis.
Worked with R, SAS, MATLAB and SPSS to develop neural network and cluster analysis models.
Strong SQL programming skills, with experience in working with functions, packages and triggers.
Experienced in developing applications with Visual Basic for Applications (VBA) and VB.
Worked with RDBMS including MySQL, DB2 and Oracle SQL.
Worked with NoSQL databases including HBase, Cassandra and MongoDB.
Experienced in Big Data with Hadoop, HDFS, MapReduce, and Spark.
Experienced in Spark 1.6, Spark SQL and PySpark.
Experienced in Data Integration, Validation and Data Quality controls for ETL processes and Data Warehousing using MS Visual Studio SSIS, SSAS and SSRS.
Proficient in Tableau and R-Shiny data visualization tools to analyze and obtain insights into large datasets and to create visually powerful, actionable interactive reports and dashboards.
Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
Worked in development environments using Git and virtual machines.
Excellent communication skills; a self-motivated, enthusiastic learner who works successfully in fast-paced, multitasking environments, both independently and in collaborative teams.

TECHNICAL SKILLS

Python 2.x/3.x, R 3.x, SQL, SAS 9.x, Visual Basic for Applications, VB.NET, SPSS, Minitab, JMP
Hadoop 2.x, MapReduce, HBase, Spark 2.x, PySpark
MySQL 5.x, Oracle SQL, HBase 0.98, Cassandra, MongoDB 3.x
Pandas, numpy, scipy, scikit-learn, matplotlib, ggplot, ggplot2, dplyr, plyr, gsub, Spark SQL
Regression analysis, classification, K-Means Clustering, Bayesian Methods, Decision Trees, Random Forests, Support Vector Machines, neural networks, Logistic Regression, Data Mining Methods, Factor Analysis, Cluster Analysis, recommendation
Regression Models, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Cross-Validation, Analysis of Variance (ANOVA), Z-test, T-test, Hypothesis testing, Normal distribution

PROFESSIONAL EXPERIENCE

Confidential, New York, NY

Data Scientist

Responsibilities:

Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC).
Collected, exported, merged and massaged data from multiple sources and platforms, including SQL Server, to meet the analytical requirements.
Worked with cross-functional teams (including the data engineering team) to rapidly extract data from MongoDB through the MongoDB Connector for Hadoop.
Performed data processing in PySpark.
Performed data cleaning and feature selection using Numpy and Pandas packages in Python.
Performed partitional clustering into 100 clusters with k-means using the scikit-learn package in Python, so that similar hotels for a search are grouped together.
Used Python to perform ANOVA tests to analyze the differences among hotel clusters.
Implemented various machine learning algorithms and statistical models, such as Decision Tree, Naive Bayes, Logistic Regression and Linear Regression, in Python to determine the accuracy rate of each model.
Determined the most accurate prediction model based on the accuracy rate.
Used text mining of reviews to determine what customers focused on.
Delivered analysis support for hotel recommendation and provided an online A/B test.
Designed Tableau bar graphs, scatter plots, and geographical maps to create detailed summary reports and dashboards.
Developed a hybrid model to improve the accuracy rate.
Delivered the results to the operations team to support better decisions and gather feedback.

Environment: Python, PySpark, Tableau, MongoDB, Hadoop, SQL Server, SDLC, recommendation systems, Machine Learning Algorithms, text-mining process, A/B test
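
The clustering and ANOVA steps listed above can be illustrated with a minimal sketch. It is not the code used in this role: the data is synthetic, and the names `cluster_and_compare`, `features` and `outcome` are hypothetical. It assumes scikit-learn's `KMeans` for the 100-cluster partition and SciPy's `f_oneway` for a one-way ANOVA across the resulting clusters.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans


def cluster_and_compare(features, outcome, n_clusters=100, seed=0):
    """Partition rows with k-means, then run a one-way ANOVA on `outcome`
    across the resulting clusters (hypothetical helper for illustration)."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(features)
    # One group of outcome values per cluster that actually received rows.
    groups = [outcome[labels == k] for k in np.unique(labels)]
    f_stat, p_value = f_oneway(*groups)
    return labels, f_stat, p_value


# Synthetic stand-in for hotel attributes and a per-hotel outcome.
rng = np.random.default_rng(0)
features = rng.normal(size=(2000, 4))
outcome = 10.0 * features[:, 0] + rng.normal(size=2000)

labels, f_stat, p_value = cluster_and_compare(features, outcome)
print(f"clusters: {len(np.unique(labels))}, F = {f_stat:.2f}, p = {p_value:.3g}")
```

A small p-value here only indicates that the mean outcome differs across at least some clusters; it does not say which clusters differ.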

Confidential, Wilmington, DE

Data Scientist

Responsibilities:

Participated in all phases of research including data collection, data cleaning, data mining, developing models and visualizations.
Collaborated with data engineers and the operations team to collect data from internal systems to meet the analytical requirements.
Redefined many attributes and relationships and cleansed unwanted tables/columns using SQL queries.
Utilized the Spark SQL API in PySpark to extract and load data and to run SQL queries.
Performed data imputation using the scikit-learn package in Python.
Performed data processing using Python libraries like Numpy and Pandas.
Created data visualizations with the ggplot2 library in R to better understand customer behavior.
Plotted data in Tableau for dashboards and reports.
Implemented statistical modeling with the XGBoost machine learning package in R to determine predicted probabilities.
Shared the results with the operations team to support better decisions.

Environment: Python, R, SQL, Tableau, Spark, Machine Learning Software Package, recommendation systems
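
As an illustration of the imputation step listed above, the following sketch fills missing values with scikit-learn's `SimpleImputer`. The median strategy and the toy array `raw` are assumptions made for the example; the resume does not state which imputation strategy or data was used.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy matrix with missing entries (np.nan); values are made up for illustration.
raw = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
    [5.0, 30.0],
])

# Replace each missing value with the median of its column (assumed strategy).
imputer = SimpleImputer(strategy="median")
filled = imputer.fit_transform(raw)
print(filled)
```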

Confidential

Data Analyst

Responsibilities:

Extracted customer reviews from Excel and used Spark SQL queries for data analysis.
Used text mining on customer reviews to determine which services customers were most likely to focus on.
Conducted text mining each month so that the database stayed up to date with customer feedback.
Delivered analysis focused on promoting sales and provided an online A/B test.
Applied machine learning algorithms and statistical modeling, using t-tests in R to test for significant differences.
Presented findings and data to team to improve strategies and operations.

Environment: Spark, R, SQL, text-mining process, A/B test and machine learning algorithms
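
The t-test comparison behind an A/B test, as listed above, can be sketched as follows. The resume states that this work was done in R; Python with SciPy is used here only for illustration. The function name `compare_variants`, the synthetic samples and the 0.05 threshold are all assumptions, and Welch's unequal-variance t-test is one possible choice of test.

```python
import numpy as np
from scipy.stats import ttest_ind


def compare_variants(control, treatment, alpha=0.05):
    """Welch's two-sample t-test between two A/B groups (hypothetical helper)."""
    result = ttest_ind(treatment, control, equal_var=False)
    return result.statistic, result.pvalue, result.pvalue < alpha


# Synthetic per-customer metric for each variant.
rng = np.random.default_rng(42)
control = rng.normal(loc=100.0, scale=15.0, size=500)
treatment = rng.normal(loc=110.0, scale=15.0, size=500)

t_stat, p_value, significant = compare_variants(control, treatment)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, significant at 0.05: {significant}")
```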

Confidential, Des Moines, IA

Data Analyst

Responsibilities:

Involved in migration of various objects like stored procedures, tables, and views from various data sources to SQL Server.
Handled filing for accounts receivable, accounts payable, inventory, and operations management.
Prepared data and mapped ER diagrams to give the business a clear understanding.
Conducted Conceptual/Logical/Physical Modeling and coordinated with business executives.
Involved in Data Modeling using ERwin (Logical and Physical Design of Databases).
Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.
Presented findings and data to team to improve strategies and operations.

Environment: SQL, SQL Server, ER diagrams, Conceptual/Logical/Physical Modeling, ERwin
