Data Scientist Resume
New York, NY
SUMMARY
- Highly efficient Data Scientist with 6+ years of experience in Statistical Analysis, Machine Learning, and Data Mining on large structured and unstructured data sets in the banking, travel services, and manufacturing industries.
- Experienced across the full software development life cycle (SDLC) with Agile and Scrum methodologies.
- Skilled in Advanced Regression Modeling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
- Proficient in Predictive Modeling, Data Mining Methods, Factor Analysis, ANOVA, Hypothesis testing, normal distribution and other advanced statistical and econometric techniques.
- Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
- Experienced in Machine Learning and Statistical Analysis with Python's scikit-learn.
- Experienced in using Python to manipulate data for loading and extraction, and worked with Python libraries such as Matplotlib, NumPy, SciPy and Pandas for data analysis.
- Worked with complex applications such as R, SAS, MATLAB and SPSS to develop neural networks and perform cluster analysis.
- Strong SQL programming skills, with experience in working with functions, packages and triggers.
- Experienced in developing applications with the Visual Basic for Applications (VBA) and VB programming languages.
- Worked with RDBMS including MySQL, DB2 and Oracle SQL.
- Worked with NoSQL databases including HBase, Cassandra and MongoDB.
- Experienced in Big Data with Hadoop, HDFS, MapReduce, and Spark.
- Experienced in Spark 1.6, Spark SQL and PySpark.
- Experienced in Data Integration, Validation and Data Quality controls for ETL processes and Data Warehousing using MS Visual Studio, SSIS, SSAS and SSRS.
- Proficient in Tableau and R-Shiny data visualization tools to analyze and obtain insights into large datasets, create visually powerful and actionable interactive reports and dashboards.
- Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
- Worked in development environments using Git and virtual machines (VM).
- Excellent communication skills. Works successfully in fast-paced, multitasking environments, both independently and as part of a collaborative team; a self-motivated, enthusiastic learner.
TECHNICAL SKILLS
- Python 2.x/3.x, R 3.x, SQL, SAS 9.x, Visual Basic for Applications, VB.NET, SPSS, Minitab, JMP
- Hadoop 2.x, MapReduce, HBase, Spark 2.x, PySpark
- MySQL 5.x, Oracle SQL, HBase 0.98, Cassandra, MongoDB 3.x
- Pandas, NumPy, SciPy, scikit-learn, Matplotlib, ggplot, ggplot2, dplyr, plyr, gsub, Spark SQL
- Regression analysis, classification, K-Means Clustering, Bayesian Methods, Decision Trees, Random Forests, Support Vector Machines, neural networks, Logistic Regression, Data Mining Methods, Factor Analysis, Cluster Analysis, recommendation systems
- Regression Models, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Cross-Validation, Analysis of Variance (ANOVA), Z-test, T-test, Hypothesis testing, Normal distribution
PROFESSIONAL EXPERIENCE
Confidential, New York, NY
Data Scientist
Responsibilities:
- Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC).
- Collected, exported, merged and massaged data from multiple sources and platforms, including SQL Server, to meet the analytical requirements.
- Worked with cross-functional teams (including the data engineering team) to rapidly extract data from MongoDB through the MongoDB Connector for Hadoop.
- Performed data processing in PySpark.
- Performed data cleaning and feature selection using Numpy and Pandas packages in Python.
- Performed partitional clustering into 100 clusters with k-means using the scikit-learn package in Python, grouping similar hotels for a search together.
- Used Python to perform ANOVA tests to analyze the differences among hotel clusters.
- Implemented various machine learning algorithms and statistical models, such as Decision Tree, Naive Bayes, Logistic Regression and Linear Regression, in Python to determine the accuracy rate of each model.
- Determined the most accurate prediction model based on the accuracy rate.
- Used text mining of reviews to determine what customers focused on.
- Delivered analysis support for hotel recommendation and provided an online A/B test.
- Designed Tableau bar graphs, scatter plots, and geographical maps to create detailed summary reports and dashboards.
- Developed a hybrid model to improve the accuracy rate.
- Delivered the results to the operations team for better decisions and feedback.
Environment: Python, PySpark, Tableau, MongoDB, Hadoop, SQL Server, SDLC, recommendation systems, Machine Learning Algorithms, text-mining process, A/B test
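The clustering and ANOVA steps listed above can be illustrated with a minimal sketch. It is not the project's code: the data is synthetic, and the names cluster_and_test, hotel_features and booking_rate are hypothetical, chosen only for illustration. It assumes k-means from scikit-learn followed by a one-way ANOVA from SciPy on a single outcome across the resulting clusters.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans


def cluster_and_test(features, outcome, n_clusters=100, seed=0):
    """Group rows with k-means, then run a one-way ANOVA on `outcome` across clusters."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(features)
    # One array of outcome values per cluster; skip any cluster with no members.
    groups = [outcome[labels == k] for k in range(n_clusters)]
    groups = [g for g in groups if len(g) > 0]
    f_stat, p_value = f_oneway(*groups)
    return labels, f_stat, p_value


# Synthetic stand-in data (assumption): 2000 "hotels" with 4 numeric features.
rng = np.random.default_rng(0)
hotel_features = rng.normal(size=(2000, 4))
# Hypothetical outcome that depends on the first feature plus a little noise.
booking_rate = hotel_features[:, 0] + rng.normal(scale=0.1, size=2000)

labels, f_stat, p_value = cluster_and_test(hotel_features, booking_rate)
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
```

A small p-value here only says that the mean outcome differs between at least two clusters; it does not say which ones.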
Confidential, Wilmington, DE
Data Scientist
Responsibilities:
- Participated in all phases of research including data collection, data cleaning, data mining, developing models and visualizations.
- Collaborated with data engineers and the operations team to collect data from internal systems to fit the analytical requirements.
- Redefined many attributes and relationships and cleansed unwanted tables/columns using SQL queries.
- Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
- Performed data imputation using Scikit-learn package in Python.
- Performed data processing using Python libraries like Numpy and Pandas.
- Created data visualizations with the ggplot2 library in R to better understand customer behavior.
- Visually plotted data using Tableau for dashboards and reports.
- Implemented statistical modeling with the XGBoost machine learning package in R to determine predicted probabilities.
- Delivered the results to the operations team for better decisions.
Environment: Python, R, SQL, Tableau, Spark, Machine Learning Software Package, recommendation systems
Confidential
Data Analyst
Responsibilities:
- Extracted customer reviews from Excel and used Spark SQL queries to analyze the data.
- Used text mining on customer reviews to determine which services customers were most likely to focus on.
- Ran the text-mining process each month to keep the database up to date with customer feedback.
- Delivered analysis focused on promoting sales and provided an online A/B test.
- Applied machine learning algorithms and statistical modeling, using t-tests in R to check for significant differences.
- Presented findings and data to team to improve strategies and operations.
Environment: Spark, R, SQL, text-mining process, A/B test and machine learning algorithms
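The t-test comparison used for the A/B test above can be sketched as follows. The original work was done in R; this illustration uses Python with SciPy instead, on synthetic data. The function name ab_test, the group sizes and the 0.05 threshold are assumptions, not details from the project.

```python
import numpy as np
from scipy.stats import ttest_ind


def ab_test(control, variant, alpha=0.05):
    """Welch's two-sample t-test comparing the means of two A/B groups."""
    t_stat, p_value = ttest_ind(control, variant, equal_var=False)
    return {
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        "significant": bool(p_value < alpha),
    }


# Synthetic stand-in data (assumption): a sales metric for each group.
rng = np.random.default_rng(42)
control = rng.normal(loc=100.0, scale=15.0, size=500)
variant = rng.normal(loc=110.0, scale=15.0, size=500)

result = ab_test(control, variant)
print(result)
```

Welch's variant (equal_var=False) is chosen here because it does not assume the two groups share the same variance.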
Confidential, Des Moines, IA
Data Analyst
Responsibilities:
- Involved in migration of various objects like stored procedures, tables, and views from various data sources to SQL Server.
- Handled filing for accounts receivable, accounts payable, inventory, and operations management.
- Prepared data and mapped ER diagrams to give the business a clear understanding.
- Conducted Conceptual/Logical/Physical Modeling and coordinated with business executives.
- Involved in Data Modeling using ERwin (Logical and Physical Design of Databases).
- Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.
- Presented findings and data to team to improve strategies and operations.
Environment: SQL, SQL Server, ER diagrams, Conceptual/Logical/Physical Modeling, ERwin
