Data Scientist Resume

New York, NY

SUMMARY

  • Highly efficient Data Scientist with 6+ years of experience in Statistical Analysis, Machine Learning, and Data Mining with large sets of structured and unstructured data in the banking, travel services, and manufacturing industries.
  • Experienced across the full software development life cycle (SDLC) with Agile and Scrum methodologies.
  • Skilled in Advanced Regression Modeling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
  • Proficient in Predictive Modeling, Data Mining Methods, Factor Analysis, ANOVA, Hypothesis Testing, normal distribution and other advanced statistical and econometric techniques.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
  • Experienced in Machine Learning and Statistical Analysis with Python scikit-learn.
  • Experienced in using Python to manipulate data for loading and extraction, and worked with Python libraries such as Matplotlib, NumPy, SciPy and Pandas for data analysis.
  • Worked with statistical packages such as R, SAS, MATLAB and SPSS to develop neural network and cluster analysis models.
  • Strong SQL programming skills, with experience in working with functions, packages and triggers.
  • Experienced in developing applications with Visual Basic for Applications (VBA) and VB.
  • Worked with RDBMS including MySQL, DB2 and Oracle SQL.
  • Worked with NoSQL databases including HBase, Cassandra and MongoDB.
  • Experienced in Big Data with Hadoop, HDFS, MapReduce, and Spark.
  • Experienced in Spark 1.6, Spark SQL and PySpark.
  • Experienced in Data Integration, Validation and Data Quality controls for ETL processes and Data Warehousing using MS Visual Studio SSIS, SSAS, SSRS.
  • Proficient in Tableau and R-Shiny data visualization tools to analyze and obtain insights into large datasets, create visually powerful and actionable interactive reports and dashboards.
  • Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
  • Worked in development environments using Git and virtual machines (VMs).
  • Excellent communication skills. Self-motivated, enthusiastic learner who works successfully in fast-paced, multitasking environments, both independently and as part of a collaborative team.

TECHNICAL SKILLS

  • Python 2.x/3.x, R 3.x, SQL, SAS 9.x, Visual Basic for Applications, VB.NET, SPSS, Minitab, JMP
  • Hadoop 2.x, MapReduce, HBase, Spark 2.x, PySpark
  • MySQL 5.x, Oracle SQL, HBase 0.98, Cassandra, MongoDB 3.x
  • Pandas, numpy, scipy, scikit-learn, matplotlib, ggplot, ggplot2, dplyr, plyr, gsub, Spark SQL
  • Regression analysis, classification, K-Means Clustering, Bayesian Methods, Decision Trees, Random Forests, Support Vector Machines, neural networks, Logistic Regression, Data Mining Methods, Factor Analysis, Cluster Analysis, recommendation
  • Regression Models, Confidence Intervals, Bayes Law, Principal Component Analysis (PCA), Cross-Validation, Analysis of variance (ANOVA), Z-test, T-test, Hypothesis testing, Normal distribution

PROFESSIONAL EXPERIENCE

Confidential, New York, NY

Data Scientist

Responsibilities:

  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC).
  • Collected, exported, merged and massaged data from multiple sources and platforms, including SQL Server, to meet the analytical requirements.
  • Worked with cross-functional teams (including the data engineering team) to rapidly extract data from MongoDB through the MongoDB Connector for Hadoop.
  • Performed data processing in PySpark.
  • Performed data cleaning and feature selection using the NumPy and Pandas packages in Python.
  • Performed partitional clustering into 100 clusters with k-means using the scikit-learn package in Python, so that similar hotels for a search are grouped together.
  • Used Python to perform ANOVA test to analyze the differences among hotel clusters.
  • Applied various machine learning algorithms and statistical models, including Decision Tree, Naive Bayes, Logistic Regression and Linear Regression, in Python to determine the accuracy rate of each model.
  • Selected the most accurate prediction model based on the accuracy rate.
  • Used text mining of reviews to determine what customers focused on most.
  • Delivered analysis support for hotel recommendation and provided an online A/B test.
  • Designed Tableau bar graphs, scatter plots, and geographical maps to create detailed summary reports and dashboards.
  • Developed a hybrid model to improve the accuracy rate.
  • Delivered the results to the operations team to support better decisions and gather feedback.

Environment: Python, PySpark, Tableau, MongoDB, Hadoop, SQL Server, SDLC, recommendation systems, Machine Learning Algorithms, text-mining process, A/B test

Confidential, Wilmington, DE

Data Scientist

Responsibilities:

  • Participated in all phases of research including data collection, data cleaning, data mining, developing models and visualizations.
  • Collaborated with data engineers and the operations team to collect data from internal systems to fit the analytical requirements.
  • Redefined many attributes and relationships and cleansed unwanted tables/columns using SQL queries.
  • Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
  • Performed data imputation using Scikit-learn package in Python.
  • Performed data processing using Python libraries such as NumPy and Pandas.
  • Visualized data with the ggplot2 library in R to better understand customer behavior.
  • Visually plotted data using Tableau for dashboards and reports.
  • Implemented statistical modeling with the XGBoost machine learning package in R to determine predicted probabilities.
  • Delivered the results to the operations team to support better decisions.

Environment: Python, R, SQL, Tableau, Spark, Machine Learning Software Package, recommendation systems

Confidential

Data Analyst

Responsibilities:

  • Extracted customer reviews from Excel and used Spark SQL queries for data analysis.
  • Applied text mining to customer reviews to determine which services customers were most likely to focus on.
  • Repeated the text-mining process each month so that the database stayed up to date with customer feedback.
  • Delivered analysis focused on promoting sales and provided an online A/B test.
  • Applied machine learning algorithms and statistical modeling in R, using t-tests to assess whether differences were significant.
  • Presented findings and data to the team to improve strategies and operations.

Environment: Spark, R, SQL, text-mining process, A/B test and machine learning algorithms

Confidential, Des Moines, IA

Data Analyst

Responsibilities:

  • Involved in migration of various objects such as stored procedures, tables, and views from various data sources to SQL Server.
  • Handled filing for accounts receivable, accounts payable, inventory, and operations management.
  • Prepared data and mapped ER diagrams to give the business a clear understanding.
  • Conducted Conceptual/Logical/Physical Modeling and coordinated with business executives.
  • Involved in Data Modeling using ERwin (Logical and Physical Design of Databases).
  • Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.
  • Presented findings and data to the team to improve strategies and operations.

Environment: SQL, SQL Server, ER diagrams, Conceptual/Logical/Physical Modeling, ERwin
