Data Scientist Resume

Atlanta, GA

SUMMARY:

  • Data Scientist/Data Analyst with 8+ years of experience transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale to massive volumes of structured and unstructured data, with expertise across a variety of industries including Retail and Communications.
  • Expert in Data Science process life cycle: Data Acquisition, Data Preparation, Modeling (Feature Engineering, Model Evaluation) and Deployment.
  • Experienced in statistical techniques including hypothesis testing, Principal Component Analysis (PCA), ANOVA, sampling distributions, chi-square tests, time-series analysis, discriminant analysis, Bayesian inference, and multivariate analysis.
  • Efficient in preprocessing data, including data cleaning, correlation analysis, imputation, visualization, feature scaling, and dimensionality reduction, using Python data science packages (Scikit-Learn, Pandas, NumPy).
  • Expertise in building various machine learning models using algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Support Vector Machines (SVM), Decision trees, KNN, K-means Clustering, Ensemble methods (Bagging, Gradient Boosting).
  • Experienced in tuning models using Grid Search, Randomized Grid Search, K-Fold Cross Validation.
  • Strong understanding of artificial neural networks, convolutional neural networks, and deep learning.
  • Skilled in using statistical methods including exploratory data analysis, regression analysis, regularized linear models, time-series analysis, cluster analysis, goodness of fit, Monte Carlo simulation, sampling, cross-validation, ANOVA, A/B testing, etc.
  • Applied statistical and econometric models on large datasets to measure results and outcomes and identified causal impact and attribution, predicted future performance of users or products.
  • Used effective project planning techniques to break down basic and occasionally moderately complex projects into tasks and ensure deadlines are kept.
  • Communicated data-driven insights and delivered action plans that steer business strategy and decision-making for one or more business segments.
  • Collaborated with the team in order to improve the effectiveness of business decisions through the use of data and machine learning/predictive modeling
  • Familiar with key data science concepts (statistics, data visualization, machine learning, etc.). Experienced in Python, R, MATLAB, SAS, and PySpark programming for statistical and quantitative analysis.
  • Formulated new methodologies and evolved existing ones to produce more accurate statistical outputs and improve recommendations to the business.
  • Knowledge of Time Series Analysis using AR, MA, ARIMA, GARCH, and ARCH models.
  • Experience in building production quality and large-scale deployment of applications related to natural language processing and machine learning algorithms.
  • Experience with high performance computing and building real-time analysis with Kafka and Spark Streaming. Knowledge using Qlik, Tableau, and Power BI.
  • Provided team members with testing, optimization and statistics support regarding design of experiments, recommendations and expertise to develop testing parameters.
  • Collaborated with Sales, Operations, and Tech teams to provide analytics and machine learning support.
  • Extensive experience working with RDBMS such as SQL Server and MySQL, and with NoSQL databases such as MongoDB. Generated data visualizations using tools such as Tableau, Python (Matplotlib, Seaborn), and R.
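The preprocessing, modeling, and tuning workflow described in the bullets above can be sketched with scikit-learn; the pipeline below is illustrative (synthetic data, hypothetical parameter grid), combining feature scaling, logistic regression, and Grid Search with 5-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a cleaned feature matrix
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),                 # feature scaling
    ("clf", LogisticRegression(max_iter=1000)),  # baseline classifier
])

# Hypothetical regularization grid, tuned via 5-fold cross-validation
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, round(grid.score(X_test, y_test), 3))
```

The same pipeline object can be swapped to any of the estimators listed above (SVM, Random Forest, KNN) without changing the tuning code.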

TECHNICAL SKILLS:

Data Sources: MS SQL Server, Hive, MySQL, Teradata

Statistical Methods: Hypothesis Testing, ANOVA, Principal Component Analysis (PCA), Time Series, Correlation (Chi-square test, covariance), Multivariate Analysis, Bayes' Theorem.

Machine Learning: Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, Support Vector Machines (SVM), K-Means Clustering, K-Nearest Neighbors (KNN), Gradient Boosting Trees, AdaBoost, PCA

Hadoop Ecosystem: Hadoop, Spark, MapReduce, Hive QL, HDFS

Data Visualization: Tableau, Python (Matplotlib, Seaborn), R(ggplot2), Power BI, QlikView, D3.js

Languages: Python (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), R, SQL, MATLAB, Spark, Java, C

Operating Systems: Linux, UNIX (shell scripting via PuTTY), Windows, Mac OS.

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta, GA

Data Scientist

  • Collaborated with data engineers and operation team to implement the ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Performed data analysis by retrieving the data from the Hadoop cluster.
  • Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
  • Explored and analyzed the customer specific features by using Matplotlib in Python and ggplot2 in R.
  • Performed data imputation using Scikit-learn package in Python.
  • Participated in features engineering such as feature generating, PCA, feature normalization and label encoding with Scikit-learn preprocessing.
  • Used Python (NumPy, SciPy, pandas, Scikit-learn, seaborn) and R to develop a variety of models and algorithms for analytic purposes.
  • Experimented and built predictive models including ensemble models using machine learning algorithms such as Logistic regression, Random Forests, and KNN to predict customer churn.
  • Conducted analysis of customer behavior and discovered customer value with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means, Gaussian Mixture Models, and Hierarchical Clustering.
  • Used F-score, AUC/ROC, confusion matrix, precision, and recall to evaluate the performance of different models.
  • Designed and implemented a recommendation system that leveraged Google Analytics data and machine learning models, using collaborative filtering to recommend courses to different customers.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.

Technology Stack: Hadoop, HDFS, Python, R, Tableau, Machine Learning (Logistic regression/ Random Forests/ KNN/ K-Means Clustering/ Hierarchical Clustering/ Ensemble methods/ Collaborative filtering), JIRA, GitHub, Agile/ SCRUM

Confidential, CA

Data Scientist

  • Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation and visualization to deliver data science solutions.
  • Built machine learning models to identify whether a user is legitimate using real-time data analysis and helped to flag fraud users based on the historical data using supervised learning techniques.
  • Extracted data from a SQL Server database, copied it into the HDFS file system, and used Hadoop tools such as Hive and Pig to retrieve the data required for building models.
  • Performed data cleaning including transforming variables and dealing with missing value and ensured data quality, consistency, integrity using Pandas, NumPy.
  • Tackled highly imbalanced Fraud dataset using sampling techniques like under sampling and oversampling with SMOTE (Synthetic Minority Over-Sampling Technique) using Python Scikit-learn.
  • Utilized PCA, t-SNE, and other feature engineering techniques to reduce the high-dimensional data; applied feature scaling and handled categorical attributes using Scikit-learn's one-hot encoder.
  • Developed various machine learning models such as Logistic regression, KNN, and Gradient Boosting with Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn in Python.
  • Used cross-validation to test the model on different batches of data, finding and optimizing the best parameters, which boosted the model's performance.
  • Formulated new methodologies and evolved existing ones to produce more accurate statistical outputs and improve recommendations to the business.
  • Experimented with Ensemble methods to increase the accuracy of the training model with different Bagging and Boosting methods.
  • Created and maintained reports to display the status and performance of deployed model and algorithm with Tableau.

Technology Stack: Machine Learning, Python (Scikit-learn, SciPy, NumPy, Pandas, Matplotlib, Seaborn), SQL Server, Hadoop, HDFS, Hive, Pig, Apache Spark/PySpark/MLlib, GitHub, Linux, Tableau.

Confidential, Duluth GA

Data Scientist

  • Led the development of large and complex machine learning applications that drove significant business value.
  • Developed and operationalized new data system components and designed an ML model to predict sales.
  • Queried and retrieved data from a SQL Server database to build the sample dataset, and cleaned data from different sources by automating the cleaning process with Python scripts.
  • In the preprocessing phase, used Pandas to handle missing data, cast data types, and merge or group tables for the EDA process.
  • Used PCA and other Scikit-Learn preprocessing techniques (feature engineering, feature normalization, label encoding) to reduce the high-dimensional data (> 150 features).
  • In data exploration stage used correlation analysis and graphical techniques in Matplotlib and Seaborn to get some insights about the buying pattern of the customers.
  • Experimented with predictive models, including Support Vector Machines (SVM) and Random Forests from Scikit-Learn, as well as XGBoost and LightGBM, to forecast inventory six months in advance.
  • Experimented with Machine Learning models and techniques, making appropriate trade-offs between performance and complexity.
  • Gathered project requirements, translated them into data design documents, and partnered with data engineering to integrate with existing systems using technologies like Python and SQL.
  • Presented prediction model results (78% accuracy) to a variety of internal audiences with appropriate regard for the level of technical detail.
  • Designed and implemented cross-validation and statistical tests, including k-fold, stratified k-fold, and hold-out schemes, to test and verify the model's significance.
  • Provided feedback to senior company leadership on strategic direction and high-impact initiatives based on the exploratory data analysis performed.
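The stratified k-fold validation scheme mentioned above can be sketched like this; the gradient boosting model and synthetic dataset are illustrative choices, not the production setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the cleaned sales dataset
X, y = make_classification(n_samples=400, n_features=15, random_state=0)

# Stratified folds preserve the class ratio in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(GradientBoostingClassifier(random_state=0),
                         X, y, cv=cv)
print(scores.round(3), "mean:", scores.mean().round(3))
```

Reporting the mean and spread of the fold scores, rather than a single hold-out number, is what makes the "accuracy of 78%" claim above verifiable.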

Technology Stack: SQL Server 2012/2014, Linux, Python 3.x (Scikit-Learn, NumPy, Pandas, Matplotlib), Machine Learning algorithms, Tableau.

Confidential

Data Analyst

  • Created a database in Microsoft Access starting from a blank database, built tables, entered the dataset and data types manually, and produced an ER diagram and basic SQL queries against that database.
  • Used Microsoft Excel to format data as tables and to visualize and analyze data with methods such as conditional formatting, duplicate removal, pivot and unpivot tables, charts, and sorting and filtering of the dataset.
  • Wrote application code to do SQL queries in MySQL and to organize useful information based on the business requirements
  • Applied concepts of probability distributions and statistical inference to the given dataset to unearth interesting findings using comparisons, t-tests, F-tests, R-squared, p-values, etc.
  • Performed Statistical Analysis and Hypothesis Testing in Excel by using Data Analysis Tool
  • Analyzed the partitioned and bucketed data and computed various metrics for reporting.
  • Integrated data from disparate sources and mined large datasets to identify patterns using predictive analysis.
  • Conducted intermediate and advanced statistical analysis, such as linear regression, ANOVA, time-series analysis, classification models, and forecasting of future sales.
  • Created Entity Relationship Diagrams and Data mapping for a better understanding of the dataset
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions addressing those needs.
  • Created customized business reports and shared insights with management.
  • Identified patterns, data quality issues, and opportunities and leveraged insights by communicating opportunities with business partners.
  • Performed module specific configuration duties for implemented applications to include establishing role-based responsibilities for user access, administration, maintenance, and support
  • Worked closely with internal business units to understand business drivers and objectives which can be enabled through effective application deployment
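A two-sample t-test of the kind described above can be reproduced outside Excel with SciPy; the two groups below are synthetic and the effect size is hypothetical.

```python
import numpy as np
from scipy import stats

# Synthetic samples, e.g. baseline vs. post-campaign sales figures
rng = np.random.default_rng(1)
group_a = rng.normal(100, 15, 200)  # baseline
group_b = rng.normal(110, 15, 200)  # hypothetical uplift of 10

# Independent two-sample t-test: is the difference in means significant?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print("t =", round(t_stat, 3), " p =", round(p_value, 6))
```

Excel's Data Analysis ToolPak produces the same t statistic and p-value; scripting the test just makes it repeatable across refreshed datasets.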

Technology Stack: SQL Server, Tableau, Excel, SQL Server Management Studio, Microsoft BI Suite, SQL, Visual Studio.
