
Data Scientist Resume

Atlanta, GA


  • Over 7 years of experience in Machine Learning, Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, Web Crawling, Web Scraping, Statistical Modeling, and Natural Language Processing (NLP)
  • Adept in statistical programming languages such as Python and R, as well as big data technologies such as Hadoop and Hive
  • Proficient in managing the entire data science project life cycle and actively involved in all of its phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (Decision Trees, Regression Models, Neural Networks, Support Vector Machines (SVM), Clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and k-fold cross-validation, and data visualization
  • Adept at, and with a deep understanding of, statistical modeling, multivariate analysis, model testing, problem analysis, model comparison, and validation
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data
  • Skilled in data parsing, data manipulation, and data preparation using methods such as describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape
  • Experienced in using various Python and R packages, including ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, Matplotlib, scikit-learn, Beautiful Soup, and rpy2
  • Extensive experience in text analytics, generating data visualizations using Python and R, and creating dashboards using tools such as Tableau
  • Hands-on experience with big data tools such as Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, clustering, neural networks, and Principal Component Analysis
  • Good knowledge of Proofs of Concept (PoCs) and gap analysis; gathered the necessary data for analysis from different sources and prepared it for exploration using data munging
  • Good industry knowledge, analytical and problem-solving skills, and the ability to work well both in a team and individually
  • Highly creative, innovative, committed, intellectually curious, business savvy with good communication and interpersonal skills
  • Deep understanding of MapReduce with Hadoop and Spark, and good knowledge of the big data ecosystem, including Hadoop 2.0 (HDFS, Hive, Pig, Impala) and Spark (Spark SQL, Spark MLlib, Spark Streaming)
  • Excellent track record building and publishing customized interactive reports and dashboards with custom parameters and user filters using Tableau
  • Good understanding of web design based on HTML5, CSS3, and JavaScript
  • Excellent understanding of the Systems Development Life Cycle (SDLC), Agile, Scrum, and Waterfall
  • Experience with the version control tool Git
  • Effective team player with strong communication and interpersonal skills and a strong ability to adapt to and learn new technologies and business lines rapidly
  • Extensive experience in data visualization, including producing tables, graphs, and listings using various procedures and tools such as Tableau


Languages: Python, R, C, C++

Machine Learning: Linear Regression, Logistic Regression, Naïve Bayes, SVM, Decision Trees, Random Forest, Boosting, K-Means, Bagging, etc.

Machine Learning Libraries: pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn

Deep Learning Frameworks: TensorFlow, Keras

Data Analysis and Visualization: NumPy, pandas, Matplotlib, Seaborn, scikit-learn, Excel, Tableau

Databases: Oracle, MySQL, SQL Server Management Studio

Front End Technologies: CSS, HTML, XML, JSON and jQuery

Environments: Jupyter, RStudio, Anaconda, Spyder, Python Console, PyCharm


Confidential - Atlanta, GA



  • Analyze and prepare data and identify patterns in the dataset by applying historical models; collaborate with Senior Data Scientists to understand the data
  • Perform data manipulation, data preparation, normalization, and predictive modeling; improve efficiency and accuracy by evaluating models in Python and R
  • This project focused on customer segmentation through machine learning and statistical modeling, including building predictive models and generating data products to support customer segmentation
  • Used Python and R to improve the models, and upgraded all of the models to improve the product
  • Develop a pricing model for various bundled product and service offerings to optimize and predict gross margin
  • Built a price elasticity model for various bundled product and service offerings
  • Under the supervision of a Senior Data Scientist, performed data transformations to rescale and normalize variables
  • Developed a predictive causal model using the annual failure rate and standard cost basis for the new bundled service offering
  • Design and develop analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, for product recommendation and allocation planning
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and R, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction
  • Partnered with the Sales and Marketing teams and collaborated with a cross-functional team to frame and answer important data questions, prototyping and experimenting with ML/DL algorithms and integrating them into production systems for different business needs
  • Worked on multiple structured and unstructured datasets containing two billion values, covering web application usage and online customer surveys
  • Hands-on experience with the Amazon Redshift platform
  • Performed data cleaning, applying backward and forward filling to handle missing values
  • Designed, built, and deployed a set of Python modeling APIs for customer analytics that integrate multiple machine learning techniques to predict various user behaviors and support multiple marketing segmentation programs
  • Segmented the customers based on demographics using K-means Clustering
  • Explored different regression and ensemble models in machine learning to perform forecasting
  • Presented Power BI dashboards to senior management to provide deeper insights
  • Used classification techniques, including Random Forest and Logistic Regression, to quantify the likelihood of each user making a referral
  • Applied boosting to the predictive model to improve its efficiency
  • Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using R, Tableau, and Power BI
  • Collaborate with project managers and business owners to understand their organizational processes and help design the necessary reports

Environment: MS SQL Server, R/RStudio, SQL Enterprise Manager, Python, Redshift, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office, Outlook, SAS E-Miner

Confidential - West Chester, PA



  • Used various approaches to collect the business requirements and worked with the business users for ETL application enhancements by conducting various Joint Requirements Development (JRD) sessions to meet the job requirements
  • Performed exploratory data analysis like calculation of descriptive statistics, detection of outliers, assumptions testing, factor analysis, etc., in Python and R
  • Built models based on domain knowledge and customer business objectives
  • Extracted data from the database using Excel/Access, SQL procedures and created Python and R datasets for statistical analysis, validation and documentation
  • Developed an extensive understanding of BI and analytics focused on the consumer and customer space
  • Innovated with and leveraged machine learning, data mining, and statistical techniques to create new, scalable solutions for business problems
  • Performed data profiling with SQL across a complex internal database to assess data quality
  • Improved sales and logistics data quality through data cleaning using NumPy, SciPy, and pandas in Python
  • Designed data profiles for processing, including running SQL and PL/SQL queries and using Python and R for data acquisition and data integrity checks, which consisted of dataset comparisons and dataset schema checks
  • Used R to generate regression models to provide statistical forecasting
  • Conducted data and statistical analysis and generated monthly and quarterly Transaction Performance Reports for all transactional data from the U.S., Canada, and Latin America markets using SQL Server and BI tools such as SQL Server Reporting Services (SSRS) and SQL Server Integration Services (SSIS)
  • Used drill-downs, filter actions, and highlight actions to develop dashboards in Tableau
  • Implemented Key Performance Indicator (KPI) objects, actions, hierarchies, and attribute relationships for added functionality and better performance of the SSAS warehouse
  • Applied Clustering Algorithms such as K-Means to categorize customers into certain groups
  • Performed data management, including using SQL Server Reporting Services to develop reusable code and an automated reporting system, and designed user acceptance tests to give end users an opportunity to provide constructive feedback
  • Used Tableau to design various charts and tables for data analysis and created analytical dashboards to present the data to managers
  • Created a revenue forecasting model
  • Applied association rule mining and chain models to identify hidden patterns and rules in remedy ticket analysis to aid decision making
  • Segmented the ABO population and developed a demographic profile for each segment
  • Isolated customer behavioral patterns by analyzing millions of customer records over time and correlating multiple customer attributes
  • Empowered decision makers with data analysis dashboards using Tableau and Power BI

Environment: R/RStudio, SAS, SSRS, SSIS, Oracle Database 11g, Oracle BI tools, Tableau, MS Excel, Python, Naive Bayes, SVM, K-means, ANN, Regression, MS Access, SQL Server Management Studio, SAS E-Miner

Confidential - Dallas, TX



  • Involved in the complete Software Development Life Cycle (SDLC) process, analyzing business requirements and understanding the functional workflow of information from source systems to destination systems
  • Participated in a highly immersive data science program covering data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Unix commands, NoSQL, and Hadoop
  • Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, NLTK in Python for developing various machine learning algorithms
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python
  • Analyzed sentiment data and detected trends in customer usage and other services
  • Analyzed and prepared data and identified patterns in the dataset by applying historical models
  • Collaborated with Senior Data Scientists to understand the data
  • Used Python and R scripting to implement machine learning algorithms for prediction and forecasting
  • Used Python and R scripting to visualize the data and implemented machine learning algorithms
  • Developed R packages with a Shiny interface
  • Used predictive analysis to create models of customer behavior that correlate positively with historical data, and used these models to forecast future results
  • Predicted user preference based on segmentation using Generalized Additive Models combined with feature clustering to understand non-linear patterns between user segments and related monthly platform usage features (time series data)
  • Perform data manipulation, data preparation, normalization, and predictive modeling
  • Improved efficiency and accuracy by evaluating models in Python and R
  • Used Python and R scripts to improve the model
  • Applied various machine learning algorithms and statistical models, such as Decision Trees, Random Forest, Regression Models, neural networks, SVM, and clustering, to identify volume using the scikit-learn package
  • Performed data cleaning, applying backward and forward filling to handle missing values
  • Developed a predictive model and validated a neural network classification model to predict the feature label
  • Applied boosting to the predictive model to improve its efficiency
  • Presented Power BI and Tableau dashboards to senior management to provide deeper insights
  • Hands-on experience with Hive, Hadoop, HDFS, and related big data technologies

Environment: R/RStudio, Python, Tableau, Hadoop, Hive, MS SQL Server, MS Access, MS Excel, Outlook, Power BI


Data Analyst


  • Participated in the test environment setup, ensuring that the facilities, test tools, and scripts were in place to successfully perform the required testing effort
  • Acted as a liaison between the Oracle deployment team and the business finance group.
  • Interviewed various personnel including broker dealers and traders to understand the current process and the future requirements
  • Tested user interface and navigation controls of the application using QuickTest Pro
  • Handled exceptional situations in test scripts using Recovery Scenario Manager in QuickTest Pro
  • Developed baseline scripts in VBScript for performing regression testing on future releases of the application
  • Developed test scripts in VBScript for data-driven testing. Executed the test scripts and analyzed the results
  • Developed PL/SQL functions, procedures, and other Oracle PL/SQL programs
  • Verified the application's functionality on different configurations with QuickTest Pro
  • Handled dynamic objects using regular expressions in QuickTest Pro
  • Performed manual testing and developed automated test scripts using VBScript in QuickTest Pro
  • Maintained various versions of Test Scripts and performed various testing strategies
  • Performed back-end testing using database checkpoints in QuickTest Pro
  • Created and maintained SQL Scripts and Unix Shell scripts to perform back-end testing
  • Responsible for communicating with a team of 10 people working offshore
  • Involved in Data Analysis, Data Modeling and Logical Data Specification
  • Involved in data movement between systems and validated the business requirements
  • Worked with the Agile Scrum and Waterfall models

Environment: Oracle, Windows, UML, MS-Visio, Toad, QC 11.0, QTP 11.0, VS 2008, HP ALM
