
Data Scientist Resume

Sunnyvale, CA


  • 10+ years of software industry experience as a Data Scientist in Machine Learning, Data Mining with large sets of structured and unstructured data, Data Exploration, Feature Engineering, Predictive Modeling, Data Visualization and Algorithm Development
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python and Tableau.
  • Developed predictive models using Decision Trees, Random Forest, Naive Bayes, Logistic Regression, Cluster Analysis, Neural Networks, and ensemble methods such as bagging and boosting to improve the efficiency of the predictive models; good knowledge of Recommender Systems.
  • Worked on NLP, text mining and sentiment analysis of unstructured data extracted from social media platforms such as Facebook, Twitter and Reddit
  • Skilled in Advanced Regression Modelling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools and application of Statistical Concepts
  • Strong ability to analyze sets of data for signals, patterns, ways to group data to answer questions and solve complex data puzzles
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Well versed in statistical programming languages and adept at writing code in R and Python and on cloud platforms such as AWS ML and Azure ML
  • Extensively worked on using major statistical analysis tools such as R, SQL, Python, Advanced Excel, and MATLAB
  • Experience in designing visualizations using Tableau and ggplot2, and in publishing and presenting dashboards and storylines on web and desktop platforms.
  • Strong SQL programming skills, with experience in working with functions, packages, triggers and stored procedures
  • Skilled in using dplyr (R) and pandas (Python) for exploratory data analysis.
  • Strong experience in Software Development Life Cycle (SDLC) including Requirements Analysis, Design Specification and Testing as per Cycle in both Waterfall and Agile methodologies.
  • Quick learner of various new technical concepts in machine learning/data science/deep learning field
  • Excellent track record in delivering quality software on time to meet business priorities


Python (7+ years): Pandas, Numpy, Scikit-learn, data cleaning and imputation, creating machine learning pipelines, model selection and evaluation

SQL (7+ years): Designing and querying Relational Databases, Tables, Relations, Joins, Grouping, Stored Procedures, Functions, Triggers and Indexes Using PostgreSQL, SQLite and pgAdmin3

Visualization and Reporting (5+ years): Matplotlib, Tableau, ggplot2, dplyr, tidyr

Business Analytics (3+ years): Google Analytics, Google AdWords, Jira

Machine Learning (6+ years): Prediction with Lasso, Ridge, Linear and Logistic, classification with KNN, Decision Trees, Random Forest and gradient descent, clustering and time series analysis, CNN, RNN, LSTM, GRU

Model and Feature Evaluation (4+ years): ROC, cross-validation, bootstrapping, PCA, grid-search, A/B split testing

Deep Learning (1+ years): Familiar with TensorFlow, Distributed machine learning and running models on GPUs

Data Engineering (15+ years): SSIS, ETL, DTS

Software Development Life Cycle (8+ years)

Natural Language Processing and Topic Modeling (4+ years): Sentiment analysis, TF-IDF, NLTK


Programming Languages: Python (scikit-learn, pandas, numpy, scipy), R, Java, C++, C, SAS, SQL

Software/Tools: PostgreSQL, LIBSVM, ggplot2, dplyr, Weka

Cloud Services: AWS S3, EC2, Lambda, DynamoDB, ElastiCache, RDS, SNS, CloudWatch

Applications: Tableau, RStudio, MATLAB, MS Excel

Data Visualization: R, Python, Weka, Azure ML, Tableau

Machine Learning Algorithms: Classification, KNN, Regression, Random Forest, Clustering (K-means), Neural Nets, SVM, Bayesian Algorithms, Social Media Analytics, Sentiment Analysis, Market Basket Analysis, Bagging, Boosting

Domain Knowledge: Banking, Finance, Insurance, Healthcare, Energy


Confidential, Sunnyvale, CA

Data Scientist


  • Developed an asynchronous, event-based microservices system for Confidential that can serve millions of sports fans
  • Worked on data mining, data cleansing and transformation of game data prior to building machine learning models
  • Operationalized machine learning models and ad hoc analysis in R using microservices
  • Built a number of intelligent features powered by advanced data analytics in a very short period of time
  • Designed an effective analytical approach that streamlined game analysis and cut computation time by 50%
  • Worked closely with the CTO on data modeling, Confidential's data pipeline and data analytics
  • Scaled the Confidential platform to process millions of game data events in an AWS environment using EC2, S3, Lambda functions, SNS, etc.
  • Performed big data analytics with Spark, Kafka, Hadoop, Hive and Scala functional programming
  • Applied machine learning algorithms, including random forest, boosted trees, SVM, neural networks, and deep learning using CNTK and TensorFlow
  • Performed data preparation on a high-dimensional data sample (big data with large volume and variety) collected from live customer data.
  • Hands-on experience with deep learning packages and libraries: Caffe, TensorFlow, Theano, Keras, NumPy, etc.
  • Implemented a deep convolutional neural network (CNN) to identify images; also familiar with RNN, LSTM and GRU

Environment: R, Python, Machine Learning, SQL, AWS, Postgres, Tableau, Data Mining, TensorFlow, Hadoop, Spark, Scala, MapReduce, A/B Testing, Caffe, Azure ML, CNN, RNN, LSTM, GRU

Confidential, Sunnyvale, CA

Senior Software Engineer


  • Played a key role in adding machine learning intelligence to Confidential network and security products
  • Designed and built machine learning solutions for traffic prediction and outlier detection
  • Only employee from Confidential APAC to give a tech talk at Connect '15 at Confidential HQ
  • Served as technology and acquisition consultant to senior executives evaluating core machine learning companies
  • Collaborated with the CTO and chief architect on a machine learning architecture that enables dynamic security rules
  • Performed exploratory data analysis and report generation using Python visualization libraries (Seaborn, Matplotlib), which helped streamline the sales process, identify issues, reduce costs and increase revenue
  • Mentored and trained both entry level and mid-level career employees

Environment: R, Python, Machine Learning, AWS, Azure ML, SQL, Postgres, Tableau, Data Mining, Spark, TensorFlow, Caffe


Software Development Engineer


  • Applied machine learning techniques (regression and classification) to predict outcomes.
  • Responsible for design and development of advanced R/Python programs to transform datasets in preparation for modeling.
  • Provided R/SQL programming, with detailed direction, for data analysis that contributed to the final project deliverables. Responsible for data mining.
  • Worked in a team of programmers and data analysts to develop insightful deliverables that support data-driven marketing strategies.
  • Retrieved data from databases using SQL as per business requirements.
  • Experience coding in Java, C and C++.
  • Manipulated data using Base SAS programming.

Environment: Java, Python, Data Mining, Machine Learning, MATLAB, SQL, R, Postgres, C++, C


Data Analyst


  • Used web crawling and text mining techniques to score referral domains, generate keyword taxonomies, and assess the commercial value of bid keywords.
  • Developed a new hybrid statistical and data mining technique known as hidden decision trees and hidden forests
  • Coordinated meetings with vendors to define requirements and document system interaction agreements between client and vendor systems.
  • Scraped, merged and cleaned data from different websites for events and business opportunities

Environment: Informatica 9.0, Java, Python, R, ODS, OLTP, Oracle 10g, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, PL/SQL
