
Machine Learning Developer/Data Scientist Resume

San Jose, CA


  • 4+ years of experience in Data Science, Data Analysis, Machine Learning, Data Visualization, Natural Language Processing, and Deep Learning.
  • Solid understanding of system development methodologies and broad knowledge of Data Mining, Supervised Learning, Unsupervised Learning, recommendation systems, and association rule mining.
  • Extensive experience working with large datasets and classifying data.
  • Able to analyze and extract relevant information from large amounts of data to help automate self-monitoring and self-diagnosis and to optimize key processes. Highly proficient in statistical and programming tools/languages such as R, Python, C, C++, and MATLAB.
  • Specialties: Data Mining, Natural Language Processing, Collaborative Filtering, Cosine Similarity, TF-IDF, N-Grams, Vector Space Model, Random Forest, AdaBoost, Decision Trees, Support Vector Machine, Time Series Analysis, Modeling and Forecasting (including Regression Forecasting, ARIMA, Random Walk, Spectral Analysis, and Stationary Models), Multivariate Data Analysis (including Discriminant, Factor, and Cluster Analysis), Forecasting with Predictors, Event and Intervention Analysis, Regression Analysis and Modeling, Bayesian Models, Design and Analysis of Experiments, Neural Network-based Modeling, and Project Management.
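The Specialties bullet lists TF-IDF, Cosine Similarity, and the Vector Space Model together. As a hedged illustration of how those pieces combine (not code from any project in this resume), the sketch below assumes a naive whitespace tokenizer and the smoothed IDF formula idf = ln((1 + n) / (1 + df)) + 1, which is the default in scikit-learn's TfidfVectorizer; the function names and sample ticket texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    # Assumption: a naive lowercase whitespace tokenizer (no stemming/lemmatization).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Term frequency times smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical customer tickets used only for illustration.
tickets = [
    "slow network speed complaint",
    "network speed is slow today",
    "billing question about invoice",
]
vecs = tfidf_vectors(tickets)
print(cosine_similarity(vecs[0], vecs[1]))  # shared terms: positive score
print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms: 0.0
```

Each document becomes a point in a term-weighted vector space, so comparing two documents reduces to comparing the angle between their vectors; scaling term frequency by document length does not change the cosine.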


Data Analytics Tools: Python (NumPy, SciPy, pandas, NLTK, TextBlob, Matplotlib, scikit-learn, Seaborn, Plotly), R (caret, Weka, ggplot2).

Data Visualization: Tableau, Visualization packages, Microsoft Office.

Software & Tools: MS Project, Excel, Jupyter, MATLAB, SAS Enterprise Miner, Excel Miner, RapidMiner

Databases: MySQL, Oracle SQL, MS Access.

Languages: Python, R, C, C++

Collaborative Filtering, Cosine Similarity, TF-IDF, N-Grams, Vector Space Model, Lemmatization, Stemming

Machine Learning: Regression, Classification, Clustering, Association, Simple Linear Regression, Multiple Linear Regression, Decision Trees, Random Forest, Logistic Regression, K-NN, SVM, Recommendation Systems, Association Rules, Apriori
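The Machine Learning line lists Association Rules and Apriori. A minimal pure-Python sketch of Apriori frequent-itemset mining and rule confidence follows, assuming transactions are sets of item names; the function names and basket data are hypothetical, and support is counted by brute force rather than with an optimized implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    items = {item for b in baskets for item in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent; both itemsets must be frequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical market baskets used only for illustration.
orders = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(orders, min_support=0.5)
print(confidence(freq, {"butter"}, {"bread"}))  # 1.0
```

The prune step is what makes Apriori efficient: a k-itemset is only counted if all of its (k-1)-subsets already passed the support threshold.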


Confidential, San Jose, CA

Machine Learning Developer/Data Scientist


  • Collaborated with the business analyst on project requirements and explored data using SQL database queries, search techniques, web services, etc.
  • Created and deployed models that detect malware on a machine. Performed extensive analysis of machine performance and recommended to the manager which machine features to target.
  • Developed insights into the performance and difficulty of each feature.
  • Prepared data using dimensionality reduction techniques (PCA, t-SNE) to reduce the number of features, and cleaned the data using Python libraries.
  • Applied advanced statistical techniques (Bayesian methods, sampling, and experimental design) when running machine learning algorithms on heterogeneous data.
  • Used advanced analytical tools and programming languages such as Python (NumPy, pandas, SciPy, scikit-learn) for data analysis.
  • Built and evaluated machine learning models on various types of datasets using algorithms and statistical modeling techniques such as classification, regression, anomaly detection, and sequential pattern discovery, implemented with Python libraries.
  • Applied post-pruning techniques with Python libraries to reduce the complexity of the final classifier, improving predictive performance by reducing overfitting.
  • Performed predictive analytics using machine learning algorithms, especially supervised methods (SVM, Logistic Regression, Naïve Bayes) and ensemble methods.
  • Achieved improved predictive performance (81% accuracy) using ensemble methods such as bootstrap aggregation (bagging) and boosting (LightGBM, gradient boosting).
  • Read data in different formats, including JSON (from APIs), XML, CSV, Rich Text Format (.rtf), Open Document Text (.odt), and HTML (.htm, .html).
  • Visualized data in graphs and reports using the matplotlib, seaborn, and pandas packages in Python to identify missing values, outliers, and correlations between features for analytical models.
  • Used Tableau to present model results as interactive dashboards.
  • Created user stories, sub-tasks, and epics in JIRA for the project, and used a Kanban board to track progress through the different phases of the lifecycle.
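The ensemble bullets in this role mention bootstrap aggregation (bagging). A minimal sketch of the idea follows, assuming one-feature decision stumps as base learners and a toy labeled dataset (both hypothetical, and not the LightGBM/gradient-boosting setup described above): each stump is trained on a bootstrap sample drawn with replacement, and predictions are combined by majority vote.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature threshold rule that makes the fewest training errors."""
    best = None
    for threshold in sorted(set(xs)):
        for positive_above in (True, False):
            # Predict 1 when x > threshold (or when x <= threshold if flipped).
            errors = sum(
                1 for x, y in zip(xs, ys)
                if int((x > threshold) == positive_above) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, positive_above)
    _, threshold, positive_above = best
    return lambda x: int((x > threshold) == positive_above)


def bagging_fit(xs, ys, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the stumps by majority vote (odd n_estimators avoids ties)."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


# Hypothetical one-feature training data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(xs, ys)
print(bagging_predict(models, 0), bagging_predict(models, 10))
```

Averaging many learners fit on resampled data mainly reduces variance; boosting, by contrast, fits learners sequentially to correct earlier errors and is not shown here.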

Environment: MySQL Workbench 5.7, Python 3.6.3, Jupyter notebook 5.0.0, Tableau 10.4, JIRA, Machine Learning, Classification, Regression

Confidential, Dallas, TX

Data Analyst/Machine Learning Engineer


  • Developed, tested and productionized a machine learning system for UI optimization, boosting CTR from 18% to 24% for the company’s website.
  • Applied core NLP and similarity techniques (Collaborative Filtering, Cosine Similarity, TF-IDF, Correlation, N-Grams, Vector Space Model) to collect and analyze a large volume of data on customer ratings, tickets, and other miscellaneous features.
  • Implemented an end-to-end platform for network provider services using machine learning classification and regression algorithms (Naïve Bayes, SVM, Logistic Regression).
  • Developed predictive models and learning algorithms to track customer lifetime value, lead scoring, retention, attribution, and propensity.
  • Worked closely with business partners to identify, develop, and implement targeting improvements across various network services that led to incremental revenue.
  • Implemented a predictive model that forecasts whether a session involves a click on an ad/promotion, helping the business get the most out of the large volume of clickstream data it had collected from ads/promotions for various customer services placed across many web pages.
  • Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
  • Performed data preprocessing on large datasets containing millions of rows, including missing-data imputation, noise handling, and data consolidation.
  • Used Apache Spark with Python to develop and execute big data analytics and machine learning applications, and executed machine learning use cases with Spark ML and MLlib.
  • Generalized feature extraction in the machine learning pipeline, which improved efficiency throughout the system.
  • Designed several high-performance prediction models using Python packages such as pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, pandas-datareader, and statsmodels.
  • Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Developed several ready-to-use machine learning model templates based on given specifications, with clear descriptions of each model's purpose and input variables.
  • Developed architecture around models for multi-task learning, distributed training on multiple machines, and integration into a consumer-facing API.
  • Created various types of data visualizations using Python and Tableau.
  • Helped create a monthly retention marketing campaign that improved the customer retention rate by 15%.
  • Prepared reports and presentations using Tableau, MS Office, and ggplot2 that accurately convey data trends and the associated analysis.
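The clickstream bullet in this role describes a model that forecasts whether a session involves a click. Logistic regression (listed among the algorithms used here) is one common choice for that kind of binary outcome. The following is a minimal sketch assuming two hypothetical standardized session features and plain batch gradient descent on log loss; it is not the production model, and the feature names and data are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error: predicted click probability minus the 0/1 label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the batch and take one descent step.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def click_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical standardized session features: [pages_viewed, dwell_time].
sessions = [[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]]
clicked = [0, 0, 1, 1]
w, b = train_logistic(sessions, clicked)
print(click_probability(w, b, [2.0, 2.0]))    # engaged session: above 0.5
print(click_probability(w, b, [-2.0, -2.0]))  # brief session: below 0.5
```

At clickstream scale the same model would normally be trained with a library such as Spark MLlib or scikit-learn rather than a hand-written loop; the loop is shown only to make the gradient step explicit.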

Environment: MySQL, Natural Language Processing, Text Mining, Python, Tableau, Business Objects, Hadoop, Spark, Scala, Google Analytics, Data Mining, Seaborn, Regression, Cluster Analysis, Windows/Linux.

Confidential, CA

Data Analyst / Data Modeler


  • Responsible for managing a diverse team of highly skilled data scientists and analysts and for articulating designs to analytics practitioners and data developers. Personalized functions through modeling, algorithm design, and data mining.
  • Determine the methodologies needed and apply them to modeling and to designing and developing clustering algorithms.
  • Define data needs, evaluate data quality, and extract/transform data for analytic projects and research.
  • Interface closely with Marketing Systems and Operations to understand/define requirements, domain knowledge/models, and data needs.
  • Effectively communicate and document technical analyses and results.
  • Ensure analysis and solutions drive business decisions.
  • Responsible for the successful design, modeling, and execution of advanced analytics solutions supporting large-scale data analytics for large enterprises.
  • Help define key business problems, formulate mathematical approaches using MATLAB, and analyze data to solve those problems. Support key operational efficiency, product effectiveness, and growth opportunity goals by applying advanced mathematical modeling, data mining, and machine learning techniques.
  • Monitored power usage to raise consumers' awareness of their energy consumption and identified ways to develop eco-friendly smart homes.
  • Created an analytics and visualization model from the collected power consumption data to show clients the power consumption patterns of homeowners.
  • Built a predictive model/analytics report on this data to help homeowners make decisions about altering their power consumption.
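This role's power-consumption forecasting lists Holt's exponential smoothing in its environment. A minimal sketch of Holt's linear (level plus trend) method follows; the smoothing parameters and the kWh series are hypothetical, the initial trend is a simple first-difference estimate, and the seasonal extension (Holt-Winters) is not shown.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear (level + trend) exponential smoothing forecast."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]  # simple initial trend estimate (an assumption)
    for value in series[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # h-step-ahead forecast extends the final level along the final trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical daily household consumption in kWh.
usage = [10.0, 12.0, 14.0, 16.0, 18.0]
print(holt_forecast(usage, alpha=0.5, beta=0.5, horizon=3))  # [20.0, 22.0, 24.0]
```

On a perfectly linear series the level and trend never change, so the forecast simply continues the line; on real usage data, alpha and beta would be tuned, for example by minimizing one-step-ahead error.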

Environment: R, Time Series Analysis, Regression Forecasting, Machine Learning, Holt’s Exponential, Seasonal, Trend


Data Analyst


  • Responsible for the successful design, modeling, and execution of advanced analytics solutions supporting large-scale data analytics for large enterprises.
  • Design and develop innovative analytic models to analyze large-scale structured and unstructured data and gain actionable insights.
  • Design and develop predictive models, data mining, and text analytics solutions, including custom algorithmic solutions such as recommendation engines and decision support engines.
  • Design, implement, and operate comprehensive data warehouse systems to balance optimization of data access with batch loading and resource utilization factors, according to customer requirements.
  • Develop data warehouse models, including sourcing, loading, transformation, and extraction.
  • Create or implement metadata processes and frameworks.
  • Write new programs or modify existing programs to meet data management requirements, using current programming languages and technologies.
  • Review designs, code, test plans, or documentation to ensure quality.
  • Formulated a model to recommend mobile phone services to a million users based on their subscriptions and ratings. Implemented machine learning and NLP algorithms to recommend mobile network services similar to those a particular user preferred in the past and present, as well as to similar choices made by other users.
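The recommendation bullet combines a user's own history with choices made by similar users, which is the idea behind user-based collaborative filtering. A minimal sketch follows, assuming explicit ratings stored as nested dicts and cosine similarity between users; the users, service names, and ratings are hypothetical, and the NLP component mentioned in the bullet is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    dot = sum(u[item] * v[item] for item in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, user, k=2, top_n=2):
    """Rank items the user has not rated by similarity-weighted neighbor ratings."""
    target = ratings[user]
    # Keep the k most similar other users.
    neighbors = sorted(
        ((cosine(target, ratings[other]), other) for other in ratings if other != user),
        reverse=True,
    )[:k]
    scores, weights = {}, {}
    for sim, other in neighbors:
        if sim <= 0:
            continue
        for item, rating in ratings[other].items():
            if item in target:
                continue  # only recommend services the user does not have yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]


# Hypothetical 1-5 ratings of mobile services.
ratings = {
    "alice": {"unlimited_data": 5, "intl_calling": 1, "hotspot": 4},
    "bob": {"unlimited_data": 5, "intl_calling": 1, "hotspot": 5, "streaming_bundle": 5},
    "carol": {"unlimited_data": 1, "intl_calling": 5, "roaming_pack": 5},
    "dave": {"unlimited_data": 4, "hotspot": 4, "streaming_bundle": 4, "roaming_pack": 1},
}
print(recommend(ratings, "alice"))  # ['streaming_bundle', 'roaming_pack']
```

For a million users, the pairwise similarity loop would be replaced by sparse-matrix operations or an approximate nearest-neighbor index; the scoring logic stays the same.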

Environment: Python, Machine Learning, Natural Language Processing, Collaborative Filtering
