Data Scientist Resume

Santa Clara, CA


  • 8+ years of experience in building Data Science solutions using Machine Learning, Statistical Modeling, Data Mining, Natural Language Processing (NLP) and Data Visualization.
  • Scripting and programming skills (Spark Scala, Python, PySpark, R) and experience operating batch and streaming big data pipelines (Spark, Hive, SQL engines).
  • Good understanding of ensemble techniques (AdaBoost, Random Forest), classification algorithms (SVM, logistic regression, decision trees), and clustering techniques (DBSCAN, hierarchical).
  • Knowledge of semantic technologies (RDF, OWL), natural language processing, text mining, search algorithm development, and development in Java/J2EE/Scala.
  • Experience with deep learning frameworks such as MXNet, Caffe2, TensorFlow, Theano, CNTK, and Keras, helping customers build deep learning models.
  • Experience using SparkML and Amazon Machine Learning (AML) to build ML models.
  • Experience with cloud platforms (Google Cloud, Azure, and AWS).
  • Ability to work efficiently in Unix/Linux environments, with experience using source code management systems such as Git.
  • Ability to work with a variety of databases (SQL, Elasticsearch, Solr, Neo4j).
  • Work with DevOps consultants to operationalize models after they are built.
  • Strong technical skills in machine learning/AI with a proven track record. These skills include, but are not limited to, regression techniques, neural networks, decision trees, clustering, pattern recognition, probability theory, stochastic systems, Bayesian inference, statistical techniques, deep learning, supervised learning, and unsupervised learning.
  • Experience with open-source NLP libraries such as CoreNLP, OpenNLP, and MALLET.
  • Experience with data visualization tools such as Tableau, QuickSight, and Qlik.
  • Exposure to big data tools and cloud technologies such as AWS.
  • Knowledge of applied mathematics (probability and statistics, multivariable calculus, linear algebra, ordinary and partial differential equations, stochastic processes, graph theory).
  • Analytical, creative, and innovative approach to solving problems.
  • Strong written and verbal communication skills; positive and energetic attitude.
  • Work with AI and big data technologists/architects to explore new technologies, or assess known ones, to solve specific natural language processing (NLP) problems.
  • In-depth knowledge of various modeling algorithms, e.g., linear models, GLMs, tree-based models, neural networks, clustering, PCA, and time series models.
  • Proficiency in R (e.g., ggplot2, cluster, dplyr, caret), Python (e.g., pandas, scikit-learn, bokeh, nltk), Spark MLlib, H2O, or other statistical tools.
  • Experience with big data/advanced analytics concepts and algorithms (text mining, social listening, recommender systems, predictive modeling).
  • Assist customers by delivering ML projects end to end: understanding the business need, aggregating and exploring data, building and validating predictive models, and deploying completed models with concept-drift monitoring and retraining to deliver business impact to the organization.
  • In-depth knowledge of databases, data modeling, Hadoop, and distributed computing frameworks, plus experience with software development environments, Agile, and code management/versioning (e.g., Git).
  • Collaborate and nurture cross-functional relationships with data scientists, data engineers, application developers and business stakeholders to understand data needs and build reliable solutions.
  • Mentor and grow a team of data scientists and help accelerate a data-driven culture.
  • Use AWS AI services (Personalize), ML platforms (SageMaker), and frameworks (MXNet, TensorFlow, PyTorch, SparkML, scikit-learn) to help our customers build ML models.
  • Experience writing code in Python, R, Scala, Java, C++ with documentation for reproducibility.
  • Experience handling terabyte-scale datasets, diving into data to discover hidden patterns, using data visualization tools, writing SQL, and working with GPUs to develop models.
  • Experience writing and speaking about technical concepts to business, technical, and lay audiences and giving data-driven presentations.
  • Develop data, statistical, and ML models across a broad spectrum of topics, including both causal inference and predictive models.
  • Stay up to date with, and introduce, the latest trends in technology and data science research.
  • Conduct scalable data research off, and ultimately on, the cloud.
  • Implement automated processes for efficiently producing models at scale.
  • Act as a liaison and collaborate with other business development groups across the analytics lifecycle (conceptualization, data collection, analysis, and recommendations) or when subject matter expertise is required.
  • Collect and preprocess data hands-on to explore, train, or retrain models that solve domain-specific problems and provide solutions for our products.
  • Customize and optimize models with techniques such as quantization/binarization and pruning to reduce model size and computation cycles while maintaining comparable performance.
  • Deep technical skills, consulting experience, and business savvy to interface with all levels and disciplines within the organization.
  • Demonstrable track record of dealing well with ambiguity, prioritizing needs, and delivering results in a dynamic environment.
  • Ability to work in a rapidly changing and fast paced environment.
  • Self-motivated, highly organized, and able to prioritize and manage multiple tasks.


Languages: C, C++, Java, Python 2.x/3.x, R (RStudio), SAS, SAS Enterprise Guide, SQL, XML, Shell Scripting

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Statistics: Hypothesis Testing, ANOVA, Confidence Intervals, Bayes' Theorem, MLE, Fisher Information, Principal Component Analysis (PCA), Cross-Validation, Correlation.

BI Tools: Tableau, Tableau Server, Tableau Reader, Splunk, SAP Business Objects, OBIEE, SAP Business Intelligence, QlikView, Amazon Redshift, Azure Data Warehouse

Algorithms: Logistic Regression, Random Forest, XGBoost, KNN, SVM, Neural Networks, Linear Regression, Lasso Regression, K-means.

Big Data: Hadoop, HDFS, Hive, PuTTY, Spark, Scala, Sqoop

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS

Database Design Tools and Data Modeling: MS Visio, ERwin 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon methodologies


Confidential - Santa Clara, CA

Data Scientist


  • Collected, collated, and carried out complex data analysis in support of management and bank requests, and shared statistical findings across teams.
  • Derive insights from data and apply advanced statistical methods to model user behavior, media performance, and attribution.
  • Solved business problems including segmenting customers by purchasing behavior, modelling customer profitability and lifetime, forecasting financial metrics on the scale of months or years, predicting win/loss rates in contract negotiations.
  • Leverage multiple modeling techniques from regression to classification to segmentation.
  • Deploy supervised, unsupervised, reinforcement, and deep learning solutions (e.g., new bidding models) into production.
  • Lead and develop marketing experiments to create and optimize campaigns, using a variety of causal inference techniques
  • Work with various data owners to discover and select available data from internal sources and external vendors (e.g. lending system, payment system, external credit rating system, and alternative data) to fulfill analytical needs.
  • Apply scripting/programming skills to assemble various types of source data (unstructured, semi-structured, and structured) into well-prepared datasets with multiple levels of granularity (e.g., demographics, customers, products, transactions).
  • Build customer journey analytic maps and utilize NLP to enhance the customer experience and reduce customer friction points.
  • In partnership with Marketing, utilize machine learning to improve customer retention and product deepening across all of SVB's financial products, including mortgages, cards, and auto.
  • Work closely with model risk governance, credit bureaus, external consultants, and compliance and regulatory response teams to ensure proper development and installation of performance reporting and model tracking.
  • Leverage a broad stack of technologies (Python, Docker, AWS, Airflow, and Spark) to reveal the insights hidden within huge volumes of numeric and textual data.
  • Build machine learning processes that monitor and provide feedback on how or where to improve predictive models already deployed.
  • Run SQL code to assess, clean, validate and analyze large datasets.
  • Define digital data pipelines and reporting infrastructure for the BI and marketing stakeholders to drive measurable results through data-informed insights.
  • Analyzed and processed complex data sets using advanced querying, visualization and analytics tools.
  • Developing and executing statistical and mathematical solutions to business problems: framing the problem, developing a roadmap, and communicating the intended approach and the quantitative methods used to develop the solution.
  • Improving products and services or solving problems using best practices and knowledge of internal and/or external business issues.
  • Analyze internal processes to identify opportunities for improvement, and devise and implement innovative workflow solutions that improve time to market and quality and create efficiency in datasets.
  • Build database schema and maintain workflow configurations for critical functions such as acquisition, data extraction, data loading, and quality control
  • Contribute to the creation of best practices and guidelines for governance
  • Using analytical rigor and statistical methods to analyze large amounts of data and extract actionable insights through techniques such as data mining, optimization, and machine learning.
  • Work with data users to determine, create, and populate optimal data architectures, structures, and systems. Plan, design, and optimize data throughput and query performance.
  • Participate in the selection of backend database technologies (SQL, NoSQL, HPC), their configuration and utilization, and the optimization of the full data pipeline infrastructure to support the actual content, volume, ETL, and periodicity of data to support the intended kinds of queries and analysis to match expected responsiveness.
  • Gather and assess business information needs and prepare system requirements.
  • Interact with user community to develop and produce reporting requirements.
  • Responsible for prototyping solutions, preparing test scripts, and conducting tests, as well as for data replication, extraction, loading, cleansing, and data modeling for data warehouses.
  • Maintain knowledge of software tools, languages, scripts, and shells that effectively support the data warehouse environment in different operating system environments.
  • Designed and developed interactive, user-friendly applications using R Shiny.
  • Explored and implemented various algorithms such as k-means, Apriori, PageRank, and kNN using R.
  • Automated the reporting process by fetching data from multiple sources such as Excel, Hadoop, MongoDB, Oracle, and SQL Server, and generating reports and plots using R and Python.
  • Explored and compared various data mining tools such as Weka, Rattle, and RapidMiner.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Hands-on experience importing and exporting data between relational databases and HDFS/Hive using Spark.
  • Knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
  • Analyzed and understood the existing ETL workflows.
  • Designed and developed reports, applied transformations to the data model, performed data validation, established data relationships in Power BI, and created supporting documentation for Power BI.
  • Organize business needs into ETL/ELT logical models and ensure data structures are designed for flexibility to support scalability of business solutions.
  • Craft and implement data pipelines utilizing Glue, Lambda, Spark, Python.
  • Work with Data Engineers to determine how best to source data, including identifying potential proxy data sources, and design business analytics solutions that account for current and future needs, infrastructure and security requirements, load frequencies, etc.

Environment: Python, PyCharm, Keras, TensorFlow, Jupyter Notebook, Spyder, R, Tableau, Power BI, AWS, MySQL.


Data Scientist


  • Developing and deploying predictive models based on historical data that provide future predictions about customer experience.
  • Building forecasts, recommendations and strategic/tactical plans based on applying data science techniques to business data.
  • Testing, validating and extending advanced image processing algorithms, preparing and extending processing pipelines, and applying these pipelines on magnetic resonance brain imaging datasets from various studies involving both structural and functional data.
  • Perform exploratory data analysis and predictive modeling in pediatric biomedical research using machine learning, statistical, and mathematical analysis of heterogeneous and complex data types, working independently on milestone assignments from senior staff.
  • Contribute to assessing and implementing computational, algorithmic, and predictive analytics approaches to address assigned biomedical research questions in areas such as clinical decision support and population health surveillance.
  • Contribute to the experimental design, execution, test and critical evaluation of methods as applied to translational data science research projects.
  • Use a flexible, analytical approach to extract optimal value from biomedical data.
  • Contribute to design and conduct of continuous validation plans for production systems that incorporate models and algorithms, providing guidelines and support for large-scale implementation.
  • Contribute to the creation, adoption, and adherence to best practice methodologies for performing data analysis and predictive modeling experiments.
  • Work with Digital Engineering tasks including use of Natural Language Processing (NLP), Artificial Intelligence (AI), and Machine Learning (ML) methods, techniques, and tools. This includes technically working on multiple projects in a matrix environment while creating new programs and starting new projects.
  • Locate and extract data from new data sources, find new uses for existing data sources, structure and drive new data collection efforts, and provide recommendations on scaling new methods more broadly.
  • Integrate multiple systems to build large and complex data sets and make them usable (e.g., transforming and cleaning data, working with incomplete data sources, implementing and validating quality procedures).
  • Collect and organize final quantitative and imaging results, perform quality control on the results, and communicate with clinical collaborators for the dissemination and discussion of the results.
  • Showing consistent exercise of independent judgment and discretion in matters of significance.
  • Serving as a team leader within a work group or on cross-functional teams as well as mentoring and training junior team members.

Environment: Python (Scikit-Learn/Keras/TensorFlow/SciPy/NumPy/Pandas/Matplotlib/Seaborn), Machine Learning (Linear and Non-linear Regression, Deep Learning, SVM, Decision Tree, Random Forest, XGBoost, Ensemble, and KNN), MySQL, AWS Redshift, S3, Hadoop Framework, HDFS, Spark (PySpark, MLlib, Spark SQL), Tableau Desktop and Tableau Server.


Data Scientist


  • Responsible for exploring and examining data, for use in predictive and prescriptive modelling to deliver insights to the business stakeholders and support them to make better decisions
  • Influenced product and strategy decisions by visualizing data, and enabled change through extensive knowledge of the business drivers that make the product successful.
  • Answered complex business questions by using appropriate statistical techniques on available data or designing and running experiments to gather data.
  • Adapted standard, or developed novel, applications of classification, forecasting, simulation, optimization, and summarization techniques.
  • Created and optimized an easily understood experimentation framework that can be leveraged for measurement across any digital experience, and ensured alignment on approach across Finance, Marketing Analytics, Product Analytics, and digital business leadership.
  • Educated partners across all teams involved with experimentation to ensure they understand the approach and their role in it.
  • Created and maintained a single view of all experiments, both server-side and client-side, so that everyone has a clear view of the totality of experimentation happening across the digital experiences, and leveraged this view to create a user-friendly library of experimentation results.
  • Provided thought leadership and guidance on how to evolve our current experimentation processes and tools and, as needed, worked cross-functionally to lead RFPs for new tools to move our capabilities to the next level.
  • Partnered with Product Analytics and Finance to develop an approach for more consistently measuring the incremental impact of new features over time.
  • As needed, provided guidance to Product Analysts on the design, execution, and analysis of A/B and multivariate tests, particularly more complex tests that benefit from specialized expertise.
  • Worked closely with cross functional teams to encourage best practices for experimental design and data analysis.
  • Manage modelling projects, identify requirements and build tools that are statistically grounded, explainable and able to adapt to changing attributes.
  • Worked closely with cross functional groups to enhance analytical capabilities, identify new growth opportunities, and develop persuasive interactive visualizations.
  • Develop and deploy different analytical building blocks that support the requirements of the capability area and customer needs
  • Use root-cause research to identify process breakdowns, and apply a range of skills to provide data that helps find solutions to those breakdowns.
  • Work closely with other data scientists and cross-functional teams to produce all required design specifications and ensure that data solutions work together and fulfill business needs.
  • Optimized data collection procedures and generated reports on a weekly, monthly, and quarterly basis.
  • Developed SQL reports using advanced queries against OLTP systems.
  • Coordinate with Marketing and Portfolio Managers to understand business objectives, identify best business practices, and to develop analytic plans to drive results
  • Work with Marketing to develop innovative go-to-market strategies, including targeting and segmentation strategies, model use and strategy, and robust tracking and performance measurement processes.
  • Work directly with channel managers to optimize direct response marketing (Direct Mail, E-Mail, Digital) and OB/IB Sales channels
  • Conduct rigorous data analysis to help improve customer experiences and identify growth opportunities for the Business Financing Solutions team.
  • Work cross-functionally with and provide analytic support to Product, Risk, Engineering, and Marketing towards a common business lending initiative
  • Identify product metrics and monitor performance through reports / dashboards and ad hoc analysis
  • Develop a robust testing agenda to ensure continued evolution of marketing programs and offers.
  • Wrote several stored procedures, functions, and cursors to build consistent reports for the sales team.
  • Assisted in designing and developing technical architecture, requirements, and statistical models.
  • Performed data analysis and visualization using R packages (tidyverse, dplyr, ggplot2, ggvis, shiny, shinydashboard), advanced Excel (PivotTables and charts, Solver, complex formulas, Data Analysis, lookups), Python, SAS, and Tableau (integrated with R).
  • Integrated R with MongoDB, Hadoop, .NET, and Java.
  • Created test data based on test cases for unit and integration testing, and documented the results for future reference.
  • Performed black-box testing to detect invalid data, boundary conditions, decision-table data sets, and no-data cases.

Environment: Python, Hive, Oozie, Tableau, HTML5, CSS, XML, MySQL, JavaScript, AWS, S3, EC2, Linux, Jupyter Notebook, RNN, ANN, Spark 2.2, Hadoop, Machine Learning, Deep Learning, R, TensorFlow, Scala, Spark SQL.
