Data Scientist Resume

Houston, TX


  • Over 8 years of experience designing, building, and implementing analytical and enterprise applications using machine learning, Python, and R.
  • Data Scientist with proven expertise in Data Analysis, Machine Learning, and Modeling.
  • Experience in applying predictive modeling and machine learning algorithms for analytical reports.
  • Experience working efficiently with datasets using scripting, data cleaning tools, and statistical software packages.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
  • Very strong in Python, statistical analysis tools, and modeling.
  • Experienced in Machine Learning and Statistical Analysis with Python scikit-learn.
  • Strong programming skills in a variety of languages such as Python, R and SQL.
  • Valuable experience working with large datasets and Deep Learning algorithms with TensorFlow.
  • Worked on various applications using Python distributions and IDEs such as Anaconda and PyCharm.
  • Used several Python modules and controls to rapidly build applications.
  • Experience in Data Cleaning, Transformation, Integration, Data Imports and Data Exports.
  • Proficient at writing code in major programming languages such as Python and R.
  • Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, and k-means.
  • Good knowledge of Data Validation, Data Cleaning, Data Verification, and identifying data mismatches.
  • Experienced with Machine Learning, Regression Analysis, Clustering, Boosting, Classification, Principal Component Analysis and Data Visualization Tools.
  • Experienced with tuning parameters for different machine learning models to improve performance.
  • Interacted with various clients and teams to update and modify deliverables to meet business needs.
  • Collaborated with clients on requirement gathering, use-case development, business process flow, and modeling.
  • Hands-on experience applying SVM, Random Forest, and K-means clustering.
  • Experienced in writing complex SQL, including stored procedures, triggers, joins, and subqueries.
  • Extensive experience in Text Analytics, developing different Statistical Machine Learning models, Data Mining solutions to various business problems and generating data visualizations using R, Python and Tableau.
  • Used Python to generate regression models to provide statistical forecasting and applied Clustering Algorithms such as K-Means to categorize customers into certain groups.
  • Performed data manipulation, data preparation, normalization, and predictive modeling. Improved efficiency and accuracy by evaluating models in Python.
  • Worked with SQL Server components SSIS (SQL Server Integration Services), SSAS (SQL Server Analysis Services), and SSRS (SQL Server Reporting Services).
  • Experience building and optimizing big data pipelines, architectures, and data sets using Hadoop, Spark, Hive UDFs, and Python.
  • Experience implementing machine learning back-end pipelines with Spark MLlib, scikit-learn, pandas, and NumPy.
  • Strong analytic skills in working with structured and unstructured datasets using Hive UDFs and pandas.
  • Effective communication and team collaboration in a dynamic environment.
  • Experience with AWS cloud services, including EC2 and S3.


Programming Languages: Python, SQL, R

Scripting Languages: Python

Data Sources: SQL Server, Excel

Data Visualization: Tableau, Power BI, SSRS

Predictive and Machine Learning: Linear Regression, Logistic Regression, Principal Component Analysis (PCA), K-means, Random Forest, Decision Trees, SVM, K-NN, Deep Learning, Time Series Analysis, and Ensemble Methods

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Spark

Operating System: Linux, Windows, Unix.


Data Scientist

Confidential, Houston, TX


  • Responsible for data identification, collection, exploration, and cleaning for modeling; participated in model development.
  • Visualized, interpreted, and reported findings, and developed strategic uses of data with Python libraries such as NumPy, scikit-learn, and Matplotlib.
  • Performed data cleaning, feature scaling, and feature engineering.
  • Responsible for loading, extracting, and validating client data.
  • Created statistical models, both distributed and standalone, to build diagnostic, predictive, and prescriptive solutions.
  • Performed missing-value treatment, outlier capping, and anomaly treatment using statistical methods; derived customized key metrics.
  • Performed analysis using industry-leading text mining, data mining, and analytical tools and open-source software.
  • Understood and implemented text mining concepts, graph processing, and semi-structured and unstructured data processing.
  • Evaluated models using cross-validation, log loss, and ROC curves, and used AUC for feature selection.
  • Set up storage and data analysis tools in confidential cloud computing infrastructure.
  • Created dummy variables for certain datasets to feed into the regression.
  • Used a metadata tool to import metadata from the repository, add new job categories, and create new data elements.
  • Worked with several R packages including knitr, dplyr, Causal Infer, space-time.
  • Built multiple machine learning features using Python and R as needed.
  • Strong skills in data visualization with libraries such as Matplotlib and seaborn.
  • Created various charts such as heat maps, bar charts, and line charts.
  • Collaborated with the team to enhance the modeling process and develop the project.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
  • Hands-on experience with data mining and big data tools such as Splunk, R, MapReduce, YARN, Pig, Hive, Floop, Oozie, Scala, HBase, HDFS, Sqoop, and Spark.

Data Scientist

Confidential, Plano, TX


  • Data mining using state-of-the-art methods.
  • Enhancing data collection procedures to include information that is relevant for building analytic systems.
  • Processing, cleansing, and verifying the integrity of data used for analysis.
  • Doing ad-hoc analysis and presenting results in a clear manner.
  • Creating automated anomaly detection systems and constantly tracking their performance.
  • Strong command of data architecture and data modelling techniques.
  • Hands-on experience with data mining tools such as R and Python, depending on job requirements.
  • Utilizing NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets.
  • Building predictive models to forecast risks for product launches and operations and to help predict workflow.
  • Experience with visualization technologies such as Tableau.
  • Drawing inferences and conclusions, creating dashboards and visualizations of processed data, and identifying trends and anomalies.
  • Generating TLFs and summary reports, ensuring on-time, quality delivery.
  • Participated in client meetings, teleconferences and video conferences to keep track of project requirements, commitments made and the delivery thereof.
  • Solved analytical problems and effectively communicated methodologies and results.
  • Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partner teams.
  • Fostered a culture of continuous engineering improvement through mentoring, feedback, and metrics.

Data Scientist

Confidential, Seattle, WA


  • Responsible for reporting findings, using gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM modeling.
  • Used Principal Component Analysis & Factor Analysis in feature engineering to analyze high dimensional data in Python.
  • Worked on classification/scripting of multiple attribute models by applying text mining, NLP, SVM, and regular expressions to product features such as title and description, and predicted product attribute values using Python/R.
  • Used R machine learning libraries to build and evaluate different models.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
  • Collected data needs and requirements by interacting with the other departments.
  • Created various types of data visualizations using Python and Tableau schemas.
  • Communicated results to the operations team to support better decisions.
  • Utilized Python to develop and execute machine learning applications; prototyped machine learning use cases with scikit-learn.
  • Led technical implementation of advanced analytics projects and defined the time-series approaches.
  • Developed new and effective analytics algorithms and wrote the key pieces of mission-critical source code.
  • Implemented advanced machine learning algorithms, including regression trees and kernel PCA, among other methods, in Python and R and in other tools and languages as needed.

Data Analyst

Confidential, Houston, TX


  • Assisted in gathering requirements and technical documentation.
  • Acquired data from various sources for customer data analysis.
  • Interpreted complex patterns and trends in datasets using Python and R.
  • Analyzed the credibility of customer data to check for loan eligibility.
  • Developed various use cases, class diagrams and sequence diagrams using UML.
  • Collaborated with clients on requirement gathering, use-case development, business process flow, and modeling.
  • Authored complex SQL queries for data retrieval, joins, and processing.
  • Involved in mapping data elements from user interface to database.
  • Assisted in modifying data for visualization according to business requirements.
  • Documented logical, physical, relational and dimensional data models. Designed the data marts in dimensional data modeling using star and snowflake schemas.
