
Data Scientist / Python Resume


Dayton, Ohio

SUMMARY

  • 5+ years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Microsoft Azure Machine Learning certified for Big Data analytics.
  • Extensive experience using Machine Learning algorithms (Supervised, Unsupervised, and Reinforcement Learning), recommendation engines, Active Learning, Deep Learning, and Artificial Intelligence techniques to solve business problems.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
  • Experience with NLP for sentiment analysis, topic modeling, voice recognition, language understanding, language generation, and entity extraction.
  • Experience with mathematical and statistical Python libraries such as NumPy, pandas, scikit-learn, SciPy, Matplotlib, Seaborn, Beautiful Soup, rpy2, NLTK, TensorFlow, Keras, and Theano, and R packages caTools, caret, rpart, dplyr, rjson, RWeka, tidytext, tm (text mining), stringr, SnowballC, plyr, RCurl, gmodels, C50, reshape2, and twitteR.
  • Employed various SDLC methodologies such as Waterfall, Agile, Kanban, and Scrum.
  • Expert in the entire Data Science process life cycle, including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Machine Learning Algorithms, Validation, and Visualization.
  • Hands-on experience in Machine Learning algorithms such as Linear Regression, GLM, CART, SVM, KNN, LDA/QDA, Naive Bayes, Random Forest, and Boosting.
  • Proficient in Python and its libraries such as NumPy, Pandas, Scikit-learn, Matplotlib and Seaborn.
  • Experience in working with Hadoop Big Data tools such as HDFS, Hive, Pig Latin and Spark.
  • Deep knowledge of SQL for writing queries, stored procedures, user-defined functions, views, triggers, and indexes.
  • Designed, trained, and validated end-to-end ML models and pipelines for image, video, time-series, and seq2seq tasks, using TensorFlow, Keras, and PyTorch to set up deep CNNs.
  • Maintained and monitored Docker in a cloud-based service during production, and set up a system for dynamically adding and removing web services from a server using Docker.
  • Worked on integration of diverse mathematical and statistical procedures, pattern recognition, model building, creating various scientific and industrial packages within R.
  • Knowledge and experience in agile environments such as Scrum and using project management tools like Jira/Confluence, TFS and version control tools such as GitHub/Git.
  • Collaborated with data engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction from Cloud and merging from Oracle.
  • Hands-on experience in importing and exporting data using Relational Database including Oracle, MySQL and MS SQL Server, and NoSQL database like MongoDB.
  • Experience in implementing data analysis with various analytic tools, such as Jupyter Notebook, Spyder, reshape2, ggplot2, dplyr, car, MASS, SAS, MATLAB, and Excel.
  • Good team player and quick-learner; highly self-motivated person with good communication and interpersonal skills.

TECHNICAL SKILLS

Programming Languages: Python, SQL, R, Java

Scripting Languages: XML, Python, Bash.

Natural Language Processing: NLTK, Stanford CoreNLP (tokenization, part-of-speech tagging, parsing, and named-entity recognition), TextBlob, Gensim, Apache Spark, Scikit-learn

Data Sources: HDFS, Teradata, SQL Server, Excel

Data Visualization: Power BI, Tableau, Matplotlib, Plotly, Seaborn, ggplot, MATLAB, Pentaho

Predictive and Machine Learning: Regression (Linear, Logistic, Bayesian, Polynomial, Ridge, Lasso), Classification (Logistic Reg., two/multiclass classification, Boosted Decision Tree, Random Forest, Decision Tree, Naïve Bayes, Support Vector Machines, k-Nearest Neighbors, Neural Network, and various other models), Clustering (K-means, Hierarchical), Anomaly Detection, LSTM, RNN

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Spark

Database Systems: Oracle, MySQL, MongoDB, DB2, Teradata

Operating System: Linux, Windows, Unix, MacOS

Methodologies: Agile, Scrum, Waterfall

Version control: Git, SVN (subversion)

Other Skills: Data Analysis, Data transformation and visualization, Machine learning algorithms, ETL, Dashboard, Flask Apps, Rest API

PROFESSIONAL EXPERIENCE

Data Scientist / Python

Confidential, Dayton, Ohio

Responsibilities:

  • Applied supervised Machine Learning algorithms (Logistic Regression, Decision Tree, and Random Forest) for predictive modeling of various problems: successful transition from a Skilled Nursing Facility, identifying predictors for Medicare Advantage members, lowering the cost of mitigating homelessness, and issues management.
  • Developed NLP models for Topic Extraction and Sentiment Analysis for the MA disenrollment root-cause analysis. Worked with the NLTK library for NLP data processing and pattern finding.
  • Implemented Topic Modeling (LDA: Latent Dirichlet Allocation) to understand the themes members talk about and to find issues with services.
  • Analyzed Big Data using Hive, Pig, and Impala, implemented MapReduce programs using R, and built Tableau reports and visualizations.
  • Implemented classification models including Random Forest and Logistic Regression to quantify the likelihood of each member enrolling for the upcoming enrollment period.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization to deliver data science solutions.
  • Developed Spark code using Scala and Spark SQL for faster processing and testing.
  • Used Linear Regression to model member cost for the upcoming enrollment period.
  • Applied various machine learning algorithms and statistical modeling such as decision trees, text analytics, Natural Language Processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering to identify volume, using the Scikit-learn package in Python.
  • Used analytical programming languages such as R and Python.
  • Used different IDEs: Jupyter Notebook, Visual Studio Code, and RStudio.
  • Visualized data using MS Power BI, ggplot, Seaborn, Matplotlib, and Plotly.
  • Deployed the model using Flask, Microsoft Azure, and a REST API for the web application.
  • Extracted source data from Oracle tables, SQL Server, sequential files, and Excel sheets.
  • Developed and maintained a Data Dictionary to create metadata reports for technical purposes.
  • Responsible for developing system models, prediction algorithms, solutions to prescriptive analytics problems, data mining techniques, and/or econometric models.
  • Communicated results with the operations team to support decision making, and collected data needs and requirements by interacting with other departments.
  • Demonstrated and built statistical/machine learning systems to solve large-scale customer-focused problems, leveraging statistical methods and applying them to real-world business problems.
  • Performed Data Profiling to learn about turnover behavior across various features before the hiring decision, when no on-the-job behavioral data is available.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Conducted a hybrid of Hierarchical and K-means Cluster Analysis using IBM SPSS and identified meaningful segments through a discovery approach.
  • Built an Artificial Neural Network using TensorFlow in Python to identify each customer's probability of canceling their connection (churn-rate prediction).
  • Understood the business problems and analyzed the data using appropriate statistical models to generate insights.
  • Identify and assess available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
  • Categorize comments into positive and negative clusters from different social networking sites using Sentiment Analysis and Text Analytics.
  • Ensured that the model has a low False Positive Rate, and performed text classification and sentiment analysis on unstructured and semi-structured data.

Data Scientist/Python

Confidential, Phoenix, Arizona

Responsibilities:

  • Applied Regression, Classification, and Clustering algorithms, Trend Analysis, Forecasting, and NLP to product reviews, fraud detection, customer support, loan/credit application status prediction, default prediction, finding patterns in how customers spend, invest, and make financial decisions, forecasting, trend analysis, product recommendation, and loan offers.
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring.
  • Utilized machine learning algorithms such as logistic regression, multivariate regression, K-means, and recommendation algorithms to extract hidden information from the data.
  • Used Pandas, NumPy, Scikit-learn in Python for developing various machine learning models and utilized algorithms such as Linear regression, Logistic regression, Gradient Boosting, SVM and KNN.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Explored and analyzed customer-specific features using Matplotlib and Seaborn in Python and dashboards in Tableau.
  • Collaborated with data engineers and the operations team to implement the ETL process, and wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Worked with Apache Spark, which provides a fast, general engine for large-scale data processing.
  • Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing.
  • Experimented with and built predictive models, including ensemble methods such as Gradient Boosting trees and a Neural Network in Keras, to predict sales amounts.
  • Analyzed patterns in customers' shopping habits across different locations, categories, and months using time-series modeling techniques.
  • Used RMSE/MSE, AIC (Akaike Information Criterion), SC (Schwarz Criterion), and BIC (Bayesian Information Criterion) to evaluate different models' performance.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
  • Used the TensorFlow library in a dual-GPU environment for training and testing the Neural Networks.
  • Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Worked on Deep Learning for image recognition using TensorFlow/Keras.
  • Using NLP, developed deep learning algorithms for analyzing text, improving on existing dictionary-based approaches.
  • Performed data cleaning, feature scaling, and feature engineering using Python and R.
  • Created Data Quality scripts using SQL and Hive to validate successful data loads and the quality of the data.
  • Created various types of data visualizations using Python and Tableau.
  • Collected data using Hadoop tools such as Hive and Pig to retrieve the data required for building models.
  • Generated reports to meet regulatory requirements.
  • Extensively involved in SQL tuning to optimize performance.
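The gradient-boosting sales-prediction and RMSE-evaluation steps above can be sketched as follows with scikit-learn. The data here is synthetic (`make_regression`), a stand-in for the sales dataset, which is not shown.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data as a stand-in for the sales dataset
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ensemble model: gradient-boosted regression trees
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-out split with RMSE, as described above
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"RMSE: {rmse:.2f}")
```

The same train/test split and RMSE comparison can be repeated across candidate models (e.g. a Keras network) to pick the best performer.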

Data Scientist

Confidential, Southfield, MI

Responsibilities:

  • Built models using Linear Regression, Logistic Regression, Decision Tree, k-means clustering, SVM, and Random Forest to analyze customer response behaviors and interaction patterns, and for sales prediction and forecasting.
  • Performed data extraction, manipulation, cleaning, analysis, modeling and data mining.
  • Acquired and cleaned data using R and Python, extracting from various sources including internal and external databases.
  • Compiled and analyzed sales and marketing reports using SQL and statistical methods.
  • Applied Probabilistic Graphical Models (Bayesian and Gaussian networks) to create Machine Learning models.
  • Built Machine Learning classification models using supervised algorithms such as Boosted Decision Trees, Logistic Regression, SVM, Random Forest, Naïve Bayes, and KNN.
  • Participated in all phases of data mining, data cleaning, data collection, developing models, validation, visualization, and performed Gap analysis.
  • Completed a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, and Git.
  • Designed 20+ dashboards in Tableau for sales managers with instant access to a personalized analytics portal, so they can access the key business metrics.
  • Collaborated with data engineers and the operations team to implement the ETL process, and wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Explored and analyzed customer-specific features using Matplotlib and Seaborn in Python and dashboards in Tableau.
  • Performed data imputation using Scikit-learn package in Python.
  • Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing.
  • Experimented with and built predictive models, including ensemble methods such as Gradient Boosting trees and a Neural Network in Keras, to predict sales amounts.
  • Analyzed patterns in customers' shopping habits across different locations, categories, and months using time-series modeling techniques.
  • Used RMSE/MSE to evaluate different models' performance.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
  • Provided Level of Effort estimates and activity status, identified dependencies impacting deliverables, identified risks and mitigation plans, and identified issues and impacts.
  • Designed database solution for applications, including all required database design components and artifacts.
  • Developed and maintained a Data Dictionary to create Metadata Reports for technical and business purposes using the Erwin report designer.
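The preprocessing-and-clustering steps described above (feature scaling, PCA, then k-means segmentation) can be sketched as a single scikit-learn pipeline. The customer features here are random numbers, a placeholder for the real data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder customer feature matrix (300 customers x 6 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))

# Scale -> reduce dimensionality -> cluster, as in the steps above
pipe = make_pipeline(
    StandardScaler(),                       # feature normalization
    PCA(n_components=2),                    # dimensionality reduction
    KMeans(n_clusters=3, n_init=10, random_state=0),  # customer segments
)
labels = pipe.fit_predict(X)
print(labels[:10])
```

Chaining the steps in one pipeline keeps the scaling and PCA fitted only on the data passed to `fit`, which avoids leakage when the same pipeline is later applied to new customers.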
