Sr. Data Scientist Resume
Cincinnati, OH
SUMMARY
- Highly accomplished Data Scientist/Data Analyst with 6+ years of experience in data analysis, data mining, machine learning on large structured and unstructured data sets, data acquisition, data validation, predictive modeling, and data visualization.
- Deep understanding of statistical modeling, multivariate analysis, model testing, problem analysis, model comparison, and validation.
- Able to build advanced statistical and predictive models, such as generalized linear models, decision trees, neural networks, ensemble models, Support Vector Machines (SVM), and Random Forest.
- Skilled in managing the entire data science project life cycle and actively involved in all of its phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, clustering), dimensionality reduction using Principal Component Analysis, testing and validation using ROC plots and K-fold cross-validation, and data visualization.
- Experience with a variety of NLP methods for information extraction, topic modeling, parsing, and relationship extraction, and with developing, deploying, and maintaining scalable production NLP models. Thinks creatively and proposes innovative ways to look at problems by applying data mining approaches to the available information.
- Experience and technical proficiency in designing and data modeling online applications, and as Solution Lead architecting Data Warehouse/Business Intelligence applications.
- Experience in working with relational databases (Teradata, Oracle) with advanced SQL programming skills.
- Identifies or creates the appropriate algorithm to discover patterns and validates the findings using an experimental and iterative approach.
- Experience in designing visualizations using Tableau software and publishing and presenting dashboards.
- Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.
- Experience with Big Data platforms such as Hadoop distributions (MapR, Hortonworks, and others), Aster, and graph databases.
- Experienced in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
- Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis and design.
- Skilled in data parsing, data manipulation, and data preparation, with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, reindex, merge, subset, and reshape.
- Experience in using various packages in R and Python, such as ggplot2, gmodels, twitter, NLP, pandas, NumPy, Seaborn, SciPy, Matplotlib, and scikit-learn.
- Hands-on experience with big data tools like Hadoop, Spark, Hive, Pig, PySpark, and Spark SQL.
- Hands-on experience in implementing LDA and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, and Principal Component Analysis.
- Good knowledge of proofs of concept (PoCs) and gap analysis; gathered the necessary data for analysis from different sources and prepared it for exploration using data munging.
- Good industry knowledge, analytical and problem-solving skills, and ability to work well both within a team and individually.
- Experience in designing stunning visualizations using Tableau software and publishing and presenting dashboards and storylines on web and desktop platforms.
- Extensive experience in operating Big Data pipelines (Spark, Hive, SQL engines), both batch and streaming.
- Highly skilled in using visualization tools like ggplot2, Dash, Tableau, and Flask for creating dashboards.
- Experience with Data Analytics, Data Reporting, Graphs, Scales, Ad-hoc Reporting, OLAP reporting and PivotTables
- Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
- Extracted data from HDFS and prepared data for exploratory analysis using data munging.
- Worked with and extracted data from various database sources such as Oracle, SQL Server, and DB2, and regularly used JIRA and other internal issue trackers for project development.
- Extensive experience in data visualization, including producing tables, listings, and graphs using various procedures and tools such as Tableau.
- Implemented, tuned and tested the model on AWS EC2 with the best algorithm and parameters.
- Highly innovative, creative, committed, intellectually curious, business savvy with good communication and interpersonal skills.
TECHNICAL SKILLS
Languages: C, C++, XML, R/R Studio, SAS, SAS Enterprise Guide, Python 2.x/3.x, Java, SQL, Shell Scripting, Spark SQL, Maven, Scala, Spark 2, 2.3, Spark Streaming, Hadoop, MapReduce, R (Packages: stats, zoo, Matrix, data.table, openssl), HDFS, Eclipse, Anaconda, Jupyter Notebook
BI Tools: Tableau, Tableau Server, Tableau Reader, Splunk, SAP Business Objects, OBIEE, SAP Business Intelligence, QlikView, Amazon Redshift, Azure Data Warehouse
Statistics: Hypothesis Testing, ANOVA, Confidence Intervals, Correlation, Bayes' Law, MLE, Fisher Information, Principal Component Analysis (PCA), Cross-Validation.
NoSQL Databases: MariaDB, MongoDB, Cassandra
Big Data: Hadoop, HDFS, Hive, PuTTY, Spark, Scala, Sqoop
Data Analysis and Data Science: Neural Networks, graph/network analysis, time series analysis (ARIMA model), NLP, Deep Neural Networks, Logistic Regression, Decision Trees, Random Forests, KNN, XGBoost, Ensembles (Bagging, Boosting), Support Vector Machines.
Algorithms: Logistic Regression, Random Forest, XGBoost, KNN, SVM, Neural Networks, Generalized Linear Models, Boxplots, Lasso Regression, K-Means Clustering, SVN, PuTTY, WinSCP, Redmine (Bug Tracking, Documentation, Scrum), AI, Teradata, Tableau, H2O Flow, Splunk, GitHub.
Database Design Tools and Data Modeling: Normalization and De-normalization techniques, Kimball & Inmon Methodologies, MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS
PROFESSIONAL EXPERIENCE
Confidential, Cincinnati, OH
Sr. Data Scientist
Responsibilities:
- Used natural language processing to identify and separate words, extract topics from text, and build a fake news classifier.
- Used libraries such as NLTK, gensim, spaCy, and TextBlob, together with deep learning, to solve common NLP problems such as sentiment analysis.
- Used state-of-the-art NLP models such as word2vec, doc2vec, BERT, SBERT, GloVe, and sentence encoders for document comparison.
- Built a sequence-to-sequence (Seq2Seq) model using a deep learning package with LSTM recurrent neural networks (RNNs).
- Used Convolutional Neural Networks (ConvNets or CNNs) for image recognition and classification.
- Built autoencoders for image reconstruction in Python and Keras, and used a variational autoencoder to decrease the reconstruction loss.
- Developed Convolutional Neural Networks (CNNs) in Python using the TensorFlow deep learning library to address image recognition problems.
- Used Microsoft's Computer Vision for Optical Character Recognition (OCR) to extract printed text, handwritten text, digits, and integers from images.
- Built a predictive maintenance machine learning model to predict unexpected failures and the root cause of problems in complex systems, to determine whether an asset may fail in the near future, and to estimate Remaining Useful Life (RUL) from time-series sensor measurements of similar engines.
- Developed machine learning experiments using Microsoft Azure Automated ML, utilizing multiple algorithms.
- Used Azure Jupyter notebooks to perform detailed predictive analytics and build a web services model for real-time failure predictions and predictive maintenance with Remaining Useful Life (RUL).
- Used a convolutional neural network (CNN) for brain tumor segmentation to distinguish healthy tissue from tumorous regions in MRI images.
- Used a convolutional autoencoder for anomaly detection of brain tumors in CT images.
Confidential, Minnesota
Data Scientist/ML Engineer
Responsibilities:
- Used ML modeling, trend analysis, statistics, and other data analysis techniques to collect, explore, and interpret data in order to explain customer behavior and segmentation, and to support text analytics, product-level analysis, big data analytics, and customer experience analysis.
- Cleaned data by applying backward and forward filling methods to handle missing values.
- Analyzed and prepared data by applying a historical model to the data set in Azure ML.
- Develop, plan, and apply leading-edge analytic and quantitative tools and modeling techniques to help clients gain insights and improve decision-making.
- Performed data transformations to normalize and rescale variables.
- Applied various machine learning algorithms and statistical modeling techniques, including decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python and MATLAB.
- Optimized algorithms with stochastic gradient descent and fine-tuned their parameters with both manual tuning and automated tuning such as Bayesian optimization.
- Extensively used Power BI, Pivot Tables, and Tableau to manipulate large data sets and develop visualization dashboards that represent the vulnerability assets.
- Used Power BI to build and visualize predictive models forecasting monthly revenue.
- Used Python tools such as pandas, NumPy, Jupyter Notebook, SciPy, scikit-learn, TensorFlow, and Keras to design and implement logistic regression, neural networks, NLP, support vector machines, ensemble trees, multivariate regression, clustering algorithms, simulation, GLM, Random Forest, boosting, and text mining.
- Optimized model performance through hyperparameter tuning, debugging, parameter fitting, and troubleshooting, and automated the processes.
- Used advanced SQL queries to extract data from SQL Server and integrate it with Tableau.
- Developed reports, charts, tables, and other visual aids in support of findings to recommend business direction or outcomes.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Implemented machine learning models (logistic regression, XGBoost, SVM) with Python and scikit-learn.
- Worked on text analytics, Naive Bayes, and sentiment analysis, creating word clouds and retrieving data from Twitter and other social networking platforms.
- Read, cleaned, filtered, and preprocessed data, removed outliers, and subset the data; then fit models (Linear Regression, Random Forest Regression) and selected the most reasonable model on the basis of R-squared and accuracy.
- Successfully connected to different data sources using SSH and SFTP from the Hadoop cluster and Azure Data Factory.
Confidential - Philadelphia, PA
Data Scientist/ML Engineer
Responsibilities:
- Applied state-of-the-art Machine Learning (ML) / Deep Learning (DL) technologies to evaluate sentiment scores by re-training semantic segmentation models.
- Used classifiers built in scikit-learn on top of hand-engineered linguistic features extracted by spaCy, an industrial-strength NLP library.
- Designed a light-weight convolutional neural network (CNN) on TensorFlow to recognize keywords, phrases, tone and inflection.
- Worked on hyperparameter tuning, debugging, and troubleshooting of machine learning models, and automated processes to optimize performance.
- Analyzed performance and conducted experiments with different types of algorithms and models to identify the best ones to employ.
- Migrated an eDiscovery search engine from a monolith to a microservice architecture using NLTK and scikit-learn.
- Conducted a hybrid of Hierarchical and K-means Cluster Analysis
- Implemented real-time data processing using Storm and Spark, as well as Spark ML.
- Evaluated models using cross-validation, log loss, and ROC curves, used AUC for feature selection, and worked with Elastic technologies such as Elasticsearch and Kibana.
- Documented logical data models, semantic data models and physical data models.
- Applied various machine learning algorithms and statistical modeling techniques, including decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python and MATLAB.
- Engineered pipelines for tokenization and word vector encoding with CoreNLP and Python and deployed them on premises.
- Built and tested different ensemble models, such as bootstrap aggregating, bagged decision trees, Random Forest, gradient boosting, XGBoost, and AdaBoost, to improve accuracy, reduce variance and bias, and improve model stability.
- Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks.
- Worked with cloud services such as Amazon Web Services (AWS), including S3 and Redshift, and with big data in the Hadoop ecosystem, to support big data tools and solve storage issues.
- Used clustering technique K-Means to identify outliers and to classify unlabeled data.
- Involved in the design, implementation, improvement, and integration of an Artificial Intelligence solution.
- Performed sentiment analysis, mined unstructured information, and developed insights using statistical natural language processing.
