Data Scientist Resume
VA
SUMMARY:
- Experienced Data Scientist with 7+ years of experience in Data Extraction, Data Modeling, Statistical Modeling, Data Mining, Machine Learning, and Data Visualization.
- Expertise in transforming business resources and tasks into regularized data and analytical models, designing algorithms, developing data mining and reporting solutions across a massive volume of structured and unstructured data.
- Involved in abundant industry/research projects through whole data science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modelling, Evaluation, Optimization, Testing and Deployment.
- Expertise in Natural Language Processing (NLP), Text Mining, Topic Modeling, Sentiment Analysis, Association Rules Analysis, and Market Basket Analysis.
- Proficient in Machine Learning algorithms and Predictive Modeling, including Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Neural Networks, Random Forests, Ensemble Models, SVM, KNN, and K-means clustering.
- Experienced in data analysis with Time Series Analysis, Survival Analysis, Net Promoter Analysis, RFM Analysis, Customer Churn Rate Analysis, and Market Basket Analysis.
- Solid knowledge of and experience in Deep Learning techniques, including Feedforward Neural Networks, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), pooling, and regularization.
- Excellent proficiency in model validation and optimization, with Model Selection, Parameter/Hyper-Parameter Tuning, K-fold Cross Validation, Hypothesis Testing, and Principal Component Analysis (PCA).
- Proficient with Python 3.x, including NumPy, Scikit-learn, Pandas, Matplotlib, Seaborn, and NLP libraries.
- Proficient with RStudio for data mining, basic machine learning, data modeling, and information visualization, including httr, dplyr, shiny, and ggplot2.
- Extensive experience in RDBMS such as SQL Server 2012 and Oracle 9i/10g, and in non-relational databases such as MongoDB 3.x.
- Hands-on experience with the Hadoop 2.x ecosystem and the Apache Spark 2.x framework, including Hive, Pig, and PySpark.
- Proficient with data visualization tools such as Tableau, R ggplot2, Python Matplotlib, and Seaborn.
- Experienced in Amazon Web Services (AWS), including EC2, EMR, S3, RDS, and Redshift.
- Experienced with SQL Server Management Studio (SSMS) and the BI Suite (SSIS/SSRS).
- Knowledge of and experience working in Waterfall as well as Agile environments, including the Scrum process, using project management tools such as ProjectLibre and Jira/Confluence and version control tools such as GitHub.
TECHNICAL SKILLS:
Programming languages: Python, R, SQL, Script, Pig, Hive, Spark, NoSQL, C, HTML, CSS
Data analytical tools: Tableau, Power BI, RStudio, MS Excel, Weka, MicroStrategy, VOSviewer, Cognos.
Databases: MS Access, MySQL, MongoDB.
Skills: Natural Language Processing, Classification, Regression, Clustering, Hypothesis Testing, Time-Series Modeling, Deep Learning.
EXPERIENCE:
Data Scientist
Confidential, VA
Responsibilities:
- Built and tuned machine learning models to segment faculty researchers through their publication data.
- Handled exceptionally large amounts of data to develop projects on connecting the campus resources.
- Performed statistical analysis to observe patterns within datasets.
- Evaluated the performances of multiple in-production machine learning models.
- Used dimensionality reduction techniques such as PCA.
- Worked with the (non-technical) library staff to provide a chronology of faculty work.
- Reported the funding analysis of various research departments using visualization techniques and created dashboards in Tableau.
- Presented the findings in ‘Mason Graduate Interdisciplinary Conference’.
Data Scientist
Confidential, Woodbury, NJ
Responsibilities:
- Gathered business requirements, defined and designed the data sourcing, and worked in conjunction with the data warehouse architect on the development of logical data models.
- Conducted reverse engineering based on demo reports to understand undocumented data, redefined the proper requirements, and negotiated them with our client.
- Generated different Data Marts and staging databases for hassle-free data gathering (Member Info, Claim Info, Info Update Log, Transaction Info, Appointment Info, Diagnosis Info) from the live SQL Server database and timed raw data files.
- Processed data using Python Pandas to examine transaction data and identify outliers and inconsistencies, and conducted exploratory data analysis using Python NumPy and Seaborn to gain insight into the data and validate each feature.
- Used decision path analysis from various decision tree/random forest models to identify possible factors leading to no-show appointments.
- Used Python 3.x (NumPy, SciPy, Pandas, Scikit-learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms, occasionally for real-time analytic purposes.
- Performed prototype image recognition using CNNs and the Python TensorFlow package to identify common problem types such as bone fracture categorization, tumor detection, and blood flow evaluation.
- Developed and implemented predictive models with hyper-parameter tuning, including linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, and KNN.
- Analyzed patient visit trends on a daily basis for common medical resource allocation.
- Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
Environment: Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas/Matplotlib/Seaborn), Tableau, TensorFlow, Keras, AWS Redshift, EC2, EMR, S3, Hadoop Framework, HDFS, Spark (PySpark, MLlib, Spark SQL), Agile/SCRUM, SQL Server 2012, Teradata 15.0, SQL Server Data Tools 2010, SQL Server Integration Services (SSIS)
Data Scientist
Confidential
Technologies: Amazon Lex, Stanford CoreNLP, NLTK, spaCy, Word2vec (Gensim), Scikit-learn, CRF model (sklearn-crfsuite), Deep Learning (Stacked Bidirectional LSTM, CNN), Keras, TensorFlow, Bayesian optimization - Hyperas (hyperopt), Elasticsearch, Logstash, Kibana, Python, Flask, Google Translate API, Hadoop, Hive.
Responsibilities:
- Built the core NLU engine, which parses user questions by extracting intents and entities.
- Implemented important features (such as Brown clusters, word2vec clusters, word windowing, and LDA word topics) for the CRF model to correctly tag each word with its corresponding NER tag.
- Implemented spell check with edit-distance-based and phonetic-based matching, using the Damerau-Levenshtein and Soundex distance algorithms respectively.
- Worked on deep learning model building and optimization, applying regularization techniques such as recurrent dropout, early stopping, gradient clipping, and dynamic batching to overcome problems such as overfitting and slow network convergence.
- Trained a Bidirectional LSTM-CNN network for Named Entity Recognition.
- Worked on a natural language pipeline to structure text from various sources.
- Wrote different shallow parsing rules to extract various entities from user questions.
- Ensembled the deep learning model, the CRF model, and NLP techniques to improve model results.
- Wrote complex Elasticsearch aggregation queries for user questions using the Python Query DSL.
- Worked on data ingestion from different sources (S3, Hive) into Elasticsearch using Logstash and the ES-Hadoop connector.
- Implemented a language translation utility in the chatbot to convert German questions and answers to English.
Environment: Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas/Matplotlib/Seaborn), RStudio (ggplot2/shiny/httr), TensorFlow, AWS Redshift, EC2, EMR, Hadoop Framework, HDFS, Spark (PySpark, MLlib, Spark SQL), Agile/SCRUM
Jr. Data Scientist
Confidential
Responsibilities:
- Built and improved projection ML models with better accuracy metrics and forecasted the volume of customer businesses.
- Organized, updated, and maintained a large database; performed routine statistical analysis of short-term business performance, long-term growth, and customer behavior.
- Summarized data into readable formats; provided insights on data trends and anomalies.
- Gained expertise in high-performance data integration solutions using Microsoft SQL Server Integration Services (SSIS).
- Queried data in SQL for new offers and modified existing offers, mapping table attributes and analyzing the load and reject data.
- Built a time series model to segment customers on a seasonal basis and design targeted promotional offers.
- Optimized clients' social marketing expenses by performing text mining/NLP in Python.
- Analyzed a service company's customer data, including longevity and overall churn data.
- Pioneered a range of Tableau dashboards on the forecasted data and presented them to the executive team.
- Analysis of Amazon Review data using Natural Language Processing: Built an open source system to classify the helpfulness index of Amazon product reviews.
- The system was built in Python using text mining (tokenizing, POS tagging, lemmatizing, TF-IDF, GloVe), with classification via logistic regression, decision trees, ensemble techniques, SVM (linear and radial kernels), and deep learning algorithms (single/multi-layer perceptron, CNN).
- Shared Resource Usage Analysis: Acquired, cleaned, maintained, and analyzed the usage data of all library resources.
- The analysis included statistical procedures and building machine learning models to optimize the distribution of shared resources.