Principal Data Scientist/Data Engineer Resume
SUMMARY
- Extensive working experience in Data Science, Data Engineering, Machine Learning, Deep Learning, Data Mining, Predictive Modeling, Recommendation Systems, ETL Development, and Data Visualization
- Comprehensive programming skills in Python 2/3, R, Scala, MATLAB, SQL, Bash, JavaScript, HTML5, CSS3, C, C#, and Java
- Expertise in supervised machine learning algorithms like Linear and Logistic Regression, Decision Trees, AdaBoost, Gradient Boosting, XGBoost, Random Forest, Naïve Bayes, K-Nearest Neighbors, Support Vector Machines, LDA (Linear Discriminant Analysis), and Neural Networks; and unsupervised learning algorithms like K-Means Clustering and PCA (Principal Component Analysis)
- Skilled in the deep learning frameworks TensorFlow, Keras, and PyTorch; familiar with deep learning models such as DNNs, CNNs, RNNs, and LSTMs
- Experienced in building Data Warehousing and Extract Transform Load (ETL) pipelines using Spark, Airflow and cloud tools
- Experience in defining project scope across Data Science and Data Analytics projects in collaboration with senior management and clients
- Strong experience in the Software Development Life Cycle (SDLC), including requirements analysis and design specification, in both Waterfall and Agile methodologies
- Adept in using Python libraries such as Pandas, NumPy, SciPy, Seaborn, Matplotlib, Scikit-learn, Keras, NLTK
- Experience in using Anaconda Navigator (Jupyter Notebook), PyCharm, RStudio for Python and R programming
- Working knowledge with Big Data technologies like Hadoop, MapReduce, Spark, SparkSQL, HDFS, Hive, HBase
- Expert in designing visualizations using Tableau 10.3, Dash, R-Shiny, Power BI, and D3.js
- Experience in using A/B tests, hypothesis tests, and ANOVA to evaluate model accuracy
- Professional experience handling structured and unstructured data (social media, text, photographs, and videos) using relational databases like MySQL 5.x and Oracle 11g
- Pulled data into Power BI from various sources such as SQL Server, Excel, Oracle, and SQL Azure
- Involved in migrating existing on-premise systems/applications to the Azure cloud
- Implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB)
- Strong virtualization experience, including datacenter migration and Azure Data Services
- Designed and implemented streaming solutions using Kafka or Azure Stream Analytics
- Expert in dealing with big data on NoSQL databases like Cassandra 3.0 and MongoDB 3.2
- In-depth knowledge of cloud infrastructure such as AWS S3, AWS EC2, and Docker
- Experience working with version control systems like Git, and with source code management client tools like Git Bash and GitHub
- Excellent communication, analytical, interpersonal, and presentation skills; expert at managing multiple projects simultaneously
- Familiar with current industry standards, such as ISO, Six Sigma, and the Capability Maturity Model (CMM)
- Good knowledge of Microsoft Project, Microsoft Office, WordPress, Photoshop, etc.
TECHNICAL SKILLS
Machine Learning/Deep Learning: Regression models, Naive Bayes, Decision Trees, Random Forests, AdaBoost, XGBoost, SVM, KNN, Bagging, Gradient Boosting, LDA, K-Means, Neural Networks, CNN, RNN
Packages: NumPy, Pandas, SciPy, Seaborn, Matplotlib, Plotly, Keras, Scikit-learn, NLTK, PyTorch, Beautiful Soup, WordCloud, TensorFlow, Flask
Languages: Python 2.7/3.6, R, SQL, JavaScript, Scala, Pig, HTML5, XML, CSS3, Shell, Markdown
Databases: MySQL 5.x, Oracle 11g, PostgreSQL 9.6, MongoDB 3.2, Cassandra 3.0
BI Tools: Tableau 10.3, Microsoft Power BI, MicroStrategy, Dash, R-Shiny
Infrastructure: Databricks, Docker, AWS, GCP, Microsoft Azure, Git, Bitbucket
Report/Document Tools: MS Office 2016, MS Project, Outlook, Excel, Word, PowerPoint
Big Data Tools: Spark, Spark SQL, Hadoop, MapReduce, Hive, HBase
Operating Systems: Linux, Ubuntu, macOS, CentOS, Windows
PROFESSIONAL EXPERIENCE
Confidential
Principal Data Scientist/Data Engineer
Responsibilities:
- Project Development: Designed and developed scalable production-level recommendation systems leveraging machine learning, deep learning, natural language processing, and statistical modeling in Python to solve real-world business problems; collaborated with backend and frontend engineers to integrate the recommendation systems into a Flask REST framework and successfully deployed them on CentOS
- Content-Based Dog Breed Selector: Built a new algorithm flow on top of the existing rule-based dog breed selector; designed a principled approach (TF-IDF + cosine similarity) to compute the similarity between text data; constructed a predictive model of user behavior; implemented the Personalized Recommendation API to deliver recommended breeds to dog seekers (see the similarity sketch after this role's environment list)
- Machine Learning API: Deployed the dog breed selector model as a REST API using Flask; built and pickled the content-based model; deployed it on a Linux (CentOS) server (see the serving sketch below)
- Model Management: Used the MLflow framework to manage the machine learning lifecycle; evaluated different models and shipped the best one to production; created a docker-compose.yml file and managed multiple isolated environments on a single host
Environment: Python 3.6, JavaScript, Docker Compose, CentOS, MySQL 8.0.17, React, flask-restful, iTerm, AWS, NumPy, Pandas, Scikit-learn, Keras, NLTK, TensorFlow, Git, JIRA, VS Code
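A minimal sketch of the TF-IDF + cosine-similarity matching used in the breed selector above; the breed descriptions and user query are made-up placeholders, not production data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical breed profiles and a hypothetical user's free-text answers
breed_docs = [
    "calm apartment-friendly companion, low shedding",
    "high-energy working dog, needs daily exercise",
    "gentle family dog, good with children",
]
user_query = ["quiet low-shedding dog for a small apartment"]

vectorizer = TfidfVectorizer(stop_words="english")
breed_matrix = vectorizer.fit_transform(breed_docs)  # one row per breed
query_vec = vectorizer.transform(user_query)         # project query into the same space

# Rank breeds by cosine similarity to the user's description
scores = cosine_similarity(query_vec, breed_matrix).ravel()
best = scores.argmax()
print(f"best match: breed #{best} (score={scores[best]:.3f})")
```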
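A minimal sketch of serving the pickled content-based model behind a REST endpoint with flask-restful (listed in the environment); the file name, payload shape, and `recommend` method are assumptions for illustration:

```python
import pickle

from flask import Flask, request
from flask_restful import Api, Resource

# Hypothetical pickled content-based model, loaded once at startup
with open("breed_selector.pkl", "rb") as f:
    model = pickle.load(f)

app = Flask(__name__)
api = Api(app)

class Recommend(Resource):
    def post(self):
        payload = request.get_json(force=True)
        # Assumes the model exposes a recommend(text, k) method
        breeds = model.recommend(payload["text"], k=5)
        return {"breeds": breeds}

api.add_resource(Recommend, "/recommend")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```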
Confidential
Data Scientist
Responsibilities:
- Project Development: Designed and developed scalable production-level recommendation systems leveraging machine learning, deep learning, natural language processing, and statistical modeling in Python to solve real-world business problems; collaborated with backend and frontend engineers to integrate the recommendation systems into the Django REST Framework and successfully deployed them on AWS EC2
- Data Analysis: Translated data into meaningful facts to help businesses make better decisions; performed cleansing, manipulation, analysis, and visualization of client data; generated data visualization dashboards using Tableau 10.3 and the Python libraries Matplotlib and Seaborn
- Data Preprocessing: Collected 6 GB of data through the company's API; built a data processing pipeline and performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python; built a streaming ETL in Spark that writes only the data that changed from the previous batch (see the incremental-write sketch after this role's environment list)
- NLP (Natural Language Processing) Techniques: Built projects utilizing NLP knowledge including text mining, regex, bag of words, TF-IDF, Word2Vec, PCA, LSTMs, cosine similarity, sentiment analysis, NER, and information extraction
- Log Classification: Applied feature selection based on tree importance to obtain the 8 most important features from IVR data; extracted features from modem logs and trained a Random Forest to classify intents (labels); then built a content-based recommender that suggests improvements based on the status of a given cable modem (see the feature-importance sketch below)
- Recommendation Algorithm: Designed user-based and item-based collaborative filtering based on Pearson correlation between users/items; hybridized the content-based recommender with collaborative filtering (see the Pearson-similarity sketch below)
- Model Evaluation: Measured model performance using the confusion matrix and AUC-ROC curve, deriving accuracy, precision, recall, and F1 score from the confusion matrix; used grid search to tune hyperparameters, evaluating a model for each combination of algorithm parameters specified in a grid, ultimately increasing accuracy by 5% (see the grid-search sketch below)
- Agile Project Coordinator: Pitched machine learning ideas, showed exploratory data analysis (EDA), and presented project demos to front-desk business users; suggested, collected, and synthesized business requirements based on use cases; created an effective roadmap toward the deployment of a production-level machine learning application
Environment: Python 3.6, Golang, Flask, Celery, iTerm, NumPy, Pandas, Seaborn, Matplotlib, NLTK, Scikit-learn, AWS S3, Databricks, Tableau 10.3, Spark, Spark SQL, PL/SQL, Git, JIRA
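A minimal sketch of the changed-rows-only Spark write from the Data Preprocessing bullet, assuming a left-anti join against the previously processed batch; the storage paths are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("incremental-etl").getOrCreate()

current = spark.read.parquet("s3://bucket/raw/current_batch/")   # hypothetical path
previous = spark.read.parquet("s3://bucket/processed/latest/")   # hypothetical path

# Keep only rows that are new or changed relative to the previous batch
changed = current.join(previous, on=current.columns, how="left_anti")

# Append just the delta to the processed table
changed.write.mode("append").parquet("s3://bucket/processed/latest/")
```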
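A minimal sketch of the tree-importance feature selection from the Log Classification bullet; synthetic data stands in for the proprietary IVR/modem-log features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the IVR/modem-log feature matrix
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank features by impurity-based importance and keep the top 8
top8 = np.argsort(forest.feature_importances_)[::-1][:8]
print("top-8 feature indices:", top8)
X_selected = X[:, top8]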
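A minimal sketch of the Pearson-correlation item similarity behind the collaborative-filtering bullet; the ratings matrix is a toy example:

```python
import pandas as pd

# Toy user-item ratings matrix (NaN = not rated)
ratings = pd.DataFrame(
    {"item_a": [5, 3, 4, None], "item_b": [4, 2, 4, 1], "item_c": [1, None, 2, 5]},
    index=["u1", "u2", "u3", "u4"],
)

# corr() computes Pearson correlation by default, pairwise-ignoring NaNs
item_sim = ratings.corr(method="pearson")

# Items most similar to item_a, for item-based neighborhood recommendations
print(item_sim["item_a"].drop("item_a").sort_values(ascending=False))
```

The same `corr` call on the transposed matrix (`ratings.T.corr()`) yields user-user Pearson similarities for the user-based variant.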
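A minimal sketch of the grid search and confusion-matrix evaluation from the Model Evaluation bullet; the estimator and parameter grid are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit one model per parameter combination in the grid, keep the best by CV score
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    scoring="f1",
    cv=5,
)
grid.fit(X_tr, y_tr)

pred = grid.predict(X_te)
print(confusion_matrix(y_te, pred))       # TP/FP/FN/TN counts
print(classification_report(y_te, pred))  # precision, recall, F1 per class
```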
Confidential
Data Scientist
Responsibilities:
- Strategy Building: Member of a five-person group charged with building resume-parsing systems for a recruiting platform, using NLP strategies based on machine learning and deep learning
- Implementation: Transformed resumes from PDF, Word, and other formats to text files using Tika (see the Tika sketch after this role's environment list); created corpus word lists including a segment keyword list, a university list, a company list, etc.; searched for segment keywords and created bounding boxes near keywords using hierarchical layout, then stored each sentence in its respective segment; performed feature extraction by creating segment-specific feature lists and searching for the main features in each segment
- Machine Learning/Deep Learning: Developed machine learning algorithms for Named Entity Recognition (NER), such as recognizing candidate names and company names; used Support Vector Machine and Naïve Bayes classifiers to improve segmentation results; applied regular expressions for information extraction, such as extracting email addresses (see the regex sketch below); implemented deep learning multi-class classification using RNN and CNN networks; designed a confusion matrix and calculated precision, recall, and F1 score to measure model performance, with accuracy reaching 99.9%
- Data Engineering: Constructed a data pipeline on AWS by deploying a Linux environment with Jupyter Notebook to query and clean data, run the pipeline's ETL, and prepare machine-learning-oriented feature tables; applied cloud technology (Google Cloud, AWS, and Databricks) to synchronize and deploy a parse server (Docker container) on AWS EC2; processed one million resume files and improved time efficiency by a factor of 20
- Interpersonal Communication and Leadership: Served as group leader for all interns in developing an adaptive information extraction algorithm based on about 100 academic papers; reviewed and refined all interns' information extraction strategies by testing their results; collaborated with product managers, marketing analysts, and front-end engineers to deliver features
Environment: Python 3.6, JRE 8, Docker, PostgreSQL 9.6, Apache Tika, Databricks, PyCharm, iTerm, AWS, NumPy, Pandas, Scikit-learn, Keras, NLTK, TensorFlow, Git
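A minimal sketch of the document-to-text step using the tika-python bindings for Apache Tika (requires a Java runtime); the file name and segment keywords are placeholders:

```python
from tika import parser

# Tika handles PDF, .doc/.docx, and many other formats behind one call
parsed = parser.from_file("resume.pdf")
text = parsed.get("content") or ""

# Downstream segmentation scanned this text for section keywords
# drawn from the curated corpus word lists described above
for line in text.splitlines():
    if any(kw in line.lower() for kw in ("education", "experience", "skills")):
        print("segment header candidate:", line.strip())
```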
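A minimal sketch of the regular-expression information extraction (here, email addresses); the pattern is a common simplified form, not the full RFC 5322 grammar:

```python
import re

# Simplified email pattern: local part, "@", domain, dot, TLD
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

sample = "Contact: jane.doe@example.com or (555) 123-4567"
print(EMAIL_RE.findall(sample))  # ['jane.doe@example.com']
```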