
Data Scientist Resume


Little Elm, TX

SUMMARY

  • 6 years of experience in Data Science/Machine Learning (including 1 year on graph platforms), with a demonstrated history of extracting actionable insights from massive amounts of data to support decision making across various domains; experienced in setting up infrastructure on the Linux platform; skilled in data visualization and in building predictive models with machine learning techniques that perform well on unseen data.
  • Expertise in building data models using machine learning techniques such as Clustering Analysis, Market Basket Analysis, Association Rules, Naïve Bayes, Recommendation Systems, Dimensionality Reduction, Principal Component Analysis (PCA), Neural Networks, and Natural Language Processing.
  • Expertise in building predictive models such as Decision Trees, Linear Regression, Logistic Regression, and Support Vector Machines.
  • Understanding hidden patterns and detecting outliers to build sophisticated models.
  • Using Big Data technologies to build Recommender Systems.
  • Applying deep learning models such as Convolutional Neural Networks (CNN) and Multi-Column CNNs (MCNN) to real-time implementations.
  • Expertise in working on Linux platforms and computing on HPC clusters, NVIDIA JETSON TX2.
  • Experience in Tableau, Power BI, Statistical Modeling & Graph Analytics.
  • Proficiency in operating and deploying models on Docker, Kubernetes.
  • Hands on experience with Azure data platform stack: Azure Databricks, Azure Data Factory, Azure Data Lake storage, Azure DevOps.
  • Expertise in data scraping, data labeling, data analysis, data migration, data cleansing, transformation, integration, data import, and data export.
  • Presented the technical project “Image-Based Head Counting Using Machine Learning Models” at an IEEE conference held at the DoubleTree by Hilton, San Jose, CA.
  • Expertise with data analysis languages such as C++, Python, SQL, SAS.
  • Participated in “Google Landmark Detection” Kaggle competition.
  • Expertise in model evaluation using performance metrics such as Precision, Recall, Accuracy, Confusion Matrix, and k-fold cross-validation (a minimal sketch follows this list).
  • Proficiency in the Hadoop ecosystem (MapReduce, Pig, Hive, Apache Spark, YARN, HDFS, Flume).
  • Expertise in Amazon Web Services (AWS) and Google Cloud Platform (GCP).
  • Strong, dynamic, and dedicated team player with effective communication skills and ready to address corporate challenges.
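
The model-evaluation bullet above can be made concrete. Below is a minimal sketch, not taken from the original projects, of k-fold evaluation with scikit-learn; the synthetic dataset and logistic regression model are illustrative stand-ins for real project data.

```python
# Minimal k-fold evaluation sketch with scikit-learn.
# The synthetic dataset and logistic regression model are illustrative stand-ins.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: every sample is scored by a model that never saw it in training
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
y_pred = cross_val_predict(model, X, y, cv=cv)

print("Accuracy :", accuracy_score(y, y_pred))
print("Precision:", precision_score(y, y_pred))
print("Recall   :", recall_score(y, y_pred))
print("Confusion matrix:\n", confusion_matrix(y, y_pred))
```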

TECHNICAL SKILLS

Big Data and Web: MapReduce, Pig, Hive, Apache Spark, YARN, HDFS, Flume, ZooKeeper; JavaScript, HTML, CSS.

Cloud Technologies: Azure, AWS, Google cloud, Docker, Kubernetes.

Databases and OS: Cassandra, SQL Server, MongoDB, TigerGraph, RAI, Neo4j; Windows, Linux.

Programming Languages and ML Libraries and Tools: Python, R, SQL; NumPy, Pandas, SciPy, Scikit-learn, NLTK, Keras, Seaborn, Matplotlib, TensorFlow, PyTorch, OpenCV, Tableau, Plotly; Anaconda Navigator, PyCharm, Apache Spark, Apache Hadoop, CentOS (Linux), Power BI.

Statistical Methods: Time Series, Regression models, Confidence intervals, Principal Component Analysis and Dimensionality Reduction.

PROFESSIONAL EXPERIENCE

Data Scientist

Confidential, Little Elm, TX

Responsibilities:

  • Employed programming languages such as Python and C++ to write software prototypes. Analyzed 50+ complex simulation datasets with logistic regression models.
  • Researched the software market for solutions to client needs.
  • Predicted product sales to within 2% accuracy using predictive analytics algorithms. Translated business problems into deep learning models to produce results from data inputs.
  • Improved simulation accuracy by 15% through ML algorithms.
  • Managed the full SDLC (Software Development Lifecycle): reviewed requirements, coordinated with the application development team, distributed tasks, created test plans, coordinated production installation, and participated in support.
  • Involved in the Risk Management Plan (RMP), a standard part of every Confidential Inc. software project, which is referenced in the overall Project Management Plan.
  • Created a customer-service chatbot using NLP and machine-translation fundamentals with Python, TensorFlow, NLTK, NumPy, scikit-learn, spaCy, TextBlob, Word2Vec, and RNNs.
  • Used NLP for preprocessing and cleaning text data and for tokenizing sentences and words (a minimal sketch follows this list).
  • Implemented state-of-the-art sequence models such as LSTMs and RNNs.
  • Performed data visualization using Matplotlib, drawing on familiarity with the NumPy and Pandas packages.
  • Built deep learning models using Keras, TensorFlow, and PyTorch for product recommendation and deployed them on Kubernetes (K8s) clusters.
  • Communicated weekly application status to business users and end users.
  • Performed unit testing on completion of each unit's development.
  • Utilized SQL skills to query databases.
  • Implemented a new incident management system to provide real time SLA and KPI reports to key stakeholders.
  • Monitored corporate endpoints for malware and unapproved software. Advised on and executed approval methods.
  • Provided guidance on deployment strategy of a security solution for over 80,000 endpoints.
  • Implemented SQL queries for ad-hoc reporting at management and client request.
  • Built an ontology for 26 profiles, cleaned profiles, traversed the graph, and designed object-relational mappings for Confidential data.
  • Working on Atlantis feature store, performing semantic search using DevOps tools.
  • Deploying, managing, and operating scalable, highly available, and fault tolerant systems on MS Azure.
  • Strong presentation skills using MS PowerPoint and Power BI, with the ability to digest complex technical concepts and present them to technical teams, covering monitoring, logging, and cost-management tools that integrate with Azure Databricks.
  • Developed and created test data, retrieving test data from servers via SQL queries.
  • Wrote Python scripts for loading data through Azure notebooks.
  • Created data engines and used Rel programs to build graphs that feed into ML pipelines.
  • Worked with Postman and Norma to perform semantic search.
  • Worked on automatic semantic data integration.
  • Built semantic search and recommendation systems to suggest mobile plans and various subscription types to different customer segments.
  • Implemented Graph Neural Networks on Azure Databricks to classify networks by coverage, broadband types in use, subscriber types, account types, and churn rate.
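
As referenced in the NLP preprocessing bullet above, here is a minimal sketch of the cleaning and tokenization step using NLTK. It is illustrative only: the sample utterance is invented, and a production chatbot pipeline would add the spaCy/Word2Vec steps listed in the bullets.

```python
# Minimal text-preprocessing sketch with NLTK: lowercasing, punctuation
# stripping, tokenization, stopword removal, and lemmatization.
# The sample utterance is invented for illustration.
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

def preprocess(text):
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keep letters and spaces only
    tokens = text.split()                          # simple whitespace tokenization
    stops = set(stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t) for t in tokens if t not in stops]

print(preprocess("Hi! I'd like to check the status of my order, please."))
```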

MACHINE LEARNING ENGINEER

Confidential, Addison, TX

Responsibilities:

  • Working closely with the Data Science team to help prioritize the work, provide acceptance criteria, and build data-analytics models for a substantial number of graphs.
  • Creating graph schemas using GSQL queries and installing the PageRank and Louvain algorithms to assist the fraud-detection team in detecting fraud rings.
  • Working with application development and data modeling teams to design and develop platforms for loading huge datasets onto the Neo4j and TigerGraph platforms.
  • Predicting loan eligibility using machine learning models: Logistic Regression, Decision Tree, Random Forest, and XGBoost.
  • Detecting credit card fraud using Logistic Regression, Linear Discriminant Analysis, K-Nearest Neighbors (KNN), Classification Trees, Support Vector Classifier, Random Forest Classifier, and XGBoost Classifier.
  • Configuring LDAP and HA clusters to provide secure connection on platform.
  • Building a credit card fraud detection system and detecting money laundering using customer service-usage data held in the TigerGraph and Neo4j graph databases.
  • Building Graph architecture using Neo4j for optimizing ML fraud detection model.
  • Performing disaster recovery on Neo4j, setting up Autosys jobs on POC, Dev, and Prod servers, and writing shell scripts for file watchers.
  • Backing up and restoring data on the TigerGraph platform.
  • Designing and building end-to-end Azure data pipelines using Python, Azure Databricks, and Azure Data Factory.
  • Working on Azure CI/CD Stories, Bugs, and Issue Management.
  • Raising ARM requests to get access to Autosys, upgrading servers, handling monthly data ingestion through Kafka pipelines, exporting and importing multiple graphs, defining user privileges and authentication roles, and working with the GraphStudio user interface.
  • Provided continuous testing with Selenium, covering end-to-end, API, and UI frameworks.
  • Used the pytest framework to integrate several Python testing tools (xdist, mock, parallel runs, Selenium with Chrome and Firefox) and produce HTML/XML reports; a minimal sketch follows this list.
  • Helped individual teams set up their repositories in Bitbucket, maintain their code, and configure jobs that make use of the CI/CD environment.
  • Maintained resources using Ansible and VMware and monitored their health daily.
  • Used Gradle and Maven for building applications and wrote structured POM files that can be consumed by Jenkins.
  • Built custom Python tools for generating email templates that can consume large amounts of data and convey testing results in a simpler way.
  • Built custom dashboards with Smashing/Dashing open-source widgets to display operational tasks on screens.
  • Installed Helm charts from the rich library of existing charts and iterated on a Helm chart used in the build process.
  • Used Tilt with Helm charts for deployment to Kubernetes clusters.
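
As referenced in the pytest bullet above, here is a minimal sketch of a parametrized pytest test in the spirit of that framework; `normalize_status` is a hypothetical stand-in for application code, not part of the original project.

```python
# Minimal pytest sketch: a parametrized test suite.
# Run in parallel and emit an HTML report with the pytest-xdist and
# pytest-html plugins, e.g.:
#   pytest test_status.py -n auto --html=report.html --self-contained-html
import pytest

def normalize_status(raw: str) -> str:
    """Hypothetical application helper: map raw status strings to canonical values."""
    return raw.strip().lower().replace("-", "_")

@pytest.mark.parametrize("raw,expected", [
    ("  PASSED ", "passed"),
    ("In-Progress", "in_progress"),
    ("FAILED", "failed"),
])
def test_normalize_status(raw, expected):
    assert normalize_status(raw) == expected
```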

MACHINE LEARNING ENGINEER

Confidential, New Jersey

Responsibilities:

  • Designed and implemented a Capital Acceptance Model for the marketing team using LSTM, XGBoost, and K-Means Clustering for sales forecasting, increasing sales by 24%.
  • Implemented five different models in Python to predict product sales: ARIMA and regression models including Linear Regression, Random Forest Regression, XGBoost, and LSTM (a minimal sketch follows this list).
  • Implemented a logistic regression model for flagging adopted users by targeting responsive customers based on service-usage behavior.
  • Migrated data onto Hadoop (YARN) and wrote Pig queries for job scheduling.
  • Effectively scaled deep learning workloads on GPUs to increase throughput and reduce latency.
  • Monitored ML-based applications for performance issues with ML-centric capabilities such as data-drift analysis, model-specific metrics, and alerts using DataRobot MLOps.
  • Conducted Exploratory Data Analysis using Python and created reports using Tableau.
  • Analyzed and classified customer reviews using spaCy NLP and trained deep learning models on AWS SageMaker.
  • Performed data visualization using Matplotlib, drawing on familiarity with the NumPy and Pandas packages.
  • Built deep learning models using Keras, TensorFlow, and PyTorch for product recommendation and deployed them on Kubernetes (K8s) clusters.
  • Wrote Ansible playbooks with Python SSH as the wrapper to manage configurations of AWS nodes, and tested playbooks on AWS instances using Python.
  • Created Python scripts that integrated with the Amazon API to control instance operations.
  • Experienced in automating, configuring, and deploying instances on AWS, Azure, and data-center environments; familiar with EC2, CloudWatch, CloudFormation, and managing security groups on AWS.
  • Managed the planning and development of designs and procedures for metric reports.
  • Performed data collection, data preparation, feature engineering, hypothesis testing, data reduction, and data mining to help data scientists develop mathematical and statistical models, working on graph databases using TigerGraph and Neo4j.
  • Handled claim management, investigating and conducting studies on forecasts, demand, customer needs, and product capital.
  • Implemented machine learning algorithms for fraud detection along with SAS rules, filters, joins, and other processing logic required to finish out the workflow; also performed visual mining using SAS.
  • Overcame limitations in market-research innovation by researching current data on industry trends, positioning, and customer needs.
  • Connected the software running on the user's local server with the model running on a cloud HPC cluster of the required size.
  • Trained deep learning models on neural-network frameworks such as TensorFlow and PyTorch.
  • Designed and implemented testing frameworks for the NVIDIA deep learning software stack.
  • Worked with BigQuery ML (Machine Learning) on the GCP platform to train and build models.
  • Collaborated and interacted with internal GPU library teams to analyze and optimize training and inference for deep learning.
  • Worked in a distributed computing setting to optimize for both scale-up (multi-GPU) and scale-out (multi-node) systems.
  • Designed and implemented new systems for parallel and GPU processing of deep learning networks and layers.
  • Exploited parallelism on GPUs to effectively scale deep learning workloads, increasing throughput and reducing latency.
  • Performed Web scraping, Sentiment Analysis and Natural Language Processing.
  • Wrangled and collected client data for further processing.
  • Performed ETL (Extract, Transform, Load) using DataStage and worked with on-prem Linux and AWS to train the models.
  • Built a predictive model for client stay duration that improved returns by over 7% in revenue.
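
As referenced in the sales-prediction bullet above, here is a minimal sketch of ARIMA-based forecasting with statsmodels; the synthetic monthly series and the (1, 1, 1) order are illustrative assumptions, not the original data or model.

```python
# Minimal ARIMA sales-forecasting sketch with statsmodels.
# The synthetic monthly series stands in for real product-sales data.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
idx = pd.date_range("2018-01-01", periods=48, freq="MS")  # monthly starts
sales = pd.Series(
    100 + 2 * np.arange(48)                          # trend
    + 10 * np.sin(np.arange(48) * 2 * np.pi / 12)    # yearly seasonality
    + rng.normal(0, 3, 48),                          # noise
    index=idx,
)

train, test = sales[:-6], sales[-6:]
model = ARIMA(train, order=(1, 1, 1)).fit()   # illustrative order
forecast = model.forecast(steps=6)

mae = (forecast - test).abs().mean()
print(f"6-month forecast MAE: {mae:.2f}")
```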

MACHINE LEARNING ENGINEER

Confidential, Fremont, CA

Responsibilities:

  • Trained ML models on Kubernetes (K8s) with NVIDIA NGC containers, using Docker to allocate resources.
  • Worked with Linux threading architecture and OpenMP to train ML models.
  • Implemented natural language processing and information-retrieval techniques using tools such as Solr, Lucene, NLTK, and OpenNLP to translate patient records into risk-adjustment and hierarchical condition categories, and monitored CI/CD pipelines.
  • Building Graph architecture to provide Real-time product recommendations.
  • Reduced training time by comparing a single-GPU solution with a multi-CPU solution on GCP (Google Cloud Platform); a minimal timing sketch follows this list.
  • Designed and implemented an email servicing model to generate automated suggestions on email inquiries.
  • Used Neo4j for regulatory compliance and for keeping track of customer-employee relationships.
  • Maintained database performance and capacity planning for Cassandra.
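
As referenced in the training-time bullet above, here is a minimal PyTorch sketch of a device comparison; matrix multiplication stands in for the real training workload, and the sizes are illustrative.

```python
# Minimal CPU-vs-GPU timing sketch with PyTorch.
# Matrix multiplication is an illustrative stand-in for a training workload.
import time

import torch

def time_matmul(device, n=2048, reps=10):
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # finish allocation/warm-up kernels first
    start = time.perf_counter()
    for _ in range(reps):
        _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # wait for async kernels before stopping the clock
    return (time.perf_counter() - start) / reps

print(f"CPU: {time_matmul('cpu'):.4f} s/iter")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.4f} s/iter")
```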

JUNIOR DATA SCIENTIST

Confidential, San Jose, CA

Responsibilities:

  • Installation, configuration, integration, and management of HPC systems, clusters, operating systems, peripherals, and system interface.
  • Solved BAC (Binding Affinity Calculation) problem with enormous computing power on HPC nodes.
  • Collaborating closely with the architecture, research, libraries, tools, and system software teams to implement next-generation ML models.
  • Created a generalized data-representation architecture that can be applied to different raw company event data.
  • Applying Classification Algorithms to segregate patients’ records based on summary of diagnosis.
  • Building predictive models, working with clinical experts (e.g., doctors) to translate their subject-matter expertise into models, extracting and manipulating data from multiple large data sources, and presenting the work to technical and non-technical stakeholders using the Jetson TX2 GPU.
  • Also used the Anaconda Navigator IDE (Integrated Development Environment) to build models.
  • Used Shiny, ggplot2, and glue to visualize the data and perform data wrangling.
  • Building software engineering pipelines to integrate with third party services to handle fraudulent activities.
  • Implemented ResNet, a pre-trained computer-vision model, to identify 12 different pathologies from chest X-rays using multiple GPUs (a minimal sketch follows this list).
  • Development and implementation of tools for continuous testing and for efficient GPU job scheduling.
  • Bypassed the kernel in HPC bare-metal environments in the standard way, achieving the highest bandwidth and lowest latency, which is critical for many MPI applications.
  • Performed all calculations necessary to train the ML model inside the K8s cluster.
  • Worked on Kubernetes (K8s) with NVIDIA NGC containers using Docker.
  • Enabled K8s-based workloads to allocate resources and train models.
  • Worked with the Linux threading architecture and OpenMP.
  • Proficient at building robust machine learning and deep learning models, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and LSTMs, using TensorFlow and Keras. Adept at analyzing large datasets using Apache Spark, PySpark, Spark ML, and Amazon Web Services (AWS).
  • Ensured that the model has a low false-positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
  • Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Created data-quality scripts using SQL and Hive to validate successful data loads and data quality. Created various types of data visualizations using Python and Tableau.
  • Tools and Techniques used: Python, R, Multi-Class Logistic Regression Classifier, Boosted Regression Tree, Random Forest, Association Rules, Support Vector Machine, Clustering Analysis, Collaborative Recommender System, Time-Series Analysis, Tableau, Jetson TX2, Linux, Docker, Kubernetes.
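
As referenced in the ResNet bullet above, here is a minimal Keras sketch of the transfer-learning setup: a frozen pre-trained ResNet50 backbone with a 12-way sigmoid head for multi-label pathology detection. The input shape and head sizes are illustrative assumptions, not the original model.

```python
# Minimal transfer-learning sketch: pre-trained ResNet50 backbone with a
# 12-way sigmoid head for multi-label chest X-ray classification.
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import ResNet50

base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained backbone for initial training

x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu")(x)           # illustrative head size
outputs = layers.Dense(12, activation="sigmoid")(x)   # one score per pathology

model = Model(base.input, outputs)
# Sigmoid + binary cross-entropy: each pathology is an independent label
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
model.summary()
```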

RESEARCH ASSISTANT

Confidential, San Jose, CA

Responsibilities:

  • Designed a real-time pothole detection technique in Python using ML models such as CNN, RNN, and MCNN in a Linux environment.
  • Set up the LabelMe tool, converted video to images, and performed the ETL (Extract, Transform, Load) process using Informatica.
  • Implemented real-time wildfire detection using deep learning models and fine-tuned the models using test results and statistical analysis.
  • Built a data pipeline using the Kafka API to stream the data and store collected text in a MongoDB server.
  • Performed data preprocessing (missing-value treatment, outlier detection, data exclusion, feature engineering) on data obtained from Yellow Pages.
  • Created reports and dashboards using Tableau to explain and communicate data insights, crucial features, model scores, and the performance of a new recommendation system for restaurant data from Yelp, using Python, TensorFlow, Seaborn, Naïve Bayes, Random Forest, and regression models.
  • Implemented “Fraud Judger: Real-World Data Oriented Fraud Detection on Digital Payment Platforms” using Graph Adversarial Network and performed extensive graph mining on Neo4j.
  • Designed and implemented cross-validation and continuous statistical tests on ML model.
  • Improved the pre-existing loss-forecasting model for the Capital Flex Loan product using ML modeling techniques (Linear Regression, Logistic Regression, Count Data Regression, K-Nearest Neighbors, K-Means Clustering).
  • Inferred customer ratings to build a proper model and uncovered latent features from customer datasets.
  • Improved the internal rate of return (IRR) forecasting model for clothing, electronic gadgets, and furniture using Naïve Bayes, Bayesian data analysis, Support Vector Machines, Random Forests, dimensionality reduction (PCA/stepAIC), regularization, and cross-validation.
  • Used matrix-factorization approaches with the Alternating Least Squares (ALS) algorithm and conducted hyperparameter tuning to return the best recommendations possible (a minimal sketch follows this list).
  • Assessed the model's performance by cross-validation.
  • Tools and Techniques used: Linux, Java, AngularJS, Kubernetes, CNN, RNN, MCNN, R, Python, MATLAB, Naïve Bayes, Random Forest, Regression Model, Tableau, NLTK, ggplot2.
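
As referenced in the matrix-factorization bullet above, here is a minimal PySpark sketch of ALS with a small manual hyperparameter search; the in-memory ratings set and parameter grid are illustrative, and the search evaluates on the training set for brevity where a real search would use held-out data or cross-validation.

```python
# Minimal ALS matrix-factorization sketch with a manual grid search.
# The tiny in-memory ratings set and parameter grid are illustrative.
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("als-demo").getOrCreate()
ratings = spark.createDataFrame(
    [(0, 10, 4.0), (0, 11, 2.0), (0, 12, 5.0),
     (1, 10, 5.0), (1, 11, 1.0), (1, 12, 3.0),
     (2, 10, 4.0), (2, 11, 4.0), (2, 12, 1.0)],
    ["userId", "itemId", "rating"],
)

evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",
                                predictionCol="prediction")
best_rmse, best_model = float("inf"), None

# Manual grid over rank and regularization; evaluated on the training set
# for brevity (a real search would hold out data or cross-validate).
for rank in (5, 10):
    for reg in (0.01, 0.1):
        als = ALS(userCol="userId", itemCol="itemId", ratingCol="rating",
                  rank=rank, regParam=reg, coldStartStrategy="drop", seed=42)
        model = als.fit(ratings)
        rmse = evaluator.evaluate(model.transform(ratings))
        if rmse < best_rmse:
            best_rmse, best_model = rmse, model

print(f"Best RMSE: {best_rmse:.3f}")
best_model.recommendForAllUsers(2).show(truncate=False)
```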
