
Data Scientist Resume

New York

SUMMARY

  • An enthusiastic Data Science graduate from Confidential, skilled in Machine Learning, Deep Learning, NLP, and Big Data Analytics. Seeking new challenges that draw on intellectual curiosity and skill.
  • Experience with Python, SQL, R, and AWS (EC2, S3).
  • Expertise in NumPy, Pandas, PySpark, NLTK, and machine learning and deep learning libraries.
  • Worked on cloud computing with Amazon Web Services such as EC2 and S3, which provide fast and efficient processing of Big Data.
  • Hands-on experience performing clustering, regression, and classification on data using the Spark MLlib machine learning library.
  • Hands-on experience with deep learning libraries like PyTorch and TensorFlow, CNN models, and sentiment analysis packages.
  • Worked extensively on writing high-performance SQL queries.
  • Well-versed in the different stages of model building in Machine Learning, Deep Learning, and Natural Language Processing (NLP).
  • Excellent analytical, problem-solving, communication, and interpersonal skills to manage and interact with individuals at all levels.
  • Proficient with Microsoft tools like Excel (VLOOKUP, pivot tables, pivot charts), PowerPoint, and Word.
  • Quick learner with ability to master new concepts and applications.
  • Strong ability to understand any requirement and come up with a comprehensible approach.
  • Familiar with JSON-based web services and Amazon Web Services.
  • Experienced with various Python IDEs and notebook environments, including PyCharm, PyScripter, Spyder, PyStudio, Google Colab, and Jupyter.

TECHNICAL SKILLS

Programming Languages: Python, R, SQL scripting, Java, C++, SAS

Database: SQL, MySQL, Oracle, MongoDB, Neo4j

Web Technologies: PHP, Python, AJAX, CSS & CSS3, HTML5, XML, Bootstrap, JavaScript

Software: Bash, MySQL Workbench, Tableau, Google Colab, Jupyter, MATLAB, RStudio, Eclipse, VMware, Google Cloud, Sublime Text, PyCharm, AWS, MS Excel, PowerPoint

Machine Learning Libraries: Pandas, NumPy, TensorFlow, scikit-learn, Matplotlib, Seaborn, XGBoost, PySpark, Keras, NLTK, PyMongo

Version Control: Git

Frameworks: Django, Flask, PHP, JavaScript, jQuery, Node.js, React.js

Content Management Systems: WordPress, Medium

Application Servers: Flask, Apache, Tomcat

PROFESSIONAL EXPERIENCE

Confidential

Data Scientist

Responsibilities:

  • Performed data wrangling on multiple datasets with over 4.5 million records
  • Performed data preprocessing and data cleaning before developing Python scripts to analyze key performance indicators
  • Worked extensively with the NumPy and Pandas libraries in Python to perform data wrangling and analysis
  • Created interactive dashboards and decks using Python pivot charts and Excel pivot tables so the client could make data-driven decisions
  • Used PySpark to analyze large datasets (>4.5 million rows) and derive the KPIs of the business

Confidential

Data Scientist

Responsibilities:

  • Built machine learning and deep learning models for sentiment analysis on Twitter data using the Twitter API
  • Worked with the Twitter API and Python libraries like TextBlob and Tweepy to collect Twitter data, then performed data preprocessing and data cleaning before building machine learning and deep learning models
  • Worked extensively with Python NLP libraries like NLTK to perform sentiment analysis and understand public sentiment on topics like Covid and elections
  • Built a web application (hosted on an AWS EC2 instance) that detects the presence of COVID in medical images like CT scans, using a Flask server and the Inception V3 deep learning model
  • Worked with Bootstrap and JavaScript to build attractive web applications that run machine learning and deep learning models in the backend
  • Built a web application that detects the presence of malware in an executable file (.exe file) using machine learning
  • Built a web application based on machine learning models like RandomForestClassifier and GradientBoostingClassifier to predict the presence of heart disease from inputs collected from the user
  • Deployed applications hosted on AWS EC2 instances to production. All the health-related AI applications are now publicly available
  • Built a deep learning model using TensorFlow and Keras to predict airport wait times using data from the Customs and Border Protection website
  • Implemented regularization techniques like early stopping and dropout to prevent overfitting, so that the models perform well on unseen (real-world) data
  • Took part in migrating the domain from an internal server to AWS.

Confidential

Data Scientist

Responsibilities:

  • Developed multiple approaches for data de-duplication, i.e., eliminating duplicate records
  • Implemented cosine similarity along with other string-matching algorithms like Jaro-Winkler and Levenshtein distance to perform string comparisons
  • Implemented an innovative algorithm that helps detect duplicates in datasets with names and addresses
  • Implemented active learning, where the learning algorithm can interactively query a user to label new data points with the desired outputs.
  • Developed an algorithm for record linkage in Python to eliminate redundancies, using the open-source recordlinkage Python library
  • Designed and developed machine learning models that perform Named Entity Recognition (NER) tagging on the Summary of Product Characteristics in the medical domain
  • Leveraged open-source Python libraries like spaCy, medaCy, Spark NLP, and PySpark to perform NER on different types of structured and unstructured datasets
  • Developed various machine learning and deep learning models to suit various use cases, especially regression problems
  • Implemented a Naive Bayes classifier and Lasso regression to identify and predict the patients most at risk for costly readmissions, using R and RStudio
  • Built end-to-end web applications that make extensive use of cutting-edge data science models and deployed them to production
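
The string-comparison approach named in the de-duplication bullets can be illustrated with a minimal sketch. It assumes a pure-Python Levenshtein distance and cosine similarity over character bigrams; the function names, the bigram choice, and the thresholds are hypothetical and are not taken from the original project.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute ca -> cb
        previous = current
    return previous[-1]


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 2) -> float:
    """Cosine of the angle between the n-gram count vectors of two strings."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def likely_duplicates(a: str, b: str, max_edits: int = 2,
                      min_cosine: float = 0.8) -> bool:
    """Flag a pair as a candidate duplicate (thresholds are illustrative only)."""
    return (levenshtein(a.lower(), b.lower()) <= max_edits
            or cosine_similarity(a, b) >= min_cosine)
```

In practice the thresholds would be tuned on labeled pairs, and a library such as recordlinkage would handle candidate-pair generation so that not every pair of records is compared.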

Confidential, New York

Data Science Intern

Responsibilities:

  • Worked as a Data Science Intern developing trading technology for e-Trading using cutting-edge data science techniques
  • Built deep NLP models that extract trading products and prices from chat transcriptions
  • Performed essential data cleaning and preprocessing steps such as stopword removal, stemming, and lemmatization on data extracted from the Bloomberg Terminal
  • Built a web application with Java Servlets to generate and analyze Garbage Collector reports produced by the Java Virtual Machine
  • Leveraged Bootstrap and JavaScript functions to build clear dashboard charts that make the reports easier to understand
  • Utilized AWS DynamoDB for data storage and faster access.
  • Wrote Python and SQL scripts to generate a report of key performance indicators tracking the efficiency of trading desks
  • Created dashboards from the trading data showing key performance indicators of trading desks, to support the shift to electronic trading
  • Adapted Java applications to run Python code using Jython and Jepp

Confidential

Associate Solution Advisor

Responsibilities:

  • Designed automated solutions using Python scripts, SQL stored procedures, and VBA macros for statistical analysis, improving process efficiency by 35%
  • Worked on data analytics projects in forensics, AML, and fraud detection for leading clients in the Life Sciences and Financial industries, using descriptive and inferential statistics (parameter estimation and quartile deviations)
  • Performed crucial ETL (Extract, Transform, and Load) followed by analysis of financial data to identify instances such as the involvement of Politically Exposed Persons and unusual transactions.
  • Performed sentiment analysis using Python, NLTK, and Twitter API data to understand the sentiment of the client and its customers when Confidential revamped its outlook
  • Improved existing processes by writing SQL functions and stored procedures to automate the data manipulation step instead of using a look-up table with millions of records
  • Resolved extremely slow execution times of a SQL query by optimizing it with techniques such as left-deep joins, filtering records before joining tables, and selecting only the required columns instead of all columns
  • Designed and implemented a MySQL database using MySQL Workbench schema tools to accommodate all the data storage requirements of the application.
  • Played a pivotal role in task scheduling to improve the efficiency of the Data Remediation task across multiple parties
  • Automated various manual tasks using VBA macros and Python to expedite the Data Remediation process
  • Built Python and SQL scripts and Excel pivot tables and charts to analyze data and gain insight into various aspects of key performance indicators (KPIs)
  • Managed huge databases with data of financial transactions using SSIS, SSMS and SSRS

Confidential

Software Developer

Responsibilities:

  • Implemented parallel computation and multi-threading in C++ to speed up a physics-based simulation application by 50%
  • Improved performance by parallelizing the thermal radiation transport module using the Message Passing Interface (MPI), a standard set of library routines for message passing
  • For the parallel computation, divided the main task into a number of subtasks, each performed independently and simultaneously by a different processor, which reduces the overall computation time.
  • Leveraged MPI for efficient communication: avoiding memory-to-memory copying, overlapping computation with communication, and offloading to a communication co-processor where available.
  • Designed the algorithmic logic that assigns each subroutine the range of tasks it should perform to support parallel processing
  • Studied the existing routine for thermal radiation transport in Radiation Hydrodynamics for analysis and further learning.
  • Also managed server network configuration and contributed to web application development.
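
The task-range assignment described in the parallel-computation bullets can be sketched as a block partition. The original work was done in C++ with MPI and its actual scheme is not stated, so this Python version is only an assumed illustration; `block_range`, `n_workers`, and `rank` are hypothetical names.

```python
def block_range(n_tasks: int, n_workers: int, rank: int) -> range:
    """Return the contiguous task indices handled by worker `rank`.

    Tasks are split as evenly as possible: when n_tasks is not divisible
    by n_workers, the first (n_tasks % n_workers) workers get one extra task.
    """
    if n_workers <= 0 or not 0 <= rank < n_workers:
        raise ValueError("rank must satisfy 0 <= rank < n_workers")
    base, extra = divmod(n_tasks, n_workers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def partition(n_tasks: int, n_workers: int) -> list[range]:
    """Task ranges for every worker, in rank order."""
    return [block_range(n_tasks, n_workers, r) for r in range(n_workers)]
```

In an MPI program each process would call the equivalent of `block_range` with its own rank and the communicator size, so no process needs to be told its range by another.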
