- An enthusiastic Data Science graduate from Confidential, skilled in Machine Learning, Deep Learning, NLP, and Big Data Analytics. Seeking new challenges that leverage intellectual curiosity and skill.
- Experienced with major tools and platforms including Python, SQL, R, and AWS (EC2, S3).
- Expertise in NumPy, Pandas, PySpark, NLTK, and other machine learning and deep learning libraries.
- Worked on cloud computing with Amazon Web Services (EC2, S3) for fast, efficient processing of Big Data.
- Hands-on experience performing clustering, regression, and classification using Spark's MLlib machine learning library.
- Hands-on experience with deep learning frameworks such as PyTorch and TensorFlow, CNN architectures, and sentiment analysis packages.
- Extensive experience writing high-performance SQL queries.
- Well-versed in all stages of model building across Machine Learning, Deep Learning, and Natural Language Processing (NLP).
- Excellent analytical, problem-solving, communication, and interpersonal skills for managing and interacting with individuals at all levels.
- Excellent with Microsoft tools such as Excel (VLOOKUP, pivot tables, pivot charts), PowerPoint, and Word.
- Quick learner with the ability to master new concepts and applications.
- Strong ability to understand requirements and devise a clear, comprehensible approach.
- Familiar with JSON-based web services and Amazon Web Services.
- Experienced with various Python IDEs and notebooks, including PyCharm, PyScripter, Spyder, PyStudio, Google Colab, and Jupyter.
Programming Languages: Python, R, SQL scripting, Java, C++, SAS
Databases: SQL, MySQL, Oracle, MongoDB, Neo4j
Software: Bash, MySQL Workbench, Tableau, Google Colab, Jupyter, MATLAB, RStudio, Eclipse, VMware, Google Cloud, Sublime Text, PyCharm, AWS, MS Excel, PowerPoint
Machine Learning Libraries: Pandas, NumPy, TensorFlow, scikit-learn, Matplotlib, Seaborn, XGBoost, PySpark, Keras, NLTK, PyMongo
Version Control: Git
Content Management Systems: WordPress, Medium
Application Servers: Flask, Apache, Tomcat
- Performed data wrangling on multiple datasets with over 4.5 million records
- Performed data preprocessing and cleaning before developing Python scripts to analyze key performance indicators (KPIs)
- Used the NumPy and Pandas libraries in Python extensively for data wrangling and analysis
- Created interactive dashboards and decks using Python pivot charts and Excel pivot tables to help the client make data-driven decisions
- Leveraged PySpark to analyze large (>4.5M-row) datasets and compute business KPIs
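A minimal sketch of the clean-then-aggregate flow behind bullets like these, in Pandas; the dataset, column names, and values are illustrative, not the actual client data:

```python
import pandas as pd

# Illustrative stand-in for the real multi-million-row dataset
df = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "revenue": [100.0, 150.0, 200.0, None, 50.0],
})

# Cleaning step: drop rows with missing revenue
clean = df.dropna(subset=["revenue"])

# KPI step: total revenue per region
kpi = clean.groupby("region")["revenue"].sum().to_dict()
```

The same groupby-aggregate pattern scales to PySpark DataFrames for the larger datasets.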
- Built machine learning and deep learning models for sentiment analysis on Twitter data collected via the Twitter API
- Worked with the Twitter API and Python libraries such as TextBlob and Tweepy to collect tweets, then performed data preprocessing and cleaning before building machine learning and deep learning models
- Used Python NLP libraries such as NLTK extensively for sentiment analysis, to gauge public sentiment on topics such as COVID and elections
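A toy illustration of the idea behind lexicon-based sentiment scoring; tools like NLTK's VADER are far more sophisticated, and the word lists here are invented for the sketch:

```python
# Invented mini-lexicons; real tools ship curated, weighted lexicons
POSITIVE = {"good", "great", "love", "excellent", "win"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "lose"}

def polarity(text: str) -> float:
    """Return a score in [-1, 1]: +1 all positive, -1 all negative."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    hits = pos + neg
    return 0.0 if hits == 0 else (pos - neg) / hits
```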
- Built a web application (hosted on an AWS EC2 instance) that detects the presence of COVID from medical images such as CT scans, using a Flask server and an Inception V3 deep learning model
- Built a web application that detects the presence of malware in executable (.exe) files using machine learning
- Built a web application based on machine learning models such as RandomForestClassifier and GradientBoostingClassifier to predict the presence of heart disease from user-supplied inputs
- Deployed applications hosted on AWS EC2 instances to production; all of the health-related AI applications are now publicly accessible
- Built a deep learning model using TensorFlow and Keras to predict airport wait times from Customs and Border Protection website data
- Implemented regularization techniques such as early stopping and dropout to prevent overfitting, so the models perform well on unseen (real-world) data
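The early-stopping rule mentioned above boils down to halting training once validation loss has stopped improving for a set number of epochs (Keras exposes this as the EarlyStopping callback); the bare logic is:

```python
def early_stop_epoch(val_losses, patience=2):
    """Index of the epoch at which training would stop: the first epoch
    where validation loss has not improved for `patience` epochs."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0       # improvement: reset the counter
        else:
            wait += 1                  # no improvement this epoch
            if wait >= patience:
                return epoch           # stop training here
    return len(val_losses) - 1         # ran out of epochs first
```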
- Took part in migrating the domain from an internal server to AWS
- Developed multiple approaches for data de-duplication, i.e., eliminating duplicate records
- Implemented cosine similarity along with other string-matching algorithms such as Jaro-Winkler and Levenshtein distance to perform string comparisons
- Implemented an innovative algorithm that detects duplicates in datasets containing names and addresses
- Implemented active learning, in which the learning algorithm interactively queries a user to label new data points with the desired outputs
- Developed a record-linkage algorithm in Python to eliminate redundancies, using the open-source recordlinkage library
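Of the string metrics listed above, Levenshtein distance is the most compact to sketch; a plain dynamic-programming version (production code would typically use a library implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))            # distances from "" to b[:j]
    for i, ca in enumerate(a, 1):
        curr = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

For de-duplicating names, a small distance (e.g. "John Smith" vs. "Jon Smith") flags a likely duplicate pair.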
- Designed and developed machine learning models that perform Named Entity Recognition (NER) tagging on Summaries of Product Characteristics in the medical domain
- Leveraged open-source Python libraries such as spaCy, medaCy, Spark NLP, and PySpark to perform NER on various structured and unstructured datasets
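The simplest way to picture NER output is a gazetteer lookup; statistical models like spaCy's predict the labels rather than looking them up, and the entity list here is invented for illustration:

```python
# Invented mini-gazetteer; spaCy/medaCy models predict labels statistically
DRUG_NAMES = {"ibuprofen", "paracetamol"}

def tag_entities(text: str):
    """Return (token, label) pairs for tokens found in the gazetteer."""
    entities = []
    for token in text.split():
        word = token.strip(".,;").lower()
        if word in DRUG_NAMES:
            entities.append((word, "DRUG"))
    return entities
```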
- Developed various machine learning and deep learning models for a range of use cases, especially regression problems
- Implemented a Naive Bayes classifier and Lasso regression in R and RStudio to identify and predict the patients most at risk of costly readmissions
- Built end-to-end web applications that make extensive use of cutting-edge data science models, and deployed them to production
Confidential, New York
Data Science Intern
- Worked as a Data Science Intern developing trading technology for e-Trading using cutting-edge data science techniques
- Built models to extract trading products and prices from chat transcriptions using a deep NLP model
- Performed essential data cleaning and preprocessing steps, such as stop-word removal, stemming, and lemmatization, on data extracted from the Bloomberg Terminal
- Built a web application with Java Servlets to generate and analyze garbage-collector reports produced by the Java Virtual Machine
- Utilized AWS DynamoDB for data storage and faster access
- Wrote Python and SQL scripts to generate reports of key performance indicators tracking the efficiency of trading desks
- Created dashboards from trading data to show trading-desk key performance indicators and support the shift to electronic trading
- Extended Java applications to interoperate with Python using the Jython and Jepp libraries
Associate Solution Advisor
- Designed automated solutions in Python scripts, SQL (stored procedures), and VBA macros for statistical analysis, improving process efficiency by 35%
- Worked on data analytics projects in forensics, AML, and fraud detection for leading Life Sciences and Financial Services clients, using descriptive and inferential statistics (parameter estimation, quartile deviations)
- Performed crucial ETL (Extract, Transform, Load) followed by analysis of financial data to identify instances such as Politically Exposed Persons' involvement and unusual transactions
- Performed sentiment analysis using Python, NLTK, and Twitter API data to understand client and customer sentiment when Confidential revamped its outlook
- Improved existing processes by writing SQL functions and stored procedures to automate data manipulation, replacing a look-up table with millions of records
- Resolved extremely slow SQL query execution by optimizing queries with techniques such as left-deep joins, filtering records before joining tables, and selecting only the required columns
- Designed and implemented a MySQL database using MySQL Workbench schema tools to accommodate all of the application's data-storage requirements
- Managed workflow and played a pivotal role in task scheduling, improving the efficiency of data remediation across multiple parties
- Automated various manual tasks using VBA macros and Python to expedite the data remediation process
- Built Python and SQL scripts and Excel pivot tables and charts to analyze data and derive insights into key performance indicators (KPIs)
- Managed large databases of financial transactions using SSIS, SSMS, and SSRS
- Implemented parallel computation and multi-threading in C++ to speed up a physics-based simulation application by 50%
- Improved performance by parallelizing the thermal radiation transport module using the Message Passing Interface (MPI), an open standard with open-source library implementations
- In parallel computation, the main task is divided into subtasks, each performed independently and simultaneously by different processors; this reduces overall computation time and yields more efficient execution
- Leveraged MPI for efficient communication: avoiding memory-to-memory copying, overlapping computation with communication, and offloading to a communication co-processor where available
- Designed algorithmic logic to assign each subroutine the range of tasks it should perform, supporting parallel processing
- Studied the existing thermal radiation transport routine in Radiation Hydrodynamics for analysis and further learning
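The task-range assignment described above is a standard block decomposition; a sketch of the logic in Python for brevity (the original work was C++ with MPI, and the `rank`/`size` names follow MPI conventions):

```python
def task_range(rank: int, size: int, n_tasks: int):
    """Half-open [start, end) range of tasks owned by this rank.
    The remainder is spread one extra task each to the lowest ranks."""
    base, extra = divmod(n_tasks, size)
    start = rank * base + min(rank, extra)
    length = base + (1 if rank < extra else 0)
    return start, start + length
```

With 10 tasks over 3 ranks this yields (0, 4), (4, 7), (7, 10): contiguous, non-overlapping, and covering every task, so each processor can work on its slice independently.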
- Also managed servers for network configuration and web application development