AREA OF INTEREST:
Data Science, Data Analysis, Data Mining and Machine Learning.
Languages: Python, C++, Java (basic).
Numerical Computing: R, MATLAB.
Systems: Linux, Hadoop.
Databases: SQL, MySQL.
Libraries: NumPy, SciPy, Pandas, IPython, matplotlib, scikit-learn, TensorFlow, ggplot2, dplyr, caret, car, randomForest, data.table, XML, jsonlite, RMarkdown, reshape2, twitteR, sqldf, MySQLdb, MatConvNet.
Machine Learning Engineer
- Performed Extract, Transform, Load (ETL) of data using the petl library (Python ETL).
- Used AWS Lambda functions and machine learning services to mine data from web sources with a Python parser; cleaned, organized and combined data points to extract useful insights into client profiles.
- Used nontraditional financial data (crime rate indices, unemployment rates, job security index, average pay, sql job titles based on geolocation) from open source data to formulate a comprehensive outlook of clients’ future ability to pay, in order to increase accounts receivable for the business.
- SDLC: worked in weekly sprints under Agile/Scrum.
- Red Hat 6 and older Linux server infrastructure setup, maintenance and support for virtual and physical hosts.
- Managed queues using ServiceNow and Remedy 7; on-call support, SRTs, disaster recovery (DR) testing and datacenter build and migration.
- LDAP server/user administration.
- Installed and patched SOE/applications; performed TL upgrades, server tuning, etc.
- LPAR and HMC Management: Modifying server profile, DLPAR partitions, disk mapping through VIO, etc.
- LVM and File System Management: housekeeping of file systems, adding/deleting file systems and modifying size and attributes.
- Used shell scripting for performance monitoring and event tracking.
- Taught beginner and intermediate English to Japanese and Korean speakers.
- Extracted features from the Kaggle cats & dogs image dataset with a convolutional neural network, using the TensorFlow API and an Inception-v3 model pre-trained on the ImageNet dataset.
- Performance validation for SVM, both linear and kernel SVM Confidential features and varied training samples, accompanied by hyper-parameter optimization using a Bayesian approach.
- Improved two-class accuracy to 62% with just 250 samples and 70% with 5000 samples, and reached 63% for third-class classification.
- Evaluated classification performance in a team of three on the breast cancer, forest, species and car evaluation datasets from the UCI repository using Fisher’s LDA, SVM, LOGREG, KNN, NB, NN, BAG-DT, Kernel-SVM and LASSO in Python and R.
- Analyzed performance using ROC, LFT, ACC and mean error rates for different training, validation and test cases.
- Implemented a majorization-based LASSO function and Kernel-SVM in Python.
- Implemented a linear-regression-based model for cancer presence prediction.
- Built a logistic-regression-based model in R for classification of cyclic organic compounds.
- Tested case-based datasets for cases involving Confidential for continuous/randomized block design, Latin Square Design, Factorial Designs, Nested Designs, Split-Plot Designs, Repeated Measures Design and Poisson Regression.
- Devised a beauty product recommendation system using collaborative filtering based on Fitzpatrick skin types.
- Implemented a collaborative filter in Python for 100K users, hosted on a MySQL server on Amazon RDS.
- Built a website using HTML, CSS and PHP, hosted on an EC2 instance with a LAMP server.