Data Scientist/Machine Learning Engineer Resume
Atlanta, GA
SUMMARY
- Highly efficient Data Scientist with 8 years of experience in machine learning, data mining on large structured and unstructured data sets, data acquisition, data validation, predictive modeling, data visualization, web crawling, and web scraping. Adept in statistical programming languages such as R and Python, as well as Big Data technologies including Hadoop and Hive.
TECHNICAL SKILLS
Languages: Python, R
Packages: ggplot2, caret, dplyr, RWeka, gmodels, twitteR, NLP, reshape2, rjson, pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, Beautiful Soup, rpy2.
NLP/Machine Learning/Deep Learning: LDA (Latent Dirichlet Allocation), NLTK, Apache OpenNLP, Stanford CoreNLP, Sentiment Analysis, SVMs, ANNs, RNNs, CNNs, TensorFlow, MXNet, Caffe, H2O, Keras, PyTorch, Theano, Azure ML
Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.
Databases & Query Engines: SQL Server, MySQL, MS Access, Teradata, Netezza, MongoDB, Cassandra, HBase, HDFS, Hive, Impala, Pig, Spark SQL.
Notebooks: Jupyter
BI Tools & Data Warehouses: Tableau, Tableau Server, Tableau Reader, SAP BusinessObjects, Amazon Redshift, Azure SQL Data Warehouse
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Data Scientist/Machine Learning Engineer
Responsibilities:
- Built a machine learning data pipeline on AWS using services such as Amazon Comprehend Medical, Textract, and SageMaker to extract medical entities.
- Developed a deep learning model using spaCy to extract lab-related terms and deployed it on AWS SageMaker with 91% accuracy.
- Leveraged AWS Glue and PySpark for ETL activities and used Lambda functions to load the data into a Redshift cluster.
- Built a pipeline framework using AWS SNS and SQS for asynchronous data flow from one endpoint to another.
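For illustration, the asynchronous SNS/SQS fan-out pattern described above can be sketched locally with Python's standard-library queues; this is a stand-in for the managed services, not the AWS API, and all names here are hypothetical:

```python
import queue
import threading

# Local stand-ins for an SNS topic fanning out to two SQS queues.
subscriber_queues = [queue.Queue(), queue.Queue()]

def publish(message):
    """Fan one message out to every subscribed queue, like an SNS topic."""
    for q in subscriber_queues:
        q.put(message)

def consume(q, sink):
    """Drain one queue asynchronously, like an SQS consumer; None stops it."""
    while True:
        msg = q.get()
        if msg is None:
            break
        sink.append(msg)

results_a, results_b = [], []
workers = [
    threading.Thread(target=consume, args=(subscriber_queues[0], results_a)),
    threading.Thread(target=consume, args=(subscriber_queues[1], results_b)),
]
for w in workers:
    w.start()

publish("lab-entity:hemoglobin")  # producer endpoint emits a message
publish(None)                     # sentinel shuts both consumers down
for w in workers:
    w.join()
```

Each consumer receives its own copy of every published message, which mirrors why SNS-to-SQS fan-out decouples the producing and consuming endpoints.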
Confidential, San Jose, CA
Data Scientist/Machine Learning Engineer
Responsibilities:
- Built and automated machine learning model pipelines (anomaly detection) using scikit-learn on user identity security data to detect fraud, and deployed them using Kubeflow.
- Developed an LSTM (RNN) deep learning model based on semantic analysis of text (natural language processing) to detect potentially malicious typosquatting/targeted phishing URLs, leveraging ASN mappings, WHOIS data patterns, and HTML tag analysis to classify the attack type with 91% accuracy.
- Built and automated a data engineering ETL pipeline over Snowflake using Apache Spark; integrated data from disparate sources with Python APIs and consolidated them in a data mart (star schema). Orchestrated the entire pipeline using Apache Airflow, delivering daily/weekly metric email reports from the Power BI server to facilitate on-the-go decision making for business users.
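As a toy illustration of the typosquatting-detection idea above: before a deep model is involved, a simple lexical heuristic can flag domains that sit within a small edit distance of a known-good domain. This sketch is not the LSTM model itself, just the edit-distance signal:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def looks_typosquatted(domain, legit_domains, max_dist=2):
    """Flag a domain within a small nonzero edit distance of a legit one."""
    return any(0 < levenshtein(domain, legit) <= max_dist
               for legit in legit_domains)
```

For example, `looks_typosquatted("gooogle.com", ["google.com"])` flags the extra-letter variant, while the legitimate domain itself (distance 0) is not flagged.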
Confidential - Santa Clara, CA
Data Scientist
Responsibilities:
- Built a machine learning model to classify anomalies such as online scammers and telemarketers from end-user data, detecting fraud with 92% accuracy.
- Designed prepaid and postpaid churn prediction models using Random Forest, enabling operators to reach 60 to 500 at-risk customers out of every 1,000 calls, which decreased the annual attrition rate from 24% to 19%.
- Performed data profiling and customer segmentation using clustering techniques to predict value-added service purchases by customers.
- Deployed a coupon recommendation engine on AWS using matrix factorization over Spark ML, based on customer location and purchase history, which generated $1M in ad revenue.
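The matrix-factorization technique behind the recommendation engine can be sketched in pure Python for illustration (the actual work used Spark ML; this is a minimal SGD version on a toy user-item rating list, with all names and numbers hypothetical):

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
              epochs=6000, seed=0):
    """Factor sparse (user, item, rating) triples into latent vectors via SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # gradient step on user vec
                Q[i][f] += lr * (err * pu - reg * qi)  # gradient step on item vec
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating is the dot product of the latent vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Toy data: 3 customers x 2 coupons, a few observed interactions.
ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 1, 1), (2, 1, 5)]
P, Q = factorize(ratings, n_users=3, n_items=2)
```

After fitting, `predict(P, Q, u, i)` approximately reconstructs each observed rating; unobserved (user, item) pairs get scores usable for ranking recommendations.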
Confidential - Wichita, KS
Data Scientist/Machine Learning Engineer
Responsibilities:
- Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries to perform data extraction to fit analytical requirements.
- Performed feature engineering, including feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing.
- Used Python 2.x/3.x (NumPy, SciPy, pandas, scikit-learn, Seaborn) to develop a variety of models and algorithms for analytic purposes.
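Two of the preprocessing steps named above, label encoding and feature normalization, can be sketched with the standard library alone for illustration (the original work used scikit-learn's `preprocessing` module):

```python
from statistics import mean, pstdev

def label_encode(values):
    """Map each distinct category to an integer, like sklearn's LabelEncoder."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def standardize(values):
    """Z-score normalization of one column, like sklearn's StandardScaler."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

codes, mapping = label_encode(["churned", "active", "churned"])
zscores = standardize([1, 2, 3])
```

Standardizing before PCA matters because PCA directions are driven by variance, so unscaled features with large units would dominate the components.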