
Data Scientist/Machine Learning Engineer Resume


Atlanta, GA

SUMMARY

  • Highly efficient Data Scientist with 8 years of experience in machine learning, data mining on large structured and unstructured data sets, data acquisition, data validation, predictive modeling, data visualization, web crawling, and web scraping. Adept in statistical programming languages such as R and Python, as well as big data technologies such as Hadoop and Hive.

TECHNICAL SKILLS

Languages: Python, R

Packages: ggplot2, caret, dplyr, RWeka, gmodels, twitteR, NLP, reshape2, rjson, pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, Beautiful Soup, rpy2.

NLP/Machine Learning/Deep Learning: LDA (Latent Dirichlet Allocation), NLTK, Apache OpenNLP, Stanford NLP, Sentiment Analysis, SVMs, ANN, RNN, CNN, TensorFlow, MXNet, Caffe, H2O, Keras, PyTorch, Theano, Azure ML

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Jupyter, Teradata, Netezza, MongoDB, Cassandra.

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, Amazon Redshift, Azure Data Warehouse

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Built a machine learning data pipeline on AWS using services such as Amazon Comprehend Medical, Amazon Textract, and Amazon SageMaker to extract medical entities.
  • Developed a deep learning model using spaCy to extract lab-related terms and deployed it on AWS SageMaker with 91% accuracy.
  • Leveraged AWS Glue and PySpark to perform ETL and used AWS Lambda functions to load the data into an Amazon Redshift cluster.
  • Built a pipeline framework using Amazon SNS and SQS for the asynchronous flow of data from one endpoint to another.

Confidential, San Jose, CA

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Built and automated machine learning model pipelines using frameworks such as scikit-learn on user identity security data to detect fraud (anomaly detection), and deployed them using Kubeflow.
  • Developed an LSTM (RNN) deep learning model based on semantic analysis of text (natural language processing) to detect potentially malicious typo-squatting and targeted phishing URLs, and leveraged ASN mappings, WHOIS data patterns, and HTML tag analysis to classify the type of attack with 91% accuracy.
  • Built and automated an ETL data engineering pipeline over Snowflake using Apache Spark. Integrated data from disparate sources with Python APIs, consolidated it in a data mart (star schema), and orchestrated the entire pipeline with Apache Airflow, delivering daily and weekly metric email reports from the Power BI server to support on-the-go decision making for business users.

Confidential - Santa Clara

Data Scientist

Responsibilities:

  • Built a machine learning model to classify anomalies such as online scammers and telemarketers from end-user data, detecting fraud with 92% accuracy.
  • Designed a prepaid and postpaid churn prediction model using random forest, which enabled operators to reach 60 to 500 at-risk customers out of every 1,000 calls and led to a decrease in the annual attrition rate from 24% to 19%.
  • Performed data profiling and customer segmentation using clustering techniques to predict customers' purchases of value-added services.
  • Successfully deployed a coupon recommendation engine on Amazon Web Services (AWS) using matrix factorization with Spark ML, based on customer location and purchase history, which generated $1M in ad revenue.

Confidential - Wichita, KS

Data Scientist/Machine Learning Engineer

Responsibilities:

  • Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries for data extraction to fit analytical requirements.
  • Participated in feature engineering, including feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing.
  • Used Python 2.x/3.x (NumPy, SciPy, pandas, scikit-learn, seaborn) to develop a variety of models and algorithms for analytic purposes.
