Software Engineer Resume
San Jose, CA
SUMMARY OF SKILLS:
- 2+ years of hands-on experience as a Data Engineer and Data Scientist at Confidential, San Jose
- Designing and developing Machine Learning proofs of concept
- Data scraping, wrangling, and transforming using Pandas and PySpark
- Designing Machine Learning applications using Apache Spark and Scikit-learn
- Designing and developing end-to-end data pipelines
- Distributed computing using Apache Spark and Hadoop
- Designing interactive notebooks using Jupyter Notebook
- Excellent Communication and Presentation Skills
PROFESSIONAL EXPERIENCE
Software Engineer
Confidential, San Jose, CA
Responsibilities:
- Saved $100K per quarter by replacing a commercial text analytics solution with a Machine Learning application.
- Designed and developed a proof-of-concept unsupervised Machine Learning application in Python that categorizes Customer Loyalty Survey documents into topics (clusters of similar words) using a probabilistic clustering algorithm.
- Designed and developed a proof-of-concept supervised Machine Learning application in PySpark for Sales Transaction processing: matching unallocated Sales Transactions to the correct salesperson based on Sales Transaction, Customer, and Sales Accounts data stored in Apache Hadoop.
- Designed and developed a data pipeline using Apache Spark to clean and prepare Customer Survey Data for Machine Learning.
- Developed a data visualization of the document clusters to show the high-frequency words in each cluster.
- Designed a Sentiment Analysis program in R to create a Tableau Dashboard of the Executive Pulse Survey (EPS).
- Evaluated the Apache Spark ML API by writing Machine Learning code in Scala, Python, and R against large public datasets.
- Designed and wrote a PHP program for a SOAP API to read Survey Data from the Verint database.
- Designed a Tableau Dashboard to present Customer Survey Data to the Sales Executive.
- Presented the intern final project, Sentiment Analysis of Unstructured Text Data from the FY2015 CSAT dataset, to Cisco Executives and Managers.
- Performed text analytics on the APAC Executive Pulse Survey (EPS) dataset using R and Tableau 9.0 to develop a Dashboard.
- Developed a batch-mode Sentiment Analysis of Twitter feed data in R.
- Developed a Tableau Dashboard integrating two data sources, FY2015 Booking Data and FY2015 Executive, to compare Booking Data, Customer Loyalty, and verbatim responses to 8 open-ended questions.
TECHNICAL SKILLS:
Programming: Python, R, SQL, and Hive Query Language
Deep Learning: NVIDIA DIGITS Deep Learning platform, Caffe, TensorFlow, Keras, and MXNet
Big Data: Apache Spark 1.6/2.1 (Scala/Python/R), Spark ML Pipeline, Apache Hive, Apache Drill, IBM BigInsights, and IBM Big SQL
Data Analysis & ML Libraries: Pandas, Scikit-learn, Apache Spark ML library, GraphLab Create
Data Science Environment: Jupyter Notebook and RStudio
Database: Oracle 11g SQL, MySQL, SAP HANA
Public Cloud Service: Amazon Web Services (AWS) and Microsoft Azure
Business Analytics: SAP Predictive Analytics, Tableau 9.0
Web Programming: PHP, HTML5, and CSS
