Data Scientist Resume
Sunnyvale, CA
PROFESSIONAL SUMMARY:
- 10+ years of software industry experience as a Data Scientist in Machine Learning, Data Mining with large sets of structured and unstructured data, Data Exploration, Feature Engineering, Predictive Modeling, Data Visualization, and Algorithm Development
- Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau
- Developed predictive models using Decision Trees, Random Forest, Naive Bayes, Logistic Regression, Cluster Analysis, and Neural Networks, applying ensemble methods such as bagging and boosting to improve model performance; good knowledge of Recommender Systems
- Worked on NLP, Text Mining, and sentiment analysis to extract insights from unstructured data on social media platforms such as Facebook, Twitter, and Reddit
- Skilled in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools, and the application of statistical concepts
- Strong ability to analyze data for signals, patterns, and groupings that answer questions and solve complex data puzzles
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data
- Adept at statistical programming in R and Python and at working with cloud ML platforms such as AWS ML and Azure ML
- Extensive experience with major statistical analysis tools such as R, SQL, Python, Advanced Excel, and MATLAB
- Experience in designing visualizations using Tableau and ggplot2, and in publishing and presenting dashboards and storylines on web and desktop platforms
- Strong SQL programming skills, with experience working with functions, packages, triggers, and stored procedures
- Skilled in using dplyr (R) and pandas (Python) for exploratory data analysis
- Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies
- Quick learner of various new technical concepts in machine learning/data science/deep learning field
- Excellent track record in delivering quality software on time to meet business priorities
SKILLS:
Python (7+ years): Pandas, NumPy, Scikit-learn, data cleaning and imputation, creating machine learning pipelines, model selection and evaluation
SQL (7+ years): Designing and querying relational databases, tables, relations, joins, grouping, stored procedures, functions, triggers, and indexes using PostgreSQL, SQLite, and pgAdmin3
Visualization and Reporting (5+ years): Matplotlib, Tableau, ggplot2, dplyr, tidyr
Business Analytics (3+ years): Google Analytics, Google Adwords, Jira
Machine Learning (6+ years): Prediction with Lasso, Ridge, and Linear/Logistic Regression; classification with KNN, Decision Trees, Random Forest, and gradient descent; clustering and time series analysis; CNN, RNN, LSTM, GRU
Model and Feature Evaluation (4+ years): ROC, cross-validation, bootstrapping, PCA, grid search, A/B split testing
Deep Learning (1+ years): Familiar with TensorFlow, distributed machine learning, and running models on GPUs
Data Engineering (15+ years): SSIS, ETL, DTS
Software Development Life Cycle (8+ years)
Natural Language Processing and Topic Modeling (4+ years): Sentiment analysis, TF-IDF, NLTK
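As a minimal illustration of the TF-IDF weighting listed under NLP above, the following pure-Python sketch computes per-document term weights (the example sentences are hypothetical; in practice NLTK or scikit-learn would be used):

```python
# Minimal TF-IDF sketch in pure Python (illustrative only).
import math
from collections import Counter

def tfidf(docs):
    """Return per-document {term: tf-idf} dicts for a list of token lists."""
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return scores

docs = [
    "the team won the game".split(),
    "the team lost the game".split(),
    "the stock prices rose".split(),
]
weights = tfidf(docs)
```

Terms that appear in every document get an IDF of zero, so common words like "the" wash out while distinctive terms like "won" score highest.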
TECHNICAL SKILLS:
Programming Languages: Python (scikit-learn, pandas, numpy, scipy), R, Java, C++, C, SAS, SQL
Software/Tools: PostgreSQL, LIBSVM, ggplot, dplyr, Weka
Cloud Services: AWS S3, EC2, Lambda, DynamoDB, ElastiCache, RDS, SNS, CloudWatch
Applications: Tableau, R Studio, Matlab, MS Excel
Data Visualization: R, Python, Weka, Azure ML, Tableau
Machine Learning Algorithms: Classification, KNN, Regression, Random Forest, Clustering (K-means), Neural Nets, SVM, Bayesian Algorithms, Social Media Analytics, Sentiment Analysis, Market Basket Analysis, Bagging, Boosting
Domain Knowledge: Banking, Finance, Insurance, Healthcare, Energy
PROFESSIONAL EXPERIENCE:
Confidential, Sunnyvale, CA
Data Scientist
Responsibilities:
- Developed an asynchronous, event-driven microservices system for Confidential that can serve millions of sports fans
- Worked on data mining, data cleansing, and transformation of game data prior to building machine learning models
- Operationalized machine learning models and ad hoc analyses in R using microservices
- Built a number of intelligent features powered by advanced data analytics in a very short period of time
- Designed an effective analytical approach to streamline game analysis, cutting computation time by 50%
- Worked closely with the CTO on data modeling, Confidential's data pipeline, and data analytics
- Scaled the Confidential platform to process millions of game data events in an AWS environment using EC2, S3, Lambda functions, SNS, etc.
- Performed big data analytics with Spark, Kafka, Hadoop, Hive, and functional programming in Scala
- Applied machine learning algorithms, including random forests, boosted trees, SVMs, neural networks, and deep learning using CNTK and TensorFlow
- Performed data preparation on a high-dimensional sample (big data with large volume and variety) collected from live customer data
- Hands-on with deep learning packages and libraries: Caffe, TensorFlow, Theano, Keras, NumPy, etc.
- Implemented a deep convolutional neural network (CNN) for image recognition; also familiar with RNN, LSTM, and GRU architectures
Environment: R, Python, Machine Learning, SQL, AWS, Postgres, Tableau, Data Mining, TensorFlow, Hadoop, Spark, Scala, MapReduce, A/B Testing, Caffe, Azure ML, CNN, RNN, LSTM, GRU
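The A/B testing listed in this environment can be sketched as a two-proportion z-test in pure Python (the conversion numbers below are hypothetical, not project data):

```python
# Hedged sketch: two-proportion z-test for an A/B split test (toy numbers).
import math

def ab_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic comparing conversion rates of variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

z = ab_z_test(200, 1000, 260, 1000)  # 20% vs 26% conversion
```

For a two-sided test, |z| > 1.96 indicates significance at the 5% level; the toy numbers above clear that threshold.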
Confidential, Sunnyvale, CA
Senior Software Engineer
Responsibilities:
- Played key role in adding machine learning intelligence to Confidential network and security products
- Designed and built machine learning solutions for traffic prediction and outlier detection
- Only employee from Confidential APAC to give a tech talk at Connect '15 at Confidential HQ
- Technology and acquisition consultant to senior executives evaluating core Machine Learning companies
- Collaborated with the CTO and chief architect on a Machine Learning architecture that enables dynamic security rules
- Performed exploratory data analysis and report generation using Python visualization libraries (Seaborn, Matplotlib), which streamlined the sales process, surfaced issues, reduced costs, and increased revenue
- Mentored and trained both entry level and mid-level career employees
Environment: R, Python, Machine Learning, AWS, Azure ML, SQL, Postgres, Tableau, Data Mining, Spark, TensorFlow, Caffe
Confidential
Software Development Engineer
Responsibilities:
- Performed machine learning regression/classification techniques to predict outcomes
- Designed and developed advanced R/Python programs to transform datasets in preparation for modeling
- Provided R/SQL programming, with detailed direction, in the execution of data analyses that contributed to the final project deliverables; responsible for data mining
- Worked in a team of programmers and data analysts to develop insightful deliverables that support data-driven marketing strategies
- Retrieved data from databases through SQL per business requirements
- Experience coding in Java, C, and C++
- Manipulated data using Base SAS programming
Environment: Java, Python, Data Mining, Machine Learning, Matlab, SQL, R, Postgres, C++, C
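The regression modeling described above can be illustrated with a minimal closed-form least-squares fit in pure Python (a toy sketch with made-up data, not the project code, which used R/Python libraries):

```python
# Hedged sketch: simple linear regression via closed-form least squares.
def fit_line(xs, ys):
    """Fit y = slope*x + intercept minimizing squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

# Toy data generated from y = 2x + 1, so the fit should recover those values.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```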
Confidential
Data Analyst
Responsibilities:
- Applied web crawling and text mining techniques to score referral domains, generate keyword taxonomies, and assess the commercial value of bid keywords
- Developed a new hybrid statistical and data mining technique known as hidden decision trees and hidden forests
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
- Scraped, merged, and cleaned data from different websites for events and business opportunities
Environment: Informatica 9.0, Java, Python, R, ODS, OLTP, Oracle 10g, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, PL/SQL
