- Certified Data Scientist with 1+ years of experience in data science, including artificial intelligence, machine learning, deep learning, data mining, data analytics, and data visualization.
- Experience designing, modeling, validating, and testing statistical algorithms in Python and R against various real-world data sets, including behavioral data.
- Develop, build, and test analytics applications using iterative, agile-like development practices such as test-driven development and continuous integration.
- Working experience with machine learning algorithms such as Linear Regression, Logistic Regression, Decision Trees, K-Means Clustering, and Association Rules.
- Experience analyzing online user behavior, conversion data (A/B testing), and customer journeys.
- Experience using technology to work efficiently with datasets, including scripting, data cleansing tools, and statistical software packages.
- Knowledge of writing packages, stored procedures, functions, and views in SQL.
- Working experience with statistical analysis using R, MATLAB, and Excel.
- Proficient in integrating various data sources, such as relational databases (Oracle, MS SQL Server) and flat files, into the staging area.
- Good knowledge of implementing deep learning models and numerical computation with data flow graphs using TensorFlow.
- Good experience in text mining, transforming words and phrases in unstructured data into numerical values.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Good knowledge of statistics, mathematics, machine learning, recommendation algorithms, and analytics, with an excellent understanding of business operations and analytics tools for effective data analysis.
Programming & Scripting Languages: R, Python.
Database: SQL, MS Excel, Oracle.
Tools: TensorFlow, Keras.
Development Tools: RStudio, MS Office, Notepad++, MS Excel.
Techniques: Machine learning, Regression, Clustering, Data mining, Text mining.
Confidential, Sunnyvale, CA
Jr. Data Scientist
- Developed machine learning, statistical analysis, and data visualization applications for data processing problems in the sustainability and finance domains.
- Compiled data from various public and private databases to perform complex analysis and data manipulation for actionable results.
- Explored the raw data through exploratory data analysis (classification, splitting, cross-validation).
- Applied predictive modeling using Python-based tools.
- Used NLP methods for information extraction, topic modeling, parsing, and relationship extraction.
- Worked with the NLTK library for NLP data processing and pattern discovery.
- Applied NLP and ML techniques to analyze Twitter feeds and streaming news to assess product reviews.
- Worked on the development of data warehouse and ETL systems using relational and non-relational tools such as SQL.
- Built and analyzed datasets using R, MATLAB and Python.
- Worked on bootstrap-aggregation (bagging) methods such as Random Forests, built on Decision Trees, to reduce variance.
- Developed visualizations and dashboards using ggplot.
- Applied linear regression in Python to examine the relationships, including potential causal relationships, between different attributes of the dataset.
- Applied concepts of probability, distributions, and statistical inference to the given dataset to uncover findings using comparisons, t-tests, F-tests, R-squared, p-values, etc.
- Interfaced with a large-scale database system through an ETL server for data extraction and preparation.
- Used plots such as histograms, bar plots, pie charts, scatter plots, and box plots to assess the condition of the data.
- Created pivot tables and charts using worksheet data and external resources; modified, refreshed, and formatted pivot tables; and sorted items and grouped data.
- Administered users, user groups, and scheduled instances for reports in Tableau.
- Converted Metric Insights reports to Tableau reports.
- Used VLOOKUP to match source and destination addresses in the user data.
- Wrote SQL queries for Data Manipulation.
- Applied clustering algorithms such as K-means and hierarchical clustering using scikit-learn and SciPy.
- Created pivot tables and ran VLOOKUPs in MS Excel as part of data validation.
- Applied various machine learning algorithms and statistical models, such as decision trees, regression models, SVM, and clustering, to identify volume using the scikit-learn package in Python.
- Extracted random samples and compared the measurements taken on those samples of the dataset.
- Achieved 50% cost savings and advanced commercial product development by building and optimizing machine learning models using XGBoost, TensorFlow, and Keras.
Environment: Machine learning, HDFS, Linux, Python (Scikit-Learn/Scipy/NumPy/Pandas), R, SQL, MS Excel.
Confidential, Santa Monica, CA
Jr. Data Scientist
- Performed statistical modeling to derive value from customer data and reduce churn.
- Prepared regular data reports by collecting samples of data sets using Excel spreadsheets.
- Cleaned data by identifying and eliminating duplicates, inaccurate data, and outliers using R.
- Compared data with source documents and re-entered data in verification format to detect errors.
- Generated reports by running SQL queries against current databases to conduct data analysis.
- Evaluated and optimized model performance by tuning parameters with K-Fold Cross Validation.
- Analyzed transaction data to cluster users into segments and develop different marketing strategies for each cluster.