- Data enthusiast with 6+ years of professional experience in Data Science and Analytics, including Data Mining, Machine Learning, Statistical Analysis, and Data Visualization on large sets of both structured and unstructured data.
- Experience in feature extraction, regression modeling, classification, predictive data modeling, and cluster analysis.
- Strong experience in implementing Machine Learning algorithms such as Linear Regression, Logistic Regression, Random Forest, Support Vector Machines (SVM), Naive Bayes, and K-Nearest Neighbors.
- Extensive experience in providing Machine Learning and Data Mining solutions to various business problems based on requirements using Python.
- Proficient in data loading, extraction, and manipulation with Python libraries such as Pandas, NumPy, Scikit-learn, Seaborn, and SciPy for data analysis.
- Actively involved as a Data Scientist in all phases of the project life cycle, including Data Extraction, Data Cleaning, Data Visualization, and model building.
- Strong experience in Software Development Life Cycle (SDLC) including Requirement Analysis, Design Specification and Testing in both Waterfall and Agile methodologies.
- Experienced in Data Integration, Validation and Data Warehousing using MS Visual Studio, SSAS, SSIS and SSRS.
- Strong knowledge of building models and Natural Language Processing (NLP) applications with deep learning frameworks such as TensorFlow, PyTorch, and Keras.
- Strong mathematical knowledge of Matrix Operations, Statistics, Probability, Linear Algebra, and Geometry.
- Worked with Python data visualization libraries such as Matplotlib, Seaborn, ggplot, and pygal.
- Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
- Hands-on experience with the Git version control system.
- Strong initiative and innovative thinking, with the ability to guide teammates in breaking down large, complex issues into simpler pieces for easier execution.
SDLC, Agile, Scrum, Python, SQL, SQL Server, MySQL, Machine Learning, Deep Learning, Matplotlib, SciPy, NumPy, Pandas, MS Visio, Hadoop, MapReduce, HDFS, Spark, SSAS, SSIS, SSRS, Tableau, TensorFlow, PyTorch, Keras, AWS, Windows, Linux
- Performed customer segmentation based on customer behavior, demographics, and transactions, using customer-specific details such as age and income to create multiple customer classes.
- Analyzed customer purchase data and product trends to recommend product types to customers based on behavior tracked through their accounts.
- Explored and created new data sets and implemented a few data science workflow platforms for future applications.
- Constructed customer classes with historical, demographic, and behavioral data as features using Random Forest Classifier and Logistic Regression to help the marketing team understand customer purchase patterns.
- Predicted sales and profits using machine learning and deep learning strategies.
- Assisted the marketing team in devising a business strategy to target customers with discount coupons, deals, and offers, improving customer purchases and helping maintain stock at stores.
- Discussed insights obtained from data with management to support business decisions, and reduced customer churn by 15% within a few months of implementation.
- Applied clustering algorithms such as partitioning, fuzzy, and density-based clustering to group data by similar behavior patterns.
- Identified distinct patterns in how customers respond to offers, clustered their actions using K-means and Hierarchical Clustering, and segmented them into groups to help the marketing team further analyze customer behavior.
- Estimated Customer Lifetime Value (CLV) from customer data using multiple linear regression, identified high- and low-value segments, and helped the organization understand its customers and improve customer service to retain them.
- Performed predictive modeling of personal and food sales using decision trees and regression to assess risk by assigning individual scores to customers.
- Proposed marketing strategies to target potential customers using their first three months of data, and evaluated CLV for every new customer with the regression model.
- Investigated large datasets to handle missing values, cleaned messy datasets, and applied feature scaling to standardize the range of independent variables.
- Improved model performance by tuning hyperparameters with optimization techniques such as grid search, random search, and Bayesian optimization, and increased model efficiency using XGBoost.
- Validated models using cross-validation and loss functions to measure performance, and created confusion matrices, Receiver Operating Characteristic (ROC) curves, and Cumulative Accuracy Profile (CAP) curves. Addressed overfitting and underfitting by tuning hyperparameters and applying L1 and L2 regularization.
- Applied dimensionality reduction techniques such as Principal Component Analysis (PCA) to extract the most relevant features from high-dimensional data.
- Visualized results using the Matplotlib and Seaborn libraries, and used Tableau to present results on dashboards for team members, management, and other relevant departments in the company.
- Forecast the company’s short-term and long-term growth in revenue, number of customers, costs, and stock changes using machine learning algorithms.
- Developed predictive solutions to support online shopping using machine learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, and Support Vector Machines in Python.
- Worked on data cleaning, data preparation and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, Pandas, and Scikit-learn.
- Responsible for data identification, collection, exploration, and cleaning in preparation for modeling.
- Used the NLTK library in Python for sentiment analysis of customer product reviews, including reviews gathered from third-party websites via web scraping.
- Performed sentiment analysis of customer reviews and classified each review as good, bad, or neutral to gauge customer sentiment toward the business.
- Implemented time series analysis on sales data to identify measures for improving sales.
- Used MySQL to create SQL tables, load data, and write SQL UDFs.
- Assessed customer behavior with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
- Evaluated parameters with k-fold cross-validation and grid search to optimize model performance.
- Alongside data analytics and Excel data extracts, implemented Agile methodologies, Scrum stories, and sprints in a Python-based environment.
- Worked with CSV, JSON, and Excel files for data cleaning and data analysis.
- Performed time series analysis on animal medicine and vaccine product sales data to extract meaningful statistics and predict future values from previously observed values.
- Created daily, weekly, and monthly reports in Tableau Desktop and published them to the server.
- Worked on Excel using pivots, conditional formatting, large record sets, data manipulation and cleaning.
- Used GitHub for version control to manage source code and track changes to files in a fast, lightweight workflow.
- Analyzed the data using various machine learning algorithms to segment customer transactions by amount and total number of transactions.
- Extracted terabytes of both structured and unstructured data using SQL queries and performed data mining tasks including handling missing data, data wrangling, and feature scaling.
- Wrote easy-to-use documentation for the frameworks and tools developed, to support adoption by other teams.
- Implemented the Porter stemmer (Natural Language Toolkit) with a bag-of-words model using the CountVectorizer class to process text data.
- Created a predictive model using LSTMs and Recurrent Neural Networks (RNNs) to study reviews and customer service feedback, helping the employer reduce customer churn.
- Experimented with other classification models such as Random Forests, Logistic Regression, and Naïve Bayes to classify customer reviews.
- Extracted data from the web using web scraping and text mining, and processed it into tab-separated files.
- Automated customer service by creating a chatbot that responds to customer queries using deep learning and text processing with the NLTK library.
- Evaluated model performance using a confusion matrix, classification report, and accuracy score. Improved performance with k-fold cross-validation and XGBoost, achieving a model accuracy of 92%.
- Built machine learning models to forecast the company’s short-term and long-term growth in revenue, number of customers, stock changes, and other metrics.
- Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life-cycle management in both RDBMS and Big Data environments.
- Presented simple visualizations of results using Python's Seaborn library.
- Used Python for statistical operations on the data, and Seaborn and ggplot to visualize sales and customer data.
- Acquired data from primary and secondary data sources and maintained databases and data systems.
- Set up new client data and prepared it for entry into the new platform.
- Loaded data by converting CSV files into corresponding database tables.
- Worked with the management team to create a prioritized list of needs for each business segment.
- Monitored and resolved data flow issues daily, and created views that the reporting team used for daily marketing numbers.
- Collaborated with the reporting team to resolve data discrepancies and apply logical data corrections across reports.
- Generated ad-hoc Tableau reports from Excel sheets, flat files, and CSV files.
- Designed, built, and implemented relational databases.
- Used data mining techniques for outlier detection and created an algorithm to link patterns across customer trends.
- Created software solutions within Software Development Life Cycle (SDLC) and Agile environments.
- Performed computational tasks on data by creating Pig, Hive, and MapReduce scripts to access and transform data in HDFS.
- Developed and implemented metadata models for reporting functionalities and developed an automated process for data corrections.
- Developed SQL, NoSQL, and PL/SQL scripts to extract data from databases and for testing purposes.
- Reviewed the logical model with application developers, the ETL team, DBAs, and the testing team to provide information about the data model and business requirements.
- Identified and logged defects when tests failed, using SQL to narrow down the root cause for efficient investigation by the development team.
- Used advanced Excel functions to generate spreadsheets and pivot tables.