- A data geek with 6 years of total experience in data science, with expertise in machine learning, data analytics, text analytics, and business analysis, and a strong command of creating visualizations in Tableau and Shiny.
- Professional working experience in Machine Learning algorithms such as linear regression, segmentation and modeling, logistic regression, Naïve Bayes, Decision Trees, Clustering, and XGBoost.
- Experience with statistical data analysis, including linear models, multivariate analysis, data mining, and machine learning.
- Experience in using Python libraries like NumPy, SciPy, Pandas, Matplotlib, Scikit-learn, BeautifulSoup, google, SparkContext, TensorFlow, and Bokeh.
- Experience in data mining using Spark, Hive SQL.
- Strong understanding of how analytics supports a large organization, including the ability to articulate the linkage between business objectives, analytical approaches and findings, and business decisions.
- Experience with analyzing online user behavior, conversion data (A/B testing), customer journeys, funnel analysis, and recommender systems.
- Hands-on experience in developing customer scorecards and business dashboards using Shiny in R and AWS QuickSight.
- Excellent analytical skills with demonstrated ability to solve problems.
- Ability to work with large transactional databases across multiple platforms (Teradata, Oracle, HDFS, SAS).
- Hands-on experience in integrating APIs using RESTful methods.
- Experience using technology to work efficiently with datasets, such as scripting, data cleansing tools, and statistical software packages, using Python (3.5/2.7).
- Deep understanding of Software Development Life Cycle (SDLC) as well as Agile/Scrum methodology to accelerate Software Development iteration.
- Hands-on experience in writing queries in SQL and R to extract, transform, and load (ETL) data from large datasets using data staging.
Programming skills: Python (SciPy, NumPy, Pandas, Scikit-learn, TensorFlow, Flask, Matplotlib, Bokeh, Jupyter notebook), C, R, Shell Scripting, SQL, Shiny in R, TSQL, JSON, Object Oriented Programming (Python), SAS programming.
Machine Learning: Supervised learning, Unsupervised learning, Reinforcement learning, Text Analytics, Sentiment Analysis, Principal Component Analysis (PCA) and Dimensionality Reduction, Feature Engineering, Neural networks and Deep learning, Amazon AWS (Machine learning).
Math and Statistics: Inferential Statistics, Time Series Analysis, Differential Equations, Intermediate Probability, Stochastic Calculus
Tools and Technologies: MATLAB, Minitab, Microsoft Office Suite, Eclipse, RStudio, Tableau, Spotfire, Hadoop, MapReduce, HDFS, Apache Spark and Hive, Sqoop, Atom, AWS CLI, D3, AWS Suite, ETL tools.
- Developed scripts to extract data from various file formats, applied data transformations according to business requirements, and loaded the data into the database.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Python; created Hive tables and wrote Hive queries for data analysis.
- Created an internal unified database in SQL from various third-party APIs, including content analytics and One Click Retail. Performed data ingestion into HDFS from SQL using Sqoop and PySpark.
- Updated company data warehousing techniques such as data recall and segmentation, resulting in a 20% increase in usability for non-technical staff members
- Developed a social media monitoring tool to collect social media data, perform sentiment analysis on the fly, and translate consumer insights into business insights.
- Developed price elasticity and cross price elasticity models on discoverability and pricing data using multinomial Logistic Regression models to govern prices for ~$20 MM in revenue, risk assessment and product assortment.
- Built dynamic business dashboards in Shiny with omnichannel scorecards, including text analytics on product reviews from Amazon, Walmart, and Chewy.
- Actively collaborated with retailers on cost-elimination strategies such as drop ship, endless aisle, vendor flex, and efficient data exchanges.
- Applied advanced statistical and predictive modeling techniques to inform real-time results and deploy on-the-fly recommendations about consumer behavior and search.
- Implemented machine learning models (logistic regression, XGBoost) with Python and Scikit-learn.
- Developed Alexa skills for two brands, including voice ordering. Performed sentiment analysis on customer questions to provide business insights.
- Worked on missing value imputation and outlier identification with statistical methodologies using the Pandas and NumPy libraries.
- Working on image analysis of product images to score the probability of customer likes using image and customer review data.
- Working on classification models on a Hadoop cluster to compare and identify the model that predicts customer loyalty with the highest accuracy.
- Utilized feature reduction and supervised learning to build models using Apache Spark and Scikit-learn machine learning libraries.
Environment: Python (Scikit-learn, Pandas, NumPy), Machine Learning (logistic regression, XGBoost), Gradient Descent algorithm, Bayesian optimization, Tableau, Hadoop, Spark.
Confidential, Kansas City, MO
Data Science Intern
- Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
- Worked on creating the Entity resolution tables with ODM (on-Device Management) data for business teams using AWS (EMR and S3), Python and Hive.
- Modeled user behavior based on clickstream data from over 30 million devices and developed a recommendation algorithm in Python to automatically push apps to the user’s mobile device, improving CTR (click-through rate) and driving more sessions.
- Contributed to the development of CTR predictions as part of the recommendation engine. Achieved a conversion rate of 13% and a net profit of $750K.
- Actively participated in the PoC implementation of accurate location prediction (known as Home Prediction) using the cellular data, triangulation method and geo-tagging using Python (Scikit-learn, SparkSQL and HiveSQL)
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop. Implemented a Python-based distributed random forest via Python streaming.
- Used the Pandas, NumPy, seaborn, SciPy, Matplotlib, and Scikit-learn Python libraries for developing various machine learning algorithms.
- Utilized machine learning algorithms on AWS EC2 using Spark MLlib.
- Designed and implemented statistical models, predictive models, metadata solutions, and data life cycle management in both RDBMS and Big Data environments.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, Python, and a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
- Provided insights that impact data design and improve the quality of software.
Environment: Python, SQL, SQL Server, Informatica, SSRS, PL/SQL, T-SQL, Tableau, Spark MLlib, regression, Cluster analysis, Spark, Kafka, AWS S3, AWS EC2, logistic regression, Hadoop, Hive, Random Forest, OLAP, HDFS, SVM, JSON, XML, Cassandra, MapReduce.
Data Science Intern
- Created SQL code to identify, analyze, and interpret trends in large datasets; retrieved data from SQL to clean up data systems.
- Collaborated with multiple business teams to create dashboards used for individual team needs
- Participated in all phases of data mining, data collection, data cleaning, model development, validation, visualization, and client demonstration of the Patient Finder application (micro application).
- Implemented q queries to extract data from the kdb+ database into JSON format for use in Python.
- Implemented core ML classification algorithms, including logistic regression, XGBoost, and Random Forest, using Scikit-learn on 8 TB of high-dimensional healthcare data. Validated performance using cross-validation. Achieved accuracy of 38% on highly imbalanced data (1:100,000).
- Developed a micro-app web visualization using R Shiny and D3.js with active filters. Achieved an NPS (Net Promoter Score) of 9.5/10 from the customer (Alexion).
- Developed an internal web-scraper tool for inspecting ad hosting on websites using the google, urllib, and Beautiful Soup packages in Python.
- Developed a Jupyter notebook for analyzing the text, popularity, and sentiment of the population visiting a website using its comments (web-scraper tool and HTML). Also developed a POC for clustering the websites.
Environment: Python (3.4/2.7), SQL, R, Shiny dashboard, Statistical Modelling, predictive algorithms, kdb+, q queries.
Data Science Engineer
- Performed market analysis to design the strategy against other products, increasing sales by 24% for the Op-Flex package.
- Data Cleaning: Created complex formulas and conditions to clean raw sensor data; applied VLOOKUP and INDEX-MATCH functions to merge and filter information from various data sets for reporting and statistical gathering purposes.
- Modeling: Designed and developed a new statistical and machine-learning model and feature extraction system for power plants using MATLAB and C++. Evaluated different algorithms, including GBMs and linear regression, for predicting plant output from sensor data such as pressure and temperature.
- Dashboard: Proposed and developed new visualization techniques for an enriched client experience on a web-based VI platform using Java and VBScript. Integrated it with iOS devices as an independent application.
- Analyzed internal processes for efficiency and designed an automation tool for auto-tune software using Java, yielding a cost reduction of $30,000/year and reducing the project's touch time by 300 man-hours.
- Prepared various statistical data analysis reports for corporate clients, provided ad-hoc updates and modified report templates as required
- Successfully completed training on in-house databases, extracted data from various data sources, and applied Excel functions to transform raw data into the business information required for reporting and data analysis purposes.
- Created complex formulas and conditions to clean raw data; applied VLOOKUP and INDEX-MATCH functions to merge and filter information from various data sets for reporting and statistical gathering purposes.
- Efficiently used PivotTables and Lists to summarize required statistics, supporting decision making and progress reviews.
Environment: SQL, C++, OOP, VBScript, MATLAB, Excel, Regression, Minitab, Tableau, SQL databases.