- Over 7 years of experience in IT, with expertise in the design, development, and implementation of IT solutions.
- Over 6 years of experience understanding client business requirements and translating them into appropriate data analytics use cases.
- Over 4 years of experience as a data scientist building AI/ML pipelines.
- Strong understanding of and experience with data structures, algorithms, and OOP.
- Experienced in data acquisition with MS SQL Server, MySQL, PostgreSQL, and PL/SQL: creating, configuring, and reshaping tables using advanced SQL queries; data scraping with BeautifulSoup, Requests, urllib, HTML, and Scrapy.
- Experienced in data cleaning, handling missing values, reshaping, and exploratory data analysis with advanced data mining and Natural Language Processing techniques; accessed data through APIs/RESTful services.
- Experienced in using Python analysis tools (Pandas, NumPy, Matplotlib, Seaborn, Bokeh, scikit-learn, SciPy, NLTK, OpenCV, Keras, TensorFlow) for Natural Language Processing and Deep Learning.
- Experienced in predictive analysis with linear, polynomial, logistic regression, SVM, tree-based, and boosting models
- Experienced in clustering with K-means to identify outliers and group unlabeled data, in classification trees, and in dimensionality reduction (PCA, LSA)
- Experienced in creating Machine Learning systems with neural networks (DNN, CNN, RNN)
- Proficient in handling high bias (underfitting) and high variance (overfitting)
- Experienced in Recommender Systems (Collaborative/ Content Based/ Hybrid)
- Experienced in evaluating models using cross-validation, log loss, and ROC curves; used AUC for feature selection.
- Experienced in building data pipelines on Python
- Experienced in preparing and presenting daily, weekly, monthly, and on-demand reports
- Able to handle multiple tasks and responsibilities independently and to work as a proactive team member/leader
- Developed hypothesis statements and applied statistical testing
- Experienced in developing Adobe Analytics reports, tagging strategy as well as creating Splunk reports to measure response times/load times…
- Developed Flask RESTful APIs with SQLite/SQLAlchemy/PostgreSQL
- Hands-on experience with Tableau on MS SQL Server, Cloud BigQuery, Amazon Redshift, etc.
- Experienced in implementing Bayesian and Monte Carlo methods to simulate multiple scenarios.
- Experienced in building highly reliable, scalable Big Data solutions on the Cloudera and Hortonworks Hadoop distributions
- Good knowledge of NoSQL databases (HBase, Cassandra, MongoDB), database performance tuning, and data modeling.
- Familiar with the tools in the Hadoop ecosystem, including Hive, Pig, HDFS, MapReduce, Sqoop, Storm, Spark, SparkSQL, SparkML, Kafka, YARN, Oozie, and ZooKeeper.
Tools: NumPy, Pandas, Matplotlib, Seaborn, Bokeh, SciPy, scikit-learn, statsmodels, Keras, TensorFlow, NLTK, SQLAlchemy, MS SQL Server, MySQL, PostgreSQL, NoSQL, Adobe Analytics, Splunk, MS Power BI, Tableau, AWS, Glue, Athena, SageMaker, Redshift, MS Azure, Google Cloud Services, BigQuery, Deep Learning (DNN, CNN, RNN), H2O
Big Data: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, ZooKeeper, and Oozie
Databases: Microsoft SQL Server, MySQL, PostgreSQL, Oracle, Teradata
NoSQL Databases: HBase, Cassandra, MongoDB
Business Intelligence Tools: Tableau, Splunk, MicroStrategy, QlikView, SAP Business Intelligence
AWS: EC2, S3, EBS, VPC, ELB, AMI, SNS, RDS, EMR, IAM, Redshift, Route53, SageMaker, Lambda
Development Methodologies: Agile/Scrum, Waterfall, Design Patterns; basic level of DevOps - Docker …
Data Scientist | Data Engineer
- Built a text classification model (Google Universal Sentence Encoder, Word2Vec, GloVe, fastText, BERT, NLTK, spaCy, RNN, and CNN models); multi-class classification, achieved 91% accuracy on a balanced data set.
- Developed a wrapper script to automatically create AWS RDS, CloudWatch Alarms, and Dashboard via Jenkins and a CloudFormation template (once a user requests a DB cluster and pushes the parameters to GitHub, the script on Jenkins gets triggered….)
- Worked on DB optimization; improved product model performance by 50% and monitored it on Airflow; Hive DDLs, array, map, and struct data structures….
- Resolved a Tableau Athena/Composite DB data extract refresh problem; updated all data sources in a day and re-published the workbook on the server; created dashboards on new data…
- Worked on BI re-platforming; provided a report on Tableau, Power BI, Domo, Apache Superset, QlikView, Qlik Sense, MicroStrategy…
Data Scientist | Data Engineer
- Built Customer Segmentation (K-means, K-means++, DBSCAN), Customer Churn (defined as no order placement in the last 6 months), and CLTV algorithms on sales data.
- Worked directly with clients to understand functional and non-functional requirements, and provided the best approach and technical design according to business rules.
- Gained extensive experience with a variety of ETL and Business Intelligence tools (SAP Magellan, SAP BO, MicroStrategy, Tableau, Power BI, Python visualization tools) while developing config-based, parameter-driven generic ETL and reporting applications for clients, along with complex Linux/Unix shell (Bash) and SQL scripting.
- Developed various BI reports, dashboards, and visualizations using Tableau and Power BI.
- Converted business requirements into functional and technical documents for the team to understand.
- Liaised between the business and development teams and conducted user training.
- Experienced in eCommerce and web technology development practices and tooling: system flow analysis, web log debugging, and web client troubleshooting tools.
- Worked with the Testing/QA team to debug/test the web analytics tags.
- Developed and maintained tagging strategy, performed code modifications, testing, and deployment when appropriate
- Performed testing using built-in, plug-in, and third-party debugging tools such as Adobe Debugger
- Developed Splunk Search Processing Language (SPL) queries, created Reports, Alerts and Dashboards for Business Activity Monitoring, Enterprise Architecture and customized them.
- Experienced in Performance Engineering and testing using Adobe Analytics and Splunk.
- Started car accident risk prediction modeling; collected data on weather (wind speed, visibility, rainy/snowy/foggy/ice, snow depth, rain depth, temperature….), time (hour of the day, day of the week, month of the year, solar azimuth or angle, i.e., the angle of the sun’s position), static features (speed limit, road curviness, average traffic volumes, proximity to intersections, road width, road surface), and human factors (distractions such as billboards, population density)
- Created negative observations via random sampling
- Created flow and event data reports and dashboards, detected patterns and anomalies on the data by visualizing and presenting charts to FDOT D4, and provided solutions to overcome these.
- Strong grasp of data driven systems, processes and algorithms
- Collected requirements from business users, and designed report models to meet business requirements.
- Maintained and developed complex SQL queries, stored procedures, triggers, clustered index & non-clustered index, views, and functions that meet user requirements
- Developed exploratory data analysis and machine learning tools to provide industry solutions for marketing.
- Implemented projects in Python using linear regression, logistic regression, supervised learning, and other advanced data analysis techniques such as clustering, principal component analysis, time series analysis, and spatial analysis.
- Compared the performance of these techniques.
Data Scientist | Operations Research Analyst | Logistics Planner
- Analyzed statistical data and reports on personnel (hires, transfers, performance appraisals) and on logistics, such as the logistics support of HKIA in Afghanistan.
- Coordinated seamless logistics support of Hamid Karzai International Airport (Resolute Support Mission).
Data Analyst | Operations Researcher
- Undertook various projects in data analysis, optimization, and simulation & modeling in coordination with the allied nations’ military personnel.
- Solved logistics, inventory, and transportation problems related to naval operations; optimized them by implementing linear/non-linear models with Excel’s OpenSolver and macros.
- Prepared and presented weekly, monthly and on-demand reports to the decision makers.
- The business problem was to analyze a database listing purchases made by about 27,000 customers over a period of one year.
- Based on this analysis, generated customer segments, helped the team focus on the right campaign terms for each segment, and calculated Customer Lifetime Value. The dataset included over 16 million rows and 20 features.
- Created monthly cohorts and identified the active customers in each cohort
- Converted this table into a customer retention table; analyzed the data on a heatmap
- Created Recency, Frequency and Monetary value metrics; then created RFM Segment and RFM Score Features
- Applied K-Means and DBSCAN algorithms; used the elbow criterion and silhouette coefficient to decide the number of segments
- Visualized and analyzed the segments on a Snake Plot
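
The Recency, Frequency, and Monetary step above can be illustrated with a minimal standard-library sketch. It is not the project code: the function names (`rfm_table`, `tercile_scores`, `rfm_scores`), the three-level scoring, and the sample purchases are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def rfm_table(purchases, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}.

    purchases: iterable of (customer_id, purchase_date, amount) tuples.
    snapshot:  reference date from which recency is measured.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in purchases:
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((snapshot - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }


def tercile_scores(values, higher_is_better=True):
    """Score each value 1..3 by rank-based terciles (3 = best)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 3) // len(values)
    return scores


def rfm_scores(table):
    """Return {customer: (rfm_segment, rfm_score)} from an RFM table."""
    customers = list(table)
    # A small recency (recent purchase) is good, so it is scored in reverse.
    r = tercile_scores([table[c][0] for c in customers], higher_is_better=False)
    f = tercile_scores([table[c][1] for c in customers])
    m = tercile_scores([table[c][2] for c in customers])
    return {
        c: (f"{r[i]}{f[i]}{m[i]}", r[i] + f[i] + m[i])
        for i, c in enumerate(customers)
    }


# Hypothetical sample data, for illustration only.
sample = [
    ("a", date(2020, 1, 5), 20.0),
    ("a", date(2020, 3, 1), 30.0),
    ("b", date(2020, 1, 10), 5.0),
    ("c", date(2020, 2, 20), 100.0),
    ("c", date(2020, 2, 25), 50.0),
    ("c", date(2020, 3, 9), 25.0),
]
table = rfm_table(sample, snapshot=date(2020, 3, 10))
scores = rfm_scores(table)
```

The segment string concatenates the three scores, and the RFM score is their sum; either can then be fed to a clustering step.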
- Acquired more than 40,000 live tweets through the Twitter developer API using Python packages
- Preprocessed and normalized the text data using methods such as stemming and tokenizing
- Conducted sentiment analysis on the tweets in Python and determined the impact on users
- Used vectorization techniques such as bag of words, TF-IDF, word2vec, GloVe, fastText… for feature engineering
- Improved the model with machine learning algorithms, and applied an RNN model
- Compared their performances in terms of precision, recall and response time.
- Combined a pipeline with grid search to streamline implementation and find the optimal parameters for selecting the best algorithm
- Ran advanced queries to fetch the data from the database
- Data cleaning, feature reformatting, scaling.
- Applied advanced visualization techniques to used vehicle data for analysis, and applied one-hot encoding to convert non-numerical (categorical) features into numerical indicator columns.
- Auto-ML/CatBoost/Linear Regression/Ridge/Lasso/Elastic Net/Decision Tree/Random Forest models, TPOT, H2O (0.92 R2 with Auto-ML, 0.90 R2 with H2O, 0.88 R2 with CatBoost (deployed model)).
- Data cleaning, feature reformatting and advanced visualization of imbalanced data.
- Applied dummy variable creation for the non-numerical values in the data set.
- Applied machine learning algorithms (Logistic Regression, Random Forest Classifier) with hyperparameter tuning.
- Applied the Synthetic Minority Oversampling Technique (SMOTE) to the imbalanced data and re-applied the same models (86% recall).
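
As a rough illustration of the oversampling idea above, here is a from-scratch sketch of SMOTE-style interpolation using only the standard library. It is not the implementation used in the project (which is not stated); the function name `smote`, its parameters, and the sample points are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical minority-class points, for illustration only.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_points, n_new=10, k=2, seed=42)
```

Oversampling is normally applied to the training split only, so that synthetic points do not leak into the data used to measure recall.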
- Data cleaning, feature reformatting, scaling, mean normalization
- Applied advanced visualization techniques to pre-owned vehicle data for analysis, and applied one-hot encoding to convert non-numerical (categorical) features into numerical indicator columns
- Linear Regression/Ridge/Lasso/Elastic Net/Decision Tree/Random Forest models (90% R-squared for the Random Forest model)
- Built the same model in RapidMiner and compared the results with my own
- Eliminated the trend and seasonality in non-stationary time series
- Built and evaluated ARMA, ARIMA, and SARIMA models to model the time dependence
- Compared the three models using ACF plots and residual diagnostic plots, and performed both future prediction and uncertainty analysis for the forecasts with the optimal model
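
One common way to remove trend and seasonality before fitting ARMA-family models is differencing. The sketch below assumes that approach (the method actually used in the project is not stated); `difference`, `invert_difference`, and the toy series are hypothetical names and data.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def invert_difference(history, diffs, lag=1):
    """Rebuild original-scale values from differenced ones.

    history: the original values that precede the differenced values;
             only its last `lag` entries are used.
    diffs:   output of difference(..., lag) to be mapped back.
    """
    values = list(history[-lag:])
    for d in diffs:
        values.append(values[-lag] + d)
    return values[lag:]


# Toy series: linear trend plus a repeating pattern of period 4.
pattern = [0, 5, 0, -5]
series = [2 * t + pattern[t % 4] for t in range(12)]

deseasoned = difference(series, lag=4)      # removes the period-4 pattern
stationary = difference(deseasoned, lag=1)  # removes the remaining trend
restored = invert_difference(series[:4], deseasoned, lag=4)
```

Forecasts made on the differenced scale are mapped back with `invert_difference`, applied in the reverse order of the differencing steps.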