- Technical professional with an extensive project portfolio and a passion for data, with global experience across the Data Analytics and Data Science spectrum, tackling challenging problems with a specialization in Machine Learning and Predictive Analytics.
- A motivated problem solver with an aptitude for innovation and solutions development.
- Experienced data scientist with a deep understanding of technology trends and expertise in complex core technologies.
- 7+ years of experience with various ingestion techniques for bringing data into R, Python, and Azure ML environments from big data platforms such as HDFS and Hadoop
- Hands-on with various data cleansing processes, such as handling missing values (replacing with the mean, forward or backward filling, or removing entire rows, columns, or values), removing outliers, and normalizing and scaling data
- Visualized data using tools such as R, Azure ML, and Power BI
- Created predictive models using supervised, unsupervised, and ensemble machine learning algorithms
- Hands-on experience implementing algorithms such as KNN, Naive Bayes, decision trees, clustering, and linear and logistic regression
- Familiar with building predictive models using algorithms such as support vector machines and neural networks, and with ensemble methods such as bagging, boosting, and random forests to improve model performance
- Experience extracting data and creating value-added datasets using Python, R, Azure, and SQL to analyze customer behavior, target specific customer segments, and uncover hidden insights that help meet project objectives
- Worked on Kaggle datasets and Microsoft Azure ML predictive models as part of a data science boot camp
- Experienced across the full software development life cycle (SDLC) using Agile and Scrum methodologies
- Strong conceptual, analytical, and design skills with leadership qualities; able to work in a team or individually, with excellent communication skills and the ability to meet deadlines in a fast-paced work environment
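As a concrete illustration of the missing-value handling and scaling techniques listed above, here is a minimal pandas sketch on a hypothetical toy dataset (the column names and values are invented for demonstration, not taken from any actual project):

```python
import pandas as pd

# Hypothetical dataset with missing values (invented for illustration)
df = pd.DataFrame({"age": [25.0, None, 40.0, 31.0],
                   "income": [50000.0, 62000.0, None, 48000.0]})

mean_imputed = df.fillna(df.mean())   # replace missing values with the column mean
filled = df.ffill().bfill()           # forward fill, then backward fill any leading gaps
dropped = df.dropna()                 # or remove rows containing missing values entirely

# Min-max scaling to [0, 1] after imputation
scaled = (mean_imputed - mean_imputed.min()) / (mean_imputed.max() - mean_imputed.min())
```

Each strategy trades data retention against bias: mean imputation keeps every row but flattens variance, fill methods suit ordered data, and dropping rows is safest when missingness is rare.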
- Python, R
- PL/SQL, T-SQL
- Azure Blob, Data Lake
- JSON, XML, XHTML
- UNIX Shell Script
- Linear & Logistic Regression, Clustering (K-means)
- Bayesian Algorithm, KNN, Random Forest
- C, C++, C#
- Core Java
- MS SQL Server, MS Excel, Power BI
- ODBC, IDE
- GitHub
- Azure ML
- Kaggle, TensorFlow, Keras
- Neural Networks, SVM, Bagging & Boosting
Confidential, Phoenix, AZ
Sr. Data Scientist / Machine Learning Engineer
- Worked with business users, business analysts, program managers, project managers, and system analysts to review business requirements
- Created technical specification documents based on business requirements and collaborated with BA and other software developers to discuss design and architecture
- Solid experience in software development in Python using IDEs such as PyCharm and Jupyter Notebook.
- Developed predictive analytics models using the Natural Language Toolkit (NLTK) in Python for an essay-scoring program, modeling reader scores for 500,000 Pre-K through twelfth-grade students in Tennessee schools
- Developed tools using Python, shell scripting, and XML to automate repetitive tasks.
- Used Python to write data into JSON files for testing student item-level information. Created scripts for data modelling and data import/export
- Performed data preparation in Python on a high-dimensional sample (big data with large volume and variety) collected from student essay test data
- Data preparation included data mapping of unaligned data from various formats, identifying missing data, finding correlations, scaling, and removing junk data to further process the data for building a predictive model in Apache Spark
- Performed data cleaning, pre-processing, imputation, transformation, scaling, feature engineering, data aggregation, data-frame merges, descriptive statistics, data visualization, and score-assessment mapping, with reporting on Tableau dashboards
- Worked on Azure databases; the database server is hosted on Azure and uses Microsoft credentials to log in to the DB rather than the Windows authentication that is typically used
- Used Docker to run a local instance of the application on a laptop; ran the Docker app in the background to test the application while simultaneously querying the local database instance to see which tables were inserted/created, using commands such as docker-compose up
- Built PowerShell scripts to execute bulk copy program (BCP) commands to pull bulk data into the database (processing around 150 million records)
- Worked closely with the Document Processing and Scoring Services expert team to find the rule sets for building a predictive model, and performed visualization to gain in-depth knowledge of the correlations between variables
- Processed data using Python and developed a predictive model to predict KPIs (key performance indicators) such as domain-level scores within ranges and retainability
- Conducted exploratory data analysis using Pandas, NumPy, Matplotlib, Scikit-learn, SciPy, and NLTK in Python to develop various machine learning algorithms
- Performed data extraction and manipulation over large relational datasets using SQL, Python, and other analytical tools.
- Worked closely with the senior artificial intelligence team to create and build a machine learning layer in the final product
- Trained the data with different classification models, such as decision trees, random forest, linear and logistic regression, and KNN, to classify quartiles and predict scores
- Selected the final model based on overall statistics, model performance, and run time, achieving accuracy, precision, and recall in the range of 75-80% on average for the validated models
- Calculated statistical thresholds for A/B tests and routinely collected data for multiple tests at a time.
- The program showed improvement in the essay scores of 75.7% of students from fall 2017 to fall 2018; scores improved by the equivalent of 0.1 months of schooling for each book read, by 1.4 months for students who also participated the previous year, and by 1.2 months for students with economic disadvantage
- Involved in building database models, REST APIs, and views using Python in order to build an interactive web-based solution.
- Used Python libraries and SQL queries/subqueries to create several datasets producing statistics, tables, figures, charts, and graphs.
- Designed, developed, and managed a dashboard control panel for customers and administrators using SQL API calls.
- Followed the process of updating and maintaining JIRA support tickets, project stories, and their sub-task workflows, communicating with ticket submitters; maintained a track of all loads in JIRA.
- Uploaded detailed documents on process flows, ETL flows, explanations of scripts used for validating data files, and DataMart tables to Confluence for knowledge sharing and team building.
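The model comparison described above (training decision tree, random forest, logistic regression, and KNN classifiers, then selecting a final model on held-out metrics) can be sketched with scikit-learn. This is an illustrative reconstruction on a synthetic dataset, not the actual student scoring pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the scoring dataset (the real data is not public)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
}

# Fit each candidate and record accuracy, precision, and recall on the test split
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
    }

# Choose the final model on test accuracy (run time would be weighed in separately)
best = max(results, key=lambda n: results[n]["accuracy"])
```

In practice the final choice also weighs run time and stability across validation folds, as the bullet above notes.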
Confidential, Pittsburgh PA
Data Scientist / Data Analyst
- Participated in requirement gathering and worked closely with the architect in designing and modeling.
- Developed a fully automated continuous integration system using Git, Jenkins, MySQL and custom tools developed in Python and Bash
- Creation of Python scripts for data access and analysis to aid in process and system monitoring, and reporting
- Analyzed data and performed data preparation by applying a historical model on the dataset in Azure ML
- Developed predictive models based on demographic, psychographic, econometric and statistical data that deliver insights related to member enrollment
- Performed the data cleaning process, applying backward/forward filling methods on the dataset to handle missing values
- Under the supervision of a Sr. Data Scientist, performed data transformation methods for rescaling and normalizing variables
- Developed and validated a KNN model to predict the feature label
- Applied a boosting method to the predictive model to improve its performance
- Job scheduling, batch-job scheduling, process control, forking and cloning of jobs and checking the status of the jobs using shell scripting
- Used PyQt to implement column filtering, helping customers effectively view their transactions and statements. Implemented navigation rules for the application and pages.
- Created data tables using PyQt to display customer and policy information and to add, delete, and update customer records.
- Worked extensively on developing a shell script that automates packaging of the project and deploys it from Jenkins to the production server when the project is pushed to GitLab.
- Deployed the project into Jenkins using GIT version control system.
- Used Shell scripting for host concurrent programs and migration scripts for deployment.
- Wrote Unix Shell Scripts, undertook Code Optimization and Performance tuning of the application.
- Presented dashboards built in Microsoft Power BI to senior management for deeper insights
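The boosting step mentioned above can be sketched with scikit-learn's AdaBoost, which combines many weak learners into a stronger model. This is a generic illustration on a synthetic dataset, assuming AdaBoost as the boosting method (the bullets do not name a specific implementation):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in dataset; the original project's data is not available
X, y = make_classification(n_samples=500, n_features=8, random_state=1)

# Baseline: a single shallow tree (a "weak learner")
stump = DecisionTreeClassifier(max_depth=1, random_state=1)
stump_score = cross_val_score(stump, X, y, cv=5).mean()

# AdaBoost re-weights training samples each round so later weak learners
# focus on the examples earlier ones misclassified
boosted = AdaBoostClassifier(n_estimators=100, random_state=1)
boosted_score = cross_val_score(boosted, X, y, cv=5).mean()
```

Cross-validated scores make the before/after comparison fair, since a single train/test split can flatter either model by chance.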
Confidential, Kansas City, KS
- Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python.
- Performed exploratory data analysis (EDA) to maximize insight into the dataset, detect outliers, and extract important variables both graphically and numerically.
- Implemented algorithms such as Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) for dimensionality reduction and normalization of large datasets. Developed various clustering algorithms for market segmentation to analyze customer behavior patterns.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python at various stages of developing machine learning models.
- Implemented machine learning algorithms such as random forest and support vector machines to predict customer churn and customer interactions.
- Used cross-validation techniques to avoid overfitting the model and ensure accurate predictions, and measured performance using a confusion matrix and classification report.
- Improved model accuracy using ensemble methods with different bagging and boosting techniques.
- Performed text analytics on unstructured email data using the Natural Language Toolkit (NLTK).
- Involved in various pre-processing phases of text data, such as tokenizing, stemming, and lemmatization, converting the raw text into structured data
- Performed feature engineering and NLP using techniques such as Word2Vec, bag of words (BOW), tf-idf, average Word2Vec, and tf-idf-weighted Word2Vec.
- Performed sentiment analysis (NLP) on customer email feedback to determine the emotional tone behind the words and capture the expressed attitudes and emotions, using Long Short-Term Memory (LSTM) cells in recurrent neural networks (RNNs).
- Implemented an LSTM network of moderate depth to capture information in the sequence with the help of TensorFlow.
- Created a distributed TensorFlow environment across multiple devices (CPUs and GPUs) and ran them in parallel.
- Used PySpark Machine learning library to build and evaluate different models.
- Used Tableau dashboards to convey results to team members, other data science teams, and the marketing and engineering teams.
- Communicated the results to the operations team to support the best decisions.
Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Linux, Git, Microsoft Excel, PySpark-ML, Random Forests, SVM, t-SNE, PCA, TensorFlow, K-Means, Natural Language Toolkit (NLTK), LSTM-RNN
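The text-classification workflow in this section (tf-idf features from raw email text, cross-validation to guard against overfitting, and a confusion matrix for evaluation) can be sketched as follows. The email snippets and labels here are invented stand-ins, not actual customer feedback:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Hypothetical stand-in for customer email feedback (invented examples)
emails = ["great service, very happy", "terrible support, very slow",
          "happy with the quick response", "slow and unhelpful service",
          "great quick support", "terrible and unhelpful"] * 10
labels = [1, 0, 1, 0, 1, 0] * 10  # 1 = positive sentiment, 0 = negative

# tf-idf turns raw text into weighted term-frequency feature vectors
X = TfidfVectorizer().fit_transform(emails)

# Cross-validated predictions: every sample is predicted by a model
# that never saw it during training, guarding against overfitting
pred = cross_val_predict(LogisticRegression(max_iter=1000), X, labels, cv=5)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(labels, pred)
```

A classification report (`sklearn.metrics.classification_report`) over the same `labels`/`pred` pair yields the per-class precision and recall the bullets mention.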