- Data analytics professional with 7 years of IT experience, including 5 years delivering end-to-end data science projects.
- Implemented advanced analytical solutions to real business problems, leveraging machine learning algorithms and business intelligence tools that have improved business outcomes and the end-user experience.
- Demonstrated success in designing and executing hypothesis-driven analytical projects and implementing design of experiments (DOE) methods to find cause-and-effect relationships.
- Experienced in using Python, RStudio, SQL and SAS to perform statistical analysis and to implement machine learning algorithms with a range of packages.
- Hands-on experience with data integration software (Talend ESB), analytical software and reporting tools. Proficient in using SQL to query databases (Oracle, SQL Server, MySQL) and to perform query management and data analysis.
- Leveraged big data tools and supporting technologies to extract meaningful insights from large data sets. Good knowledge of distributed computing, Hadoop architecture and its ecosystem components such as HDFS, MapReduce, Hive, Impala, Spark (PySpark) and Kafka.
- Experienced in using source code management and version control tools such as GitHub.
- Proficient in applying data visualization best practices and adept at using Tableau to create appealing, interactive dashboards.
- Extensive exposure to the CRISP-DM analytics project life cycle (Business understanding, Data understanding, Data preparation, Modelling, Evaluation and Deployment).
Programming Languages: Python, Java, SAS Base, SAS Enterprise Miner, Bash Scripting, Regular Expressions and SQL (Oracle & SQL Server).
Packages and tools: pandas, NumPy, SciPy, scikit-learn, NLTK, spaCy, Matplotlib, Seaborn, Beautiful Soup, logging, PySpark, Keras and TensorFlow.
Machine Learning: Linear Regression, Logistic Regression, Multinomial Logistic Regression, Regularization (Lasso & Ridge), Decision Trees, Support Vector Machines, Ensembles (Random Forest, Gradient Boosting, Extreme Gradient Boosting (XGBoost)), Deep Learning (Neural Networks, Deep Neural Networks (CNN, RNN & LSTM) with Keras and TensorFlow), Dimensionality Reduction (Principal Component Analysis (PCA)), Weight of Evidence (WOE) and Information Value, Hierarchical & K-means Clustering, K-Nearest Neighbors.
Data Visualization: Tableau, Google Analytics, Advanced Microsoft Excel and Power BI.
Big Data Tools: Spark/PySpark, Hive, Impala, Hue, MapReduce, HDFS, Sqoop, Flume and Oozie.
Text Mining: Text Pre-Processing, Information Retrieval, Classification, Topic Modeling, Text Clustering, Sentiment Analysis and Word2Vec.
Cloud Technologies: Google Cloud Platform Big Data & Machine Learning modules (Cloud Storage, Cloud Dataflow, Cloud ML, BigQuery, Cloud Dataproc, Cloud Datastore, Bigtable). Familiarity with AWS (EMR, EC2, S3).
Version Control: Git
Confidential, San Francisco, CA
Data Scientist / Data Science Consultant
- Retail Analytics: Developed a machine learning modelling framework in Python to estimate the likelihood of a customer making a purchase, leveraging rule-based extraction engines and an ensemble of machine learning models. The solution showed a potential 7% improvement in sales per customer, with incremental revenue of ~3M.
- Leveraged disparate data sources that provide deep customer insight, including online transactional data, web traffic data, payment and order history, and marketing campaign exposure data.
- Performed price sensitivity and variation analysis across different marketing channels and conducted exploratory data analysis on variables such as lifetime value and profit score.
- Built data pipelines, modularized code into packages and co-developed REST APIs using Flask for production deployment. Developed web services in ESB and exposed them as REST APIs with JSON output.
- Co-designed a robust customer segmentation framework that identified behavioral groups within the customer base. Generated insights that helped the marketing team design more effective campaigns and create more relevant content, improving personalization for online shoppers.
- Performed data discovery and built a stream that automatically retrieves data from a multitude of sources (SQL databases and external data such as social network data and user reviews) to generate KPIs in Tableau.
Environment: Python, Jupyter Notebook, Oracle SQL Developer, Unix, Tableau, HDFS, Impala, Hive, Jira, Hue.
Confidential, San Diego, CA
Data Science Consultant
- Implemented an approach based on natural language processing and statistical modeling to find nearest-neighbor NCIs (Non-Conformance Incidents) reported for products and processes across global manufacturing sites. Used the Python NLTK package and reduced recurring incidents by up to 60%.
- Performed topic modeling on reported incidents and assigned each incident to a topic, tagging it as product-related or process-related for further root cause analysis.
- Converted incident sentences to tokens, applied stop-word removal and lemmatization, and compared them for similarity. Computed the distance between recurring incidents using cosine similarity.
- Generated percentile scores capturing the distance between recurring incidents and integrated them with the complaints effectiveness metrics dashboard in Tableau to provide visual insights to business users.
Environment: Python, SQL Server, Microsoft Excel, Unix, Tableau, HDFS, Hive, Jira.
Confidential, Peapack, NJ
Data Analytics Specialist
- Marketing analytics: Designed a robust customer segmentation framework based on physician prescribing potential and adoption rate of branded drugs. Predicted physician lifetime value for each segment group, leveraging APLD patient-level data from Symphony, IMS Xponent, IMS Sales and Distribution data (DDD) and various internal datasets.
- Performed A/B testing by sending emails to selected physician segments while maintaining a control population, in order to observe the incremental impact of the emails. Derived distinct segments with key physician characteristics using unsupervised techniques, which helped the marketing team prioritize market segments and devise promotional messages.
- Compared conversion metrics between the test and control groups and identified cases that were positively correlated with segments showing high prescribing potential and adoption rates. Analyzed prescribing behavior across groups and observed that segments with high physician lifetime value were 8 to 10 times more likely to prescribe if they received an email than other groups. The analytical model enabled marketing teams to minimize marketing spend and prioritize market segments.
Environment: Python, SQL Server, Hive, Microsoft Excel, Power BI.
Confidential, Chicago, IL
Data Analytics Consultant
- Analyzed Medicare resource utilization groups (RUGs) and managed care insurance claims data from a healthcare provider and predicted residents with negative margins using regression and CART.
- Handled class imbalance using re-sampling techniques. Used logistic regression in R to identify the factors affecting margin and to predict residents with negative margins. Built a gradient boosting model using H2O.ai in R to analyze variable importance and evaluate model performance.
- Performed data pre-processing and cleaning to prepare data sets for further analysis, including outlier detection and treatment, missing value treatment, variable transformation and various other data manipulation techniques.
- Performed clustering analysis on historical patient-level data to classify patients into payment (total expense per stay) groups, identified parameters impacting expenditures and provided recommendations to drive reimbursements. The model showed an incremental revenue increase of $1M by identifying patient groups.
Environment: RStudio, Azure Data Studio, Microsoft Excel, Tableau.
Confidential, Madison, WI
Data Analytics Consultant
- Contributed to building and deploying an end-to-end, real-time fraud detection and segmentation model, using Tableau and an Azure ML web service to productionize the claims scoring process with KNN and CART models.
- Performed text analytics on claims transcript notes, applying a Latent Dirichlet Allocation (LDA) model for topic modelling to enhance the existing model. Optimized and streamlined the claims model to process a claim within the stipulated SLA. Modularized code into packages and used version control to push code to a central repository, improving code maintainability.
- Presented the model results to the Claims business and helped them interpret its effects on KPIs.
- Helped capture required results and assess population stability over time to fine-tune the model.
Sr. Data Analyst
- Developed interactive dashboards in Tableau and made recommendations based on exploratory analysis, which facilitated quality evaluation and performance monitoring of BA/BE trial sites that contribute to potential risk.
- Worked closely with business users and interacted with ETL developers, project managers and members of the QA teams to deliver successful reporting across the enterprise and ensure consistency of Key Performance Metrics (KPMs).
- Worked with the DBA team on performance improvement issues. Created custom functions (date range, time and logical functions) for the reports. Designed, developed, tested and maintained functional reports based on user requirements.
Environment: Tableau Desktop 8.0, Tableau Server, Cognos, Microsoft Excel, OSB.
SQL Developer / Data Analyst
- Worked throughout Software Development Life Cycle (SDLC) including Requirements, Specifications Analysis/Design and Testing.
- Experienced in Agile Scrum development methodology for diverse requirements.
- Performed Entity Relationship Modelling by designing ER Diagrams.
- Created and ran complex queries, including DDL and DML statements, joins and sub-queries.
- Created indexes on tables to improve performance, and created views to restrict access and summarize data from multiple tables.
- Performed Root Cause Analysis (RCA) while resolving issues in adherence to Service Level Agreements (SLAs).
- Performed code enhancements and maintenance adhering to strict deadlines and documented the changes in a version document.
- Documented the steps taken to resolve issues in Standard Operating Procedures (SOPs).
- Attended daily stand-up calls and suggested alternative ways of solving various issues as part of the Agile process.
- Assisted the team in preparing technical documents, such as the Requirement Specification document, in line with standard guidelines.
- Conducted knowledge transfer sessions for new team members and made sure they were on par with the rest of the team.
Environment: SQL Server, T-SQL, ETL, ER Diagrams, MS Excel, SOPs
- Performed data analysis on large relational datasets using a variety of optimized SQL queries, including DML and DDL statements with joins, group functions, string functions and ranking functions.
- Developed queries to create, modify, delete and update the Oracle database and to analyze the data.
- Set up data extracts from MS Access to Excel to create PivotTables, VLOOKUPs and other analysis tools with VBA. Maintained an MS Access (VBA, SQL, ODBC) reporting database for tracking the sales of individual employees with daily metrics.
- Automated MS Excel spreadsheets and converted the data into an MS Access database, using pass-through SQL queries from MS SQL.
- Compared the data received against the sources, verified its accuracy and loaded it into data lakes.
- Performed analysis of functional and non-functional categorized data elements for data profiling and mapping from source to target data environment. Analyzed test results, including user interface data presentation, output documents, and database field values, for accuracy and consistency.
- Worked with the ETL team to document the transformation rules for data migration from source systems to target systems and then to data marts for reporting purposes.
Environment: SQL, ETL, PivotTables, MS Excel, MS Access