
Data Scientist Resume


Boston, MA

SUMMARY:

  • A competent professional with 8+ years of experience in Data Science, Business Analysis and Product Management in fast-moving Agile environments.
  • Extensive programming skills in analytical and statistical programming languages such as Python, R and SQL.
  • Well versed in machine learning algorithms such as Linear and Logistic Regression, Time Series models, Linear Discriminant Analysis, Decision Trees, Support Vector Machines, Random Forests and K-Nearest Neighbors.
  • Proficient in managing entire data science project life cycle including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Statistical Modeling, Testing and Validation, Visualization and Reporting.
  • Proficient in machine learning algorithms such as Linear Regression, Ridge, Lasso and Elastic Net Regression, Decision Trees and Random Forests; more advanced algorithms such as ANNs, CNNs and RNNs; and ensemble methods such as Bagging, Boosting and Stacking.
  • Excellent performance in model validation and model tuning, including model selection, K-fold cross-validation, hold-out schemes and hyperparameter tuning with Grid Search and Hyperopt (a minimal sketch follows this list).
  • Advanced experience with Python (2.x, 3.x) and its libraries such as NumPy, Pandas, Scikit-learn, XGBoost, LightGBM, Keras, Matplotlib and Seaborn.
  • Experience in building machine learning solutions using PySpark for large datasets on Hadoop systems.
  • Proficient at building and publishing interactive reports and dashboards, with design customization based on stakeholders’ needs, in Tableau and Power BI.
  • Experienced in RDBMSs such as SQL Server 2012 and NoSQL databases such as MongoDB and DynamoDB.
  • Responsible for creating ETL packages, migrating data from flat files and MS Excel, cleaning and backing up data files, and synchronizing daily transactions using SSIS.
  • Built recommendation engines using content-based collaborative filtering algorithms to understand the relationships between various Confidential Talent Management job profiles.
  • Interprets business problems and provides solutions using data analysis, data mining, optimization tools, machine learning techniques and statistics.
  • Extracted data from non-traditional sources such as web scraping.
  • Quick learner in new business industries and software environments, delivering solutions adapted to new requirements and challenges.
  • Strong understanding of Hadoop ecosystem and its components.
  • Very good at problem solving, debugging, troubleshooting, and designing and implementing solutions to complex mathematical and technical challenges.
  • Highly experienced at translating business objectives from multiple stakeholders into detailed implementation and project plans.
  • Researched and implemented multiple profitable trading strategies and signals.
  • Improved performance of existing strategies and models.
  • Extensive experience researching and implementing high-to-medium frequency statistical arbitrage trading strategies.
  • Detailed expertise in algorithmic trading practices, exchanges, market microstructure, data feeds and technologies.
  • Algorithmic trading generalist.
  • Proven track record of leveraging analytics and large amounts of data to drive significant business impact.
  • Passion for researching and innovating new methodologies in the intersection of machine learning, applied math, probability, statistics and computer science.
  • Proficient at translating unstructured business problems into an abstract mathematical framework.
  • Proficient at designing custom business intelligence dashboards.
  • Effective Manager training: one-on-ones, feedback, coaching, etc.
  • Detailed knowledge of equity market operations, rules, regulations and practices.
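
A minimal sketch of the model-validation workflow referenced above (K-fold cross-validation with grid-search hyperparameter tuning in scikit-learn); the dataset and parameter grid are illustrative placeholders, not from an actual engagement.

# Illustrative k-fold cross-validation with grid-search tuning.
# The synthetic dataset and parameter grid are placeholder assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Stratified k-fold keeps class proportions stable across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 20],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=cv,
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)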

TECHNICAL SKILLS:

Languages: Expert in Python (Pandas, NumPy, SciPy, Scikit-learn, Keras, Seaborn, TensorFlow), R, SQL

Algorithms: K-Nearest Neighbors, Support Vector Machines, Naïve Bayes, Decision Trees, Random Forest, Neural Networks, Bagging, Boosting

Big Data: Apache Spark, Hive, MapReduce, Pig

Analysis: Feature Selection Methods, Principal Component Analysis, Supervised and Unsupervised Learning, Classification Techniques, Topic modeling, Model building, Penalized Linear regression, Time Series

Analytics Tools: IBM SPSS, MS Excel, Weka, Tableau

Relational Databases: SQL Server, Oracle 10g

NoSQL: MongoDB, HBase

Tools: PyCharm 2017.2.2, Anaconda 3.6.1, Jupyter Notebook, RStudio, Weka, Dreamweaver, Eclipse, Toad, IBM SPSS

Agile Tools: MS SharePoint, Rally, Scrum, Team Foundation Server

ETL and BI Tools: SSIS, Tableau

SPECIALITIES: Machine Learning / Predictive Analytics / Text Mining / Audio Analytics

Classification: Naïve Bayes, Sequential (Neural Network), SVM, ZeroR, OneR, etc.

Regression: Simple Linear Regression, Multiple Linear Regression, Logistic Regression, Linear Discriminant Analysis, Ridge Regression, Lasso Regression

Ensemble: Random Forest

K-Nearest Neighbor (kNN)

Decision Tree Learning: Classification and Regression Tree (CART), J48, Gradient Boosting Machines (GBM)

Clustering: k-Means, Hierarchical

Recommendation Engines: Content Based Recommender System
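
A minimal sketch of a content-based recommender of the kind listed above, using TF-IDF vectors and cosine similarity in scikit-learn; the job-profile texts are invented placeholders.

# Content-based recommender sketch: TF-IDF over item descriptions,
# cosine similarity to rank related items. Profile texts are
# invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

profiles = [
    "data scientist python machine learning statistics",
    "talent manager recruiting hr analytics",
    "machine learning engineer python deep learning",
]

tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(profiles)

# Similarity of every profile to profile 0; higher means more related.
scores = cosine_similarity(matrix[0], matrix).ravel()
ranked = scores.argsort()[::-1][1:]  # skip the item itself
print(ranked, scores[ranked])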

Time Series: Moving Average, ARIMA
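
A brief sketch of an ARIMA fit of the kind listed above, using statsmodels; the synthetic monthly series and the (1, 1, 1) order are illustrative assumptions only.

# ARIMA sketch on a synthetic monthly series with statsmodels;
# the data and model order are illustrative, not from a project.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = pd.Series(
    np.cumsum(rng.normal(size=48)),
    index=pd.date_range("2015-01-01", periods=48, freq="MS"),
)

model = ARIMA(series, order=(1, 1, 1)).fit()
print(model.forecast(steps=6))  # six months ahead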

Text Analytics: R (tidyverse, tidytext, tm, wordcloud), vectorizers, topic modelling

PROFESSIONAL EXPERIENCE:

Confidential - Boston, MA

Data Scientist

Responsibilities:

  • Build an in-depth understanding of the problem domain and available data assets
  • Research, design, implement, and evaluate machine learning approaches and models
  • Perform ad-hoc exploratory statistics and data mining tasks on diverse datasets, from small scale to "big data" (a minimal sketch follows this list)
  • Participate in data architecture and engineering decision-making to support analytics
  • Take initiative in evaluating and adapting new approaches from data science research
  • Investigate data visualization and summarization techniques for conveying key findings
  • Communicate findings and obstacles to stakeholders to help drive the delivery to market
  • Developed code to the client's requirements using SQL, PL/SQL and Data Warehousing concepts.
  • Delivered an interactive dashboard in Tableau to visualize 1 billion rows of historical data.
  • Designed and developed user interfaces and customized reports using Tableau, and designed cubes for data visualization and mobile/web presentation with parameterization and cascading.
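
A hypothetical ad-hoc exploratory pass of the kind described above, using pandas; the file name and column names are placeholders for illustration.

# Hypothetical exploratory statistics over a sales extract with
# pandas; file and column names are placeholders.
import pandas as pd

df = pd.read_csv("sales_extract.csv", parse_dates=["order_date"])

print(df.describe(include="all"))          # summary statistics
print(df.isna().mean().sort_values())      # missing-value rates
print(df.groupby("region")["revenue"].agg(["count", "mean", "sum"]))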

Environment: Oracle, PostgreSQL, SAS, SQL, PL/SQL, T-SQL, Tableau, TOAD for data analysis, MS Excel, Netezza, DFAST, CCAR.

Confidential, Florham Park, NJ

Data Scientist

Responsibilities:

  • Worked on building a recommendation system to recommend a variety of Confidential products/services to new customers, using concepts of text processing and recommendation systems.
  • Communicated and coordinated with end client for collecting data and performed ETL to define the uniform standard format.
  • In the preprocessing phase, used Pandas to handle missing data, cast data types, and merge and group tables for the EDA process.
  • In the data exploration stage, used correlation analysis and graphical techniques in Matplotlib and Seaborn to gain insights about product sales by region and business type.
  • Explored the product sales data for cannibalization, when similar products were launched in the same category.
  • Segmented the data using K-Means clustering and analyzed clients' behavior according to their demographic details, regions and monthly revenues in each cluster.
  • Designed and implemented cross-validation and statistical tests, including k-fold, stratified k-fold and hold-out schemes, to test and verify the models’ significance.
  • Collected the feedback after deployment, retrained the model to improve the performance.
  • Used Python and Tableau to analyze the number of products which gave maximum sales in a category, leading to sales optimization.
  • Used R to perform text mining to find meaningful patterns in unstructured textual feedback; created word clouds and word corpora that were used by higher management.
  • Implemented topic modeling using LDA in R to predict the product category of feedback and detect specific issues in each category by analyzing the feedback collected from customer service.
  • Used text mining in R to generate the Net Promoter Score.
  • Performed sentiment analysis of customer feedback after every release.
  • Proficient in working with R libraries (tidyverse, tidytext, sentiment, ggplot2, dplyr, qplot).
  • Conducted user interviews, gathered and analyzed requirements, and loaded them into Rally.
  • Created user stories and tracked/updated the progress for the entire scrum.
  • Created and maintained the product backlog using TFS and tracked document changes for functional and business specifications.
  • Wrote SQL queries for backend testing, data analysis, and modification and addition of features.
  • Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver data science solutions.
  • Built a fraud detection model using traditional structured data and textual data (linear regression, logistic regression, decision tree and random forest classifiers).
  • Tackled a highly imbalanced fraud dataset using sampling techniques such as down-sampling, up-sampling and SMOTE (Synthetic Minority Over-sampling Technique) with Python Scikit-learn (a sketch follows this list).
  • Predicted the insurance claim renewal of 0.5 million of Confidential’s insurance partners using various data models.
  • Used PCA and other feature engineering techniques to reduce high-dimensional data, along with feature normalization and label encoding with the Scikit-learn library in Python.
  • Built a client implementation model using Linear Regression for estimating customer loyalty and profitability at an account level, thereby helping to predict the probability of attrition in various future periods.
  • Worked with the marketing team to process transaction and behavioral data of customers.
  • Developed statistical reports using Tableau dashboards on insurance claims for actionable insights.
  • Used OCR technology to extract data from various insurance reports to identify payout trends.
  • Built a sound detection model using a neural network to improve the client experience by customizing the response (a sketch follows this list).
  • Used the Sequential model of the Keras library and achieved 72% accuracy.
  • Validated the results using a confusion matrix based on the percentage of false positives and false negatives.
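
A sketch of the imbalanced-classification workflow from the fraud bullet above, using the imbalanced-learn library; the synthetic data and class weights are assumptions.

# SMOTE oversampling on the training split only, then a random
# forest classifier. Data here is synthetic; imbalanced-learn
# provides SMOTE.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=5000, weights=[0.97, 0.03], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, random_state=42
)

# Resample only the training data to avoid leaking into the test set.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_tr, y_tr)

clf = RandomForestClassifier(random_state=42).fit(X_res, y_res)
print(classification_report(y_te, clf.predict(X_te)))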
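
A minimal sketch of a Keras Sequential classifier evaluated with a confusion matrix, per the sound-detection bullets above; the input features (standing in for precomputed audio features such as MFCCs), shapes and layer sizes are assumptions.

# Keras Sequential classifier sketch with confusion-matrix
# validation. Random inputs stand in for precomputed audio
# features; shapes and layer sizes are assumptions.
import numpy as np
from sklearn.metrics import confusion_matrix
from tensorflow import keras

X = np.random.rand(1000, 40)   # e.g., 40 MFCC features per clip
y = np.random.randint(0, 2, 1000)

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(40,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)

preds = (model.predict(X) > 0.5).astype(int).ravel()
print(confusion_matrix(y, preds))  # false positives / negatives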

Environment: Jupyter, RStudio, PyCharm, Tableau, Text Mining, Topic Modelling, Collaborative Filtering, Clustering, SQL, Rally

Kroger - Cincinnati, Ohio

Data Scientist

Responsibilities:

  • Explored a store's sales data to find product seasonality and trends over time.
  • Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.
  • Explored and analyzed the customer specific features by using Matplotlib, Seaborn in Python and dashboards in Tableau.
  • Performed data imputation using Scikit-learn package in Python.
  • Used Python 2.x/3.x (NumPy, SciPy, Pandas, Scikit-learn, Seaborn) to develop a variety of models and algorithms for analytic purposes.
  • Experimented with and built predictive models, including ensemble methods such as gradient boosting trees, to predict sales amounts (a sketch follows this list).
  • Predicted the sales of a product based on seasonality, thus helping to maintain inventory in line with sales.
  • Analyzed patterns in customers’ shopping habits across locations, categories and months using time series modeling techniques.
  • Used RMSE/MSE to evaluate different models’ performance.
  • Used Tableau for visualizations like word cloud, geographical heat map, etc.
  • Used Python and Tableau to analyze the number of products which gave maximum sales in a category, leading to sales optimization.
  • Created user stories and tracked/updated the progress for the entire scrum.
  • Created and maintained the product backlog using TFS and tracked document changes for functional and business specifications.
  • Wrote SQL queries for backend testing, data analysis, and modification and addition of features.
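
A sketch of a gradient-boosting sales regressor evaluated with RMSE/MSE, as described above; the synthetic regression data stands in for real sales features.

# Gradient boosting regression evaluated with MSE and RMSE;
# synthetic data is a placeholder for real sales features.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0,
                       random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

model = GradientBoostingRegressor(random_state=42).fit(X_tr, y_tr)

mse = mean_squared_error(y_te, model.predict(X_te))
print("MSE:", mse, "RMSE:", np.sqrt(mse))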

Environment: Pandas, NumPy, Matplotlib, Python 2.x/3.X, RMSE/MSE, Tableau, TFS tool.

Confidential - San Francisco, CA

Data Analyst/Data Scientist

Responsibilities:

  • Integrates data from multiple data sources or functional areas, ensures data accuracy and integrity, and updates data as needed.
  • Develops and/or uses algorithms and statistical predictive models and determines analytical approaches and modeling techniques to evaluate scenarios and potential future outcomes.
  • Performs analyses of structured and unstructured data to solve multiple and/or complex business problems utilizing advanced statistical techniques and mathematical analyses.
  • Collaborates with business partners to understand their problems and goals, and to develop predictive models, statistical analyses, data reports and performance metrics.
  • Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
  • Developed advanced models using multivariate regression, Logistic regression, Random forests, decision trees and clustering.
  • Used Pandas, Numpy, Seaborn, Scipy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
  • Designed, developed, and implemented data quality validation rules to inspect and monitor the health of the data (a sketch follows this list).
  • Worked extensively with data governance team to maintain data models, Metadata and dictionaries.
  • Participated in the ongoing design and development of a consolidated data warehouse supporting key business metrics across the organization.
  • Applied predictive analysis and statistical modeling techniques to analyze customer behavior and offer customized products, reducing delinquency and default rates; default rates fell from 5% to 2%.
  • Applied machine learning techniques to tap into new markets and customers, and presented recommendations to top management, resulting in a 5% increase in the customer base and a 9% increase in the customer portfolio.
  • Analyzed customer master data to identify prospective business, understand business needs, build client relationships and explore cross-selling opportunities for financial products; 60% of customers (up from 40%) availed more than 6 products.
  • Dashboard and report development experience using Tableau and QlikView.
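
A hypothetical set of data-quality validation rules of the kind described above, in pandas; the column names and checks are placeholders for illustration.

# Hypothetical data-quality validation rules in pandas; column
# names and thresholds are placeholders.
import pandas as pd

def validate(df: pd.DataFrame) -> dict:
    """Return a dict of rule name -> pass/fail for basic checks."""
    return {
        "no_duplicate_ids": df["customer_id"].is_unique,
        "no_missing_balances": df["balance"].notna().all(),
        "balances_non_negative": (df["balance"] >= 0).all(),
        "valid_dates": df["open_date"].le(pd.Timestamp.today()).all(),
    }

df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "balance": [100.0, 250.5, 0.0],
    "open_date": pd.to_datetime(["2019-01-01", "2020-06-15", "2021-03-09"]),
})
print(validate(df))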

Environment: Scikit-learn, NLTK, Python, Matplotlib, Tableau, Modelling.
