
Data Scientist Resume

McLean, VA

SUMMARY:

  • Data Scientist with 6+ years of experience delivering data-driven solutions to increase the efficiency, accuracy, and utility of internal data processing.
  • Extensive experience building machine learning solutions to resolve business problems and generating data visualizations using Python.
  • Worked with Python libraries such as Pandas, NumPy, Matplotlib, and Scikit-learn to build machine learning models with concise code.
  • Hands-on experience with Naïve Bayes, Random Forests, Decision Trees, Linear and Logistic Regression, Principal Component Analysis, SVM, Clustering, and Neural Networks.
  • Passionate about implementing deep learning techniques with frameworks such as Keras and Theano.
  • Experience forecasting sales and loan demand using time series modeling techniques such as Autoregressive, Moving Average, and Holt-Winters.
  • Developed clear visualizations of results using Python packages such as Seaborn, Matplotlib, ggplot, and Pygal.
  • Extracted and worked with data from different databases, including Oracle, SQL Server, DB2, MongoDB, PostgreSQL, Teradata, and Cassandra (relational and NoSQL).
  • Followed the data science life cycle and SDLC, using Waterfall and Agile methodologies to develop software products.
  • Experience with statistical programming languages such as Python and R.
  • Performed data mining, data analysis, and predictive modeling with WEKA, a Java machine learning library.
  • Actively involved in all phases of data science project life cycle including Data Extraction, Data Cleaning, Data Visualization and building Models.
  • Used t-SNE (t-Distributed Stochastic Neighbor Embedding) and UMAP (Uniform Manifold Approximation and Projection) for nonlinear dimensionality reduction.
  • Expertise in unsupervised machine learning algorithms such as K-Means, Density-Based Clustering (DBSCAN), and Hierarchical Clustering, with good knowledge of recommender systems.
  • Extensive hands-on experience and high proficiency with structured, semi-structured, and unstructured data, using a broad range of data science programming languages and big data tools including Python, Spark MLlib, SQL, Scikit-learn, and Hadoop.
  • Modeled fluid, IT, and mechanical systems using linear, multilinear, and nonlinear regression, including system fault analysis.
  • Trained and mentored data analysts, data engineers, and junior team members, improving team communication while deepening my own subject-matter skills.
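The dimensionality-reduction experience above can be illustrated with a minimal PCA sketch; the data is synthetic and the component count is an arbitrary stand-in, not a project specific:

```python
# Minimal PCA sketch: project 10-dimensional synthetic data onto 3 components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # synthetic stand-in for real feature data

pca = PCA(n_components=3).fit(X)
X_reduced = pca.transform(X)                      # shape (100, 3)
explained = pca.explained_variance_ratio_.sum()   # fraction of variance kept
```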

PROFESSIONAL EXPERIENCE:

Confidential, McLean, VA

Data Scientist

Responsibilities:

  • Developed analytical databases from complex financial source data.
  • Responsible for data identification, collection, exploration, cleaning for modeling.
  • Data entry, data auditing, creating data reports and monitoring all data for accuracy.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and Scikit-learn to visualize the data after handling missing values and outliers to fit the model.
  • Performed clustering with historical, demographic, and behavioral data as features to implement personalized marketing to customers.
  • Applied Isolation Forest and Local Outlier Factor from scikit-learn for unsupervised outlier detection, scoring each sample.
  • Worked with dimensionality reduction techniques like PCA, LDA and ICA.
  • Also used t-SNE (t-Distributed Stochastic Neighbor Embedding), UMAP (Uniform Manifold Approximation and Projection).
  • Worked with different methodologies including Pareto/NBD model for computing CLV at customer level with business contexts like contractual vs non-contractual and continuous vs discrete.
  • As an extension to the Pareto/NBD model, implemented the Gamma-Gamma model to focus on monetary value alongside lifetime and purchase count.
  • Used strategies such as Market Mix Modeling to evaluate the company's advertising investments.
  • Applied clustering algorithms to group data by similar behavior patterns.
  • Used NLP to sort email and automatically update records in the Customer Relationship Management (CRM) system.
  • Performed credit risk predictive modeling using decision trees and regression, assigning individual risk scores to customers.
  • Used deep learning frameworks such as Keras and Theano to detect fraudulent transactions.
  • Worked on behavior tracking for multiple proof-of-concept projects, such as fraud detection.
  • Addressed overfitting and underfitting by tuning algorithm hyperparameters and by using L1 and L2 regularization.
  • Worked on balance transfer card offers that let customers pay off debt without incurring new charges, using clustering to identify which customers would save money overall.
  • Applied survival analysis to personal loans to calculate risk measurements over time.
  • Built time-to-default models for financial data, including the Cox proportional hazards mixture cure model and Cox proportional hazards regression, to predict loan default.
  • Built Tableau dashboards to present results to the team and to individual customers, helping the support team market more effectively.
  • Communicated model results to the customer support team so decisions could be made according to each customer's history.
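A minimal sketch of the unsupervised outlier-detection step described above, using Isolation Forest and Local Outlier Factor from scikit-learn; the transaction data here is synthetic, not the confidential source data:

```python
# Flag outliers with IsolationForest and LocalOutlierFactor (-1 = outlier).
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
X_inliers = rng.normal(size=(200, 3))           # normal transactions
X_outliers = rng.normal(loc=6.0, size=(5, 3))   # obvious anomalies
X = np.vstack([X_inliers, X_outliers])

iso = IsolationForest(contamination=0.025, random_state=42)
iso_labels = iso.fit_predict(X)                 # -1 for outliers, 1 for inliers

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.025)
lof_labels = lof.fit_predict(X)

n_iso = int((iso_labels == -1).sum())
n_lof = int((lof_labels == -1).sum())
```

Both estimators score every sample; the contamination value sets the expected outlier fraction and is illustrative here.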

Confidential, Seattle, WA

Data Scientist

Responsibilities:

  • Collected data from various sources, including an Oracle database server and the customer support department, and integrated them into a single data set.
  • Responsible for data identification, collection, exploration, cleaning for modeling.
  • Applied data preprocessing techniques such as checking whether data is normally distributed, and implemented log, Box-Cox, cube root, and square root transformations.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, and Scikit-learn to visualize the data after handling missing values and outliers to fit the model.
  • Detected and treated outliers and missing values using box plots and built-in Pandas functions.
  • Worked with dimensionality reduction techniques like PCA, LDA and ICA.
  • Also used t-SNE (t-Distributed Stochastic Neighbor Embedding), UMAP (Uniform Manifold Approximation and Projection).
  • Worked with regression models including Random Forest regression and Lasso regression.
  • Worked with various classification algorithms including Naïve Bayes, Random forest, Support Vector Machines, Logistic Regression etc.
  • Worked with K-Nearest Neighbors and Apriori algorithms for product recommendations, including content-based filtering and collaborative filtering methods.
  • Applied clustering algorithms such as K-means to segment customer data into groups.
  • Worked with time series forecasting models such as VARMAX, ARIMAX, Holt-Winters, and vector autoregression.
  • Used content-based and collaborative filtering to recommend products to customers.
  • Used regularization techniques such as L1, L2, and Elastic Net to balance the bias-variance tradeoff.
  • Used Pyplot, ggplot, seaborn, Matplotlib, Plotly for visualizing the results.
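The L1/L2/Elastic Net regularization work above can be sketched as follows; the data is synthetic and the alpha values are illustrative, not tuned project settings:

```python
# Compare L1 (Lasso), L2 (Ridge) and Elastic Net on synthetic regression data.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)                    # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)                    # L2 penalty
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)  # mix of L1 and L2

# L1 drives uninformative coefficients exactly to zero; L2 only shrinks them.
n_zero_lasso = int((lasso.coef_ == 0).sum())
n_zero_ridge = int((ridge.coef_ == 0).sum())
```

This is the mechanism behind the bias-variance tradeoff mentioned above: stronger penalties add bias but cut variance from noisy, uninformative features.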

Confidential

Data Scientist

Responsibilities:

  • Involved in Design, Development and Support phases of Software Development Life Cycle (SDLC).
  • Performed data ETL by collecting, exporting, merging, and massaging data from multiple sources and platforms, including SSRS/SSIS (SQL Server Reporting and Integration Services) in SQL Server.
  • Worked with cross-functional teams (including data engineer team) to extract data and rapidly execute from MongoDB through MongoDB connector.
  • Performed data cleaning and feature selection using the Scikit-learn package in Python; performed partitional clustering into 100 clusters with k-means (Scikit-learn), grouping similar hotels for a given search together.
  • Used Python to perform ANOVA test to analyze the differences among hotel clusters.
  • Applied various machine learning algorithms and statistical models, including decision trees, text analytics, sentiment analysis, Naive Bayes, logistic regression, and linear regression, using Python to determine the accuracy rate of each model.
  • Worked with ARIMAX, Holt-Winters, and VARMAX to predict sales at regular and seasonal intervals.
  • Selected the most accurate prediction model based on accuracy rate.
  • Used text mining on customer reviews to determine what customers focus on.
  • Delivered result analysis to support team for hotel and travel recommendations.
  • Designed Tableau bar graphs, scatter plots, and geographical maps to create detailed summary reports and dashboards.
  • Developed hybrid model to improve the accuracy rate.
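The k-means clustering and ANOVA steps above can be sketched together; the blob data stands in for the hotel search features, and k=4 replaces the 100 clusters used in the project:

```python
# Cluster synthetic data with k-means, then run a one-way ANOVA across clusters.
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Four well-separated groups along the first feature (synthetic stand-in).
X, _ = make_blobs(n_samples=300, centers=[[-6, 0], [0, 0], [6, 0], [12, 0]],
                  cluster_std=1.0, random_state=0)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Test whether the first feature's mean differs across the found clusters.
groups = [X[km.labels_ == k, 0] for k in range(4)]
f_stat, p_value = f_oneway(*groups)
```

A small p-value indicates the clusters genuinely differ on that feature, which is the check the ANOVA step performs.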

Confidential

Data Scientist

Responsibilities:

  • Collected data from various sources, including an Oracle database server and the customer support department, and integrated them into a single data set.
  • Used Python statistics and machine learning libraries such as NumPy, Pandas, scikit-learn, and Seaborn.
  • Applied data preprocessing techniques such as checking whether data is normally distributed, and implemented log, Box-Cox, cube root, and square root transformations.
  • Detected and treated outliers and missing values using box plots and built-in Pandas functions.
  • Performed data mining, data analysis, and predictive modeling with WEKA, a Java machine learning library.
  • Worked with various dimensionality reduction techniques, including Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Singular Value Decomposition (SVD), and Factor Analysis.
  • Used high correlation filter, low variance filter, and random forest for feature selection.
  • Used time series forecasting to predict sales and production from historical data for product recommendations.
  • Worked with various classification algorithms including Naïve Bayes, Random Forest, Support Vector Machines, and Logistic Regression.
  • Implemented several natural language processing mechanisms for spam filtering and chatbots.
  • Worked with NLTK, SciPy, and Polyglot to develop various NLP tasks.
  • Worked with K-Nearest Neighbors and Apriori algorithms for product recommendations, including content-based filtering and collaborative filtering methods.
  • Used regularization techniques such as L1, L2, and Elastic Net to balance the bias-variance tradeoff.
  • Performed k-fold cross-validation on the training set to obtain reliable estimates of model accuracy.
  • Used Pyplot, ggplot, seaborn, Matplotlib for visualizing the results.
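The k-fold cross-validation step above can be sketched as follows; the iris dataset and logistic regression classifier are placeholders for the project's actual data and models:

```python
# 5-fold cross-validation: each fold serves once as the held-out validation set.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

scores = cross_val_score(clf, X, y, cv=5)  # one accuracy score per fold
mean_accuracy = scores.mean()
```

Averaging across folds gives a more stable accuracy estimate than a single train/test split.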

Confidential

Data Analyst

Responsibilities:

  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Expertise in writing automation scripts using JAVA.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Involved in defining the business/transformation rules applied for sales and service data.
  • Worked with internal architects, assisting in the development of current- and target-state data architectures.
  • Implemented a metadata repository and maintained data quality and data cleanup procedures.
  • Developed transformations, data standards, data governance programs, scripts, stored procedures, and triggers, and executed test plans.
  • Performed data quality checks in Talend Open Studio.
  • Documented data quality and traceability for each source interface.
  • Established standard operating procedures.
  • Generated weekly and monthly asset inventory reports.
