
Sr. Data Scientist/Machine Learning Resume


CA

SUMMARY:

  • 7+ years of experience in Python programming across different Data Science and Machine Learning platforms.
  • Experience implementing Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
  • Worked on implementing dynamic programming in Machine Learning in the form of Reinforcement Learning.
  • Worked with different Data Science and Machine Learning libraries and tools such as scikit-learn, OpenCV, NumPy, SciPy, Matplotlib, pandas, JSON, SQL, Scala, etc.
  • Hands-on with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
  • Proficient in statistical programming languages like R and Python 2.x/3.x, as well as Big Data technologies like Hadoop and Hive.
  • Involved in all phases of the project life cycle, including data acquisition (sampling methods: SRS/stratified/cluster/systematic/multistage), power analysis, A/B testing, hypothesis testing, EDA (univariate & multivariate analysis), data cleaning, data imputation (outlier detection via chi-square tests, residual analysis, PCA, and multivariate outlier detection), data transformation, feature scaling, feature engineering, statistical modeling both linear and nonlinear (logistic, linear, Naïve Bayes, decision trees, Random Forest, neural networks, SVM, clustering, KNN), dimensionality reduction using Principal Component Analysis (PCA) and Factor Analysis, testing and validation using ROC plots, K-fold cross-validation, statistical significance testing, and data visualization (a minimal workflow sketch follows this list).
  • Documented methodology, data reports and model results and communicated with the project team manager to share the knowledge.
  • Used Natural Language Processing (NLP) for response modeling and fraud detection efforts for credit cards.
  • Supported clients by developing machine learning algorithms in Python and big data workloads in PySpark to analyze transaction fraud, perform cluster analysis, etc.
  • Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies.
  • Also worked on different Python 2.x/3.x web frameworks like Django.
  • Able to design a data warehouse/mart or database model on different platforms such as SQL, NoSQL, Oracle, and SQL Server, with proper normalization to control redundancy.
  • Worked on deep learning and machine learning platforms like Caffe, Neon and TensorFlow.
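
A minimal illustrative sketch of the modeling workflow listed above, using scikit-learn on synthetic data rather than any project data: feature scaling, PCA-based dimensionality reduction, a logistic regression model, and K-fold cross-validation scored with ROC AUC.

```python
# Illustrative workflow sketch: scaling -> PCA -> logistic regression,
# validated with stratified K-fold cross-validation and ROC AUC.
# The data is synthetic; a real project would use the cleaned project dataset.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           random_state=42)

pipeline = Pipeline([
    ("scale", StandardScaler()),            # feature scaling
    ("pca", PCA(n_components=10)),          # dimensionality reduction
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
print("Mean ROC AUC across folds: %.3f" % scores.mean())
```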

TECHNICAL SKILLS:

Data Analysis/Statistical Analysis: Hypothesis Testing, ANOVA, Survival Analysis, Longitudinal Analysis, Experimental Design and Sample Size Determination, A/B Testing, Z-test, T-test.

Machine Learning: Ensemble Methods (Random Forest, Gradient Boosting, XGBoost, AdaBoost, etc.), SVM, KNN, Naive Bayes, Logistic/Linear Regression, Decision Trees (CART/Information Gain), Fuzzy/K-means/K-modes Clustering, Hierarchical Clustering, TensorFlow, Caffe, Neon.

Visualization Tools: Tableau, Pentaho, OBIEE, R Shiny, seaborn, matplotlib

Programming Languages: Python, R, Scala, Java, PHP, XML, SQL, PL/SQL, HiveQL

Libraries: pandas, NumPy, Numba, scikit-learn, OpenCV, Django web framework, json, SciPy, mechanize, BeautifulSoup4, MNE, Caffe, NLP, Google ML, ggplot2

PROFESSIONAL EXPERIENCE:

Confidential, CA

Sr Data Scientist/Machine Learning

Responsibilities:

  • The project was to develop an algorithm that accurately predicts the price of assets across multiple classes based on historical data for multiple variables. A further aim was to improve trust between customers by ensuring that a fair price is quoted to the buyer and that any price increase by the seller remains transparent.
  • Also involved in a project to identify the employees' access level, based on their current & historical tasks and duties.
  • Developed data solutions to support strategic initiatives, improve internal processes, and assist with strategic decision-making and SWOT analysis.
  • Performed data extraction by developing a pipeline using Hive (HQL) to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation (see the sketch after this list).
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python 2.x/3.x.
  • Replaced missing data and performed thorough EDA, including univariate and bivariate analysis, to understand individual and combined effects.
  • Worked with data in different formats and stores, including JSON, XML, Spark/Hadoop, and NoSQL and SQL databases on different platforms.
  • Worked in Data Science using Python 2.x/3.x on different data transformation and validation techniques, such as dimensionality reduction using Principal Component Analysis (PCA), A/B testing, Factor Analysis, and testing and validation using ROC plots, K-fold cross-validation, and statistical significance testing.
  • Evaluated models for feature selection and worked with Elastic technologies such as Elasticsearch and Kibana.
  • Used predictive analytics and machine learning algorithms to forecast key metrics, delivered as dashboards on AWS (S3/EC2) and the Django platform, for the company's core business.
  • Provided data and analytical support for the company’s highest-priority initiatives.
  • Used Python 2.x/3.x and R to develop many other machine learning algorithms such as Decision Trees, linear regression, multivariate regression, NLP (Natural Language Processing), Naive Bayes, Random Forests, K-means, and KNN, based on unsupervised/supervised models that help in decision making, using TensorFlow and sklearn.
  • Generated visualizations using Tableau and R-Shiny to present the findings on call center analytics.
  • Implemented Reinforcement Learning techniques in Machine Learning via dynamic programming, using Python 2.x/3.x.
  • Also worked with several R packages including knitr, dplyr, SparkR, CausalInfer, spacetime.
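
A minimal sketch of the extraction and cleaning steps described above, assuming PySpark with Hive support plus pandas/NumPy; the database, table, and column names are hypothetical placeholders.

```python
# Illustrative extraction/cleaning sketch (hypothetical schema):
# pull historical asset data from the Hadoop cluster via HiveQL, then
# clean and engineer features in pandas.
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("asset-price-extraction")
         .enableHiveSupport()
         .getOrCreate())

# HiveQL query against a hypothetical table on the cluster.
sdf = spark.sql(
    "SELECT asset_id, asset_class, sale_date, price FROM assets.historical_prices"
)

# Bring a manageable slice into pandas for cleaning and feature engineering.
df = sdf.limit(100000).toPandas()
df = df.drop_duplicates()
df["price"] = df["price"].fillna(df["price"].median())      # simple imputation
df["log_price"] = np.log1p(df["price"])                      # feature engineering
df["price_scaled"] = (df["price"] - df["price"].mean()) / df["price"].std()  # scaling
```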

Environment: Python 2.x/3.x, R, CDH5, HDFS, Hadoop 2.3, Hive, Linux, Spark, TensorFlow, Tableau Desktop, Scala, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark, SQL, scikit-learn, pandas, AWS (S3/EC2), XML, JSON.

Confidential, Omaha, Nebraska

Data Scientist

Responsibilities:

  • Developed predictive models on large-scale datasets to address various business problems by leveraging advanced statistical modeling, machine learning, and deep learning.
  • Identified key processes within the supply chain that could be improved significantly using advanced analytics/data science, striving for continuous improvement.
  • Extracted meaning from huge volumes of data to improve decision making and provide business intelligence through data-driven solutions. Developed a pipeline using Hive (HQL) to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
  • Worked closely with other analysts and data engineers to develop data infrastructure (data pipelines, reports, dashboards, etc.) and other tools to make analytics more effective.
  • Gathered data in different formats such as JSON and XML, and from NoSQL and SQL databases on different platforms.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python 2.x/3.x.
  • Replaced missing data and performed thorough EDA, including univariate and bivariate analysis, to understand individual and combined effects. Used Python 2.x/3.x for data transformation and validation techniques such as dimensionality reduction using Principal Component Analysis (PCA) and Factor Analysis.
  • Used Python 2.x/3.x and R to develop many other machine learning algorithms such as Decision Trees, linear/logistic regression, multivariate regression, NLP (Natural Language Processing), Naive Bayes, Random Forests, Gradient Boosting, XGBoost, K-means, and KNN, based on unsupervised/supervised models that help in decision making, using TensorFlow and sklearn.
  • Performed model validation on test and validation sets via K-fold cross-validation and statistical significance testing.
  • Performed parameter optimization using grid search and metric evaluation via regression metrics (RMSE, R2, MSE, etc.), classification metrics (accuracy, precision, recall, etc.), and threshold calculations using ROC plots (a sketch follows this list).
  • Developed data solutions to support strategic initiatives, improve internal processes, and assist with strategic decision-making and SWOT analysis.
  • Used predictive analytics and machine learning algorithms to forecast key metrics, delivered as dashboards on AWS (S3/EC2) and the Django platform, for the company's core business.
  • Provided data and analytical support for the company’s highest-priority initiatives.
  • Generated visualizations using Tableau and R-Shiny to present the findings on call center analytics.
  • Implemented Reinforcement Learning techniques in Machine Learning via dynamic programming, using Python 2.x/3.x (a sketch follows the environment list below).
  • Prepared formal reporting on operational performance KPIs/SLAs on an incremental basis. Participated in production meetings for managers and senior leaders, as well as subject-specific meetings, to create use cases with complex data for consumption by senior leaders.
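
A minimal sketch of the grid search and metric evaluation described above, using scikit-learn on synthetic data in place of the project datasets.

```python
# Illustrative parameter optimization (grid search) and metric evaluation sketch.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_curve
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Parameter optimization via grid search with 5-fold cross-validation.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, 10, None]},
    cv=5,
    scoring="roc_auc",
)
grid.fit(X_train, y_train)

# Classification metrics and an ROC-based threshold on held-out data.
best = grid.best_estimator_
y_pred = best.predict(X_test)
y_prob = best.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall:", recall_score(y_test, y_pred))
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
print("threshold maximizing TPR - FPR:", thresholds[(tpr - fpr).argmax()])
```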

Environment: Python 3.6, Apache Spark, Kibana, IPython, Kafka, Pig, scikit-learn, MySQL, SQL, NoSQL, Data Warehouse, Data Modelling, Middleware Integration, Hadoop (MapReduce, HBase, Hive), Gradient Boosting, Random Forest, XGBoost, Neural Nets, etc.
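
A minimal sketch of the dynamic-programming approach to reinforcement learning referenced above (value iteration), applied to a hypothetical five-state chain rather than any project environment.

```python
# Value iteration (dynamic programming) on a toy 5-state chain:
# action 0 moves left, action 1 moves right; reaching the right end pays 1.0.
import numpy as np

n_states, gamma, theta = 5, 0.9, 1e-6
rewards = np.zeros((n_states, 2))   # rewards[s, a]
rewards[-1, 1] = 1.0

def next_state(s, a):
    return max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)

V = np.zeros(n_states)
while True:
    delta = 0.0
    for s in range(n_states):
        q = [rewards[s, a] + gamma * V[next_state(s, a)] for a in (0, 1)]
        new_v = max(q)
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < theta:                # Bellman updates have converged
        break

policy = [int(np.argmax([rewards[s, a] + gamma * V[next_state(s, a)] for a in (0, 1)]))
          for s in range(n_states)]
print("state values:", np.round(V, 3))
print("greedy policy (0=left, 1=right):", policy)
```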

Confidential, Boston, MA

Data Scientist/Machine Learning

Responsibilities:

  • Involved in an NLP project to analyze customer insights. Performed sentiment analysis on customer emails and designed a machine learning email classification system to route specific emails to the respective departments (see the sketch after this list).
  • Also analyzed category- and product-based sales against customer demographics and supported the product promotion team in increasing sales.
  • Understood and implemented MapReduce processing using various Big Data platforms such as Hadoop and the Spark SQL API in Python 2.x/3.x.
  • Extracted data from different databases and warehouses and transformed it as per the requirements.
  • Used functional programming languages like Scala on Hadoop platforms.
  • Performed data profiling to learn the behavior of various features and find dependencies within them, using Python 2.x/3.x.
  • Assessed the complexity of the available raw dataset and applied solutions such as dimensionality reduction and feature scaling to transform the dataset to best fit the analytical model, depending on the perspective of the study.
  • Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.
  • Designed dashboards showing category- and product-based reports with key performance indicators.
  • Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python 2.x/3.x and MATLAB.
  • Predicted the impact of various promotional decisions on the company's overall sales using Python 2.x/3.x and machine learning clustering techniques such as K-means, maximum likelihood, and hierarchical clustering.
  • Maintained model robustness and an effective fit for making predictions.
  • Responded to and coordinated changes to the existing data model based on ad hoc analytical reports.
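
A minimal sketch of the email-routing classifier described above, assuming a scikit-learn TF-IDF + Naive Bayes approach; the example emails and department labels are hypothetical stand-ins for the real training data.

```python
# Illustrative email classification sketch: TF-IDF text features feeding a
# Multinomial Naive Bayes model that routes emails to a department.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

emails = [
    "I was charged twice for my last order",
    "The package arrived damaged and I want a replacement",
    "How do I reset my account password?",
    "Please cancel my subscription and refund the balance",
]
departments = ["billing", "returns", "support", "billing"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),   # text features
    ("nb", MultinomialNB()),                             # classifier
])
clf.fit(emails, departments)

# Route a new email to the predicted department.
print(clf.predict(["I need a refund for a duplicate charge"])[0])
```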

Environment: Python 2.x/3.x, R, Scala, Big Data (Hadoop/Spark SQL), scikit-learn, pandas

Confidential

Data Analyst

Responsibilities:

  • Acquire data from primary or secondary data sources and maintain databases/data systems.
  • Established new client data, preparing it for entry into the new platform.
  • Loaded data by converting a CSV file into the corresponding database tables (see the sketch after this list).
  • Work with management team to create a prioritized list of needs for each business segment.
  • Ran diagnostic survey tool to measure and predict team performance.
  • Extracted, compiled and analyzed data using Excel and Adobe to build reports and provide recommendations to clients to improve team performance.
  • Generated ongoing reports of each active account as they are being consulted.
  • Involved in client-facing activities where reports were presented to upper-management and to each team.
  • Communicated generated reports to GAP management showcasing the progress of each account daily.
  • Identify and address data quality problems by eliminating duplicates and standardizing data sets.
  • Locate and define new process improvement opportunities.
  • Used advanced Excel functions to generate spreadsheets and pivot tables.
  • Performed daily data queries and prepared reports on daily, weekly, monthly, and quarterly basis.
  • Advise client on system usage.
  • Execute customized self-service client dashboards.
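
A minimal sketch of the CSV-to-database load mentioned above, assuming pandas and SQLAlchemy; the file name, table name, and SQLite connection string are hypothetical.

```python
# Illustrative CSV load: read the file, standardize headers, drop duplicates,
# and append the rows to the corresponding database table.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///clients.db")           # hypothetical database

df = pd.read_csv("new_clients.csv")                       # hypothetical CSV export
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df = df.drop_duplicates()                                 # basic data-quality pass

df.to_sql("clients", engine, if_exists="append", index=False)
```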
