
Data Scientist Resume


Sunnyvale, CA

SUMMARY

  • Over 6 years of IT industry and Data Scientist experience, specialized in implementing advanced Machine Learning and Natural Language Processing algorithms on data from diverse domains and in building highly efficient models that derive actionable insights for business environments through exploratory data analysis, feature engineering, statistical modeling and predictive analytics.
  • Experience in machine learning, data mining, structured and unstructured data analysis, and image data analysis, including feature extraction, pattern recognition, text mining, computer simulation, data modeling, model evaluation and deployment.
  • Experience with Statistical Analysis, Data Mining and Machine Learning Skills using R, SPSS, Python and SQL.
  • Data-driven and highly analytical, with working knowledge of statistical modeling approaches and methodologies (clustering, regression analysis, hypothesis testing, decision trees, machine learning) and of rules in an ever-evolving regulatory environment.
  • Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression and k-means.
  • Expert in the entire Data Science process life cycle including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Machine Learning Algorithms, Validation and Visualization.
  • Skilled in data parsing, manipulation and preparation, including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt and reshape.
  • Experienced with testing techniques such as unit testing, integration testing and A/B testing.
  • Experience in problem solving, data science, Machine learning, statistical inference, predictive analytics, descriptive analytics, prescriptive analytics, graph analysis, natural language processing, and computational linguistics; with extensive experience in predictive analytics and recommendation.
  • Experience in using Informatica Metadata Manager to perform data lineage.
  • Experience in foundational machine learning models and concepts (Regression, boosting, GBM, NNs, HMMs, CRFs, MRFs, deep learning).
  • Determined customer satisfaction and helped enhance customer experience using NLP.
  • Excellent knowledge and experience in OLTP/OLAP System Study with focus on Oracle Hyperion Suite of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, physical and logical Data modeling using Erwin tool.
  • Hands-on experience in Machine Learning algorithms such as Linear Regression, GLM, CART, SVM, KNN, LDA/QDA, Naive Bayes, Random Forest, Boosting, K-means Clustering, Hierarchical Clustering, PCA, Feature Selection, Collaborative Filtering, Neural Networks and NLP.
  • Experience using machine learning models such as random forest, KNN, SVM and logistic regression, and packages such as ggplot, dplyr, lm, rpart, randomForest, nnet, PROC (pca, dtree, corr, princomp, gplot, logistic, cluster), NumPy, scikit-learn and pandas in R, SAS and Python.
  • Automated recurring reports using SQL and Python and visualized them on BI platform like Tableau.
  • Strong skills in statistical methodologies such as Hypothesis Testing, Principal Component Analysis (PCA) and Correspondence Analysis.
  • Expertise in building Supervised and Unsupervised Machine Learning experiments using Microsoft Azure utilizing multiple algorithms to perform detailed predictive analytics and building Web Services models for all types of data: continuous, nominal, and ordinal.
  • Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0 Jupyter Notebook 4.X, R 3.0 (Caret, dplyr) and Excel …
  • Experienced in statistical analysis using R, SPSS and Excel.
  • Knowledge and experience in agile environments such as Scrum, using project management tools like Jira/Confluence and version control tools such as GitHub/Git.
  • Strong business sense and abilities to communicate data insights to both technical and nontechnical clients.
  • Highly motivated team player with excellent Interpersonal and Customer Relational Skills, Proven Communication, Organizational, Analytical, Presentation Skills, and Leadership Quality.

TECHNICAL SKILLS

  • DB2 8.0, Oracle11g, Teradata, MySQL, MS Access, SQL Server 16.0, VSAM, IMS,
  • NoSQL databases - Mongo DB, XML, XSD, JSON, Redis, Elasticsearch
  • Logistic Regression, Naive Bayes, Decision Tree, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Tree, K-means
  • SAP- ABAP/4, C, C++, SQL, SPSS, R, Python
  • HTML5, DHTML and XML, CSS3, Web Services, JDBC
  • Erwin r9.6, 9.5, 9.1, 8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner
  • Unit test, integration and AB testing, Win-runner 8.0, Load-runner, Test Director 7.2, Quality center, Quick Test Professional 8.2, Rational Robot
  • Microsoft Office (Word, Excel, PowerPoint), MS Visio, SharePoint, Outlook, MS Project
  • Data Modeling - Logical/Physical/Dimensional, Star/Snowflake Schema, ETL, OLAP, Complete Software Development Lifecycle, CMMI Compliance, Waterfall, Agile, Iterative SDLC
  • Machine Learning, Deep Learning, NLP, Bayesian Learning, Optimization, Prediction, Pattern Identification, Data / Text mining, Regression, Logistic Regression, Bayesian Belief, Clustering, Classification, Statistical modeling
  • Generalized Linear Models, Logistic Regressions, Boxplots, K-Means, Clustering, SVN, PuTTY, WinSCP, Redmine (Bug Tracking, Documentation, Scrum), Teradata, Tableau
  • Informatica, Microsoft SQL Server Integrated Services
  • Machine learning, Regression, Clustering, Data mining
  • Windows 7, Windows 10, Linux, Unix, macOS, Red Hat.

PROFESSIONAL EXPERIENCE

Data Scientist

Confidential, Sunnyvale, CA.

Responsibilities:

  • Worked collaboratively with senior management to identify potential Machine Learning use cases and to set up a server-side development environment.
  • Performed Text Analytics and Text Mining to extract and convert data from raw text to JSON objects. Developed this entire application as a service with REST API using Flask.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree and SVM to classify whether a package will be delivered on time for a new route.
  • Performed testing using A/B techniques.
  • Participated in a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, Unix commands, NoSQL and MongoDB.
  • Extensively used Python’s multiple data science packages like Pandas, NumPy, Matplotlib, SciPy, Scikit-learn and NLTK.
  • Created and implemented MDM data model for Consumer/Provider for Health Care MDM product from Variant.
  • Used Similarity Measure Algorithms like Jaro distance, Euclidean Distance and Manhattan Distance.
  • Performed Entity Tagging with the Stanford NER Tagger and used Named Entity Recognition packages like spaCy.
  • Worked with a team of Java developers and integrated the service.
  • Worked closely with the SMEs of different data sources to gain domain knowledge and understand the data. Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
  • Identified the fields needed for the analysis and created mapping documents.
  • Managed the offshore team, tracked the tasks and prioritized the tasks.
  • Created SQL Join queries to create an analytical base table from different source systems.
  • Created SSIS Packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc. to import data into the data warehouse.
  • Performed EDA for Data understanding, Feature Engineering and Selection.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Segmented the customers using the unsupervised K-means clustering algorithm.
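A segmentation step like the one above can be sketched with scikit-learn. This is an illustrative sketch only: the synthetic data, feature count and cluster count are assumptions, not details from the actual project.

```python
# Illustrative sketch of K-means customer segmentation (synthetic data).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Hypothetical analytical base table: e.g. spend, visit frequency, tenure.
X = rng.normal(size=(500, 3))

# Scale features so no single column dominates the Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means with an assumed k=4; in practice k would be chosen via the
# elbow method or silhouette scores.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

print(labels.shape)                    # one cluster label per customer
print(kmeans.cluster_centers_.shape)   # one centroid per cluster
```

Each customer row receives a cluster label, which can then be joined back to the base table to profile the segments.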

Environment: ERwin 9.x, Oracle 10g, HDFS, Pig, Hive, MapReduce, PL/SQL, UNIX, MDM, SQL Server, DB2, Graph, Tableau, Machine Learning, Python, NumPy, NLTK, Pandas, SciPy, SpaCy, SQL Developer, Flask, SQL and ETL.

Senior Data Scientist

Confidential, Princeton, NJ

Responsibilities:

  • Responsible for developing system models, prediction algorithms, solutions to prescriptive analytics problems, data mining techniques and/or econometric models.
  • Communicated results to the operations team to support decision making, and collected data needs and requirements by interacting with other departments.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn and NLTK in Python for developing various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means and KNN for data analysis.
  • Built statistical / machine learning systems to solve large-scale customer-focused problems, leveraging statistical methods and applying them to real-world business problems.
  • Performed Data Profiling to learn about turnover behavior across various features before the hiring decision, when no on-the-job behavioral data is available.
  • Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
  • Design and develop analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment and product recommendation and allocation planning.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
  • Addressed overfitting and underfitting by tuning the hyperparameters of the algorithm and by using L1 and L2 regularization.
  • Worked with dimensionality reduction techniques like PCA, LDA and ICA.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of the database.
  • Developed Linux Shell scripts by using NZSQL/NZLOAD utilities to load data from flat files to the Netezza database.
  • Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM and clustering, to identify volume using the Scikit-learn package in Python.
  • Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Neon.
  • Conducted a hybrid of Hierarchical and K-means Cluster Analysis using IBM SPSS and identified meaningful segments through a discovery approach.
  • Used Pandas, NumPy, Seaborn, Matplotlib and Scikit-learn in Python for developing various machine learning models, utilizing algorithms such as linear regression.
  • Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests and clustering algorithms).
  • Ensured the model had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
  • Created and designed reports that use gathered metrics to infer and draw logical conclusions from past and future behavior.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Performed data cleaning, feature scaling and feature engineering using the pandas and NumPy packages in Python.
  • Created Data Quality scripts using SQL and Hive to validate successful data load and data quality.
  • Created various types of data visualizations using Python and Tableau.
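The L1/L2 regularization tuning mentioned above can be sketched with a cross-validated grid search in scikit-learn. The synthetic dataset and the parameter grid values are illustrative assumptions, not taken from the actual project.

```python
# Sketch: tuning L1/L2 regularization of logistic regression with
# cross-validated grid search (synthetic data; grid values are assumptions).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# C is the inverse regularization strength; smaller C means a stronger
# penalty, which helps counter overfitting.
param_grid = {"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1.0, 10.0]}

# The liblinear solver supports both the l1 and l2 penalties.
search = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)           # best penalty type and strength found
print(round(search.best_score_, 3))  # mean cross-validated accuracy
```

Comparing the cross-validation score against training accuracy is one way to judge whether the chosen penalty has brought overfitting under control.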

Environment: Erwin r9.6, Python, SQL, Oracle 12c, Netezza, SQL Server, Java, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.

Data Analyst/Data Scientist

Confidential

Responsibilities:

  • Worked as a liaison between internal team members, external customers and banks.
  • Led requirements discussion and wrote requirements specification to establish a repository.
  • Wrote vision and scope documentation to provide software solution and establish support model for legacy systems.
  • Led project meetings, discussions, obtained consensus and signoff on project documentation.
  • Elicited requirements specifications from finance managers and banks/investors, and wrote business requirements to streamline the funding process. Performed storage management: managing space, tablespaces, segments, extents, rollback segments and the data dictionary.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration and coding.
  • Ensured Quality standards to meet the Data Quality SLAs.
  • Used SQL*Loader to move data from flat files into an Oracle database.
  • Performed financial analysis to determine the credit worthiness of clients; shortened the pre-approval process and accelerated the application submission to banking institutions.
  • Involved in OLAP creation, data analysis and data processing: extracting data from different sources, cleansing and transferring data, distributing data, creating mappings, debugging, optimization, slowly changing dimensions, comparing mappings and designing the data mart.
  • Analyzed emerging business trends and production patterns and forecast demand using BI reports.
  • Maintained SharePoint site to house project-related documentation and system status.
  • Carried out other duties as requested by management.
  • Modernized manually sent correspondence into system-generated correspondence written in plain language.
  • Extracted and formatted requested information for managers.
  • Interpreted data, analyzed results using statistical techniques and provided ongoing reports on the CRM application.
  • Ensured the CRM application met all necessary finance compliance requirements of the firm and governing bodies.

Environment: Model N Application, DB2, UNIX, Toad, Putty, HP Quality Center, SSIS, SSAS, SSRS, MS Excel, MS PowerPoint, MS Word, MS Project, Windows XP

Confidential

Senior Research Fellow

Responsibilities:

  • Worked cross-functionally to define problem statements, collect data, conduct surveys, perform data entry in MS Excel and SPSS, transform data, build analytical models and make recommendations.
  • Used analytical models such as multivariate analysis, descriptive analysis, Principal Component Analysis and clustering in the statistical tools SPSS and R to identify insights used to drive key decisions across the organization.
  • Provided leadership and mentorship to other members of the team.
  • Led and supported various ad-hoc projects, as needed, in support of the organization.
  • Built and maintained data-driven optimization models, experiments, forecasting algorithms and capacity constraint models.
  • Leveraged tools like SPSS, R, Tableau, PHP, Python, Hadoop and SQL to drive efficient analytics.
  • Used data access tools and built visualizations from large datasets and multiple data sources.
  • Worked with packages such as NumPy, SciPy, pandas, scikit-learn, dplyr and ggplot2.
  • Worked hands-on with medium to large datasets (i.e. data extraction, cleaning, analysis and presentation).
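Dimensionality-reduction work like the Principal Component Analysis mentioned above might be sketched as follows. This is shown in Python for illustration, even though the original work used SPSS and R, and the survey matrix here is synthetic.

```python
# Sketch: Principal Component Analysis on a hypothetical survey matrix
# (synthetic data; the original analysis was performed in SPSS and R).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))  # 200 respondents, 8 survey items (assumed)

# Standardize so each survey item contributes equally to the variance.
X_scaled = StandardScaler().fit_transform(X)

# Keep the top 3 components; the component count is an assumption and
# would normally be chosen from the explained-variance curve.
pca = PCA(n_components=3)
scores = pca.fit_transform(X_scaled)

print(scores.shape)                         # (respondents, components)
print(pca.explained_variance_ratio_.sum())  # variance captured by 3 PCs
```

The component scores can then feed a clustering step, mirroring the PCA-plus-clustering workflow described above.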

Environment: Model N Application, DB2, UNIX, Toad, Putty, HP Quality Center, SSIS, SSAS, SSRS, MS Excel, MS PowerPoint, MS Word, MS Project, Windows XP
