Data Scientist Resume

SUMMARY

  • 7 years of experience in building Data Science solutions using Machine Learning, Statistical Modeling, Data Mining, Natural Language Processing (NLP) and Data Visualization.
  • Theoretical foundations and practical hands-on projects covering (i) supervised learning (linear and logistic regression, boosted decision trees, GBM, Support Vector Machines, neural networks, NLP), (ii) unsupervised learning (clustering (k-means, DBSCAN, Expectation Maximization), dimensionality reduction, recommender systems), (iii) probability & statistics, experiment analysis, principal component and factor analysis, confidence intervals, A/B testing, and (iv) algorithms and data structures.
  • Experience in building various machine learning models using algorithms such as Gradient Descent, KNN, and ensembles such as Random Forest, AdaBoost, and Gradient Boosting Trees.
  • Experienced in a wide spectrum of high-visibility projects spanning sales effectiveness, competitive intelligence, fraud detection, time series forecasting, operational efficiency, sourcing and procurement, supply-chain optimization, and financial analysis.
  • Provide thought leadership and strategic direction for where data science can solve challenging problems across the organization - determining where to focus, how to prioritize, and where to invest to achieve optimal ROI.
  • Experience in Natural Language Processing (NLP) and Time Series Analysis and Forecasting using ARIMA models in Python and R (see the forecasting sketch after this list).
  • Assist clients by delivering projects from beginning to end, including understanding the business need, aggregating data, exploring data, building & validating predictive models, and deploying completed models to deliver business impact to the organization.
  • Experience using Spark and Amazon Machine Learning (AML) to build ML models.
  • Experience with cloud platforms (Azure and AWS).
  • Partner with Analytics team members and other functions to share insights and best practices, ensuring consistency of data-driven decision-making throughout the organization.
  • Ability to work with a variety of databases (SQL, Elasticsearch).
  • Work with DevOps consultants to operationalize models after they are built.
  • Experience with Deep Learning frameworks such as TensorFlow and Keras, used to help customers build DL models.
  • Expert working within enterprise data warehouse platforms and distributed computing platforms such as Hadoop.
  • Ability to work efficiently in a Linux environment, with experience in source code management systems such as Git.
  • Demonstrated ability to apply relevant techniques to drive business impact and help with optimization, causal inference, and choice modeling.
  • Network with business stakeholders to develop a pipeline of data science projects aligned with business strategies. Translate complex and ambiguous business problems into project charters clearly identifying technical risks and project scope.
  • Build strong relationships with business and technology leaders.
  • Experienced in agile/iterative development processes to drive timely and impactful data science deliverables.
  • Identify gaps in existing data and work with Engineering teams to implement data tracking.
  • Implement statistical and machine learning models, large-scale cloud-based data processing pipelines, and off-the-shelf solutions for test and evaluation; interpret data to assess algorithm performance.
  • Communicate with clinical and biomedical researchers to discover use cases and discuss solutions.
  • Proficiency in statistical tools in R (e.g., ggplot2, cluster, dplyr, caret) and Python (e.g., pandas, numpy, scikit-learn, bokeh, nltk).
  • Experience in software development environments, Agile, and code management/versioning (e.g., Git).
  • Design, train and apply statistics, mathematical models, and machine learning techniques to create scalable solutions for predictive learning, forecasting and optimization.
  • Communicate findings to a broad audience (e.g., clinicians, computer scientists, and the general public).
  • Apply data acquisition, data mining, and analysis techniques to social media websites (e.g., Twitter, Facebook, Instagram) and cell phone sensor data.
  • Develop novel ways to apply published machine learning models to imperfect clinical data including development of training datasets.
  • Develop high-quality, secure code implementing models and algorithms as application programming interfaces or other service-oriented software implementations.
  • Experience working with engineers in designing scalable data science flows and implementing into production.
  • Excellent communication and presentation skills and ability to explain technical concepts in simple terms to business stakeholders.
  • Experienced in Data visualization using Tableau, Weka, Power BI.
  • Extensive hands-on experience in modeling with massive distributed datasets.
  • Extensive hands-on experience in navigating complex relational datasets in both structured and semi-structured formats.
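
A minimal sketch of the ARIMA forecasting workflow noted above, in Python with statsmodels; the file name, column names, and the (1, 1, 1) order are illustrative assumptions rather than details of an actual engagement.

    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Illustrative only: assumes a CSV with 'date' and 'sales' columns.
    df = pd.read_csv("sales.csv", parse_dates=["date"], index_col="date")
    series = df["sales"].asfreq("MS")  # monthly frequency

    # The (1, 1, 1) order is a placeholder; in practice it would be chosen
    # from ACF/PACF plots or information criteria such as AIC.
    model = ARIMA(series, order=(1, 1, 1))
    fitted = model.fit()

    # Forecast the next 12 months with confidence intervals.
    forecast = fitted.get_forecast(steps=12)
    print(forecast.predicted_mean)
    print(forecast.conf_int())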

TECHNICAL SKILLS

Languages: C, Python, R, SAS, PL/SQL, SQL

Python and R: NumPy, SciPy, pandas, scikit-learn, Matplotlib, Seaborn, ggplot2, caret, dplyr, purrr, readxl, tidyr, RWeka, gmodels, RCurl, C50, twitteR, NLP, reshape2, rjson, plyr, Beautiful Soup, rpy2

NLP/Machine Learning/Deep Learning: LDA (Latent Dirichlet Allocation), NLTK, Sentiment Analysis, SVMs, RNN, CNN, TensorFlow, Keras, PyTorch, Azure ML

Algorithms: Kernel Density Estimation and Non-parametric Bayes Classifier, K-Means, Linear Regression, Neighbors (Nearest, Farthest, Range, k, Classification), Non-Negative Matrix Factorization, Dimensionality Reduction, Decision Tree, Gaussian Processes, Logistic Regression, Naïve Bayes, Random Forest, Ridge Regression, Matrix Factorization/SVD

Cloud: AWS, Azure

Web Technologies: Django, Flask, HTML5, JavaScript, CSS3, Bootstrap

Databases: SQL, Hive, MySQL, MS Access, HDFS, Cassandra

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Kafka

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Business Intelligence

Version Control Tools: SVN, GitHub

BI Tools: Tableau, Weka, Power BI

Operating System: Windows, Linux

PROFESSIONAL EXPERIENCE

Confidential

Data Scientist

Responsibilities:

  • Collaborate with cross-functional teams to solve critical business problems, drive operational efficiencies, and deliver successfully on high visibility strategic initiatives.
  • Deliver analytics and insight to address a wide range of business needs utilizing various secondary data sources such as patient-level data.
  • Make the right choices from a breadth of tools, data sources, and analytical techniques to answer a wide range of critical business questions.
  • Perform data preprocessing on messy data, including imputation, normalization, scaling, and feature engineering, using scikit-learn.
  • Conduct exploratory data analysis using Matplotlib and Seaborn. Maintain and monitor adherence program reporting. Design experiments and test hypotheses. Apply advanced statistical and predictive modeling techniques to build, maintain, and improve real-time decision-making.
  • Articulate solutions/recommendations to business users. Present analytical content concisely and effectively to non-technical audiences and influence non-analytical business leaders to drive major strategic decisions based on analytical inputs.
  • Built classification models based on Logistic Regression, Decision Trees, Random Forest, Support Vector Machine, and ensemble algorithms to predict the probability of patient absence (see the classification sketch after this list).
  • Work on data cleaning to ensure data quality, consistency, and integrity using pandas/NumPy.
  • Implement and test the model on AWS EC2; collaborate with the development team to select the best algorithm and parameters.
  • Leverage advanced and sophisticated methods to synthesize, clean, visualize, and investigate data as appropriate to deliver analytical recommendations aligned with the business need.
  • Analyze disease diagnoses, phenotypic traits, patient demographics, and genetics for epidemiological studies.
  • Utilize NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets.
  • Analyze longitudinal time series data to characterize disease trajectories, disease progression, medication adverse event episodes, drug resistance, and disease comorbidities.
  • Build predictive models including Support Vector Machine, Decision Tree, Naive Bayes Classifier, and basic CNNs and RNNs to predict whether a thyroid cancer cell poses a potential danger of spreading, using Python scikit-learn.
  • Collaborate with data engineers and the operations team to implement the ETL process; write and optimize SQL queries to perform data extraction to fit the analytical requirements.
  • Implement the training process using cross-validation and evaluate the results based on different performance metrics.
  • Leverage BI tools like Tableau Desktop to develop business dashboards enabling leaders to make decisions and forecast the number of credit card defaulters monthly.
  • Organize reports and produce rich data visualizations to model data into human-readable form with Tableau, Matplotlib, and Seaborn, showing the management team how prediction can help the business.
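
A condensed sketch of the classification and cross-validation workflow described in the bullets above; the file name, the binary 'absent' label, and the model settings are placeholder assumptions, not details of the actual project.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import cross_val_score, train_test_split

    # Placeholder data load: assumes a binary 'absent' label column.
    df = pd.read_csv("appointments.csv")
    X, y = df.drop(columns=["absent"]), df["absent"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    # Compare candidate models with 5-fold cross-validated ROC AUC.
    for name, model in [
        ("logistic_regression", LogisticRegression(max_iter=1000)),
        ("random_forest", RandomForestClassifier(n_estimators=200, random_state=42)),
    ]:
        scores = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
        print(name, scores.mean())

    # Fit the selected model and check held-out performance.
    clf = RandomForestClassifier(n_estimators=200, random_state=42)
    clf.fit(X_train, y_train)
    print("test AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))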

Environment: Python (scikit-learn/Keras/SciPy/NumPy/pandas/Matplotlib/Seaborn), Linear and Non-linear Regression, Deep Learning, SVM, Decision Tree, Random Forest, XGBoost, KNN, MySQL, Hadoop Framework, Tableau Desktop and Tableau Server.

Confidential

Data Scientist

Responsibilities:

  • Gather requirements from the business, review business requirements, and analyze data sources.
  • Performed data collection, data cleaning, feature scaling, feature engineering, and validation; visualized, interpreted, and reported findings and developed strategic uses of data with Python libraries such as NumPy, Pandas, SciPy, and Scikit-Learn.
  • Involved with Recommendation Systems such as Collaborative filtering and content-based filtering.
  • Implemented various statistical techniques to manipulate the data, such as missing data imputation, principal component analysis, sampling, and t-SNE for visualizing high-dimensional data.
  • Worked with Customer Churn Models including Random Forest regression and lasso regression, along with pre-processing of the data.
  • Explored and visualized the data to get descriptive statistics and inferential statistics for better understanding the dataset.
  • Built predictive models including Support Vector Machine, Decision Tree, Naive Bayes Classifier, and Neural Network, plus ensembles of these models, to evaluate how the likelihood to recommend would change across customer groups under different sets of services, using Python scikit-learn.
  • Implemented the training process using cross-validation and test sets, evaluated the results based on different performance metrics, and collected feedback and retrained the model to improve performance.
  • Configured SQL database to store Hive metadata.
  • Use clustering technique K-Means to identify outliers and to classify unlabeled data.
  • Evaluate models using cross-validation, the log loss function, and ROC curves, and use AUC for feature selection; work with Elastic technologies such as Elasticsearch and Kibana.
  • Work with the NLTK library for NLP data processing and pattern discovery.
  • Ensure that the model has a low false positive rate; perform text classification and sentiment analysis on unstructured and semi-structured data.
  • Use Jupyter Notebook to create and share documents that contain live code, equations, visualizations, and explanatory text.
  • Understanding and implementation of text mining concepts, graph processing, and semi-structured and unstructured data processing.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Performed customer segmentation based on behavior and specific characteristics such as age, region, income, and geographical location, applying clustering algorithms to group customers with similar behavior patterns (see the segmentation sketch after this list).
  • The segmentation results help determine the Customer Lifetime Value of each segment, discover high-value and low-value segments, and improve customer service to retain customers.
  • Used Principal Component Analysis and t-SNE in feature engineering to analyze high dimensional data.
  • Work collaboratively with a team of scientists, analysts, compliance specialists, and the business lines to implement advanced methods to maintain and enhance the insurance mitigation program.
  • Work closely with business stakeholders, Financial Analysts, Data Engineers, Data Visualization Specialists and other team members to turn data into critical information and knowledge that can be used to make sound organizational decisions. Propose innovative ways to look at problems by using data mining (the process of discovering new patterns from large datasets) approaches across a wide range and variety of data assets.
  • Strengthen the business and help support clients by using data to describe and model the outcomes of investment and business decisions.
  • Validate findings using an experimental and iterative approach; present findings to the business team, exposing assumptions and validation work in a way that business counterparts can easily understand.
  • Use analytics techniques to model complex business problems, discovering insights and identifying opportunities through the use of statistical, algorithmic, mining and visualization techniques.
  • Integrating and preparing large, varied datasets, implementing specialized database and computing environments, and communicating results.
  • Improve organizational performance through the application of original thinking to existing and emerging analytic methods, processes, products, and services, and employ sound judgment in determining how innovations will be deployed to produce return on investment.
  • Work with Data Engineers and determine how to best source data, including identification of potential proxy data sources, and design business analytics solutions, considering current and future needs, infrastructure and security requirements, and load frequencies.
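
A minimal sketch of the K-Means customer segmentation referenced above; the file name, the feature columns, and the choice of k=4 are illustrative placeholders, not the production configuration.

    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Illustrative only: assumes numeric behavioral/demographic columns.
    customers = pd.read_csv("customers.csv")
    features = customers[["age", "income", "monthly_spend", "visits_per_month"]]

    # Standardize so no single attribute dominates the distance metric.
    X = StandardScaler().fit_transform(features)

    # k=4 is a placeholder; in practice k would be chosen via the elbow
    # method or silhouette scores.
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
    customers["segment"] = kmeans.fit_predict(X)

    # Per-segment profile to support lifetime-value analysis.
    print(customers.groupby("segment")[features.columns].mean())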

Environment: Python, PyCharm, Jupyter Notebook, Spyder, R, Tableau, Power BI, AWS, MySQL.

Confidential

Data Scientist

Responsibilities:

  • Carrying out specified data processing and statistical techniques such as sampling techniques, hypothesis testing, time series, correlation, and regression analysis using R.
  • Build predictive models to elevate the customer experience and drive revenue growth in our restaurants globally.
  • Implement globally scalable, cross-brand solutions related to data science.
  • Requirement gathering, analysis and hypothesis generation with operations team.
  • Data collection and exploratory data analysis in R.
  • Develop data science solutions as currently defined by the existing product roadmap.
  • Prototype algorithms using Python and/or other languages with appropriate libraries and frameworks.
  • Extensively used Python data science packages such as pandas, NumPy, Matplotlib, SciPy, scikit-learn, and NLTK.
  • Created the dashboard and reports in Tableau for visualizing the data in required format.
  • Extracting data from over 100,000 Excel sheets with different formats in R.
  • Understanding business process work-flow and preparing the list of deliverables by discussing with operational team.
  • Explored different regression and ensemble models in machine learning to perform forecasting.
  • Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Used clustering technique K-Means to identify outliers and to classify unlabeled data.
  • Segmented the customers based on demographics using K-Means clustering.
  • Extracted information such as name, qualification, experience, key skills, present company name, previous company name, present location, date of birth, email ID, and contact number using regular expressions in R (a Python analogue is sketched below).
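
The bullet above describes regex-based field extraction done in R; the following is a minimal Python analogue for illustration. The sample text and the patterns are assumptions, not the original implementation.

    import re

    # Hypothetical resume-like text; the patterns below are illustrative only.
    text = "Name: Jane Doe  Email: jane.doe@example.com  Contact: +1-555-123-4567"

    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
    phone = re.search(r"\+?\d[\d\-\s]{7,}\d", text)
    name = re.search(r"Name:\s*([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)", text)

    print("email:", email.group() if email else None)
    print("phone:", phone.group() if phone else None)
    print("name:", name.group(1) if name else None)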

Environment: Python (scikit-learn/Keras/SciPy/NumPy/pandas/Matplotlib/Seaborn), Linear and Non-linear Regression, Deep Learning, SVM, Decision Tree, Random Forest, XGBoost, KNN, MySQL, Hadoop Framework, Tableau Desktop and Tableau Server.
