
Data Scientist Resume


Providence, RI

SUMMARY:

  • 9+ years of experience in building Data Science solutions using Machine Learning, Statistical Modeling, Data Mining, Natural Language Processing (NLP) and Data Visualization.
  • Provide thought leadership and strategic direction for where data science can solve challenging problems across the organization - determining where to focus, how to prioritize, and where to make investment to achieve optimal ROI.
  • Professional experience working with the Google Cloud Platform (Google Analytics, Tag Manager, BigQuery).
  • Applied natural language processing (NLP) methods to clinical text to extract structured information.
  • Experienced in a wide spectrum of high-visibility projects spanning sales effectiveness, competitive intelligence, fraud detection, time series forecasting, operational efficiency, sourcing and procurement, supply-chain optimization, and financial analysis.
  • Assist clients by delivering projects from beginning to end, including understanding the business need, aggregating data, exploring data, building & validating predictive models, and deploying completed models to deliver business impact to the organization.
  • Experience in Deep Learning frameworks like MXNet, Caffe2, TensorFlow, Theano, CNTK, and Keras to help customers build DL models.
  • Experience using SparkML and Amazon Machine Learning (AML) to build ML models.
  • Experience with platforms (Google Cloud, Azure, and AWS).
  • Ability to work efficiently in Unix/Linux environments, with experience in source code management systems like Git.
  • Ability to work with a variety of databases (SQL, ElasticSearch, Solr, Neo4j).
  • Work with DevOps consultants to operationalize models after they are built.
  • Expert level proficiency with statistical probabilistic modeling techniques such as regression, tree-based methods (Random Forest, GBM), neural networks, support vector machines, supervised/unsupervised clustering techniques (k-means, DBSCAN, Expectation Maximization), principal component and factor analysis.
  • Expert working within enterprise data warehouse platforms (Teradata, Netezza, Oracle, etc.) and within distributed computing platforms such as Hadoop and associated technologies such as SQL, HQL, MapReduce, Spark, Storm, Yarn, Kafka, Sqoop and Hive.
  • Advanced knowledge of statistical and machine learning models (e.g., logistic regression, time series analysis, random forests, SVMs, XGBoost, CNNs/RNNs, Reinforcement Learning / Contextual Bandits techniques).
  • Network with business stakeholders to develop a pipeline of data science projects aligned with business strategies. Translate complex and ambiguous business problems into project charters clearly identifying technical risks and project scope.
  • Used the latest deep learning techniques to classify imaging studies and applied statistical models (with a focus on Bayesian methods) to assist researchers in analyzing missing, erroneous or incomplete patient data.
  • Build strong relationships with business and technology leaders.
  • Introduce an agile/iterative development process to drive timely and impactful data science deliverables.
  • Identify gaps in existing data and work with Engineering teams to implement data tracking.
  • Partner with Analytics team members and other functions to share insights and best practices, ensuring consistency of data-driven decision-making throughout the organization.
  • Implement statistical and machine learning models, large-scale, cloud-based data processing pipelines and off the shelf solutions for test and evaluation; interpret data to assess algorithm performance.
  • Communicate with clinical and biomedical researchers to discover use cases and discuss solutions.
  • Proficiency in R (e.g. ggplot2, cluster, dplyr, caret), Python (e.g. pandas, numpy, scikit-learn, bokeh, nltk), Spark MLlib, H2O, and other statistical tools.
  • In-depth knowledge of databases, data modeling, Hadoop, and distributed computing frameworks.
  • Strong experience operating Big Data pipelines (Spark, Hive, Presto, SQL engines), both batch and streaming.
  • Experience in software development environment, Agile, and code management/versioning (e.g. git).
  • Design, train and apply statistics, mathematical models, and machine learning techniques to create scalable solutions for predictive learning, forecasting and optimization.
  • Communicate findings to a broad audience (e.g., clinicians, computer scientists, and the general public).
  • Applied data acquisition, data mining, and analysis techniques to social media data (e.g. Twitter, Facebook, Instagram) and cell phone sensor data.
  • Develop novel ways to apply published machine learning models to imperfect clinical data including development of training datasets.
  • Develop high-quality, secure code implementing models and algorithms as application programming interfaces or other service-oriented software implementations.
  • Expert in statistical probabilistic modeling techniques such as regression, decision trees, neural networks, support vector machines, supervised/unsupervised clustering techniques, etc.
  • Experience with medical terminologies such as UMLS, SNOMED CT, ICD-9, ICD-10.
  • Extensive hands-on experience in modeling with massive distributed data-sets.
  • Extensive hands-on experience in navigating complex relational datasets in both structured and semi-structured formats.
  • Experience working with engineers in designing scalable data science flows and implementing into production.
  • Excellent communication and presentation skills and ability to explain technical concepts in simple terms to business stakeholders.
  • Experienced in Data visualization using Tableau, QlikView, Power BI and Alteryx.

TECHNICAL SKILLS:

Languages: C, C++, Java, Python 2.x/3.x, R/R Studio, SAS, SAS Enterprise Guide, SQL, XML, Shell Scripting

NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB

Statistics: Hypothesis Testing, ANOVA, Confidence Intervals, Bayes' Law, MLE, Fisher Information, Principal Component Analysis (PCA), Cross-Validation, Correlation.

BI Tools: Tableau, Tableau Server, Tableau Reader, Splunk, SAP Business Objects, OBIEE, SAP Business Intelligence, QlikView, Amazon Redshift, Azure Data Warehouse

Algorithms: Logistic Regression, Random Forest, XGBoost, KNN, SVM, Neural Networks, Linear Regression, Lasso Regression, K-means.

Big Data: Hadoop, HDFS, Hive, PuTTY, Spark, Scala, Sqoop

Reporting Tools: MS Office (Word/Excel/PowerPoint/ Visio/Outlook), Crystal Reports XI, SSRS

Database Design Tools and Data Modeling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modeling, Fact & Dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies

PROFESSIONAL EXPERIENCE:

Confidential - Providence, RI

Data Scientist

Responsibilities:

  • Managed behavior score risk modeling for consumer and business credit card products.
  • Solved business problems including segmenting customers by purchasing behavior, modeling customer profitability and lifetime, forecasting financial metrics on the scale of months or years, and predicting win/loss rates in contract negotiations.
  • Utilized various techniques like histograms, bar plots, pie charts, scatter plots, and box plots to assess the condition of the data.
  • Worked on data processing for very large datasets, handling missing values, creating dummy variables, and addressing various sources of noise in the data.
  • Performed data pre-processing tasks like merging, sorting, finding outliers, missing value imputation, and data normalization, making the data ready for statistical analysis.
  • Implemented ridge regression, subset selection methods to choose most statistically significant variables for analysis.
  • Used various machine learning algorithms such as Linear Regression, Ridge Regression, Lasso Regression, Elastic Net Regression, KNN, Decision Tree Regressor, SVM, Bagged Decision Trees, Random Forest, AdaBoost, and XGBoost.
  • Built a classification machine learning model in Python to predict the probability of a customer defaulting on credit card payments and improved the accuracy of the model by 12%.
  • Predicted potential credit card defaulters with 84% accuracy with Random Forest.
  • Provide expertise and consultation regarding consumer and small business behavior score modeling issues, giving advice and guidance to risk managers using the models in strategies.
  • Participate in strategically-critical analytic initiatives around customer segmentation, channel preference and targeting/propensity scoring.
  • Build customer journey analytic maps and utilize NLP to enhance the customer experience and reduce customer friction points.
  • In partnership with Marketing, utilize Machine Learning to improve customer retention and product deepening across all Citizens Financial products, including mortgages, cards and auto.
  • Work closely with model risk governance, credit bureaus, external consultants, and compliance and regulatory response teams to ensure proper development and installation of performance reporting and model tracking.
  • Leverage a broad stack of technologies - Python, Docker, AWS, Airflow, and Spark - to reveal the insights hidden within huge volumes of numeric and textual data.
  • Build machine learning processes that monitor and provide feedback on how or where to improve predictive models already deployed.
  • Work closely with business stakeholders, Financial Analysts, Data Engineers, Data Visualization Specialists and other team members to turn data into critical information and knowledge that can be used to make sound organizational decisions.
  • Use creative thinking and propose innovative ways to look at problems by using data mining (the process of discovering new patterns from large datasets) approaches across a wide range and variety of data assets.
  • Present findings back to the business, exposing assumptions and validation work in a way that can be easily understood by business teams.
  • Use a combination of business focus, strong analytical and problem solving skills and programming knowledge to be able to quickly cycle hypotheses through the discovery phase of the project.
  • Establish and maintain effective working relationships with team members, as well as an innate curiosity around wanting to understand business processes, business strategy and strategic business initiatives to help drive incremental business value from enterprise data assets.
  • Designed and Developed reports, applied transformation for the Data Model, Data validation, established data relationships in Power BI and created supporting documentation for Power BI.
  • Organize business needs into ETL/ELT logical models and ensure data structures are designed for flexibility to support scalability of business solutions.
  • Craft and implement data pipelines utilizing Glue, Lambda, Spark, and Python.
  • Work with Data Engineers to determine how to best source data, including identification of potential proxy data sources, and design business analytics solutions, considering current and future needs, infrastructure and security requirements, load frequencies, etc.
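The default-probability modeling described above can be sketched roughly as follows. This is an illustrative example only: the features, data, and parameters are synthetic stand-ins, not the original credit card dataset or model.

```python
# Hypothetical sketch of a credit-card default classifier (Random Forest).
# All features and data below are synthetic and illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
# Toy behavioral features: utilization ratio, late-payment count, tenure (months)
X = np.column_stack([
    rng.uniform(0, 1, n),      # credit utilization
    rng.poisson(1.0, n),       # number of late payments
    rng.integers(1, 120, n),   # account tenure
])
# Synthetic default flag loosely driven by utilization and late payments
y = ((X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.4, n)) > 1.2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # estimated probability of default
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

In practice the probability output would feed behavior score strategies rather than a hard accuracy cutoff.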

Environment: Python, PyCharm, Jupyter Notebook, Spyder, R, Tableau, Power BI, AWS, MySQL.

Confidential - Northbrook, IL

Data Scientist

Responsibilities:

  • Deliver analytics and insight to address a wide range of business needs utilizing various secondary data sources (e.g., Sales, Rx, HCP/Patient/Payer-level data, Formulary data, quantitative research outputs, etc.)
  • Utilize MapReduce and PySpark programs to process data for analysis reports.
  • Work on data cleaning to ensure data quality, consistency, and integrity using Pandas/Numpy.
  • Perform data preprocessing on messy data including imputation, normalization, scaling, feature engineering etc. using Scikit-Learn.
  • Conduct exploratory data analysis using Matplotlib and Seaborn. Maintain and monitor adherence program reporting. Design, experiment and test hypotheses. Apply advanced statistical and predictive modeling techniques to build, maintain and improve on real-time decision-making.
  • Built classification models based on Logistic Regression, Decision Trees, Random Forest, Support Vector Machine, and Ensemble algorithms to predict the probability of patient absence.
  • Implement and test the model on AWS EC2; collaborated with development team to get the best algorithm and parameters.
  • Leverage appropriate advanced and sophisticated methods and approaches to synthesize, clean, visualize and investigate data as appropriate to deliver analytical recommendations aligned with the business need.
  • Analyze disease diagnoses, phenotypic traits, patient demographics, and genetics for epidemiological studies.
  • Utilize NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets.
  • Analyze longitudinal time series data to characterize disease trajectories, disease progression, medication adverse event episodes, drug resistance, and disease comorbidities.
  • Use Machine Learning to graph human biology based upon vast data sets; run experiments, synthesize molecules, then conduct Phase 1 and 2 trials.
  • Work with bioinformatics colleagues to conduct integrative analyses of EMR data and genomics data to identify potential novel therapeutic targets, to develop predictive models for prognosis and treatment response, and to stratify patient populations for clinical trials.
  • Identify risks and opportunities that impacts the performance of the business and convert them into analytical solutions and provide appropriate actionable insights.
  • Build predictive models including Support Vector Machine, Decision tree, Naive Bayes Classifier, CNN and RNN basics to predict whether the thyroid cancer cell is under potential danger of spreading by using python scikit-learn.
  • Design and implement a recommendation system which leveraged Google Analytics data and the machine learning models and utilized Collaborative filtering techniques to recommend courses for different customers.
  • Collaborate with data engineers and operation team to implement ETL process, write and optimize SQL queries to perform data extraction to fit the analytical requirements.
  • Explore and analyze customer-specific features using Matplotlib and ggplot2; extract structured data from MySQL databases, developing basic visualizations and analyzing A/B test results.
  • Implement the training process using cross-validation and evaluate results based on different performance metrics.
  • Leverage BI tools like Tableau Desktop to develop business dashboards enabling leaders for decision making and forecasting the number of credit card defaulters monthly.
  • Organize reports and produced rich data visualizations to model data into human-readable form with the Tableau, Matplotlib and Seaborn to show the management team how prediction can help the business.
  • Currently considering more factors such as dietary habits, work environment, mental state to explore the possibility of making improved predictions.
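The cross-validated model comparison described above might look like the following sketch. Since the clinical data is confidential, a synthetic dataset stands in, and the model choices here are illustrative:

```python
# Illustrative cross-validated comparison of classifiers with scikit-learn.
# The dataset is synthetic; the real work used confidential clinical data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
# 5-fold cross-validation with ROC AUC as the performance metric
scores = {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
          for name, m in models.items()}
for name, auc in scores.items():
    print(f"{name}: mean AUC = {auc:.3f}")
```

Comparing mean AUC across folds, rather than a single train/test split, gives a more stable picture of which model generalizes better.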

Environment: Python (Scikit-Learn/Keras/SciPy/NumPy/Pandas/Matplotlib/Seaborn), Machine Learning (Linear and Non-linear Regressions, Deep Learning, SVM, Decision Tree, Random Forest, XGBoost, Ensemble and KNN), MySQL, AWS Redshift, S3, Hadoop Framework, HDFS, Spark (PySpark, MLlib, Spark SQL), Tableau Desktop and Tableau Server.

Confidential - Minneapolis, MN

Data Scientist

Responsibilities:

  • Responsible for conducting data-driven strategic analyses and developing internal decision-making tools/products to enable retail team decision making.
  • Constantly looking for opportunities to develop machine learning and statistical models to automate business decision making.
  • Created highly impactful solutions to complex problems alongside a talented team in a cross-functional (e.g. product, merchandising, supply chain) environment.
  • Performed in-depth analyses such as cost-benefit, invest-divest, forecasting, predictive, what-if, impact, etc. to help the Confidential focus on key decisions to improve safety, employee engagement, operational efficiency, product quality, and customer satisfaction.
  • Created executive level analysis and UI in Tableau or Looker for business performance insights.
  • Accelerated delivery of Retail industry insights including coverage of key sales events (e.g. Black Friday / Cyber Monday, Amazon Prime Day, etc.).
  • Derived actionable insights from massive data sets with limited oversight.
  • Automated quarterly insights across native / partner channels for external facing analyst sessions as well as internal tracking.
  • Built analytics expertise in multiple channels including display, website personalization, search, email, social, digital OOH, and cross-channel attribution.
  • Provided input into the refinement of existing data sources and the collection of new ones to improve the development of insights and predictive models.
  • Developed predictive Model using historical and current data to identify interested customers for Email Campaign.
  • Applied Gradient Boosting algorithms such as LightGBM and XGBoost.
  • Involved in developing the sentiment and outcome based on NLP using log files of voice text for sentiment analysis.
  • Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.
  • Participated in feature engineering such as feature intersection generating, feature normalize and label encoding with Scikit-learn pre-processing.
  • Used Python (NumPy, Scipy, Pandas, Scikit-Learn, Seaborn), and Spark 2.0 (PySpark, MLlib) to develop variety of models and algorithms for analytic purposes.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python across a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
  • Implemented, tuned, and tested the model on AWS EC2 to get the best algorithm and parameters.
  • Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Designed and developed machine learning models in Apache - Spark (MLlib).
  • Used NLTK in Python for developing various machine learning algorithms.
  • Implemented deep learning algorithms such as Artificial Neural Networks (ANN) and Recurrent Neural Networks (RNN), tuned hyper-parameters, and improved models using the TensorFlow Python package.
  • Modified selected machine learning models with real-time data in Spark (PySpark).
  • Worked on different formats such as JSON, XML and performed machine learning algorithms in Python.
  • Participated in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization, and performed gap analysis.
  • Worked very close with Data Architects and DBA team to implement data model changes in the database in all environments.
  • Used Pandas library for statistical Analysis.
  • Communicated the results with operations team for taking best decisions.
  • Collected data needs and requirements by Interacting with the other departments.
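The feature normalization and label encoding steps mentioned above can be sketched with scikit-learn preprocessing. The column names and values here are hypothetical examples, not the campaign data itself:

```python
# Minimal sketch of feature normalization and label encoding with
# scikit-learn preprocessing; the data below is purely illustrative.
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy data: a numeric spend feature and a categorical channel column
amounts = np.array([[120.0], [80.0], [200.0], [60.0]])
channels = ["email", "social", "email", "display"]

scaler = StandardScaler()
amounts_scaled = scaler.fit_transform(amounts)   # zero mean, unit variance

encoder = LabelEncoder()
channel_codes = encoder.fit_transform(channels)  # classes sorted alphabetically

print(channel_codes)   # display=0, email=1, social=2
```

In a real pipeline these transformers would be fit on the training split only and reused to transform validation and scoring data.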

Environment: Python, Hive, Oozie, Tableau, HTML5, CSS, XML, MySQL, JavaScript, AWS, S3, EC2, Linux, Jupyter Notebook, RNN, ANN, Spark 2.2, Hadoop, Machine Learning, Deep Learning, R, TensorFlow, Scala, Spark SQL.

Confidential - Mahwah, NJ

Data Analyst

Responsibilities:

  • Collaborated with database engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction and merging from the SQL Server database.
  • Gathered, analyzed, and translated business requirements, and communicated with other departments to collect client business requirements and assess available data.
  • Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python.
  • Conducted Exploratory Data Analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlation between features.
  • Used information value, principal component analysis, and Chi-square feature selection techniques to identify significant features.
  • Applied resampling methods like Synthetic Minority Over Sampling Technique (SMOTE) to balance the classes in large data sets.
  • Designed and implemented a customized Linear Regression model to predict sales, utilizing diverse sources of data to predict demand, risk, and price elasticity.
  • Experimented with multiple classification algorithms, such as Logistic Regression, Support Vector Machine (SVM), Random Forest, AdaBoost, and Gradient Boosting using Python Scikit-Learn, and evaluated performance on customer discount optimization for millions of customers.
  • Used F-Score, AUC/ROC, Confusion Matrix, and RMSE to evaluate the performance of different models.
  • Performed data visualization and Designed dashboards with Tableau, and generated complex reports, including charts, summaries, and graphs to interpret the findings to the team and stakeholders.
  • Used Keras for implementation and trained using a cyclic learning rate schedule.
  • Resolved overfitting issues using batch normalization and dropout.
  • Conducted in-depth analysis and predictive modeling to uncover hidden opportunities; communicated insights to the product, sales and marketing teams.
  • Built models using Python and PySpark to predict the probability of attendance for various campaigns and events.
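The evaluation workflow described above (F-Score, AUC/ROC, confusion matrix) might look like the following sketch on synthetic, imbalanced data; the actual models and data from this engagement are not reproduced here:

```python
# Illustrative evaluation of a classifier with F1, ROC AUC, and a
# confusion matrix; the imbalanced dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem (~80/20 class split)
X, y = make_classification(n_samples=800, weights=[0.8, 0.2], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]

print("confusion matrix:\n", confusion_matrix(y_te, pred))
print("F1:", round(f1_score(y_te, pred), 3))
print("AUC:", round(roc_auc_score(y_te, proba), 3))
```

On imbalanced classes, F1 and AUC are more informative than raw accuracy, which is why all three appear alongside the confusion matrix.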

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Linux, Git, Microsoft Excel, PySpark-ML, Random Forests, SVM, TensorFlow, Keras.

Confidential

Data Analyst/ Python Developer

Responsibilities:

  • Used Python libraries like Beautiful Soup and NumPy.
  • Created various types of data visualizations using Python and Tableau.
  • Monitored and tracked process performance using analytics tools like Tableau dashboards and R.
  • Utilized standard Python modules such as csv, robotparser, itertools, and pickle for development.
  • Developed MapReduce programs to parse the raw data and create intermediate data that would later be loaded into Hive partitioned tables.
  • Involved in creating Hive ORC tables, loading the data into it and writing Hive queries to analyze the data.
  • Experience in performance analysis and capacity planning for growing MongoDB and Hadoop clusters.
  • Created views in Tableau Desktop that were published to internal team for review and further data analysis and customization using filters and actions.
  • Worked on Python OpenStack APIs and used NumPy for numerical analysis.
  • Used Python scripts to update content in the database and manipulate files.
  • Used Python for creating graphics, data exchange, and business logic implementation.
  • Performed troubleshooting, fixed and deployed many Python bug fixes for the applications, and was involved in fine-tuning existing processes following advanced patterns and methodologies.
  • Skilled in using collections in Python for manipulating and looping through different user defined objects.
  • Used DDL and DML for writing triggers, stored procedures, and data manipulation.
  • Interacted with the team on analysis, design, and development of the database using ER diagrams; involved in the design, development, and testing of the system.
  • Developed SQL Server stored procedures and tuned SQL queries (using indexes).
  • Installed numerous Python packages using pip and easy_install.
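The standard-library csv and pickle usage noted above can be sketched as follows; the data and field names are illustrative, and an in-memory buffer stands in for a real file:

```python
# Minimal sketch of csv parsing and pickle serialization from the
# Python standard library; data below is illustrative only.
import csv
import io
import pickle

# Parse CSV text into dictionaries (in-memory buffer instead of a file)
raw = "name,score\nalice,90\nbob,85\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Serialize the parsed rows with pickle and load them back
blob = pickle.dumps(rows)
restored = pickle.loads(blob)

print(restored[0]["name"])   # alice
```

csv.DictReader keeps every field as a string, so numeric columns like score would still need explicit conversion downstream.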

Environment: Python 2.7, Tableau, R, Windows XP, UNIX, HTML, SQL Server 2005.
