Data Scientist Resume

Palo Alto, CA

SUMMARY:

  • Over 7 years of experience in IT, with comprehensive industry knowledge of Machine Learning, Artificial Intelligence, Statistical Modeling, Data Analysis, Predictive Analysis, Data Manipulation, Data Mining, Data Visualization, and Business Intelligence.
  • Data scientist with 2.5 years of experience transforming business requirements into actionable data models across the Banking/Financial, Healthcare, Pharmaceutical, and Insurance domains.
  • Experience in Feature Selection, Linear Regression, Logistic Regression, k-Means Clustering, Classification, Decision Trees, Support Vector Machines (SVM), Naive Bayes, K-Nearest Neighbors (KNN), Random Forest, Gradient Descent, and Neural Network algorithms to train and test large data sets.
  • Adept in statistical programming languages such as Python, R, and SAS, as well as Big Data technologies such as Hadoop, Hive, HDFS, MapReduce, and NoSQL-based databases.
  • Skilled in Python data extraction and data manipulation, with extensive use of Python libraries such as NumPy, Pandas, and Matplotlib for data analysis.
  • Highly skilled in using Hadoop (Pig and Hive) for basic analysis, extraction, and summarization of data; proficient in HiveQL, Spark SQL, and PySpark, with in-depth knowledge of the Spark machine learning library MLlib.
  • Hands-on experience provisioning virtual clusters on the Amazon Web Services (AWS) cloud, including services such as Elastic Compute Cloud (EC2), S3, and EMR.
  • Proficient in designing and creating Tableau data visualization dashboards, worksheets, and analytical reports tailored to end-user requirements, helping users identify critical KPIs and facilitating strategic planning.
  • Strong familiarity with statistical concepts such as Hypothesis Testing, t-Test, Chi-Square Test, ANOVA, Statistical Process Control, Control Charts, Descriptive Statistics, and Correlation Techniques.
  • Worked extensively with libraries such as Seaborn, scikit-learn, and SciPy for visualization and machine learning; familiar with TensorFlow for deep learning and NLTK for natural language processing.
  • Experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R and Python.
  • Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.
  • Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
  • Experienced in manipulating large data sets using R packages such as tidyr, tidyverse, dplyr, reshape, lubridate, and caret, and in data visualization using ggplot2.
  • Experience in working with Big Data technologies such as Hadoop, MapReduce jobs, HDFS, Apache Spark, Hive, Pig, Sqoop, Flume, Kafka and familiar with Scala Programming.
  • Good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, and FastExport.
  • Knowledge of and experience working in Waterfall as well as Agile environments, including the Scrum process, using project management tools such as ProjectLibre and Jira/Confluence and version control tools such as Git/GitHub.
  • Exposure to Azure Data Lake and Azure Storage.
  • Experienced in Data Integration Validation and Data Quality controls for ETL process and Data Warehousing using MS Visual Studio, SSAS, SSIS and SSRS.
  • Quick learner with strong business domain knowledge who can communicate business data insights clearly to technical and nontechnical clients.

TECHNICAL SKILLS:

Languages: Python, R, T-SQL, PL/SQL

Packages: Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, BeautifulSoup, MLlib, ggplot2, Rpy2, caret, dplyr, RWeka, gmodels, NLP, Reshape2, plyr.

Machine Learning: Linear Regression, Logistic Regression, Decision Trees, Random Forest, Association Rule Mining (Market Basket Analysis), Clustering (K-Means, Hierarchical), Gradient Descent, SVM (Support Vector Machines), Deep Learning (CNN, RNN, ANN) using TensorFlow (Keras).

Statistical Tools: Time Series, Regression models, splines, confidence intervals, principal component analysis, Dimensionality Reduction, bootstrapping

Big Data: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Flume, Oozie, Spark

BI Tools: Tableau, Amazon Redshift, Birst

Data Modeling Tools: Erwin r, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner

Databases: MySQL, SQL Server, Oracle, Hadoop/HBase, Cassandra, DynamoDB, Azure Table Storage, Netezza

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, SSRS, IBM Cognos 7.0/6.0.

Version Control Tools: SVN, GitHub

Operating Systems: Windows, Linux, Ubuntu

WORK EXPERIENCE:

Data Scientist

Confidential, Palo Alto, CA

Responsibilities:

  • Used computer vision to automate Alternate Asset Servicing. The automation helped FM reduce errors and improve operational efficiency, and improved trade processing by 27%.
  • Used Convolutional Neural Networks (CNN) to perform image classification and object detection.
  • Investment managers buy trades on behalf of Credit Suisse Asset Management clients, and these trades arrive in PDF format. The model accurately read the text, categorized the type of trade, and assigned it to an analyst for settlement.
  • The system receives trades from FM as email attachments. These emails are sent to a queue, where machine learning algorithms (ConvNets) fill in the gaps left by traditional OCR engines and accurately categorize the trade.
  • Model achieved 90% accuracy on the validation set.
  • Used TensorFlow packages to train machine learning models.
  • Wrote Python code to build a consensus model and plot the data, and wrote Python test cases to validate the analysis code.
  • Performed Data Quality Analysis Check using Statistical methods and Tools.

Environment: Python 3.4, PySpark, Jupyter, NumPy, Pandas, scikit-learn, Hadoop HDFS, Hive, Pig, OpenCV

Data Scientist

Confidential, Wilmington, DE

Responsibilities:

  • Developed, monitored, and maintained custom risk scorecards using advanced machine learning and statistical methods. Recommended and implemented model changes with the credit risk management team to improve the performance of credit functions.
  • Cleaned data using exploratory data analysis (EDA) and Python libraries (NumPy, Pandas), replacing missing values with imputation techniques.
  • Developed predictive models using Decision Tree, Random Forest, Support Vector Machines, and Naive Bayes, and collaborated with marketing and DevOps teams on production deployment.
  • Utilized Pandas, NumPy, SciPy, Matplotlib, scikit-learn, and TensorFlow in Python to develop various machine learning algorithms.
  • Worked on AWS, including Amazon Kinesis and Amazon Simple Storage Service (Amazon S3), with Spark Streaming, PySpark, and Spark SQL on top of an Amazon EMR cluster.
  • Trained and tested data using various machine learning algorithms such as Linear and Logistic Regression, Naive Bayes, Decision Trees, Random Forests, Clustering, SVM, Neural Networks, Principal Component Analysis, and Bayesian methods.
  • Ran MapReduce jobs in Hadoop and implemented Spark analysis in Python for machine learning and predictive analytics on the AWS platform.
  • Created data frames in Hadoop and Spark using PySpark, and converted Hive/SQL queries into Spark transformations using Spark RDDs and Python libraries.
  • Implemented statistical models, predictive models, an enterprise data model, a metadata solution, and data life cycle management in both RDBMS and NoSQL databases such as MongoDB.
  • Performed data preprocessing like cleaning (for outlier, missing values analysis, etc.) and Data Visualization (Scatter Plots, Box Plots, Histograms, etc.) using Matplotlib.
  • Worked with large-scale data sets and extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
  • Utilized PySpark, Spark Streaming, and MLlib in the Spark ecosystem with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
  • Evaluated models using cross-validation, log loss, and ROC curves, used AUC for feature selection, and worked with elastic technologies such as Elasticsearch on EC2.
  • Categorized comments into positive and negative clusters from different social networking sites using Sentiment Analysis and Text Analytics.

Environment: Python 3.x, Cloudera, Hadoop, Apache Spark, Hive, NumPy, NLTK, Pandas, SciPy, MapReduce, Tableau, Sqoop, HBase, Oozie, HDFS, PySpark, NoSQL, DynamoDB, MongoDB, Teradata, SQL Server, AWS Redshift.
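
As an illustration of the missing-value imputation mentioned in the responsibilities above, here is a minimal Pandas sketch. It assumes a simple strategy (column median for numeric fields, most frequent value for everything else); the resume does not say which imputation technique was actually used, and the function name `impute_missing`, the column names, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric gaps filled by the column median
    and non-numeric gaps filled by the column's most frequent value."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out


# Hypothetical toy data; not taken from the original project.
applicants = pd.DataFrame({
    "income": [52000.0, np.nan, 61000.0, 48000.0],
    "segment": ["retail", "retail", None, "business"],
})
cleaned = impute_missing(applicants)
```

Median and mode are used here only because they are simple and robust to outliers; model-based imputation would follow the same fill-then-verify pattern.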

Data Analyst

Confidential, Irvine, CA

Responsibilities:

  • Identified problems with customer data and developed cost-effective models through root cause analysis.
  • Worked closely with teams of health services researchers and business analysts to draw insight and intelligence from large administrative claims datasets, electronic medical records and various healthcare registry datasets.
  • Developed and tested hypotheses in support of research and product offerings, and communicated findings to stakeholders in a clear, precise, and actionable manner.
  • Implemented big data processing applications to collect, clean, and normalize large volumes of open data using Hadoop ecosystem tools such as Pig, Hive, and HBase.
  • Applied various machine learning algorithms such as decision trees, regression models, clustering, and SVM to identify volume, using scikit-learn and R packages.
  • Worked with various data formats such as JSON and XML, and applied machine learning techniques using Python and R.
  • Generated graphs and reports using ggplot2 in RStudio for analyzing models.
  • Integrated R into MicroStrategy to expose metrics determined by more sophisticated and detailed models than those natively available in the tool, and communicated with users to build an understanding of the business process.
  • Worked with the BI team to gather report requirements and used Sqoop to load data into the Hadoop Distributed File System (HDFS) and Hive.
  • Involved in collecting and analyzing the internal and external data, data entry error correction, and defined criteria for missing values.
  • Developed MapReduce jobs in Java and used Hive for data cleaning and preprocessing.
  • Exported the required data to RDBMS using Sqoop to make it available to the claims processing team for processing claims.
  • Developed MapReduce programs in Hadoop to extract and transform the data sets, and exported the results back to RDBMS using Sqoop.

Environment: Python 2.x, scikit-learn, RStudio, ggplot2, XML, HDFS, Hadoop 2.3, Hive, Impala, Linux, Spark, SQL Server 2014, Microsoft Excel, MATLAB, Spark SQL, PySpark.

Tableau Developer

Confidential, Santa Ana, CA

Responsibilities:

  • Interacted with Business Analysts and Data Modelers to define mapping and design documents for various data sources.
  • Created visual analytics for large Sales and Marketing data sets using Tableau to assure data integrity and identify the root cause of data inconsistencies.
  • Worked on the dashboard reports for the Key Performance Indicators (KPI) for the top management.
  • Worked extensively on creating Aggregates, Hierarchies, Charts, Histograms, Filters, Quick Filters, Cascading Filters, Table Calculations, Trend Lines, calculated measures, LOD expressions, Sets, Groups, Actions, Parameters, etc.
  • Worked with various data sources such as Excel, MySQL, Teradata, and Oracle databases by blending data into a single sheet.
  • Analyzed various reports, dashboards, and scorecards in MicroStrategy and recreated them using Tableau Desktop and Tableau Server.
  • Provided customer support to Tableau users and wrote custom SQL to support business requirements.
  • Worked closely with the reporting team to deploy Tableau reports and publish them on the Tableau and SharePoint servers.
  • Worked closely with the ETL team on various troubleshooting issues.
  • Made wide use of Individual Axes, Blended Axes, and Dual Axes.

Environment: Tableau Desktop 9.x/10.0, Tableau Server 9.x/10.0, Tableau Repository, SQL Server, MySQL, Crystal Reports, MS Excel, Teradata, SharePoint, Agile.

Data Analyst

Confidential

Responsibilities:

  • Interacted with the Client and documented the Business Reporting needs to analyze the data.
  • Used SAS for data pre-processing, SQL queries, data analysis, report generation, graphics, and statistical analyses.
  • Supported the research staff with technical and programming help. Worked with biostatisticians to analyze the results obtained from statistical procedures such as ANOVA and the t-test.
  • Performed data analysis and statistical analysis, and generated reports, listings, and graphs using SAS tools: SAS/BASE, SAS/MACROS, SAS/GRAPH, SAS/SQL, SAS/CONNECT, and SAS/ACCESS.
  • Worked on SSIS Control Flow items (For Loop, execute package/SQL tasks, Script task, and send mail task) and SSIS Data Flow items (Conditional Split, Data Conversion, Fuzzy lookup, Fuzzy Grouping, Pivot).
  • Used normalization (up to 3NF) and denormalization techniques for effective OLTP systems.
  • Developed data mapping documentation to establish relationships between source and target tables, including transformation processes using SQL.
  • Worked on Entity-Relationship concepts, fact and dimension tables, slowly changing dimensions, and Dimensional Modeling (Star Schema and Snowflake Schema).
  • Tested the database to examine field-size validation, check constraints, and stored procedures against the metadata.

Environment: SAS/Enterprise Guide, SAS/SQL, SAS/BASE, SAS/MACROS, SAS/GRAPH, ANOVA, SQL Server 2008 R2, DB2, MS BI Suite (SSIS/SSRS), T-SQL, SharePoint 2010, Visual Studio 2010, Crystal Reports, Agile/SCRUM

BI Developer

Confidential

Responsibilities:

  • Performed data analysis, data migration, data preparation, graphical presentation, statistical analysis, reporting, validation, and documentation.
  • Created new Query Subjects, arranged them in different folder structures for the business view, and manually created query subjects using complex SQL queries.
  • Responsible for the integration of the various data sources, validation, and system monitoring.
  • Responsible for report generation using SQL Server Reporting Services (SSRS) and Crystal Reports based on business requirements, connecting to the Teradata database to generate daily reports.
  • Created sub-reports, drill down reports, summary reports, parameterized reports, and ad-hoc reports using SSRS.
  • Performed gap analysis by assessing the existing system and documenting the enhancements needed to meet the end-state requirements.
  • Created, maintained, and adhered to Enterprise Data Modeling Standards while performing analysis of possibly sensitive data, and made recommendations in accordance with the objectives of the project.
  • Developed reports with appropriate properties set to render them in various output formats such as HTML, PDF, CSV, and Excel, with particular expertise in formatting reports for Excel output.

Environment: ER Studio, MS Access, Netezza, DB2, T-SQL, SSIS, SSRS, SSAS, ETL, SQL Server 2008
