
Data Scientist Resume


Charlotte, NC

SUMMARY

  • Experienced Data Scientist with 5+ years of experience in acquiring datasets, engineering data to extract features using statistical techniques, performing Exploratory Data Analysis, building diverse Machine Learning algorithms for predictive models, and plotting visualizations that drive business profitability.
  • Strong command of Data Extraction, Data Cleaning, Data Loading, Statistical Data Analysis, Exploratory Data Analysis, Data Wrangling and Predictive Modeling using R and Python, and Data Visualization using Tableau.
  • Profound knowledge of Machine Learning algorithms such as Linear, Non-linear and Logistic Regression, SVR, Natural Language Processing, Random Forests, Ensemble Methods, Decision Trees, Gradient Boosting, K-NN, SVM, Naïve Bayes and K-Means Clustering.
  • Experienced in implementing Ensemble Methods (Bagging and Boosting) to enhance model performance.
  • Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, ANOVA, K-fold cross validation, R-squared, CAP curves, Confusion Matrix, ROC plots, Gini Coefficient and Grid Search (see the sketch after this list).
  • Rapidly evaluated Deep Learning frameworks, tools, techniques and approaches for implementing and building data ingestion pipelines for Neural Networks, including CNNs and RNNs such as LSTMs, using TensorFlow and Keras.
  • Extensively worked with Python 3.5/2.7 (NumPy, Pandas, Seaborn, Matplotlib, NLTK and Scikit-learn).
  • Experienced in visualization tools like Tableau 9.x and 10.x for creating KPIs, forecasting and other analytical dashboards.
  • Strong understanding of advanced Tableau features including calculated fields, parameters, table calculations, row-level security, R integration, joins, data blending, and dashboard actions.
  • Scheduled & distributed reports in multiple formats using Tableau, SSRS and Visual Studio.
  • Worked on data collection, transformation, and storage from various sources including relational databases, APIs, logs, and unstructured files.
  • Strong knowledge of database and data warehouse concepts, ETL processes and dimensional modeling (Star and Snowflake Schemas).
  • Strong experience in Big Data technologies like Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS and Hive.
  • Solid ability to write and optimize diverse SQL queries, with working knowledge of RDBMS like SQL Server 2008 and NoSQL databases like MongoDB 3.2.
  • Created database objects such as tables, indexes, views, user-defined functions, stored procedures, triggers and cursors, and enforced data integrity using SQL. Experienced in SQL tuning techniques.
  • Progressive involvement in the Software Development Life Cycle (SDLC), Git, Agile methodology and the Scrum process, using change management tools like Jira & ServiceNow.
  • Strong business sense and abilities to communicate data insights to both technical and non-technical clients.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining & reporting solutions that scale across massive volumes of structured and unstructured data.
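To illustrate the evaluation workflow mentioned above (K-fold cross validation combined with Grid Search, scored by ROC AUC), here is a minimal scikit-learn sketch on synthetic data; the parameter grid values are placeholders, not parameters from any actual engagement:

    # Minimal sketch: grid search with 5-fold cross validation (scikit-learn).
    # The parameter grid below is illustrative only.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

    param_grid = {"n_estimators": [100, 200],
                  "max_depth": [2, 3],
                  "learning_rate": [0.05, 0.1]}

    # Every grid point is evaluated with 5-fold cross validation, scored by ROC AUC
    search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                          param_grid, cv=5, scoring="roc_auc")
    search.fit(X, y)
    print(search.best_params_, search.best_score_)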

TECHNICAL SKILLS

Programming Tools: Python, R, R Studio, R Shiny, SQL Server, C, C++, Java

Visualization Tools: Tableau, SSRS, Power BI & QlikView

Databases & Data Tools: MySQL, SSIS, SSRS, SAS, Azure SQL Data Warehouse, Spark SQL, Azure Data Lake Analytics

Other Tools: MS Excel, Google Analytics

Big Data: Hadoop, Pig, Hive, Flume, PySpark, MapReduce, Scala

Statistical Models: Logistic regression, Clustering, Prediction Models, Data Mining, Natural Language Processing

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Data Scientist

Responsibilities:

  • Analyzed and processed complex claims data sets using advanced querying, visualization and analytics tools.
  • Identified, measured and recommended improvement strategies for KPIs across all business areas specializing in claims & risk management.
  • Developed intricate algorithms based on deep-dive statistical analysis and predictive data modeling that are used to deepen relationships, strengthen longevity and personalize interactions with customers.
  • Conducted analysis of customer behaviors and discovered the value of customers with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering, Gaussian Mixture Models and Hierarchical Clustering (see the sketch after this list).
  • Acquired complex datasets, analyzed, and interpreted patterns/trends using regression, classification and clustering ML techniques: Linear & Logistic Regression, Naïve Bayes, Decision Trees, Random Forests, Support Vector Machines, Neural Networks, Principal Component Analysis and XGBoost.
  • Designed, built and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs.
  • Used classification techniques like Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.
  • Explored different regression and ensemble methods in Machine Learning to perform forecasting.
  • Collected large structured and unstructured datasets and variables and performed Data Mining to satisfy risk and compliance business needs.
  • Performed Data Cleaning, which included transforming variables and dealing with missing values, and ensured data quality, consistency and integrity using Pandas and Numpy.
  • Used backward- and forward-filling methods on datasets to handle missing values.
  • Worked on different data formats such as JSON, XML and performed machine-learning algorithms in Python.
  • Developed Machine Learning algorithms on Big Data using PySpark to analyze fraudulent transactions, perform cluster analysis, etc.
  • Identified and integrated new datasets from various data sources including Oracle, DB2, SQL Server, AWS, and Azure, employing tools such as Spark, HDFS, Hive and Pig Latin, and worked closely with the data engineering team to strategize and execute the development of models.
  • Implemented a Python-based distributed Random Forest via PySpark (see the sketch after the tools list below).
  • Experience working with Big Data tools such as Hadoop, MapReduce, HiveQL, Pig Latin and Apache Spark.
  • Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, R Studio, R Shiny, Tableau and Power BI.
  • Analyzed and made inferences on cleaned data using R packages such as dplyr, tidyr, readr, SparkR and ggplot2.
  • Developed Tableau workbooks to perform year-over-year, quarter-over-quarter, YTD, QTD and MTD types of analysis.
  • Collaborated with project managers and business owners to understand their organizational processes and helped design the necessary reports.
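As a minimal sketch of the RFM segmentation described above, assuming a hypothetical orders file with customer_id, order_date and amount columns:

    # Minimal sketch: RFM features plus K-Means segmentation (pandas/scikit-learn).
    # The file name and column names are hypothetical.
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    orders = pd.read_csv("orders.csv", parse_dates=["order_date"])
    snapshot = orders["order_date"].max()

    # Recency, Frequency and Monetary value per customer
    rfm = orders.groupby("customer_id").agg(
        recency=("order_date", lambda d: (snapshot - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"))

    # Standardize so no single dimension dominates the distance metric
    scaled = StandardScaler().fit_transform(rfm)
    rfm["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)
    print(rfm.groupby("segment").mean())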

Tools: Machine Learning, MS SQL Server, R Studio, Python, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office, Outlook.
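A minimal sketch of the distributed Random Forest mentioned above, using PySpark's spark.ml API; the input path, column names and tree count are assumptions for illustration:

    # Minimal sketch: distributed Random Forest with PySpark (spark.ml).
    # The path and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier

    spark = SparkSession.builder.appName("rf-sketch").getOrCreate()
    df = spark.read.parquet("hdfs:///data/transactions.parquet")

    # spark.ml expects the predictors assembled into a single vector column
    assembler = VectorAssembler(inputCols=["amount", "tenure", "num_claims"],
                                outputCol="features")
    train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

    rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)
    model = rf.fit(train)
    model.transform(test).select("label", "prediction").show(5)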

Confidential, Irving, TX

Data Analyst

Responsibilities:

  • Extensively involved in all phases of data acquisition, data collection, data cleaning, model development, model validation and visualization to deliver data science solutions.
  • Used Python and R programming to improve the models, and upgraded the entire set of models to improve the product.
  • Developed various machine-learning models such as Logistic regression, KNN with Pandas, Numpy, Seaborn, Matplotlib, Scikit-learn in Python.
  • Experimented and built predictive models including ensemble models using machine-learning algorithms such as Logistic regression, Random Forests, and KNN to predict customer churn.
  • Implemented public segmentation using unsupervised algorithms like K-means algorithm.
  • Used F-Score, AUC/ROC, Confusion Matrix, Precision, and Recall to evaluate different models' performance.
  • Utilized PCA, t-SNE and other feature engineering techniques to reduce high-dimensional data, applied feature scaling, and handled categorical attributes using the one-hot encoder of the Scikit-learn library.
  • Worked on Natural Language Processing with the NLTK module of Python and developed NLP models for sentiment analysis.
  • Performed data cleaning, including transforming variables and dealing with missing values, and ensured data quality, consistency and integrity using Pandas and Numpy.
  • Tackled a highly imbalanced fraud dataset using sampling techniques like undersampling and oversampling with SMOTE (Synthetic Minority Over-sampling Technique) using Python Scikit-learn (see the sketch after this list).
  • Extracted data from a SQL Server database, copied it into the HDFS file system, and used Hadoop tools such as Hive to retrieve and analyze the data required for building models.
  • Retrieved data from Hadoop Cluster by developing a pipeline using Hive (HQL), SQL to retrieve data from Oracle database and used ETL for data transformation.
  • Identified and evaluated various distributed machine learning libraries like MLlib (Apache Spark) and R.
  • Implemented a Python-based distributed random forest via PySpark and MLlib.
  • Created and maintained reports to display the status and performance of deployed model and algorithm with Tableau. Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Adept at developing interfaces, creating queries, retrieving data from databases, and executing queries on datasets to produce reports.
  • Led discussions with users to gather business process and data requirements to develop a variety of conceptual, logical and physical data models.
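A minimal sketch of the SMOTE oversampling step above, using imbalanced-learn on a synthetic dataset; the 1% positive rate is illustrative:

    # Minimal sketch: rebalancing a fraud-style dataset with SMOTE (imbalanced-learn).
    from collections import Counter
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=10000, weights=[0.99, 0.01], random_state=42)
    print("before:", Counter(y))

    # SMOTE synthesizes new minority-class samples by interpolating between neighbors
    X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
    print("after:", Counter(y_res))

In practice the resampling is applied to the training folds only, never to the held-out evaluation data.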

Tools: Machine Learning, Python (scikit-learn, SciPy, NumPy, Pandas, Matplotlib, Seaborn), SQL Server, Hadoop, HDFS, Hive, Pig Latin, Apache Spark/PySpark, Tableau, Linux.

Confidential, Plano, TX

Data Analyst

Responsibilities:

  • Analyzed large datasets to identify trends and built an in-depth understanding of the business domain and available data assets.
  • Analyzed data to identify gaps and inconsistencies, and performed descriptive and inferential statistical analysis.
  • Used data assets to gain insights into current business processes and proposed enhancements for core decision-making processes.
  • Performed data cleaning, data blending, predictive modeling, data mining and data visualization using Machine Learning techniques.
  • Used programming techniques in Python, Apache Spark, TensorFlow, Scikit-learn and Pandas.
  • Used statistical models: regression, classification and clustering techniques and algorithms.
  • Worked in the Jupyter environment, using Numpy, Pandas and Seaborn packages to perform Machine Learning techniques, classifiers, and descriptive, prescriptive and predictive analysis.
  • Demonstrated success in executing high value, enterprise projects with clear articulation of benefits, costs and risk.
  • Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
  • Worked on feature generation, PCA, feature normalization and label encoding with Scikit-learn preprocessing (see the sketch after this list).
  • Used Oracle PL/SQL tables and cursors to process huge volumes of data, and used bulk collect for mass updates as a performance improvement.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.
  • Developed reports and automated the daily reports on the inventory and price match products.
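A minimal sketch of the preprocessing described above (one-hot encoding of a categorical column, feature scaling, then PCA); all column names and values are hypothetical:

    # Minimal sketch: encoding, scaling and PCA in one scikit-learn pipeline.
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.decomposition import PCA
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    df = pd.DataFrame({"region": ["east", "west", "east", "south"],
                       "price": [10.0, 12.5, 9.0, 14.0],
                       "qty": [3, 1, 4, 2]})

    pre = ColumnTransformer(
        [("cat", OneHotEncoder(), ["region"]),
         ("num", StandardScaler(), ["price", "qty"])],
        sparse_threshold=0.0)  # force dense output so PCA can consume it

    # Project the encoded features onto the top two principal components
    pipe = Pipeline([("pre", pre), ("pca", PCA(n_components=2))])
    print(pipe.fit_transform(df))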

Tools: SQL, Tableau, Python, ETL pipelining, SSIS, SSRS, SQL Server, VBA, Excel (pivot tables, index, lookups, formulas), data quality, mapping, database design & architecture, R, Machine Learning (Logistic Regression/Random Forests/KNN/K-Means Clustering/Hierarchical Clustering/Ensemble Methods/Collaborative Filtering)

Confidential

Jr. Data Analyst

Responsibilities:

  • Compiled, analyzed and organized various clients' claims data to assist in delivering optimal healthcare management and decision-making.
  • Created indexes to achieve high-level performance.
  • Created various database objects (tables, indexes, views, stored procedures and triggers) and implemented referential integrity constraints for enforcing data integrity and business rules.
  • Responsible for tuning T-SQL procedures, triggers and other database objects.
  • Designed and created data extraction, transformation and loading processes in production using ETL tools like Microsoft SSIS & Informatica.
  • Designed numerous ad-hoc and custom reports using SQL Reporting Services.
  • Created various Tabular and Matrix reports using SSRS.
  • Created and scheduled jobs for various tasks and was responsible for maintaining them.
  • Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn (see the sketch after this list).
  • Responsible for understanding healthcare business operations and investigating data to find patterns and trends.
  • Reconciled daily reports using pivot tables and VLOOKUP.
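A minimal sketch of the random-forest-based feature selection described above, on a synthetic dataset standing in for the fraud data:

    # Minimal sketch: feature selection via random forest importances (scikit-learn).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel

    X, y = make_classification(n_samples=5000, n_features=30, n_informative=8,
                               random_state=0)

    # SelectFromModel keeps features whose importance exceeds the mean importance
    selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0))
    X_sel = selector.fit_transform(X, y)
    print(X.shape, "->", X_sel.shape)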

Tools: Python, SQL, R
