
Data Scientist Resume


Wayne, NJ

SUMMARY:

  • Experienced, self-motivated data scientist/data analyst with over 6 years of experience in Statistical Modeling, Data Mining, Machine Learning, and Data Visualization, with rich domain knowledge in the Marketing, Healthcare, and Banking industries.
  • Proficient in Machine Learning algorithms and Predictive Modeling such as Linear Regression, Logistic Regression, Naïve Bayes, Decision Tree, Random Forest, Gradient Boosting, Stacking, SVM, KNN, and K-means Clustering.
  • Solid experience in Deep Learning techniques, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), max pooling, and normalization, as well as Principal Component Analysis (PCA).
  • Strong experience with R programming and Python 3.x/2.x, including NumPy, Scikit-learn, Pandas, and Matplotlib, to develop analytic models and solutions.
  • Experienced in Recommender System Design, implementing Collaborative Filtering, Matrix Factorization, Clustering Methods, and Market Basket Analysis.
  • Proficient in data visualization tools such as Tableau 10.5, Python Matplotlib/Seaborn, R ggplot2/Shiny to create visually impactful and actionable interactive reports and dashboards.
  • Familiar with the Hadoop ecosystem and the Apache Spark framework, including HDFS, Databricks, HiveQL, Spark SQL, and PySpark.
  • Experienced with non-relational databases such as MongoDB 3.2, Cassandra, and Neo4j.
  • Adept in developing and debugging Stored Procedures, User-defined Functions (UDFs), Triggers, Indexes, Constraints, Transactions and Queries using Transact-SQL (T-SQL).
  • Involved in entire data science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modelling, Evaluation, Optimization, Testing and Deployment.
  • Expertise in transforming business resources and tasks into regularized data and analytical models, designing algorithms, and developing data mining and reporting solutions across massive volumes of structured and unstructured data.
  • Excellent understanding of Systems Development Life Cycle (SDLC) methodologies such as Agile and Waterfall.
  • Strong business acumen and analytical skills to translate numbers into actionable business decisions. Passionate about learning cutting-edge Machine Learning theories and algorithms, and always looking for new challenges.
  • Have worked with and supported clients in various environments during the solution development phase, helping them through analytics transformation and digital roadmap creation.

TECHNICAL SKILLS:

Machine Learning: KNN, K-means, CART, Random Forest, Naïve Bayes, Cluster Analysis, Text Mining, Bagging, Gradient Descent, AdaBoost, Neural Network, XGBoost, LDA

Documentation Tools: MS Excel (Pivot Tables, VLOOKUP, HLOOKUP, INDEX), MS Word, MS PowerPoint, Outlook, MS Office 2010, MS Project

Statistics: Linear and Logistic Regression, PCA, ARMA, GARCH, VaR, Ridge, Lasso, Elastic-net

Languages: Python 2.7/3.3, R 3.3.4, SAS 9.0, Excel Macro, Spark 2.0, Pig, MapReduce, Matlab

R Packages: ggplot2, glm, e1071, caret, tsa, fnn, tm, wordcloud, quantmod, splines, KernSmooth, MASS, LibSVM, sqldf, shiny

Databases: MySQL 5.7, Oracle 11, SQL Server 2012, MongoDB 3.2, Databricks, Microsoft Access

Python Packages: seaborn, matplotlib, pandas, numpy, scipy, bokeh, plotly, NLTK, Scrapy, statsmodels, sqlite3

Hadoop Ecosystem: Hadoop 2.x, Spark 2.0, Hive 2.1, HBase 1.0, Ambari, MapReduce, Sqoop

Analytical Tools: Excel, IBM SPSS 20.0, EViews, Google Analytics

Visualization: Tableau 9.3, Power BI, Pivot Table, D3.js, R ggplot2, Python plotly

IDE: Anaconda 1.6, Jupyter Notebook 4.2, Spyder, RStudio 1.0

Scripting Language: Unix Shell, SQL, Markdown 3.0

PROFESSIONAL EXPERIENCE:

Confidential - Wayne, NJ

Data Scientist

Responsibilities:

  • Applied regression and other models, compared initial candidate models, created data-processing pipelines, and presented reports to other teams within the company.
  • Participated in all phases of data acquisition, data cleaning, developing models, validation, and visualization to deliver data science solutions.
  • Worked on fraud detection analysis of payment transactions, applying supervised learning methods to the transaction history.
  • Collected data in Hadoop 2.x and retrieved the data required for building models using Databricks.
  • Developed Spark Python modules for machine learning & predictive analytics in Hadoop.
  • Created transformation pipelines for preprocessing large amounts of data with methods such as imputation, scaling, and feature selection.
  • Used Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn in Python to develop machine learning models with algorithms such as Decision Trees, Logistic Regression, Gradient Boosting, SVM, and KNN.
  • Applied PCA and other feature engineering techniques for dimensionality reduction while retaining most of the variance in the data.
  • Applied ensemble methods (bagging and boosting) to improve model accuracy.
  • Used cross-validation to evaluate models on different folds of the data, tune them, and reduce overfitting.
  • Built Tableau 10.5 dashboards using complex calculated fields, real-time table calculations, filters, and parameters. Created context filters and used performance actions while handling large volumes of data.

Environment: Hadoop 2.x, HDFS, Databricks, Python 3.x (NumPy, Pandas, Scikit-learn, Matplotlib), Jupyter, GitHub, Tableau 10.5, Linux
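The preprocessing and validation steps described in this role (imputation, scaling, PCA, cross-validation) can be illustrated with a minimal scikit-learn sketch. It is not the pipeline used on the job: the data are synthetic (`make_classification` with roughly 5% of values blanked out), and the median imputation, the 95% variance threshold for PCA, and the logistic regression model are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for transaction features (assumption: not real data).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # blank out ~5% of values to exercise imputation

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values with column medians
    ("scale", StandardScaler()),                   # zero mean, unit variance per feature
    ("pca", PCA(n_components=0.95)),               # keep enough components for 95% of variance
    ("model", LogisticRegression(max_iter=1000)),
])

# 5-fold cross-validation: every step is refit on each training fold,
# so statistics from the held-out fold never leak into preprocessing.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
print(f"ROC AUC per fold: {np.round(scores, 3)}")
print(f"Mean ROC AUC: {scores.mean():.3f}")
```

Wrapping every step in one `Pipeline` is what lets `cross_val_score` refit the imputer, scaler, and PCA inside each fold; fitting them once on the full dataset beforehand would leak information from the validation folds.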

Confidential - Princeton, NJ

Data Scientist

Responsibilities:

  • Formatted unstructured datasets, applied machine learning algorithms, conducted hyper-parameter tuning.
  • Created business requirement documents and plans for building SQL views and dashboards.
  • Applied Naïve Bayes, cluster analysis, and text mining during the modeling and testing phases.
  • Developed business rules for populating dimension tables, fact tables, and aggregate tables.
  • Extensively used SAS macros and SAS scripts for merging datasets, indexing, data aggregation, record selection, subsetting, handling multiple records, creating and modifying views, and accessing multiple databases.
  • Created dashboard reports in Tableau 10.5 by connecting to various data sources (Oracle, XLSX, CSV, ParAccel, DB2). Built Tableau dashboards using complex calculated fields, table calculations, filters, and parameters. Created context filters and used performance actions while handling large volumes of data.
  • Worked with relational databases such as Oracle, SQL Server, MS Access, and SAS storage server.
  • Developed and published interactive reports and scheduled reports using Tableau Server.
  • Coordinated with the market research and analysis team for customer segmentation and targeting of various products.
  • Efficiently coordinated with marketing department to finalize launch of targeted campaigns.
  • Developed in-depth dashboards with drill-downs and links to interrelated performance metrics.
  • Created visualizations using bar, line, and pie charts, maps, scatter plots, histograms, etc., and applied different filters such as normal, quick, context, sharing, seas, and user filters.

Environment: Tableau Desktop 10.5, Tableau Server / Administrator, SAS Visual Analytics, SAS V9.4, MySQL 5.5, Oracle 10g, Microsoft Excel 2007, MS Access, Windows 2007 server, Java 1.5 platform, Teradata, MS Office 2007

Confidential

R Programmer / Data Scientist

Responsibilities:

  • Researched, developed, and designed sampling methodologies, and analyzed data for pricing of clients' products.
  • Investigated market sizing, competitive analysis and positioning for product feasibility.
  • Worked on Business forecasting, segmentation analysis and Data mining.
  • Automated the diagnosis of blood loss during emergencies by developing a machine learning algorithm.
  • Generated graphs and reports for analytical models using the ggplot2 package in RStudio.
  • Developed and implemented an R Shiny application that showcases machine learning for business forecasting.
  • Developed predictive models using Decision Tree, Random Forest and Naïve Bayes.
  • Performed time series analysis using Tableau.
  • Collaborated with DevOps teams on production deployment.
  • Worked in Amazon Web Services cloud computing environment.
  • Worked with Caffe Deep Learning Framework.
  • Developed various workbooks in Tableau from multiple data sources.
  • Created dashboards and visualizations using Tableau desktop.
  • Created dashboards in QlikView to visualize data.
  • Worked on R packages to interface with Caffe Deep Learning Framework.
  • Performed analysis using JMP.
  • Performed validation of machine learning output from R.
  • Wrote connectors to extract data from databases.

Environment: R, Python, RStudio, Shiny, Excel 2013, Amazon Web Services, Machine Learning, Spark, Hadoop, Tableau, QlikView, JMP, Segmentation analysis

Confidential

Junior Data Scientist

Responsibilities:

  • Gathered, analyzed, documented, and translated application requirements into data models, and supported standardization of documentation and the adoption of standards and practices related to data and applications.
  • Designed SSIS packages to extract, transform, and load (ETL) data across different platforms, validate the data, and archive data from the database.
  • Identified patterns and data quality issues, and shared insights by communicating with the BI team.
  • Explored and visualized the data to check patterns, distributions, and correlations using Python Matplotlib and Seaborn in Jupyter Notebook.
  • Used Python Pandas and Scikit-Learn to preprocess the data, including data imputation, outlier detection, label encoding, scaling, resampling, and feature engineering to avoid multicollinearity issues.
  • Developed and implemented predictive models such as Logistic Regression, Support Vector Machine, Random Forest, Gradient Boosting, and KNN in Python and compared their performance.
  • Applied feature transformation methods such as Principal Component Analysis to reduce the dimensions and improve the model performance.
  • Set up a data preprocessing pipeline to ensure consistency between the training data and incoming data.
  • Designed rich data visualizations with Tableau and Matplotlib to present data in human-readable form.

Environment: MS SQL Server 2008, MS SSIS, Python (Numpy/Pandas/Matplotlib/Seaborn/Scikit-Learn), Jupyter Notebook, Machine Learning (Logistic Regression, Support Vector Machine, Random Forest, Gradient Boosting, K-Nearest Neighbors), Tableau, Windows 8/XP

Confidential

Data Analyst

Responsibilities:

  • Created a database in Microsoft Access from a blank database, defined tables and data types, entered the dataset manually, drew an ER diagram, and ran basic SQL queries on the database.
  • Used Microsoft Excel to format data as tables and to visualize and analyze data with features such as Conditional Formatting, Remove Duplicates, Pivot and Unpivot tables, charts, and Sort and Filter.
  • Applied concepts of probability, distributions, and statistical inference to the given dataset to uncover findings through comparisons, t-tests, F-tests, R-squared, p-values, etc.
  • Performed Statistical Analysis and Hypothesis Testing in Excel by using Data Analysis Tool.
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions that address those needs.
  • Created customized business reports and shared insights with management.
  • Presented a dashboard to all stakeholders to provide a better understanding of the dataset.
  • Performed module specific configuration duties for implemented applications to include establishing role-based responsibilities for user access, administration, maintenance, and support.
  • Worked closely with internal business units to understand business drivers and objectives which can be enabled through effective application deployment.

Environment: MS Access, MS Excel, MS Visio, UML diagrams, Mainframes, SQL Server
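The statistical tests listed in this role (t-test, F-test, R-squared, p-values) were run in Excel's Data Analysis tool; the same tests can be sketched in Python with SciPy. This is a minimal illustration on synthetic data, not the original analysis: the two "region" samples, their sizes, and the effect sizes are hypothetical. With exactly two groups, the one-way ANOVA F-test is equivalent to the pooled-variance two-sample t-test, so F equals t squared and the two p-values match.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical samples, e.g. order values from two regions (synthetic data).
region_a = rng.normal(loc=100.0, scale=15.0, size=200)
region_b = rng.normal(loc=105.0, scale=15.0, size=200)

# Student's two-sample t-test (assumes equal variances in both groups).
t_stat, t_p = stats.ttest_ind(region_a, region_b)

# One-way ANOVA F-test; with two groups it matches the pooled t-test (F = t^2).
f_stat, f_p = stats.f_oneway(region_a, region_b)

# Simple linear regression: R-squared and the p-value for the slope.
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + rng.normal(scale=5.0, size=200)
fit = stats.linregress(x, y)
r_squared = fit.rvalue ** 2

print(f"t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"F = {f_stat:.2f}, p = {f_p:.4f}")
print(f"R^2 = {r_squared:.3f}, slope p = {fit.pvalue:.2e}")
```

If the groups' variances cannot be assumed equal, `stats.ttest_ind(..., equal_var=False)` runs Welch's t-test instead; the F = t² identity then no longer holds exactly.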
