Data Scientist Resume

Professional Summary:

  • Highly efficient Data Scientist/Data Engineer with 8+ years of experience in data analysis, statistical analysis, machine learning, predictive modeling and data mining on large structured and unstructured data sets in the banking, automobile, food and market research sectors
  • Involved in the entire data science project life cycle, including data extraction, data cleansing, transformation, modeling, data visualization and documentation.
  • Developed predictive models using Regression, Multiple regression, Logistic Regression, Decision Trees, Random Forest, Naïve Bayes, Cluster Analysis, Association rules/Market Basket Analysis and Neural Networks
  • Extensive experience with statistical programming languages such as R and Python
  • Proficient in Predictive Modeling, Data Mining Methods, Factor Analysis, ANOVA, hypothesis testing, the normal distribution and other advanced statistical and econometric techniques.
  • Expertise in Exploratory Data Analysis (EDA), combining numerical summaries with relevant visualizations to drive feature engineering and assess feature importance
  • Skilled in using dplyr in R and pandas in Python for exploratory data analysis.
  • Skilled in using Principal Component Analysis for dimensionality reduction
  • Extensive hands-on experience with structured, semi-structured and unstructured data using R, Python, Spark MLlib, SQL and Scikit-learn
  • Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems and generating data visualizations using R, Python, and Tableau
  • Knowledge of Twitter text analytics using R functions such as sapply, Corpus, tm_map and searchTwitter and packages such as twitteR, RCurl, tm and wordcloud
  • Proficient in SAS/BASE, SAS EG, SAS/SQL, SAS MACRO, SAS/ACCESS
  • Proficient in writing complex SQL, including stored procedures, triggers, joins and subqueries
  • Extensive working experience with Python, including Scikit-learn, Pandas, and NumPy
  • Experienced in Python data manipulation for loading and extraction, and with Python libraries such as NumPy, SciPy and Pandas for data analysis and numerical computations
  • Skilled in data wrangling, correlation analysis, and handling multicollinearity, missing values and unbalanced data
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA and Ensembles
  • Experience in designing visualizations using Tableau and in publishing and presenting dashboards and storylines on web and desktop platforms
  • Experience in using the Git version control system
  • Knowledge of time series analysis using AR, MA, ARIMA, ARCH and GARCH models
  • Good knowledge of Apache Hive, Sqoop, Flume, Hue, and Oozie
  • Knowledge of Big Data with Hadoop 2, HDFS, MapReduce, and Spark
  • Knowledge of star schema and snowflake schema for data warehouse and ODS architecture
  • Good knowledge of Amazon Web Services (AWS), including Amazon SageMaker and Amazon S3 for machine learning
  • Collaborated with data warehouse developers to meet business user needs, promote data security, and maintain data integrity

TECHNICAL SKILLS:

EXCEL (8 years), POWERPOINT (6 years), SQL (6 years), MACHINE LEARNING (5 years), ORACLE (5 years)

Programming Skills: R, RStudio, Python, SAS, Spark, Pig, Hive, Sqoop, MapReduce

Python Libraries: Scikit-learn, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly

Databases: Oracle, MySQL, SQL Server 2008/2012/2014

Cloud Services: Amazon Web Services (AWS) EC2

Scripting: R, Python, UNIX shell scripting, SQL stored procedures

Reporting & Visualization Tools: Tableau, Seaborn, Matplotlib, ggplot2

Machine Learning: Regression analysis, single and multiple regression, Logistic Regression, K-NN, Decision Trees, Support Vector Machine (SVM), Neural Networks, Random Forests, ensemble methods such as Bagging, Boosting and Stacking, K-Means clustering and hierarchical clustering.

Database Tools: Oracle SQL Developer, MS SQL Server 2016/2014

Operating Systems: UNIX, Linux, Windows

MS-Office Package: Microsoft Office (Word, Excel, PowerPoint, Visio, Project).

Professional Experience

Data Scientist

Responsibilities:

  • Performed Exploratory Data Analysis in R, including data profiling with descriptive statistics (unknown response values, imbalanced data), feature engineering and data pre-processing such as transformations, imputation of missing data, capping of skewed values, binning and duplicate removal
  • Performed advanced SQL operations, including advanced filtering, data aggregation and window functions, and prepared data for use with analytic tools
  • Conducted Exploratory Data Analysis and built visualizations with the ggplot2 package
  • Performed chi-square tests of independence to study the association between categorical independent variables
  • Addressed overfitting by applying regularization methods such as L1 and L2.
  • Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
  • Worked with statistical domain experts to understand the data and with the data management team on data quality assurance
  • Provided statistical analyses in comprehensive written reports or sampling plans
  • Worked with division personnel to ensure the delivery of high quality and timely statistical services necessary to achieve business goals
  • Carried out predictive analysis using logistic regression, decision trees and random forests
  • Carried out logistic regression with forward, backward and stepwise selection procedures
  • Analyzed output results through Confusion Matrix, Sensitivity, Specificity, Accuracy and Kappa
  • Validated the machine learning classifiers using Accuracy, AUC, ROC Curves and Lift Charts
  • Built random forest models and analyzed graphs of training and testing errors
  • Continuously interacted with Marketing Strategists and Business leaders to identify their analytic needs
  • Created Tableau scorecards and dashboards using stacked bars, bar graphs, scatter plots, geographical maps and Gantt charts with the Show Me functionality
  • Created dashboards giving a clear view of descriptive statistics for all variables, region-wise trend analysis and predicted vs actual response rate for each region
  • Worked extensively with advanced analysis features, including Actions, Calculations, Parameters, Background images and Maps
  • Effectively used the data blending feature in Tableau.
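
The classifier evaluation measures named above (confusion matrix, sensitivity, specificity, accuracy and kappa) can be illustrated with a minimal Python sketch. The function name and the counts below are hypothetical and are not taken from the projects described; the original work was done in R.

```python
def classification_metrics(tp, fp, fn, tn):
    """Summarize a 2x2 confusion matrix (hypothetical helper, for illustration only)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    expected = p_yes + p_no
    # Cohen's kappa: observed agreement corrected for chance agreement.
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Made-up counts: 40 true positives, 10 false positives,
# 5 false negatives, 45 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa is reported alongside accuracy because accuracy alone can look high on imbalanced data, where chance agreement is already large.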

Environment: Oracle, SQL, R, Tableau, MS Excel and PowerPoint.

Confidential - Princeton, NJ

Data Scientist

Responsibilities:

  • Conducted in-depth data analysis and predictive modeling to uncover hidden patterns and communicate the insights to the product, sales, and marketing teams
  • Performed data extraction, data cleaning, exploratory data analysis, data transformation, data modeling and data visualization using R, SQL and Tableau
  • Wrote R scripts while working with Oracle R Enterprise (ORE)
  • Wrote and edited SQL code and maintained database objects using SQL Navigator for Oracle
  • Wrote SQL queries using Common Table Expressions, CASE statements, set operators, date formats and other DML statements
  • Performed data wrangling using built-in data manipulation functions along with custom user-defined functions
  • Developed models to assess trends, project needs, surface challenges, and estimate costs
  • Applied machine learning algorithms, including k-means clustering, to segment customers by automobile price and category
  • Carried out sales forecasting using Regression Analysis with an accuracy of 85%
  • Generated visualizations such as heat maps, trend graphs and boxplots using ggplot2 in R and presented them to non-business users
  • Held team meetings on project updates and presented business reports
  • Automated processes, reducing manual effort by 50% and doubling efficiency
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions
  • Created Tableau dashboards with parameters to define upper and lower thresholds, action and URL filters for navigation, table calculations and Tableau built-in functions
  • Displayed multiple measures using Individual Axes, Blended Axes and Dual Axes
  • Worked extensively on Advanced Analytics using LOD expressions, Scatter plots, Box and Whisker plots, Background images, Heat Maps, Pareto charts, Trend Lines and Log Axes, groups, hierarchies, sets and filters to create detail-level summary reports and dashboards using KPIs
  • Developed, maintained and documented highly visual, comprehensive dashboards using Tableau Desktop and published them to Tableau Server to meet business-specific challenges
  • Developed dashboards using calculated fields with different logic for trends, parameters, calculations, groups, sets and hierarchies in Tableau
  • Good experience in communicating findings to make data analysis actionable and understandable by business partners.
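
The k-means customer segmentation mentioned above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data points (price in thousands, category code) are invented, the `kmeans` helper is hypothetical, and the first k points are used as starting centroids for reproducibility. The original work was done in R, and a real analysis would scale the features first so that price does not dominate the distance.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers (illustrative helper, not a library API)."""
    # Assumption: use the first k points as initial centroids so the run is deterministic.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Invented sample: (price in thousands, category code) per customer purchase.
purchases = [(15, 1), (16, 1), (14, 1), (45, 3), (47, 3), (44, 3)]
labels, centroids = kmeans(purchases, k=2)
print(labels)
print(centroids)
```

On this toy sample the three low-priced and the three high-priced purchases end up in separate segments.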

Environment: Oracle, Oracle R Enterprise, R, Tableau, MS Excel and PowerPoint.