Data Scientist Resume

SUMMARY

Highly efficient Data Scientist/Data Analyst with 6+ years of experience in Data Analysis/Data Science, Machine Learning, Data Mining with large sets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, and Web Scraping. Adept in statistical programming languages such as R and Python, as well as Big Data technologies such as Hadoop and Hive.
Skilled in Data Parsing, Data Manipulation, and Data Preparation, using methods that include describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
Experience with various Python and R packages, including ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, and rpy2.
Developed Dashboards and story points using calculations, parameters, calculated fields, groups, sets and hierarchies in Tableau.
Deep understanding of Statistical Modeling, Multivariate Analysis, model testing, problem analysis, and model comparison and validation.
Involved in the entire data science project life cycle, including data extraction, data cleaning, statistical modeling, and data visualization with large sets of structured and unstructured data.
Proficient in managing the entire data science project life cycle and actively involved in all its phases, including data acquisition, data cleaning, feature scaling, feature engineering, statistical modeling (Decision Trees, Regression Models, Neural Networks, Support Vector Machine (SVM), Clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and k-fold cross-validation, and data visualization.
Developed various advanced interactive visualizations, such as heat maps, bubble charts, tree maps, and line charts, as well as motion charts and drill-down analysis, in Tableau Desktop.
Hands-on experience with big data tools such as Hadoop, Spark, Hive, Pig, PySpark, and Spark SQL.
Hands-on experience implementing LDA and Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.
Deep understanding of MapReduce with Hadoop and Spark. Good knowledge of the Big Data ecosystem, including Hadoop 2.0 (HDFS, Hive, Pig, Impala) and Spark (Spark SQL, Spark MLlib, Spark Streaming).
Strong skills in statistical methodologies such as A/B testing, experimental design, hypothesis testing, and ANOVA.
Extensive experience in Data Visualization, including producing tables, graphs, and listings using various procedures and tools such as Tableau.
Extracted data from database sources such as Oracle, SQL Server, and DB2; regularly used JIRA and other internal issue trackers for project development.

TECHNICAL SKILLS

Programming Languages: Shell, Python, R

Database Operations: SQL, MySQL

Data Visualization Tools: Tableau BI Products (desktop, server, reader & online), Business Objects.

Data Mining Models: Clustering, NLP, KNN, Linear Regression, Decision Trees

PROFESSIONAL EXPERIENCE

Confidential

Data Scientist

Responsibilities:

Analyzed and prepared data and identified patterns in the dataset by applying historical models. Collaborated with Senior Data Scientists to understand the data.
Performed data manipulation, data preparation, normalization, and predictive modeling. Improved efficiency and accuracy by evaluating models in Python and R.
The project focused on customer segmentation through machine learning and statistical modeling, including building predictive models and generating data products to support customer segmentation.
Used Python and R to improve the models. Upgraded all models to improve the product.
Developed a pricing model for bundled product and service offerings to predict and optimize gross margin.
Built a price elasticity model for bundled product and service offerings.
Under the supervision of a Sr. Data Scientist, applied data transformation methods for rescaling and normalizing variables.
Developed a predictive causal model using annual failure rate and standard cost basis for the new bundled service offering.
Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, including product recommendation and allocation planning.
Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and R with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction.
Partnered with the sales and marketing teams and collaborated with a cross-functional team to frame and answer important data questions, prototyping and experimenting with ML/DL algorithms and integrating them into production systems for different business needs.
Worked on multiple datasets containing two billion values of structured and unstructured data about web application usage and online customer surveys.
Performed data cleaning, applying backward and forward filling methods to handle missing values in the dataset.
Designed, built, and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for user behavior prediction and support multiple marketing segmentation programs.
Segmented the customers based on demographics using K-means Clustering.
Explored different regression and ensemble models in machine learning to perform forecasting.
Used classification techniques, including Random Forest and Logistic Regression, to quantify the likelihood of each user making a referral.
Applied boosting to the predictive model to improve its efficiency.
Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.
Collaborated with project managers and business owners to understand their organizational processes and help design the necessary reports.
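
A minimal sketch of the forward and backward filling step mentioned in the responsibilities above, assuming pandas. The helper name `fill_missing`, the column names, and the values are hypothetical and are not taken from the project data.

```python
import numpy as np
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Forward-fill gaps, then backward-fill any leading gaps that remain."""
    return df.ffill().bfill()


# Hypothetical usage data, invented for illustration only.
usage = pd.DataFrame(
    {
        "sessions": [np.nan, 3.0, np.nan, 5.0],
        "surveys": [1.0, np.nan, np.nan, 4.0],
    }
)
filled = fill_missing(usage)
```

Forward filling runs first so that each gap takes the most recent earlier observation; the backward pass only affects gaps at the start of a column, where no earlier value exists.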

Environment: Oracle BI tools, Tableau, MS-Excel, Python, Jupyter, Naive Bayes, SVM, K-means, ANN, Regression, MS Access, SQL Server Management Studio, R/RStudio, Redshift

Confidential

Data Scientist/Data Analyst (Python)

Responsibilities:

Created a daily aggregated report for the client to support investment decisions and help analyze market trends.
Built an internal visualization platform for clients to view historical data, compare issuers, and review analytics for different bonds and markets.
The model collects and merges daily data from market providers and applies different cleaning techniques to eliminate bad data points.
The model merges the daily data with the historical data and applies various quantitative algorithms to check the best fit for the day.
The model captures the changes for each market and creates a daily email alert to help the client make better investment decisions.
Built the model on the Azure platform, using Python and Spark for model development and Dash by Plotly for visualizations.
Built REST APIs to easily add new analytics or issuers into the model.
Automated workflows that were previously initiated manually, using Python scripts and Unix shell scripting.
Created, activated, and programmed in Anaconda environments.
Worked on predictive analytics use cases using Python.
Cleaned and processed third-party spending data into manageable deliverables in a specific format, using Excel macros and Python libraries such as NumPy, SQLAlchemy, and matplotlib.
Used the pandas API to structure data as time series and in tabular format for manipulation and retrieval.
Helped with the migration from the old server to the Jira database (matching fields), using Python scripts to transfer and verify the information.
Analyzed and formatted data using machine learning algorithms from Python's scikit-learn.
Experience in Python, Jupyter, and the scientific computing stack (NumPy, SciPy, pandas, and matplotlib).
Troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were a main source of data for both customers and the internal customer service team.
Wrote Python scripts to parse JSON documents and load the data into a database.
Generated various graphical capacity planning reports using Python packages such as NumPy and matplotlib.
Analyzed generated logs and forecast the next occurrence of events using various Python libraries.
Created Autosys batch processes to fully automate the model so that it picks the latest bond as well as the best-fitting bond for each market.
Created a framework using Plotly, Dash, and Flask for visualizing trends and understanding patterns for each market using historical data.
Used Python APIs to extract daily data from multiple vendors.
Used Spark and Spark SQL for data integration and manipulation. Worked on a POC for creating a Docker image on Azure to run the model.
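
A minimal sketch of the kind of script that parses JSON documents and loads them into a database, as listed in the responsibilities above. It assumes SQLite from the Python standard library as a stand-in database; the table name `bond_quotes`, the field names, and the sample documents are hypothetical and do not describe the actual system.

```python
import json
import sqlite3


def load_json_records(conn, documents):
    """Parse JSON documents and insert one row per document.

    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bond_quotes "
        "(issuer TEXT, market TEXT, price REAL)"
    )
    rows = []
    for doc in documents:
        record = json.loads(doc)  # raises json.JSONDecodeError on bad input
        rows.append((record["issuer"], record["market"], float(record["price"])))
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT INTO bond_quotes VALUES (?, ?, ?)", rows)
    return len(rows)


# Hypothetical sample documents, invented for illustration only.
conn = sqlite3.connect(":memory:")
docs = [
    '{"issuer": "A", "market": "US", "price": 101.5}',
    '{"issuer": "B", "market": "EU", "price": 99.25}',
]
inserted = load_json_records(conn, docs)
```

All documents are parsed before any row is written, so a malformed document stops the load before the table is touched.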

Environment: Python, PySpark, Spark SQL, Plotly, Flask, Postman, Microsoft Azure, Autosys, Docker, Tableau 9.x
