
Sr Data Scientist Resume


San Francisco, CA

PROFESSIONAL SUMMARY:

  • 8+ years of working experience as a Data Analyst and Data Scientist, with extensive programming skills in analytical and statistical programming languages such as R, Python and SQL.
  • Experience in using various packages in R and Python like ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, Rpy2.
  • Expertise in writing functional specifications, translating business requirements to technical specifications, and creating/maintaining/modifying database design documents with detailed descriptions of logical entities and physical tables.
  • Excellent knowledge of Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB and relational databases. Deep understanding of and exposure to the Big Data ecosystem.
  • Excellent experience in Normalization (1NF, 2NF, 3NF and BCNF) and De-normalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Expertise in OLTP/OLAP System Study, Analysis and E-R modeling, developing Database Schemas like Star schema and Snowflake schema used in relational, dimensional and multidimensional modeling.
  • Excellent experience in the Extract, Transform and Load process using ETL tools like Data Stage, Informatica, Data Integrator and SSIS for Data migration and Data Warehousing projects.
  • Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica Power Center.
  • Well versed in Machine Learning algorithms such as Linear, Logistic Regression, Decision Trees, Random Forest, Support Vector Machines, K-Nearest Neighbors (K-NN).
  • Experienced in Data Analytics and Predictive Modeling with R and Python
  • Experience working on data preprocessing steps like exploration, aggregation, missing data imputation, sampling, feature selection, dimensionality reduction, outlier detection, data transformation.
  • Expertise in developing Machine Learning algorithms using R and Python.
  • Understanding of Machine Learning ("ML") concepts including modeling, training, testing & validation with application development of such algorithms in work projects & non-academic environments
  • Involved in projects from the initial phase till completion to convert data into information and information into insights, trends and future forecasts that impact consumer behavior and add business value.
  • Experience in data wrangling, data visualization and detailed reporting using R, Python and Tableau.
  • Applied several Machine Learning algorithms to detect frauds and classify defaulters and non-defaulters.
  • Built time-series models and statistical models for sales predictions and descriptive visualizations of sales data to showcase identified hidden trends and anomalies.
  • Strong experience with Oracle/SQL Server programming skills, with experience in working with functions, packages and triggers.
  • Expertise in Excel Macros, Pivot Tables, Vlookup and other advanced functions.
  • Excellent knowledge and understanding of data mining techniques like classification, clustering, regression techniques and random forests.
  • Experience in designing, developing, scheduling reports/dashboards using Tableau and Cognos.
  • Experience with working in Agile/SCRUM software environments
  • Built recommendation engines using item based collaborative filtering algorithm to understand the relations between products and their effects on sales (product-wise, season-wise).
  • Experience on extracting data from various non-traditional data sources (web scraping)
  • Good understanding of Hadoop ecosystem and its components.
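The item-based collaborative filtering mentioned above can be sketched as follows; this is a minimal cosine-similarity version over a toy user-item ratings matrix (the matrix, item count, and scoring scheme are illustrative assumptions, not the production engine):

```python
import numpy as np

def item_similarity(ratings):
    """Cosine similarity between item columns of a user-item ratings matrix."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0                # avoid division by zero for unrated items
    normalized = ratings / norms
    return normalized.T @ normalized

def recommend(ratings, user, k=1):
    """Score unrated items for `user` as similarity-weighted sums of their ratings."""
    scores = item_similarity(ratings) @ ratings[user]
    scores[ratings[user] > 0] = -np.inf    # do not re-recommend already-rated items
    return np.argsort(scores)[::-1][:k]

# Toy matrix: 3 users x 4 items (0 = unrated)
R = np.array([[5, 4, 0, 1],
              [4, 5, 0, 1],
              [1, 0, 5, 4]], dtype=float)
print(recommend(R, user=0))
```

The same similarity matrix can also be grouped season-wise or product-wise by slicing the ratings matrix before computing similarities.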

TECHNICAL SKILLS:

Operating Systems: UNIX, Windows 8/10, MS DOS.

Databases: MS SQL Server, NoSQL (Cassandra 3.11, MongoDB 3.6), AWS RDS, MySQL, Oracle.

Programming Languages: R, Python, Scala 2.12, Java

OLAP Tools: MS SQL 2017 Analysis Manager, DB2 OLAP, Cognos PowerPlay

Methodologies: Agile, Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), and Joint Application Development (JAD).

Tools & Software: TOAD 9.6, MS Office, BTEQ, Teradata r15, SQL Assistant.

PROFESSIONAL EXPERIENCE:

Sr Data Scientist

Confidential, San Francisco, CA

Responsibilities:

  • Visualized the time series data of sales and checked for stationarity using Dickey Fuller test.
  • Data acquisition, cleaning & quality analysis: acquired data using Python, SQL, Excel & related APIs; performed missing value imputation and outlier identification with statistical methodologies using the Pandas and NumPy libraries in Python.
  • Made the time series data stationary and applied an ARIMA model to forecast sales, which helped maintain inventory in line with sales.
  • Processed and cleaned the data by treating missing values using imputation method.
  • Applied various machine learning algorithms & statistical models such as decision trees, random forests, text analytics using natural language processing (NLP), supervised & unsupervised algorithms, regression models, and classification & clustering, using the scikit-learn and Keras packages in Python & MLlib in Spark. Detected and treated outliers, and ran stepwise regression and all-subsets regression to choose effective variables for the revenue model.
  • Designed and implemented modifications to the Enterprise Data Warehouse.
  • Created different visualizations using cross tables, bar charts, pie charts, maps, line charts, scatter plots etc. using Tableau desktop & Python matplotlib.
  • Applied Spark on AWS EMR to use MLlib for machine learning. Validated & selected models using k-fold cross validation and confusion matrices, and optimized models using hyperparameter search methods.
  • Predicted revenue using linear modeling and ran a price elasticity model to show what happens to revenue when a product's price increases, which helped improve profit by 23%.
  • Studied users' purchase history and built a recommendation engine using collaborative filtering to predict the preference a user would give to a particular item.
  • Implemented topic modeling using LDA in R to predict the product category of tweets and detect specific issues in each category by analyzing the tweets collected from customer service.
  • Performed data analysis & applied relevant machine learning algorithms to decide appropriate marketing options & promotional ads.
  • Enabled the business to detect irregular issues more quickly and resolve the issue earlier.
  • Performed exploratory analysis on product data to know the structure, attributes, dimensions, missing values and outliers in the data using R.
  • Extracted tweet data from the Twitter API (using Twython) by region and built an influencer network based on retweets to observe whether an individual's sentiment is influenced by the influencers' sentiment.
  • Compared sentiment scores when two influencers discussed a similar topic to gauge how impactful each influencer was.
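The forecasting workflow above (check stationarity, difference the series, fit an autoregressive model, forecast) would in practice use the Dickey-Fuller test and ARIMA, e.g. via statsmodels; as a minimal stand-in, this sketch hand-rolls an AR(1) on first differences with NumPy over an illustrative sales series:

```python
import numpy as np

def difference(series):
    """First-difference a series to remove a linear trend (the stationarity step)."""
    return np.diff(series)

def ar1_forecast(series, steps):
    """Fit an AR(1) on the differenced series by least squares and forecast ahead."""
    d = difference(series)
    x, y = d[:-1], d[1:]
    phi = (x @ y) / (x @ x)           # least-squares AR(1) coefficient
    forecasts, last_level, last_diff = [], series[-1], d[-1]
    for _ in range(steps):
        last_diff = phi * last_diff   # propagate the differenced process
        last_level += last_diff       # undo the differencing
        forecasts.append(last_level)
    return np.array(forecasts)

# Trending toy "sales" series (illustrative, not project data)
sales = np.array([100., 104., 109., 113., 118., 122.])
print(ar1_forecast(sales, steps=3))
```

A full ARIMA adds a moving-average term and a chosen differencing order; the shape of the workflow is the same.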

Environment: Python 2.x/3.x (NumPy, Pandas, scikit-learn, Matplotlib, Seaborn), R (dplyr, ggplot2, rpart, caret, randomForest, gbm, h2o, neuralnet), JIRA, Trello, GitHub, Slack, SQL, T-SQL, MySQL, Oracle, HTML5, CSS3, JavaScript, Shell, Linux & Windows, Django.

Sr Data Scientist

Confidential - Detroit, MI

Responsibilities:

  • Expertise in developing Machine Learning algorithms using R and Python.
  • Understanding of Machine Learning ("ML") concepts including modeling, training, testing & validation with application development of such algorithms in work projects & non-academic environments
  • Designed and implemented modifications to the Enterprise Data Warehouse.
  • Created different visualizations using cross tables, bar charts, pie charts, maps, line charts, scatter plots etc. using Tableau desktop & Python matplotlib.
  • Applied Spark on AWS EMR to use MLlib for machine learning. Validated & selected models using k-fold cross validation and confusion matrices, and optimized models using hyperparameter search methods.
  • Expertise in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross validation, and data visualization.
  • Strong knowledge in all phases of the SDLC (Software Development Life Cycle) from analysis, design, development, testing, implementation and maintenance.
  • Expertise in synthesizing Machine learning, Predictive Analytics and Big data technologies into integrated solutions.
  • Predicted revenue using linear modeling and ran a price elasticity model to show what happens to revenue when a product's price increases, which helped improve profit by 23%.
  • Studied users' purchase history and built a recommendation engine using collaborative filtering to predict the preference a user would give to a particular item.
  • Implemented topic modeling using LDA in R to predict the product category of tweets and detect specific issues in each category by analyzing the tweets collected from customer service. Performed data analysis & applied relevant machine learning algorithms to decide appropriate marketing options & promotional ads.
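The k-fold validation and hyperparameter search mentioned above can be sketched with scikit-learn's GridSearchCV; the dataset and the small parameter grid here are synthetic stand-ins, not the project's:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the project data
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 5-fold cross-validated search over a toy hyperparameter grid
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The same pattern scales out on Spark MLlib via its CrossValidator, with the grid expressed as a ParamGridBuilder.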

Environment: Python 2.x/3.x (NumPy, Pandas, scikit-learn, Matplotlib, Seaborn), R (dplyr, ggplot2, rpart, caret, randomForest, gbm, h2o, neuralnet), JIRA, Trello, GitHub, Slack, SQL, T-SQL, MySQL, Oracle, HTML5, CSS3, JavaScript, Shell, Linux & Windows, Django.

Data Analyst

Confidential, Van Nuys, CA

Responsibilities:

  • Worked with this auto loan lending company as a data analyst to identify and quantify potential risk factors and the loan defaulters.
  • Applied logistic regression to predict credit scores and validated the model.
  • Researched various machine learning algorithms such as Naive Bayes, SVM, KNN, Random Forest and applied to predict loan defaulters to improve the previous model and increase the prediction accuracy.
  • Validated the results using the cost matrix which is calculated based on cost incurred due to false positives and false negatives.
  • Experience in using various packages in R and Python like ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, Rpy2.
  • Expertise in writing functional specifications, translating business requirements to technical specifications.
  • Created/maintained/modified database design document with detailed description of logical entities and physical tables.
  • Used Knitr and R Markdown to present the reports with the ROC plots, graphs and results of the models.
  • Developed, recommended and assisted with the implementation of business strategies to manage credit risks, increase revenue and reduce exposure to credit losses.
  • Worked with Apache Spark using Python to develop & execute Big Data analytics & machine learning applications. Executed machine learning models under Spark ML & MLlib.
  • Worked on a huge transactional data set of 40 million rows and performed exploratory data analysis.
  • Converted continuous variables to dummy variables for performance improvement.
  • Assisted in marketing analytics by identifying cause-effect relationships between marketing actions and financial outcomes to raise profitability.
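The logistic-regression defaulter model and cost-matrix validation described above can be sketched as follows; the synthetic data and the 5:1 false-negative cost are illustrative assumptions, not the lender's actual figures:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data (label 1 = defaulter, ~20% of rows)
X, y = make_classification(n_samples=500, n_features=8, weights=[0.8], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()

# Example cost assumptions: a missed defaulter (FN) costs 5x a false alarm (FP)
COST_FP, COST_FN = 1, 5
total_cost = fp * COST_FP + fn * COST_FN
print({"fp": int(fp), "fn": int(fn), "cost": int(total_cost)})
```

Comparing candidate models (Naive Bayes, SVM, KNN, random forest) on this total cost, rather than raw accuracy, is what makes the cost-matrix validation meaningful on imbalanced default data.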

Environment: Python, R, SQL, MySQL, Oracle, HTML5, CSS3, JavaScript, Shell, Linux & Windows, Django, Hadoop, Spark, machine learning. Areas: data wrangling, fraud detection, data visualization, reporting.

Data Scientist

Confidential, Chicago, IL

Responsibilities:

  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Implemented the presentation layer with HTML, CSS and JavaScript.
  • Involved in writing stored procedures using Oracle.
  • Optimized the database queries to improve the performance.
  • Designed and developed data management system using Oracle.
  • Effectively communicated with the stakeholders to gather requirements for different projects
  • Used the MySQLdb package and the Python MySQL connector for writing and executing several MySQL database queries from Python.
  • Created functions, triggers, views and stored procedures using MySQL.
  • Worked closely with back-end developer to find ways to push the limits of existing Web technology.
  • Involved in the code review meetings related to the usage of banking wallet.
  • Responsible for developing ETL pipeline to push data from data warehouse into downstream systems, followed the standard best practice of Informatica development standards.
  • Analyzed customer responses from the previous campaign and targeted the right customer pool to get the maximum value out of the calls made for the new campaign.
  • Observed whether educational background affected the customer's decision.
  • Predicted with 71% accuracy (without noise) how a customer would respond to a new campaign; with added noise, the accuracy dropped to 52%.
  • Built a fraud detection model using decision trees to observe if we can improve the fraud detection flags and reduce the number of calls made by the call center.
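A decision-tree fraud flag of the kind described above might look like this in scikit-learn; the imbalanced synthetic dataset and the depth limit are illustrative choices, not the production setup:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced synthetic stand-in: ~5% of transactions are fraud (label 1)
X, y = make_classification(n_samples=1000, n_features=6, weights=[0.95], random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)

# A shallow, class-weighted tree keeps the flag rules interpretable for review
tree = DecisionTreeClassifier(max_depth=4, class_weight="balanced", random_state=2)
tree.fit(X_tr, y_tr)
flags = tree.predict(X_te)
print("flag rate:", round(float(flags.mean()), 3))
```

The flag rate is the quantity to watch here: lowering it without missing fraud is what reduces the number of calls the call center has to make.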

Environment: Python 2.x/3.x (NumPy, Pandas, scikit-learn, Matplotlib, Seaborn), R (dplyr, ggplot2, rpart, caret, randomForest, gbm, h2o, neuralnet), JIRA, Trello, GitHub, Slack, SQL, T-SQL, MySQL, Oracle, HTML5, CSS3, JavaScript, Shell, Linux & Windows, Django.

Data Analyst

Confidential

Responsibilities:

  • Prepared detailed reports on regional success rate of campaign.
  • Extracted data using SQL queries, cleaned, imputed missing values and made the datasets ready for analysis.
  • Applied Predictive analytics using Python to provide real time estimates of customers & their behavior related to the usage of banking wallet.
  • Responsible for developing ETL pipeline to push data from data warehouse into downstream systems, followed the standard best practice of Informatica development standards.
  • Analyzed customer responses from the previous campaign and targeted the right customer pool to get the maximum value out of the calls made for the new campaign.
  • Observed whether educational background affected the customer's decision.
  • Predicted with 71% accuracy (without noise) how a customer would respond to a new campaign; with added noise, the accuracy dropped.
  • Created functions, triggers, views and stored procedures using MySQL.
  • Worked closely with back-end developer to find ways to push the limits of existing Web technology.
  • Involved in the code review meetings related to the usage of banking wallet.
  • Built a fraud detection model using decision trees to observe if we can improve the fraud detection flags and reduce the number of calls made by the call center.
  • Validated the results using a cost matrix calculated based on the cost incurred due to false positives and false negatives.
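The cleaning-and-imputation step mentioned above (extract with SQL, fill numeric gaps with medians and categorical gaps with modes) can be sketched with pandas; the toy frame and column names are illustrative, not the campaign extract:

```python
import pandas as pd

# Toy campaign dataset with gaps, standing in for the SQL extract
df = pd.DataFrame({
    "age": [34, None, 29, 41, None],
    "balance": [1200.0, 800.0, None, 1500.0, 950.0],
    "region": ["N", "S", None, "N", "S"],
})

# Numeric gaps: median imputation; categorical gaps: mode imputation
df["age"] = df["age"].fillna(df["age"].median())
df["balance"] = df["balance"].fillna(df["balance"].median())
df["region"] = df["region"].fillna(df["region"].mode()[0])

print(df)
```

Median and mode are robust defaults; model-based imputation is the usual next step when the missingness itself carries signal.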

Environment: Python, R, SQL, MySQL, Oracle, HTML5, CSS3, JavaScript, Shell, Linux & Windows, Django, Hadoop, Spark, machine learning. Areas: fraud detection, data visualization, reporting.
