Data Scientist Resume

San Diego, CA

SUMMARY:

  • Motivated Data Scientist with 8 years of professional IT experience, a strong mathematics and statistics background, and hands-on experience implementing Machine Learning models
  • Added support for Amazon AWS to host static/media files and the database in the Amazon cloud
  • Experienced in Data Mining of large structured and unstructured datasets and in Data Acquisition; skilled in Predictive Modeling, with in-depth knowledge of Statistical Analysis and Supervised and Unsupervised Machine Learning
  • Familiar with development and deployment of various cloud-based systems like AWS and Azure.
  • Experience in Machine Learning algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, K-Means Clustering and Association Rules
  • Experienced in building Data Warehouses on the Azure platform using Azure Databricks and Data Factory
  • Experience in applying predictive modeling and machine learning algorithms for analytical reports
  • Experience using tools such as scripting, data-cleaning utilities, and statistical software packages to work efficiently with datasets
  • Experienced in deploying and automating applications in the Microsoft Azure environment
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles
  • Valuable experience working with large datasets and Deep Learning algorithms with TensorFlow
  • Implemented a Continuous Delivery pipeline with Docker, GitHub, and AWS
  • Experience implementing machine learning back-end pipelines with Spark MLlib, Scikit-learn, Pandas, and NumPy
  • Working knowledge of extract, transform, and load (ETL) components and process flow
  • Good experience in normalization for OLTP and denormalization of entities for Enterprise Data Warehouses
  • Used Terraform on Azure and AWS OpsWorks to deploy the infrastructure necessary to create development, test, and production environments for a software development project
  • Hands-on experience with big data tools like Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL
  • Very good knowledge of and experience with AWS Redshift, S3, and EMR; experience in Text Analytics, building statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau
  • Over 5 years of experience with Machine Learning techniques and algorithms (such as k-NN, Naive Bayes, etc.)
  • Experience with object-oriented programming (OOP) concepts using Python, C++, and PHP
  • Integration Architect & Data Scientist experience in Analytics, Big Data, BPM, SOA, ETL, and Cloud technologies
  • Experience with foundational machine learning models and concepts: regression, random forest, boosting, GBM, NNs, HMMs, CRFs, MRFs, and deep learning
  • Familiar with Deep Learning projects: CNNs for image identification, RNNs for stock price prediction, autoencoders for a Movie Recommender System (PyTorch), and Image Captioning (CNN-RNN encoder-decoder architecture)
  • Exposure to AI and Deep Learning platforms/methodologies like TensorFlow, RNNs, and LSTMs
  • Built LSTM neural networks for text such as item descriptions and comments
  • Experience with Artificial Intelligence chatbots; worked on providing clear, precise, and effective communication in written and verbal formats to both internal and external audiences
  • Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing, in both Waterfall and Agile methodologies
  • Hands-on experience in implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Neural Networks, with Principal Component Analysis and good knowledge of Recommender Systems
  • Experienced in the full software lifecycle under SDLC, Agile, and Scrum methodologies
  • Experience in applying Machine Learning techniques/algorithms to business problems such as Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Random Forests, Support Vector Machine (SVM), K-Nearest-Neighbors (KNN), K-means Clustering, Neural Networks, Gradient Boosting, and Ensemble Methods (a minimal workflow sketch follows this list)
  • Strong communication, interpersonal, planning and problem-solving skills
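For illustration, the sketch below shows the kind of supervised-learning workflow summarized above: a train/test split, a linear baseline compared against an ensemble model, and AUC evaluation with scikit-learn. The synthetic dataset and hyperparameters are illustrative assumptions, not drawn from any engagement described here.

```python
# Minimal sketch of a supervised-learning workflow (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a cleaned, feature-engineered dataset
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Compare a linear baseline against an ensemble model on held-out data
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=42)):
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]
    print(type(model).__name__, "test AUC:", round(roc_auc_score(y_test, scores), 3))
```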

TECHNICAL SKILLS:

Expertise: Scikit-learn, NLTK, spaCy, NumPy, SciPy, OpenCV, Deep Learning, NLP, RNN, CNN, TensorFlow, Keras, Matplotlib, Microsoft Visual Studio, Microsoft Office

Machine Learning Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forest, K-Means Clustering, Support Vector Machines, Gradient Boosting Machines & XGBoost, Neural Networks

Data Analysis Skills: Data Cleaning, Data Visualization, Feature Selection, Pandas

Operating Systems: Windows, macOS, Linux, Unix

Programming Languages & Frameworks: Python, SQL, R, MATLAB, Torch, C, C++, Java, Octave, Apache Spark, Hadoop, Spark MLlib

Other Programming Knowledge and Skills: Elasticsearch, Data Scraping, RESTful APIs using the Django web framework

Tools: Toad, Erwin, AWS, Azure, D3, MuleSoft, Alteryx, Tableau, Shiny, Adobe Analytics, Anaconda

PROFESSIONAL EXPERIENCE:

Confidential, San Diego, CA

Data Scientist

Responsibilities:

  • Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression, Random Forest, SVM, Bayesian methods, XGBoost, K-Nearest Neighbors) and Statistical Modeling in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles
  • Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, K-means, and KNN for data analysis
  • Experienced in building Data Warehouses on the Azure platform using Azure Databricks and Data Factory
  • Utilized analytical applications like R and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into risk management and marketing strategies that drive value
  • Proficient with big data tools like Hadoop, Azure Data Lake, and AWS Redshift
  • Compiled data from various sources to perform complex analysis for actionable results
  • Measured efficiency of the Hadoop/Hive environment, ensuring SLAs were met
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data
  • Optimized the TensorFlow model for efficiency
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (a minimal PySpark sketch follows this list)
  • Hands-on experience in implementing LDA and Naive Bayes; skilled in Decision Trees, Random Forests, Linear and Logistic Regression, SVM, Clustering, and Neural Networks
  • Analyzed the system for new enhancements/functionality and performed impact analysis of the application when implementing ETL changes
  • Used the Jenkins AWS CodeDeploy plugin to deploy to AWS
  • Implemented a Continuous Delivery pipeline with Docker, GitHub, and AWS
  • Developed scripts and batch jobs to schedule various Hadoop programs; these were used to train models on thousands of examples from the data
  • Designed, developed, and optimized SQL code
  • Used Terraform on Azure and AWS OpsWorks to deploy the infrastructure necessary to create development, test, and production environments for a software development project
  • Experienced in deploying and automating applications in the Microsoft Azure environment
  • Built performant, scalable ETL processes to load, cleanse, and validate data
  • Provided support for data processes, including monitoring data, profiling database usage, troubleshooting, tuning, and ensuring data integrity
  • Participated in the full software development lifecycle, covering requirements, solution design, development, QA implementation, and product support, using Scrum and other Agile methodologies
  • Collaborated with team members and stakeholders on the design and development of the data environment
  • Learned new tools and skill sets as needs arose
  • Prepared associated documentation for specifications, requirements, and testing
  • Used TensorFlow for text summarization
  • Wrote Hive queries for data analysis to meet the business requirements
  • Responsible for analyzing multi-platform applications using Python
  • Developed MapReduce jobs in Python for data cleaning and data processing
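Below is a minimal PySpark sketch of the Spark SQL analysis over Hive data described above; the database, table, and column names are hypothetical, and a configured Hive metastore is assumed.

```python
# Minimal sketch: Spark SQL analytics over a Hive table (illustrative names).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-analytics-sketch")
         .enableHiveSupport()  # assumes a Hive metastore is available
         .getOrCreate())

# Aggregate daily event counts from a hypothetical Hive table
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM analytics.web_events
    GROUP BY event_date
    ORDER BY event_date
""")
daily_counts.show(10)
spark.stop()
```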

Tech Environment/Skills: Python, Linux, Unix, Statistical Modeling, Hive, Pig, MapReduce, SQL, Scikit-learn, NLTK, spaCy, NumPy, SciPy, Scrum, Deep Learning, NLP, RNN, CNN, TensorFlow, Keras, Matplotlib, Machine Learning algorithms, AWS, Azure, data lakes.

Confidential

Jr. Data Scientist

Responsibilities:

  • Responsible for data identification, collection, exploration, and cleaning for modeling; participated in model development
  • Familiar with development and deployment of various cloud-based systems like AWS and Azure
  • Performed data cleaning, feature scaling, and feature engineering
  • Experienced in deploying and automating applications in the Microsoft Azure environment
  • Created statistical models, using distributed and standalone approaches, to build various diagnostic, predictive, and prescriptive solutions
  • Treated missing values, capped outliers, and handled anomalies using statistical methods; derived customized key metrics
  • Deployed, managed, and operated scalable, highly available, and fault-tolerant systems on AWS
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms, applying linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis
  • Added support for Amazon AWS to host static/media files and the database in the Amazon cloud
  • Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancing, and Auto Scaling groups
  • Visualized, interpreted, and reported findings, and developed strategic uses of data with Python libraries such as NumPy, Scikit-learn, and Matplotlib
  • Performed analysis using industry-leading text mining, data mining, and analytical tools, as well as open-source software
  • Understood and implemented text mining concepts, graph processing, and semi-structured and unstructured data processing
  • Optimized ETL workflows for better data migration performance and performed the required transformations based on project requirements
  • Applied analysis methods such as hypothesis testing and analysis of variance (ANOVA) to validate the existing models against the observed data
  • Evaluated models using Cross-Validation, the Log Loss function, and ROC curves, and used AUC for feature selection (a minimal evaluation sketch follows this list)
  • Created dummy variables for certain datasets to feed into the regression
  • Built multiple machine learning features using Python
  • Strong skills in data visualization with the Matplotlib and Seaborn libraries
  • Created charts such as heat maps, bar charts, and line charts
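A minimal sketch of the evaluation workflow referenced above: dummy-variable encoding with pandas, then cross-validated AUC and log loss with scikit-learn. The dataset, column names, and figures are illustrative assumptions.

```python
# Minimal sketch: dummy variables + cross-validated AUC / log loss.
# All data and column names are illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical dataset with one categorical feature
df = pd.DataFrame({
    "region":  ["west", "east", "west", "south"] * 50,
    "spend":   [10.0, 25.5, 7.2, 40.1] * 50,
    "churned": [0, 1, 0, 1] * 50,
})

# One-hot (dummy) encode the categorical column for the regression
X = pd.get_dummies(df[["region", "spend"]], columns=["region"])
y = df["churned"]

model = LogisticRegression(max_iter=1000)
print("CV AUC:     ", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
print("CV log loss:", -cross_val_score(model, X, y, cv=5, scoring="neg_log_loss").mean())
```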

Tech Environment/Skills: Python, Pandas, Matplotlib, ANOVA, Seaborn, text mining, NumPy, Scikit-learn, heat maps, bar charts, line charts, ETL workflows, linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, KNN, SciPy, AWS, Azure, data lakes

Confidential, San Jose

Data Analyst

Responsibilities:

  • Process Improvement: Analyzed error data from recurrent programs using Python and devised a new process that reduced problem-resolution turnaround time by 60%
  • Worked on production data fixes by creating and testing PL/SQL scripts
  • Root Cause Analysis: Leveraged SQL and Oracle Apps knowledge to augment root cause analysis and reduce turnaround time for end customer data issues by 50%
  • Dived deep into complex data sets to analyze trends using Linear Regression, Logistic Regression, and Decision Trees
  • Prepared reports using SQL and Excel to track the performance of websites and apps
  • Visualized data using Tableau to highlight abstract information
  • Analyzed the app’s data using Python and IBM SPSS Statistics, boosting the app’s usability by 30%
  • Participated in building Machine Learning models using Python (a minimal trend-analysis sketch follows this list)
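A minimal sketch of the regression-based trend analysis mentioned above; the weekly metric and its values are illustrative assumptions.

```python
# Minimal sketch: fitting a linear trend to weekly site-performance data.
# The metric values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

weeks = np.arange(1, 13).reshape(-1, 1)           # 12 weeks of observations
visits = np.array([120, 135, 128, 150, 160, 158,  # hypothetical weekly visits
                   170, 182, 175, 190, 205, 210])

trend = LinearRegression().fit(weeks, visits)
print(f"Average weekly growth: {trend.coef_[0]:.1f} visits/week")
print(f"Projected week 16:     {trend.predict([[16]])[0]:.0f} visits")
```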

Tech Environment/Skills: Python, PL/SQL scripts, Oracle Apps, Excel, IBM SPSS, Tableau, QlikView
