Data Scientist Resume
Houston, TX
SUMMARY:
- Around 5 years of Machine Learning/Data Science experience with excellence in developing and implementing large scale algorithms that have significantly impacted business revenues and user experience.
- Developed intricate algorithms based on deep-dive Statistical Analysis and Predictive Data Modeling that were used to deepen relationships, strengthen longevity and personalize interactions with customers.
- Experience in Data Mining, Machine Learning and Spark Development with large datasets of Structured and Unstructured data, including Data Acquisition, Data Validation, Predictive Modeling and Data Visualization.
- Analyzed and processed complex data sets using advanced Querying, Visualization and Analytics tools.
- Hands-on experience in applying several Machine Learning/Statistical Algorithms to real-world problems by using Deep Learning, Gradient Boosted Trees, Natural Language Processing, Random Forests, Clustering, Generalized Linear Models, Simulation Models and Gaussian Mixture Models.
- Proficient in managing the entire data science project life cycle and actively involved in all of its phases, including Data Extraction, Data Cleaning, Data Engineering, Data Loading, Data Wrangling, Feature Scaling, Statistical Modeling (Decision Trees, Regression Models, Neural Networks, SVM, Clustering), Dimensionality Reduction and Factor Analysis, testing and validation using ROC Plot and K-fold Cross Validation, Predictive Modeling using R and Python, and Data Visualization using Tableau.
- Expertise in implementing Dimensionality Reduction techniques like Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Kernel PCA and Quadratic Discriminant Analysis (QDA) for Feature Extraction, and Backward Elimination, Forward Selection, Bidirectional Elimination and Score Comparison for Feature Selection.
- Strong experience in building end-to-end Machine Learning platform using Java and Big Data technologies like Cassandra, Spark, Apache Hadoop, MapReduce, HDFS Architecture, HBase, Sqoop, Pig, MLlib, ELT and Hive.
- Highly skilled in Statistical Thinking, which includes Graphical and Quantitative EDA, Sentiment Analysis, Bootstrap Confidence Intervals, Correlation, Hypothesis Modeling, Collaborative Filtering, Recommender Systems, Time-Series, Inferential Statistics and Matrix Factorization, as well as Modeling Techniques to gain valid inferences.
- Strong understanding of the principles of Data Warehousing (OLAP) using Kimball Methodology, Business Intelligence applications, Online Transaction Processing (OLTP), Fact Tables, Dimension Tables, and Star and Snowflake schema modeling.
- Highly skilled in Tableau Desktop for Data Visualization using Cross Map, Scatter Plots, Geographic Map, Pie Charts and Bar Charts, Page Trails, Heat Map and Density Chart.
- Expertise in dealing with Relational Database Management Systems including Normalization, Stored Procedures, Constraints, Querying, Joins, Keys, Indexes, data import/export, Triggers and Cursors.
- Comprehensive knowledge of and experience in writing queries in SQL, MySQL, NoSQL, PostgreSQL and R to Extract, Transform and Load (ETL) data from large datasets. Strong Data Analysis skills using Business Intelligence (BI), SQL and MS Office Tools.
- Skilled in working with large databases and data platforms such as Microsoft Azure, MongoDB, Cassandra, Oracle, SQL Server and DB2.
- Highly skilled in using various Data Science libraries in Python such as Scikit-learn, OpenCV, NumPy, SciPy, Matplotlib, Pandas, Seaborn, Bokeh, NLTK, Gensim, NetworkX, Statsmodels, TensorFlow, Theano and Keras.
- Expertise in using various R libraries such as ggplot2, caret, caTools, Amelia, e1071, lubridate, missForest, CORElearn, bigrf, rpart, pROC, igraph, tree, randomForest, LTSA, lsmeans, ROCR, RWeka, arules, sqldf, RODBC and RMarkdown, along with Beautiful Soup in Python.
- Strong understanding of AWS (Amazon Web Services), including S3 and Amazon RDS, and of Apache Spark RDD processes and concepts. Developed Logical Data Architecture in adherence to Enterprise Architecture.
- Experienced with data modeling tools such as CA Erwin, PowerDesigner, MS Visio and ER/Studio, and with data quality tools such as Informatica IDQ and Informatica MDM.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, PivotTables and OLAP reporting.
- Experience with Git, Agile methodology and the Scrum process. Strong business sense and ability to communicate data insights to both technical and nontechnical clients.
WORK EXPERIENCE:
Data Scientist
Confidential, Houston, TX
Responsibilities:
- Developed a pipeline using Hive (HQL) and SQL to retrieve data from the Hadoop cluster and the Oracle database, and used ETL for data transformation.
- Performed data wrangling to clean, transform and reshape the data utilizing pandas library. Analyzed data using SQL, R, Java, Scala, Python, Apache Spark and presented analytical reports to management and technical teams.
- Worked with datasets of varying complexity, including both structured and unstructured data, and participated in all phases of data mining, data cleaning, data collection, variable selection, feature engineering, model development, validation and visualization.
- Developed predictive models on large scale datasets to address various business problems through leveraging advanced statistical modeling, machine learning and deep learning.
- Analyzed historical data using various machine learning algorithms such as clustering, multiple linear regression, logistic regression, SVM, Naive Bayes, Random Forests, K-means and KNN.
- Conducted exploratory data analysis using Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, NLTK in Python for developing various machine learning algorithms.
- Implemented Data Quality validation techniques to validate data and identified many anomalies. Worked extensively with statistical analysis tools and wrote code in Advanced Excel, R and Python.
- Performed model validation using test and validation sets via K-fold cross validation and statistical significance testing.
- Worked with various kinds of data, both open-source and internal. Developed models for labeled and unlabeled datasets, and worked with big data technologies such as Hadoop and Spark and cloud resources such as Azure and Google Cloud.
- Evaluated the performance of different models using F-Score, AUC/ROC, Confusion Matrix, Precision and Recall.
- Built multi-layer Neural Networks in Python using the Scikit-learn, Theano, TensorFlow and Keras packages to implement machine learning models.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.
Environment: Python 3.6.4, RStudio, MLlib, Regression, NoSQL, SQL Server, Hive, Hadoop Cluster, ETL, Spyder 3.6, Agile, Tableau, Java, NumPy, Pandas, Matplotlib, Power BI, Scikit-learn, Seaborn, e1071, ggplot2, Shiny, TensorFlow, AWS, Azure, HTML, XML, Informatica PowerCenter, Teradata.
Data Scientist
Confidential, South Jordan, UT
Responsibilities:
- Performed data wrangling to clean, transform and reshape the data utilizing pandas library. Analyzed data using SQL, R, Java, Scala, Python, Apache Spark and presented analytical reports to management and technical teams.
- Worked with different datasets, including both structured and unstructured data, and participated in all phases of data mining, data cleaning, data collection, variable selection, feature engineering, model development, validation and visualization.
- Developed predictive models on large scale datasets to address various business problems through leveraging advanced statistical modeling, machine learning and deep learning.
- Implemented public segmentation with unsupervised machine learning, applying the K-means algorithm in PySpark after data munging.
- Applied machine learning in Python to NLP text classification and churn prediction.
- Worked on different Machine Learning models like Logistic Regression, Multi-layer perceptron classifier and K-means clustering.
- Led discussions with users to gather business process requirements and data requirements to develop a variety of conceptual, logical and physical data models.
- Expertise in Business intelligence and Data Visualization tools like Tableau.
- Handled importing data from various data sources, performed transformations using Hive, MapReduce and loaded data into HDFS.
- Good knowledge in Azure cloud services, Azure Storage to manage and configure the data.
- Used R and Python for Exploratory Data Analysis to compare and identify the effectiveness of the data.
- Created clusters to classify control and test groups.
- Analyzed and calculated the life cost of everyone in a welfare system using 20 years of historical data.
- Developed Data Mapping, Data Governance, Transformation and Cleansing rules for the Master Data Management Architecture involving OLTP, ODS and OLAP.
- Developed triggers, stored procedures, functions and packages using cursors associated with the project using PL/SQL.
- Used Python, R, SQL to create statistical algorithms involving Multivariate Regression, Linear Regression, Logistic
Environment: Hadoop, HDFS, Python 3.x (Scikit-learn/ Keras/ SciPy/ NumPy/ Pandas/ Matplotlib/ NLTK/ Seaborn), R (ggplot2/ caret/ trees/ arules), Tableau (9.x/10.x), Machine Learning (Logistic regression/ Random Forests/ KNN/ K-Means Clustering/ Hierarchical Clustering/ Ensemble methods/ Collaborative filtering), GitHub, Agile/ SCRUM
Data Analyst
Confidential
Responsibilities:
- Developed complex SQL queries, stored procedures, views, functions and reports that qualify customer requirements using Microsoft SQL Server.
- Worked with the ETL team to document the transformation rules for Data migration from OLTP to Warehouse environment for reporting purposes.
- Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
- Optimized query performance by modifying T-SQL queries, removing unnecessary columns and redundant data, normalizing tables, establishing joins and creating indexes.
- Implemented predictive analytics and machine learning algorithms to forecast key metrics, presented in dashboards built on AWS and the Django platform.
- Implemented supervised, semi-supervised, and unsupervised algorithms in machine learning for tasks, like classification, regression, and clustering.
- Used various machine learning algorithms, like decision trees and forests, support vector machines, and deep networks (CNNs, RNNs, and LSTMs).
- Performed univariate and multivariate analysis on the data to identify underlying patterns and associations between the variables.
- Explored and analyzed customer-specific features using Matplotlib and ggplot2. Extracted structured data from MySQL databases, developed basic visualizations and analyzed A/B test results.
- Designed and implemented statistical tests including Hypothesis testing, ANOVA and Chi-square test to verify models' significance using R.
- Participated in feature engineering such as feature generation, PCA, feature normalization and label encoding with Scikit-learn preprocessing. Performed data imputation using various methods in the Scikit-learn package in Python.
- Worked with business stakeholders to refine and respond to their ad hoc requests and improve their existing reporting and dashboards as necessary.
- Applied predictive analytics to target the right customer at the right time based on their past behavior and choices, supporting revenue growth through proper planning and reduced long-term operational costs.
- Analyzed customer sentiment, customer experience and company positioning to make the customer experience richer and smoother.
- Developed large-scale data analytics solutions in machine learning, such as regressions, KNN, random forest, SVM and K-means, to solve classification and clustering problems.
- Built natural language processing (NLP) and text analytics models such as document retrieval, topic models and sentiment analysis.
Environment: Python, Scikit-learn, SciPy, NumPy, Pandas, Matplotlib, NLTK, Seaborn, RStudio, ggplot2, trees, arules, Tableau (9.x/10.x), Machine Learning, Logistic regression, Random Forests, KNN, K-Means Clustering.
