Data Scientist Resume
Richardson, TX
SUMMARY
- Around 5 years of experience in data analysis, data science, machine learning, and data visualization.
- Solid understanding of system development methodologies and broad knowledge of data mining, supervised learning, unsupervised learning, recommendation systems, and association rules.
- Extensive experience working with large datasets and data classification.
- Able to analyze and extract relevant information from large volumes of data to help automate self-monitoring and self-diagnosis and to optimize key processes. Outstanding proficiency with statistical and other tools/languages such as R, Python, C, C++, MATLAB, and LTspice.
- Specialties: Data Mining; Random Forests (Breiman); AdaBoost; Decision Trees; Support Vector Machines; Time Series Analysis, Modeling, and Forecasting, including Regression Forecasting, Anomaly Detection, Random Walks, Spectral Analysis, Collaborative Filtering, and Stationary Models; Multivariate Data Analysis, including Discriminant, Factor, and Cluster Analysis; Forecasting with Predictors; Event and Intervention Analysis; Regression Analysis and Modeling; Bayesian Models; Design and Analysis of Experiments; Neural-Network-Based Modeling; and Project Management.
TECHNICAL SKILLS
Data Analytics Tools: Python (NumPy, SciPy, pandas, Seaborn, Plotly, Matplotlib), SQL, Tableau, visualization packages, Microsoft Office.
Software & Tools: MS Project, Excel, Spyder, PyCharm, Jupyter, MATLAB, PostgreSQL, Ambari Sandbox, HDFS, Hive.
Databases: SQL, NoSQL, MS Access.
Operating Systems: Windows, UNIX, Linux, macOS
Languages: Python, C, C++
Machine Learning: Regression, Classification, Clustering, Association, Simple Linear Regression, Multiple Linear Regression, Decision Trees, Random Forest, Logistic Regression, K-NN, SVM, Recommendation Systems, Association Rules, Apriori, PCA, Time Series Analysis, Unsupervised Learning, NLTK, CountVectorizer, TF-IDF, Fuzzy C-Means Clustering.
PROFESSIONAL EXPERIENCE
Confidential, Richardson TX
Data Scientist
Responsibilities:
- Used advanced analytical tools and programming languages such as Python (NumPy, pandas, SciPy) for data analysis.
- Constructed and evaluated datasets and built machine learning and statistical models, including clustering, classification, regression, decision trees, support vector machines, anomaly detection, sequential pattern discovery, and text mining, using Python libraries (scikit-learn).
- Moved transformed data to a Spark cluster using Spark Streaming and Kafka.
- Used Kafka with the Spark Streaming API to fetch real-time data from transaction logs (a minimal streaming sketch follows this section).
- Applied post-pruning techniques to reduce the complexity of the final classifier, reducing overfitting and improving predictive performance, using Python libraries (scikit-learn); see the pruning sketch after this section.
- Performed predictive analytics using supervised (SVM, Logistic Regression, Boosting, Random Forests) and unsupervised (K-Means, LDA, EM) machine learning methods.
- Evaluated datasets for accuracy and quality.
- Collaborated with Business Owners to develop key business questions and to build datasets that answer those questions.
- Developed machine learning models using regression, random forests, boosting, GBMs, neural networks, HMMs, CRFs, MRFs, and deep learning.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Provided technical and requirements guidance to team members.
- Visualized datasets with Matplotlib, Seaborn, and pandas in Python to surface missing values, outliers, and correlations between features for analytical models (an EDA sketch follows this section).
- Participated in business and stakeholder meetings to understand business needs and requirements.
Environment: Machine Learning, PyCharm, Jupyter Notebook, HDFS, Tableau, Python (scikit-learn, SciPy, NumPy, pandas).
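
A minimal sketch of the Kafka-to-Spark ingestion mentioned above, using PySpark Structured Streaming; the broker address and topic name are placeholders, and the original work may have used the older DStream API:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("txn-stream").getOrCreate()

    # Subscribe to the transaction-log topic (placeholder broker and topic)
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "transaction-logs")
           .load())

    # Kafka values arrive as bytes; cast to string before parsing downstream
    events = raw.selectExpr("CAST(value AS STRING) AS line")

    # Console sink for inspection; a real job would write to cluster storage
    query = events.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()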
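A minimal sketch of post-pruning via scikit-learn's cost-complexity pruning; the dataset is a stand-in, and in practice the pruning strength would be chosen on a validation split or by cross-validation:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    # Candidate pruning strengths from the cost-complexity pruning path
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

    # Refit at each alpha and keep the pruned tree that generalizes best
    best = max(
        (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
         for a in path.ccp_alphas),
        key=lambda tree: tree.score(X_val, y_val),
    )
    print(best.ccp_alpha, best.get_n_leaves(), best.score(X_val, y_val))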
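A minimal EDA sketch for the missing-value, outlier, and correlation checks described above; the input file name is a placeholder:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = pd.read_csv("dataset.csv")  # placeholder input
    num = df.select_dtypes("number")

    print(df.isna().sum())           # missing values per column
    sns.boxplot(data=num)            # outliers at a glance
    plt.show()
    sns.heatmap(num.corr(), annot=True, cmap="coolwarm")  # feature correlations
    plt.show()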
Confidential, Memphis, TN
Data Scientist
Responsibilities:
- Developed, tested and productionized a machine learning system for UI optimization, boosting CTR from 18% to 24% for the company’s website.
- Working knowledge of machine learning frameworks such as TensorFlow and Azure ML.
- Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
- Performed data preprocessing on huge datasets containing millions of rows, including missing-data imputation, noise handling, and data consolidation.
- Analyzed data and performed data preparation by applying the historical model to the dataset in Azure ML.
- Used Apache Spark with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib (a pipeline sketch follows this section).
- Tested Python on the AWS cloud service and CNTK modeling on the MS Azure cloud service.
- Generalized feature extraction in the machine learning pipeline, improving efficiency throughout the system.
- Designed several high-performance prediction models using Python packages such as pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, pandas-datareader, and statsmodels.
- Used Python scikit-learn to deploy different predictive models and chose the model with the highest accuracy and lowest variance on the data.
- Implemented dimensionality-reduction methods such as PCA and t-SNE to emphasize variation, bring out strong patterns, and make datasets easier to explore and visualize (a reduction sketch follows this section).
- Developed several ready-to-use machine learning model templates from given specifications, with clear descriptions of their purpose and input variables.
- Created various types of data visualizations using Python and Tableau.
- Helped create a monthly retention marketing campaign that improved the customer retention rate by 15%.
- Prepared reports and presentations using Tableau and MS Office that accurately convey data trends and the associated analysis.
Environment: Python, Tableau, Spark, Scala, Data Mining, Seaborn, Regression, Cluster Analysis, Windows/Linux.
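
A minimal Spark ML pipeline sketch for the kind of use case described above; the input path, column names, and model choice are illustrative assumptions, not the production code:

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import StandardScaler, VectorAssembler
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark-ml").getOrCreate()
    df = spark.read.json("events.json")  # JSON was one input format; path is a placeholder

    # Assemble and scale numeric columns (names are illustrative), then fit a classifier
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="raw")
    scaler = StandardScaler(inputCol="raw", outputCol="features")
    lr = LogisticRegression(labelCol="label", featuresCol="features")

    model = Pipeline(stages=[assembler, scaler, lr]).fit(df)
    model.transform(df).select("label", "prediction").show(5)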
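A minimal sketch of the imputation and PCA/t-SNE reduction steps; random data stands in for the real features:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.impute import SimpleImputer
    from sklearn.manifold import TSNE

    X = np.random.default_rng(0).normal(size=(500, 50))  # stand-in for real features

    X = SimpleImputer(strategy="median").fit_transform(X)             # missing-data imputation
    X_pca = PCA(n_components=20, random_state=0).fit_transform(X)     # linear reduction first
    X_2d = TSNE(n_components=2, random_state=0).fit_transform(X_pca)  # nonlinear 2-D embedding
    print(X_2d.shape)  # (500, 2), ready to plot and explore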
Confidential - TN
Data Scientist
Responsibilities:
- Participated in the requirements-gathering and analysis phase of the project, documenting business requirements through workshops and meetings with various business users.
- Worked with a team of developers on Python applications for risk management.
- Developed Python application for Google Analytics aggregation and reporting
- Worked on Python OpenStack APIs; used Python scripts to update content in the database and manipulate files.
- Implemented machine learning schemes using Python libraries Scikit-learn and SciPy.
- Used MVC architecture with Django for web-based applications, applying OOP concepts.
- Worked with several Python packages, including Matplotlib, Pillow, NumPy, and sockets.
- Worked on data transformation, data sourcing and mapping, conversion, and loading.
- Designed and deployed machine learning solutions in Python to classify millions of previously unclassified Twitter users into a core data product (a text-classification sketch follows this section).
- Used the pandas API to structure data as time series and in tabular form for easy timestamp-based data manipulation and retrieval (a time series sketch follows this section).
- Used machine learning techniques such as unsupervised classification (clustering), optimization, and prediction.
Environment: Python, Tableau, PyCharm, pandas, PostgreSQL, Jupyter Notebook.
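
A minimal sketch of the kind of text-classification pipeline such a Twitter-user classifier could use; the TF-IDF plus logistic regression pairing and the toy data are assumptions, not the deployed solution:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy labeled user text; the real training corpus is not shown here
    texts = ["daily deals and coupon codes",
             "new ML paper and open source code",
             "match highlights and final score"]
    labels = ["commerce", "tech", "sports"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(texts, labels)
    print(clf.predict(["big discount on running shoes"]))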
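A minimal pandas time series sketch for the timestamp-indexed manipulation described above; the file and column names are placeholders:

    import pandas as pd

    df = pd.read_csv("transactions.csv", parse_dates=["timestamp"])  # placeholder file
    ts = df.set_index("timestamp").sort_index()

    daily = ts["amount"].resample("D").sum()   # aggregate to daily totals
    january = ts.loc["2019-01"]                # timestamp-based retrieval by month
    smoothed = daily.rolling(7).mean()         # 7-day moving average
    print(smoothed.tail())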
Confidential
Responsibilities:
- Design and develop innovative analytic models for analyzing large-scale structured and unstructured data to gain actionable insights.
- Design and develop predictive models, data mining, and text analytics solutions, including custom algorithm solutions such as recommendation engines and decision support engines.
- Design, implement, and operate comprehensive data warehouse systems that balance optimized data access with batch loading and resource utilization, according to customer requirements.
- Develop data warehouse models, including sourcing, loading, transformation, and extraction (a minimal ETL sketch follows this section).
- Create or implement metadata processes and frameworks.
- Write new programs or modify existing programs to meet data management requirements, using current programming languages and technologies.
- Review designs, codes, test plans, or documentation to ensure quality.
Environment: Python, HTML, MS Access, SQL, Jupyter Notebook, Spyder.
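
A minimal sketch of a sourcing/transformation/loading step in Python; the connection string, file, and table names are hypothetical:

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection string and table name
    engine = create_engine("postgresql://user:password@localhost:5432/warehouse")

    src = pd.read_csv("source_extract.csv")                # sourcing / extraction
    src["order_date"] = pd.to_datetime(src["order_date"])  # transformation: typed columns
    src.to_sql("fact_orders", engine, if_exists="append", index=False)  # loading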