Data Scientist Resume
OH
SUMMARY:
- Around 3+ years of experience in Data Science and Analytics, including Data Mining, Deep Learning/Machine Learning and Statistical Analysis.
- Excellent understanding of Agile and Scrum development methodologies.
- Involved in all phases of the data science project life cycle, including data extraction, data cleaning and data visualization with large sets of structured and unstructured data; created ER diagrams and schemas.
- Experienced with machine learning algorithms such as logistic regression, KNN, SVM, random forest, neural networks, linear regression, lasso regression and k-means.
- Experienced in using Python to manipulate data for loading and extraction; worked with Python libraries such as Matplotlib, SciPy, NumPy and Pandas for data analysis.
- Hands-on experience in importing and exporting data using relational databases including Oracle, MySQL and MS SQL Server, and NoSQL databases like MongoDB.
- Skilled in data parsing, data manipulation, data architecture, data ingestion and data preparation, with methods including describing data contents, computing descriptive statistics, regex, split and combine, merge, remap, subset, reindex, melt and reshape.
- Experienced in Big Data with Hadoop, MapReduce, HDFS and Spark.
- Experienced in Data Integration Validation and Data Quality controls for ETL processes and Data Warehousing using MS Visual Studio, SSAS, SSIS and SSRS.
- Automated recurring reports using SQL and Python and visualized them on BI platforms like Tableau.
- Comprehensive knowledge of and experience in normalization/de-normalization, data extraction, data cleansing and data manipulation.
- Familiarity with AWS and distributed data processing systems.
- Experience in building models with deep learning frameworks like TensorFlow, PyTorch and Keras.
TECHNICAL SKILLS:
SDLC, Agile, Scrum, Python, SQL, SQL Server, MySQL, NoSQL, MongoDB, Machine Learning, Deep Learning, Matplotlib, SciPy, NumPy, Pandas, MS Visio, Hadoop, MapReduce, HDFS, Spark, SSAS, SSIS, SSRS, Tableau, Power BI, TensorFlow, PyTorch, Keras, AWS, Windows, Linux
PROFESSIONAL EXPERIENCE:
Confidential, OH
Data Scientist
Responsibilities:
- Transformed data using StreamSets and created multiple pipelines from various source points.
- Performed missing value imputation and outlier identification with statistical methodologies using Pandas and NumPy.
- Refactored R code into Python for data scalability.
- Implemented statistical and deep learning models (Logistic Regression, XGBoost, Random Forest, SVM, RNN, CNN).
- Used Hadoop tools such as Hive and Pig to retrieve the data required for building models.
- Tested many machine learning techniques, such as decision trees, random forests, artificial neural networks, naïve Bayes and regression models, for classification and prediction.
- Used Amazon Web Services (AWS) cloud virtual machines to run machine learning on big data.
- Collected unstructured data from MongoDB and completed data aggregation.
- Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right data sets for Tableau dashboards.
- Developed a MapReduce pipeline for feature extraction using Hive.
- Used a metadata tool to import metadata from the repository and to create new job categories and data elements.
- Mined data to prototype models for targeting and personalization.
- Performed data visualization with Tableau and generated dashboards to present the findings.
- Used Git to apply version control. Tracked changes in files and coordinated work on the files among multiple team members.
- Identified relevant key performance factors and tested their statistical significance.
Confidential
Data Scientist
Responsibilities:
- Designed and developed analytics, machine learning models and visualizations that drive performance and provide insights, from prototyping to production deployment, for product recommendation and allocation planning.
- Designed, built and deployed a set of Python modeling APIs for customer analytics, which integrate multiple machine learning techniques for user behaviour prediction.
- Applied various Artificial Intelligence (AI)/machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, and regression models.
- Segmented the customers based on demographics using K-means Clustering.
- Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring.
- Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau and Power BI. Also used R to generate regression models to provide statistical forecasting.
- Implemented Key Performance Indicator (KPI) Objects, Actions, Hierarchies and Attribute Relationships for added functionality and better performance of the SSAS Warehouse.
- Used the big data tool Spark (PySpark, Spark SQL, MLlib) to conduct real-time analysis of loan defaults on AWS.
- Conducted data blending and data preparation using Alteryx and SQL for Tableau consumption, and published data sources to Tableau Server.
- Created sub-reports, drill-down reports, summary reports, parameterized reports and ad-hoc reports using SSRS.
- Created deep learning models using TensorFlow and Keras that combine all tests into a single normalized score to predict residency.
- Gathered data needs and requirements by interacting with other departments.