
Data Scientist Resume


SUMMARY

  • Experienced Data Scientist with 6+ years of experience acquiring datasets, engineering features with statistical techniques, performing exploratory data analysis, building diverse machine learning algorithms for predictive models, and plotting visualizations to drive business profitability.
  • Strong command of data extraction, data cleaning, data loading, statistical data analysis, exploratory data analysis, data wrangling, and predictive modeling using R and Python, with data visualization in Tableau.
  • Profound knowledge of machine learning algorithms including linear, non-linear, and logistic regression, SVR, natural language processing, random forests, ensemble methods, decision trees, gradient boosting, k-NN, SVM, Naïve Bayes, and k-means clustering.
  • Experienced in implementing ensemble methods (bagging and boosting) to enhance model performance.
  • Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, ANOVA, k-fold cross-validation, R-squared, CAP curves, confusion matrices, ROC plots, the Gini coefficient, and grid search.
  • Extensively worked on Python 3.5/2.7 (NumPy, Pandas, Seaborn, Matplotlib, NLTK, and scikit-learn).
  • Experienced in visualization tools like Tableau 9.x/10.x for creating KPIs, forecasting, and other analytical dashboards.
  • Strong understanding of advanced Tableau features including calculated fields, parameters, table calculations, row-level security, R integration, joins, data blending, and dashboard actions.
  • Scheduled & distributed reports in multiple formats using Tableau, SSRS and Visual Studio.
  • Worked on data collection, transformation, and storage from various sources including relational databases, APIs, logs, and unstructured files.
  • Strong knowledge of database and data warehouse concepts, ETL processes, and dimensional modeling (star and snowflake schemas).
  • Strong experience in big data technologies including Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS, and Hive.
  • Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMS like SQL Server 2008 and NoSQL databases like MongoDB 3.2.
  • Created database objects such as tables, indexes, views, user-defined functions, stored procedures, triggers, and cursors, and enforced data integrity using SQL. Experienced in SQL tuning techniques.
  • Progressive involvement in the Software Development Life Cycle (SDLC), Git, Agile methodology, and the Scrum process, using change management tools like Jira and ServiceNow.
  • Strong business sense and abilities to communicate data insights to both technical and non-technical clients.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Extensive experience in creating and maintaining source to target data mapping documents.
  • Documented Traceability Matrices based on defined business rules in Microsoft Excel.
  • Strong documentation, analytical, and problem-solving skills.
  • Worked in development environment like Git and VM.
  • Facilitated weekly and monthly business review meetings to keep SMEs, stakeholders, product managers, executive staff, and team members apprised of goals and project status, resolving issues and conflicts and preparing product demonstrations.
  • Solid business communication and presentation skills across written correspondence, web-based products, and reports (project status reports, PowerPoint presentations, e-mails).
  • Organized, goal-oriented self-starter with the ability to master new technologies and manage multiple tasks while following through from start to completion with limited supervision.
  • Competent, confident, and strategic decision maker with effective time management and prioritization skills for executing business objectives under conflicting demands.
  • Excellent communication skills; works successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.
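As an illustration of the evaluation metrics listed above, here is a minimal, self-contained sketch of a binary confusion matrix and the accuracy, precision, and recall derived from it. The labels are hypothetical, and in practice a library such as scikit-learn would typically compute these.

```python
# Illustrative sketch only: binary confusion matrix and derived metrics.
# The label vectors below are hypothetical example data.

def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) counts for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(metrics(y_true, y_pred))
# → {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```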

TECHNICAL SKILLS

SDLC: Agile (Scrum)

BI Tools: Tableau, MicroStrategy

Data science: Experiment design, A/B testing, Hypothesis testing, Supervised & Unsupervised Learning, Statistical Inference, KNIME Analytics Platform

Programming languages: Python, R, Java, C, C++, PHP, JavaScript, jQuery, HTML

Cloud Services: AWS, CTRLS Datacenters

Big Data Ecosystems and Languages: Hadoop, Scala, Spark, Apache Cordova

Databases: SQL Server, MySQL, PostgreSQL

ETL Tool: Informatica

Task Management Tools: Jira, Rally

IDE and Automation Platforms: PyCharm, Eclipse

MS Office: Excel (pivot tables, VLOOKUPs), Access, PowerPoint, Visio

Others: User Stories, Business Requirements, Functional Requirements, Use Cases, Activity Diagrams, Sequence Diagrams, Data Flow Diagrams

PROFESSIONAL EXPERIENCE

Confidential

Data Scientist

Responsibilities:

  • Cleaned and manipulated complex datasets to create the data foundation for further analysis and the development of key insights (MS SQL server, R, Tableau, Excel)
  • Applied various machine learning algorithms and advanced statistical analyses such as decision trees, regression models, SVM, and clustering using the scikit-learn package in Python
  • Worked on data pre-processing and cleaning to enable feature engineering, and applied imputation techniques for missing values in the dataset using Python
  • Performed exploratory data analysis (EDA) to understand and discover patterns in the data, visualizing it through various plots and graphs with Python's matplotlib and seaborn libraries, examining feature correlations with heatmaps, and performing hypothesis tests to check the significance of features
  • Developed analytical approaches to answer high-level questions and provided insightful recommendations
  • Conducted various statistical analyses, applying linear regression, ANOVA, and classification models to the data
  • Involved in extracting customers' big data from various sources (Excel, flat files, Oracle, SQL Server, MongoDB, Teradata, and server log data) into Hadoop HDFS
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, and scikit-learn in Python to develop various machine learning algorithms
  • Assured data quality and data integrity, and optimized data collection procedures on a weekly and monthly basis
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and assured the quality of data.
  • Worked with different data formats such as JSON, XML, CSV, and .dat, and exported the data into data visualization/ETL platforms
  • Evaluated model performance using techniques such as R-squared, adjusted R-squared, confusion matrices, AUC, ROC curves, and root mean squared error
  • Developed and applied metrics and prototypes that could be used to drive business decisions
  • Participated in ongoing research, and evaluation of new technologies and analytical solutions to optimize the model performance
  • Used problem-solving skills to find and correct the data problems, applied statistical methods to adjust and project results when necessary
  • Worked across cross-functional teams to understand the data requirements and provided the detailed analytical reports to accomplish the business decisions
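The imputation step mentioned above can be sketched as follows. This is an illustrative, pure-Python version using median fill; the column values are hypothetical, and the actual project likely relied on pandas or scikit-learn imputers.

```python
# Hedged sketch of median imputation for missing values (None entries);
# the sample column below is hypothetical.
from statistics import median

def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]

ages = [34, None, 29, 41, None, 37]
print(impute_median(ages))
# → [34, 35.5, 29, 41, 35.5, 37]
```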

Confidential

Data Scientist

Responsibilities:

  • Participated in all phases of data mining, data cleaning, data collection, developing models, validation, and visualization
  • Accomplished data pipeline process, collected required data from different sources and converted into structured form using SQL
  • Worked hands-on with large volumes of data: more than 10M customer records with 20+ features
  • Performed exploratory data analysis using the NumPy and Pandas libraries to identify correlations between variables, multicollinearity, and hidden patterns, trends, and seasonality
  • Analyzed customer churn distribution over different attributes such as tenure, the services a customer has signed up for, and demographic data, using the matplotlib and seaborn visualization libraries in Python
  • Performed PCA, backward feature selection, and correlation analysis for dimensionality reduction of the data to improve result accuracy
  • Applied SMOTE to deal with the imbalanced class distribution in the training data set
  • Implemented various classification models such as Logistic Regression, Decision Trees, Random Forest, KNN, XGBoost, and SVM, and applied the best-performing algorithm to predict the results
  • Performed K-Fold cross-validation to test models with different batches to optimize the model and prevent overfitting
  • Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing
  • Achieved 90% customer monthly retention by predicting the likelihood of returning customers using a logistic regression model in R
  • Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI
  • Improved the data cleansing and mining process based on R and SQL, resulting in a 50% time reduction
  • Analyzed patterns in customers' needs across different locations, categories, and months using time series modeling techniques
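The SMOTE step above can be sketched as follows. This simplified version interpolates between randomly paired minority-class points rather than k-nearest neighbors, and the sample points are hypothetical; in practice a library such as imbalanced-learn provides a full SMOTE implementation.

```python
# Simplified SMOTE-style oversampling sketch (illustrative only): create
# synthetic minority points by linear interpolation between random pairs.
import random

def smote_like(minority, n_new, seed=42):
    """Return n_new synthetic points interpolated between random pairs
    of minority samples (no k-NN step, unlike full SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(ai + lam * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic

# Hypothetical 2-D minority-class points
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
print(smote_like(minority, 4))
```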
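The k-fold cross-validation mentioned above splits the training data into k folds, holding each fold out once for validation. Here is a minimal index-splitting sketch; it is illustrative only, and in practice scikit-learn's KFold would handle this.

```python
# Minimal k-fold cross-validation index splitting (illustrative sketch).

def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k contiguous folds;
    the last fold absorbs any remainder when n_samples % k != 0."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for i in range(k):
        start = i * fold_size
        stop = start + fold_size if i < k - 1 else n_samples
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation

for train, validation in k_fold_indices(10, 5):
    print(len(train), len(validation))  # each line prints: 8 2
```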

Confidential

Data Scientist

Responsibilities:

  • Developed different modules to enhance the organization's outcomes.
  • Included an AAR template to create reports and graphs automatically, helping the organization clearly understand and analyze its inventory status.
  • Developed predictive reports to assist clients in decisions for effectively handling procurement.
  • Created an Android application for an online inventory management and invoicing system (Apache Cordova).
  • Optimized queries to increase application performance.
