- Over 12 years of experience in the IT industry within Data Science (Machine Learning) and Data Analysis. Hands-on experience across the SDLC (Software Development Life Cycle): gathering requirements, designing, developing, implementing, and testing projects before moving to production.
- Experienced in building production-level Machine Learning models, with a strong background in Statistics.
- Hands-on experience manipulating data using Python and R.
- Experience with Statistical Analysis, Data Mining, and Machine Learning using R, Python, and SQL.
- Experience manipulating large data sets with R packages such as tidyr, tidyverse, dplyr, reshape, and lubridate, and visualizing data using the lattice and ggplot2 packages.
- Good knowledge of Natural Language Processing (NLP) in Python and R.
- Experience in Performance tuning and Query Optimization, Data Transformation Services and Database Security.
- Experience visualizing data using tools such as Azure Machine Learning, Python, and Power BI.
- Extensive experience in text analytics, generating data visualizations in Python, and creating dashboards using tools like Tableau.
- Built visualizations to convey technical research findings and to ensure alignment of Data Science ideas with client use cases.
- Experience in problem solving, data science, machine learning, statistical inference, predictive/descriptive/prescriptive analytics, graph analysis, natural language processing, and computational linguistics, with extensive experience in predictive analytics and recommendation systems.
- Experienced in T-SQL programming (DDL, DML, and DCL), including creating stored procedures, user-defined functions, constraints, queries, joins, keys, indexes, data import/export, triggers, tables, views, and cursors.
- Evaluated and reviewed test plans against functional requirements, design documents, policies, procedures, and regulatory requirements.
- Demonstrated leadership abilities and teamwork skills, as well as the ability to accomplish tasks with minimal direction and supervision.
- Ability to deliver highly complex technical information in terms and concepts that end users or management can readily grasp.
- Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest.
- Completed a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, and Python and R programming.
- Responsible for defining the key business problems to be solved while developing and maintaining relationships with stakeholders, SMEs, and cross-functional teams.
- Set up storage and data analysis tools in the Amazon Web Services (AWS) cloud computing infrastructure.
- Installed and used the Caffe deep learning framework.
- Implemented a hypothesis-testing kit for sparse sample data by wiring together R packages.
- Carried out specified data processing and statistical techniques such as sampling, hypothesis testing, time series analysis, correlation, and regression analysis using R.
- Performed data collection and exploratory data analysis in R.
- Extracted data from over one lakh (100,000) Excel sheets with different formats using R.
- Extensively used the open-source tool Spyder (Python) for statistical analysis and building machine learning algorithms.
- Applied various machine learning algorithms and statistical modeling techniques such as decision trees, logistic regression, and Gradient Boosting Machines to build predictive models using the scikit-learn package in Python.
- Generated data analysis reports using Matplotlib and successfully delivered and presented the results to C-level decision makers.
- Used Tableau and Python programming for model improvement.
- Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
- Translated business needs into mathematical models and algorithms, and built machine learning and Natural Language Processing (NLP) algorithms (using Python, Java, and NLP modules).
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
- Implemented Agile Methodology for building an internal application.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, and Pandas).
- Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that each document could be assigned a response label for further classification.
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
- Validated the machine learning classifiers using ROC curves and lift charts.
- Extracted data from HDFS and prepared data for exploratory analysis using data munging.
Environment: ER Studio, AWS, MDM, Git, Unix, Python, MLlib, SAS, Regression, Logistic Regression, Hadoop, OLTP, OLAP, HDFS, NLTK, SVM, JSON, XML.
- Involved in gathering, analyzing and translating business requirements into analytic approaches.
- Worked with machine learning algorithms like neural network models, regressions (linear, logistic, etc.), SVMs, and decision trees for classifying groups and analyzing the most significant variables.
- Converted raw data to processed data by merging data sets and identifying outliers, errors, trends, missing values, and distributions in the data.
- Implemented analytics algorithms in the Python and R programming languages.
- Performed K-means clustering, regression, and decision trees in R.
- Worked on Naïve Bayes algorithms for agent fraud detection using R.
- Performed data analysis, visualization, feature extraction, feature selection, feature engineering using Python.
- Generated detailed reports after validating the graphs using Python and adjusting the variables to fit the model.
- Worked on clustering and factor analysis for data classification using machine learning algorithms.
- Used Power Map and Power View to present data effectively to both technical and non-technical users.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Used TensorFlow, Keras, Theano, Pandas, NumPy, SciPy, scikit-learn, and NLTK in Python to develop various machine learning algorithms such as neural network models, linear regression, multivariate regression, Naïve Bayes, Random Forests, decision trees, SVMs, K-means, and KNN for data analysis.
- Responsible for developing a data pipeline with AWS S3 to extract data, store it in HDFS, and deploy all implemented machine learning models.
- Worked on business forecasting, segmentation analysis, and data mining; prepared management reports defining the problem, documenting the analysis, and recommending courses of action to determine the best outcomes.
- Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions in SQL Server Management Studio.
- Worked with risk analysis, root cause analysis, cluster analysis, correlation, and optimization, and applied the K-means algorithm for clustering data into groups.
Environment: Python, Jupyter, MATLAB, SSRS, SSIS, SSAS, MongoDB, HBase, HDFS, Hive, Pig, SAS, Power Query, Power Pivot, Power Map, Power View, SQL Server, MS Access.
Jr. Data Scientist
Confidential, Boca Raton, FL
- Involved in defining source-to-target data mappings, business rules, and data definitions.
- Performed data validation and data reconciliation between disparate source and target systems for various projects.
- Identified the customer and account attributes required for MDM implementation from disparate sources and prepared detailed documentation.
- Prepared Data Visualization reports for the management using R.
- Used R machine learning libraries to build and evaluate different models.
- Utilized a broad variety of statistical packages, including R, MLlib, Python, and others.
- Performed data cleaning using R and filtered input variables using the correlation matrix, stepwise regression, and Random Forest.
- Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time for a new route.
- Provided input and recommendations on technical issues to Business & Data Analysts, BI Engineers, and Data Scientists.
- Segmented the customers based on demographics using K-means Clustering.
- Used T-SQL queries to pull the data from disparate systems and Data warehouse in different environments.
- Extensively used MS Excel for data validation.
- Generated weekly and monthly reports for various business users according to business requirements.
Environment: Python, R, ETL, BI, T-SQL, SQL, Machine Learning, MS Excel, and Windows.