- A passionate, team-oriented Data Scientist with over 6 years of experience in Data Extraction, Data Modeling, Statistical Modeling, Data Mining, Machine Learning and Data Visualization.
- Expertise in transforming business resources and tasks into regularized data and analytical models, designing algorithms, developing data mining and reporting solutions across a massive volume of structured and unstructured data.
- Extensive experience in Machine Learning solutions to various business problems and generating data visualizations using Python.
- Used Pandas, NumPy, Scikit-learn in Python for developing various machine learning models.
- Proficient at Machine Learning algorithms and Predictive Modeling including Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Neural Networks, Random Forest, Ensemble Models, SVM, KNN and K-means clustering.
- Solid knowledge and experience in Deep Learning techniques including Feedforward Neural Network, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), pooling, and regularization.
- Implemented deep learning models and numerical computation with data flow graphs using TensorFlow.
- Excellent proficiency in model validation and optimization with Model Selection, Parameter/Hyper-Parameter Tuning, K-fold Cross-Validation, Hypothesis Testing, and Principal Component Analysis (PCA).
- Implemented and analyzed RNN-based approaches for automatically predicting implicit discourse relations in text, with potential applications in NLP tasks such as Text Parsing, Text Analytics, Text Summarization, and Conversational Systems.
- Worked on Gradient Boosting decision trees with XGBoost to improve performance and accuracy in solving problems.
- Worked with numerous data visualization tools in Python such as matplotlib, seaborn, ggplot, and pygal.
- Experience in designing visualizations using Tableau and in publishing and presenting dashboards and storylines on web and desktop platforms.
- Extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
- Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, MapReduce concepts, and ecosystems including Hive and Pig.
- Knowledge and experience working in Waterfall as well as Agile environments, including the Scrum process, using project management tools such as ProjectLibre and Jira/Confluence and version control tools such as GitHub.
- Worked closely with the QA team in executing test scenarios and plans, providing test data, creating test cases, issuing STRs upon identification of bugs, and collecting test metrics.
- Experience in performing user acceptance testing (UAT) and end-to-end testing, monitoring test results, and escalating issues based on priority.
- Worked with NoSQL databases including HBase, Cassandra and MongoDB.
Languages: SQL, T-SQL, PL/SQL, Java, C++, XML, HTML, MATLAB, DAX, Python, R
Statistical Analysis: R, Python, SAS E-miner 7.1, SAS Programming, MATLAB, Minitab, Jupyter
Databases: SQL Server 2014/2012/2008/2005/2000, MS-Access, Oracle 11g/10g/9i, Teradata, Hadoop-bigdata, Amazon Redshift.
BI Tools: Tableau, SSRS, Pentaho, Kettle, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, R-Studio.
Database Design Tools and Data Modelling: MS Visio, ERWIN 4.5/4.0, Star Schema/Snowflake Schema modelling, Fact & Dimensions tables, physical & logical data modelling, Normalization and De-normalization techniques, Kimball & Inmon Methodologies
Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Microsoft Management Console, Visual Source Safe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Confluence.
Confidential, New York, NYC
- Championed the design & execution of machine learning projects to address specific business problems determined by consultation with business partners.
- Applied machine learning algorithms such as linear regression, SVM, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
- Worked with data sets of varying size and complexity, including both structured and unstructured data. Piped and processed massive data streams in distributed computing environments such as Hadoop to facilitate analysis (ETL).
- Used the Python Matplotlib package and Tableau to visualize and graphically analyze data; pre-processed data and split the identified data set into training and test sets using other Python libraries.
- Performed data wrangling to clean, transform and reshape data using the pandas library; analyzed data using SQL, R, Scala, Python and Apache Spark; presented analytical reports to management and technical teams.
- Built custom accuracy, precision and recall module to work with Watson Natural Language Classifiers.
- Detected near-duplicate news articles by applying NLP methods (e.g. word2vec) and developing machine learning models such as label spreading and clustering.
- Worked with various kinds of data, both open-source and internal. Developed models for labeled and unlabeled datasets, and worked with big data technologies such as Hadoop and Spark as well as cloud resources.
- Implemented batch and real-time model scoring to drive actions. Developed proprietary machine learning algorithms to build customized solutions that go beyond standard industry tools and lead to innovative solutions.
- Used the Python NumPy, SciPy, Pandas, Matplotlib and stats packages to perform dataset manipulation, data mapping, data cleansing and feature engineering. Built and analyzed datasets using R and Python.
- Validated models using test and validation sets via K-fold cross-validation and statistical significance testing.
- Researched big data technologies such as Spark, Cassandra and other NoSQL databases, and assessed their advantages and disadvantages for specific project goals.
- Implemented Data Factory pipelines and datasets to copy and transform data in bulk via the Data Factory UI and PowerShell, including scheduling and exporting data. Designed and developed standalone data migration applications to retrieve and populate data from AWS Tables storage to Python and Power BI.
- Conceptualized and built data models, tools, custom visualizations and dashboards in Tableau that communicate results to clients, developed a compelling story with the data and optimized their performance.
- Generated weekly and monthly reports for various business users according to business requirements.
Environment: - Python 3.6.4, R Studio, MLlib, Regression, NoSQL, SQL Server, Spyder 3.6, Agile, Tableau, Java, NumPy, Pandas, Matplotlib, Scikit-Learn, ggplot2, Shiny, TensorFlow, AWS (EC2, S3), Teradata.
Confidential, Princeton, NJ
- Performed a predictive credit-scoring analysis for a commercial bank in Ghana to predict whether credit extended to a new or existing applicant would likely result in profit or loss.
- Improved classification of bank authentication protocols by 20% by applying clustering methods to transaction data, using Python Scikit-learn locally and Spark MLlib in production.
- Extracted data extensively using SQL queries and used R and Python packages for data mining tasks.
- Performed Exploratory Data Analysis, Data Wrangling and development of algorithms in R and Python for data mining and analysis.
- Implemented Natural Language Processing (NLP) methods and pre-trained word2vec models for the improvement of in-app search functionality.
- Transformed data from legacy tables to HDFS and HBase tables using Sqoop. Researched reinforcement learning and control (TensorFlow, Torch) and machine learning models (Scikit-learn).
- Used Python based data manipulation and visualization tools such as Pandas, Matplotlib, Seaborn to clean corrupted data before generating business requested reports.
- Developed extension models relying on, but not limited to, Random Forest, logistic regression, linear regression, stepwise regression, Support Vector Machine, Naive Bayes classifier, ARIMA/ETS models, and K-Centroid clusters.
- Built classification models using machine learning algorithms (Random Forest, Decision Trees, Naive Bayes).
- Extracted data from HDFS and prepared data for exploratory analysis using data munging.
- Extensively used R packages such as ggplot2, ggvis, caret and dplyr on large data sets.
- Used R and Python to graphically analyze the data and perform data mining.
- Performed extensive data mining to find relevant features in an anonymized dataset using R and Python. Used an ensemble of XGBoost models (tuned using random search) to make predictions.
- Explored 5 supervised machine learning algorithms (Regression, Random Forest, SVM, Decision Tree, Neural Network) and used metrics such as Precision, Adjusted R-Squared and residual splits to select the winning model.
- Developed Tableau-based dashboards from Oracle and SQL databases to present data visualizations to the business team.
Environment: - R (dplyr, caret, ggplot2), Python (Numpy, Pandas, PySpark, Scikit-learn, Matplotlib, NLTK), T-SQL, MS SQL Server, R Studio, Spyder, Jupyter notebook, Tensorflow, MATLAB, Scala, Shiny, Oracle, Teradata, Tableau.
- The project involved developing predictive models for loan issuance, risk management for new foreign exchange products, fraud detection, and customer segmentation, using data from multiple data sources.
- Developed predictive models and strategies for effective fraud detection in credit and customer banking activities using K-means clustering.
- Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, & KNN for data analysis.
- Used the Python Matplotlib package to visualize and graphically analyze the data.
- Pre-processed data and split the identified data set into training and test sets.
- Performed data wrangling to clean, transform and reshape the data using the pandas library.
- Performed data cleaning, wrangling, manipulation, and visualization. Extracted data from relational databases and performed complex data manipulations. Also conducted extensive data checks to ensure data quality.
- Used R and Python to graphically analyze the data and perform data mining. Also built and analyzed datasets using Python, MATLAB and R.
- Imported data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Analyzed transaction data and developed analytical insights through statistical modeling and Artificial Intelligence (AI) using Python.
- Analyzed performance of image segmentation using Convolutional Neural Networks (CNN).
- Analyzed performance of Recurrent Neural Networks for data over time.
- Used Python NumPy, SciPy, Pandas packages to perform dataset manipulation.
- Used Data Quality validation techniques to validate Critical Data elements (CDE) and identified various anomalies.
- Applied NLP techniques (LDA, NMF) to Steam game descriptions and discovered latent features, which were incorporated as item side data.
- Worked extensively with statistical analysis tools; adept at writing code in Advanced Excel, R, MATLAB and Python.
- Detected and classified required container images using deep learning algorithms (NN, ANN, CNN with backend in Keras, TensorFlow) in Python.
- Used the Python Scikit-learn, Theano, TensorFlow and Keras packages to train machine learning models.
- Researched big data tools such as Spark, Cassandra and other NoSQL databases, and assessed their advantages and disadvantages.
- Extensively used open source tools - R Studio (R), Spyder (Python), Jupyter Notebooks - for statistical analysis and building machine learning models.
Environment: R, Python, Spark MLlib, TensorFlow, Keras, Spyder, Jupyter notebook, R Studio, Tableau, Scala, NoSQL, NLP, NLTK, NumPy, SciPy, Pandas, AWS (EC2, RDS, S3), Matplotlib, Scikit-learn, Shiny
Confidential, New Jersey
- The project was designed and built for one of the first multivariate model-based continuous risk differentiation in the industry.
- Worked in all phases of research like Data Cleaning, Data Mining, Feature Engineering, Developing tools, Validation, Visualizations and performance monitoring.
- Handled large volumes of data and performed create, read, update and delete (CRUD) operations on NoSQL databases such as MongoDB 3.2.12.
- Performed feature engineering, PCA, feature selection, data cleaning (missing values), and clustering (K-means).
- Used Python for managing, transforming and integrating with datasets in preparation for analytics.
- Performed exploratory data analysis, data cleaning, Outlier Detection, Feature Scaling, Feature engineering and data visualizations using Python Libraries such as Numpy, Pandas, Scipy, Seaborn, Sklearn, Imblearn and Matplotlib.
- Developed predictive models using Logistic regression, Decision Tree, Random Forest and KNN algorithms.
- Used cross-validation to check for overfitting in the proposed model.
- Evaluated model performance using the confusion matrix, recall rate and precision rate.
- Performed hypothesis testing to assess the accuracy of the models created using machine learning algorithms.
- Created interactive Dashboards on the desktop platform to visualize the data by using Power BI in MS Excel and Tableau 9.3.
- Followed Agile methodology throughout the project life cycle and used Git for version control.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to address those needs.
Environment: Python, MongoDB, Oracle SQL, Numpy 1.12.1, Pandas 0.18.1, Scipy 0.19.0, Matplotlib 2.0.0, Scikit-learn, Anaconda 3.0, Power BI, Tableau 9.0, Git
Confidential, New Jersey
- Performed k-means clustering to understand customer backgrounds and segment customers by transaction behavior, supporting customized product offerings and priority service, reducing customer churn, and improving existing profitable relationships.
- Assisted the data analytics team with text mining on customer review data, using topic modeling and sentiment classification with deep learning algorithms such as CNN, RNN, LSTM, and GRU, to remediate the corresponding financial products.
- Assisted senior quantitative analyst in assessing risk management of FX products using machine learning techniques for providing appropriate investment recommendations using Collaborative filtering recommender system.
- Advised organizations on large-scale customer data and analytics using advanced machine learning and statistical models for loan issuance.
- Determined data reduction methodologies for dealing with missing values (median imputation, KNN imputation), outliers (center, scale, NZV), and noisy data (correlation matrix, principal components analysis, clustering).
- Trained Random Forest, CART, C5.0 boosting and SVM models, compared them using the R caret package, and performed model tuning.
- Worked with the Data Analytics team to develop time series and optimization models.
- Worked on Interactive Dashboards for building story and presenting to business using Tableau.
- Implemented Hadoop to provision big data analytics platforms for bank customer data. Used MapReduce, Sqoop, Hive and Spark to migrate and analyze large call-quality datasets from multiple data sources, such as an integrated funds transfer system for fraud detection and risk management for securities, treasury or credit derivatives, and web-based cash management systems for customer account reconciliation based on positive pay, automated cash handling, and balance reporting.
- Involved in development and maintenance of an Oracle database using PL/SQL and back-end development using C/C++ for an intranet management system comprising the Employee Management System (EMS) and Agent Pay-out System (APS).
Environment: - UNIX, Jupyter Notebook, PyCharm, R-Studio, Tableau, Apache Spark, Hadoop (Sqoop, Pig, Hive), Oracle 11g, Oracle PL/SQL, MS Visio 2000, Erwin 4.1, MS Excel 2010, Crystal Reports
Confidential, North Arlington, NJ
- Successfully conducted market research for multi-national consumer shopping trends by collaborating with marketing and engineering organizations during the feature’s life cycle, including performance analysis and A/B testing.
- Built data visualizations and dashboards using Metabase and Tableau, analyzed questionnaires with SPSS, and presented meaningful metrics to multiple audiences.
- Optimized predictive models used in a variety of areas including attribution modeling and order frequency prediction.
- Compiled large data sets gathered from multiple sources and databases to generate reports and analyses supporting marketing initiatives and growth goals; determined measurement approaches, generated actionable insights, and aligned analytical strategy for new challenges.
- Delivered hands-on support in executing Business Operations projects, which included planning, requirements gathering,
- Performed daily reporting on business KPIs and delivered recommendations, evaluated marketing campaigns across both online/offline channels as well as provided effective marketing strategic plans.
- Effectively used Excel to extract, clean, and manage clients’ data and reports; generated and analyzed deep insights into hotel employees’ data through its membership and check-in data system.
- Scrutinized operational and financial performance of marketing channels, interacted with internal teams for campaign set-up and authorization, and evaluated campaigns results.
- Recognized and fixed analytics needs and identified new opportunities to enhance product performance in cooperation with teams.
- Built, managed, and presented a suite of Power BI dashboards and other self-service tools for key stakeholders.
Environment: - SQL Server, Linux, Python (Scikit-learn, NumPy, Pandas, Matplotlib), R, ML algorithms, Tableau