- Accomplished Data Scientist and Analytics professional with software development experience and 9+ years of hands-on experience leveraging machine learning and deep learning models to solve challenging business problems.
- Dual Master's degrees in Computer Science and Computer Applications.
- Used Python and R to create predictive and statistical models and libraries for data mining.
- Used the Apache Spark framework for application development and custom machine learning algorithms.
- Experience working with the executive team to present Data Science based recommendations for growth strategy.
- Professionally qualified Data Scientist/Data Analyst with 5+ years of experience in Data Science and Analytics, including Deep Learning/Machine Learning, Data Mining, and Statistical Analysis.
- Over 5 years of experience in machine learning and data mining with large datasets of structured and unstructured data, data acquisition, data validation, predictive modeling, and data visualization.
- Expertise in transforming business resources and requirements into manageable data formats and analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale.
- Proficient in managing entire data science project life cycle including Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Statistical Modeling, Testing and Validation, Visualization and Reporting.
- Expertise in building statistical models using algorithms such as Regression, Random Forest, Decision Trees, Market Basket Analysis, and K-Means.
- Proven expertise in supervised and unsupervised learning techniques (clustering, classification, PCA, decision trees, KNN, SVM), predictive analytics, optimization methods, Natural Language Processing (NLP), and time series analysis.
- Hands-on experience with Data Science libraries in Python such as Pandas, NumPy, SciPy, scikit-learn, Matplotlib, Seaborn, BeautifulSoup, Orange, Rpy2, LibSVM, neurolab, and NLTK.
- Involved in all phases of the data science project life cycle, including data extraction, data cleaning, statistical modeling, and data visualization with large sets of structured and unstructured data; created ER diagrams and schemas.
- Extensive experience in text analytics, developing statistical machine learning and data mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop.
- Experience in software development as well as machine learning, big data, visualization, web scraping, statistics, and analytics.
- Collaborated with an ad-tech firm to analyze household-level data; leveraged Principal component analysis to deal with high-dimensional data, K-means clustering to find the relationships in data, location analyses and supervised learning methods to develop propensity recommendations.
- Applied random forest, gradient boosting and linear regression machine learning algorithms combined with feature engineering to minimize the mean absolute error.
- Used Python and Spark to implement different machine learning algorithms including Generalized Linear Model, SVM, Random Forest, Boosting.
- Proficient in the Tableau and R-Shiny data visualization tools, used to analyze and obtain insights into large datasets and to create visually powerful, actionable interactive reports and dashboards.
- Strong SQL programming skills, with experience in working with functions, packages and triggers.
- Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
- Excellent communication skills. Self-motivated, enthusiastic learner who works successfully in fast-paced, multitasking environments, both independently and in a collaborative team.
- Developed and fine-tuned AI-based object detection models using TensorFlow.
- Gained a well-rounded understanding of the components of the self-driving car development pipeline.
- Researched, developed, optimized, deployed, and supported advanced computer vision algorithms.
- Improved image detection accuracy through hyperparameter optimization of CNN models such as ResNet, VGG, MobileNet, and Inception, and by modifying existing layers/models to fit the use case.
- Applied deep learning to object detection, segmentation, classification, and anomaly detection.
- Deep Learning, GPU coding, Machine Learning, Artificial Intelligence, Computer Vision, Internet of Things, Augmented Reality, Blockchain
- TensorFlow, Python, OpenCV, CUDA, OpenGL, Google Cloud Platform, R
Coding Languages: C, C++, C#, CUDA, OpenGL, Python (2.x/3.x), R, Java, Scala
Python Libraries: TensorFlow, OpenCV, scikit-learn, NumPy, Matplotlib, Pandas, Keras, SciPy, Seaborn, NLTK, Plotly, XGBoost, LightGBM, PyQt, MLlib, Rpy2, LibSVM, neurolab, BeautifulSoup.
Databases: SQL Server 2008/2012/2014, MongoDB, MySQL
Tools: SAS, PyCharm, UML, Jupyter, Eclipse, Android Studio, Docker
Reporting & Visualization Tools: Tableau 8.x, SSRS, SSAS, SSIS, Seaborn, Matplotlib, ggplot2.
Hadoop Ecosystem: Hadoop 2.x, Spark 2.x, MapReduce, Hive, HDFS, Pig.
Cloud Services: Amazon Web Services (AWS) EC2, Lambda, S3; Google Cloud Platform (Vision APIs)
Operating Systems: Linux (Ubuntu 14.x - 16.x), Windows 7 - 10, Raspbian Jessie PIXEL, Android Things
Environment: Anaconda, Visual Studio
Blockchain Platform: Hyperledger Fabric, Hyperledger Composer, Smart contracts, Hyperledger Explorer
IoT: Raspberry Pi, Arduino
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Built a predictive model on chat/email data to identify the issues that customers have with the products.
- Performed ad hoc reporting, customer profiling, and segmentation using R/Python.
- Extracted data from the database, copied it into HDFS, and used Hadoop tools such as Hive and Pig Latin to retrieve the data required for building models.
- Used AWS S3, DynamoDB, AWS Lambda, and AWS EC2 for data storage and model deployment.
- Created and maintained Tableau reports to display the status and performance of deployed models and algorithms.
- Used PCA and other feature engineering techniques to reduce high-dimensional data, and applied feature normalization and label encoding with the scikit-learn library in Python.
- Implemented, tuned, and tested the model on AWS Lambda with the best-performing algorithm and parameters.
- Knowledge of Information Extraction, NLP algorithms coupled with Deep Learning.
- Developed NLP models for topic extraction and sentiment analysis.
- Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
- Worked with the NLTK library for NLP data processing and pattern finding.
- Categorized comments from different social networking sites into positive and negative clusters using sentiment analysis and text analytics.
- Ensured that the model had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
- Implemented a deep learning model using TensorFlow to create semantic word representations of customer data.
- Performed data cleaning and feature selection using the MLlib package in Spark, and worked with deep learning frameworks such as TensorFlow.
Environment: Python (scikit-learn/SciPy/NumPy/Pandas), TensorFlow, Keras, NLTK, Hive, DynamoDB, AWS Lambda, Spark, MLlib
- Developed functional modules for the project.
- Developed a prototype system, MIAS 0.5 (Media Intelligence - Automation Stage), for automatic segmentation of limited print media.
- Demonstrated MIAS 0.5 to clients as the project evolved during code development.
- Provided documentation support for the team.
- Communicated with Google Cloud Platform to use its Vision APIs.
- Processed image datasets to configure them for MIAS.
- Analyzed the MIAS product features: automatic segmentation, logo detection, and annotation search.
- Checked for ad and non-ad features in images.
- Extracted the appropriate segmented data from the image datasets and presented it in the MIAS GUI.
- Searched for matching titles in images and provided the classification information.
- Automatically cropped information from images based on features given in the customized data provided by the client.
- Implemented Google Cloud Platform for the MIAS product and applied its machine learning and Vision APIs for automatic language identification.
- Features include segmenting by advertisement and content, classifying ads from text, and segmenting ads from images.
- The Image Acquisition module includes text extraction and segmentation.
Environment: Google Cloud Platform, Vision APIs, OpenCV, Python, PyCharm, PyQt
- Used machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
- Used the Python Matplotlib package to visualize and graphically analyze the data.
- Performed data wrangling to clean, transform, and reshape the data using the pandas library.
- Analyzed data using SQL, R, Java, Scala, Python, and Apache Spark, and presented analytical reports to management and technical teams.
- Used the R programming language to graphically analyze the data and perform data mining.
- Used the Python NumPy, SciPy, and Pandas packages to perform dataset manipulation.
- Used data quality validation techniques to validate Critical Data Elements (CDEs) and identified various anomalies.
- Worked extensively with statistical analysis tools; adept at writing code in Advanced Excel, R, and Python.
- Built and analyzed datasets using R and Python (in decreasing order of usage).
- Used the Python scikit-learn and TensorFlow packages to train machine learning models.
- Analyzed business requirements and data mapping requirement specifications; responsible for extracting data per the business requirements.
- Used a broad variety of statistical packages and tools such as SAS, R, MLlib, Graphs, Spark, MapReduce, Pig, and others.
- Created data pipelines using big data technologies such as Hadoop and Spark.
- Constructed and trained a neural network to predict advertisement click-through rates.
Environment: Machine learning, Spark, HDFS, Hive, Linux, Python (scikit-learn/SciPy/NumPy/Pandas), R, MySQL, Eclipse, MongoDB, PL/SQL, Tableau, Visual Studio
- Participated in solution architecture meetings and provided guidance on dimensional data modeling design.
- Participated in stakeholder meetings to understand the business needs and requirements.
- Designed and developed the ETL framework.
- Performed extensive reads and writes of large data to and from CSV and Excel files using pandas.
- Designed logical and physical data models using the MS Visio 2003 data modeler tool. Coordinated and communicated with technical teams on data requirements.
- Responsible for creating and maintaining the integrity of large client datasets generated from multiple and disparate data streams.
- Experience in creating and implementing machine learning algorithms and advanced statistics such as: regression, clustering, decision trees, exploratory data analysis methodology, simulation, scenario analysis, modeling, and neural networks.
- Designed, developed, tested, and implemented analytical models on existing platforms.
- Executed ad hoc reports with SSMS (SQL Server) and MS Access for advanced data analysis and report generation.
- Used Tableau to connect to data from SQL Server and build visual reports and analyses.
- Extensive working knowledge of data and analytics.
Environment: Tableau, Oracle, R, Python, Spark, Hive, Machine learning algorithms, HDFS, SSAS, SSIS, SSRS, PL/SQL