- 8+ years of professional IT experience in Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining and Natural Language Processing (NLP), Artificial Intelligence algorithms, Business Intelligence and analytics models (e.g., Decision Trees, Linear and Logistic Regression), using Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL and PostgreSQL.
- Proficient in managing the entire data science project life cycle and actively involved in all of its phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering and statistical modeling
- Extensive experience with modeling techniques such as decision trees, regression models, neural networks, SVM, clustering, dimensionality reduction using Principal Component Analysis (PCA) and deep learning
- Extensive experience in text analytics, generating data visualizations with R and Python, and creating dashboards with tools such as Tableau
- Deep understanding of statistical modeling, multivariate analysis, model testing, problem analysis, and model comparison and validation
- Good knowledge of database creation and maintenance of physical data models on Oracle, Teradata, Netezza, DB2 and SQL Server databases
- Expertise in Excel macros, pivot tables, VLOOKUP and other advanced functions; expert R user
- Experience in data analysis, data migration, data cleansing, transformation, integration, data import and data export using multiple ETL tools such as Ab Initio and Informatica PowerCenter
- Experienced in developing Physical Data Model for multiple platforms - SQL Server/ DB2/ Oracle/ Teradata
- Experienced in working on SAP Customer Relationship Management to optimize sales, service and management cycles
- Experience applying predictive modeling and machine learning algorithms to analytical projects.
- Collaborated with the lead Data Architect to model the data warehouse using a snowflake schema in 3NF
- Experience coding SQL and PL/SQL procedures, triggers and packages
- Experience designing Tableau visualizations and publishing and presenting dashboards and stories on desktop and web platforms.
- Highly skilled in using visualization tools such as ggplot2, Tableau and D3.js to create dashboards
- Proficient in statistical and other tools/languages: R, Python, C, C++, Java, SQL, UNIX, H2O.ai Driverless AI and many custom-built tools
- Proficient in predictive modeling, data mining methods, factor analysis, ANOVA, hypothesis testing, the normal distribution and other advanced statistical techniques
- Experienced with ETL tools: SSIS in MS SQL Server 2016, 2014, 2012, 2008 and 2005, and DTS in MS SQL Server 2000.
- Experience deploying machine learning models into production environments
Keywords: Machine Learning, Deep Learning, Natural Language Processing, Python, R, Pandas, Data Science, Big Data, Apache Spark, AWS
Programming: R, Python, Groovy, SQL, Java
Databases: Oracle, PL/SQL, SQL Server, MySQL
Business Intelligence Tools: Tableau, D3.js, Plotly, Shiny
Machine Learning: Decision Trees, Naive Bayes classification, Logistic Regression, Neural Networks, Support Vector Machines, Clustering Algorithms and PCA
R Packages: dplyr, caret, data.table, reshape, ggplot2, quantmod, sqldf, ggmap, ggvis, FSelector, lattice, randomForest, rpart, lm, glm, nnet, xgboost, ksvm, lda, qda, adabag, adaboost, lars and lasso
Python packages: NumPy, pandas, scikit-learn, SciPy, Matplotlib, Keras, TensorFlow
Big Data Technologies: Apache Hadoop, Hive and Spark
Senior Machine Learning Engineer
Skill Set: Machine Learning, Python, R, Groovy, SQL, Tableau, Diffbot API, NLP, Deep Learning
- Design, deploy and maintain machine learning models in production systems
- Data analysis, cleaning, sampling and preprocessing for machine learning algorithms
- Worked on tools and scripts to fine-tune models and monitor their results
- Analyze and visualize data and present insights to management
- Experienced in text mining and natural language processing; worked on health-related datasets
- Saved a total of USD 500,000 by automating manual effort
- Worked on document relevance, topic modelling, keyword extraction, summarization and named entity recognition (NER) on a patent corpus
- Web scraping and information extraction using the Diffbot API
- Working with big data technologies including Apache Spark, Hadoop and Databricks
Senior Data Scientist
Confidential, Franklin, TN
Skill Set: SQL Server 2008 R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Hadoop, Windows Enterprise Server 2000, DTS, SQL Profiler and Query Analyzer
- Contribute to finance and risk management, operations management and marketing, and maximize ROI on advanced big data analytics technology
- Worked with data mining and big data tools such as Splunk, R, MapReduce, YARN, Pig, Hive, Flume, Oozie, HBase, HDFS, Sqoop, Spark and Scala
- Design, model, validate and test statistical algorithms against various real-world data sets including behavioral data and deploy models in the backend (batch) and cloud (streaming)
- Apply machine learning algorithms and methods to data sets to predict credit risk, detect fraud, predict customer churn and support target marketing
- Work with data to increase cross-sell and up-sell revenue, enhance customer value and reduce non-credit losses.
- Understand the OCR process and contribute to implementing NLP to identify, extract, summarize, and reduce or categorize relevant qualitative financial input (such as sentiment, feedback and news) from source text (digital news) into specific structures (templates) to support decision making
- Perform sentiment analysis using a Naive Bayes classifier and gather insights from large volumes of unstructured text data
- Generate TLFs (tables, listings and figures) and summary reports, ensuring quality, on-time delivery
- Apply clustering techniques as exploratory data analysis tools to develop geodemographic customer segmentation models; run decision tree algorithms on the credit risk model
- Leverage AWS cloud computing for reproducible research, supporting ad hoc and exploratory Q&A (such as backtesting investment strategies) faster by distributing work across machines
- Build anti-fraud models using neural network techniques that perform pattern recognition and anomaly detection, generate risk-prone alerts across the incident management lifecycle, and optimize anti-fraud countermeasures. Experience working with highly confidential information
- Conduct market research and track external trends relative to specific products and geodemographic segments for effective collaborative and content-based recommendation systems
- Build, publish and schedule customized interactive reports and dashboards using Tableau Server
- Access big data, apply predictive analytic techniques, and visualize analysis outcomes such as patterns, anomalies and future trends using Tableau
- Create multiple workbooks, dashboards and charts using calculated fields, quick table calculations, custom hierarchies, sets and parameters to meet business needs
- Work with stakeholders to troubleshoot issues, and communicate findings to team members, leadership and stakeholders to ensure models are well understood and optimized
Confidential, Wilmington, DE
Skill Set: R (dplyr, caret, ggplot2), Python (NumPy, Pandas, PySpark, Scikit-learn, Matplotlib, NLTK), NLP, MS SQL Server, TensorFlow, Theano, Caffe, ETL, Shiny, H2O, Oracle, Java, Tableau, Supervised & Unsupervised Learning
- Responsible for predictive analysis of credit scoring to predict whether credit extended to a new or existing applicant is likely to result in profit or loss
- Improved classification of bank authentication protocols by 20% by applying clustering methods to transaction data, using Python scikit-learn locally and Spark MLlib in production
- Extracted data extensively using SQL queries and used R and Python packages for data mining tasks
- Performed Exploratory Data Analysis, Data Wrangling and development of algorithms in R and Python for data mining and analysis
- Implemented Natural Language Processing (NLP) methods and pre-trained word2vec models for the improvement of in-app search functionality
- Researched reinforcement learning and control (TensorFlow, Torch) and machine learning models (scikit-learn)
- Used Python-based data manipulation and visualization tools such as Pandas, Matplotlib and Seaborn to clean corrupted data before generating business-requested reports
- Developed extension models using, among others, random forest, logistic regression, linear regression, stepwise regression, support vector machines, Naive Bayes classifiers, ARIMA/ETS models and K-centroid clustering
- Extracted data from HDFS and prepared data for exploratory analysis using Data Munging
- Extensively used R packages such as ggplot2, ggvis, caret and dplyr on large data sets
- Used R and Python to graphically analyze data and perform data mining
- Explored five supervised machine learning algorithms (regression, random forest, SVM, decision tree, neural network) and used metrics such as precision, adjusted R-squared and residual splits to select the best of the five models
- Developed interactive Tableau and Shiny dashboards drawing on various databases to present data visualizations to the business team
Confidential, Wayne, PA
- Analyzed associations in buying behavior across large market databases and tracked buying patterns using linear regression models
- Ensured successful delivery of the projects in the portfolio/program as per the agreed timeline, cost & quality
- Worked with project and product team to deliver quality product
- Managed multiple projects using Agile or standard waterfall, according to the agreed development methodology
- Worked on Python scripts to help our team internally with data management
- Developed Ganglia cluster-monitoring plugins in Python
- Good experience in NLP with Apache Hadoop and Python
- Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, NLTK in Python for developing various machine learning algorithms
- Managed customer expectations and escalations at program and project level
- Created and monitored the overall program plan from initiation to closure or release
- Program governance, execution & stakeholder management
- Prepared and reviewed the estimation, project plan and revenue planner
- Conducted weekly status meetings with project teams to review project status and report it to senior management
- Identified risks, issues and dependencies at program level, with mitigation/action plans
- Portfolio finance management - Revenue forecasting, growth, profitability & revenue recognition
- Resource Management - Staffing projections, utilization and attrition management
- Project prioritization and roadmap planning
- Site and page optimization, promotional campaign assessment, customer segmentation and assessment of revenue against targets at regular intervals.
- Developed visualizations using sets, parameters, calculated fields, dynamic sorting, filtering and parameter-driven analysis, with data gathered from different data marts
- Designed reports around specific business problems and implemented them in Tableau to suggest ways for the business to improve customer experience in terms of both sales and user interface
- Able to apply dimensionality reduction and regularization techniques
- Expert in managing data flow between the primary database and various reporting tools
- Expert in finding Trends and Patterns within Datasets and providing recommendations accordingly
- Proficient in requirement gathering, writing, analysis, estimation, use case review, scenario preparation, test planning and strategy decision making, test execution, test results analysis, team management and test result reporting
Confidential, Peoria, IL
- Researched machine learning algorithms, implemented them tailored to particular business needs and tested them on large datasets.
- Worked closely with business, data governance, SMEs and vendors to define data requirements.
- Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
- Designed the prototype of the data mart and documented its possible outcomes for end users.
- Involved in business process modeling using UML
- Developed and maintained data dictionary to create metadata reports for technical and business purpose.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus and PL/SQL.
- Implemented Spark MLlib utilities including classification, regression, clustering, collaborative filtering and dimensionality reduction.
- Manipulated and mined data from database tables (Redshift, Oracle, data warehouse)
- Created automated metrics from complex databases.
- Provided analytical network support to improve quality and standard work results.
- Performed root cause research to identify process breakdowns within departments and provided data, drawing on various skill sets, to find solutions to those breakdowns
- Foster a culture of continuous engineering improvement through mentoring, feedback and metrics
- Broad knowledge of programming and scripting (especially R, Java and Python)
- Implemented event tasks to execute applications automatically.
- Involved in daily database maintenance, including monitoring the daily script runs and troubleshooting any errors in the process.