- 8+ years of extensive IT experience in Data Analysis/Business Data Analysis and Software Engineering.
- 5+ years as a Data Scientist/Modeler with experience in analyzing, cleaning, coding, wrangling, implementing, testing, and maintaining database solutions using statistical methods.
- Experienced in Machine Learning regression algorithms such as Simple Linear, Multiple Linear, Polynomial, SVR (Support Vector Regression), Decision Tree Regression, and Random Forest Regression.
- Experienced in Machine Learning classification algorithms such as Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree, and Random Forest classification.
- Expertise in machine-learning-based predictive modeling projects with models such as Gradient Boosting, Collaborative Filtering, Bayesian Methods, Random Forest, SVM, and Markov Models.
- Experienced in advanced statistical analysis and predictive modeling in structured and unstructured data environments.
- Strong expertise in Business and Data Analysis, Data Profiling, Data Migration, Data Conversion, Data Quality, Data Governance, Data Lineage, Data Integration, Master Data Management (MDM), Metadata Management Services, and Reference Data Management (RDM).
- Hands-on experience with Data Science libraries in Python such as Pandas, NumPy, SciPy, scikit-learn, Matplotlib, Seaborn, Beautiful Soup, Rpy2, LibSVM, neurolab, and NLTK.
- Hands-on experience with R packages and libraries such as ggplot2, Shiny, h2o, dplyr, reshape2, plotly, R Markdown, ElemStatLearn, and caTools.
- Efficiently accessed data via multiple vectors (e.g. NFS, FTP, SSH, SQL, Sqoop, Flume, Spark).
- Experience in various phases of Software Development life cycle (Analysis, Requirements gathering, Designing) with expertise in writing/documenting Technical Design Document (TDD), Functional Specification Document (FSD), Test Plans, GAP Analysis and Source to Target mapping documents.
- Experienced in Artificial Neural Networks (ANN) and Deep Learning models using the Theano, TensorFlow, and Keras packages in Python.
- Excellent understanding of Hadoop architecture, MapReduce concepts, and the HDFS framework.
- Strong understanding of project life cycle and SDLC methodologies including RUP, RAD, Waterfall and Agile.
- Strong expertise in ETL, Data warehousing, Operational Data Store (ODS), Data Marts, OLAP and OLTP technologies.
- Experience working on BI visualization tools (Tableau, Shiny & QlikView).
- Excellent team player and self-starter with good communication skills.
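As a minimal, hypothetical sketch of the classification workflow named above (scikit-learn on synthetic data; the dataset and model settings are illustrative, not from any actual project):

```python
# Compare two of the listed classifiers (logistic regression and random
# forest) on a synthetic binary-classification dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for real business data.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=42)):
    model.fit(X_train, y_train)
    print(type(model).__name__, round(model.score(X_test, y_test), 3))
```

The same `fit`/`score` interface covers every scikit-learn estimator listed above, which is what makes side-by-side model comparison straightforward.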
Data Modeling Tools: MS Visio, Erwin
Testing Tools: HP Quality Center ALM, Jira, Rally
Big Data Tools: Hadoop Stack, Apache Spark, Storm
Reporting & Visualization: Tableau, Matplotlib, Seaborn, ggplot, Crystal Reports, Cognos, Shiny
Databases: Oracle, DB2, MySQL, MS SQL Server, MS Access, Teradata
OS: UNIX, Linux, Windows, Mac
Confidential, Menomonee Falls, WI
- Built models using machine learning algorithms such as logistic regression, Naïve Bayes, random forest, SVM, and SVR.
- Worked on a POC project for NLP (Natural Language Processing) with real-time log data.
- Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest using R and Python packages.
- Used Python-based data manipulation and visualization tools such as Pandas, Matplotlib, and Seaborn to clean corrupted data before generating business-requested reports.
- Performed text analytics on review data using machine learning techniques in Python with NLTK.
- Applied advanced machine learning algorithms including PCA, k-nearest neighbors, random forest, gradient boosting, neural networks, and XGBoost to predict weights and labels of the Higgs Boson with high accuracy.
- Built an NLP service to identify and extract text features for pre-populating fields in the client's data reporting and abstraction application.
- Built correlation models for the ecosystems using deep learning techniques.
- Used Python, R, and Spark to develop a variety of models and algorithms for analytic purposes.
- Created custom dashboards with Splunk for various applications in the Confidential ecosystem, covering applications, networks, servers, and devices.
- Built Dashboards using Splunk log data with different data events.
- Developed visualizations using sets, parameters, calculated fields, dynamic sorting, filtering, and parameter-driven analysis; gathered data from different data marts.
- Designed reports around business-specific problems and implemented the reporting in Tableau.
- Incorporated advanced charts, drill-downs, and interactivity into reporting for different stakeholders, and integrated report publishing with the client's SharePoint infrastructure.
Environment: Python 3.6, Splunk, Splunk Machine Learning Toolkit, Scikit-Learn, MySQL, SQL, Data Modeling, Middleware Integration, Gradient Boost, Random Forest, Git, XGBoost, Tableau.
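A minimal sketch of the review-text analytics mentioned above. The toy reviews, labels, and pipeline are illustrative assumptions; scikit-learn's `CountVectorizer` stands in here for the NLTK tokenization used in the actual work:

```python
# Classify review text as positive/negative with a bag-of-words
# Naive Bayes pipeline (toy data for illustration only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = [
    "great product, works perfectly",
    "terrible quality, broke in a day",
    "excellent value and fast shipping",
    "awful experience, do not buy",
]
labels = ["pos", "neg", "pos", "neg"]  # hypothetical sentiment labels

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(reviews, labels)
print(clf.predict(["great value, excellent product"]))
```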
Confidential, Seattle, WA
Data Scientist / Sr. Data Analyst
- Played the role of Data Scientist, working directly with business partners on data design and sourcing to leverage existing BI capabilities and scale applications for future advanced analytics.
- Evaluated machine learning algorithms and data usage for scoring and classification models.
- Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest using Python packages.
- Understood business models and selected the best approaches to improve their performance; analyzed data for trends.
- Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
- Worked in Business and Data Analysis, Data Profiling, Data Migration, Data Integration, and Metadata Management Services.
- Performed business analysis, requirements gathering, and functional and architecture design for Credit Risk Rating 3.
- Implemented data collection, curation, and analysis scripts using Hadoop, Pig, and Hive technologies.
- Developed NLP service to identify and extract text features for pre-populating fields in the client's data reporting and abstraction application.
- Developed needs-based segmentation that gave management a deeper understanding of consumer behavior; these segments assisted management in the development and marketing of services.
- Developed distributed data processing applications to automate data cleaning and normalization process using Hadoop Stack.
- Performed scoring and financial forecasting for collection priorities using Python and SAS machine learning algorithms.
- Used Natural Language Processing (NLP) for response modeling and fraud detection efforts for credit cards.
- Used Python, R, and Spark to develop a variety of models and algorithms for analytic purposes.
- Designed and built large, complex data sets from disparate sources while thinking strategically about uses of data and how data use interacts with data design.
- Performed source system analysis, data analysis, analysis of integration between modules, and analysis of the business sense of source entities and relationships.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Worked on different data formats such as JSON and XML and applied machine learning algorithms in Python.
Environment: Python 3.6, Apache Spark, Kibana, IPython, Hadoop, Kafka, Pig, Hive, MLlib, Scikit-Learn, MySQL, SQL, Data Warehouse, Data Modeling, Middleware Integration, Gradient Boost, Random Forest, XGBoost, OpenCV.
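A minimal sketch of the needs-based segmentation step above, clustering consumers with k-means. The two behavioral features (spend and purchase frequency) and the segment count are illustrative assumptions, not the actual project's feature set:

```python
# Cluster synthetic consumers into needs-based segments with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic consumer groups: low-spend/low-frequency vs high/high.
spend = np.concatenate([rng.normal(20, 5, 100), rng.normal(80, 5, 100)])
freq = np.concatenate([rng.normal(2, 0.5, 100), rng.normal(10, 1, 100)])
X = np.column_stack([spend, freq])

segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(segments))  # size of each discovered segment
```

In practice the segment labels would then be profiled (average spend, frequency, etc.) so marketing can describe and target each group.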
Confidential, Troy, MI
- Developed parsing algorithms to clean and distribute large amounts of data; worked day to day with datasets containing millions of records.
- Collected and analyzed data on prospective customers and competitors
- Developed key performance indicators to monitor sales and improve cost efficiency
- Managed projects honing analytical techniques and reporting/statistical analysis skills.
- Developed analytics and statistical models per organizational requirements, with the ability to produce alternative cost-effective and efficient models.
- Applied data modeling techniques such as linear regression, logistic regression, and multivariate analysis for prediction and explanation of business outcomes.
- Provided and presented model results, insights, and recommendations to senior management.
- Well experienced in normalization and denormalization techniques for optimum performance in relational and dimensional database environments.
- Performed data analysis using regressions, data cleaning, Excel VLOOKUP, histograms, and the Toad client; presented the analysis and suggested solutions for investors.
- Prepared scripts to ensure proper data access, manipulation, and reporting functions using the Python programming language.
- Identified, analyzed and interpreted trends or patterns in complex data sets using data mining tools.
- Determined customer satisfaction and helped enhance customer experience using NLP.
Environment: Python (NumPy, Pandas, PySpark, Scikit-learn, Matplotlib, NLTK), T-SQL, MS SQL Server, Data Lineage, XML, RStudio, Spyder, ETL, Machine Learning, Shiny, h2o, Oracle, Teradata, Java, Tableau.
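The Excel VLOOKUP step mentioned above maps naturally onto a pandas left merge. A minimal sketch (the table names, columns, and values are hypothetical):

```python
# A pandas left merge is the programmatic analogue of VLOOKUP:
# unmatched keys yield NaN, just as VLOOKUP returns #N/A.
import pandas as pd

accounts = pd.DataFrame({"account_id": [1, 2, 3],
                         "balance": [100, 250, 80]})
owners = pd.DataFrame({"account_id": [1, 2],
                       "owner": ["Ann", "Bob"]})

# Left merge keeps every row of `accounts` regardless of a match.
merged = accounts.merge(owners, on="account_id", how="left")
print(merged)
```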
Confidential, Bloomington, IL
- Used data analysis techniques to validate business rules and identify low-quality and missing data in the existing Confidential Enterprise Data Warehouse (EDW).
- Worked with users to identify the most appropriate source of record and profiled the data required for sales and service.
- Documented the complete process flow describing program development, logic, testing, implementation, application integration, and coding.
- Generated test cases for property and casualty (P&C) insurance at different levels of business.
- Involved in defining the business/transformation rules applied to ICP data.
- Defined list codes and code conversions between the source systems and the data mart.
- Worked with internal architects, assisting in the development of current and target state data architectures.
- Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
- Responsible for defining the functional requirement documents for each source to target interface.
- Utilized the Informatica toolset (Informatica Data Explorer and Informatica Data Quality) to profile legacy data.
- Evaluated data profiling, cleansing, and integration tools (e.g., Informatica).
Environment: SQL Server, Tableau, Oracle, Python, MS-Office, Agile, Teradata, XML, SQL, Business Objects
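A minimal sketch of the data-profiling checks described above (null counts and distinct-value counts per column). Pandas stands in here for Informatica Data Explorer / Data Quality, and the sample P&C-style table is hypothetical:

```python
# Basic column-level profiling: nulls and distinct values per column.
import pandas as pd

df = pd.DataFrame({
    "policy_id": [101, 102, 103, 104],
    "state": ["WI", None, "IL", "IL"],
    "premium": [1200.0, 950.0, None, 1100.0],
})

profile = pd.DataFrame({
    "nulls": df.isna().sum(),       # missing-data count per column
    "distinct": df.nunique(),       # cardinality per column
})
print(profile)
```

Checks like these are how low-quality or missing data in a warehouse table are flagged before defining transformation rules.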
Software Engineer Intern
- Analyzed user requirements and wrote requirements, technical, and design specifications.
- Designed class diagrams, sequence diagrams and component diagrams using Rational XDE.
- Developed the end-of-day module thin client in Java using Swing components.
- Used design patterns: Data Access Objects, MVC and Data transfer objects.
- Developed Perl scripts for build purposes.
- Used Java Mail API to send mails to members, visitors and customer care representatives.
- Developed PDF file generation using the Jasper Reports API and the iReport tool.
- Used CSS to control the page layout, look and feel of Webpages.
- Wrote shell scripts that run on AIX and Linux as batch jobs.
- Wrote Ant scripts to create WAR/EAR files and deploy the application to the application server.
Environment: Java 1.6, J2EE, JSP, Tiles, Spring, Servlet, WebSphere 5.1, WSAD, DB2 UDB, JNDI, Swing, RESTful, Perl, Ant, CSS, HTML, XML, ClearCase, Toad, JDBC, UML, Linux, OOAD, and RUP