- Over 8 years of IT experience as a Data Scientist, with strong technical expertise, business experience, and communication skills to drive high-impact business outcomes through data-driven innovations and decisions.
- Extensive experience in Text Analytics, developing Statistical Machine Learning and Data Mining solutions to various business problems, and generating data visualizations using R, Python, and Tableau.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Experience in designing visualizations using Tableau, and in publishing and presenting dashboards and Storylines on web and desktop platforms.
- Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis; good knowledge of Recommender Systems.
- Experienced in working with various Python Integrated Development Environments such as NetBeans, PyCharm, PyScripter, Spyder, PyStudio, PyDev, and Sublime Text.
- Worked with several Python packages such as NumPy, matplotlib, Beautiful Soup, Pickle, PySide, SciPy, wxPython, and PyTables.
- Expert in distilling vast amounts of data into meaningful discoveries at the requisite depth. Able to analyze highly complex projects at various levels.
- Extracted data from various database sources such as Oracle, SQL Server, DB2, and Teradata.
- Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
- Regularly used JIRA and other internal issue trackers for project development.
- Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design, and implementing RDBMS-specific features.
- Experience in text understanding, classification, pattern recognition, recommendation systems, targeting systems, and ranking systems using Python.
- Expertise in all aspects of the Software Development Lifecycle (SDLC), from requirement analysis through design, development, coding, testing, implementation, and maintenance.
- Hands-on working experience in machine learning and statistics to draw meaningful insights from data. Good at communication and storytelling with data.
- Used analytical applications/libraries such as Polly, D3.js, and Tableau to identify trends and relationships between different pieces of data, draw appropriate conclusions, and translate analytical findings into marketing strategies that drive value.
- Hands-on experience with Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- Strong knowledge of statistical methods (regression, time series, hypothesis testing, randomized experiments), machine learning, algorithms, data structures, and data infrastructure.
- Proficient in statistical modeling and machine learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbours) for Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensemble methods.
- Technical proficiency in designing and data modeling online applications; Solution Lead for architecting Data Warehouse/Business Intelligence applications.
- Good understanding of Artificial Neural Networks and Deep Learning models using the Theano and TensorFlow packages in Python.
- Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 13/14, and Cosmos.
Programming & Scripting Languages: C, C++, JAVA, PL/SQL.
Databases: MS Access, Oracle 12c/11g/10g/9i, MySQL, DB2.
Statistical Software: SPSS, R, SAS.
ETL/BI Tools: Tableau, SAS, SAS/Macro, SAS/SQL
Querying Languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL Server.
Statistics: Hypothesis Testing, ANOVA, Confidence Intervals, Bayes' Law, MLE, Fisher Information, Principal Component Analysis (PCA), Cross-Validation, Correlation.
BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.
Python Libraries: Beautiful Soup, NumPy, SciPy, Matplotlib, python-twitter, Pandas (DataFrame), urllib2.
Data Modeling Tools: MS Visio, Rational Rose, Erwin r9.6/9.5, ER/Studio 9.7, Star-Schema Modeling.
Confidential, Chicago, IL
Sr. Data Scientist/Machine Learning Engineer
- Developed, implemented, and maintained the Conceptual, Logical, and Physical Data Models using Erwin for forward- and reverse-engineered databases.
- Designed algorithms to identify and extract incident alerts from a daily pool of incidents.
- Reduced redundancy among incoming incidents by proposing rules to recognize patterns.
- Scheduled searches using Splunk.
- Worked with machine learning algorithms such as regressions (linear, logistic, etc.), SVMs, and decision trees.
- Worked on data subsetting in TDM (Test Data Management): slicing a part of the production database and loading it into the test database.
- Worked on TDM, which helps prevent bug fixes and rollbacks and creates a more cost-efficient software deployment process, while also lowering the organization's compliance and security risks.
- Worked on Talend, building custom components in Java and integrating them into the Studio.
- Worked on Clustering and classification of data using Machine learning algorithms.
- Good applied statistics skills, such as statistical sampling, testing, regression, etc.
- Built analytic models using a variety of techniques such as logistic regression, risk scorecards, and pattern recognition technologies.
- Worked with technical and development teams to deploy models. Built Model Performance Reports and Modeling Technical Documentation to support each of the models for the product line.
- Performed Exploratory Data Analysis and Data Visualizations using R and Tableau.
- Analyzed data from primary and secondary sources using statistical techniques to provide daily reports.
- Performed estimation of project timelines and requirement analysis.
- Analyzed data and recommended new strategies for root-cause analysis and for finding the quickest way to process large data sets.
- Hands-on experience implementing Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Principal Component Analysis.
- Knowledge of A/B Testing, ANOVA, Multivariate Analysis, Association Rules, and Text Analysis using R.
- Performed Exploratory Data Analysis using R; also generated various graphs and charts for analyzing the data using Python libraries.
- Involved in the execution of multiple business plans and projects, ensured business needs were being met, and interpreted data to identify trends that carry across future data sets.
- Developed interactive dashboards and created various ad hoc reports for users in Tableau by connecting various data sources.
- Used packages such as dplyr, tidyr, and ggplot2 in RStudio for data visualization.
- Enhanced data collection procedures to include information that is relevant for building analytic systems.
- Worked on data that combined unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
- Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python scikit-learn.
- Worked on data pre-processing and cleaning to perform feature engineering, and applied data imputation techniques for missing values in the dataset using Python.
- Implemented machine learning models (logistic regression, XGBoost, SVM) with Python scikit-learn.
- Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
- Processed, cleansed, and verified the integrity of data used for analysis.
- Performed ad hoc analysis and presented results in a clear manner.
- Continuously tracked model performance.
- Worked with Data Governance, Data Quality, Data Lineage, and Data Architect teams to design various models.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an Architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Experience with common data science toolkits, such as R, Python, Spark, etc.
- Developed and designed SQL procedures and Linux shell scripts for data export/import and for converting data.
- Used Test Driven Development (TDD) for the project.
- Wrote SQL queries, stored procedures, triggers, and functions for MySQL databases.
- Coordinated with data scientists and senior technical staff to identify clients' needs and document assumptions.
- Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
- Established Data architecture strategy, best practices, standards, and roadmaps.
- Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
- Worked with several R packages including knitr, dplyr, SparkR, Causal Infer, and spacetime.
- Interacted with the other departments to understand and identify data needs and requirements.
- Involved in the analysis of business requirements, the design and development of high-level and low-level designs, and unit and integration testing.
Environment: R, RStudio, Splunk, SQL, MySQL, Windows, UNIX, Python 3.5, MLlib, SAS, regression, logistic regression, NoSQL, Teradata, TensorFlow, OLTP, random forest, OLAP, HDFS, ODS.
Confidential, New Jersey
Sr. Data Scientist
- Performed data profiling to learn how various features relate to turnover before the hiring decision, when no on-the-job behavioral data is available.
- Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
- Analyzed data and performed data preparation by applying the historical model to the data set in Azure ML.
- Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python and MATLAB.
- Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks such as Caffe and Neon.
- Conducted a hybrid of Hierarchical and K-means Cluster Analysis using IBM SPSS and identified meaningful segments through a discovery approach.
- Built an Artificial Neural Network using TensorFlow in Python to predict the customer's probability of canceling connections (churn rate prediction).
- Understood business problems and analyzed the data using appropriate statistical models to generate insights.
- Knowledge of Information Extraction, NLP algorithms coupled with Deep Learning.
- Developed NLP models for Topic Extraction and Sentiment Analysis.
- Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
- Evaluated models using cross-validation, log loss, and ROC curves; used AUC for feature selection; and worked with Elastic technologies such as Elasticsearch and Kibana.
- Worked with the NLTK library for NLP data processing and pattern finding.
- Categorized comments from different social networking sites into positive and negative clusters using Sentiment Analysis and Text Analytics.
- Ensured that the models had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data.
- Addressed overfitting by implementing regularization methods such as L1 and L2.
- Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
- Created and designed reports that use gathered metrics to draw logical conclusions about past behavior and infer future behavior.
- Used MLlib, Spark's machine learning library, to build and evaluate different models.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Created data quality scripts using SQL and Hive to validate successful data loads and data quality. Created various types of data visualizations using Python and Tableau.
- Communicated the results to the operations team to support decision-making.
- Collected data needs and requirements by interacting with other departments.
Environment: Python 2.x, R, CDH5, HDFS, Hive, Linux, Spark, IBM SPSS, Tableau Desktop, SQL Server 2012, Microsoft Excel, MATLAB, Spark SQL, PySpark.
Sr. Data Scientist
- Collaborated with data engineers and operation team to implement the ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
- Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
- Explored and analyzed customer-specific features using Matplotlib and Seaborn in Python and dashboards in Tableau.
- Performed data imputation using Scikit-learn package in Python.
- Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with scikit-learn preprocessing.
- Used Python 2.x/3.x (NumPy, SciPy, Pandas, Scikit-learn, Seaborn) to develop a variety of models and algorithms for analytic purposes.
- Experimented with and built predictive models, including ensemble methods such as gradient boosting trees and neural networks in Keras, to predict sales amounts.
- Analyzed patterns in customers' shopping habits across different locations, categories, and months using time series modeling techniques.
- Used RMSE/MSE to evaluate different models' performance.
- Designed rich data visualizations to present data in human-readable form with Tableau and Matplotlib.
Environment: Python 2.x/3.x (Scikit-Learn/SciPy/NumPy/Pandas/Matplotlib/Seaborn), Tableau, Machine Learning algorithms (Random Forest, Gradient Boosting tree, Neural network in Keras), GitHub.
Confidential, Fremont, CA
- Designed and built dimensions and cubes with star and snowflake schemas using SQL Server Analysis Services (SSAS).
- Participated in a JAD session with business users and sponsors to understand and document the business requirements in alignment with the financial goals of the company.
- Involved in the analysis of business requirements, the design and development of high-level and low-level designs, and unit and integration testing.
- Performed data analysis and data profiling using complex SQL on various source systems including Teradata and SQL Server.
- Developed logical and physical data models that capture current-state and future-state data elements and data flows using ER/Studio.
- Reviewed and implemented the naming standards for the entities, attributes, alternate keys, and primary keys for the logical model.
- Normalized the ER data model of the OLTP system to second and third normal form.
- Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
- Translated business and data requirements into logical data models in support of Enterprise Data Models, ODS, OLAP, OLTP, Operational Data Structures, and Analytical systems.
- Designed and modeled the reporting data warehouse considering current and future reporting requirements.
- Involved in the daily maintenance of the database that involved monitoring the daily run of the scripts as well as troubleshooting in the event of any errors in the entire process.
- Worked with Data Scientists to create a data mart for data science-specific functions.
- Determined data rules and conducted Logical and Physical design reviews with business analysts, developers, and DBAs.
- Used external loaders such as MultiLoad, TPump, and FastLoad to load data into Oracle; performed database analysis, development, testing, implementation, and deployment.
- Reviewed the logical model with application developers, ETL Team, DBAs, and testing team to provide information about the data model and business requirements.
Environment: Erwin r7.0, Informatica 6.2, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, and PL/SQL.
Data Analyst/Data Modeler
- Performed business design work to establish the layouts of various reports and the frequency of report generation.
- Identified the information needs within and across functional areas of the organization and modeled the process in an enterprise-wide scenario.
- Performed field mapping work that involved establishing relationships between database tables, filter criteria, formulas, etc., needed for the reports; managed database optimization and tablespace fragmentation.
- Actively involved in the full Software Development Lifecycle (SDLC).
- Responsible for developing, implementing, and testing the data migration strategy for the overall project, using SQL Server 2012 as the platform and working with global resources.
- Developed database objects including tables, indexes, views, sequences, packages, triggers, and procedures, and troubleshot database problems.
- Worked with Informatica PowerCenter tools (Source Analyzer, Data Warehousing Designer, Mapping & Mapplet Designer, and Transformation Designer); developed Informatica mappings and tuned them for better performance.
- Extracted data from different flat files, MS Excel, and MS Access, transformed the data based on user requirements using Informatica PowerCenter, and loaded data into the target by scheduling the sessions.
- Used dynamic SQL to perform pre- and post-session tasks required during extraction, transformation, and loading.
- Designed the ETL process using Informatica to populate the data mart from flat files into the Oracle database.
- Created complex mappings to populate the data in the target with the required information.
- Wrote SQL Scripts and PL/SQL Scripts to extract data from Database and for Testing Purposes.
- Performed testing and QA role: developed test plans and test scenarios, and wrote SQL*Plus test scripts for execution on converted data to ensure correct ETL data transformations and controls.
- Automated the file provisioning process using UNIX, Informatica mappings, and Oracle utilities.
Environment: MS Access, MS Excel, MS Visio, Oracle, Informatica PowerCenter, Unix, QlikView, SDLC, SQL Server.
Data Analyst/Data Modeler
- Used Microsoft Visio and Rational Rose to design use case diagrams, class models, sequence diagrams, and activity diagrams for the SDLC process of the application.
- Worked with other teams to analyze customers and marketing parameters.
- Conducted Design reviews and Technical reviews with other project stakeholders.
- Took part in the complete project life cycle, from requirements to production support.
- Created test plan documents for all back-end database modules.
- Used MS Excel, MS Access, and SQL to write and run various queries.
- Used a traceability matrix to trace the requirements of the organization.
- Recommended structural changes and enhancements to systems and databases.
- Supported the testing team for system testing, integration testing, and UAT.
- Ensured quality in the deliverables.
Environment: UNIX, SQL, Oracle 10g, MS Office, MS Visio.