- Proficient in data preparation, including data extraction, data cleansing, data validation, and exploratory data analysis, to ensure data quality.
- Data cleaning and data imputation (outlier detection, missing-value treatment)
- Data transformation (feature scaling, feature engineering)
- Statistical modeling, both linear and nonlinear (logistic regression, linear regression, Naïve Bayes, decision trees, random forests, neural networks, SVM, clustering, KNN)
- Experienced with statistical methodologies such as time series analysis, hypothesis testing, ANOVA, and the chi-square test.
- Experienced with statistical programming languages such as R and Python 2.x/3.x, as well as Big Data technologies such as Hadoop and Hive.
- Worked at every stage of the data science project lifecycle, from inception through deployment, including:
- Data gathering and sampling (e.g., stratified sampling, cluster sampling)
- Hypothesis testing (power analysis, effect size, t-test, ANOVA, data distribution, chi-square test)
- EDA (descriptive statistics, inferential statistics, data visualization)
- Skilled in feature engineering, implementing both feature selection and feature extraction.
- Experienced with deep learning techniques such as Convolutional Neural Networks and Recurrent Neural Networks using Keras and TensorFlow.
- Familiar with recommendation system design, implementing collaborative filtering, matrix factorization, and clustering methods.
- Experienced with Natural Language Processing, including topic modeling and sentiment analysis.
- Experienced in working with relational databases, with a strong SQL skill set.
- Able to write SQL queries for RDBMSs such as SQL Server, MySQL, Teradata, and Oracle; worked with NoSQL databases such as MongoDB and Cassandra to handle unstructured data.
- Experienced with the Kafka streaming platform.
- In-depth understanding of building and publishing customized interactive reports and dashboards with custom parameters and user filters using Tableau and SSRS.
- Expertise in Python programming with packages including NumPy, Pandas, SciPy, and scikit-learn.
- Proficient in data visualization tools such as Tableau, Plotly, Matplotlib, and Seaborn.
- Familiar with the Hadoop ecosystem, including HDFS, HBase, Hive, Pig, and Oozie.
- Experienced in building models using Spark (PySpark, Spark SQL, Spark MLlib, Spark ML).
- Experienced with AWS cloud services such as EC2, EMR, RDS, and S3 to support big data tools, address data storage needs, and work on deployment solutions.
- Experienced with ticketing systems such as Jira/Confluence and version control tools such as GitHub.
- Worked with deployment tools such as Azure Machine Learning Studio, Oozie, and AWS Lambda.
- Strong understanding of the SDLC under Agile methodology and the Scrum process.
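The feature scaling listed above can be sketched in a few lines of plain Python; the helper names and the toy `ages` column are made up for illustration, not taken from any project described here:

```python
def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def z_score_scale(values):
    """Standardize a list of numbers to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std if std else 0.0 for v in values]

ages = [22, 35, 58, 41]           # hypothetical raw feature column
scaled = min_max_scale(ages)      # values now lie in [0, 1]
standardized = z_score_scale(ages)
```

In practice scikit-learn's `MinMaxScaler` and `StandardScaler` perform these same transformations on whole feature matrices.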
Confidential - Mount Laurel, NJ
- Involved in the entire data science project lifecycle, including data extraction, data cleansing, and transforming and preparing the data ahead of the analysis and data visualization phases.
- Used R to manipulate data, develop and validate quantitative models.
- Cleansed the data by eliminating duplicate and inaccurate data in R and Python.
- Designed tables and implemented naming conventions for logical and physical data models in Erwin 7.0.
- Identified missing values, applied pattern-recognition techniques to characterize the missingness, and handled the gaps with K-NN imputation.
- Scaled unstructured data and combined it with structured data to apply statistical methods.
- Created dummy variables for categorical variables.
- Sorted attributes into categorical and numerical variables.
- Visualized the data using scatter plots, box plots, and histograms from the ggplot2 package in R.
- Worked with SQL Server Integration Services (SSIS) ETL for data investigation and mapping to extract data, applying fast parsing to improve efficiency.
- Built and evaluated multiple models to compare predictions and performance.
- Built machine learning models such as random forest and logistic regression on correlated data sets, with emphasis on advanced algorithms such as neural networks and SVM.
- Fine-tuned models to favor recall over accuracy, managing the trade-off between false positives and false negatives.
- Evaluated models using recall, precision, cross-validation, and ROC curves.
- Migrated data from Heterogeneous Data Sources and legacy system (DB2, Access, Excel) to centralized SQL Server databases using SQL Server Integration Services (SSIS).
- Applied Z-score standardization, the Laplace estimator, and other techniques to improve model performance.
- Prepared the final documents with all recommendations and ensured delivery to the client before EOD.
Environment: Python, MS Excel, MS Access, SQL Server 2008, Microsoft Office, SQL Server Management Studio, PowerPoint.
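The K-NN imputation mentioned in this project can be sketched in plain Python under simplifying assumptions (one incomplete column, Euclidean distance on the remaining columns); the `knn_impute` helper and the toy dataset are illustrative, not the project's actual code:

```python
def knn_impute(rows, target_idx, k=2):
    """Fill None values in column `target_idx` with the mean of that
    column over the k nearest complete rows, measured by Euclidean
    distance on the other columns."""
    complete = [r for r in rows if r[target_idx] is not None]
    filled = []
    for r in rows:
        if r[target_idx] is not None:
            filled.append(list(r))
            continue
        def dist(other):
            return sum((a - b) ** 2
                       for i, (a, b) in enumerate(zip(r, other))
                       if i != target_idx) ** 0.5
        neighbors = sorted(complete, key=dist)[:k]
        estimate = sum(n[target_idx] for n in neighbors) / k
        filled.append([estimate if i == target_idx else v
                       for i, v in enumerate(r)])
    return filled

# Made-up rows: the third row is missing its second feature.
data = [[1.0, 10.0], [1.1, 12.0], [5.0, None], [5.2, 50.0], [4.9, 48.0]]
imputed = knn_impute(data, target_idx=1, k=2)
```

scikit-learn's `sklearn.impute.KNNImputer` provides a production-grade version of the same idea for full feature matrices.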
Confidential - Atlanta, GA
- Data mining using state-of-the-art methods.
- Extending company's data with third party sources of information when needed.
- Enhancing data collection procedures to include information that is relevant for building analytic systems.
- Processing, cleansing, and verifying the integrity of data used for analysis.
- Doing ad-hoc analysis and presenting results in a clear manner.
- Creating automated anomaly detection systems and constantly tracking their performance.
- Strong command of data architecture and data modeling techniques.
- Hands-on experience with data mining tools such as R and Python, depending on job requirements.
- Utilizing NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets.
- Knowledge of ML and statistical libraries (e.g., scikit-learn, Pandas).
- Building predictive models to forecast risks for product launches and operations and to help predict workflow.
- Experienced with visualization technologies such as Tableau.
- Drawing inferences and conclusions, creating dashboards and visualizations of processed data, and identifying trends and anomalies.
- Generating TLFs and summary reports, ensuring on-time, quality delivery.
- Participated in client meetings, teleconferences and video conferences to keep track of project requirements, commitments made and the delivery thereof.
- Solved analytical problems and effectively communicated methodologies and results.
- Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partner teams.
- Fostered a culture of continuous engineering improvement through mentoring, feedback, and metrics.
Environment: R Programming, Python, Jupyter, SPSS, SQL Server 2014, SSRS, SSIS, SSAS, SQL Server Management Studio, Hadoop, Business Intelligence Development Studio, SAP Business Objects and Business Intelligence.
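A minimal sketch of the kind of automated anomaly detection described above, using a simple z-score rule; the `zscore_anomalies` helper, the threshold, and the sensor-style readings are all hypothetical:

```python
def zscore_anomalies(series, threshold=3.0):
    """Return the indices of values lying more than `threshold`
    standard deviations from the mean of the series."""
    n = len(series)
    mean = sum(series) / n
    std = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    if std == 0:
        return []
    return [i for i, x in enumerate(series)
            if abs(x - mean) / std > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 95, 10, 9]  # made-up metric
outliers = zscore_anomalies(readings, threshold=2.5)
```

Real systems would compute the statistics on a rolling window or a clean baseline rather than on the series containing the anomaly, since large outliers inflate the standard deviation.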
Confidential - Charlotte, NC
- Work independently and collaboratively throughout the complete analytics project lifecycle including data extraction/preparation, design and implementation of scalable machine learning analysis and solutions, and documentation of results.
- Performed statistical analysis to determine peak and off-peak time periods for rate-making purposes
- Conducted analysis of customer data for the purposes of designing rates.
- Identified root causes of problems and facilitated the implementation of cost-effective solutions with all levels of management.
- Applied various machine learning algorithms and statistical modeling techniques such as decision trees, regression models, clustering, and SVM to identify volume, using the scikit-learn package in Python.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Hands-on experience in implementing Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Principal Component Analysis.
- Performed K-means clustering, Regression and Decision Trees in R.
- Partner with technical and non-technical resources across the business to leverage their support and integrate our efforts.
- Worked on Text Analytics and Naive Bayes creating word clouds and retrieving data from social networking platforms.
- Pro-actively analyze data to uncover insights that increase business value and impact.
- Prepared Data Visualization reports for the management using R.
- Hold a point-of-view on the strengths and limitations of statistical models and analyses in various business contexts and can evaluate and effectively communicate the uncertainty in the results.
- Approach analysis in multiple ways to evaluate approaches and compare results.
Environment: Python, R, SQL, and SQL scripts; regression analysis, decision trees, Naïve Bayes, SVM, K-means clustering, and KNN.
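The K-means clustering used in this role can be illustrated with a bare-bones implementation of Lloyd's algorithm; the function and the sample points are illustrative only:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: return (centroids, labels) for a list of
    equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k),
                      key=lambda c: sum((p - q) ** 2
                                        for p, q in zip(pt, centroids[c])))
                  for pt in points]
        # Update step: each centroid becomes the mean of its cluster.
        for c in range(k):
            members = [pt for pt, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, labels

points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9)]  # toy data
centroids, labels = kmeans(points, k=2)
```

`sklearn.cluster.KMeans` does the same with smarter initialization (k-means++) and a convergence tolerance instead of a fixed iteration count.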
Confidential - Dallas, TX
- Worked on the risk management team, identifying risk in the mortgage process by evaluating customer and property records.
- Involved in addressing a wide range of challenging problems using techniques from applied statistics, machine learning and data mining fields.
- Provided insights using machine learning algorithms tailored to particular needs and evaluated on large data sets using caret, e1071, rpart, randomForest, glmnet, gbm, mboost, and arules in R.
- Integrated new tools and developed technology frameworks/prototypes to accelerate the data integration process and empower the deployment of predictive analytics by developing Spark Scala modules with R.
- Experience in descriptive statistics and hypothesis testing using Chi-square, T-test, Pearson correlation and Analysis of variance (ANOVA).
- Advanced knowledge of statistical techniques in Sampling, Probability, Multivariate data analysis, PCA, and Time-series analysis using SAS.
- Experience in transferring and managing data to Hadoop clusters using Kafka, Sqoop, Oozie and Zookeeper.
- Involved in fixing invalid mappings, testing of Stored Procedures and Functions, Unit and Integrating testing of Informatica Sessions, Batches, and the Target Data.
- Responsible for creating Hive tables, loading the structured data produced by MapReduce jobs into those tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Analyzed Relational & Non-relational data using MySQL and HBase.
- Contributed Technology, Project management, and Business management functions to push the business forward with innovative solutions.
- Performed data acquisition and exploratory data analysis in R.
- Visualized team metrics and communicated to the higher management using PowerBI.
Environment: R, MySQL, Hive, HBase, Spark, PowerBI and GIT.
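The chi-square testing mentioned for this role can be sketched by computing the Pearson statistic by hand on a toy contingency table; the counts and the mortgage-flavored interpretation are made up for illustration:

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c contingency table
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: default vs. no default by property type.
observed = [[30, 10],
            [20, 40]]
stat = chi_square_stat(observed)
```

`scipy.stats.chi2_contingency` performs the same computation and additionally returns the degrees of freedom and p-value.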
- Involved in migration projects that moved data from data warehouses on Oracle/DB2 to Teradata.
- Designed, implemented and automated modeling and analysis procedures on existing and experimentally created data.
- Implemented a job that reads electronic medical records, extracts the data into an Oracle database, and generates output.
- Used Model Manager Option in Erwin to synchronize the data models in Model Mart approach.
- Parsed data, producing concise conclusions from raw data in a clean, well-structured and easily maintainable format. Developed clustering models for customer segmentation using Python.
- Developed data mapping documents between Legacy, Production, and User Interface Systems.
- Reverse-engineered the reports and identified the data elements (in the source system), dimensions, facts, and measures required for the reports.
- Developed logical and physical data models using Erwin to design OLTP systems for different applications.
- Worked with DBA group to create Best-Fit Physical Data Model with DDL from the Logical Data Model using Forward engineering.
- Analyzed the data and provide the insights about the customers using Tableau.
- Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW)
- Used Teradata utilities such as FastExport and MultiLoad for handling various tasks.
- Created dynamic linear models to perform trend analysis on customer transactional data in Python.
- Developed an Object modeling in UML for Conceptual Data Model using Enterprise Architect.
- Created entity process association matrices using Zachman Framework, functional decomposition diagrams and data flow diagrams from business requirements documents.
- Worked with the ETL team to document the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
- Generated ad-hoc reports using Crystal Reports 9 and SQL Server Reporting Services (SSRS).
Environment: Teradata SQL Assistant, Teradata Loading utilities (Bteq, FastLoad, MultiLoad), Python, UNIX, Tableau, MS Excel, MS Power Point, Business Objects, Oracle
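The trend analysis on customer transactional data described above can be sketched as an ordinary-least-squares trend line; the `fit_trend` helper and the monthly figures are illustrative, not actual customer data:

```python
def fit_trend(y):
    """Fit y = intercept + slope * t over t = 0..n-1 by ordinary
    least squares; return (intercept, slope)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (yi - y_mean) for t, yi in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return y_mean - slope * t_mean, slope

monthly_spend = [100, 110, 120, 130, 140]  # made-up transaction totals
intercept, slope = fit_trend(monthly_spend)
```

A positive slope indicates an upward trend; dynamic linear models extend this idea by letting the intercept and slope evolve over time.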