- 8+ years of experience in Machine Learning, Statistical Modeling, Predictive Modeling, Data Analytics, Data Modeling, Data Architecture, Data Analysis, Data Mining, Text Mining, and Natural Language Processing (NLP).
- Proficient in gathering and analyzing business requirements, with experience in documenting System Requirement Specifications (SRS) and Functional Requirement Specifications (FRS).
- Extensive experience in Text Analytics, developing statistical machine learning and data mining solutions to various business problems and generating data visualizations using R, Python, and Tableau.
- Experience in extracting data and creating value-added datasets using Python, R, SAS, Azure, and SQL to analyze customer behavior, target specific customer segments, and uncover hidden insights that support project objectives.
- Good experience in software development with Python (libraries used: Beautiful Soup, NumPy, SciPy, Matplotlib, python-twitter, Pandas DataFrame, NetworkX, urllib2, MySQLdb for database connectivity) and IDEs such as Sublime Text, Spyder, and PyCharm.
- Experience with Artificial Intelligence algorithms, Business Intelligence, and analytics models (such as Decision Trees and Linear & Logistic Regression) using Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL, and Erwin.
- Extensive experience with business intelligence (BI) tools such as OLAP, data warehousing, reporting and querying tools, data mining, and spreadsheets.
- Worked with various Python modules such as requests, boto, flake8, flask, mock, and nose.
- Excellent working knowledge of Big Data on Hadoop (Hortonworks), HDFS architecture, R, Python, Jupyter, Pandas, NumPy, scikit-learn, Matplotlib, PyHive, Keras, Hive, NoSQL (HBase), Sqoop, Pig, MapReduce, Oozie, and Spark MLlib.
- Efficient in developing logical and physical data models and organizing data per business requirements using Sybase PowerDesigner, Erwin, and ER Studio in both OLTP and OLAP applications.
- Strong understanding of when to use an ODS, a data mart, or a data warehouse.
- Experienced in employing R programming, MATLAB, SAS, Tableau, and SQL for data cleaning, data visualization, risk analysis, and predictive analytics.
- Adept at using the SAS Enterprise suite, R, Python, and Big Data technologies including Hadoop, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, MapReduce, and Cloudera Manager for the design of business intelligence applications.
- Experience in foundational machine learning models and concepts like regression, random forest, boosting and deep learning.
- Ability to provide wing-to-wing analytic support including pulling data, preparing analysis, interpreting data, making strategic recommendations, and presenting to client/product teams.
- Hands-on experience in Linear and Logistic Regression, K-Means Cluster Analysis, Decision Trees, KNN, SVM, Random Forest, Market Basket Analysis, NLTK/Naïve Bayes, Sentiment Analysis, Text Mining/Text Analytics, and Time Series Forecasting.
- Hands-on experience with Machine Learning, Regression Analysis, Clustering, Boosting, Classification, Principal Component Analysis, and data visualization tools.
- Strong programming skills in a variety of languages such as Python and SQL
- Familiarity with Crystal Reports and SSRS for query, reporting, analysis, and Enterprise Information Management.
- Excellent knowledge of creating reports in Pentaho Business Intelligence.
- Experienced with databases including Oracle, XML, DB2, Teradata 15/14, Netezza, SQL Server, Big Data, and NoSQL.
- Worked with engineering teams to integrate algorithms and data into Return Path solutions
- Worked closely with other data scientists to create data-driven products.
- Strong experience in Statistical Modeling/Machine Learning and visualization tools.
- Proficient in Hadoop, HDFS, Hive, MapReduce, Pig, and NoSQL databases like MongoDB, HBase, and Cassandra; expertise in applying data mining and optimization techniques in B2B and B2C industries; proficient in Machine Learning, Data/Text Mining, Statistical Analysis, and Predictive Modeling.
- Experienced in data modeling and data analysis using Dimensional Data Modeling and Relational Data Modeling, Star Schema/Snowflake Modeling, fact and dimension tables, and physical and logical data modeling.
Languages: HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/RStudio, SAS Enterprise Guide, SAS, R (caret, RWeka, ggplot2), Python
NoSQL Databases: Cassandra, HBase, MongoDB, MariaDB
Machine Learning Frameworks and Tools: Keras, Caffe, TensorFlow, OpenCV, scikit-learn, Pandas, NumPy, Microsoft Visual Studio, Microsoft Office.
Development Tools: Microsoft SQL Studio, IntelliJ, Eclipse, NetBeans.
Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, rpy2, SQLAlchemy.
Machine Learning Algorithms: Neural Networks, Decision Trees, Support Vector Machines, Random Forest, Convolutional Neural Networks, Logistic Regression, PCA, K-Means, KNN.
Development Methodologies: Agile/Scrum, UML, Design Patterns, Waterfall
Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra.
Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.
Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka.
BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.
Database Design Tools and Data Modeling: MS Visio, Erwin 4.5/4.0, Star Schema/Snowflake Schema modeling, fact & dimension tables, physical & logical data modeling, Normalization and De-normalization techniques, Kimball & Inmon methodologies
Confidential, Littleton, CO
- Built data pipelines for reporting, alerting, and data mining. Experienced with table design and data management using HDFS, Hive, Impala, Sqoop, MySQL, MemSQL, Grafana/InfluxDB, and Kafka.
- Worked with statistical models for data analysis, predictive modeling, machine learning approaches and recommendation and optimization algorithms.
- Worked on business and data analysis, data profiling, data migration, data integration, and metadata management services.
- Worked extensively on databases, primarily Oracle 11g/12c, writing PL/SQL scripts for multiple purposes.
- Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest using R and Python packages.
- Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
- Worked with Big Data technologies such as Hadoop, Hive, and MapReduce.
- Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
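As an illustration of the "distributed random forest via Python streaming" pattern, the following is a minimal single-machine sketch: each simulated mapper trains a weak learner (a one-feature decision stump) on its data shard, and a reducer combines them by majority vote. The data, function names, and split strategy are invented for illustration; this is not the original Hadoop job.

```python
# Illustrative sketch of a streaming-style distributed random forest:
# each "mapper" trains a decision stump on its shard, the "reducer"
# classifies new points by majority vote over the emitted stumps.
import random

def train_stump(rows):
    """Pick the (feature, threshold) pair that best separates the labels."""
    best = None
    n_features = len(rows[0][0])
    for f in range(n_features):
        for xi, _ in rows:
            t = xi[f]
            # rule: predict 1 when the feature value exceeds the threshold
            acc = sum((x[f] > t) == bool(y) for x, y in rows) / len(rows)
            if best is None or acc > best[0]:
                best = (acc, f, t)
    return best[1], best[2]

def mapper(shard):
    """One 'mapper' per data shard: emit a trained stump."""
    return train_stump(shard)

def reducer(stumps, x):
    """Majority vote of all emitted stumps for a new point x."""
    votes = sum(x[f] > t for f, t in stumps)
    return int(votes * 2 > len(stumps))

random.seed(0)
# Toy data: the label is 1 exactly when the first feature exceeds 0.5.
points = [[random.random(), random.random()] for _ in range(60)]
data = [(x, int(x[0] > 0.5)) for x in points]
shards = [data[i::3] for i in range(3)]      # simulate 3 mapper input splits
forest = [mapper(s) for s in shards]         # "map" phase
pred = reducer(forest, [0.9, 0.1])           # "reduce" phase: classify a point
```

In a real Hadoop Streaming job the mapper and reducer would read lines from stdin and emit key-value pairs; the functions above compress that protocol into plain calls.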
- Setup storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
- Completed a highly immersive Data Science program involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
- Performed scoring and financial forecasting for collection priorities using Python, R, and SAS machine learning algorithms.
- Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS
- Managed existing team members and led the recruiting and onboarding of a larger Data Science team to address analytical knowledge requirements.
- Worked directly with upper executives to define requirements of scoring models.
- Developed a model for predicting repayment of debt owed to small and medium enterprise (SME) businesses.
- Developed a generic model for predicting repayment of debt owed in the healthcare, large commercial, and government sectors.
- Created SQL scripts, analyzed the data in MS Access/Excel, and worked on SQL and SAS script mapping.
- Developed a legal model for predicting which debtors respond to litigation only.
- Created multiple dynamic scoring strategies for adjusting the score upon consumer behavior such as payment or right-party phone call.
- Rapid model creation in Python using pandas, numpy, sklearn, and Plotly for data visualization. These models were then implemented in SAS, where they interfaced with MS SQL databases and were scheduled to update on a timely basis.
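The pandas/scikit-learn prototyping loop described in the bullet above can be sketched as follows. The account fields, labels, and repayment rule are invented toy data, and the production models were re-implemented in SAS; this is only a minimal illustration of the workflow.

```python
# Minimal sketch of rapid model prototyping with pandas + scikit-learn.
# The account features and the repayment rule are invented toy data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "balance": rng.uniform(100, 10_000, 500),        # hypothetical fields
    "days_delinquent": rng.integers(0, 365, 500),
})
# Toy target: long-delinquent accounts rarely repay.
df["repaid"] = (df["days_delinquent"] < 120).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df[["balance", "days_delinquent"]], df["repaid"],
    test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
score = model.score(X_test, y_test)                  # held-out accuracy
```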
- Performed data analysis using regressions, data cleaning, Excel VLOOKUP, histograms, and the TOAD client, and presented the analysis and suggested solutions to investors.
- Attained good knowledge of Hadoop Data Lake implementation and Hadoop architecture for client business data management.
- Identified relevant key performance factors and tested their statistical significance.
- The above scoring models resulted in millions of dollars of added revenue to the company and a change in priorities across the entire company.
Environment: R, SQL, Python 2.7.x, SQL Server 2014, regression, logistic regression, random forest, neural networks, Topic Modeling, NLTK, SVM (Support Vector Machine), JSON, XML, Hive, Hadoop, Pig, sklearn, SciPy, GraphLab, NoSQL, SAS, SPSS, Spark, Kafka, HBase, MLlib.
Confidential, Richardson, TX
- Responsible for performing machine learning techniques (regression/classification) to predict outcomes.
- Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
- Identified and executed process improvements; hands-on in various technologies such as Oracle, Informatica, and Business Objects.
- Designed and developed state-of-the-art deep learning/machine learning algorithms for analyzing image and video data, among others.
- Developed and implemented innovative AI and machine learning tools for use in Risk.
- Performed feature engineering for supervised and unsupervised machine learning models.
- Implemented Spark MLlib utilities such as classification, regression, clustering, collaborative filtering, and dimensionality reduction.
- Utilized Convolutional Neural Networks to implement a machine learning image recognition component using TensorFlow.
- Responsible for design and development of Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
- Handled importing data from various data sources (GOVWIN), performed transformations using Spark, and loaded data into HDFS.
- Programmed several components for the company's follow-up product line using C#, NUnit, SQLite, Git, and WiX.
- Improved Bag-of-Words features with the TF-IDF algorithm and applied advanced text/NLP feature engineering and machine learning algorithms to process crawled data.
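The TF-IDF weighting mentioned above can be sketched in a few lines of plain Python on a toy corpus (the documents here are invented; the actual work used proper NLP tooling):

```python
# Toy TF-IDF: a term is weighted by its frequency within a document,
# discounted by how many documents contain it, so corpus-wide words
# such as "the" score lower than distinctive words.
import math

docs = [
    "the market is up today",
    "the market is down today",
    "cats are sleeping",
]

def tf_idf(term, doc, corpus):
    words = doc.split()
    tf = words.count(term) / len(words)               # term frequency
    df = sum(term in d.split() for d in corpus)       # document frequency
    idf = math.log(len(corpus) / df)                  # assumes df > 0
    return tf * idf

common = tf_idf("the", docs[0], docs)    # in 2 of 3 docs -> low weight
rare = tf_idf("cats", docs[2], docs)     # in 1 of 3 docs -> higher weight
```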
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Implemented back-propagation to generate accurate predictions.
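Back-propagation reduces to repeated application of the chain rule; below is a one-neuron NumPy sketch on invented, linearly separable data. The actual work used full multi-layer networks, so this is only the smallest possible illustration of the mechanism.

```python
# One-neuron back-propagation sketch: forward pass, cross-entropy loss,
# then the gradient is propagated backward through the sigmoid via the
# chain rule to update the weights. Toy data, illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # separable toy target

w, b, lr = np.zeros(2), 0.0, 0.5
losses = []
for _ in range(200):
    z = X @ w + b                              # forward pass
    p = 1.0 / (1.0 + np.exp(-z))               # sigmoid activation
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    losses.append(loss)
    grad_z = (p - y) / len(y)                  # dLoss/dz via the chain rule
    w -= lr * (X.T @ grad_z)                   # backpropagated weight update
    b -= lr * grad_z.sum()

accuracy = np.mean((p > 0.5) == y)
```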
- Designed the prototype of the data mart and documented possible outcomes from it for end users.
- Involved in business process Modeling using UML.
- Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Worked on the Spark tool, collaborating with ML libraries, to eliminate a shotgun approach to understanding customer buying patterns.
- Responsible for handling Hive queries using Spark SQL that integrates with Spark environment.
- Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
- Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
- Performed performance tuning of the database, including indexing, optimizing SQL statements, and monitoring the server.
- Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Model, SVM, Random Forest, Boosting, and Neural Networks.
- Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
- Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Participated in Business meetings to understand the business needs & requirements.
- Prepared the ETL architecture and design document covering the ETL architecture, SSIS design, and extraction, transformation, and loading of Duck Creek data into the dimensional model.
Environment: Python, MDM, MLlib, PL/SQL, Tableau, Git, NLP, SQL Server, Scala NLP, SSMS, ERP, CRM, Netezza, Cassandra, SQL, SSRS, Informatica, Spark, Azure, RStudio, MongoDB, Java, Hive.
Confidential - Houston, TX
Data Scientist
- Involved in extensive ad hoc reporting, routine operational reporting, and data manipulation to produce routine metrics and dashboards for management.
- Created parameters, action filters and calculated sets for preparing dashboards and worksheets in Tableau.
- Interacted with other data scientists and architects on custom solutions for data visualization using tools like Tableau and packages in Python.
- Involved in running MapReduce jobs for processing millions of records.
- Wrote complex SQL queries using joins and OLAP functions like COUNT, CSUM, and RANK.
- Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
- Developed Python programs for manipulating data read from various Teradata tables and consolidating it into CSV files.
- Performing statistical data analysis and data visualization using Python.
- Worked on creating filters and calculated sets for preparing dashboards and worksheets in Tableau.
- Created data models in Splunk using pivot tables by analyzing the vast amount of data and extracting key information to suit various business requirements.
- Created new scripts for Splunk scripted input for the system, collecting CPU and OS data.
- Implemented data refreshes on Tableau Server for biweekly and monthly increments based on a business change to ensure that the views and dashboards were displaying the changed data accurately.
- Developed normalized Logical and Physical database models for designing an OLTP application.
- Knowledgeable in the AWS environment for loading data files from on-premises to a Redshift cluster.
- Performed SQL testing on AWS Redshift databases.
- Developed Teradata SQL scripts using OLAP functions such as RANK() OVER to improve query performance while pulling data from large tables.
- Developed and implemented SSIS, SSRS and SSAS application solutions for various business units across the organization.
- Designed the Data Marts in dimensional data modelling using star and snowflake schemas.
- Analyzed data sets with SAS programming, R, and Excel.
- Published interactive dashboards and scheduled automatic data refreshes.
- Maintained large data sets, combining data from various sources using Excel, SAS Enterprise Guide, SAS Grid, Access, and SQL queries.
- Created Hive queries that helped market analysts spot emerging trends by comparing incremental data with Teradata reference tables and historical metrics.
- Designed and developed ETL processes using Informatica ETL tools for dimension and fact file creation.
- Developed and automated solutions for a new billing and membership Enterprise Data Warehouse, including ETL routines, tables, maps, materialized views, and stored procedures, incorporating Informatica and Oracle PL/SQL toolsets.
- Performed analysis for implementing Spark using Scala and wrote Spark sample programs using PySpark.
Environment: SQL/Server, Oracle 10g/11g, MS-Office, Teradata, Informatica, ER Studio, XML, R connector, Python, R, Tableau 9.2
Confidential - Franklin Lakes, NJ
Data Scientist/R Developer.
- Conducted analysis assessing customer consumption behaviors, discovered the value of customers with RFM analysis, and applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
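The RFM segmentation step can be sketched with scikit-learn's K-Means on invented customer values (the real segmentation ran on actual transaction data; group sizes and ranges here are illustrative):

```python
# Toy RFM (Recency, Frequency, Monetary) segmentation with K-Means.
# Two deliberately distinct groups: active big spenders vs. lapsed
# small spenders. All values are invented for illustration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
active = np.column_stack([rng.uniform(1, 30, 50),       # recency (days)
                          rng.uniform(10, 50, 50),      # frequency
                          rng.uniform(500, 2000, 50)])  # monetary
lapsed = np.column_stack([rng.uniform(180, 365, 50),
                          rng.uniform(1, 5, 50),
                          rng.uniform(10, 100, 50)])
rfm = np.vstack([active, lapsed])

# Standardize first: K-Means is distance-based, so raw monetary values
# would otherwise dominate the clustering.
X = StandardScaler().fit_transform(rfm)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```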
- Collaborated with data engineers to implement the ETL process, wrote and optimized SQL queries to perform data extraction and merging from Oracle.
- Involved in managing backup and restoring data in the live Cassandra Cluster.
- Used R, Python, and Spark to develop a variety of models and algorithms for analytic purposes.
- Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.
- Developed personalized product recommendations with machine learning algorithms, including Gradient Boosted Trees and collaborative filtering, to better meet the needs of existing customers and acquire new customers.
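The collaborative-filtering idea can be sketched with item-item cosine similarity on an invented ratings matrix (the production recommender was far richer; matrix values and the recommendation rule are illustrative only):

```python
# Item-item collaborative filtering sketch: recommend the unrated item
# whose rating column is most similar (cosine) to an item the user likes.
# The ratings matrix is invented; 0 means "not rated".
import numpy as np

R = np.array([          # rows = users, columns = items
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# similarity of every item to item 0 (which user 0 rated highly)
sims = [cosine(R[:, 0], R[:, j]) for j in range(R.shape[1])]

# among items user 0 has not rated, pick the one most similar to item 0
unrated = [j for j in range(R.shape[1]) if R[0, j] == 0]
recommendation = max(unrated, key=lambda j: sims[j])
```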
- Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Model, Random Forest, SVM, Boosting, and Neural Networks.
- Evaluated parameters with K-Fold Cross Validation and optimized performance of models.
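The K-Fold evaluation step above can be sketched with scikit-learn on synthetic data (the dataset and model settings are illustrative stand-ins, not the originals):

```python
# 5-fold cross-validation sketch: averaging held-out accuracy across
# folds gives a less optimistic estimate than a single train/test split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

scores = cross_val_score(model, X, y, cv=5)   # one accuracy per fold
mean_acc = scores.mean()
```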
- Worked on benchmarking Cassandra Cluster using the Cassandra stress tool.
- Completed a highly immersive Data Science program involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Git, SQL, UNIX commands, Python programming, and NoSQL.
- Worked on data cleaning, data preparation, and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, Pandas, and scikit-learn.
- Identified the risk level and eligibility of new insurance applicants with machine learning algorithms.
- Determined customer satisfaction and helped enhance the customer experience using NLP.
- Utilized SQL and HiveQL to query and manipulate data from a variety of data sources, including Oracle and HDFS, while maintaining data integrity.
- Performed data visualization and designed dashboards with Tableau and D3.js, and provided complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.
Environment: R, MATLAB, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning, Python, Spark (MLlib, PySpark), Tableau, MicroStrategy, SAS, TensorFlow, regression, logistic regression, Hadoop 2.7, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, and MapReduce.
- Used the SAS PROC SQL pass-through facility to connect to Oracle tables and created SAS datasets using various SQL joins such as left join, right join, inner join, and full join.
- Performed data validation, transforming data from Oracle RDBMS to SAS datasets.
- Produced quality customized reports using PROC TABULATE, PROC REPORT styles, and ODS RTF, and provided descriptive statistics using PROC MEANS, PROC FREQ, and PROC UNIVARIATE.
- Developed SAS macros for data cleaning, reporting, and supporting routine processing.
- Performed advanced querying using SAS Enterprise Guide: calculating computed columns, using filters, manipulating and preparing data for reporting, graphing, summarization, and statistical analysis, and finally generating SAS datasets.
- Involved in developing, debugging, and validating project-specific SAS programs to generate derived SAS datasets, summary tables, and data listings according to study documents.
- Created datasets per the approved specifications, collaborated with project teams to complete scientific reports, and reviewed reports to ensure accuracy and clarity.
- Experienced in working with data modelers to translate business rules/requirements into conceptual/logical dimensional models and worked with complex denormalized and normalized data models.
- Performed different calculations like Quick table calculations, Date Calculations, Aggregate Calculations, String and Number Calculations.
- Created action filters, user filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
- Used dynamic SQL to perform pre- and post-session tasks required while performing extraction, transformation, and loading.
- Designed the ETL process using Informatica to populate the data mart from flat files into the Oracle database.
- Expertise in Agile Scrum methodology for implementing project life cycles of report design and development.
- Combined Tableau visualizations into Interactive Dashboards using filter actions, highlight actions etc. and published them on the web.
- Gathered business requirements and created business requirement documents (BRD/FRD).
- Worked with the manager to prioritize requirements and prepared reports on a weekly and monthly basis.
Environment: SQL Server, Oracle 11g/10g, MS Office Suite, PowerPivot, PowerPoint, SAS Base, SAS Enterprise Guide, SAS/MACRO, SAS/SQL, SAS/ODS, SQL, PL/SQL, Visio.
Data Analyst
- Participated in requirement gathering sessions with business stakeholders to understand the project goals and documented the business requirement documents (BRD).
- Studied the Requirements Specifications, Use Cases and analyzed the data needs of the Business users.
- Implemented conceptual and logical data models using Erwin 7.2 by adopting agile methodologies as per organization standards.
- Redesigned some of the previous models by adding some new entities and attributes as per the business requirements.
- Converted the Logical data models to Physical data models to generate DDL scripts.
- Reverse engineered existing data models for analyzing and comparing business processes.
- Forward engineered logical models to generate physical models using Erwin.
- Created logical data models using Erwin 7.2 and ensured they followed normalization rules and had the best possible traceability pattern.
- Migrated several models from Erwin 4.1/7.1 to ERWIN 7.2 and updated the previous naming standards.
- Extensively worked on enterprise data warehouse development, building data marts, staging, and restaging.
- Scheduled daily, weekly, and monthly reports for executives, business analysts, and customer representatives for various categories and regions based on business needs using SQL Server Reporting Services (SSRS).
- Worked with business users to understand metric definitions, presentation, and user needs.
Environment: Erwin, Informatica, Cognos, Oracle 9i, SQL Server 2003, SQL, MS Office, Windows 2003.