- Over 8 years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization; experienced in foundational Machine Learning models and concepts including Regression, Random Forest, Boosting, GBM, Neural Networks, HMMs, CRFs, MRFs, and Deep Learning; expertise in transforming business requirements into analytical models, designing algorithms, and building models.
- Delivered Data Mining solutions to various business problems and generated data visualizations using R, Python, and Tableau; highly skilled in visualization tools such as Tableau, ggplot2, and D3.js for creating dashboards; developed data mining and reporting solutions that scale across massive volumes of structured and unstructured data; sound knowledge of statistical learning theory, with a postgraduate background in mathematics; experienced in designing visualizations with Tableau and publishing and presenting dashboards and Storylines on web and desktop platforms.
- Experience using statistical procedures and Machine Learning algorithms such as ANOVA, Clustering, Regression, and Time Series Analysis to analyze data for model building; hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis; good knowledge of Recommender Systems; developed Logical Data Architecture with adherence to Enterprise Architecture.
- Proficient in business requirement gathering, business process flow, Business Process Modeling (BPM), process redesign, Business Process Reengineering, data mining, and automated and manual testing, with a keen awareness of developer and end-user needs and the ability to work efficiently with user groups at all levels. Experienced in developing Business Requirement Documents (BRD) and Functional Requirement Documents (FRD).
- Experience understanding business requirements and translating them into functional requirement specifications. Facilitated JAD (Joint Application Design) sessions and coordinated extensive communication (interviews, FSRs, RAD sessions, written correspondence, reports, implementation requirements, project status reports, oral presentations, e-mails, etc.) to keep executive staff and team members apprised of goals and project status and to resolve issues and conflicts.
- Experience with source control and dependency management software, including Git and Maven; Linux administration; Big Data, cloud computing technologies, NoSQL systems, and Lambda architectures; and building complex data extraction, transformation, and loading (ETL) pipelines into structured databases, data warehouses, and data processing systems.
- Adept in statistical programming languages such as R and Python, as well as Big Data technologies like Hadoop and Hive; skilled in using dplyr (R) and pandas (Python) for exploratory data analysis; experienced with data modeling tools such as Erwin, PowerDesigner, and ER/Studio; strong experience with R visualization, QlikView, and Tableau for data analytics and graphic visualization.
- Experience designing Star and Snowflake schemas for Data Warehouse and ODS architectures; good understanding of Teradata SQL Assistant, Teradata Administrator, and data load/export utilities such as BTEQ, FastLoad, MultiLoad, and FastExport; worked on Proofs of Concept (PoCs) and gap analysis; gathered data for analysis from different sources and prepared it for exploration using data munging and Teradata.
- Worked with and extracted data from various database sources such as Oracle, SQL Server, and DB2; regularly used JIRA and other internal issue trackers during project development; well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
Data Analytics Tools: Python (NumPy, SciPy, pandas, Gensim, Keras), R (caret, Weka, ggplot2), MATLAB.
Analysis & Modeling Tools: Erwin, Sybase PowerDesigner, Oracle Designer, Rational Rose, SAS, ER/Studio, TOAD, MS Visio, Django, Flask, pip, NPM, Node.js, Spring MVC.
Elicitation Techniques: Brainstorming, Business Rule Analysis, Document Analysis, Focus Group, Interface Analysis, Interviews, Non-Functional Requirement Analysis, Observation, Prototyping, Requirement Workshops, Structured Walkthrough, Survey, User Stories.
Analytical Models: Data Flow Diagram, Data Modelling (Entity Relationship Diagram), Organizational Modelling, Process Modelling, Prototyping, Risk Analysis, Scenarios and Use Cases, Scope Modelling, Sequence Diagram, State Diagram.
Data Visualization: Tableau, Visualization packages, Microsoft Office.
Machine Learning Frameworks: Spark ML, Spark MLlib, Kafka, scikit-learn, NLTK.
ETL Tools: Informatica Power Centre, Data Stage 7.5, Ab Initio, Talend.
OLAP Tools: MS SQL Analysis Manager, DB2 OLAP, Cognos Power-play.
R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret, elasticnet, and other Machine Learning packages.
Databases: SQL Server, Linked Servers, DTS Packages, SSIS, PL/SQL, Hyperion Essbase, Teradata, DB2 UDB, Netezza, Sybase ASE, Informix, AWS RDS, Cassandra, MongoDB, PostgreSQL.
Tools & Software: SAS/STAT, SAS/ETS, SAS Enterprise Miner, SPSS, R, Advanced R, TOAD, MS Office, BTEQ, Teradata SQL Assistant.
Methodologies: Ralph Kimball, COBOL.
Version Control: Git, SVN.
Reporting Tools: Business Objects XI R2/6.5/5.0/5.1, Cognos Impromptu 7.0/6.0/5.0, Informatica Analytics Delivery Platform, MicroStrategy, SSRS, Tableau.
Operating Systems: Windows 7/8, UNIX (Sun Solaris, HP-UX), Windows NT/XP/Vista, MS-DOS.
Confidential, Minneapolis, MN
Data Scientist/ Machine Learning
- Built models using statistical techniques such as Bayesian HMM and Machine Learning classification models such as XGBoost, SVM, and Random Forest.
- Completed a highly immersive Data Science program involving data manipulation and visualization, web scraping, Machine Learning, Python programming, SQL, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
- Set up storage and data analysis tools in the Amazon Web Services cloud computing infrastructure.
- Used pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
- Used RMSE, confusion matrices, ROC curves, cross-validation, and A/B testing to evaluate model performance in both simulated and real-world environments.
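A minimal sketch of that evaluation workflow using scikit-learn; the dataset, model, and split are hypothetical placeholders, not the production setup described above.

```python
# Sketch: confusion matrix, ROC AUC, RMSE, and k-fold CV on a toy classifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, mean_squared_error

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)

cm = confusion_matrix(y_te, pred)                           # rows: true, cols: predicted
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])  # ROC AUC from probabilities
rmse = mean_squared_error(y_te, pred) ** 0.5                # RMSE on the 0/1 labels
cv = cross_val_score(model, X, y, cv=5)                     # 5-fold cross-validation
print(cm, round(auc, 3), round(rmse, 3), round(cv.mean(), 3))
```

A/B testing would sit outside this loop, comparing two deployed model variants on live traffic.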
- Installed and used Caffe Deep Learning Framework.
- Led discussions with users to gather business process and data requirements and develop a variety of conceptual, logical, and physical data models. Expert in Business Intelligence and data visualization tools: Tableau, MicroStrategy.
- Used Google Analytics and performed SEO analysis with Webmaster tools.
- Used Tableau to automatically generate reports. Worked with partially adjudicated insurance flat files, internal records, third-party data sources, JSON, XML, and more.
- Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
- Developed various QlikView data models by extracting and using data from various sources: files, DB2, Excel, flat files, and Big Data.
- Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER/Studio 9.7.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; performed gap analysis.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and Smart View.
- Implemented Agile methodology for building an internal application.
- Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
- Good knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- As an Architect, delivered various complex OLAP databases/cubes, scorecards, dashboards, and reports.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
- Implemented classification using supervised algorithms such as Logistic Regression, Decision Trees, KNN, and Naive Bayes.
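A hedged sketch of comparing those four supervised classifiers side by side with scikit-learn; the iris dataset stands in for the actual project data.

```python
# Sketch: cross-validated comparison of the classifiers named above.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
}
# Mean 5-fold accuracy per model, keeping the comparison apples-to-apples.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in classifiers.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```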
- Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
- Experience with Hadoop ecosystem components such as MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume, including their installation and configuration.
- Updated Python scripts to match training data against our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
- Validated the machine learning classifiers using ROC curves and lift charts.
- Extracted data from HDFS and prepared it for exploratory analysis using data munging.
Environment: ER Studio 9.7, Tableau 9.03, AWS, QlikView, MDM, Git, Unix, Python 3.5.2, MLlib, SAS, Regression, Logistic Regression, Hadoop, NoSQL, Teradata, A/B Testing, OLTP, Random Forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce.
Confidential, Washington, DC
Data Scientist/ Data Architecture
- Responsible for applying machine learning techniques (regression/classification) to predict outcomes.
- Design, development, implementation and roll-out of MicroStrategy Business Intelligence applications.
- Coded R functions to interface with Caffe Deep Learning Framework.
- Worked in an Amazon Web Services cloud computing environment.
- Used Tableau to automatically generate reports. Worked with partially adjudicated insurance flat files, internal records, third-party data sources, JSON, XML, and more.
- Identified and evaluated various distributed machine learning libraries such as Mahout, MLlib (Apache Spark), and R.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
- Evaluated the performance of various classification and regression algorithms using R to predict future power.
- Extracted data from several sources such as Oracle, SQL Server, Excel files, and QVDs into the QlikView reporting layer and modeled it for proper association of the data.
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and spacetime.
- Detected patterns with unsupervised learning techniques such as K-Means clustering.
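The K-Means pattern-detection step can be sketched as follows; synthetic blobs and k=3 are assumptions for illustration, in Python (scikit-learn) rather than the R tooling also used here.

```python
# Sketch: unsupervised pattern detection with K-Means on synthetic clusters.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

labels = km.labels_            # cluster assignment for each point
centers = km.cluster_centers_  # one learned centroid per cluster
print(np.bincount(labels), centers.shape)
```

In practice the number of clusters would be chosen with the elbow method or silhouette scores rather than fixed up front.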
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
- Gathered data from heterogeneous sources such as QVD files, Excel, and text files into the QlikView environment.
- Used R and Python for exploratory data analysis, A/B testing, ANOVA tests, and hypothesis tests to compare and identify the effectiveness of creative campaigns.
- Gathered all required data from multiple data sources and created the datasets used in the analysis.
- Performed exploratory data analysis and data visualization using R and Tableau.
- Performed thorough EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
- Worked with data governance, data quality, data lineage, and data architecture teams to design various models and processes.
- Independently coded new programs and designed tables to load and effectively test the programs for the given PoCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an Architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward- and reverse-engineered databases.
- Established data architecture strategy, best practices, standards, and roadmaps.
- Led the development and presentation of a data analytics data-hub prototype with the help of other members of the emerging solutions team.
- Performed data cleaning and imputation of missing values using R.
- Worked with the Hadoop ecosystem covering HDFS, HBase, YARN, and MapReduce.
- Took up ad-hoc requests from different departments and locations.
- Used Hive to store data and performed data cleaning steps on huge datasets.
- Created customized business reports and shared insights with management.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, A/B Testing, QlikView, MLlib, PL/SQL, HDFS, Teradata 14.1, JSON, Hadoop, MapReduce, Pig, Spark, RStudio, Mahout, Java, Hive, AWS.
- Conducted analyses assessing customer consumption behaviors and discovered the value of customers with RFM analysis; applied customer segmentation with clustering algorithms such as K-Means Clustering and Hierarchical Clustering.
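A minimal sketch of RFM segmentation of this kind; the transactions table, its column names, and k=2 are all hypothetical stand-ins for the real customer data.

```python
# Sketch: derive recency/frequency/monetary per customer, then cluster with K-Means.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

tx = pd.DataFrame({  # toy stand-in for a real transactions extract
    "customer_id": [1, 1, 2, 2, 2, 3],
    "order_date": pd.to_datetime(
        ["2023-01-05", "2023-03-01", "2023-02-10",
         "2023-02-20", "2023-03-15", "2023-01-01"]),
    "amount": [50.0, 20.0, 200.0, 150.0, 75.0, 10.0],
})
now = tx["order_date"].max()
rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (now - d.max()).days),  # days since last order
    frequency=("order_date", "count"),                       # number of orders
    monetary=("amount", "sum"),                              # total spend
)
scaled = StandardScaler().fit_transform(rfm)  # scale so no feature dominates
rfm["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(rfm)
```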
- Collaborated with data engineers to implement ETL process, wrote and optimized SQL queries to perform data extraction and merging from Oracle.
- Involved in managing backup and restoring data in the live Cassandra Cluster.
- Used R, Python, and Spark to develop a variety of models and algorithms for analytic purposes.
- Performed data integrity checks, data cleaning, exploratory analysis, and feature engineering using R and Python.
- Developed personalized product recommendations with machine learning algorithms, including Gradient Boosted Trees and collaborative filtering, to better meet the needs of existing customers and acquire new ones.
- Used Python and Spark to implement different machine learning algorithms, including Generalized Linear Model, Random Forest, SVM, Boosting and Neural Network.
- Evaluated parameters with K-Fold Cross Validation and optimized performance of models.
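Parameter evaluation with K-Fold cross-validation typically looks like this grid-search sketch; the dataset, model, and parameter grid are assumptions for illustration.

```python
# Sketch: tune hyperparameters with K-Fold CV via GridSearchCV.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),  # 5-fold evaluation per combo
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Each parameter combination is scored as the mean accuracy across the five folds, so the chosen setting is less sensitive to any single train/test split.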
- Worked on benchmarking Cassandra Cluster using the Cassandra stress tool.
- Completed a highly immersive Data Science program involving data manipulation and visualization, web scraping, machine learning, Git, SQL, UNIX commands, Python programming, and NoSQL.
- Worked on data cleaning, data preparation, and feature engineering with Python, including NumPy, SciPy, Matplotlib, Seaborn, pandas, and scikit-learn.
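A small sketch of such cleaning and feature-engineering steps in pandas/NumPy; the frame, its columns, and the derived features are hypothetical examples, not the project's actual schema.

```python
# Sketch: impute missing values, then derive simple engineered features.
import numpy as np
import pandas as pd

df = pd.DataFrame({  # toy stand-in for a raw extract
    "age": [25, np.nan, 47, 35],
    "income": [40000, 55000, np.nan, 62000],
    "signup_date": pd.to_datetime(
        ["2022-01-10", "2022-06-01", "2022-03-15", "2022-09-30"]),
})
df["age"] = df["age"].fillna(df["age"].median())      # median imputation
df["income"] = df["income"].fillna(df["income"].median())
df["log_income"] = np.log1p(df["income"])             # tame right-skewed spend
df["tenure_days"] = (df["signup_date"].max() - df["signup_date"]).dt.days
print(df[["age", "log_income", "tenure_days"]])
```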
- Identified risk level and eligibility of new insurance applicants with Machine Learning algorithms.
- Determined customer satisfaction and helped enhance the customer experience using NLP.
- Utilized SQL and HiveQL to query and manipulate data from a variety of sources, including Oracle and HDFS, while maintaining data integrity.
- Performed data visualization and designed dashboards with Tableau and D3.js; provided complex reports, including charts, summaries, and graphs, to interpret findings for the team and stakeholders.
Environment: R, MATLAB, MongoDB, exploratory analysis, feature engineering, K-Means Clustering, Hierarchical Clustering, Machine Learning, Python, Spark (MLlib, PySpark), Tableau, MicroStrategy, SAS, TensorFlow, regression, logistic regression, Hadoop 2.7, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, and MapReduce.
Confidential, Harrisburg, PA
- Analyzed data sources, requirements, and business rules to perform logical and physical data modeling.
- Defined the key columns for the dimension and fact tables of both the Warehouse and the Data Mart.
- Wrote MySQL queries from scratch and created views in MySQL for Tableau.
- Interacted frequently with end users and transferred knowledge to them.
- Conducted and participated in JAD sessions with project managers and the Business Analysis, Finance, and Development teams to gather, analyze, and document business and reporting requirements.
- Integrated HP BSM with HP UCMDB 9.03.
- Updated existing models to integrate new functionality into an existing application.
- Conducted one-on-one sessions with business users to gather data warehouse requirements.
- Developed normalized logical and physical database models to design the OLTP system.
- Created a dimensional model for the reporting system by identifying the required dimensions and facts using PowerDesigner.
- Created DDL scripts for implementing data modeling changes. Created PowerDesigner reports in HTML and RTF format depending on the requirement, published the data model in the model mart, created naming convention files, and coordinated with DBAs to apply the data model changes.
- Used forward engineering to create a physical data model with DDL that best suits the requirements from the logical data model.
- Maintained and implemented data models for the Enterprise Data Warehouse using PowerDesigner.
- Created and maintained metadata, including table and column definitions.
- Worked with database administrators, business analysts, and content developers to conduct design reviews and validate the developed models.
- Responsible for defining the naming standards for data warehouse.
- Demonstrated strong documentation and knowledge-sharing skills within the team; conducted data modeling review sessions for different user groups and participated in sessions to assess requirement feasibility.
- Extensive experience in PL/SQL programming: stored procedures, functions, packages, and triggers.
- Reworked the existing model to create new logical and physical models that formed the basis for the new application.
- Used PowerDesigner for reverse engineering, connecting to the existing database and ODS to create graphical representations in the form of entity relationships and elicit more information.
- Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Identified the most appropriate data sources based on an understanding of corporate data thus providing a higher level of consistency in reports being used by various levels of management.
- Verified that the correct authoritative sources were being used and that the extract, transform and load (ETL) routines would not compromise the integrity of the source data.
Environment: Power Designer, UNIX, Oracle, Teradata, Informix, MS Excel, Mainframes MS Visio, Rational Rose, Requisite Pro, Tableau.
- Worked with the DBA to create the physical model and tables. Scheduled multiple brainstorming sessions with DBAs and the production support team to discuss views, partitioning, and indexing schemes, case by case, for the facts and dimensions.
- Worked with IBM Information Analyzer for data profiling and generating various reports.
- Identified goals and objectives of projects. Created project plan by gathering the requirements from the management.
- Used the Erwin Data Modeler tool for relational database and dimensional data warehouse designs.
- Gathered requirements by conducting a series of meetings with the business system users to capture the reporting requirements.
- Identified risks in the schedule and marked them in the use cases.
- Worked on creating new tables and columns in the Basel data mart.
- Extensively used Star Schema methodologies in building and designing the logical data model into dimensional models.
- Worked with Informatica TDM for data masking and Informatica DVO for data validation between different systems.
- Worked with the Trillium Data Quality tool to monitor production systems for data anomalies and resolve issues.
- Wrote complex SQL queries on Netezza and used them in lookup SQL overrides and Source Qualifier overrides.
- Extracted data from various sources such as Oracle, Netezza, and flat files and loaded it into the target Netezza database.
- Worked with data investigation, discovery, and mapping tools to scan every single data record from many sources.
- Wrote and executed unit, system, integration, and UAT scripts in data warehouse projects.
- Troubleshot test scripts, SQL queries, ETL jobs, and data warehouse/data mart/data store models.
- Created DDL scripts using ER/Studio and source-to-target mappings to bring data from the source to the warehouse.
- Worked on model-based and data-based volumetric analysis to provide accurate space requirements to the production support team.
Environment: ER Studio 8, Mainframes, Netezza, UNIX, Aginity, Informatica TDM, DVO, Information Analyzer.
- Worked with the business community to define business requirements and analyze the possible technical solutions.
- Worked extensively with SMEs in understanding and documenting their requirements.
- Translated and documented user requirements into agile user stories.
- Co-authored Business Requirements Document (BRD) with project teams. Extracted, Discussed, and refined business requirements from business users and team members.
- Conducted workflow, process diagram and gap analysis to derive requirements for existing systems enhancements.
- Vast experience conducting and documenting the As-Is/To-Be processes and Business Process Re-engineering (BPR).
- Designed and developed Use Case Diagrams, Activity Diagrams, Sequence Diagrams, Class Diagrams, and web page mock-ups using MS Visio.
- Utilized the agile software development methodology Scrum, a lightweight process for managing and controlling software and product development through iterative, incremental practices; Scrum significantly increased productivity and reduced the time needed to achieve benefits.
- Developed and implemented processes and tools for requirements gathering, analysis, planning, tracking, and delivery using Rational Rose and Requisite Pro.
- Involved in creating Software Requirement Specifications (SRS) and Functional Specification Documents (FSD).
- Incorporated User Stories in different release plans for the sprints in Scrum process.
- Analyzed user problems, including automated and manual business processes, and identified, researched, defined, and documented business processes.
- Retrieved Data using SQL queries.
- Performed data design, user interface (UI) analysis, GUI screen design, and data analysis; helped create a data-mapping best practices document and trained team members on the data mapping process and tools.
- Collaborated with the QA team to ensure adequate testing of software both before and after completion, maintained quality procedures, and ensured appropriate documentation is in place.
- Helped the testing team document system requirements and test system development. Designed and implemented basic SQL queries for QA testing and report/data validation. Set up definitions and processes for test phases, including Product, Integration, System, and User Acceptance Testing (UAT).
- Developed project timelines and managed timelines and resources to successful completion.