Data Scientist Resume
Dallas, TX
SUMMARY:
- 8+ years of experience working with large structured and unstructured datasets, spanning data visualization, data acquisition, predictive modeling, NLP/NLU/NLG, AI/machine learning, computer vision, probabilistic graphical models, inferential statistics, graph analytics, and data validation.
- Hands-on experience in data mining algorithms and approaches.
- Strong grasp of algorithms and design techniques.
- Expert-level understanding of application design, development, and testing in mainframe environments using PL/1, COBOL, EGL, Easytrieve, DB2, JCL, QC & VAG.
- Experience developing statistical machine learning, text analytics, and data mining solutions for a range of business problems, and generating data visualizations using Python, R, and Tableau.
- Expertise in transforming business requirements into models, algorithms, and data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Regression analysis, statistical test analysis, report and dashboard generation, data management.
- Git, Java, MySQL, MongoDB, Neo4J, AngularJS, SPSS, Tableau.
- Python, NumPy, scikit-learn, gensim, NLTK, TensorFlow, Keras.
- Experience in machine learning, statistics, and regression: linear, logistic, Poisson, binomial.
- Experience in designing visualizations using Tableau software and Storyline on web and desktop platforms, publishing and presenting dashboards.
- Single-handedly built a model to replace the "doer" role in the pension sector. This model (patent in progress) generates experience from structured data and, through a bootstrapping mechanism, learns new experience from unseen data.
- Single-handedly designed and built an information extraction bot POC for KYC extraction. The bot uses adaptive learning techniques and custom supervised classifiers for entity and relation extraction.
- Proficient in machine learning techniques (decision trees, linear and logistic regression, random forest, SVM, Bayesian methods, XGBoost, k-nearest neighbors) and statistical modeling for forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.
- Hands-on experience implementing LDA and Naive Bayes; skilled in decision trees, random forests, linear and logistic regression, SVM, clustering, and neural networks, with good knowledge of recommender systems (see the sketch at the end of this summary).
- Experience with advanced SAS programming techniques such as PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.
- Highly skilled in using visualization tools like Tableau, Matplotlib for creating dashboards.
- Extensive working experience with Python, including scikit-learn, pandas, and NumPy.
- Developed data variation analysis and data pair association analysis in the Bioinformatics field.
- Regularly used JIRA and other internal issue trackers for project development.
- Good domain knowledge of retail and airlines.
- Experience in foundational machine learning models and concepts (regression, boosting, GBM, NNs, HMMs, CRFs, MRFs, deep learning).
- Well experienced in Normalization & De-Normalization techniques for optimum performance in relational and dimensional database environments.
- Analyzed data using R, Perl, and Hadoop, and queried both structured and unstructured databases.
- Strong programming expertise in Python and strong database skills in SQL.
- Integration Architect & Data Scientist experience in Analytics, Big Data, SOA, ETL and Cloud technologies.
- Worked with and extracted data from various database sources such as Oracle, SQL Server, and Teradata.
- Skilled in System Analysis, Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
- Facilitated and helped translate complex quantitative methods into simplified solutions for users.
- Experienced with proofs of concept and gap analysis; gathered the necessary data for analysis from different sources and prepared it for exploration using data munging.
- Solid coding and engineering skills in machine learning.
- Experience with file systems, server architectures, databases, SQL, and data movement (ETL).
- Proficient in Python, with experience building and productionizing end-to-end systems.
- Knowledge of information extraction and NLP algorithms coupled with deep learning.
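
The following is a minimal sketch of the supervised modeling workflow referenced above (train/test split, preprocessing-plus-model pipeline, holdout evaluation), written in Python with scikit-learn; the dataset, features, and parameters are synthetic placeholders, not taken from any engagement described in this resume.

    # Illustrative only: synthetic data and placeholder parameters.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Stand-in for a real modeling dataset.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Scale features, then fit a logistic regression classifier.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    # Evaluate on the holdout set.
    print(classification_report(y_test, model.predict(X_test)))
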
TECHNICAL SKILLS:
Exploratory Data Analysis: Univariate/multivariate outlier detection, Missing value imputation, Histograms/Density estimation, EDA in Tableau
Supervised Learning: Linear/Logistic Regression, Lasso, Ridge, Elastic Nets, Decision Trees, Ensemble Methods, Random Forests, Support Vector Machines, Gradient Boosting, XGB, Deep Neural Networks, Bayesian Learning
Unsupervised Learning: Principal Component Analysis, Association Rules, Factor Analysis, K-Means, Hierarchical Clustering, Gaussian Mixture Models, Market Basket Analysis, Collaborative Filtering and Low Rank Matrix Factorization
Feature Selection: Stepwise, Recursive Feature Elimination, Relative Importance, Filter Methods, Wrapper Methods and Embedded Methods
Statistical Tests: T-tests, Chi-square tests, Stationarity tests, Autocorrelation tests, Normality tests, Residual diagnostics, Partial dependence plots and ANOVA
Sampling Methods: Bootstrap sampling methods and Stratified sampling
Model Tuning/Selection: Cross Validation, Walk Forward Estimation, AIC/BIC criteria, Grid Search and Regularization (illustrated in the sketch at the end of this skills list)
Time Series: ARIMA, Holt-Winters, Exponential smoothing, Bayesian structural time series
R (Machine Learning/Deep Learning): caret, glmnet, forecast, xgboost, rpart, survival, arules, sqldf, dplyr, nloptr, lpSolve, ggplot2
Python: pandas, numpy, scikit-learn, scipy, statsmodels, matplotlib, tensorflow
SAS: Forecast server, SAS Procedures and Data Steps
Spark: MLlib, GraphX
SQL: Subqueries, joins, DDL/DML statements
Databases/ETL/Query: Teradata, SQL Server, Redshift, Postgres and Hadoop (MapReduce); SQL, Hive, Pig and Alteryx
Visualization: Tableau, ggplot2 and RShiny
Prototyping: PowerPoint, RShiny and Tableau
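
A minimal sketch of the model tuning items above (cross-validation, grid search, regularization), assuming an elastic net regression in scikit-learn; the grid values and data are arbitrary illustrations, not settings from any project listed here.

    # Illustrative only: arbitrary grid values on synthetic data.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNet
    from sklearn.model_selection import GridSearchCV

    X, y = make_regression(n_samples=500, n_features=30, noise=10.0,
                           random_state=0)

    # Grid search over the elastic net's regularization strength and
    # L1/L2 mix, scored by 5-fold cross-validated MSE.
    grid = GridSearchCV(
        ElasticNet(max_iter=10000),
        param_grid={"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.2, 0.5, 0.8]},
        cv=5, scoring="neg_mean_squared_error")
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)
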
PROFESSIONAL EXPERIENCE:
Confidential, Dallas,TX
Data Scientist
Responsibilities:
- Involved in defining the source to target data mappings, business rules, and data definitions.
- Performed data profiling on the various source systems required for transferring data to ECH.
- Defining the list codes and code conversions between the source systems and the data mart using Reference Data Management (RDM).
- Involved in data collection and ingestion into Teradata
- Conducted data cleaning, data preparation, and outlier detection
- Found insights in millions of customer chat and call records
- Gathering requirements from business
- Reviewing business requirements and analyzing data sources
- Developed predictive models for sales and Finance teams using various ML and DL algorithms
- Utilized the Informatica toolset (Informatica Data Explorer and Informatica Data Quality) to analyze legacy data for data profiling.
- Worked on DTS packages and DTS import/export for transferring data from SQL Server 2000 to 2005
- Involved in upgrading DTS packages to SSIS packages (ETL).
- Involved in training and testing supervised and unsupervised ML models
- Researched deep learning approaches for implementing NLP
- Presented discovered trends and analysis, forecast data, recommendations, model results, and identified risks to senior management
- Performed end-to-end Informatica ETL testing for these custom tables by writing complex SQL queries on the source database and comparing the results against the target database.
- Used HP Quality Center v11 for defect tracking.
- Applied data mining and optimization techniques in B2B and B2C industries; proficient in machine learning, data/text mining, statistical analysis, and predictive modeling.
- Created and presented executive dashboards to show the patterns & trends in the data using Tableau Desktop
- Developed NLP models for topic extraction and sentiment analysis (see the sketch at the end of this list)
- Developed Executive Summary KPI, Key value programs, NPI dashboards in Tableau
- Created customized Calculations, Conditions and Filters (Local, Global) for various analytical reports and dashboards
- Identified emerging issues using the models
- Developing & evaluating Machine Learning models
- Developed different visualizations using advanced features and deep analytics in Tableau
- Used algorithms and programming to efficiently go through large datasets and apply treatments, filters, and conditions as needed
- Developed Cross Tab, Chart, Funnel charts, Donut charts, Heat Maps, Tree Maps and Drill Through Reports, 100% stacked bar charts etc. in Tableau Desktop
- Involved in publishing, scheduling and subscriptions with Tableau Server and creating and managing users, groups, sites in Tableau Server .
- Involved in developing and testing the SQL Scripts for report development, Tableau reports, Dashboards and handled the performance issues effectively
- Tested dashboards to ensure data was matching as per the business requirements and if there were any changes in underlying data
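
A minimal sketch of the kind of topic extraction described above, assuming a gensim LDA model over tokenized chat records; the documents, topic count, and parameters are toy placeholders, not production settings.

    # Illustrative only: toy documents and settings.
    from gensim import corpora
    from gensim.models import LdaModel

    # Stand-ins for tokenized chat/call transcripts.
    docs = [
        "bill payment failed on the app".split(),
        "app crashes when opening the bill page".split(),
        "agent resolved my refund request quickly".split(),
        "refund still pending after two weeks".split(),
    ]

    # Map tokens to ids and build a bag-of-words corpus.
    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]

    # Fit a 2-topic LDA model and show the top words per topic.
    lda = LdaModel(corpus, num_topics=2, id2word=dictionary,
                   random_state=0, passes=10)
    for topic_id, words in lda.print_topics(num_words=4):
        print(topic_id, words)
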
Environment: Data Governance, SQL Server, ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, ETL, MS Office Suite - Excel (Pivot, VLOOKUP), DB2, R, Python, Visio, HP ALM, Agile, Azure, Data Quality and Reference Data Management.
Confidential, NY
Data Scientist
Responsibilities:
- A highly immersive data science program involving data manipulation & visualization, web scraping, machine learning, Python programming, SQL, GIT, Unix commands, NoSQL, MongoDB, and Hadoop.
- Installed and used the Caffe deep learning framework
- Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
- Developed a voice bot using AI (IVR), improving the interaction between humans and the virtual assistant.
- Implemented an event task to execute the application automatically.
- Involved in developing the patches & updates module.
- Set up storage and data analysis tools on Amazon Web Services cloud infrastructure.
- Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, and NLTK in Python to develop various machine learning algorithms.
- Development and Deployment using Google Dialogflow Enterprise.
- Worked with data architects and IT architects to understand the movement and storage of data, using ER Studio 9.7
- Data visualization using Elasticsearch, Kibana, and Logstash in Python.
- Used Kibana, an open-source plugin for Elasticsearch, for analytics and data visualization.
- Data manipulation and aggregation from different sources using Nexus, Toad, BusinessObjects, Power BI, and SmartView.
- Implemented Agile Methodology for building an internal application.
- Extracting the source data from Oracle tables, MS SQL Server, sequential files and excel sheets.
- Migrated Informatica mappings from SQL Server to Netezza
- Fostered a culture of continuous engineering improvement through mentoring, feedback, and metrics
- Broad knowledge of programming and scripting (especially in R/Java/Python)
- Developing and maintaining Data Dictionary to create metadata reports for technical and business purpose.
- Predictive modeling using state-of-the-art methods
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS (see the sketch at the end of this list).
- Parse and manipulate raw, complex data streams to prepare for loading into an analytical tool.
- Build and maintain dashboard and reporting based on the statistical models to identify and track key metrics and risk indicators.
- Proven experience building sustainable and trustful relationships with senior leaders
- Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
- Data analysis using regressions, data cleaning, Excel VLOOKUP, histograms, and the TOAD client; presented the analysis and suggested solutions to investors
- Rapid model creation in Python using pandas, numpy, sklearn, and Plotly for data visualization; these models were then implemented in SAS, interfaced with MSSQL databases, and scheduled to update on a regular basis.
- Attained good knowledge of Hadoop data lake implementation and Hadoop architecture for client business data management.
- Extracted data from HDFS and prepared it for exploratory analysis using data munging
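
A minimal sketch of a Spark ML predictive module of the kind described above, assuming a PySpark pipeline with a random forest classifier; the schema, values, and app name are hypothetical stand-ins rather than the actual production job.

    # Illustrative only: hypothetical schema and toy rows.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("predictive-poc").getOrCreate()
    df = spark.createDataFrame(
        [(35, 2, 120.5, 0), (52, 7, 80.0, 1), (23, 1, 200.0, 0)],
        ["age", "tenure", "monthly_spend", "label"])

    # Assemble feature columns into a vector, then fit a random forest.
    assembler = VectorAssembler(
        inputCols=["age", "tenure", "monthly_spend"], outputCol="features")
    rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                                numTrees=50)
    model = Pipeline(stages=[assembler, rf]).fit(df)
    model.transform(df).select("label", "prediction").show()
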
Environment: ER Studio 9.7, Tableau 9.03, AWS, Teradata 15, MDM, GIT, Unix, Python 3.5.2, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, Teradata, OLTP, random forest, OLAP, HDFS, ODS, NLTK, SVM, JSON, XML, MapReduce, Google Dialogflow.
Confidential, New Jersey
Data Analyst/Data Scientist
Responsibilities:
- Assisted the business by delivering machine learning projects end to end: aggregating and exploring data, building and validating predictive models, and deploying completed models to deliver business impact to the organization
- Created data modeling and data mapping documents identifying sources and formulating transformation rules to populate target fields
- Created impact and gap analysis documents specifying the changes introduced as part of the program, and led the business process team
- Worked with big data consultants to analyze, extract, normalize, and label relevant data using statistical modeling techniques such as logistic regression, decision trees, support vector machines, random forests, Naive Bayes, and neural networks (compared in the sketch at the end of this list)
- Developed ETLs for data sources used in production reporting for marketing and operations teams.
- Wrote SQL queries to perform data analysis and data modeling, and prepared data mapping documents explaining the transformation rules from source to target tables
- Led the Change Management stream of an HR/Payroll project resulting from a $16.5 billion acquisition and the formation of UTAS, created Change Management Plan, and ensured team was on target to deliver both communication and training to HR, finance and Payroll staff.
- Reviewed business data for trends, patterns, or causal relationships to assist in identifying model drift and retraining models
- Created customized reports and processes in SAS and Tableau Desktop
- Performed data analysis to create reporting requirements by specifying inclusion & exclusion criteria, conditions, business rules and data elements to be included into the report
- Scheduled and facilitated requirements gathering with HR, Payroll, finance and accounting teams to implement ADP eTime and ADP Enterprise v5 and ADP General Ledger and drove requirements for data collection and data modeling with data engineers
- Performed SQL query for data analysis and integration
- Support PMO governance activities; defining and maintaining Project Management standards.
- Responsible for generating ideas for product changes that improve key metrics
- Provided data analytics of the web-portal to the team for feedback and improvement.
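
A minimal sketch comparing the classifier families named above via cross-validation, using scikit-learn on synthetic data; the model settings are illustrative defaults, not the configuration used in the project.

    # Illustrative only: default settings on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=800, n_features=15, random_state=1)
    models = {
        "logistic": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(random_state=1),
        "svm": SVC(),
        "random_forest": RandomForestClassifier(random_state=1),
        "naive_bayes": GaussianNB(),
    }

    # 5-fold cross-validated accuracy for each classifier family.
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: {scores.mean():.3f}")
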
Environment: Python, HTML5, CSS3, AJAX, Teradata, OLTP, random forest, OLAP, HDFS, ODS, JSON, jQuery, MySQL, NumPy, SQLAlchemy, Matplotlib, Hadoop, Pig scripts.
Confidential, Richmond, VA
Data Analyst/Data Modeler
Responsibilities:
- Involved in defining the source to target data mappings, business rules, data definitions.
- Involved in defining the business/transformation rules applied for sales and service data.
- Worked with project team representatives to ensure that logical and physical ER/Studio data models were developed in line with corporate standards and guidelines.
- Defined the list codes and code conversions between the source systems and the data mart.
- Coordinated with business users to design new reporting in an appropriate, effective, and efficient way, building on the existing functionality.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Responsible for defining the key identifiers for each mapping/interface.
- Implemented the metadata repository and maintained data quality: data cleanup procedures, transformations, data standards, the data governance program, scripts, stored procedures, triggers, and execution of test plans (see the sketch at the end of this list)
- Performed data quality checks in Talend Open Studio.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
- Responsible for defining the functional requirement documents for each source to target interface.
- Remain knowledgeable in all areas of business operations in order to identify systems needs and requirements.
- Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
- Maintained the Enterprise Metadata Library with any changes or updates.
- Generate weekly and monthly asset inventory reports.
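
A minimal pandas sketch of routine data-quality checks of the kind described above (nulls, duplicate keys, out-of-range values); the column names and rules are hypothetical examples, not the actual cleanup procedures.

    # Illustrative only: hypothetical columns and toy rows.
    import pandas as pd

    df = pd.DataFrame({
        "customer_id": [1, 2, 2, 4],
        "sales_amount": [100.0, None, 250.0, -5.0],
    })

    # Routine checks: missing values, duplicate keys, out-of-range values.
    report = {
        "null_counts": df.isna().sum().to_dict(),
        "duplicate_ids": int(df["customer_id"].duplicated().sum()),
        "negative_amounts": int((df["sales_amount"] < 0).sum()),
    }
    print(report)
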
Environment: Erwin r7.0, SQL Server 2012/2008, Windows XP/NT/2000, Oracle 10g/9i, MS-DTS, UML, UAT, SQL Loader, OOD, OLTP, PL/SQL, MS Visio, Informatica .
Confidential
Business Analyst/Data Analyst
Responsibilities:
- Analyze business information requirements and model class diagrams and/or conceptual domain models.
- Managed project requirements, documents, and use cases using IBM Rational RequisitePro.
- Assisted in building an integrated logical data design and proposed a physical database design for building the data mart.
- Gathered and reviewed customer information requirements for OLAP and for building the data mart.
- Responsible for defining the key identifiers for each mapping/interface
- Responsible for defining the functional requirement documents for each source to target interface.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
- Maintained the Enterprise Metadata Library with any changes or updates.
- Documented data quality and traceability for each source interface.
- Performed document analysis involving creation of Use Cases and Use Case narrations using Microsoft Visio, in order to present the efficiency of the gathered requirements.
- Analyzed business process workflows and assisted in the development of ETL procedures for mapping data from source to target systems.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Calculated and analyzed claims data for provider incentive and supplemental benefit analysis using Microsoft Access and Oracle SQL.
- Established standards and procedures.
- Generate weekly and monthly asset inventory reports.
- Document all data mapping and transformation processes in the Functional Design documents based on the business requirements
Environment: SQL Server 2008R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.
Confidential
BI Developer/Data Analyst
Responsibilities:
- Developed an object model in UML for the conceptual data model using Enterprise Architect.
- Developed logical and Physical data models using Erwin to design OLTP system for different applications.
- Facilitated transition of logical data models into the physical database design and recommended technical approaches for good data management practices.
- Worked with DBA group to create Best-Fit Physical Data Model with DDL from the Logical Data Model using Forward engineering.
- Worked with the ETL team to document the transformation rules for data migration from OLTP to Warehouse environment for reporting purposes.
- Developed Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Performed K-means clustering, multivariate analysis, and support vector machines in R (see the sketch at the end of this list).
- Extensive system study, design, development and testing were carried out in the Oracle environment to meet the customer requirements.
- Wrote complex Hive and SQL queries for data analysis to meet business requirements.
- Wrote complex SQL queries to implement business requirements
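
The clustering above was performed in R; as an equivalent illustration, here is a minimal K-means sketch in Python with scikit-learn on synthetic data, with standardization applied before clustering.

    # Illustrative only: synthetic blobs instead of real project data.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler

    X, _ = make_blobs(n_samples=300, centers=3, random_state=7)

    # Standardize features so no single scale dominates the distances.
    X_scaled = StandardScaler().fit_transform(X)

    # Fit K-means with k=3 and inspect the learned cluster centers.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=7).fit(X_scaled)
    print(kmeans.cluster_centers_)
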
Environment: DB2, Teradata, SQL Server 2008, Enterprise Architect, Power Designer, MS SSAS, Crystal Reports, SSRS, ER Studio, Lotus Notes, Windows XP, MS Excel, Word, and Access.