Principal Data Scientist / Business Intelligence Consultant-SME Resume
SUMMARY:
- DATA MINING & ANALYTICS, BUSINESS INTELLIGENCE & DATA VISUALIZATION, BIG DATA
- Hortonworks (Hadoop), Apache Spark, Python, R, SAS Base & Enterprise, SSAS, RapidMiner, AWS, Azure, Pentaho Analytics, Spotfire, Tableau, QlikView, Orange, SSRS, SPSS, Matlab, Mathematica, Maple, PerformancePoint, Excel, Salesforce, Google Analytics, Power BI, Azure ML Studio, Looker, Platfora, Zoomdata, Arcadia Data
- Exploratory Data Analysis: use dynamic visualizations, advanced statistical techniques and core data mining capabilities to quickly identify relationships and opportunities for outreach.
- Model Development and Deployment: streamline data mining processes to create highly accurate descriptive and predictive analytic models based on large volumes of data.
- Algorithms for Data Mining, Machine Learning, Credit Analytics and Risk Analytics: Random Forest, Decision Trees, Neural Networks, Clustering Algorithms, Naive Bayes, SVM, Linear and Logistic Regression Analysis.
- Design, build and implement dashboards (KPIs, metrics) for critical & strategic management decisions and operations for C-level execs, SVPs, VPs, directors & managers of different lines of business.
- Teradata, ODW, SSAS, Informatica, ODI, SSIS, Excel, CloverETL, Talend, SQL Server, Adeptia, Pentaho, Paxata, Amazon Redshift, EMR, Kafka
- Building, implementing & documenting ETL packages, transformations, sessions, workflows & data mappings, change data capture, slowly changing dimensions & XML.
- Toad Data Modeler, Erwin, SQL Profiler, SSAS, MicroStrategy, Visio, Excel, Spotfire, Tableau, ER Data Architect
- Building & implementing data marts, data cubes (OLAP), schemas, & logical data models.
- Visio, Excel, SharePoint, Word, InfoPath, Project, Oracle PeopleSoft, web-based applications, PowerPoint, MS Dynamics, SAP, Jira, Confluence
- Building workflows & process documentation, creating user stories & epics, business process modeling, lean six sigma QA, user acceptance testing, documenting business requirements, business process reengineering, technical support, project management & implementation, expert systems & best practices research.
- Document sprints & programming unit tasks for AGILE software application development (SCRUM).
- Installing, configuring, maintaining, developing & implementing SAP, Microsoft Dynamics, and Oracle PeopleSoft.
- Generate, QA and validate reports daily, weekly, monthly & quarterly.
- Develop & implement automated scheduled reports.
- Write complex SQL queries, stored procedures and user-defined functions for ad hoc & scheduled reports.
CORE TECHNOLOGIES & COMPETENCIES:
- BIG DATA, MODELING, MACHINE LEARNING, DATA MINING & DATA VISUALIZATION
- Data Wrangling / Machine Learning / Hadoop - Hive / Azure ML / SkyTree / GoodData
- Consolidating structured and unstructured data from disparate data sources to build data products, then deploying or integrating the solution with other applications in production systems.
- Tuning, fitting and optimizing models; feature engineering and deploying models via web service or BI tools.
- Design, architecture & development of analytic solutions to solve business problems.
- Implementing these solutions and training analysts and developers to use the analytic tools and/or data products.
- Exploring data sets to generate patterns using time series models, cross-correlations with time lags, signal processing & filtering techniques, spectral analysis and correlograms.
- Configuring Hive on HDFS, Hortonworks and Impala for big data analytics.
- Using Python scripts to read & move flat files; web scraping and using APIs to extract data in JSON & XML.
- Cleaning, rescaling and munging data using R, SAS & Python; dimensionality reduction & normalization.
- Building models for machine learning algorithms by using several techniques to solve business problems. This would include: K-nearest neighbor, Naïve Bayes, Simple Linear Regression, Multiple Regression, Logistic Regression, Decision Trees & Neural Networks for supervised learning algorithms.
- Utilizing clustering algorithms for unsupervised learning models, as well as reinforcement learning algorithms.
- Refactoring MapReduce code in Python and Java to optimize query performance.
- Using enhanced techniques, including random forest, to reduce overfitting or underfitting.
- Leveraging natural language processing, recommender systems & network analysis to build custom data products and solve complex business problems.
- Utilizing NumPy, pandas, scikit-learn and other Python and R libraries to build data products and decision engines.
- Using statistical and probability techniques, linear algebra, gradient descent, hypothesis testing and inference to build models for machine learning algorithms.
- Building data products by extracting data from IoT devices and performing complex event processing for decision engines, predictive models and live streaming dashboards for monitoring.
- Rapid prototyping of products & solutions after analyzing business problem and going through iterations and simulations of possible solutions; thinking outside the box and challenging status quo techniques for problem solving.
- Critical thinking & domain knowledge in several industries including: Banking, Finance, Telecommunications, Oil & Gas, Pharmaceuticals, Healthcare technology, Supply chain & logistics, Marketing, Consulting, Professional Services and Information Technology.
- Pentaho Business Analytics / Platfora / Zoomdata / Arcadia Data
- Building use cases, proofs of concept and opportunity assessments for big data business intelligence tools including Platfora, Zoomdata, Arcadia, Looker, Bime, Birst, IBM Watson Analytics, Pentaho Business Analytics, Spotfire, Tableau, Qlik Sense, Power BI & similar tools.
- Leveraging new big data business intelligence visual analytics tools designed to handle big data with simplicity and almost real time analytic capabilities.
- Connecting to Hadoop (Hive & Pig) and Kafka subscriptions integrated with Spark & Storm to deliver real-time dashboards and data products.
- Utilizing appropriate charts and custom visualizations in dashboards and reports to answer business questions and tell user stories.
- Installing, configuring and deploying Pentaho server clusters.
- Setting up shared data sources (structured & unstructured) on the Pentaho repository.
- Building reports on dashboards using dashboard designer, analyzer, interactive reports and dashboard reports.
- Setting up folders on repository library and environment to environment migrations.
- Performing all administrative tasks; access management, LDAP integration, resolving tickets and troubleshooting.
- Training developers to use the tools and features and setting up a Pentaho Business Analytics Center of Excellence.
- Embedding dashboards in other applications using APIs.
- Designing & implementing self-service and data discovery business intelligence and analytics semantic layer.
- Spotfire SME / Python / R / MATLAB / JavaScript
- Extending the Spotfire platform using IronPython, R, S+, Spotfire SDK and JavaScript.
- Building complex advanced visualizations using all available chart types and custom JavaScript or IronPython charts.
- Proficient with advanced custom expressions.
- Scripting OVER, Statistical, Spatial, Ranking, Math, Logic, Binning, Conversion, Date & Time, Text and Property functions.
- Creating and registering custom data functions in R and/or S-Plus. Running SAS and MATLAB scripts through Spotfire.
- Embedding Spotfire analytic data products in web portals, websites and SharePoint.
- Data visualization best practices, interactive dashboards and guided analytics.
- Advanced geomapping configuration with multilayer integration.
- Building elements, joins, procedures & infolinks in info model layer.
- Library administration, Information Designer and Administration Manager Proficiency.
- Managing licenses, setting up and configuring Spotfire users (5000+).
- Deploying clusters of Spotfire servers including web player servers, load-balancing servers, automation services servers, statistical services servers and Spotfire servers.
- Upgrading, patching, monitoring, LDAP integration, installations and all other administration duties.
- Configuring Spotfire Application Data Services for multiple environments (Composite) including Netezza, Teradata, MS PDW, Oracle and other big data sources.
- Scheduled updates and automation services XML jobs.
- Server monitoring using Geneos and Splunk, and creating alerts for exceptions.
- Spotfire infrastructure design and platform configuration for clusters and high availability of web player servers.
- Configuring the Spotfire information model by designing and developing back-end stored procedures & complex queries for Spotfire server information links.
- Deploying web server based dashboards in Spotfire 4.x, 5.x, 6.x and 7.x.
- Spotfire Center of Excellence standards domain knowledge.
- Training Analysts, building knowledge base, and documentation.
- Tableau SME/ R / JavaScript / AWS Cloud Integration
- Connecting with data; using the Tableau interface to effectively create powerful visualizations.
- Create calculations including string manipulation, advanced arithmetic calculations, custom aggregations and ratios, date math, logic statements and quick table calculations.
- Build advanced chart types and visualizations: bar-in-bar charts, bullet graphs, box-and-whisker plots and Pareto charts.
- Build complex calculations to manipulate data, and use statistical techniques to analyze data.
- Use parameters and input controls to give users control over certain values.
- Implement advanced geographic mapping techniques, and use custom images and geocoding to build spatial visualizations of non-geographic data.
- Combine data sources by joining multiple tables and using data blending.
- Optimize visualization performance by using the data engine, extracts and appropriate connection methods.
- Build better dashboards using techniques for guided analytics, interactive dashboard design and visual best practices; implement efficiency tips and tricks.
- Using groups, bins, hierarchies, sorts, sets, and filters to create focused and effective visualizations.
- Using Measure Names and Measure Values fields to create visualizations with multiple measures and dimensions.
- Tableau Administrator: Windows Server monitoring of Tableau Servers (externally), and internally using Tableau Administrative view workbook.
- Tableau directory service integration using Active Directory.
- Full utilization of TABCMD and TABADMIN to do server-side auditing and administration of groups, users, sites, server status through batch scripting or simple DOS prompt commands.
- Implementing end-to-end workbook, database, trusted security strategies by leveraging ISMEMBEROF, FULLNAME, etc. so as to achieve the desired level of security required by end-users.
- Implementing user or core-based licensing strategies.
- Design and deployment of high availability, failover, and distributed Tableau configurations across multiple domains.
- SAML implementation with reverse proxy.
- TabJolt for stress testing.
- Configuration of Tableau VizQL, Backgrounder, and Data Engine processes to adjust for performance across distributed configurations.
- Using F5 load-balancing for very active Tableau servers.
- Dashboard performance recording and tuning. DirectX and browser compatibility in improving user desktop performance.
- Full utilization of Tableau's built-in PostgreSQL database server to monitor user, browser, and server activity.
SOFTWARE & WEB DEVELOPMENT, DATABASE DEVELOPMENT:
Tools: SQL Server, Eclipse, SpringSTS, Google Web Toolkit, Oracle 10g, 11g, Microsoft Visual Studio, PyCharm
SQL development: Python, JavaScript, Java, C#, C++ for custom and web-based applications. JUnit testing.
DESKTOPS, SYSTEMS, VIRTUALIZATION AND NETWORK SUPPORT:
Tools: VMware ESX, vSphere, Windows Server 2008, Windows XP to Windows 8 troubleshooting, Active Directory, Microsoft Exchange & Microsoft Office 2003 - 2013.
Configuring LAN/WAN technologies: VLANs, DHCP, DNS, VPN, Routing, Switching and RAS.
PROFESSIONAL EXPERIENCE:
Confidential
Principal Data Scientist / Business Intelligence Consultant-SME
Responsibilities:
- Deployed Azure machine learning models on data residing in a Hadoop cluster. Designed, architected and implemented the Spotfire environment and developed operational insights dashboards for go-live. Migrated reports from Power BI to Spotfire for different lines of business. Set up infolinks and data connections to disparate data sources including Hive, SQL Server, Redshift, Teradata and other external data sources.
- Migrated reports from on-premise BI tools (Tableau, Crystal Reports, Power BI) to Platfora and Spotfire on a hybrid cloud platform. Designed and enhanced reports and data products to suit the data-intensive and data-driven nature of the business.
- Implemented a contract and provider predictive analytics data product. This included a series of data products tracking contract and provider metrics, KPIs, regression analysis with contract coverage, clinical studies, SLAs, drug manifestations, etc., using the Pentaho Analytics platform & Spotfire.
- Designed and developed operational business insights and data products for real-world research data on Pentaho Business Analytics and a hybrid cloud platform. End-to-end implementation of the product: data modeling, data mining, ETL, database environment deployment and administration, data wrangling, data product consolidation and release with modeling algorithms for predictive and prescriptive analytics. Ensured compliance and governance with HIPAA and healthcare industry code sets such as ICD-9, ICD-10 and CPT.
- Designed and developed a global supply chain perfect order data product and other finance, marketing & research data products. Redesigned and configured a clustered high-availability Spotfire server and web player server platform to accommodate a 1000+ user base across North America, South America, Europe, Asia and Africa. This included a hybrid architecture scaling up and scaling out with cloud integration.
- Redesigned and developed a global supply chain dashboard and guided analytics operational intelligence tool using Spotfire.
- Designed a new reporting data model from the data warehouse (Teradata), other big data sources (structured & unstructured) and the data virtualization layer (Spotfire info model) to feed the Spotfire analytic data products and dashboards with optimum performance.
- Refactored and developed reports and dashboards using Spotfire and extended the platform with custom capabilities using IronPython, JavaScript and R.
- Set up and configured automation services for data refreshes and migrations to different Spotfire server environments, and deployed models to production.
- Trained developers and analysts, documented the knowledge base, and set up a Spotfire BI standards center of excellence with well-documented best practices and use cases.
- Collaborated with the data science team to integrate machine learning algorithms and predictive analytics data products within the Spotfire platform.
- Planned and executed Spotfire patches and upgrades for 7.0 & 7.5.
- ConocoPhillips: Designed and developed real-time operations analytic data products and dashboards for production engineering support, finance, water disposal, production optimization and consolidated data products using Spotfire for the business unit.
- USDA: Designed and developed a predictive analytics data product using Tableau for managing business loans to small businesses, farmers, etc. Product included weather data, census data, loan status data, etc.
- Frontier Communications: Redesigned and architected the Spotfire platform, integrating it with AS400 systems via a data virtualization platform.
- Bank of America: Integrated SAS & R into Spotfire & Tableau to build dashboards and analytic products for different lines of business. Developed automated workflows and data products end to end for the governance and compliance department. Built robust sanity check workflows and automated tests using Veritas Data Insight and other custom in-house tools for file usage audits and security.
- Also worked with offshore administration teams maintaining Spotfire and Tableau server environments.
- Pioneer member of the team that established the ETL and BI center of excellence. Transitioned to the data science team, building machine-learning algorithms, text mining and other data products to support different LOBs and enable the bank to meet SLAs in client agreements. Served as production DBA for Oracle and MS SQL reporting databases.
Confidential
Business Systems Analyst / SQL Developer / C#, Java, .NET Developer
Responsibilities:
- SQL scripting & tuning, stored procedures, reporting, systems design, analysis & implementation, enterprise data management, requirements gathering, technical & operational support using Microsoft technology stack, SAP & Oracle ERP. Application design and development.
Confidential
Database / Application Developer
Responsibilities:
- Designed, developed & implemented ERP system. Requirements gathering & business process modeling for enhancements to ERP system.
- Logical database design for applications & maintenance of Ops prod DB.