Data Scientist Resume
Santa Clara, CA
SUMMARY:
- Data-driven professional with more than 10 years of experience in Data Science, machine learning algorithms, statistical analysis, Business Intelligence, Big Data, cloud computing & data warehousing.
- Possess analytical knowledge in domains such as Finance, Supply Chain, Retail, and Web & Social Media analytics.
- Experience using analytical techniques such as statistical tests, regression, clustering, decision trees, random forests, discriminant analysis, PCA, correspondence analysis, time-series forecasting & text mining.
- Experience exploring, preparing, and visualizing data & designing statistical models using R packages, Python libraries, and SAS procedures/functions.
- Cloud computing implementation experience using HDInsight, Azure Data Lake (COSMOS), Azure Data Factory, Azure Machine Learning & PowerShell scripting.
- Experience implementing BI solutions using Analysis Services (SSAS) and building dashboards and scorecards using Reporting Services (SSRS), Power BI, Tableau & Excel PowerPivot.
- Experience formulating business problems as optimization problems with minimizing/maximizing objective functions, using techniques such as linear programming.
- Experience working on the Hadoop ecosystem and its components such as HDFS, Pig, Hive, Sqoop, HBase and Oozie.
- Experience working on Spark using PySpark & MLlib (see the sketch after this list).
- Possess excellent communication skills along with strong quantitative skills and analytical capabilities.
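By way of illustration, a minimal PySpark sketch of the classification pattern referenced above (hypothetical data; uses the DataFrame-based MLlib API of recent Spark releases):

    # Assemble feature columns into a vector and fit a logistic-regression model.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("sketch").getOrCreate()
    df = spark.createDataFrame(
        [(0.0, 1.2, 0.7), (1.0, 3.1, 2.4), (0.0, 0.8, 0.3), (1.0, 2.9, 2.1)],
        ["label", "x1", "x2"],  # hypothetical label and features
    )
    assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
    model = LogisticRegression(labelCol="label").fit(assembler.transform(df))
    print(model.coefficients)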
TECHNICAL SKILLS:
Technologies: Statistical Analysis, Machine Learning, Predictive Analytics, Big Data, Cloud computing, Business Intelligence & Data Warehousing.
Programming: Microsoft R, R, Python, PySpark, SAS, SQL, MDX, PowerShell, VBA & NoSQL.
Big Data: Hadoop, Spark, Azure Data Lake & Azure Data Factory (ADF)
Business Intelligence: SQL Server Analysis Services (SSAS), Integration Services (SSIS), Reporting Services (SSRS), Power BI Desktop, Excel PowerPivot & Tableau.
PROFESSIONAL EXPERIENCE:
Confidential, Santa Clara, CA
Data Scientist
Responsibilities:
- Identified patterns and major factors related to non-compliance leading to employee attrition.
- Captured non-compliant employee activity (e.g., copying confidential data), detected anomalies, and tracked patterns/trends in relation to employee attrition.
- Performed analysis & developed an attrition-prediction model on the complete dataset in Impala; data preparation was carried out in Impala, and the best-tuned model was deployed to score new data.
- Performed a POC on installing SparkR on Spark 1.6.0.
- Algorithms: Classification Algorithms (Bayesian Belief Networks, Decision Trees, Logistic Regression, Random Forest, XGBoost and Support Vector Machines)
- The client needed to minimize part-cleaning costs (testing whether the alloy coating applied over parts falls within specified threshold limits).
- Identified parts that frequently come out clean in the first iteration; those parts were sampled to reduce the expense of testing each and every part.
- Identified which sets of metals frequently led to multiple cleaning iterations and hence increased cleaning costs; provided high-end visuals to the engineering team so they could understand flaws in the process and take corrective action.
- Algorithms: Multivariate analysis
- The client needed to minimize ticket-resolution time by routing tickets to the right department and avoiding hops between departments.
- Analyzed user-provided text descriptions, preparing a corpus and applying techniques such as Latent Dirichlet Allocation.
- Used the per-topic probabilities from the model to route each ticket to the most probable department (see the sketch after this list).
- Algorithms: Text Mining, Latent Dirichlet Allocation (LDA) & Classification techniques.
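A minimal Python sketch of the ticket-routing idea above (the project itself was delivered in R; the ticket texts and the topic-to-department map below are hypothetical), using scikit-learn's LDA:

    # Prepare a bag-of-words corpus, fit LDA, and route each ticket to the
    # department associated with its most probable topic.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    tickets = [  # hypothetical ticket descriptions
        "cannot connect to vpn from home office",
        "invoice amount does not match purchase order",
        "laptop screen flickers after windows update",
    ]
    X = CountVectorizer(stop_words="english").fit_transform(tickets)
    lda = LatentDirichletAllocation(n_components=2, random_state=42)
    doc_topics = lda.fit_transform(X)  # one topic-probability row per ticket

    topic_to_department = {0: "IT Helpdesk", 1: "Finance"}  # hypothetical mapping
    for text, probs in zip(tickets, doc_topics):
        print(text, "->", topic_to_department[probs.argmax()])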
Environment: R language, R Shiny, Spark 1.6.0 (Python), SparkR, Hadoop, Hive, Impala & Tableau.
Confidential
Big Data Analyst
Responsibilities:
- Performed a POC demonstrating that data sourced in ADLS can be processed in HDInsight using a pipeline with a Hive activity.
- Used Hive to create ORC-formatted tables and ADF for data orchestration to the Azure database; data was copied from ADLS to Azure SQL Database using ADF pipelines invoked via PowerShell scripting (see the sketch after this list).
- Developed Sqoop jobs to export Hive tables to Azure SQL Database, scheduled using crontab.
- Analyzed gamification design and developed calculations in Hive.
- Developed metrics & built a Power BI dashboard that rewards drivers for safe driving and keeps checks on vehicle health.
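A minimal sketch of invoking the copy pipeline above programmatically; the project used PowerShell scripting, so this shows only the equivalent call through the Azure Python SDK, and the resource group, factory, and pipeline names are hypothetical placeholders:

    # Trigger the ADLS -> Azure SQL Database copy pipeline and poll its status.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
    run = adf.pipelines.create_run(
        resource_group_name="rg-telematics",  # hypothetical
        factory_name="adf-telematics",        # hypothetical
        pipeline_name="CopyAdlsToSqlDb",      # hypothetical
    )
    status = adf.pipeline_runs.get("rg-telematics", "adf-telematics", run.run_id)
    print(status.status)  # e.g. "InProgress" or "Succeeded"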
Environment: HDInsight, Azure SQL Database, Azure Data Lake, Azure Data Factory, Hive, Sqoop, PowerShell & Power BI.
Confidential
Data Scientist
Responsibilities:
- Performed data exploration and analysis of Sales & Marketing data using R.
- Cross-sold/up-sold products (from the Microsoft product range) to customers based on patterns identified for similar sets of customers, thereby improving overall revenue.
- Developed ADF pipelines to move data from on-premises source systems to COSMOS, from COSMOS (with data transformation) to the Azure warehouse (staging), and from the Azure warehouse to Azure ML (for scoring), appending scores back to the data in the Azure warehouse.
- Involved in automating the end-to-end model refresh process using COSMOS, Azure Data Factory (ADF) pipelines, Azure ML and PowerShell scripting.
- Developed dashboards using Power BI as per requirements.
- Algorithms: Collaborative filtering & Market Basket Analysis (see the sketch after this list).
- Predicted whether an opportunity would be won or lost based on a set of parameters.
- Helped the business identify opportunities with better chances of winning, assign the right resources, & take corrective steps; developed visualizations that derived insights from the data.
- Algorithms: Decision Trees, Logistic Regression, Random Forest, Time series forecasting.
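A minimal from-scratch sketch of the market-basket idea behind the cross-sell/up-sell work (the project itself used R; the purchase data below is hypothetical), computing pairwise support and lift over a one-hot basket matrix:

    # Rows = customers, columns = products purchased (1 = bought).
    import pandas as pd

    baskets = pd.DataFrame({  # hypothetical purchase data
        "Office":   [1, 1, 0, 1],
        "Azure":    [1, 0, 1, 1],
        "Dynamics": [0, 1, 1, 1],
    })
    support = baskets.mean()  # P(product purchased)

    # Lift > 1 means two products are bought together more often than
    # chance would predict, i.e. a cross-sell candidate.
    for a in baskets.columns:
        for b in baskets.columns:
            if a >= b:
                continue
            both = (baskets[a] & baskets[b]).mean()
            lift = both / (support[a] * support[b])
            print(f"{a} & {b}: support={both:.2f}, lift={lift:.2f}")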
Environment: Azure Stack (HDInsight, Data Lake, ADF, AML), R scripting, PowerShell scripting & Power BI.
Confidential
Lead BI Developer
Responsibilities:
- Worked in a multi-team environment, coordinating activities with the onsite team and delegating duties to team members.
- Incorporated a self-service BI model in the cube by developing complex calculated members, implementing PII (Personally Identifiable Information) security, and improving the cube's processing & query performance.
- Performed database and ETL development per new requirements and was actively involved in improving overall system performance by optimizing slow-running/resource-intensive queries.
- Developed complex SSIS packages to automate table usage for dimension processing and to create multiple OLAP databases that maintain a version history of security.
- Migrated a fully functional OLAP cube, along with its security, to a Tabular model using SSAS Tabular.
Environment: SQL Server 2016/2014/2012 (SSIS, SSAS, SSRS), Power BI & Excel 2016/2013.
Confidential, Fort Lauderdale, FL
Sr. BI Developer
Responsibilities:
- Developed complex SQL queries to retrieve dimension usage counts from the OlapQueryLog and measure usage counts from Profiler logs.
- Performed cube development tasks such as implementing time intelligence using a shell dimension, implementing security, and using a many-to-many dimension approach to resolve many-to-many discrepancies.
- Developed complex ETL packages using Integration Services (SSIS), such as automated partition processing of measure groups in cubes, automated emails to business users on job failures, and packages involving slowly changing dimensions (SCDs) built with SSIS expressions and C# script components.
Environment: SQL Server 2012 (SSAS, SSIS & SSRS), SQL Sentry and Excel 2013/2010.
Confidential, Redmond, WA
BI Analyst
Responsibilities:
- Developed audits to ensure system health is satisfactory and to minimize downtime.
- Worked on a migration project, moving all SQL Agent jobs, stored procedures, and tables from one environment to a new environment, including CLR stored procedures and .NET assemblies.
- Designed an OLAP cube and its measures/dimensions to define KPIs on PerformancePoint Server.
Environment: SQL Server 2008 R2 (SSAS, SSRS), Excel 2010, Excel Macros, VBA, Batch Files, PPS 2007.
Confidential
BI Analyst
Responsibilities:
- Conducted meetings with business teams to onboard new scorecards onto the existing platform.
- Designed complex MDX calculated members for Excel scorecards to meet business requirements for time/geography rollups and non-rollups that differ per metric.
- Worked on the cube writeback feature, where changes were written directly to the cube using Excel macros.
Environment: SQL Server 2008 R2 (SSAS & SSIS), SharePoint 2010, Excel 2010, Excel Macros, VBA, JavaScript.
Confidential, San Ramon, CA
BI Developer
Responsibilities:
- Automated reports using Visual Basic for Applications (VBA) to zip/unzip files, automatically upload to SharePoint, send emails to users, & download reports from SharePoint.
- Worked on a migration project to move database objects from multiple environments into a single environment.
- Interacted with business users to understand requirements and designed reports accordingly.
Environment: SQL Server 2005, Teradata v12.0, Oracle 9i, VBA & Batch Files.
Confidential, Salt Lake City
BI Developer
Responsibilities:
- Performed ETL development using Informatica and developed shell scripts in UNIX.
- Developed ad-hoc reports using SQL for data retrieval from Teradata.
- Fine-tuned dimensions by choosing the right key attributes, avoiding deployment of unnecessary attributes, and defining hierarchies for proper rollup.
Environment: SQL Server 2005 (SSIS, SSAS, SSRS), Teradata v12, Informatica 7.1.3 & Unix.
Confidential
BI Developer
Responsibilities:
- Developed SSRS reports using cascading parameters, sub-reports, and drill-down & drill-through reports.
- Tuned the database for slow-running SQL queries.
- Created different chart types and parameterized reports using expressions & global variables.
Environment: SQL Server 2005 (SSRS), Excel 2007, Oracle 9i, TOAD 6.0