
Data Scientist Resume


PROFESSIONAL SUMMARY:

  • 8+ years of work experience in Data Science using R, SAS, and Python. Expertise in analyzing data and building predictive models that deliver intelligent solutions across domains.
  • Experience working on Windows, Linux, and UNIX platforms, including programming and debugging skills in UNIX shell scripting.
  • Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6, Ubuntu 13/14, and Cosmos.
  • Defined job flows in the Hadoop environment using tools such as Oozie for data scrubbing and processing.
  • Experience in Data migration from existing data stores to Hadoop.

TECHNICAL SKILLS:

  • Data Analytics Tools: Python (NumPy, SciPy, pandas, Gensim, Keras), R (caret, Weka, ggplot2), MATLAB.
  • Analysis & Modelling Tools: Erwin, Sybase PowerDesigner, Oracle Designer, Rational Rose, ER/Studio, TOAD, MS Visio, SAS, Django, Flask, pip, NPM, Node.js, Spring MVC.
  • Data Visualization: Tableau, visualization packages, Microsoft Office.
  • Machine Learning: Simple Linear Regression, Multivariate Linear Regression, Polynomial Regression, Logistic Regression, Classification, Clustering, Association, Decision Trees, Random Forest, Softmax, K-NN, K-Means, Kernel SVM, Gradient Descent, Backpropagation, Feed-Forward ANN, CNN, RNN, and Word2Vec.
  • Machine Learning Frameworks: Spark ML, Spark MLlib, Kafka, scikit-learn, and NLTK.
  • Big Data Tools: Hadoop, MapReduce, Sqoop, Pig, Hive, NoSQL, Spark, Apache Kafka, Shiny, YARN, DataFrames, pandas, ggplot2, scikit-learn, Theano, CUDA, Azure HDInsight, etc.
  • ETL Tools: Informatica PowerCenter, DataStage 7.5, Ab Initio, Talend.
  • OLAP Tools: MS SQL Analysis Manager, DB2 OLAP, Cognos PowerPlay.
  • Programming Languages: SQL, PL/SQL, T-SQL, XML, HTML, UNIX Shell Scripting, Microsoft SQL Server, Oracle PLSQL, Python, Scala, C, C++, AWK, JavaScript.
  • R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret, elastic net, and other machine learning packages.
  • Databases: Oracle, Teradata, DB2 UDB, MS SQL Server, Netezza, Sybase ASE, Informix, AWS RDS, Cassandra, MongoDB, and PostgreSQL.
  • Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).
  • Tools & Software: SAS/STAT, SAS/ETS, SAS Enterprise Miner, SPSS, R, Advanced R, TOAD, MS Office, BTEQ, Teradata SQL Assistant.
  • Methodologies: Ralph Kimball, COBOL.
  • Version Control: Git, SVN.
  • Reporting Tools: Business Objects XI R2/6.5/5.0/5.1, Cognos Impromptu 7.0/6.0/5.0, Informatica Analytics Delivery Platform, MicroStrategy, SSRS, Tableau.
  • Operating Systems: Windows 7/8, Windows NT/XP/Vista, UNIX (Sun Solaris, HP-UX), MS-DOS.

PROFESSIONAL EXPERIENCE:

Confidential

Data Scientist

Responsibilities:

  • Developed a recommender system using the Matchbox recommender in Azure ML to assign the top 5 senior agents to agents seeking help on a topic, facilitating effective query handling and increasing operational efficiency by 40%.
  • Performed statistical modeling and developed a model using hierarchical clustering and logistic regression to identify employees who would need help (see the first R sketch after this list), reducing average call handle time by 30% and enhancing customer satisfaction.
  • Achieved accuracy above 90% for the predictive models in each project and presented the results to the clients. Used an ensemble across all models for monthly client delivery.
  • Built proficiency in the rare disease space and generated revenue growth of at least 7 million dollars for each of the clients.
  • Identified the leading indicators/important variables based on claims, physician, and demographic-level data. Used dimension reduction based on mean decrease in accuracy and PCA, and checked for collinearity, to reduce the number of variables from 6,500 to 20 (see the variable-reduction sketch after this list).
  • Experienced in working with high-dimensional claims and third-party data sets (274 million rows and 6,500 columns).
  • Built efficient SQL queries for data mining, data preparation and cleaning.
  • Ran chi-square tests to compare values between groups of severe (STEMI) and non-severe (NSTEMI) heart attacks based on age, gender, ethnicity, geographic location, insurance method, BMI, in-hospital procedures, etc. Conducted ANOVA to compare values between and within groups, used the Wilcoxon test to compare medians between groups, and calculated risk ratios between groups (see the group-comparison sketch after this list).
  • Managed a 6-member team to build predictive models, conduct statistical analysis, and define KPIs for the patient journey (demographics, co-morbidity, payer, physician, line-of-therapy analysis) to help clients make decisions.
  • Designed reports, visualizations, and written analyses for clients using R, MS Excel, and PowerPoint.
  • Extracted meaningful analyses, interpreted raw data, conducted quality assurance, and provided meaningful conclusions and recommendations to the clients based on the results.
  • Conducted knowledge-sharing sessions for offshore and onsite team members and interns on various analytical, statistical testing, and machine learning concepts and tools.
  • Performed social network analysis and topic modeling in R on employee chat data, and developed Sankey plots to understand communication paths, the strength of relations between agents, and the topics frequently discussed between them.
  • Analyzed employee behavior and performance data, and developed Shiny dashboards to evaluate team preparedness through metrics covering leadership skills, agent experience, agent behavior, and customer sentiment.
  • Developed SQL procedures to synchronize the dynamic data generated from GTID systems with the Azure SQL Server.
  • Created intelligent benchmarks for claims KPIs using machine learning to reduce noise in the existing alert framework.
  • Performed time series forecasting using a combination of methodologies to forecast future KPI values with dynamic tolerance limits based on historical patterns (see the forecasting sketch after this list).
  • Automated processes using Python/R scripts against an Oracle database to generate and write results to the production environment on a weekly basis.
  • Performed intelligent matching of truck delay data and work order data.
  • Performed root cause analysis using text mining of work order descriptions to find the reasons behind machine breakdowns and the failed part(s) involved.
  • Applied sequence mining to identify patterns of machine breakdown.
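
A minimal R sketch of the two-stage clustering-plus-logistic-regression approach mentioned above; the data frame agent_df, its columns, the Ward linkage, and k = 5 are illustrative assumptions rather than the original implementation.

    # Stage 1: hierarchical clustering of agents into behavioural groups
    # (agent_df and its columns are hypothetical placeholders).
    agent_features <- scale(agent_df[, c("avg_handle_time", "escalations", "tickets_per_week")])
    hc <- hclust(dist(agent_features), method = "ward.D2")   # Ward linkage assumed
    agent_df$cluster <- factor(cutree(hc, k = 5))

    # Stage 2: logistic regression predicting whether an agent will need help.
    logit_fit <- glm(needs_help ~ cluster + avg_handle_time + tenure_months,
                     family = binomial(link = "logit"), data = agent_df)
    summary(logit_fit)
    predict(logit_fit, type = "response")   # predicted probability of needing help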
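
The variable-reduction step (mean decrease in accuracy, PCA, collinearity checks) could look roughly like the following R sketch; claims_df, the outcome column, the 0.9 correlation cutoff, and the top-20 selection are assumptions for illustration.

    library(randomForest)
    library(caret)
    set.seed(42)

    # claims_df: placeholder data frame with a binary factor `outcome`
    # and several thousand numeric candidate predictors.
    predictors <- claims_df[, setdiff(names(claims_df), "outcome")]

    # 1. Drop highly collinear predictors (|r| > 0.9 is an assumed cutoff).
    drop_idx <- findCorrelation(cor(predictors, use = "pairwise.complete.obs"), cutoff = 0.9)
    if (length(drop_idx) > 0) predictors <- predictors[, -drop_idx]

    # 2. Rank remaining predictors by mean decrease in accuracy from a random forest.
    rf_fit <- randomForest(x = predictors, y = claims_df$outcome, importance = TRUE, ntree = 500)
    imp <- importance(rf_fit, type = 1)                 # type = 1: mean decrease in accuracy
    top_vars <- rownames(imp)[order(imp[, 1], decreasing = TRUE)][1:20]

    # 3. PCA on the shortlist to check how much variance the reduced set retains.
    summary(prcomp(predictors[, top_vars], center = TRUE, scale. = TRUE))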
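
The STEMI/NSTEMI group comparisons can be sketched with base R tests plus epitools for risk ratios; mi_df and its column names are hypothetical placeholders, not the original dataset.

    library(epitools)   # riskratio()

    # Chi-square test: MI severity vs. a categorical factor such as insurance method.
    chisq.test(table(mi_df$mi_type, mi_df$insurance_method))

    # One-way ANOVA: does mean age differ across the MI groups?
    summary(aov(age ~ mi_type, data = mi_df))

    # Wilcoxon rank-sum test: compare median BMI between the two groups.
    wilcox.test(bmi ~ mi_type, data = mi_df)

    # Risk ratio of an in-hospital procedure between the groups (2x2 table).
    riskratio(table(mi_df$mi_type, mi_df$procedure_flag))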
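
One way to realize the KPI forecast with dynamic tolerance limits is via the forecast package, using prediction intervals as the limits; kpi_weekly, the 52-week seasonality, the 12-week horizon, and the 80/95% bands are assumed choices, not the original configuration.

    library(forecast)

    kpi_ts <- ts(kpi_weekly$value, frequency = 52)        # weekly KPI history (placeholder)
    fit <- auto.arima(kpi_ts)                             # one of several candidate methods
    fc  <- forecast(fit, h = 12, level = c(80, 95))       # forecast 12 weeks ahead

    # The prediction-interval bounds serve as dynamic tolerance limits for alerting.
    tolerance <- data.frame(point = as.numeric(fc$mean),
                            lower = as.numeric(fc$lower[, "95%"]),
                            upper = as.numeric(fc$upper[, "95%"]))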

Environment: R 9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose.

Confidential, Chicago, Illinois

Data Scientist

Responsibilities:

  • Executed machine learning projects based on Python, SQL, Spark, and SAS advanced programming. Performed exploratory data analysis, data visualization, and feature selection.
  • Applied machine learning algorithms including random forest, boosted trees, SVM, SGD, neural networks, and deep learning using CNTK and TensorFlow.
  • Performed data analysis, natural language processing, statistical analysis, generated reports, listings and graphs.
  • Performed big data analytics with Hadoop, HiveQL, Spark RDDs, and Spark SQL.
  • Tested Python/SAS on AWS cloud service and CNTK modeling on MS-Azure cloud service.
  • Built prediction models of major subsurface properties for underground image, geologic interpretation and drilling decisions.
  • Utilized advanced methods of big data analytics, machine learning, artificial intelligence, wave-equation modeling, and statistical analysis. Provided executive summaries on oil/gas seismic data and well profiles, and conducted predictive analyses and data mining to support interpretation and operations.
  • Applied a cross-correlation based data analysis method in Python and MATLAB on multi-offset-well data to help predict models and pore pressure slightly ahead of the bit for real-time drilling. Performed big data modeling incorporating seismic, rock physics, statistical analysis, well logs, and geological information into the 'beyond image'.
  • Using Python, developed, operationalized, and productionized machine learning models that made a significant impact on geological pattern identification and subsurface model prediction. Analyzed seismic and log data with sub-group analysis (classification/clustering) and model prediction methods (regression, decision trees, genetic programming, etc.).
  • Used the SAS statistical regression method and SAS/REG polynomial simulation in Excel to simulate the anisotropic trend as 1D depth functions. Validated the simulated functions against the image quality of the depth migration.
  • Tested the migrated data processing system on Google Cloud with velocity model updating tasks.
  • Performed ETL to convert unstructured data to structured data and imported the data into Hadoop HDFS. Utilized MapR as a low-risk big data solution to build a digital oilfield. Efficiently integrated and analyzed the data to increase drilling performance and interpretation quality. Analyzed sensor and well log data in HDFS with HiveQL and prepared it for prediction learning models.
  • Constantly monitored the data and models to identify the scope of improvement in the processing and business. Manipulated and prepared the data for data visualization and report generation. Performed data analysis, statistical analysis, generated reports, listings and graphs.
  • Co-leader of the mathematics community, 2015, Schlumberger Eureka.
  • Accomplished customer segmentation using the K-means algorithm in R, based on behavioral and demographic tendencies, to improve campaign strategies (see the R sketch after this list); this helped reduce marketing expenses by 10% and boost the client's revenue.
  • Built a customer lifetime value prediction model using historical telecom data in SAS to better serve high-priority customers through loyalty bonuses and personalized services, and to draft customer retention plans and strategies.
  • Developed PL/SQL procedures and functions to automate billing operations, customer barring, and number generation.
  • Redesigned the workflows for service requests and bulk service orders using UNIX cron jobs and PL/SQL procedures, reducing order processing time; average slippages per month dropped by 40%.
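
A minimal K-means segmentation sketch in R for the campaign-targeting work above; the customers data frame, the chosen feature columns, and k = 4 are illustrative assumptions rather than the original model.

    set.seed(123)

    # customers: placeholder data frame of behavioural/demographic features.
    features <- scale(customers[, c("avg_monthly_spend", "tenure_months",
                                    "purchase_frequency", "age")])

    # Inspect within-cluster sum of squares across candidate k (elbow heuristic).
    wss <- sapply(2:8, function(k) kmeans(features, centers = k, nstart = 25)$tot.withinss)

    # Fit the final model (k = 4 assumed) and attach segments to the customers.
    km <- kmeans(features, centers = 4, nstart = 25)
    customers$segment <- factor(km$cluster)
    table(customers$segment)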

Environment: SQL Server 2008 R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Hadoop, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.
