
Data Scientist Resume


Data Scientist - Dallas, TX

SUMMARY

  • 8+ years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, covering Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Worked extensively on a wide range of projects including NLP, Image Processing, Deep Learning, Statistical Analysis, and Machine Learning. Proficient in languages and tools such as Python, R, SQL, Hive, and PySpark.
  • Proficient in Statistical Modeling and Machine Learning techniques (Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian methods, XGBoost) for forecasting and predictive analytics; a toy XGBoost example follows this summary.
  • Segmentation methodologies, Regression based models, Hypothesis testing, Factor analysis/ PCA.
  • Experience in Business Intelligence/Data Warehousing design and architecture, Dimensional Data Modeling, ETL, OLAP cubes, reporting, and other BI tools.
  • Highly skilled in using visualization tools like R-Shiny, Tableau, Microstrategy, Power BI, ggplot2 and d3.js for creating dashboards.
  • Developed image-processing pipelines using OpenCV in Python, feeding the results into Convolutional Neural Networks and machine learning models to predict expert (dermatologist) scores of facial and hair features before and after L’Oréal cosmetic application.
  • Created an acne-detection algorithm, released in the App Store, that determines whether a facial image shows acne, using OpenCV and a CNN model with video caching.
  • Performed SWOT analysis of the various developed machine learning models and techniques.
  • Analyzed statistical inferences and predicted customer and product behavioral dynamics.
  • Developed, validated, and deployed machine learning models to production, delivering prediction-based projects across a range of initiatives to meet the needs of marketing and scientific teams at L’Oréal USA.
  • Experience in foundational machine learning models and concepts: regression, random forest, boosting, and deep learning.
  • Developed machine learning and fraud-detection models to ensure a smooth purchase flow in digital sales.
  • Strong experience in the analysis, design, development, testing, and implementation of Business Intelligence solutions using Data Warehouse/Data Mart design, ETL, OLAP, and client/server applications.
  • Strong data warehousing ETL experience with Informatica PowerCenter client tools (Mapping Designer, Repository Manager, Workflow Manager/Monitor) and server tools (Informatica Server, Repository Server Manager).
  • Proficient in integrating various data sources with multiple relational databases such as Oracle 11g/10g/9i, MS SQL Server, DB2, and Teradata, plus flat files, into the staging area, ODS, Data Warehouse, and Data Mart.
  • Experience in Extracting data for creating Value Added Datasets using Python, R, SAS, Azure and SQL.
  • Experience in using statistical procedures and machine learning algorithms such as ANOVA, Clustering, Regression, and Time Series Analysis to analyze data for further model building.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing Data Mining and reporting solutions.
  • Experience in applying Predictive Modeling and Machine Learning algorithms for Analytical projects.
  • Developing Logical Data Architecture with adherence to Enterprise Architecture.
  • Experience in designing striking visualizations using Tableau and in publishing and presenting dashboards and storylines on web and desktop platforms.
  • Proficient in Predictive Modeling, Data Mining methods, Factor Analysis, ANOVA, Hypothesis Testing, normal distribution, and other advanced statistical and econometric techniques.
  • Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.
  • Experience in statistical modeling including Decision Trees, Logistic and Linear Regression, Neural Networks, SVM, Clustering, and Naïve Bayes; experienced in supervised learning, classification, and anomaly detection.
  • Experience in text analytics and data visualization using R, Python, and Tableau, and with large transactional databases (Teradata, Oracle, HDFS).
  • Strong experience with Big Data ecosystems, including HBase, Kafka, Apache Spark, and PySpark.
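
For illustration, here is a minimal sketch of the kind of gradient-boosted (XGBoost) classifier named above, written in Python with scikit-learn; the data is synthetic and the settings are illustrative, not taken from any project in this resume.

    import numpy as np
    from xgboost import XGBClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Synthetic data: 1,000 rows, 5 features, nonlinear decision rule
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.5).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
    clf.fit(X_tr, y_tr)
    print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))

In practice the tree depth, number of estimators, and learning rate would be tuned by cross-validation rather than fixed as above.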

TECHNICAL SKILLS

BigData/Hadoop Technologies: Hadoop, HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Zookeeper, and Oozie

Languages: C, C++, HTML5, DHTML, WSDL, CSS3, XML, R/RStudio, SAS Enterprise Guide, SAS, R (caret, Weka, ggplot2), Python (NumPy, SciPy, Pandas), SQL, PL/SQL, Pig Latin, HiveQL, Shell Scripting.

Cloud Computing Tools: Amazon AWS, Azure.

Databases: Microsoft SQL Server 2008 … MySQL 4.x/5.x, Oracle 10g, 11g, 12c, DB2, Teradata, Netezza

NoSQL Databases: HBase, Cassandra, MongoDB, MariaDB

Build Tools: Maven, ANT, Toad, SQL Loader, RTC, RSA, Control-M, Oozie, Hue, SOAP UI

Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ

Development Methodologies: Agile/Scrum, Waterfall, UML, Design Patterns

Version Control Tools and Testing: Git, GitHub, SVN, and JUnit

ETL Tools: Informatica Power Centre, SSIS

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos 7.0/6.0.

Operating Systems: All versions of UNIX, Windows, Linux, macOS, Sun Solaris

PROFESSIONAL EXPERIENCE

Confidential - Dallas, TX

Data Scientist

Responsibilities:

  • Develop Machine Learning and fraud detection models to ensure smooth purchase flow in digital sales.
  • Support product, sales, leadership and marketing teams with insights gained from analyzing company data.
  • Analyze images to extract financial information and support automatic credit card info detection in the purchase workflow.
  • Reduced recurring chats and calls by up to 11% and 7% respectively by filtering redundant inquiries and implementing an NLP analytics workflow over customer chats and calls, designing a pipeline that surfaces insights from large volumes of chat data (Voice of Customer) using techniques such as RNNs, bag of words, and topic modeling (see the topic-modeling sketch after this list).
  • Responsible for performing Machine-learning techniques regression/classification to predict the outcomes.
  • Used various techniques with Python and R data structures to get the data into the right format, which is then used by other internal applications to calculate thresholds.
  • Utilize machine-learning algorithms such as logistic regression, multivariate regression, K-means, & Recommendation algorithms for data analysis.
  • Used Python, R, SQL to create Statistical algorithms involving Multivariate Regression, Linear Regression, Logistic Regression, PCA, Random forest models, Decision trees, Support Vector Machine for estimating the risks of welfare dependency.
  • Working on building a regression model to predict the likelihood to pay after an account becomes delinquent.
  • Data Collection, Feature Creation, Model Building (Linear Regression, SVM, Logistic Regression, Decision Tree, Random Forest, GBM), Evaluation Metrics, Model Serving - R, Scikit-learn, Spark SQL, Spark ML, Flask, Redshift, AWS S3.
  • Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms, and utilized algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
  • Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
  • Developed and maintained a data dictionary to create metadata reports for technical and business purposes.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
  • Interaction with Business Analyst, SMEs and other Data Architects to understand Business needs and functionality for various project solutions.
  • Created SQL tables with referential integrity and developed queries using SQL, SQL*PLUS and PL/SQL.
  • Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
  • Performed database performance tuning, including indexing, optimizing SQL statements, and monitoring the server.
  • Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
  • Collaborated on the data-mapping document from source to target and on the data quality assessments for the source data.
  • Created PL/SQL packages, Database Triggers, developed user procedures, and prepared user manuals for the new programs.
  • Participated in Business meetings to understand the business needs & requirements.
  • Prepared the ETL architecture & design document, covering ETL architecture, SSIS design, and the extraction, transformation, and loading of Duck Creek data into the dimensional model.
  • Provided technical and requirements guidance to team members for ETL/SSIS design.
  • Designed the ETL framework and carried out its development.
  • Designed logical & physical data models using the MS Visio 2003 data modeling tool.
  • Participated in stakeholder meetings to understand the business needs & requirements.
  • Participated in Architect solution meetings & guidance in Dimensional Data Modeling design.
  • Coordinate and communicate with technical teams for any data requirements.
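
A minimal sketch of the topic-modeling step referenced above, assuming chat transcripts arrive as plain strings; the sample texts and the two-topic setting are invented for illustration.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Hypothetical chat transcripts (Voice of Customer)
    chats = [
        "cannot log in to my account, password reset not working",
        "refund for a duplicate charge on my card",
        "password reset link never arrived",
        "charged twice, need a refund please",
    ]

    vec = CountVectorizer(stop_words="english")   # bag-of-words features
    X = vec.fit_transform(chats)

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    terms = vec.get_feature_names_out()
    for k, topic in enumerate(lda.components_):
        top_terms = [terms[i] for i in topic.argsort()[-4:][::-1]]
        print(f"topic {k}: {top_terms}")

A real pipeline would run over far more documents and choose the topic count by coherence or held-out likelihood.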

Environment: Machine learning, Fraud, NLP, ChatBot, Anomaly detection, AWS, MS Azure, Cassandra, Spark, HDFS, Hive, Pig, Linux, Python (Scikit-learn/SciPy/NumPy/Pandas), R, SAS, SPSS, MySQL, Eclipse, PL/SQL, SQL connector, Tableau.

Confidential

Data Scientist

Responsibilities:

  • Data wrangling and scripting in Python, database cleanup in SQL, advanced model building in R/Python, and expertise in data visualization and Tableau dashboard development.
  • Developed image-processing pipelines using OpenCV in Python, feeding the results into Convolutional Neural Networks and machine learning models to predict expert (dermatologist) scores of facial and hair features before and after L’Oréal cosmetic application (a sketch of this kind of pipeline follows this list).
  • Created a web platform called SPRINT, a simulation for hair color and formula detection that uses RGB/LAB values to find the closest formula via clustering and simulation, for research purposes.
  • Created an acne-detection algorithm, released in the App Store, that determines whether a facial image shows acne, using OpenCV and a CNN model with video caching.
  • Performed SWOT analysis of the various developed machine learning models and techniques.
  • Analyze statistical inferences and predict customer and product behavioral dynamics.
  • Developed, validated, and deployed machine learning models to production, delivering prediction-based projects across a range of initiatives to meet the needs of marketing and scientific teams at L’Oréal USA.
  • Involved in core research and in developing scalable data science products that inform key decisions, following the CRISP-DM data science protocol for project execution.
  • Applied statistical and econometric models to datasets to measure results and outcomes, identify causal impact and attribution, and predict future performance of users or products in an agile development environment.
  • Performed data extraction using various queries and formulas, manipulating the data with various analytical tools to create reports.
  • Performed data pre-processing and cleaning to prepare data sets for further statistical analysis.
  • Performed ETL operations on raw data from various sources on the mainframe platform and loaded it into the Teradata platform.
  • Developed complex SQL queries and standardized daily and weekly reports, equipping leaders with Tableau dashboards providing critical metrics and insights.
  • Independently created a decision-making model in Excel using functions and macros to prioritize project workloads.
  • Streamlined solutions for proposed projects by creating UML diagrams in Visio after gathering business requirements.
  • Derived business intelligence solutions ensuring accuracy, completeness, and quality of the data in global sales operation systems.
  • Involved in extracting, transforming, and loading data from various sources (Oracle, Teradata, and Salesforce.com) into target tables.
  • Automated Python jobs running on a daily, weekly and monthly basis.
  • Built visualizations in R to convey technical research findings and to ensure alignment of data science ideas with client use cases.
  • Developed advanced statistical and predictive models to support business insights that led to fact-based decision making, predicting trends in customer behavior by analyzing customer data in R.
  • Used SAS Data Integration Studio to register tables and create libraries pointing to the Teradata database.
  • Created various dashboards in Tableau Desktop & SAS Visual Analytics for data visualization and end-user ease.
  • Solutions built include social network analysis, text mining, geospatial statistical analysis (map-based algorithms), and others.
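
A hedged sketch of an image-scoring pipeline of the kind described above: OpenCV preprocessing feeding a small Keras CNN regressor. The file path, input size, and architecture here are illustrative stand-ins, not the production model.

    import cv2
    import numpy as np
    import tensorflow as tf

    def preprocess(path, size=128):
        """Load an image with OpenCV and scale it for the network."""
        img = cv2.imread(path)                      # BGR uint8
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # match training color order
        img = cv2.resize(img, (size, size))
        return img.astype("float32") / 255.0        # scale to [0, 1]

    # Small CNN ending in a single linear unit: regression on the expert score
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128, 128, 3)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    x = preprocess("face.jpg")[np.newaxis, ...]     # "face.jpg" is hypothetical
    print("predicted score:", float(model.predict(x)[0, 0]))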

Environment: Python, R, SAS 9.3/9.4, SAS DI Studio, SAS/MACRO, SAS Visual Analytics, Azure, AWS, Spark, Hive, TensorFlow, UNIX, MSSQL, Teradata, Oracle 12cR1, Salesforce.com, MS Excel, Tableau 9/10.

Confidential

Data Scientist

Responsibilities:

  • Analyzed market research survey data collected by Cambridge Research and helped build a marketing and sales tool.
  • Developed and improved the performance of prediction models using regularization and hyperparameter tuning to minimize the cost function.
  • Delivered product enhancements with improved prediction metrics for the CTR models via feature engineering, outlier detection, handling of class imbalance, and missing-value detection.
  • Leveraged analytics to optimize real-time bidding strategies on major ad exchanges (e.g., Google, Facebook) for ad campaigns, maximizing revenues and profit margins for the company.
  • Implemented Logistic Regression, Linear Regression, Bagging, Boosting, Decision Trees, Clustering algorithms, Support Vector Machines (SVM), optimization, and stochastic processes to build CTR and CVR (click-through rate and conversion rate) models and increase overall sales.
  • Analyzed customer data using big data tools (Hive, Pig) to understand click-through patterns and buying behavior, helping increase customer engagement and conversions for e-commerce clients.
  • Worked closely with the product engineering team to propose, validate, and iteratively build data-driven product features such as recommendation systems, rules, and product proximity matching.
  • Performed market research and segmentation analysis using SAS, SQL and Excel to get the potential target customers and identify the market trend and opportunities.
  • Worked extensively on developing the questionnaire, consisting of 132 questions, to understand customer behavior and product penetration.
  • Worked in RStudio plotting graphs from the collected survey data, and used Text Miner to understand customer behavior.
  • Applying Text Miner to survey data was an altogether new task for Kellogg's, and the output proved useful enough to extend text mining to other areas.
  • Applied multiple machine learning algorithms such as KNN (k-nearest neighbors), Random Forest, Decision Trees, and Factor Analysis to identify new business opportunities and to understand incoming data for the marketing and sales teams.
  • Worked on a special-case project to generate a special set of samples (slice sampling, Metropolis-Hastings) from the survey using MCMC (Markov Chain Monte Carlo) in RStudio; a sketch of the sampler idea follows this list.
  • Performed advanced querying using SAS Enterprise Guide, calculating computed columns and using filters to manipulate and prepare data for reporting, graphing, summarization, and statistical analysis, ultimately generating SAS datasets.
  • Also contributed to the frozen-food data project, using RStudio and SAS EG to analyze multiple categories of frozen foods based on the weights assigned to each category.
  • Tracked the project through weekly meetings with the Global Insight managers, presenting the weekly report on a Tableau dashboard to give them a graphical view of customer segmentation.
  • Performed cluster analysis, grouping customers based on 56 variables using k-means clustering.
  • Responsible for handling requests from the marketing department and inquiries from the Customer Services related to statistics of existing data.
  • The output consisted of 30 individual reports, which were replicated on Tableau dashboards.
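
The MCMC work above was done in R; the sketch below restates the Metropolis-Hastings idea in Python (the resume's other main language) against a standard normal target, purely for illustration.

    import numpy as np

    def metropolis_hastings(log_target, x0, n_samples, proposal_sd=1.0, seed=0):
        """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
        rng = np.random.default_rng(seed)
        samples = np.empty(n_samples)
        x, logp_x = x0, log_target(x0)
        for i in range(n_samples):
            x_new = x + rng.normal(scale=proposal_sd)      # propose a move
            logp_new = log_target(x_new)
            # Accept with probability min(1, p(x_new) / p(x))
            if np.log(rng.uniform()) < logp_new - logp_x:
                x, logp_x = x_new, logp_new
            samples[i] = x
        return samples

    # Target: standard normal, log-density up to an additive constant
    draws = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_samples=10_000)
    print(draws[1_000:].mean(), draws[1_000:].std())       # ~0 and ~1 after burn-in

Discarding the first 1,000 draws as burn-in lets the chain forget its starting point before statistics are computed.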

Environment: Python, SQL, RStudio, SAS BI, Tableau, Hive, Spark, JS, Java, Microsoft SQL, IBM Informix, visual analytics, PuTTY, Teradata, Oracle, MS Office.

Confidential

Data Scientist

Responsibilities:

  • Analyzed high volume, high dimensional client and survey data from different sources using SAS and R.
  • Analyzed customer behavioral patterns using classification and segmentation techniques.
  • Calculated Customer Lifetime Value (CLTV) and churn/attrition rates for different demographic segments.
  • Analyzed data and identified financial trends in customer behavior, acquisition, retention, and customer movement through the website.
  • Predicted Google search volume for keywords based on several factors using linear regression.
  • Interacted with client-side Business Analysts and Technical Leads for requirements analysis and to define business and functional specifications for a leading finance company.
  • Analyzed structured data and identified top-profile customers using correlation and regression techniques.
  • Developed and implemented predictive models using Natural Language Processing Techniques and machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA and regularization for data analysis.
  • Designed and developed Natural Language Processing models for sentiment analysis.
  • Applied clustering algorithms (Hierarchical, K-means) with the help of scikit-learn and SciPy.
  • Developed visualizations and dashboards using ggplot2 and Tableau.
  • Worked on development of data warehouse, Data Lake and ETL systems using relational and non-relational tools like SQL, NoSQL.
  • Built and analyzed datasets using R, SAS, and Python (in decreasing order of usage).
  • Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, and Power BI.
  • Implemented Agile Methodology for building an internal application.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and Map Reduce concepts.
  • As Architect delivered various complex OLAP databases/cubes, scorecards, dashboards and reports.
  • Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
  • Implemented Classification using supervised algorithms like Logistic Regression, Decision trees, KNN, and Naïve Bayes.
  • Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) to handle data migration and ETL tasks from OLTP source systems to OLAP target systems.
  • Provided maintenance support on the testing team for system testing, integration testing, and UAT.
  • Involved in preparation & design of technical documents like Bus Matrix Document, PPDM Model, and LDM & PDM.
  • Understanding the client business problems and analyzing the data by using appropriate Statistical models to generate insights.
  • Developed algorithms (data mining queries) to extract data from the data warehouse & databases to build rules for the Analyst & Models team.
  • Highly efficient in programming with statistical modeling tools such as SAS, SPSS, and R.
  • Developed predictive models in R to predict customer churn and classify customers (a comparable churn-classifier sketch follows this list).
  • Worked on a Shiny/R application presenting machine learning results to improve business forecasting.
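
The churn models above were built in R; as a stand-in, here is a minimal Python/scikit-learn version of the same idea. The file name and column names (tenure, monthly_spend, support_calls, churned) are hypothetical.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("customers.csv")      # hypothetical customer extract
    X = df[["tenure", "monthly_spend", "support_calls"]]
    y = df["churned"]                      # 1 = churned, 0 = retained

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))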

Environment: Python, R 3.0, Tableau 8.0, QlikView, PL/SQL, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, Spark, RStudio, Mahout, Java, SAS, Hive, AWS.
