
Data Scientist Resume


Bentonville, AR

SUMMARY:

  • Over 8 years of experience in Analytics, Visualization, Data Modeling, and Reporting.
  • Independently led analytics, visualization, modeling, and reporting engagements; ran client meetings on business priorities, managed SLAs, and provided actionable insights to managers and C-level executives.
  • Built classification and forecasting models, automated processes, and performed text mining, sentiment analysis, statistical modeling, risk analysis, platform integrations, optimization modeling, user-experience modeling, and A/B testing using R, SAS, Python, SPSS, SAS E-Miner, EViews, Tableau, etc.
  • Built data models by extracting and stitching together data from various sources, and integrated systems with R to enable efficient data analysis.
  • Experience using machine learning models such as random forest, KNN, SVM, and logistic regression, with packages such as ggplot2, dplyr, lm, e1071, rpart, randomForest, nnet, tree, SAS PROC steps (PCA, DTREE, CORR, PRINCOMP, GPLOT, LOGISTIC, CLUSTER), NumPy, scikit-learn, pandas, etc., in R, SAS, and Python respectively (see the sketch after this list).
  • Good knowledge and understanding of web design languages such as HTML, CSS, and JavaScript.
  • Experience verifying the interconnection of databases with the user interface.
  • Expertise in Marketing & Customer Analytics focused on Market basket analysis, Campaign measurement, Private brand strategy, Sales forecasting, Customer segmentation and lifetime value analyses, SKU rationalization and Marketing mix modeling.
  • Developed complex database objects like Stored Procedures, Functions, Packages and Triggers using SQL and PL/SQL.
  • Proficient in Big Data, Hadoop, Hive, MapReduce, Pig and NoSQL databases like MongoDB, HBase, Cassandra.
  • Experienced in SQL Queries and optimizing the queries in Oracle, SQL Server, DB2, PostgreSQL, Netezza and Teradata.
  • Strong experience in maintaining and upgrading PostgreSQL, Oracle, and Big Data databases.
  • Experience in installing, configuring, and maintaining databases like PostgreSQL, Oracle, and Big Data HDFS systems.
  • Experience in the analysis and execution of the company’s stress tests as part of the Dodd-Frank Act Stress Test (DFAST) cycle.
  • Hands-on experience with clustering algorithms like K-means and K-medoids, as well as predictive and descriptive algorithms.
  • Expertise in Model Development, Data Mining, Predictive Modeling, Descriptive Modeling, Data Visualization, Data Cleaning and Management, and Database Management.
  • Expertise in applying data mining techniques and optimization techniques in B2B and B2C industries and proficient in Machine Learning, Data/Text Mining, Statistical Analysis and Predictive Modeling.
  • Used DFAST modeling and solutions for expected-loss calculations, viewing the results in a dashboard for further insights.
  • Experienced in designing star schema (identification of facts, measures and dimensions), Snowflake schema for Data Warehouse, ODS Architecture by using tools like Erwin Data Modeler, Power Designer, E-R Studio and Microsoft Visio.
  • Continuous-learning approach toward Elasticsearch (Lucene index-based search), Kibana, and other new tools.
  • Expertise in Excel macros, pivot tables, VLOOKUPs, and other advanced functions; expert R user with working knowledge of SAS statistical programming.
  • Excellent experience with Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad, TPump, FastLoad, and FastExport.
  • Experience in Data Mining, Text Mining, Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
  • Experienced in designing Architecture for Modeling a Data Warehouse by using tools like Erwin, Power Designer and E-R Studio.
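
A minimal sketch of the kind of scikit-learn classification workflow referenced in the machine learning bullet above; the dataset, column names, and target are hypothetical placeholders rather than material from any actual engagement.

# Hypothetical sketch of a scikit-learn classification workflow;
# the CSV path and column names are illustrative placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("customer_features.csv")           # placeholder input
X = df.drop(columns=["churned"])                    # predictor columns
y = df["churned"]                                   # binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=42)):
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(type(model).__name__, round(auc, 3))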

TECHNICAL SKILLS:

Programming languages: Oracle PL/SQL, Python, SQL, T-SQL, Java

Scripting languages: Python (NumPy, SciPy, Pandas, Keras), R (ggplot2, caret, Weka), JavaScript

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig

Reporting Tools: SSRS, SSIS, Cognos 7.0/6.0, Tableau

Tools: MS Office Suite, Scala, NLP, MariaDB, SAS, Spark MLlib, Kibana, Elasticsearch packages

Databases: Oracle, PostgreSQL, Teradata, Netezza, MS SQL Server, MongoDB, HBase, Cassandra

Operating Systems: Windows, Linux

PROFESSIONAL EXPERIENCE:

Confidential, Bentonville AR

Data Scientist

Responsibilities:

  • Documented logical, physical, relational and dimensional data models. Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
  • Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System (HDFS) and Pig to pre-process the data.
  • Transformed the Logical Data Model into a Physical Data Model in Erwin, ensuring Primary Key and Foreign Key relationships in the PDM, consistency of data attribute definitions, and Primary Index considerations.
  • Predicted store sales at the store and SKU level using a linear regression model with an error of 1% in 95% of stores, using statistical analysis tools and algorithms, and helped the retailer integrate the results into its sales and operations tools.
  • Built Tableau dashboards that tracked changes in customer behavior before and after campaign launch; the ROI measurements helped the retailer strategically extend the campaigns to other potential markets.
  • Built forecasting models by applying ARIMA models and performed statistical analysis on big data (see the forecasting sketch after this list).
  • Applied exponential smoothing models to multivariate time series data.
  • Developed 11 customer segments using K-means and Gaussian mixture techniques; the clusters helped the retailer understand lifetime values and design strategies to boost per-household value (a segmentation sketch also follows this list).
  • Developed a machine learning system that predicted purchase probability for a particular offer based on a customer’s real-time location data and past purchase behavior; these predictions are used for mobile coupon pushes.
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Checked back-end database connectivity from the front end using JavaScript and JDBC connections to the databases.
  • Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
  • Performed Source System Analysis, database design, data modeling for the warehouse layer using MLDM concepts and package layer using Dimensional modeling.
  • Handled importing data from various data sources, performed transformations using Hive, Map Reduce, and loaded data into HDFS.
  • Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics, and processed the data using HiveQL (SQL-like) on top of MapReduce.
  • Created tables, sequences, synonyms, joins, functions and operators in Netezza database.
  • Created and implemented MDM data model for Consumer/Provider for HealthCare MDM product from Variant.
  • Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most purchased product on website and managed and reviewed Hadoop log files.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Created SSIS Packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc., to import data into the data warehouse.
  • Performed administrative tasks, including creation of database objects such as database, tables, and views, using SQL DCL, DDL, and DML requests.
  • Coded new tables, views, and modifications, as well as PL/pgSQL stored procedures, data types, triggers, and constraints in PostgreSQL databases.
  • Built and published customized interactive reports and dashboards, report scheduling using Tableau server.
  • Used SQL Loader to load data from the Legacy systems into Oracle databases using control files extensively.
  • Used Oracle External Tables feature to read the data from flat files into Oracle staging tables.
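
A minimal sketch of the forecasting step referenced above, assuming a statsmodels-based workflow; the input series, ARIMA order, seasonal settings, and horizon are illustrative assumptions rather than the actual production model.

# Illustrative sketch of the ARIMA / exponential smoothing forecasting step;
# the sales series, (p, d, q) order, and 12-week horizon are assumptions.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

sales = pd.read_csv("weekly_store_sales.csv",        # placeholder input
                    index_col="week", parse_dates=True)["sales"]

arima_fit = ARIMA(sales, order=(1, 1, 1)).fit()      # example order
arima_forecast = arima_fit.forecast(steps=12)        # 12-period horizon

ets_fit = ExponentialSmoothing(sales, trend="add",
                               seasonal="add", seasonal_periods=52).fit()
ets_forecast = ets_fit.forecast(12)

print(arima_forecast.head())
print(ets_forecast.head())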
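A compact sketch of the customer segmentation step; k = 11 mirrors the segment count mentioned above, while the input file and household-level feature columns are hypothetical.

# Rough sketch of the K-means segmentation step; feature columns are placeholders
# and n_clusters=11 mirrors the segment count mentioned above.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

customers = pd.read_csv("household_metrics.csv")             # placeholder input
features = customers[["spend", "visits", "basket_size"]]      # assumed features

scaled = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=11, random_state=42, n_init=10).fit(scaled)

customers["segment"] = kmeans.labels_
print(customers.groupby("segment")[["spend", "visits"]].mean())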

Environment: Teradata, PostgreSQL, Hadoop, HDFS, Pig, Hive, Python, MapReduce, Time series analysis, ARIMA models, MDM, SQL Server, Netezza, DB2, DFAST, Tableau, Architecture, SAS/Graph, SAS/SQL, SAS/Connect and SAS/Access.

Confidential, Austin Texas

Data Scientist

Responsibilities:

  • Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
  • Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
  • Developed MapReduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS. Implemented a Python-based distributed random forest via Python streaming.
  • Used Pandas, NumPy, seaborn, SciPy, Matplotlib, Scikit-learn, NLTK in Python for developing various machine learning algorithms and utilized machine learning algorithms such as linear regression, multivariate regression, naive Bayes, Random Forests, K-means, & KNN for data analysis.
  • Installed and configured PostgreSQL databases and tuned postgresql.conf for performance improvement.
  • Forecasted using exponential smoothing, ARIMA models, transfer function models, and other statistical algorithms and analysis.
  • Conducted studies and rapid plotting, and used advanced data mining and statistical modeling techniques to build solutions that optimize the quality and performance of data.
  • Demonstrated experience in design and implementation of Statistical models, Predictive and descriptive models, enterprise data model, metadata solution and data life cycle management in both RDBMS and Big Data environments.
  • Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
  • Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Worked on database design, relational integrity constraints, OLAP, OLTP, Cubes and Normalization (3NF) and De-normalization of database.
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc. (see the Spark sketch after this list).
  • Developed Linux shell scripts using NZSQL/NZLOAD utilities to load data from flat files into the Netezza database.
  • Coded new tables, views, and modifications, as well as PL/pgSQL stored procedures, data types, triggers, and constraints.
  • Worked on customer segmentation using an unsupervised learning technique - clustering.
  • Worked with various Teradata 15 tools and utilities like Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
  • Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
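
A hedged sketch of a Spark MLlib classification pipeline of the kind referenced above; the input path, feature columns, label, and tree count are assumptions, not the production job.

# Sketch of a Spark MLlib random forest pipeline; the S3 path, feature columns,
# and label name are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("rf-sketch").getOrCreate()
df = spark.read.parquet("s3://bucket/features/")              # placeholder path

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                            numTrees=100)
model = rf.fit(train)

auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
print("AUC:", auc)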

Environment: Hadoop, MapReduce, Hive, Pig, Python, Scala, NZSQL, Teradata, PostgreSQL, Tableau, EC2, Netezza, Architecture, SAS/Graph, SAS/SQL, SAS/Access, Time-series analysis, ARIMA

Confidential, San Francisco, California

Data Science Consultant

Responsibilities:

  • Built an in-depth understanding of the problem domain and available data assets.
  • Researched, designed, implemented, and evaluated machine learning approaches and models.
  • Performed ad-hoc exploratory statistics and data mining tasks on diverse datasets, from small scale to "big data."
  • Retrieved, maintained, and standardized both internal and external data for DFAST testing, a task that is usually difficult and time consuming.
  • Participated in data architecture and engineering decision-making to support analytics.
  • Took the initiative in evaluating and adapting new approaches from data science research.
  • Investigated data visualization and summarization techniques for conveying key findings.
  • Communicated findings and obstacles to stakeholders to help drive delivery to market.
  • Developed bottom-up stress test models in R for the bank’s residential real estate loan portfolio.
  • Developed and automated the data manipulation process for the above using stored procedures and views in SQL Server.
  • Developed code per the client's requirements using SQL, PL/SQL, and Data Warehousing concepts.
  • Developed, updated, and maintained the PostgreSQL database architecture.
  • Automated the scraping and cleaning of data from various data sources in R.
  • Developed the bank’s loss forecasting process using relevant forecasting and regression algorithms in R.
  • Delivered an interactive dashboard in Tableau to visualize 8 billion rows (1.2 TB) of credit data.
  • Designed a scalable data cube structure for a 10x improvement in refresh rate.
  • Built credit risk scorecards and marketing response models using SQL and SAS (a Python analogue of the scorecard logic is sketched after this list). Presented results and recommendations to executives and managers from two large banks. Researched performance inference techniques (which reduce sample bias) using statistical and machine learning packages in R.
  • Designed and developed user interfaces and customization of Reports using Tableau and designed cubes for data visualization, mobile/web presentation with parameterization and cascading.
  • Integrated various relational and non-relational sources such as DB2, Teradata, Oracle, SFDC, Netezza, SQL Server, COBOL, XML and Flat Files.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Used extensively Base SAS, SAS/Macro, SAS/SQL, and Excel to develop codes and generated various analytical reports.
  • Created SSIS Packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc., to import data into the data warehouse.
  • Performed administrative tasks, including creation of database objects such as database, tables, and views, using SQL DCL, DDL, and DML requests.
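
A rough Python analogue of the credit risk scorecard logic mentioned above (the original work used SQL and SAS); the input data, feature columns, and scorecard scaling constants (600 points at 50:1 odds, 20 points to double the odds) are illustrative assumptions.

# Python analogue of scorecard scaling, not the original SQL/SAS implementation;
# data, features, and scaling constants are placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

loans = pd.read_csv("loan_history.csv")                        # placeholder input
X = loans[["utilization", "delinquencies", "tenure_months"]]    # assumed features
y = loans["default_flag"]

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Convert default probability into a points-based score:
# score = offset + factor * ln(good:bad odds), with factor = PDO / ln(2).
pdo, base_score, base_odds = 20.0, 600.0, 50.0
factor = pdo / np.log(2)
offset = base_score - factor * np.log(base_odds)

p_default = clf.predict_proba(X)[:, 1]
odds = (1 - p_default) / p_default                             # good:bad odds
loans["score"] = offset + factor * np.log(odds)
print(loans["score"].describe())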

Environment: Oracle, PostgreSQL, SAS, SQL, PL/SQL, T-SQL, Tableau, TOAD for data analysis, MS Excel, Netezza, DFAST, CCAR.

Confidential

Data Analyst

Responsibilities:

  • Used Star Schema methodologies in building and designing the logical data model into Dimensional Models extensively.
  • Developed Star and Snowflake schema-based dimensional models for the data warehouse.
  • Designed Context Flow Diagrams, Structure Charts, and ER diagrams.
  • Worked on database features and objects such as partitioning, change data capture, indexes, views, and indexed views to develop an optimal physical data model.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Worked with SQL Server Integration Services in extracting data from several source systems and transforming the data and loading it into ODS.
  • Worked with SME's and other stakeholders to determine the requirements to identify Entities and Attributes to build Conceptual, Logical and Physical data Models.
  • Worked with SQL, SQL PLUS, Oracle PL/SQL Stored Procedures, Triggers, SQL queries and loading data into Data Warehouse/Data Marts.
  • Worked with DBA group to create Best-Fit Physical Data Model from the Logical Data Model using Forward engineering using Erwin.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy SQL Server database systems.
  • Reviewed business requirements and analyzed data sources from Excel and Oracle SQL Server for design, development, testing, and production rollover of reporting and analysis projects.
  • Created Logical and Physical data models with Star and Snowflake schema techniques using Erwin in Data warehouse as well as in Data Mart.
  • Performed data analysis and data profiling using complex SQL on various sources systems including Oracle 8i/9i.
  • Involved in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatches.
  • Designed data model, analyzed data for online transactional processing (OLTP) and Online Analytical Processing (OLAP) systems.
  • Wrote and executed customized SQL code for ad-hoc reporting duties and used other tools for routine tasks (see the sketch after this list).
  • Developed stored procedures and complex packages extensively using PL/SQL and shell programs.
  • Customized reports using SAS/MACRO facility, PROC REPORT, PROC TABULATE and PROC.
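
A hedged sketch of running an ad-hoc reporting query against a legacy SQL Server system from Python; the connection string, table names, and join are illustrative placeholders.

# Sketch of an ad-hoc reporting query driven from Python; server, database,
# and star-schema table names are hypothetical.
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=legacy-sql01;DATABASE=sales_dw;Trusted_Connection=yes;")

query = """
    SELECT d.region, SUM(f.sales_amount) AS total_sales
    FROM fact_sales f
    JOIN dim_store d ON f.store_key = d.store_key
    WHERE f.sale_date >= ?
    GROUP BY d.region
"""
report = pd.read_sql(query, conn, params=["2015-01-01"])
report.to_excel("regional_sales_report.xlsx", index=False)    # ad-hoc output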

Environment: Erwin, SQL Server 2005, PL/SQL, SQL, T-SQL, ETL, OLAP, OLTP, SAS, Oracle 9i, DQ Analyzer, XML, and Clear Quest.

Confidential

Data Analyst

Responsibilities:

  • Mentored and groomed project team members in the core technology areas and in the usage of SDLC tools and Agile.
  • Participated in data architecture and engineering decision-making to support analytics.
  • Researched and solved Big Data analytic problems using quantitative approaches, with a proven passion for generating insights from data.
  • Automated the data extraction and manipulation processes in SQL server.
  • Developed forecasting algorithms and relevant prototypes in R.
  • Developed and automated an end-to-end time series forecasting process for thousands of SKUs in SAS.
  • Clustered the supply chain customers of a major fast-moving consumer goods company based on volume, demand volatility, and proximity to warehouses, and identified strategies to better optimize service levels to stores.
  • Worked with SQL, SQL PLUS, Oracle PL/SQL Stored Procedures, Triggers, SQL queries and loading data into Data Warehouse/Data Marts.
  • Segmented customers and products based on seasonality attributes. Recommended actions towards better inventory management based on product segments.
  • Built credit risk scorecards and marketing response models using SQL and SAS.
  • Reviewed business requirements and analyzed data sources from Excel and Oracle SQL Server for design, development, testing, and production rollover of reporting and analysis projects.
  • Developed a recommender system to identify products that could potentially be sold to new customers (see the sketch after this list).
  • Performed administrative tasks, including creation of database objects such as database, tables, and views, using SQL DCL, DDL, and DML requests.
  • Worked closely with product managers, developers, and QA teams to create a framework/process for instrumentation and tagging.
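
A rough sketch of an item-based collaborative filtering approach, one plausible way to build the recommender mentioned above; the purchase matrix, product IDs, and customer ID are hypothetical.

# Item-based collaborative filtering sketch; the purchase matrix is a placeholder
# with rows = customers, columns = products, values = purchase counts.
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

purchases = pd.read_csv("purchase_matrix.csv", index_col="customer_id")

item_sim = pd.DataFrame(cosine_similarity(purchases.T),
                        index=purchases.columns, columns=purchases.columns)

def recommend(customer_id, top_n=5):
    """Score unseen products by similarity to what the customer already bought."""
    bought = purchases.loc[customer_id]
    scores = item_sim.mul(bought, axis=0).sum(axis=0)
    scores = scores[bought == 0]             # only products not yet purchased
    return scores.nlargest(top_n)

print(recommend("C001"))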

Environment: SQL, PL/SQL, Oracle, SQL Server, R, SAS, MS Excel
