Data Scientist Resume
Redwood, CA
SUMMARY:
- 7+ years of experience in analytics, visualization, data modeling, and reporting.
- Independently led analytics, visualization, modeling, and reporting engagements; ran business-priority meetings with clients, managed SLAs, and delivered actionable insights to managers and C-level executives.
- Built classification and forecasting models, process automation, text mining, sentiment analysis, statistical models, risk analyses, platform integrations, optimization models, user-experience models, and A/B tests using R, SAS, Python, SPSS, SAS Enterprise Miner, EViews, Tableau, etc.
- Built data models by extracting and stitching together data from various sources, integrating systems with R to enable efficient data analysis.
- Experience using machine learning models such as random forest, KNN, SVM, and logistic regression, with packages such as ggplot2, dplyr, lm, e1071, rpart, randomForest, nnet, and tree in R; PROC PCA, DTREE, CORR, PRINCOMP, GPLOT, LOGISTIC, and CLUSTER in SAS; and NumPy, scikit-learn, and pandas in Python.
- Good knowledge and understanding of web technologies such as HTML, CSS, and JavaScript.
- Experience validating the interconnection between databases and user interfaces.
- Expertise in marketing and customer analytics, focused on market basket analysis, campaign measurement, private-brand strategy, sales forecasting, customer segmentation and lifetime-value analysis, SKU rationalization, and marketing-mix modeling.
- Developed complex database objects such as stored procedures, functions, packages, and triggers using SQL and PL/SQL.
- Proficient in Big Data, Hadoop, Hive, MapReduce, Pig and NoSQL databases like MongoDB, HBase, Cassandra.
- Experienced in writing and optimizing SQL queries in Oracle, SQL Server, DB2, PostgreSQL, Netezza, and Teradata.
- Strong experience in maintaining and upgrading PostgreSQL, Oracle, and big data databases.
- Experience installing, configuring, and maintaining databases such as PostgreSQL, Oracle, and big data HDFS systems.
- Experience in the analysis and execution of the company's stress tests as part of the Dodd-Frank Act Stress Test (DFAST) cycle.
- Hands-on experience with clustering algorithms such as K-means and K-medoids, and with predictive algorithms.
- Expertise in model development, data mining, predictive modeling, data visualization, data cleaning and management, and database management.
- Expertise in applying data mining and optimization techniques in B2B and B2C industries; proficient in machine learning, data/text mining, statistical analysis, and predictive modeling.
- Used DFAST modeling solutions for expected-loss calculations, surfacing the results in a dashboard for further insight.
- Experienced in designing star schema (identification of facts, measures and dimensions), Snowflake schema for Data Warehouse, ODS Architecture by using tools like Erwin Data Modeler, Power Designer, E-R Studio and Microsoft Visio.
- Maintain a continuous-learning approach to new tools, including Elasticsearch (Lucene index-based search) and Kibana.
- Expertise in Excel macros, pivot tables, VLOOKUPs, and other advanced functions; expert R user with knowledge of SAS.
- Excellent experience with Teradata SQL queries, Teradata indexes, and utilities such as MultiLoad, TPump, FastLoad, and FastExport.
- Experience in data mining, text mining, data analysis, data migration, data cleansing, transformation, integration, data import, and data export.
- Experienced in designing Architecture for Modeling a Data Warehouse by using tools like Erwin, Power Designer and E-R Studio.
- Very good knowledge of data analysis, data validation, data cleansing, data verification, and identifying data mismatches.
PROFESSIONAL EXPERIENCE:
Data Scientist
Confidential - Redwood,CA
Responsibilities:
- Utilized domain knowledge and application portfolio knowledge to play a key role in defining the future state of large, business technology programs.
- Provided the architectural leadership in shaping strategic, business technology projects, with an emphasis on application architecture.
- Participated in all phases of data mining, including data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
- Developed MapReduce/Spark Python modules for machine learning and predictive analytics in Hadoop on AWS, and implemented a Python-based distributed random forest via Python streaming.
- Used pandas, NumPy, seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop machine learning solutions, applying algorithms such as linear regression, multivariate regression, naive Bayes, random forests, K-means, and KNN for data analysis (a brief illustrative sketch follows this list).
- Installed and configured PostgreSQL databases and tuned postgresql.conf for performance.
- Produced forecasts using exponential smoothing, ARIMA models, transfer-function models, and other statistical algorithms and analyses.
- Conducted studies and rapid plotting, using advanced data mining and statistical modeling techniques to build solutions that optimize data quality and performance.
- Demonstrated experience in the design and implementation of statistical models, predictive models, enterprise data models, metadata solutions, and data life-cycle management in both RDBMS and big data environments.
- Created ecosystem models (e.g. conceptual, logical, physical, canonical) that are required for supporting services within the enterprise data architecture (conceptual data model for defining the major subject areas used, ecosystem logical model for defining standard business meaning for entities and fields, and an ecosystem canonical model for defining the standard messages and formats to be used in data integration services throughout the ecosystem).
- Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models leveraging best-in-class modeling techniques.
- Worked on database design, relational integrity constraints, OLAP, OLTP, cubes, and normalization (3NF) and denormalization of databases.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
- Developed Linux shell scripts using NZSQL/NZLOAD utilities to load data from flat files into a Netezza database.
- Coded new tables, views, and modifications, as well as PL/pgSQL stored procedures, data types, triggers, and constraints.
- Worked on customer segmentation using an unsupervised learning technique - clustering.
- Worked with various Teradata 15 tools and utilities, including Teradata Viewpoint, MultiLoad, ARC, Teradata Administrator, BTEQ, and other Teradata utilities.
- Designed and implemented system architecture for Amazon EC2 based cloud-hosted solution for client.
- Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
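
Illustrative sketch of the scikit-learn modeling workflow referenced above. This is a minimal, hypothetical example: the file name, columns, and target are stand-ins, not the engagement's actual data.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a feature table (hypothetical file; assumes numeric feature columns).
df = pd.read_csv("customers.csv")
X = df.drop(columns=["churned"])
y = df["churned"]

# Hold out a test set for honest evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a random forest classifier and report test accuracy.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))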
Environment: Big Data (Hadoop), MapReduce, Hive, Pig, Python, Scala, NZSQL, Teradata, PostgreSQL, Tableau, EC2, Netezza, Architecture, SAS/GRAPH, SAS/SQL, SAS/ACCESS, time-series analysis, ARIMA
Data Scientist
Confidential - Washington, DC
Responsibilities:
- Documented logical, physical, relational and dimensional data models. Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
- Enabled speedy reviews and first-mover advantage by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Transformed the logical data model into an Erwin physical data model, ensuring primary key and foreign key relationships in the PDM, consistency of data attribute definitions, and primary index considerations.
- Predicted store sales at store and SKU level using a linear regression model, with error within 1% in 95% of stores, applying statistical analytical tools and algorithms, and helped the retailer integrate the results into its sales and operations tools.
- Built Tableau dashboards that tracked pre- and post-launch changes in customer behavior; the ROI measurements helped the retailer strategically extend the campaigns to other potential markets.
- Built forecasting models by applying ARIMA and exponential smoothing to multivariate time-series data and performed statistical analysis on big data (an illustrative statsmodels sketch follows this list).
- Developed 11 customer segments using K-means and Gaussian mixture techniques; the clusters helped the retailer understand lifetime values and design strategies to boost per-household value (see the clustering sketch after this list).
- Developed a machine learning system that predicted purchase probability for a particular offer based on a customer's real-time location data and past purchase behavior; these predictions drive mobile coupon pushes.
- Used pandas, NumPy, seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python to develop machine learning solutions, applying algorithms such as linear regression, multivariate regression, naive Bayes, random forests, K-means, and KNN for data analysis.
- Checked back-end database connectivity using JavaScript and JDBC connections to the databases.
- Worked with data compliance and data governance teams to maintain data models, metadata, and data dictionaries, and to define source fields and their definitions.
- Performed Source System Analysis, database design, data modeling for the warehouse layer using MLDM concepts and package layer using Dimensional modeling.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics, processing the data with HQL (a SQL-like language) on top of MapReduce.
- Created tables, sequences, synonyms, joins, functions and operators in Netezza database.
- Created and implemented MDM data model for Consumer/Provider for HealthCare MDM product from Variant.
- Analyzed the web log data using the HiveQL to extract number of unique visitors per day, page views, visit duration, most purchased product on website and managed and reviewed Hadoop log files.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Created SSIS packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc., to import data into the data warehouse.
- Performed administrative tasks, including creation of database objects such as database, tables, and views, using SQL DCL, DDL, and DML requests.
- Coded new tables, views, and modifications, as well as PL/pgSQL stored procedures, data types, triggers, and constraints in PostgreSQL databases.
- Built and published customized interactive reports and dashboards with report scheduling on Tableau Server.
- Used SQL*Loader to load data from legacy systems into Oracle databases, making extensive use of control files.
- Used Oracle External Tables feature to read the data from flat files into Oracle staging tables.
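
Illustrative statsmodels sketch of the ARIMA and exponential smoothing forecasting referenced above; the synthetic monthly series is a hypothetical stand-in for the retailer's data.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly sales series with trend, seasonality, and noise.
idx = pd.date_range("2015-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
trend = 100 + 2 * np.arange(48)
season = 10 * np.sin(2 * np.pi * np.arange(48) / 12)
sales = pd.Series(trend + season + rng.normal(0, 3, 48), index=idx)

# ARIMA(1,1,1) forecast, 12 months ahead.
arima_fc = ARIMA(sales, order=(1, 1, 1)).fit().forecast(steps=12)

# Holt-Winters exponential smoothing with additive trend and seasonality.
hw_fc = ExponentialSmoothing(sales, trend="add", seasonal="add", seasonal_periods=12).fit().forecast(12)

print(arima_fc.head())
print(hw_fc.head())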
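
Illustrative sketch of the K-means and Gaussian mixture segmentation referenced above; the features are randomly generated stand-ins for real RFM-style inputs.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Hypothetical customer features: recency, frequency, monetary value.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
X_scaled = StandardScaler().fit_transform(X)

# K-means with 11 clusters, matching the segment count above.
km_labels = KMeans(n_clusters=11, n_init=10, random_state=42).fit_predict(X_scaled)

# Gaussian mixture as a soft-clustering alternative.
gm_labels = GaussianMixture(n_components=11, random_state=42).fit_predict(X_scaled)
print(np.bincount(km_labels), np.bincount(gm_labels))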
Environment: Teradata, PostgreSQL, Big Data (Hadoop), HDFS, Pig, Hive, Python, MapReduce, time-series analysis, ARIMA models, MDM, SQL Server, Netezza, DB2, DFAST, Tableau, Architecture, SAS/GRAPH, SAS/SQL, SAS/CONNECT, and SAS/ACCESS.
Data Science Consultant
Confidential - Jersey City, NJ
Responsibilities:
- Built an in-depth understanding of the problem domain and available data assets.
- Researched, designed, implemented, and evaluated machine learning approaches and models.
- Performed ad-hoc exploratory statistics and data mining tasks on diverse datasets, from small scale to "big data".
- Retrieved, maintained, and standardized both internal and external data for DFAST testing, a task that is usually difficult and time-consuming.
- Participated in data architecture and engineering decision-making to support analytics.
- Took the initiative in evaluating and adapting new approaches from data science research.
- Investigated data visualization and summarization techniques for conveying key findings.
- Communicated findings and obstacles to stakeholders to help drive delivery to market.
- Developed bottom-up stress-test models in R for the bank's residential real estate loan portfolio.
- Developed and automated the data manipulation process for the above using stored procedures and views in SQL Server.
- Developed code to the client's requirements using SQL, PL/SQL, and data warehousing concepts.
- Developed, updated, and maintained the PostgreSQL database architecture.
- Automated the scraping and cleaning of data from various data sources in R.
- Developed the bank's loss-forecasting process using relevant forecasting and regression algorithms in R.
- Delivered an interactive Tableau dashboard visualizing 8 billion rows (1.2 TB) of credit data.
- Designed a scalable data cube structure for a 10x improvement in refresh rate
- Built credit risk scorecards and marketing response models using SQL and SAS; presented results and recommendations to executives and managers from two large banks. Researched performance-inference techniques (which reduce sample bias) using statistical and machine learning packages in R (an illustrative scorecard sketch follows this list).
- Designed and developed user interfaces and customized reports using Tableau; designed cubes for data visualization and mobile/web presentation with parameterization and cascading.
- Integrated various relational and non-relational sources such as DB2, Teradata, Oracle, SFDC, Netezza, SQL Server, COBOL, XML and Flat Files.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Made extensive use of Base SAS, SAS/MACRO, SAS/SQL, and Excel to develop code and generate various analytical reports.
- Created SSIS packages using Pivot Transformation, Execute SQL Task, Data Flow Task, etc., to import data into the data warehouse.
- Performed administrative tasks, including creation of database objects such as database, tables, and views, using SQL DCL, DDL, and DML requests.
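
The scorecard work above was done in SQL and SAS; the sketch below is a minimal Python analogue of a logistic-regression scorecard, with entirely synthetic applicant data and a hypothetical points-to-double-odds scaling.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic applicant features and default flag (hypothetical).
rng = np.random.default_rng(2)
X = pd.DataFrame({
    "utilization": rng.uniform(0, 1, 5000),
    "delinquencies": rng.poisson(0.5, 5000),
    "tenure_months": rng.integers(1, 240, 5000),
})
y = (rng.uniform(size=5000) < 0.1 + 0.3 * X["utilization"]).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Convert predicted default probability to a score (higher = lower risk);
# 50 points here roughly doubles the odds, an illustrative scaling choice.
prob = model.predict_proba(X_test)[:, 1]
score = 600 - 50 * np.log2(prob / (1 - prob))
print("AUC:", roc_auc_score(y_test, prob))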
Environment: Oracle, PostgreSQL, SAS, SQL, PL/SQL, T-SQL, Tableau, TOAD for data analysis, MS Excel, Netezza, DFAST, CCAR.
Data Analyst
Confidential
Responsibilities:
- Gathered business requirements; interacted with users, designers, developers, the project manager, and SMEs to better understand the business processes, then analyzed and optimized those processes.
- Created reports covering sales, traits, and analytics data using Tableau.
- Tuned much of the code, rewriting as needed to utilize newer features such as bulk collects/DML and function-based indexes, and converting dynamic to static SQL when possible.
- Responsible for ensuring data integrity using SQL and for coordinating efforts with testing and implementation.
- Responsible for the development and execution of test plans.
- Tracked an extensive database of customers across the country, analyzed it using SQL queries and Python, and visualized the data in reports built with Tableau, IBM Cognos, and MySQL (a brief pandas sketch follows this list).
- Gathered business data requirements for the new data warehouse as per the compliance standards.
- Worked with marketing team on Customer data analysis for different geographic regions using Siebel CRM.
- Expertise in the concepts of Data Warehousing, Data Marts, Dimensional Modeling, Fact and Dimensional Tables.
- Worked with Data Architects, DBA and Development team and assisted in building data marts.
- Performed Gap Analysis, statistical analysis and facilitated data migration from legacy systems.
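
Illustrative sketch of the SQL-plus-Python analysis pattern referenced above, using an in-memory SQLite table as a hypothetical stand-in for the production customer database.

import sqlite3
import pandas as pd

# Build a tiny in-memory customer table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (region TEXT, sales REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [("East", 120.0), ("East", 80.0), ("West", 200.0)])

# Pull the data with SQL and summarize per region in pandas.
df = pd.read_sql_query(
    "SELECT region, SUM(sales) AS total_sales FROM customers GROUP BY region", conn)
print(df)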
Environment: HTML/CSS, JS, Bootstrap, Excel/Tableau 8.1, Oracle SQL developer 4.1.5, MS Office Suite
TECHNICAL SKILLS:
Programming languages: Oracle PL/SQL, Python, SQL, T-SQL, Java
Scripting languages: Python (NumPy, SciPy, pandas, Keras), R (ggplot2, caret, Weka), JavaScript
Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig
Reporting Tools: SSRS, SSIS, Cognos 7.0/6.0, Tableau
Tools: MS Office Suite, Scala, NLP, MariaDB, SAS, Spark MLlib, Kibana, Elasticsearch packages
Databases: Oracle, PostgreSQL, Teradata, Netezza, MS SQL Server, MongoDB, HBase, Cassandra
Operating Systems: Windows, Linux
BI Tools: Tableau, OBIEE, QlikView, Amazon Redshift, Azure Data Warehouse