Data Scientist Resume
Phoenix, AZ
SUMMARY:
- 8+ years of experience in machine learning and data mining with large datasets of structured and unstructured data, data acquisition, data validation, predictive modeling, and data visualization.
- Experience with statistics, data analysis, and machine learning using the R language.
- Strong in ETL, data warehousing, data store concepts, and OLAP technologies.
- Experienced in creating cutting-edge data processing algorithms to meet project demands.
- Worked with packages like ggplot2 in R to understand data and develop applications.
- Worked with Spotfire to create dashboards and visualizations.
- Highly competent at researching, visualizing, and analyzing raw data to identify recommendations for meeting organizational challenges.
- Proven excellence in personnel management and program development.
- Unparalleled capacity to link quantitative and qualitative statistics to improvements in operating standards.
- Strong experience working with SQL Server 2008, RStudio, Oracle, and Sybase.
- Ability to perform Data preparation and exploration to build the appropriate machine learning model.
- Worked on statistical models to create new theories and products.
- Experience with statistical and regression analysis and multi-objective optimization.
- Designed and implemented supervised and unsupervised machine learning models.
- Identified problems and provided solutions to business problems using data processing, data visualization, and graphical data analysis.
- Good understanding of Java/J2EE design patterns like Singleton, Factory, Front Controller, and Flyweight.
- Worked in the Amazon Web Services cloud computing environment.
- Experienced in SQL programming and creation of relational database models.
- Proficient in statistical modeling and machine learning techniques for forecasting/predictive analytics, segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.
- Used the Spring Framework to design the application architecture per requirements.
- Performed query optimization, execution-plan analysis, and performance tuning of SQL queries for better performance.
- Implemented and practiced machine learning techniques on structured and unstructured data with equal proficiency.
- Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables and OLAP reporting.
- Ability to use dimensionality reduction and regularization techniques.
- Expert in data flow between primary DB and various reporting tools. Expert in finding Trends and Patterns within Datasets and providing recommendations accordingly.
- Proficient in requirement gathering, writing, analysis, estimation, use case review, scenario preparation, test planning and strategy decision making, test execution, test results analysis, team management and test result reporting.
- Developed predictive models using Decision Tree, Random Forest and Naïve Bayes.
- Solid knowledge of mathematics and experience in applying it to technical and research fields.
- Identified areas where optimization could be effective.
- Expertise in machine learning models such as linear and logistic regression, decision trees, random forest, SVM, K-nearest neighbors, clustering (K-means, hierarchical), and Bayesian methods.
- Worked with clients to identify analytical needs and documented them for further use.
- Developed predictive models using R to predict customer churn and classify customers (a minimal sketch follows this summary).
- Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
- Built a Shiny application in R showcasing machine learning to improve business forecasting.
- Worked on the design and configuration of databases and back-end applications and programs.
- Developed user interfaces using HTML/CSS, JavaScript, and jQuery.
- Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
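A minimal sketch of the kind of churn classifier described above, fit with base R's glm; the customers data frame, its columns, and the simulated values are hypothetical placeholders, not project data.

    # Minimal churn-classification sketch in R; glm() comes with base R (stats).
    # The customers data frame and its columns are hypothetical placeholders.
    set.seed(42)
    customers <- data.frame(
      tenure  = runif(500, 1, 72),    # months as a customer
      charges = runif(500, 20, 120),  # monthly charges
      churn   = rbinom(500, 1, 0.3)   # 1 = churned, 0 = retained
    )
    # Fit a logistic regression and inspect the coefficients.
    model <- glm(churn ~ tenure + charges, data = customers, family = binomial)
    summary(model)
    # Score customers as churn probabilities.
    customers$churn_prob <- predict(model, type = "response")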
TECHNICAL SKILLS:
Languages: R, C#, VC++, Java, PL/SQL, Python
Machine Learning Models: Basic statistics, supervised and unsupervised learning.
Scripting: Unix Shell Scripting
Development Tools: RStudio, Visual Studio .NET 2015, Eclipse, Quest SQL Navigator, Spotfire, QlikView
Data Science/Big Data: Statistical Analysis, Machine Learning, Data Mining, Hadoop, HDFS, HBase 1.2, NoSQL
Operating Systems: Windows 10.0, UNIX with Sun Solaris 8.0, HP-Unix
Databases: MS SQL Server 2005, Oracle 11g, Sybase
Web Technologies: Silverlight, AJAX, ASP.NET, JavaScript, IIS 7.0, AWS (Amazon Web Services)
Others: .NET 4.5, WPF, WCF, XAML, LINQ, MS Team Foundation Server (TFS), SSRS, Infragistics/Telerik Toolkit
Programming Expertise: R 3.2.2, SAS 9.2, Python 3.5 & 2.7, Azure ML, Spark, Oracle SQL, big data ecosystems (Hive, Pig, Sqoop, Flume, Oozie), JavaScript.
Visualization Tools: MS Office (Excel, PowerPoint), Tableau, R (ggplot2, Shiny), D3.js.
Skills and Traits: SQL, Python, Excel, C/C++, SOLR, Linux, Microsoft Word, Access, PowerPoint, mathematical programming software (Macaulay2, MATLAB, Mathematica)
Techniques: Regression, GLM, Trees (decision trees, oblique decision trees, CHAID), Random Forest, Clustering (K-means, hierarchical, SOM), Association Rules, K-Nearest Neighbors, Neural Nets, XGBoost, SVM, Bayesian, Linear Programming, Quadratic Programming, Genetic Algorithms, Ant Colony Optimization, Collaborative Filtering
PROFESSIONAL EXPERIENCE:
Confidential - Phoenix, AZ
Data Scientist
Responsibilities:
- Developed, Implemented & Maintained the Conceptual, Logical & Physical Data Models using Erwin for Forward/Reverse Engineered Databases.
- Designed algorithms to identify and extract incident alerts from a daily pool of incidents.
- Reduced redundancy among incoming incidents by proposing rules to recognize patterns.
- Scheduled searches using Splunk.
- Worked with machine learning algorithms like regression (linear, logistic, etc.), SVMs, and decision trees.
- Architected big data solutions for projects and proposals using Hadoop, Spark, the ELK Stack, Kafka, and TensorFlow.
- Worked on Clustering and classification of data using machine learning algorithms.
- Applied statistics skills such as statistical sampling, hypothesis testing, and regression.
- Built analytic models using a variety of techniques such as logistic regression, risk scorecards, and pattern recognition technologies.
- Worked with technical and development teams to deploy models; built model performance reports and modeling technical documentation to support each of the models for the product line.
- Performed exploratory data analysis and data visualizations using R and Tableau.
- Analyzed data from Primary and secondary sources using statistical techniques to provide daily reports.
- Created custom dashboards and reports using Splunk.
- Performed estimation and requirement analysis of project timelines.
- Analyzed data and recommended new strategies for root-cause analysis and the quickest ways to work through large datasets.
- Used packages like dplyr, tidyr, and ggplot2 in RStudio for data visualization (a minimal sketch follows this list).
- Extended the company's data with third-party sources of information when needed.
- Enhanced data collection procedures to include information relevant for building analytic systems.
- Experienced in delivery, portfolio, team/career, vendor, and program management; competent in solution architecture, implementation, and delivery of big data, data science, analytics, and DWH projects on Greenplum, Spark, Keras, Python, and TensorFlow.
- Processed, cleansed, and verified the integrity of data used for analysis.
- Performed ad-hoc analysis and presented results in a clear manner.
- Constantly tracked model performance.
- Excellent understanding of machine learning techniques and algorithms such as logistic regression, SVM, random forests, and deep learning.
- Worked with data governance, data quality, data lineage, and data architects to design various models.
- Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Experience with common data science toolkits such as R, Python, and Spark.
- Developed and designed SQL procedures and Linux shell scripts for data export/import and for converting data.
- Used Test Driven Development (TDD) for the project.
- Wrote SQL queries, stored procedures, triggers, and functions for MySQL databases.
- Coordinated with data scientists and senior technical staff to identify client needs and document assumptions.
- Performed thorough EDA, including univariate and bivariate analysis, to understand the intrinsic and combined effects of variables.
- Established Data architecture strategy, best practices, standards, and roadmaps.
- Led the development and presentation of a data analytics data-hub prototype with the help of other members of the emerging solutions team.
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, spacetime.
- Interacted with the other departments to understand and identify data needs and requirements.
- Involved in analysis of business requirements, design and development of high-level and low-level designs, and unit and integration testing.
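A minimal sketch of the dplyr/tidyr/ggplot2 exploratory analysis referenced above, run on a hypothetical incidents data frame; the column names and simulated values are illustrative only.

    library(dplyr)
    library(ggplot2)
    # incidents is a hypothetical stand-in for the daily incident records.
    set.seed(1)
    incidents <- data.frame(
      category = sample(c("network", "database", "application"), 200, replace = TRUE),
      duration = rexp(200, rate = 0.1)  # resolution time in hours
    )
    # Summarise incident volume and mean resolution time per category.
    incidents %>%
      group_by(category) %>%
      summarise(count = n(), avg_duration = mean(duration))
    # Visualise the resolution-time distribution by category.
    ggplot(incidents, aes(x = category, y = duration)) +
      geom_boxplot() +
      labs(title = "Incident resolution time by category", y = "Hours")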
Environment: R, RStudio, Splunk, SQL, MySQL, Windows, UNIX, Python 3.5, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, Teradata, TensorFlow, OLTP, random forest, OLAP, HDFS, ODS
Confidential - Orlando, FL
Data Scientist
Responsibilities:
- Implemented a metadata repository; maintained data quality, data cleanup procedures, transformations, data standards, and a data governance program; and wrote scripts, stored procedures, and triggers and executed test plans.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Applied linear regression to the data to predict sales, modeled how sales perform without much advertisement, and took measures from the insights on how to advertise in a better manner (a minimal sketch follows this list).
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Understood the project objectives and requirements from a domain perspective, along with the problem definition.
- Extracted the desired data from multiple sources, then integrated, preprocessed, and cleaned it, filling in missing data.
- Evaluated models for bias and variance problems, and tuned model parameters using K-fold cross-validation.
- Updated and maintained existing universes based on changes in user requirements and data sources.
- Visually analyzed chasm traps and fan traps and resolved loops by creating aliases and contexts.
- Designed, developed, tested, and deployed reports and dashboards that feed into mobile applications for ABMs, BRMs, and marketing users.
- Managed JAD sessions with business users in the design, development, and maintenance of analysis needs.
- Provided input to the project management role on development/test activities and sequencing.
- Utilized Business Objects reporting/dashboard tools to import KPIs from data warehouse and ERP systems and present business insights to business leaders and stakeholders.
- Troubleshot universe schemas with loops, chasm and fan traps, and cardinality problems.
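A minimal sketch of the sales-versus-advertising regression and K-fold evaluation referenced above; the sales_data frame is simulated, and the caret package is an assumed convenience for cross-validation.

    library(caret)  # assumed available; provides train() and trainControl()
    set.seed(7)
    # Simulated stand-in for the project's sales data.
    sales_data <- data.frame(ad_spend = runif(100, 0, 100))
    sales_data$sales <- 5 + 1.4 * sales_data$ad_spend + rnorm(100, sd = 8)
    # Plain linear regression of sales on advertising spend.
    fit <- lm(sales ~ ad_spend, data = sales_data)
    summary(fit)  # the slope estimates sales lift per unit of ad spend
    # 5-fold cross-validation to check bias/variance, as described above.
    cv_fit <- train(sales ~ ad_spend, data = sales_data, method = "lm",
                    trControl = trainControl(method = "cv", number = 5))
    cv_fit$results  # RMSE and R-squared averaged across folds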
Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL
Confidential - Dallas, TX
Data Scientist
Responsibilities:
- Designed algorithms to identify and extract incident alerts from a daily pool of incidents.
- Reduced redundancy among incoming incidents by proposing rules to recognize patterns.
- Scheduled searches using Splunk.
- Worked with machine learning algorithms like regression (linear, logistic, etc.), SVMs, and decision trees (an SVM sketch follows this list).
- Worked on Clustering and classification of data using machine learning algorithms.
- Created custom dashboards and reports using Splunk.
- Performed estimation and requirement analysis of project timelines.
- Analyzed data and recommended new strategies for root-cause analysis and the quickest ways to work through large datasets.
- Used packages like dplyr, tidyr, and ggplot2 in RStudio for data visualization.
- Analyzed data from Primary and secondary sources using statistical techniques to provide daily reports.
- Developed and designed SQL procedures and Linux shell scripts for data export/import and for converting data.
- Used Test-Driven Development (TDD) for the project.
- Wrote SQL queries, stored procedures, triggers, and functions for MySQL databases.
- Coordinated with data scientists and senior technical staff to identify client needs and document assumptions.
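A minimal sketch of SVM classification of the kind referenced above, using the e1071 package on R's built-in iris data; the dataset and split are illustrative, not project data.

    library(e1071)  # provides svm(); assumed installed
    set.seed(1)
    train_idx <- sample(nrow(iris), 100)
    # Train a radial-kernel SVM on a random 100-row training split.
    svm_fit <- svm(Species ~ ., data = iris[train_idx, ], kernel = "radial")
    # Evaluate on the held-out rows with a confusion table.
    preds <- predict(svm_fit, iris[-train_idx, ])
    table(predicted = preds, actual = iris$Species[-train_idx])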
Environment: R, RStudio, Splunk, SQL, MySQL, and Windows.
Confidential -San Francisco, CA
R Programmer / Data Scientist
Responsibilities:
- Conducted research on development and designing of sample methodologies, and analyzed data for pricing of client’s products.
- Used correlation analysis to identify relationships between variables, patterns, outliers, and causal factors.
- Identified statistically significant variables.
- Investigated market sizing, competitive analysis and positioning for product feasibility.
- Worked on Business forecasting, segmentation analysis and Data mining.
- Used support vector machines to classify data into groups.
- Generated graphs and reports using the ggplot2 package in RStudio for analytical models.
- Developed and implemented an R/Shiny application showcasing machine learning for business forecasting (a minimal sketch follows this list).
- Developed predictive models using Decision Tree, Random Forest and Naïve Bayes.
- Performed time series analysis using Spotfire.
- Collaborated with DevOps teams for production deployment.
- Worked in the Amazon Web Services cloud computing environment.
- Worked with the Caffe deep learning framework.
- Developed various workbooks in Spotfire from multiple data sources.
- Created dashboards and visualizations using Spotfire desktop.
- Worked on R packages to interface with the Caffe deep learning framework.
- Performed analysis using JMP.
- Performed validation of machine learning output from R.
- Wrote connectors to extract data from databases.
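A minimal sketch of an R/Shiny forecasting app of the kind referenced above: it fits an ARIMA model to R's built-in AirPassengers series and plots a user-selected horizon. The forecast package is an assumed dependency; the app is illustrative, not the original application.

    library(shiny)
    library(forecast)  # provides auto.arima() and forecast(); assumed installed

    ui <- fluidPage(
      titlePanel("Business forecast (illustrative)"),
      sliderInput("h", "Months ahead:", min = 1, max = 24, value = 12),
      plotOutput("fc_plot")
    )

    server <- function(input, output) {
      output$fc_plot <- renderPlot({
        fit <- auto.arima(AirPassengers)   # fit an ARIMA model to the series
        plot(forecast(fit, h = input$h))   # plot the requested forecast horizon
      })
    }

    shinyApp(ui, server)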
Environment: R, RStudio, Excel 2013, Amazon Web Services, Machine Learning, Spotfire, JMP, segmentation analysis.
Confidential
Data Analyst
Responsibilities:
- Analyzed business information requirements and modeled class diagrams and/or conceptual domain models.
- Gathered and reviewed customer information requirements for OLAP and for building the data mart.
- Performed document analysis involving the creation of use cases and use case narrations using Microsoft Visio, in order to present the gathered requirements effectively.
- Calculated and analyzed claims data for provider incentive and supplemental benefit analysis using Microsoft Access and Oracle SQL.
- Analyzed business process workflows and assisted in the development of ETL procedures for mapping data from source to target systems.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Responsible for defining the key identifiers for each mapping/interface.
- Responsible for defining the functional requirement documents for each source to target interface.
- Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
- Updated the Enterprise Metadata Library with any changes or updates.
- Documented data quality and traceability documents for each source interface.
- Established standard operating procedures.
- Generated weekly and monthly asset inventory reports.
- Managed the project requirements, documents, and use cases using IBM Rational RequisitePro.
- Assisted in building an integrated logical data design and proposed a physical database design for building the data mart.
- Documented all data mapping and transformation processes in the functional design documents based on the business requirements.
Environment: SQL Server 2008R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.
Confidential
Java Developer
Responsibilities:
- Participated in requirements and design discussions.
- Responsible and active across the full Software Development Lifecycle (SDLC) of the project: analysis, design, implementation, and deployment.
- Designed and developed user interface using JSP, HTML and JavaScript.
- Developed Struts action classes, action forms and performed action mapping using Struts framework and performed data validation in form beans and action classes.
- Extensively used Struts framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
- Defined search criteria to pull customer records from the database, make the required changes, and save the updated records back to the database.
- Validated the fields of user registration screen and login screen by writing JavaScript validations.
- Developed build and deployment scripts using Apache ANT to customize WAR and EAR files.
- Used DAO and JDBC for database access.
- Developed application using AngularJS and Node.JS connecting to Oracle on the backend.
- Designed and developed RESTful APIs using Spring REST.
- Consumed RESTful web services using the AngularJS $http service and rendered the JSON data on screen.
- Used the AngularJS framework to build web apps that integrate efficiently with RESTful services.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in postproduction support and maintenance of the application.
Environment: Oracle, Java, Struts, Servlets, HTML, XML, SQL, J2EE, AngularJS, JUnit, RESTful, SOA, Tomcat