
Data Scientist Resume

Phoenix, AZ

SUMMARY:

  • Over 8 years of experience in Machine Learning and data mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
  • Experience with Statistics, Data Analysis, and Machine Learning using the R language.
  • Strong background in ETL, data warehousing, data store concepts, and OLAP technologies.
  • Experienced in creating cutting edge data processing algorithms to meet project demands.
  • Worked with R packages such as ggplot2 to explore data and develop applications.
  • Worked on Spotfire to create dashboards and visualizations.
  • Highly competent at researching, visualizing, and analyzing raw data to identify recommendations for meeting organizational challenges.
  • Proven excellence in personnel management and program development.
  • Unparalleled capacity to link quantitative and qualitative statistics to improvements in operating standards.
  • Strong experience working with SQL Server 2008, RStudio, Oracle, and Sybase.
  • Ability to perform Data preparation and exploration to build the appropriate machine learning model.
  • Worked on Statistical models to create new theories and products.
  • Experience working with statistical and regression analysis, multi-objective optimization.
  • Designed and implemented supervised and unsupervised machine learning models.
  • Identify problems and provide solutions to business problems using data processing, data visualization and graphical data analysis.
  • Good understanding of JAVA / J2EE Design Patterns like Singleton, Factory, Front Controller, Flyweight.
  • Worked in the Amazon Web Services (AWS) cloud computing environment.
  • Experienced in SQL programming and creation of relational database models.
  • Proficient in Statistical Modeling and Machine Learning techniques in Forecasting/ Predictive Analytics, Segmentation methodologies, Regression based models, Hypothesis testing, Factor analysis/ PCA, Ensembles.
  • Used the Spring Framework to design the application architecture per requirements.
  • Performed query optimization, execution-plan analysis, and performance tuning of SQL queries.
  • Implemented and practiced machine learning techniques on structured and unstructured data with equal proficiency.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables and OLAP reporting.
  • Ability to use dimensionality reduction and regularization techniques.
  • Expert in data flow between primary DB and various reporting tools. Expert in finding Trends and Patterns within Datasets and providing recommendations accordingly.
  • Proficient in requirement gathering, writing, analysis, estimation, use case review, scenario preparation, test planning and strategy decision making, test execution, test results analysis, team management and test result reporting.
  • Developed predictive models using Decision Tree, Random Forest and Naïve Bayes.
  • Solid knowledge of mathematics and experience in applying it to technical and research fields.
  • Identifying areas where optimization can be efficient.
  • Expertise in machine learning models such as Linear and Logistic Regression, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Clustering (K-Means, Hierarchical), and Bayesian methods.
  • Worked with clients to identify analytical needs and documented them for further use.
  • Developed predictive models using R to predict customers churn and classification of customers.
  • Highly skilled in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.
  • Built an R Shiny application showcasing machine learning for improved business forecasting.
  • Designed and configured databases and back-end applications and programs.
  • Developed user interfaces using HTML/CSS, JavaScript, and jQuery.
  • Well experienced in Normalization and De-normalization techniques for optimum performance in relational and dimensional database environments.

TECHNICAL SKILLS:

Languages: R, C#, VC++, Java, PL/SQL, Python

Machine Learning Models: Basic Statistics, Supervised and Unsupervised Learning

Scripting: Unix Shell Scripting

Development Tools: RStudio, Visual Studio .NET 2015, Eclipse, Quest SQL Navigator, Spotfire, QlikView

Data Science/Big Data: Statistical Analysis, Machine Learning, Data Mining, Hadoop, HDFS, HBase 1.2, NoSQL

Operating Systems: Windows 10.0, UNIX with Sun Solaris 8.0, HP-Unix

Databases: MS SQL Server 2005, Oracle 11g, Sybase

Web Technologies: Silverlight, AJAX, ASP.NET, JavaScript, IIS 7.0, AWS (Amazon Web Services)

Others: .NET 4.5, WPF, WCF, XAML, LINQ, MS Team Foundation Server (TFS), SSRS, Infragistics/Telerik Toolkit

Programming Expertise: R 3.2.2, SAS 9.2, Python 3.5 & 2.7, Azure ML, Spark, Oracle SQL, Big Data ecosystems (Hive, Pig, Sqoop, Flume, Oozie), JavaScript

Visualization Tools: MS Office (Excel, PowerPoint), Tableau, R (ggplot2, Shiny), D3.js

Skills and Traits: SQL, Python, Excel, C/C++, Solr, Linux, Microsoft Word, Access, PowerPoint, Mathematical Programming Software (Macaulay2, MATLAB, Mathematica)

Techniques: Regression, GLM, Trees (Decision Trees, Oblique Decision Trees, CHAID), Random Forest, Clustering (K-Means, Hierarchical, SOM), Association Rules, K-Nearest Neighbors, Neural Nets, XGBoost, SVM, Bayesian, Linear Programming, Quadratic Programming, Genetic Algorithms, Ant Colony Optimization, Collaborative Filtering

PROFESSIONAL EXPERIENCE:

Confidential - Phoenix, AZ

Data Scientist

Responsibilities:

  • Developed, Implemented & Maintained the Conceptual, Logical & Physical Data Models using Erwin for Forward/Reverse Engineered Databases.
  • Designed algorithms to identify and extract incident alerts from a daily pool of incidents.
  • Reduced redundancy among incoming incidents by proposing rules to recognize patterns.
  • Scheduled searches using Splunk.
  • Worked with machine learning algorithms such as regressions (linear, logistic, etc.), SVMs, and decision trees.
  • Architected Big Data solutions for projects and proposals using Hadoop, Spark, the ELK Stack, Kafka, and TensorFlow.
  • Worked on Clustering and classification of data using machine learning algorithms.
  • Applied statistics skills such as statistical sampling, hypothesis testing, and regression.
  • Built analytic models using a variety of techniques such as logistic regression, risk scorecards, and pattern recognition technologies.
  • Worked with technical and development teams to deploy models; built model performance reports and modeling technical documentation to support each model in the product line.
  • Performed Exploratory Data Analysis and data visualizations using R and Tableau.
  • Analyzed data from Primary and secondary sources using statistical techniques to provide daily reports.
  • Created custom dashboards and reports using Splunk.
  • Performed estimation and requirement analysis for project timelines.
  • Analyzed data and recommended new root-cause strategies and the quickest approaches for working with large datasets.
  • Used packages such as dplyr, tidyr, and ggplot2 in RStudio for data visualization.
  • Extending company's data with third party sources of information when needed.
  • Enhancing data collection procedures to include information that is relevant for building analytic systems.
  • Experienced in delivery, portfolio, team/career, vendor, and program management; competent in solution architecture, implementation, and delivery of Big Data, data science analytics, and DWH projects on Greenplum, Spark, Keras, Python, and TensorFlow.
  • Processed, cleansed, and verified the integrity of data used for analysis.
  • Performed ad-hoc analysis and presented results in a clear manner.
  • Tracked model performance on an ongoing basis.
  • Excellent understanding of machine learning techniques and algorithms such as Logistic Regression, SVM, Random Forests, and Deep Learning.
  • Worked with Data governance, Data quality, data lineage, Data architect to design various models.
  • Independently coded new programs and designed tables to load and test the program effectively for the given POCs using Big Data/Hadoop.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
  • Experienced with common data science toolkits such as R, Python, and Spark.
  • Developed and designed SQL procedures and Linux shell scripts for data export/import and for converting data.
  • Used Test Driven Development (TDD) for the project.
  • Wrote SQL queries, stored procedures, triggers, and functions for MySQL databases.
  • Coordinate with data scientists and senior technical staff to identify client's needs and document assumptions.
  • Performed thorough EDA, including univariate and bivariate analysis, to understand the intrinsic and combined effects of variables.
  • Established Data architecture strategy, best practices, standards, and roadmaps.
  • Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
  • Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, spacetime.
  • Interacted with the other departments to understand and identify data needs and requirements.
  • Involved in analysis of business requirements, design and development of high-level and low-level designs, and unit and integration testing.

Environment: R, RStudio, Splunk, SQL, MySQL, Windows, UNIX, Python 3.5, MLlib, SAS, regression, logistic regression, Hadoop, NoSQL, Teradata, TensorFlow, OLTP, random forest, OLAP, HDFS, ODS

Confidential - Orlando, FL

Data Scientist

Responsibilities:

  • Implemented a Metadata Repository; maintained Data Quality, Data Cleanup procedures, Transformations, Data Standards, and a Data Governance program; wrote scripts, stored procedures, and triggers; executed test plans.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Applied linear regression to predict sales, modeled how sales performed without heavy advertising, and derived insights on how to advertise more effectively.
  • Analyzed the partitioned and bucketed data and compute various metrics for reporting.
  • Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Understood the project objectives and requirements from a domain perspective and understood the problem definition.
  • Extracted the desired data from multiple sources, integrated and preprocessed it, and cleaned and filled in missing data.
  • Evaluated models for bias and variance problems; tuned models for best parameters using k-fold cross-validation.
  • Updated and maintained existing universes based on changes in user requirements and data sources.
  • Visually analyzed chasm traps and fan traps and resolved loops by creating aliases and contexts.
  • Designed, developed, tested, and deployed reports and dashboards that feed into mobile applications for ABMs, BRMs, and Marketing users.
  • Managed JAD sessions with business users in the design, development, and maintenance of analysis needs.
  • Provided input to the project management role on development/test activities and sequencing.
  • Utilized Business Objects reporting/dashboard tools to import KPIs from data warehouse and ERP systems and presented business insights to business leaders and stakeholders.
  • Troubleshot universe schemas with loops, chasm and fan traps, and cardinality problems.

Environment: Erwin r9.0, Informatica 9.0, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, Rational Rose, Requisite Pro, Hadoop, PL/SQL

Confidential - Dallas, TX

Data Scientist

Responsibilities:

  • Designed algorithms to identify and extract incident alerts from a daily pool of incidents.
  • Reduced redundancy among incoming incidents by proposing rules to recognize patterns.
  • Scheduled searches using Splunk.
  • Worked with machine learning algorithms such as regressions (linear, logistic, etc.), SVMs, and decision trees.
  • Worked on Clustering and classification of data using machine learning algorithms.
  • Created Custom Dashboards and reports using Splunk tool.
  • Estimation and Requirement Analysis of project timelines.
  • Analyzed data and recommended new root-cause strategies and the quickest approaches for working with large datasets.
  • Used packages such as dplyr, tidyr, and ggplot2 in RStudio for data visualization.
  • Analyzed data from Primary and secondary sources using statistical techniques to provide daily reports.
  • Developed and designed SQL procedures and Linux shell scripts for data export/import and for converting data.
  • Used Test Driven Development (TDD) for the project.
  • Wrote SQL queries, stored procedures, triggers, and functions for MySQL databases.
  • Coordinate with data scientists and senior technical staff to identify client's needs and document assumptions.

Environment: R, RStudio, Splunk, SQL, MySQL, and Windows.

Confidential -San Francisco, CA

R Programmer / Data Scientist

Responsibilities:

  • Conducted research on development and designing of sample methodologies, and analyzed data for pricing of client’s products.
  • Used correlation analysis to identify relationships between variables, patterns, outliers, and causal factors.
  • Identified statistically significant variables.
  • Investigated market sizing, competitive analysis and positioning for product feasibility.
  • Worked on Business forecasting, segmentation analysis and Data mining.
  • Used Support vector machines for classification of data in groups.
  • Generated graphs and reports using ggplot package in R-Studio for analytical models.
  • Developed and implemented R and Shiny application which showcases machine learning for business forecasting.
  • Developed predictive models using Decision Tree, Random Forest and Naïve Bayes.
  • Performed time series analysis using Spotfire.
  • Collaborated with DevOps teams on production deployment.
  • Worked in the Amazon Web Services cloud computing environment.
  • Worked with the Caffe deep learning framework.
  • Developed various workbooks in Spotfire from multiple data sources.
  • Created dashboards and visualizations using Spotfire Desktop.
  • Worked on R packages to interface with the Caffe deep learning framework.
  • Performed analysis using JMP.
  • Performed validation on machine learning output from R.
  • Wrote connectors to extract data from databases.

Environment: R, RStudio, Excel 2013, Amazon Web Services, Machine Learning, Spotfire, JMP, Segmentation Analysis

Confidential

Data Analyst

Responsibilities:

  • Analyzed business information requirements and modeled class diagrams and/or conceptual domain models.
  • Gathered and reviewed customer information requirements for OLAP and for building the data mart.
  • Performed document analysis, creating Use Cases and Use Case narrations with Microsoft Visio to demonstrate the efficiency of the gathered requirements.
  • Calculated and analyzed claims data for provider incentive and supplemental benefit analysis using Microsoft Access and Oracle SQL.
  • Analyzed business process workflows and assisted in the development of ETL procedures for mapping data from source to target systems.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Responsible for defining the functional requirement documents for each source-to-target interface.
  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
  • Maintained the Enterprise Metadata Library with any changes or updates.
  • Documented data quality and traceability for each source interface.
  • Established standard operating procedures.
  • Generated weekly and monthly asset inventory reports.
  • Managed the project requirements, documents, and use cases using IBM Rational RequisitePro.
  • Assisted in building an integrated logical data design and proposed a physical database design for building the data mart.
  • Documented all data mapping and transformation processes in the functional design documents based on the business requirements.

Environment: SQL Server 2008R2/2005 Enterprise, SSRS, SSIS, Crystal Reports, Windows Enterprise Server 2000, DTS, SQL Profiler, and Query Analyzer.

Confidential

Java Developer

Responsibilities:

  • Participated in requirements and design discussions.
  • Active in the analysis, design, implementation, and deployment phases of the full Software Development Lifecycle (SDLC) of the project.
  • Designed and developed user interface using JSP, HTML and JavaScript.
  • Developed Struts action classes, action forms and performed action mapping using Struts framework and performed data validation in form beans and action classes.
  • Extensively used Struts framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
  • Defined the search criteria, pulled the customer's record from the database, made the required changes, and saved the updated record back to the database.
  • Validated the fields of user registration screen and login screen by writing JavaScript validations.
  • Developed build and deployment scripts using Apache ANT to customize WAR and EAR files.
  • Used DAO and JDBC for database access.
  • Developed application using AngularJS and Node.JS connecting to Oracle on the backend.
  • Designed and developed RESTful APIs using Spring REST.
  • Consumed RESTful web services using the AngularJS $http service and rendered the JSON data on screen.
  • Used the AngularJS framework for building web apps; it is highly efficient at integrating with RESTful services.
  • Design and develop XML processing components for dynamic menus on the application.
  • Involved in postproduction support and maintenance of the application.

Environment: Oracle, Java, Struts, Servlets, HTML, XML, SQL, J2EE, Angular JS, JUnit, RESTful, SOA, Tomcat
