Data Scientist Resume

Findlay, Ohio

PROFESSIONAL SUMMARY:

  • Over 8 years of experience working with large datasets, including Data Visualization, Data Acquisition, Predictive Modeling, and Data Validation.
  • Deep understanding of dimensionality reduction using Factor Analysis.
  • Strong grounding in machine learning theory and practice, including regression models, GLM, SVM, and tree-based models (e.g. Boosting, Random Forest).
  • Experience with Excel macros, pivot tables, and other advanced functions; expert R user with knowledge of the statistical programming language SAS.
  • Proficient in leveraging large sets of structured and unstructured data to develop strategic insights.
  • Experience with techniques like Feature Engineering, Clustering, Regression, Time Series Forecasting, Sentiment Analysis, Neural Networks, Bayesian Networks, CNN and RNN.
  • Experience in managing the entire data science project life cycle, including Data Acquisition, Data Cleaning, Data Analysis, Feature Scaling and Data Loading.
  • Adept in identifying business needs and implementing solutions using information technology tools and good experience in Business Analytics.
  • Experience in Machine Learning (Supervised, Unsupervised and Semi-supervised algorithms).
  • Expert in Python and R scripting.
  • Expert in Statistical Analysis by using R.
  • Experience with Big Data technologies and tools like Hadoop, Hive, MapReduce, Spark, and Scala.
  • Profound analytical and problem-solving skills along with an ability to understand current business processes and implement effective solutions to problems.
  • Extensive experience in using R packages like ggplot2, dplyr, MASS, etc.
  • Expert in innovation and formulation of new ideas and predictive models.
  • Deep understanding of Statistical Modeling, Multivariate Analysis, testing, problem analysis, comparison and validation.
  • Expert in Relational Database Management Systems.
  • Able to navigate large SQL and NoSQL databases programmatically.
  • Expertise in SQL Server development, including writing Stored Procedures, User Defined Functions, Views, and Queries.
  • Proficient in using SQL to manipulate data with query expressions, join statements, subqueries, etc.
  • Worked with NumPy statistical functions, Tableau and Seaborn for visualization, and Pandas for organizing data.
  • Skilled in Text Analytics, generating data visualizations using Python, and creating dashboards using tools like Tableau.
  • Ability to explain complex technical materials to nontechnical audiences.

TECHNICAL SKILLS:

Database Design Tools: Fact & Dimensions tables, Normalization and De-normalization techniques, Kimball, logical data modeling.

Data Modelling Tools: MS Visio, ER/Studio, SAP PowerDesigner

Programming Languages: R, SQL, Python, NoSQL, SAS and C#

Scripting Languages: Python (Pandas, Matplotlib, scikit-learn, NumPy, SciPy and Seaborn), R (dplyr, knitr, ggplot, Weka, caret)

BI and Visualization: Tableau, Hadoop, Spark, Scala, QlikView, Python and R.

Databases: SQL (Oracle Database, MS SQL Server and MySQL) and NoSQL (Hive and NoSQL)

Reporting Tools: Business Intelligence, Crystal Reports XI and Tableau.

Business Analytical: Time scheduling, Risk Analysis and Presentation.

Modeling Techniques: Predictive Modeling, Linear Regression, ANOVA, Cluster Analysis, Machine Learning, Logistic Regression, LDA, Naïve Bayes, Decision Trees, Regression Models, Random Forests, K-means Clustering, Market Basket Analysis, Support Vector Machines and Time Series.

Version control: Jupyter Notebook, Git, GitHub

Process/Model Tools: HTML, CSS, MS Office, MS Project, MS Visio, MS Excel, MS PowerPoint, MS Word, Web Development.

PROFESSIONAL EXPERIENCE:

Confidential, Findlay, Ohio

Data Scientist

Responsibilities:

  • Developed Spark code in Scala to process data at different stages of the data pipeline.
  • Worked with Data Architects to understand the movement of data.
  • Conducted segmentation analysis by using clustering, Deep Learning and Natural Language Processing (NLTK).
  • Used Pandas, NumPy, Seaborn, SciPy, and Matplotlib in Python to develop machine learning models such as Logistic Regression, Support Vector Machines, and Random Forests, and to run A/B tests.
  • Handled importing data from various data sources, performed transformations using Hive, Pig, and loaded data into HDFS.
  • Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and processed the data using HQL on top of MapReduce.
  • Designed user interfaces and customized reports using Tableau Server.
  • Leveraged AWS cloud computing for reproducible research to support ad hoc and exploratory analysis, such as backtesting investment strategies faster by spreading work across machines.
  • Developed MapReduce/Spark Python modules for Machine Learning & Predictive analytics in Hadoop on AWS.
  • Implemented a Python-based distributed random forest via Python streaming.
  • Extracted data from source files for analysis.
  • Involved in gathering requirements while uncovering and defining multiple dimensions.
  • Implemented machine learning classification models such as Support Vector Machine and Random Forest.
  • Worked closely with the data compliance and data governance teams to maintain metadata, data models, and data dictionaries and to define source fields.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.

Environment: Machine Learning, SAS, Teradata, Python, Hadoop, Spark, TensorFlow, Scala, HDFS, AWS, SQL Server 2012, Pig, Hive, HBase, MapReduce, NoSQL, Tableau.

Confidential, Houston, Texas

Data Scientist

Responsibilities:

  • Built an algorithm to identify how likely customers were to purchase insurance.
  • Set up storage and data analysis tools in Amazon Web Services cloud computing infrastructure.
  • Fine-tuned the Machine Learning algorithm to meet the acceptable standards in Python using the packages NLTK and Scikit-learn.
  • Analyzed the dataset, performed feature selection, and created new features for the predictive model.
  • Gathered, analyzed and translated business requirements into relevant analytic approaches.
  • Designed Logistic Regression, Support Vector Machine, and Random Forest models and evaluated them using precision, recall, and F-score.
  • Hands-on experience with data cleaning processes such as handling missing values (replacing with the mean, forward/backward fill, removing entire rows, columns, or values), removing outliers, and normalizing and scaling data.
  • Prepared comprehensive documented observations, analyses and interpretations of results including technical reports, summaries, protocols and quantitative analyses.
  • Developed and designed NoSQL procedures for data export/import and for converting data.
  • Developed personalized product recommendations using Machine Learning algorithms, including Collaborative Filtering and Gradient Boosting, to better meet the needs of existing customers and acquire new customers.
  • Worked with stakeholders to troubleshoot issues and communicated findings to team members, leadership, and stakeholders to ensure a clear understanding of models and optimization.
  • Handled importing data from various data sources, performed transformations using Hive, MapReduce, and Pig, and loaded data into HDFS in Hadoop.
  • Analyzed, transformed, and contextualized a variety of ingested data - social data, GIS data, POI and AOI data, and some consumer behavior data for building direct marketing predictive models.
  • Delivered interactive visualizations and dashboards using Tableau, Matplotlib, and ggplot to present analysis outcomes in terms of patterns, anomalies, and predictions, using bar charts, graphs, and histograms.
  • Applied boosting methods to predictive models to improve model performance.
  • Contributed to implementing NLP to identify, extract, summarize, and categorize relevant qualitative financial information, such as sentiment, feedback, and news, according to specific structures (templates) from a source text (digital news) to support decision-making.
  • Applied customer segmentation with clustering algorithms and developed geodemographic customer segmentation models.
  • Classified text documents using Naive Bayes algorithm for Sentiment analysis and gathering insights from a large volume of unstructured/text data.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.

Environment: Machine Learning, NLP, R, SAS, Hadoop, Spark, Python, HDFS, Pig, Hive, Microsoft Word and Microsoft Excel, HBase, MapReduce, Tableau, NoSQL and AWS.
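
As an illustration of the missing-value and scaling techniques listed above (replacing with the mean, forward fill, and scaling), a minimal sketch in plain Python follows. The function names and sample values are hypothetical and are not taken from the project; in practice a library such as Pandas would typically handle these steps.

```python
from statistics import mean


def fill_with_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    avg = mean(observed)
    return [avg if v is None else v for v in values]


def forward_fill(values):
    """Replace each missing entry with the most recent observed value.

    Leading missing entries stay None because nothing precedes them.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample column with two missing readings.
raw = [10.0, None, 30.0, None, 50.0]
print(fill_with_mean(raw))
print(forward_fill(raw))
print(min_max_scale(fill_with_mean(raw)))
```

Mean replacement keeps the column average unchanged, while forward fill suits ordered data such as time series, where the previous reading is a reasonable stand-in.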

Confidential, Lincoln, RI

Data Scientist/Data Analyst

Responsibilities:

  • Documented all programs and procedures to ensure an accurate historical record of work completed on an assigned project as well as to improve quality and efficacy.
  • Performed data analysis in Python using NumPy and Pandas and statistical analysis using R.
  • Provided daily change management process support, ensuring that all changes to program baselines were properly documented and approved; maintained, managed, and issued revised schedules.
  • Collected data with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Established a business analysis methodology around the Agile Software Development Process.
  • Cleaned, filtered, and transformed data into the specified format using NumPy and Pandas in Python, and R.
  • Worked with employers to identify the most appropriate source of records.
  • Built and published customized interactive reports and dashboards and scheduled reports using Tableau Server.
  • Monitored company databases in a high-performance, high-availability environment with a supported configuration using Hive, HBase, MapReduce, and Pig, and loaded data into HDFS in Hadoop.
  • Analyzed requirements and created Business Requirement Document (BRD), Current business process flow, Future business process flow, Use Case Diagrams, and Activity Diagrams using Microsoft Visio.
  • Used R and Python to assess the performance of different products with Machine Learning approaches such as classification, tree maps, and regression models, and visualized the data to support interactive understanding and decision-making.

Environment: Machine Learning, Agile methodology, SQL, Pig, Tableau, Hadoop, HDFS, Hive, HBase, MapReduce, Python, R, Microsoft Visio, MS Excel, Microsoft Project, and MS Word.

Confidential, Bartlesville, OK

Data Modeler

Responsibilities:

  • Participated in the Agile planning process and daily scrums and provided details to create stories based on technical solutions and estimates; worked with internal architects and assisted in the development of current and target state data architectures.
  • Explored and analyzed customer historical billing information to build a predictive model to forecast increases or declines in customers' product use.
  • Analyzed the client data and business terms from a data quality and integrity perspective.
  • Involved in requirements collection, gap analysis, reporting, and document creation.
  • Assessed completeness, consistency, and validity of customer data and created models and simulations.
  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Analyzed sales and performance records, and interpreted results.
  • Sourced and analyzed data from a variety of sources such as SAS data sets, MS Access, MS Excel, CSV, and flat files.
  • Used an ETL process to extract, transform, and load data into the staging area and data warehouse.
  • Performed data alignment and data cleansing. Involved in data migration between Teradata and MS SQL Server.
  • Used Tableau and MS PowerPoint and MS Excel to produce reports.
  • Prepared sales forecasts by collecting and analyzing sales data to evaluate current sales goals.
  • Developed complex SQL queries to bring data together from various systems.
  • Evaluated data profiling, cleansing, combination, and extraction tools.
  • Assisted the team with standardizing reports using SAS macros and SQL.

Environment: MS Visio, MS Project, MS Office, MS Excel, MS PowerPoint, MS Word, Macros, SQL Server 2005 Enterprise, ER Studio, Teradata, Tableau, ETL, Business Objects and XML.

Confidential

Data Analyst

Responsibilities:

  • Wrote SQL scripts to test the mappings and developed a Traceability Matrix of business requirements mapped to test scripts to ensure that any change control in requirements leads to a test case update.
  • Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services.
  • Maintained metadata and version controlling for the data model.
  • Coordinated with various business users and stakeholders to obtain functional expertise, review design and business test scenarios, participate in UAT, and validate financial data.
  • Used graphical entity-relationship diagramming to create new database designs via an easy-to-use graphical interface.
  • Applied Business Objects best practices during development with a strong focus on reusability.
  • Developed SQL scripts for creating tables, sequences, triggers, and materialized views.
  • Developed Tableau visualizations using Tableau Desktop.
  • Developed and executed load scripts using the Teradata client utilities MultiLoad and BTEQ.
  • Responsible for developing and testing conversion programs for importing data from text files into the Oracle database using shell scripts and SQL*Loader.
  • Designed different types of star schemas for detailed data marts and plan data marts in the OLAP environment.
  • Utilized Erwin's forward/reverse engineering tools and target database schema conversion process.

Environment: Oracle SQL Developer, SQL*Loader, SQL*Plus, TOAD, MS SQL Server, PL/SQL, Business Objects, Tableau, Informatica, XML, Windows XP.

Confidential

Data Analyst

Responsibilities:

  • Coordinated meetings with vendors to define requirements and system interaction agreement documentation between client and vendor system.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Responsible for defining the key identifiers for each mapping/interface.
  • Documented, clarified, and communicated change requests with the requestor and coordinated with the development and testing teams.
  • Gathered and reviewed customer information requirements for OLAP and for building the data mart.
  • Calculated and analyzed claims data for provider incentive and supplemental benefit analysis using Microsoft Access and Oracle SQL.
  • Performed document analysis involving creation of Use Cases and Use Case narrations using Microsoft Visio, in order to present the efficiency of the gathered requirements.
  • Generated weekly and monthly asset inventory reports.
  • Managed the project requirements, documents, and use cases using IBM Rational RequisitePro.
  • Documented all data mapping and transformation processes in the Functional Design documents based on the business requirements.
  • Coordinated with business users to provide an appropriate, effective, and efficient way to design for new reporting needs based on user requirements and the existing functionality.
  • Analyzed business information requirements and modeled class diagrams and/or conceptual domain models.
  • Analyzed business process workflows and assisted in the development of ETL procedures for mapping data from source to target systems.
  • Responsible for defining the functional requirement documents for each source to target interface.
  • Assisted in building an Integrated Logical Data Design and proposed the physical database design for building the data mart.

Environment: SSRS, SSIS, Crystal Reports, SQL Server 2008R2/2005 Enterprise, Windows Enterprise Server 2000, Query Analyzer, DTS and SQL Profiler.
