Data Scientist Resume

Temple Terrace, FL

SUMMARY:

  • Overall 7+ years of experience in analyzing, designing, developing, testing, maintaining, and supporting applications using R, Python, Big Data, Hadoop, Apache Spark, Scala, Hive, Sqoop, Tableau, and PowerBuilder.
  • Drive use case analysis and architectural design around activities focused on meeting business requirements within the tools of the ecosystem.
  • Partner with Architecture, Development and Operational teams to define the architectural vision and direction of a data ecosystem that meets modern data requirements, which may comprise a mix of Big Data storage systems such as Hadoop batch analytics, near-real-time analytics platforms, and NoSQL online application access.
  • Design and develop automated test cases to verify solution feasibility and interoperability, including performance assessments.
  • Data warehousing and Relational Database Design with MS SQL server, Oracle and MySQL.
  • Extensive experience in Data Visualization including producing tables, graphs, listings using various procedures and tools such as Tableau.
  • Used advanced analytical techniques to segment customers into actionable segments/micro-segments, enabling a more holistic customer strategy and experience.
  • Experience with Hadoop Reference Architectures associated with AWS, Azure, HP, VMWare Infrastructure.
  • Proficient in Python, R and Tableau used in data analysis/ mining, various analytics and data visualization implementations.
  • Good knowledge in Text Analytics, generating data visualizations using R, Python and creating dashboards using tools like Tableau.
  • Experience and knowledge in using various Python packages for Data Science such as NumPy, Scipy, Pandas, Matplotlib and Scikit-learn.
  • Experience in using various R packages for Data Science such as ggplot2, tidyr, dplyr, caTools, rpart and MASS.
  • Experience with analyzing online user behavior, Conversion Data (A/B Testing) and customer journeys, funnel analysis.
  • Excellent Analytical and Communication skills required to effectively work in the field of applications development and maintenance.

TECHNICAL SKILLS:

RDBMS: SQL Server 2000/2005/2008/2008 R2/2012/2014, Oracle 9i/10g/11g, MySQL, MS Access

Languages: Visual Basic, C, C++, R, Python, Scala

Data Warehousing/BI: Excel, SharePoint, Tableau

Big Data: Hadoop, Spark/Scala, Hive, Sqoop

NOSQL: Cassandra, HBase

Machine Learning: R, Python, Spark MLlib

Operating System: Windows, UNIX, Linux

PROFESSIONAL EXPERIENCE:

Confidential, Temple Terrace, FL

Data Scientist

Responsibilities:

  • Evaluated data analytics opportunities to improve the efficiency of the claims handling process, such as fraud detection.
  • Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
  • Create statistical models based on researched information to provide conclusions that will guide the company and the industry into the future.
  • Handled missing data after import and encoded categorical data when needed.
  • Split the data into training and test sets and scaled both when necessary.
  • Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
  • Measured the impact of marketing tactics on sales and forecast the impact of future sets of tactics.
  • Developed Scala and SQL code to extract data from various databases
  • Used R and Python for exploratory data analysis and hypothesis testing to compare and identify the effectiveness of creative campaigns.
  • Used Scala, Python, R and SQL to create Statistical algorithms involving Linear Regression, Logistic Regression, Random forest, Decision trees, Support Vector Machine for estimating the risks.
  • Developed statistical models to forecast inventory and procurement cycles.
  • Created and designed reports that will use gathered metrics to infer and draw logical conclusions of past and future behavior.
  • Created pipelines for data ingestion from various channels through scripts written in Hive & Java.
  • Work with a range of proprietary, industry-standard, and open-source data stores to assemble, organize, and analyze data.
  • Mapped customers to revenue to predict the revenue (if any) from a new prospective customer.
  • Created visualizations, summary reports, and presentations using R and Tableau.
  • Uploaded data to Hadoop Hive and combined new tables with existing databases.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, and Scala.
  • Developed PySpark and Spark SQL/Streaming code for faster testing and processing of data.
  • Supported MapReduce programs running on the cluster.
  • Created data quality scripts using SQL and Hive to validate successful data load and quality of the data.
  • Used a workflow scheduler to schedule and manage Hadoop jobs.
  • Loaded the aggregated data into Data Mart for reporting, dash boarding and ad-hoc analysis using Tableau and developed a self-service BI solution for quicker turnaround of insights.
  • Maintained SQL scripts to create and populate tables in data warehouse for daily reporting across departments.
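As a concrete illustration of the preprocessing steps listed above (handling missing values, encoding categoricals, splitting into training and test sets, and scaling), here is a minimal scikit-learn sketch; the claims columns and values are hypothetical, not taken from the actual project.

```python
# Hypothetical sketch of the preprocessing pipeline described above:
# impute missing values, one-hot encode categoricals, split, and scale.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy claims data (hypothetical columns)
df = pd.DataFrame({
    "claim_amount": [1200.0, np.nan, 800.0, 4500.0, 300.0, 2200.0],
    "region": ["east", "west", "east", "south", "west", "south"],
    "is_fraud": [0, 0, 0, 1, 0, 1],
})

# Impute the missing numeric value with the column mean
df["claim_amount"] = SimpleImputer(strategy="mean").fit_transform(
    df[["claim_amount"]]
).ravel()

# One-hot encode the categorical column
X = pd.get_dummies(df[["claim_amount", "region"]], columns=["region"])
y = df["is_fraud"]

# Split first, then scale using statistics from the training set only,
# so no information leaks from the test set into the scaler
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training set only, then reusing it on the test set, mirrors the split-then-scale order described in the bullets above.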

Environment: R 3.x, Python 2.x, Tableau 9, SQL Server 2012, Spark/Scala, SBT, Hive, Sqoop, Spark ML.

Confidential, Chicago, IL

Data Scientist

Responsibilities:

  • Performed data profiling to learn about user behavior across features such as traffic pattern, location, date, and time.
  • Application of various machine learning algorithms and statistical modeling like decision trees, regression models and K-Means using Python and R.
  • Developed clinical NLP methods that ingest large unstructured clinical data sets, separate signal from noise, and provide personalized insights at the patient level that directly improve our analytics platform.
  • Used NLP methods for information extraction, topic modeling, parsing, and relationship extraction.
  • Worked with NLTK library for NLP data processing and finding the patterns.
  • Used clustering technique K-Means to identify outliers and to classify unlabeled data.
  • Ensured that the model has low False Positive Rate.
  • Created and designed reports that will use gathered metrics to infer and draw logical conclusions of past and future behavior.
  • Worked on Natural Language Processing with NLTK module of python for application development and automated customer response.
  • Utilized statistical Natural Language Processing for sentiment analysis, mine unstructured data, and create insights.
  • Worked on feature engineering such as feature creation, feature scaling, and one-hot encoding with Scikit-learn.
  • Performed logistic regression, random forest, decision tree, and SVM classification to predict whether a package would be delivered on time on a new route.
  • Implemented population segmentation using the k-means algorithm.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from different departments.
  • Generated detailed report after validating the graphs using R, and adjusting the variables to fit the model.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.
  • Wrote MapReduce code to process and parse data from various sources, storing the parsed data into HBase and Hive using HBase-Hive integration.
  • Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions using SQL server management studio.
  • Used packages like dplyr, tidyr, and ggplot2 in RStudio for data visualization, generating scatter plots and high-low graphs to identify relationships between variables.
  • Created various types of data visualizations using Python and Tableau.
  • Communicated the results with operations team for taking best decisions.
  • Collected data needs and requirements by Interacting with the other departments.
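The k-means outlier-detection technique mentioned above (flagging points far from their assigned cluster centroid) can be sketched as follows; the data, cluster count, and percentile threshold are illustrative assumptions, not project specifics.

```python
# Hypothetical sketch: using k-means cluster distances to flag outliers
# in unlabeled data. Data and the 97th-percentile cutoff are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# 100 normal points in two tight clusters, plus two far-away outliers
normal = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
outliers = np.array([[20.0, 20.0], [-15.0, 10.0]])
X = np.vstack([normal, outliers])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance of each point to the centroid of its assigned cluster
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Points beyond the 97th-percentile distance are treated as outliers
threshold = np.percentile(dist, 97)
flagged = np.where(dist > threshold)[0]
```

Points with unusually large centroid distances (here, the two injected outliers at indices 100 and 101) end up in `flagged`; in practice the threshold would be tuned to the acceptable false positive rate, as the bullets above emphasize.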

Environment: R, Python 2.x, Linux, Tableau Desktop, SQL Server.

Confidential, Land O Lakes, FL

Data Scientist

Responsibilities:

  • Analyzed and prepared data, identifying patterns in the dataset by applying historical models; collaborated with senior data scientists to understand the data
  • Performed data manipulation, data preparation, normalization, and predictive modelling; improved efficiency and accuracy by evaluating the model in R
  • This project focused on customer segmentation through machine learning and statistical modelling, including building predictive models and generating data products to support segmentation
  • Used R and Python programming to improve the models and upgraded them to improve the product
  • Develop a pricing model for various product and services bundled offering to optimize and predict the gross margin
  • Built price elasticity model for various product and services bundled offering
  • Under supervision of a Sr. Data Scientist, performed data transformation for rescaling and normalizing variables
  • Developed predictive causal model using annual failure rate and standard cost basis for the new bundled service offering
  • Design and develop analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment and product recommendation and allocation planning
  • Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, R, and a broad variety of machine learning methods including classification, regression, and dimensionality reduction
  • Worked with sales and Marketing team for Partner and collaborate with a cross-functional team to frame and answer important data questions
  • Prototyped and experimented with ML/DL algorithms, integrating them into production systems for different business needs
  • Worked on multiple datasets containing two billion values of structured and unstructured data about web application usage and online customer surveys
  • Segmented the customers based on demographics using K-means Clustering
  • Explored different regression and ensemble models in machine learning to perform forecasting
  • Presented Dashboards to Higher Management for more Insights using Power BI
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring
  • Applied boosting methods to the predictive model to improve its efficiency
  • Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using R, Tableau, and Power BI
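A minimal sketch of a constant-elasticity (log-log) price model of the kind described in the price-elasticity bullets above; the data are synthetic, generated with an assumed true elasticity of -1.5, so the names and numbers are illustrative rather than from the actual engagement.

```python
# Hypothetical sketch: estimating price elasticity with a log-log
# regression. Demand is simulated with a true elasticity of -1.5.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
price = rng.uniform(5.0, 50.0, size=200)

# Q = A * P^elasticity * noise  =>  log Q = log A + elasticity * log P + eps
true_elasticity = -1.5
quantity = 1000.0 * price**true_elasticity * np.exp(rng.normal(0, 0.05, 200))

# The slope of log-quantity on log-price is the elasticity estimate
model = LinearRegression().fit(np.log(price).reshape(-1, 1), np.log(quantity))
estimated_elasticity = model.coef_[0]
```

The fitted slope recovers an elasticity close to the simulated -1.5; a real bundled-offering model would add cross-price and promotion terms on top of this basic form.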

Environment: MS SQL Server, R/RStudio, SQL Enterprise Manager, Python, Redshift, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office 2007, Outlook, AS E-Mine.

Confidential

Data Analyst

Responsibilities:

  • Extensively worked on Informatica PowerCenter Transformations such as Source Qualifier, Lookup, Filter, Expression, Router, Joiner, Update Strategy, Rank, Aggregator, Sequence Generator etc.
  • Used the Informatica PowerCenter tool to design data conversions from a wide variety of sources.
  • Used Informatica Workflow Manager and Workflow Monitor to create, schedule, and control workflows, tasks, and sessions.
  • Created pivot tables and ran VLOOKUPs in Excel as part of data validation.
  • Used Informatica PowerCenter for extraction, loading and transformation (ETL) of data in the data warehouse.
  • Worked on data analysis, data discrepancy reduction in the source and target schemas.
  • Designed and developed complex mappings, from varied transformation logic like Unconnected and Connected lookups, Router, Filter, Expression, Aggregator, Joiner, Update Strategy and more.
  • Preparation of System requirements (SRS), Database specifications (DBS), Software design document (SDD).
  • Responsible for the maintenance of few applications in PowerBuilder 10.2
  • Used SQL Server 2005 to fix production issues in the background.
  • Coordinated delivery and quality-assurance activities.
  • Tested and validated all fields, functions, programs, and agents through front-end and back-end code reviews across the application.
  • Prepared program specifications, unit tests, test cases, and user-manual documents.

Environment: Informatica 8.x, PowerBuilder 10.2, SQL Server 2005.

Confidential

Data Analyst

Responsibilities:

  • Built time series models with ARIMA in R for budget forecasting
  • Developed risk assessment models by using Decision Trees and Analytic Hierarchy Process
  • Designed and maintained comprehensive dashboards and metrics to enable real-time business decisions
  • Coded SQL queries to extract data and identify granularity issues and relationships between datasets and recommended solutions
  • Involved in manipulating, cleansing & processing of data using Excel, Access and SQL
  • Compared the source data with historical data to perform statistical analysis
  • Performed data preprocessing and data cleaning, collected and organized data

Environment: MS Access, R, MS Excel, ETL.
