Data Scientist Resume
Temple Terrace, FL
SUMMARY:
- Over 7+ years of experience in analyzing, designing, developing, testing, maintaining, and supporting applications using R, Python, Big Data, Hadoop, Apache Spark, Scala, Hive, Sqoop, Tableau, and PowerBuilder.
- Drive use case analysis and architectural design around activities focused on meeting business requirements within the tools of the ecosystem.
- Partner with Architecture, Development, and Operations teams to define the architectural vision and direction of a data ecosystem that meets modern data requirements, which may comprise a mix of Big Data storage systems such as Hadoop batch analytics, near-real-time analytics platforms, and NoSQL online application access.
- Design and develop automated test cases to verify solution feasibility and interoperability, including performance assessments.
- Data warehousing and relational database design with MS SQL Server, Oracle, and MySQL.
- Extensive experience in data visualization, including producing tables, graphs, and listings using tools such as Tableau.
- Used advanced analytical techniques to segment customers into actionable segments/micro-segments, enabling a more holistic customer strategy and experience.
- Experience with Hadoop Reference Architectures associated with AWS, Azure, HP, VMWare Infrastructure.
- Proficient in Python, R, and Tableau for data analysis/mining, various analytics, and data visualization implementations.
- Good knowledge in Text Analytics, generating data visualizations using R, Python and creating dashboards using tools like Tableau.
- Experience and knowledge in using various Python packages for data science, such as NumPy, SciPy, pandas, Matplotlib, and scikit-learn.
- Experience in using various R packages for data science, such as ggplot2, tidyr, dplyr, caTools, rpart, and MASS.
- Experience analyzing online user behavior, conversion data (A/B testing), customer journeys, and funnel analysis.
- Excellent Analytical and Communication skills required to effectively work in the field of applications development and maintenance.
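As a rough illustration of the A/B-testing analysis mentioned above, the standard comparison of two conversion rates is a pooled two-proportion z-test. The sketch below is a minimal plain-Python version; the function name and sample figures are invented for illustration and are not from any specific project.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic for comparing two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Variant A: 120 conversions out of 1000; variant B: 100 out of 1000 (made-up numbers).
z = two_proportion_z(120, 1000, 100, 1000)
```

A |z| above roughly 1.96 would indicate a difference significant at the 5% level; here z is about 1.43, so this made-up example would not reach significance.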
TECHNICAL SKILLS:
RDBMS: SQL Server 2000/2005/2008/2008 R2/2012/2014, Oracle 9i/10g/11g, MySQL, MS Access
Languages: Visual Basic, C, C++, R, Python, Scala
Data Warehousing/BI: Excel, SharePoint, Tableau
Big Data: Hadoop, Spark/Scala, Hive, Sqoop
NOSQL: Cassandra, HBase
Machine Learning: R, Python, Spark MLlib
Operating System: Windows, UNIX, Linux
PROFESSIONAL EXPERIENCE:
Confidential, Temple Terrace, FL
Data Scientist
Responsibilities:
- Evaluated data analytics opportunities to improve the efficiency of the claims-handling process, such as fraud detection.
- Utilized various data analysis and data visualization tools to accomplish data analysis, report design and report delivery.
- Created statistical models based on researched information to provide conclusions that guide the company and the industry into the future.
- Handled missing data after import and encoded categorical variables when needed.
- Split the data into training and test sets, scaling features in both sets when necessary.
- Creatively communicated and presented models to business customers and executives, utilizing a variety of formats and visualization methodologies.
- Modeled the impact of marketing tactics on sales and then forecast the impact of future sets of tactics.
- Developed Scala and SQL code to extract data from various databases
- Used R and python for Exploratory Data Analysis and Hypothesis test to compare and identify the effectiveness of Creative Campaigns.
- Used Scala, Python, R and SQL to create Statistical algorithms involving Linear Regression, Logistic Regression, Random forest, Decision trees, Support Vector Machine for estimating the risks.
- Developed statistical models to forecast inventory and procurement cycles.
- Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
- Created data-ingestion pipelines from various channels using scripts written in Hive and Java.
- Worked with a range of proprietary, industry-standard, and open-source data stores to assemble, organize, and analyze data.
- Mapped customers to revenue to predict the revenue (if any) from a new prospective customer.
- Produced visualizations, summary reports, and presentations using R and Tableau.
- Uploaded data to Hadoop Hive and combined new tables with existing databases.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, and Scala.
- Developed PySpark and Spark SQL/Streaming code for faster testing and processing of data.
- Supported MapReduce programs running on the cluster.
- Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data.
- Used a workflow scheduler to schedule and manage Hadoop jobs.
- Loaded the aggregated data into Data Mart for reporting, dash boarding and ad-hoc analysis using Tableau and developed a self-service BI solution for quicker turnaround of insights.
- Maintained SQL scripts to create and populate tables in data warehouse for daily reporting across departments.
Environment: R 3.x, Python 2.x, Tableau 9, SQL Server 2012, Spark/Scala, SBT, Hive, Sqoop, Spark ML.
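The train/test split and feature scaling steps described in this role were presumably done with standard R/Python library routines; the dependency-free sketch below only illustrates the mechanic (the split ratio, seed, and min-max scaling choice are illustrative assumptions). Note that the scaler learns its range from the training set alone, so the test set cannot leak into the fit.

```python
import random

def train_test_split(rows, test_ratio=0.25, seed=42):
    """Shuffle rows and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def min_max_scale(train, test):
    """Scale both sets using the min/max learned from the training set only."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0          # guard against a constant feature
    scale = lambda xs: [(x - lo) / span for x in xs]
    return scale(train), scale(test)

data = list(range(100))              # stand-in for a single numeric feature
train, test = train_test_split(data)
train_s, test_s = min_max_scale(train, test)
```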
Confidential, Chicago, IL
Data Scientist
Responsibilities:
- Performed data profiling to learn about behavior across features such as traffic pattern, location, date, and time.
- Applied various machine learning algorithms and statistical models such as decision trees, regression models, and K-means using Python and R.
- Developed clinical NLP methods that ingest large unstructured clinical data sets, separate signal from noise, and provide personalized insights at the patient level that directly improve the analytics platform.
- Used NLP methods for information extraction, topic modeling, parsing, and relationship extraction.
- Worked with NLTK library for NLP data processing and finding the patterns.
- Used clustering technique K-Means to identify outliers and to classify unlabeled data.
- Ensured that the model has low False Positive Rate.
- Created and designed reports that use gathered metrics to infer and draw logical conclusions about past and future behavior.
- Worked on Natural Language Processing with NLTK module of python for application development and automated customer response.
- Utilized statistical Natural Language Processing for sentiment analysis, mine unstructured data, and create insights.
- Worked on feature engineering such as feature creating, feature scaling and One-Hot encoding with Scikit-learn.
- Performed Logistic Regression, Random Forest, Decision Tree, and SVM modeling to classify whether a package would be delivered on time on a new route.
- Performed segmentation by implementing the K-means algorithm.
- Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
- Generated detailed report after validating the graphs using R, and adjusting the variables to fit the model.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Created Data Quality Scripts using SQL and Hive to validate successful data load and quality of the data.
- Wrote MapReduce code to process and parse data from various sources, storing the parsed data in HBase and Hive using HBase-Hive integration.
- Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions using SQL server management studio.
- Used packages such as dplyr, tidyr, and ggplot2 in RStudio for data visualization, generating scatter plots and high-low graphs to identify relationships between variables.
- Created various types of data visualizations using Python and Tableau.
- Communicated results to the operations team to support better decision-making.
- Collected data needs and requirements by interacting with other departments.
Environment: R, Python 2.x, Linux, Tableau Desktop, SQL Server.
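The one-hot encoding mentioned in the feature-engineering bullet above was done with scikit-learn; the dependency-free sketch below shows only the idea behind it (categories become binary indicator columns), with made-up example values.

```python
def one_hot(values):
    """Map each categorical value to a binary indicator vector.

    Returns the encoded rows and the sorted category list that
    defines the column order.
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in values]
    return rows, categories

rows, cats = one_hot(["red", "green", "red", "blue"])
```

In practice scikit-learn's `OneHotEncoder` also handles unseen categories, sparse output, and multiple feature columns, which this sketch deliberately omits.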
Confidential, Land O Lakes, FL
Data Scientist
Responsibilities:
- Analyzed and prepared data, identifying patterns in datasets by applying historical models; collaborated with senior data scientists to understand the data.
- Performed data manipulation, data preparation, normalization, and predictive modeling; improved efficiency and accuracy by evaluating models in R.
- Focused on customer segmentation: built predictive models through machine learning and statistical modeling and generated data products to support customer segmentation.
- Used R and Python programming to improve the models; upgraded the entire model suite to improve the product.
- Developed a pricing model for various bundled product and service offerings to optimize and predict gross margin.
- Built a price elasticity model for various bundled product and service offerings.
- Performed data transformation methods for rescaling and normalizing variables under the supervision of a Sr. Data Scientist.
- Developed a predictive causal model using annual failure rate and standard cost basis for the new bundled service offering.
- Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment, product recommendation, and allocation planning.
- Utilized Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, and R, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
- Partnered with the sales and marketing teams and collaborated with a cross-functional team to frame and answer important data questions.
- Prototyped and experimented with ML/DL algorithms and integrated them into production systems for different business needs.
- Worked on multiple datasets containing two billion values of structured and unstructured data about web application usage and online customer surveys.
- Segmented the customers based on demographics using K-means Clustering
- Explored different regression and ensemble models in machine learning to perform forecasting
- Presented dashboards to senior management for deeper insights using Power BI.
- Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring
- Applied boosting methods to the predictive model to improve its efficiency.
- Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using R, Tableau, and Power BI.
Environment: MS SQL Server, R/RStudio, SQL Enterprise Manager, Python, Redshift, MS Excel, Power BI, Tableau, T-SQL, ETL, MS Access, XML, MS Office 2007, Outlook, AS E-Mine.
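The customer segmentation in this role used K-means (via the listed ML tooling). To illustrate only the mechanic of the algorithm, here is a minimal one-dimensional K-means in plain Python; the data points are invented, and the sketch assumes k >= 2 and skips the refinements (multiple restarts, k-means++ initialization) a real library applies.

```python
def kmeans_1d(points, k=2, iters=20):
    """Minimal 1-D K-means: alternate point assignment and centroid update."""
    lo, hi = min(points), max(points)
    # Spread the initial centroids evenly across the data range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1, 2, 3, 100, 101, 102], k=2)
```

On this toy data the two centroids settle at 2.0 and 101.0, splitting the points into the two obvious groups.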
Confidential
Data Analyst
Responsibilities:
- Extensively worked on Informatica PowerCenter Transformations such as Source Qualifier, Lookup, Filter, Expression, Router, Joiner, Update Strategy, Rank, Aggregator, Sequence Generator etc.
- Designed data conversions from a wide variety of sources using the Informatica PowerCenter tool.
- Used Informatica Workflow Manager and Workflow Monitor to create, schedule, and control workflows, tasks, and sessions.
- Created pivot tables and ran VLOOKUPs in Excel as part of data validation.
- Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data into the data warehouse.
- Worked on data analysis, data discrepancy reduction in the source and target schemas.
- Designed and developed complex mappings with varied transformation logic such as Unconnected and Connected Lookups, Router, Filter, Expression, Aggregator, Joiner, Update Strategy, and more.
- Prepared system requirements (SRS), database specifications (DBS), and software design documents (SDD).
- Responsible for the maintenance of a few applications in PowerBuilder 10.2.
- Used SQL Server 2005 to fix production issues in the background.
- Coordinated delivery and quality activities.
- Involved in testing and validating all fields, functions, programs, and agents, with front-end and back-end code reviews across the application.
- Prepared program specifications, unit tests, test cases, and user manual documents.
Environment: Informatica 8.x, PowerBuilder 10.2, SQL Server 2005.
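The post-load data validation in this role (pivot tables, VLOOKUPs, discrepancy reduction between source and target schemas) boils down to reconciling source and target row sets. The sketch below shows that kind of check in SQL; Python's in-memory sqlite3 stands in for SQL Server, and the table names and rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE source (id INTEGER, amount REAL)")
cur.execute("CREATE TABLE target (id INTEGER, amount REAL)")
cur.executemany("INSERT INTO source VALUES (?, ?)",
                [(1, 10.0), (2, 20.0), (3, 30.0)])
cur.executemany("INSERT INTO target VALUES (?, ?)",
                [(1, 10.0), (2, 20.0)])        # row 3 failed to load

# Reconcile row counts between source and target after the load.
src_count = cur.execute("SELECT COUNT(*) FROM source").fetchone()[0]
tgt_count = cur.execute("SELECT COUNT(*) FROM target").fetchone()[0]

# List the ids present in source but missing from target.
missing = cur.execute(
    "SELECT id FROM source EXCEPT SELECT id FROM target"
).fetchall()
```

The `EXCEPT` query plays the role a VLOOKUP plays in Excel: it surfaces every source row with no matching target row.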
Confidential
Data Analyst
Responsibilities:
- Built time series models with ARIMA in R for budget forecasting.
- Developed risk assessment models by using Decision Trees and Analytic Hierarchy Process
- Designed and maintained comprehensive dashboards and metrics to enable real-time business decisions
- Coded SQL queries to extract data and identify granularity issues and relationships between datasets and recommended solutions
- Manipulated, cleansed, and processed data using Excel, Access, and SQL.
- Compared the source data with historical data to perform statistical analysis
- Performed data preprocessing and data cleaning, collected and organized data
Environment: MS Access, R, MS Excel, ETL.
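The ARIMA budget forecasting above was done in R; purely to illustrate the autoregressive idea at the core of ARIMA, here is an AR(1) model fit by ordinary least squares in Python. Full ARIMA adds differencing and moving-average terms, so treat this as a sketch, with a made-up series.

```python
def fit_ar1(series):
    """Fit y[t] = a + b * y[t-1] by ordinary least squares."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

def forecast(series, steps, a, b):
    """Roll the fitted AR(1) model forward to produce future values."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = a + b * last
        out.append(last)
    return out

series = [2.0 * i for i in range(10)]   # toy budget series: 0, 2, ..., 18
a, b = fit_ar1(series)
preds = forecast(series, 2, a, b)
```

On this exactly linear toy series the fit recovers a = 2, b = 1, so the forecast simply continues the trend.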