Data Scientist Resume
Oldsmar, FL
SUMMARY
- Proficient in Data preparation such as Data Extraction, Data Cleansing, Data Validation and Exploratory Data Analysis to ensure data quality.
- Data cleaning & Data Imputation (outlier detection, missing value treatment)
- Data Transformation (feature scaling, feature engineering)
- Statistical modeling, both linear and nonlinear (logistic regression, linear regression, Naïve Bayes, decision trees, random forest, neural networks, SVM, clustering, KNN)
- Experienced with statistical methodologies such as Time Series, Hypothesis Testing, ANOVA, and the Chi-Square Test.
- Proficient in statistical programming languages such as R and Python 2.x/3.x, as well as Big Data technologies such as Hadoop and Hive.
- Worked at every stage of the Data Science project lifecycle from inception through deployment, including:
- Data Gathering & sampling of data (e.g., stratified sampling, clustering)
- Hypothesis testing (power analysis, effect size, t-test, ANOVA, data distribution, chi-square test)
- EDA (descriptive statistics, inferential statistics, data visualization)
- Expert in Feature Engineering by implementing both Feature Selection and Feature Extraction.
- Experienced with Deep Learning techniques such as Convolutional Neural Networks and Recurrent Neural Networks using Keras and TensorFlow.
- Familiar with Recommendation System Design by implementing Collaborative Filtering, Matrix Factorization and Clustering Methods.
- Experienced with Natural Language Processing along with Topic modeling and Sentiment Analysis.
- Experienced in working with relational databases with a strong SQL skill set.
- Ability to write SQL queries for various RDBMS such as SQL Server, MySQL, Teradata and Oracle; worked on NoSQL databases such as MongoDB and Cassandra to handle unstructured data.
- Experienced with the Kafka streaming platform.
- In depth understanding of building and publishing customized interactive reports and dashboards with customized parameters and user-filters using Tableau and SSRS.
- Expertise in Python programming with various packages including NumPy, Pandas, SciPy and scikit-learn.
- Proficient in Data visualization tools such as Tableau, Plotly, Matplotlib and Seaborn.
- Familiar with Hadoop Ecosystem such as HDFS, HBase, Hive, Pig and Oozie.
- Experienced in building models by using Spark (PySpark, SparkSQL, Spark MLLib, Spark ML).
- Experienced in Cloud Services such as AWS EC2, EMR, RDS and S3 to support Big Data tools, solve data storage issues and work on deployment solutions.
- Experienced in ticketing systems such as Jira/Confluence and version control tools such as GitHub.
- Worked on deployment tools such as Azure Machine Learning Studio, Oozie, AWS Lambda.
- Strong understanding of SDLC in Agile methodology and Scrum process.
- Strong experience working in fast-paced, multi-tasking environments, both independently and in collaborative teams. Comfortable with challenging projects and working through ambiguity to solve complex problems. A self-motivated, enthusiastic learner.
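As a concrete illustration of the feature scaling listed above, here is a minimal plain-Python sketch of min-max scaling; the function name and sample values are invented for demonstration, not taken from any project described here.

```python
def min_max_scale(values):
    """Rescale a numeric feature to the [0, 1] range (min-max scaling).

    Illustrative helper, not from any production codebase."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # If the feature is constant, map everything to 0.0 to avoid dividing by zero.
    return [(v - lo) / span for v in values] if span else [0.0] * len(values)

ages = [18, 30, 42, 66]
print(min_max_scale(ages))  # [0.0, 0.25, 0.5, 1.0]
```

In practice a library such as scikit-learn's MinMaxScaler would be used, but the arithmetic is exactly this.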
PROFESSIONAL EXPERIENCE
Confidential - Oldsmar, FL
Data Scientist
Responsibilities:
- Worked on Data Requirement analysis for transforming data according to business requirements.
- Applied Forward Elimination and Backward Elimination to datasets to identify the most statistically significant variables for Data analysis.
- Utilized Label Encoders and One-Hot Encoders in Python to create dummy variables for geographic locations and identify their impact pre- and post-acquisition using a two-sample paired t-test.
- Worked with ETL in SQL Server Integration Services (SSIS) for data investigation and mapping to extract data; applied fast parsing and enhanced efficiency.
- Developed Data Science content involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git and ETL for Data Extraction.
- Built analytical systems and data structures; gathered and manipulated data using statistical techniques.
- Designed a suite of interactive dashboards, which made it possible to scale and measure HR department statistics that could not be measured earlier, and scheduled and published reports.
- Provided and created data presentations to reduce biases and tell the true story of people, pulling millions of rows of data using SQL and performing Exploratory Data Analysis.
- Applied breadth of knowledge in programming (R, Python), Descriptive, Inferential, and Experimental Design statistics, advanced mathematics, and database functionality (SQL, Hadoop).
- Migrated data from heterogeneous data sources and legacy systems (DB2, Access, Excel) to centralized SQL Server databases using SQL Server Integration Services (SSIS).
- Applied Descriptive and Inferential Statistics to various data attributes using SPSS to draw insights from data regarding products and services for patients.
- Developed and utilized various machine learning algorithms such as Logistic Regression, Decision Trees, Neural Network models, hybrid recommendation models and NLP for data analysis.
- Utilized data reduction techniques such as Factor Analysis to identify the values most correlated with the underlying factors of the data and categorized variables according to those factors.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS using HQL queries in Hadoop.
- Performance tuning: analyzed requirements and fine-tuned stored procedures/queries to improve application performance.
- Developed various Tableau 9.4 Data Models by extracting and using data from various source files, DB2, Excel, flat files and Big Data.
- Interacted with Business Analysts, SMEs and other Data Architects to understand business needs and functionality for various project solutions.
Environment: R Programming, Python, Jupyter, SPSS, SQL Server 2014, SSRS, SSIS, SSAS, SQL Server Management Studio, Hadoop, Business Intelligence Development Studio, SAP Business Objects and Business Intelligence.
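The paired t test used above to compare pre- and post-acquisition metrics reduces to a one-sample test on the per-pair differences. A minimal sketch with the standard-library statistics module; the sample numbers are invented for illustration.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """t statistic for a paired two-sample t test:
    mean of the pairwise differences divided by its standard error.

    Illustrative only; a real analysis would use e.g. scipy.stats.ttest_rel,
    which also returns the p-value."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

pre  = [10.0, 12.0, 9.0, 11.0]   # hypothetical pre-acquisition measurements
post = [12.0, 15.0, 10.0, 13.0]  # hypothetical post-acquisition measurements
print(round(paired_t_statistic(pre, post), 3))  # -4.899
```

A large-magnitude t statistic like this suggests the pre/post difference is unlikely to be zero, subject to the usual normality assumptions.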
Confidential - Bronx, NY
Data Scientist
Responsibilities:
- Used Tableau to automatically generate reports. Worked with partially adjudicated insurance flat files, internal records, 3rd-party data sources, JSON, XML and more.
- Experienced in building models by using Spark (PySpark, SparkSQL, Spark MLLib, and Spark ML).
- Experienced in Cloud Services such as AWS EC2, EMR, RDS and S3 to support Big Data tools, solve data storage issues and work on deployment solutions.
- Worked with several R packages including knitr, dplyr, SparkR, Causal Infer, spacetime.
- Performed Exploratory Data Analysis and Data Visualizations using R and Tableau.
- Implemented end-to-end systems for Data Analytics and Data Automation, integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
- Gathered all required data from multiple data sources and created the datasets used in analysis.
- Extracted knowledge from notes using NLP (Python, NLTK, MLlib, PySpark).
- Independently coded new programs and designed tables to load and test the program effectively for the given POCs using Big Data/Hadoop.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Built and optimized data mining pipelines for NLP and text analytics to extract information.
- Coded R functions to interface with the Caffe Deep Learning Framework.
- Worked in the Amazon Web Services cloud computing environment.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.
- Performed proper EDA, with univariate and bivariate analysis, to understand intrinsic and combined effects.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- Established Data architecture strategy, best practices, standards, and roadmaps.
- Performed data cleaning and imputation of missing values using R.
- Developed, implemented & maintained Conceptual, Logical & Physical Data Models using Erwin for forward/reverse engineered databases.
- Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce.
- Created customized business reports and shared insights with management.
- Took up ad-hoc requests from different departments and locations.
- Used Hive to store data and perform data cleaning steps on huge datasets.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, Qlikview, MLLib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), Map Reduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.
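The NLP knowledge extraction described above typically begins with tokenization, stopword removal and term counting. A minimal pure-Python sketch of that first step; the sample sentence and stopword list are invented for illustration.

```python
import re
from collections import Counter

def bag_of_words(text, stopwords=frozenset({"the", "a", "of", "and"})):
    """Lowercase the text, tokenize on letter runs, drop stopwords,
    and count term frequencies.

    A toy stand-in for NLTK-style preprocessing, not production code."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

counts = bag_of_words("The claim notes mention the adjuster and the claim number")
print(counts["claim"])  # 2
```

In a real pipeline, NLTK would supply a proper tokenizer, stopword corpus and stemmer on top of this same counting idea.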
Confidential
Data Analyst
Responsibilities:
- Performed data cleaning on huge datasets comprising millions of rows, including merging datasets, imputing missing data, treating noise and outliers, and consolidating data.
- Transformed raw data into actionable insights by incorporating various statistical techniques and using data mining tools such as Python (scikit-learn, Pandas, NumPy, Matplotlib) and SQL.
- Implemented an RFM segmentation model to categorize the customer base into segments such as most valuable, active and lost, and to analyze customer lifetime value.
- Implemented a classification model (Logistic Regression) to predict prospective customers based on their age, area, income, and time spent on the website per day, and evaluated its accuracy.
- Conducted Exploratory Data Analysis on customer historical billing information to improve the model for forecasting customers' increasing or declining product use.
- Extensively used Tableau dashboards for visualization and report generation.
Environment: Python 2.X, SQL Server 2005 Enterprise, MS Visio, MS-Office, MS Excel, MS PowerPoint, MS Word, Macros, Tableau, Jira, HPQC
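The RFM segmentation mentioned above scores customers on Recency, Frequency and Monetary value and maps the score to a segment. A minimal sketch of the idea; the cutoffs and segment labels here are invented for illustration and are not the actual model.

```python
def rfm_score(recency_days, frequency, monetary,
              r_cut=30, f_cut=5, m_cut=500.0):
    """Toy RFM scoring: one point each for a recent purchase, frequent
    purchases, and high spend; the score then picks a segment label.

    Cutoffs and labels are hypothetical, chosen only to show the mechanics."""
    score = 0
    score += 1 if recency_days <= r_cut else 0  # R: bought recently
    score += 1 if frequency >= f_cut else 0     # F: buys often
    score += 1 if monetary >= m_cut else 0      # M: spends a lot
    return {3: "most valuable", 0: "lost"}.get(score, "active")

print(rfm_score(10, 8, 900.0))   # most valuable
print(rfm_score(200, 1, 20.0))   # lost
```

A production version would derive the cutoffs from quantiles of the customer base (e.g., pandas `qcut`) rather than hard-coding them.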