Data Scientist Resume
Oldsmar, FL
SUMMARY
- Proficient in Data preparation such as Data Extraction, Data Cleansing, Data Validation and Exploratory Data Analysis to ensure data quality.
- Data cleaning & Data Imputation (outlier detection, missing value treatment)
- Data Transformation (feature scaling, feature engineering)
- Statistical modeling, both linear and nonlinear (logistic regression, linear regression, Naïve Bayes, decision trees, random forest, neural networks, SVM, clustering, KNN)
- Experienced with statistical methodologies such as Time Series, Hypothesis Testing, ANOVA, and the Chi-Square Test.
- Proficient in statistical programming languages such as R and Python 2.x/3.x, as well as Big Data technologies such as Hadoop and Hive.
- Worked at every stage of the Data Science project lifecycle from inception through deployment, including:
- Data Gathering & sampling of data (e.g., stratified sampling, clustering)
- Hypothesis testing (power analysis, effect size, t-test, ANOVA, data distribution, chi-square test)
- EDA (descriptive statistics, inferential statistics, data visualization)
- Expert in Feature Engineering by implementing both Feature Selection and Feature Extraction.
- Experienced with Deep Learning techniques such as Convolutional Neural Networks and Recurrent Neural Networks using Keras and TensorFlow.
- Familiar with Recommendation System Design by implementing Collaborative Filtering, Matrix Factorization and Clustering Methods.
- Experienced with Natural Language Processing along with Topic modeling and Sentiment Analysis.
- Experienced in working with relational databases with a strong SQL skill set.
- Ability to write SQL queries for various RDBMS such as SQL Server, MySQL, Teradata and Oracle; worked on NoSQL databases such as MongoDB and Cassandra to handle unstructured data.
- Experienced with the Kafka streaming platform.
- In depth understanding of building and publishing customized interactive reports and dashboards with customized parameters and user-filters using Tableau and SSRS.
- Expertise in Python programming with various packages including NumPy, Pandas, SciPy and scikit-learn.
- Proficient in Data visualization tools such as Tableau, Plotly, Matplotlib and Seaborn.
- Familiar with Hadoop Ecosystem such as HDFS, HBase, Hive, Pig and Oozie.
- Experienced in building models by using Spark (PySpark, SparkSQL, Spark MLLib, Spark ML).
- Experienced in Cloud Services such as AWS EC2, EMR, RDS and S3 to support Big Data tools, solve data storage issues and work on deployment solutions.
- Experienced in ticketing systems such as Jira/Confluence and version control tools such as GitHub.
- Worked on deployment tools such as Azure Machine Learning Studio, Oozie, AWS Lambda.
- Strong understanding of SDLC in Agile methodology and Scrum process.
- Strong experience working in fast-paced, multi-tasking environments, both independently and in collaborative teams. Comfortable with challenging projects and working through ambiguity to solve complex problems. A self-motivated, enthusiastic learner.
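As a concrete illustration of the feature scaling listed above, here is a minimal plain-Python sketch of min-max scaling; the function name and sample values are invented for demonstration, not taken from any project described here.

```python
def min_max_scale(values):
    """Rescale a numeric feature to the [0, 1] range (min-max scaling).

    Illustrative helper, not from any production codebase."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # If the feature is constant, map everything to 0.0 to avoid dividing by zero.
    return [(v - lo) / span for v in values] if span else [0.0] * len(values)

ages = [18, 30, 42, 66]
print(min_max_scale(ages))  # [0.0, 0.25, 0.5, 1.0]
```

In practice a library such as scikit-learn's MinMaxScaler would be used, but the arithmetic is exactly this.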
PROFESSIONAL EXPERIENCE
Confidential - Oldsmar, FL
Data Scientist
Responsibilities:
- Worked on Data Requirement analysis for transforming data according to business requirements.
- Applied Forward Elimination and Backward Elimination to datasets to identify the most statistically significant variables for Data analysis.
- Utilized Label Encoders and One-Hot Encoders in Python to create dummy variables for geographic locations and identify their impact pre- and post-acquisition using a two-sample paired t-test.
- Worked with ETL in SQL Server Integration Services (SSIS) for data investigation and mapping to extract data; applied fast parsing and enhanced efficiency.
- Developed Data Science content involving Data Manipulation and Visualization, Web Scraping, Machine Learning, Python programming, SQL, Git and ETL for Data Extraction.
- Built analytical systems and data structures; gathered and manipulated data using statistical techniques.
- Designed a suite of interactive dashboards, which made it possible to scale and measure HR department statistics that could not be measured earlier, and scheduled and published reports.
- Provided and created data presentations to reduce biases and tell the true story of people, pulling millions of rows of data using SQL and performing Exploratory Data Analysis.
- Applied breadth of knowledge in programming (R, Python), Descriptive, Inferential, and Experimental Design statistics, advanced mathematics, and database functionality (SQL, Hadoop).
- Migrated data from heterogeneous data sources and legacy systems (DB2, Access, Excel) to centralized SQL Server databases using SQL Server Integration Services (SSIS).
- Applied Descriptive and Inferential Statistics to various data attributes using SPSS to draw insights from data regarding products and services for patients.
- Developed and utilized various machine learning algorithms such as Logistic Regression, Decision Trees, Neural Network models, hybrid recommendation models and NLP for data analysis.
- Utilized data reduction techniques such as Factor Analysis to identify the values most correlated with the underlying factors of the data and categorized variables according to those factors.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS using HQL queries in Hadoop.
- Performance tuning: analyzed requirements and fine-tuned stored procedures/queries to improve application performance.
- Developed various Tableau 9.4 Data Models by extracting and using data from various source files, DB2, Excel, flat files and Big Data.
- Interacted with Business Analysts, SMEs and other Data Architects to understand business needs and functionality for various project solutions.
Environment: R Programming, Python, Jupyter, SPSS, SQL Server 2014, SSRS, SSIS, SSAS, SQL Server Management Studio, Hadoop, Business Intelligence Development Studio, SAP Business Objects and Business Intelligence.
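The paired t test used above to compare pre- and post-acquisition metrics reduces to a one-sample test on the per-pair differences. A minimal sketch with the standard-library statistics module; the sample numbers are invented for illustration.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """t statistic for a paired two-sample t test:
    mean of the pairwise differences divided by its standard error.

    Illustrative only; a real analysis would use e.g. scipy.stats.ttest_rel,
    which also returns the p-value."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

pre  = [10.0, 12.0, 9.0, 11.0]   # hypothetical pre-acquisition measurements
post = [12.0, 15.0, 10.0, 13.0]  # hypothetical post-acquisition measurements
print(round(paired_t_statistic(pre, post), 3))  # -4.899
```

A large-magnitude t statistic like this suggests the pre/post difference is unlikely to be zero, subject to the usual normality assumptions.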
Confidential - Bronx, NY
Data Scientist
Responsibilities:
- Used Tableau to automatically generate reports. Worked with partially adjudicated insurance flat files, internal records, 3rd-party data sources, JSON, XML and more.
- Experienced in building models by using Spark (PySpark, SparkSQL, Spark MLLib, and Spark ML).
- Experienced in Cloud Services such as AWS EC2, EMR, RDS and S3 to support Big Data tools, solve data storage issues and work on deployment solutions.
- Worked with several R packages including knitr, dplyr, SparkR, Causal Infer, spacetime.
- Performed Exploratory Data Analysis and Data Visualizations using R and Tableau.
- Implemented end-to-end systems for Data Analytics and Data Automation, integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
- Gathered all required data from multiple data sources and created the datasets used in analysis.
- Extracted knowledge from notes using NLP (Python, NLTK, MLlib, PySpark).
- Independently coded new programs and designed tables to load and test the program effectively for the given POCs using Big Data/Hadoop.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Built and optimized data mining pipelines for NLP and text analytics to extract information.
- Coded R functions to interface with the Caffe Deep Learning Framework.
- Worked in the Amazon Web Services cloud computing environment.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.
- Performed proper EDA, with univariate and bivariate analysis, to understand intrinsic and combined effects.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- Established Data architecture strategy, best practices, standards, and roadmaps.
- Performed data cleaning and imputation of missing values using R.
- Developed, implemented & maintained Conceptual, Logical & Physical Data Models using Erwin for forward/reverse engineered databases.
- Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce.
- Created customized business reports and shared insights with management.
- Took up ad-hoc requests from different departments and locations.
- Used Hive to store data and perform data cleaning steps on huge datasets.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, Qlikview, MLLib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), Map Reduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.
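The NLP knowledge extraction described above typically begins with tokenization, stopword removal and term counting. A minimal pure-Python sketch of that first step; the sample sentence and stopword list are invented for illustration.

```python
import re
from collections import Counter

def bag_of_words(text, stopwords=frozenset({"the", "a", "of", "and"})):
    """Lowercase the text, tokenize on letter runs, drop stopwords,
    and count term frequencies.

    A toy stand-in for NLTK-style preprocessing, not production code."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

counts = bag_of_words("The claim notes mention the adjuster and the claim number")
print(counts["claim"])  # 2
```

In a real pipeline, NLTK would supply a proper tokenizer, stopword corpus and stemmer on top of this same counting idea.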
Confidential
Data Analyst
Responsibilities:
- Performed data cleaning on huge datasets comprising millions of rows, including merging datasets, imputing missing data, treating noise and outliers, and consolidating data.
- Transformed raw data into actionable insights by incorporating various statistical techniques and using data mining tools such as Python (scikit-learn, Pandas, NumPy, Matplotlib) and SQL.
- Implemented an RFM segmentation model to categorize the customer base into segments such as most valuable, active and lost, and to analyze customer lifetime value.
- Implemented a classification model (Logistic Regression) to predict prospective customers based on their age, area, income, and time spent on the website per day, and evaluated its accuracy.
- Conducted Exploratory Data Analysis on customer historical billing information to improve the model for forecasting customers' increasing or declining product use.
- Extensively used Tableau dashboards for visualization and report generation.
Environment: Python 2.X, SQL Server 2005 Enterprise, MS Visio, MS-Office, MS Excel, MS PowerPoint, MS Word, Macros, Tableau, Jira, HPQC
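The RFM segmentation mentioned above scores customers on Recency, Frequency and Monetary value and maps the score to a segment. A minimal sketch of the idea; the cutoffs and segment labels here are invented for illustration and are not the actual model.

```python
def rfm_score(recency_days, frequency, monetary,
              r_cut=30, f_cut=5, m_cut=500.0):
    """Toy RFM scoring: one point each for a recent purchase, frequent
    purchases, and high spend; the score then picks a segment label.

    Cutoffs and labels are hypothetical, chosen only to show the mechanics."""
    score = 0
    score += 1 if recency_days <= r_cut else 0  # R: bought recently
    score += 1 if frequency >= f_cut else 0     # F: buys often
    score += 1 if monetary >= m_cut else 0      # M: spends a lot
    return {3: "most valuable", 0: "lost"}.get(score, "active")

print(rfm_score(10, 8, 900.0))   # most valuable
print(rfm_score(200, 1, 20.0))   # lost
```

A production version would derive the cutoffs from quantiles of the customer base (e.g., pandas `qcut`) rather than hard-coding them.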