
Data Scientist Resume


Oldsmar, FL

SUMMARY

  • Proficient in data preparation, including data extraction, data cleansing, data validation, and exploratory data analysis to ensure data quality.
  • Data cleaning & data imputation (outlier detection, missing-value treatment)
  • Data transformation (feature scaling, feature engineering)
  • Statistical modeling, both linear and nonlinear (logistic regression, linear regression, Naïve Bayes, decision trees, random forest, neural networks, SVM, clustering, KNN)
  • Experienced with statistical methodologies such as time series analysis, hypothesis testing, ANOVA, and the chi-square test.
  • Proficient in statistical programming languages such as R and Python 2.x/3.x, as well as Big Data technologies like Hadoop and Hive.
  • Worked at every stage of the Data Science project lifecycle, from inception through deployment, including:
  • Data gathering & sampling (e.g., stratified sampling, cluster sampling)
  • Hypothesis testing (power analysis, effect size, t-test, ANOVA, data distribution, chi-square test)
  • EDA (descriptive statistics, inferential statistics, data visualization)
  • Expert in Feature Engineering by implementing both Feature Selection and Feature Extraction.
  • Experienced with deep learning techniques such as Convolutional Neural Networks and Recurrent Neural Networks, using Keras and TensorFlow.
  • Familiar with Recommendation System Design by implementing Collaborative Filtering, Matrix Factorization and Clustering Methods.
  • Experienced with Natural Language Processing along with Topic modeling and Sentiment Analysis.
  • Experienced in working with Relational DB with strong SQL skill set.
  • Ability to write SQL queries for various RDBMS such as SQL Server, MySQL, Teradata and Oracle; worked on NoSQL databases such as MongoDB and Cassandra to handle unstructured data.
  • Experienced with the Kafka streaming platform.
  • In depth understanding of building and publishing customized interactive reports and dashboards with customized parameters and user-filters using Tableau and SSRS.
  • Expertise in Python programming with various packages including NumPy, Pandas, SciPy and Scikit Learn.
  • Proficient in data visualization tools such as Tableau, Plotly, Python Matplotlib and Seaborn.
  • Familiar with Hadoop Ecosystem such as HDFS, HBase, Hive, Pig and Oozie.
  • Experienced in building models by using Spark (PySpark, SparkSQL, Spark MLLib, Spark ML).
  • Experienced in Cloud Services such as AWS EC2, EMR, RDS and S3 to support big data tools, solve data storage issues and work on deployment solutions.
  • Experienced in ticketing systems such as Jira/Confluence and version control tools such as GitHub.
  • Worked on deployment tools such as Azure Machine Learning Studio, Oozie, AWS Lambda.
  • Strong understanding of SDLC in Agile methodology and Scrum process.
  • Strong experience working in fast-paced, multi-tasking environments, both independently and in collaborative teams. Comfortable taking on challenging projects and working through ambiguity to solve complex problems. A self-motivated, enthusiastic learner.
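As a hypothetical illustration of the workflow outlined above (stratified sampling, feature scaling, and linear-model fitting), the sketch below uses a public scikit-learn dataset; the dataset and parameter choices are illustrative, not drawn from any project described in this resume.

```python
# Illustrative sketch: stratified sampling, feature scaling, and a
# logistic regression fit in one pipeline. Dataset is a stand-in.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Stratified sampling keeps the class ratio identical in train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Feature scaling followed by a linear classifier, composed as one pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Scaling inside the pipeline ensures the scaler is fit only on training data, avoiding leakage into the held-out split.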

PROFESSIONAL EXPERIENCE

Confidential - Oldsmar, FL

Data Scientist

Responsibilities:

  • Experienced in data requirement analysis for transforming data according to business requirements.
  • Applied forward elimination and backward elimination to datasets to identify the most statistically significant variables for data analysis.
  • Utilized Label Encoder and One-Hot Encoder in Python to create dummy variables for geographic locations, and identified their impact on pre-acquisition and post-acquisition results using a two-sample paired t-test.
  • Worked with ETL SQL Server Integration Services (SSIS) for data investigation and mapping to extract data, applying fast parsing for enhanced efficiency.
  • Developed Data Science content involving data manipulation and visualization, web scraping, machine learning, Python programming, SQL, Git and ETL for data extraction.
  • Built analytical systems and data structures; gathered and manipulated data using statistical techniques.
  • Designed a suite of interactive dashboards, enabling the HR department to scale and measure statistics that was not possible earlier, and scheduled and published reports.
  • Created data presentations that reduce bias and tell a true story of people, pulling millions of rows of data using SQL and performing exploratory data analysis.
  • Applied breadth of knowledge in programming (R, Python); descriptive, inferential, and experimental design statistics; advanced mathematics; and database functionality (SQL, Hadoop).
  • Migrated data from heterogeneous data sources and legacy systems (DB2, Access, Excel) to centralized SQL Server databases using SQL Server Integration Services (SSIS).
  • Applied descriptive and inferential statistics to various data attributes using SPSS to draw insights from data about providing products and services for patients.
  • Developed and utilized various machine learning algorithms such as logistic regression, decision trees, neural network models, hybrid recommendation models and NLP for data analysis.
  • Utilized data reduction techniques such as factor analysis to identify the values most correlated with underlying factors of the data, and categorized the variables according to those factors.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS using HQL queries in Hadoop.
  • Performance tuning: analyzed requirements and fine-tuned stored procedures/queries to improve application performance.
  • Developed various Tableau 9.4 data models by extracting and using data from various source files, DB2, Excel, flat files and big data.
  • Interacted with Business Analysts, SMEs and other Data Architects to understand business needs and functionality for various project solutions.
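The dummy-variable and paired t-test steps described above can be sketched as follows; the column names and figures below are invented for illustration and are not taken from the actual project data.

```python
# Hypothetical sketch: one-hot encode geographic locations, then run a
# paired t-test on pre- vs. post-acquisition measurements. Data is made up.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "location": ["FL", "NY", "FL", "TX", "NY", "TX"],
    "pre_acquisition":  [10.2, 11.5,  9.8, 12.0, 11.1, 10.7],
    "post_acquisition": [11.0, 12.1, 10.5, 12.4, 11.9, 11.2],
})

# One-hot encode location into dummy variables (loc_FL, loc_NY, loc_TX).
dummies = pd.get_dummies(df["location"], prefix="loc")

# Paired t-test: the same entities measured before and after acquisition.
t_stat, p_value = stats.ttest_rel(df["post_acquisition"], df["pre_acquisition"])
print(sorted(dummies.columns), p_value)
```

A paired test is appropriate here because each row is the same entity observed twice, so the test operates on the per-row differences.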

Environment: R Programming, Python, Jupyter, SPSS, SQL Server 2014, SSRS, SSIS, SSAS, SQL Server Management Studio, Hadoop, Business Intelligence Development Studio, SAP Business Objects and Business Intelligence.

Confidential - Bronx, NY

Data Scientist

Responsibilities:

  • Used Tableau to automatically generate reports. Worked with partially adjudicated insurance flat files, internal records, third-party data sources, JSON, XML and more.
  • Experienced in building models using Spark (PySpark, Spark SQL, Spark MLlib and Spark ML).
  • Experienced in Cloud Services such as AWS EC2, EMR, RDS and S3 to support big data tools, solve data storage issues and work on deployment solutions.
  • Worked with several R packages including knitr, dplyr, SparkR, Causal Infer and spacetime.
  • Performed exploratory data analysis and data visualizations using R and Tableau.
  • Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools using R, Mahout, Hadoop and MongoDB.
  • Gathered all required data from multiple data sources and created the datasets used in analysis.
  • Extracted knowledge from notes using NLP (Python, NLTK, MLlib, PySpark).
  • Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
  • Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
  • Built and optimized data mining pipelines for NLP and text analytics to extract information.
  • Coded R functions to interface with the Caffe deep learning framework.
  • Worked in the Amazon Web Services cloud computing environment.
  • Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions addressing those needs.
  • Performed proper EDA, univariate and bivariate analysis to understand intrinsic and combined effects.
  • Designed data models and data flow diagrams using Erwin and MS Visio.
  • Established data architecture strategy, best practices, standards and roadmaps.
  • Performed data cleaning and imputation of missing values using R.
  • Developed, implemented & maintained conceptual, logical & physical data models using Erwin for forward/reverse engineered databases.
  • Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce.
  • Created customized business reports and shared insights with management.
  • Took up ad-hoc requests from different departments and locations.
  • Used Hive to store data and performed data cleaning steps on huge datasets.
  • Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.

Environment: R 3.0, Erwin 9.5, Tableau 8.0, MDM, Qlikview, MLLib, PL/SQL, HDFS, Teradata 14.1, JSON, HADOOP (HDFS), Map Reduce, PIG, Spark, R Studio, MAHOUT, JAVA, HIVE, AWS.

Confidential

Data Analyst

Responsibilities:

  • Performed data cleaning on huge datasets comprising millions of rows, including merging datasets, imputing missing data, noise and outlier treatment, and data consolidation
  • Transformed raw data into actionable insights by incorporating various statistical techniques and using data mining tools such as Python (Scikit-Learn, Pandas, NumPy, Matplotlib) and SQL
  • Implemented an RFM segmentation model to categorize the customer base into segments such as most valuable, active and lost, and to analyze customer lifetime value
  • Implemented a classification model (logistic regression) to predict prospective customers based on their age, area, income and time spent on the website per day, and evaluated its accuracy
  • Conducted exploratory data analysis on customers' historical billing information to improve the model for forecasting increasing or declining product use
  • Extensively used Tableau dashboards for visualizations and report generation
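An RFM segmentation like the one described above can be sketched in pandas; the transactions, thresholds and segment rules below are hypothetical stand-ins, not the actual model.

```python
# Hypothetical sketch: compute Recency, Frequency, Monetary per customer,
# then apply simple rule-based segments. All data here is invented.
import pandas as pd

tx = pd.DataFrame({
    "customer": ["a", "a", "b", "c", "c", "c"],
    "days_ago": [5, 40, 90, 2, 10, 30],      # days since each purchase
    "amount":   [50, 20, 15, 200, 120, 80],  # purchase amounts
})

rfm = tx.groupby("customer").agg(
    recency=("days_ago", "min"),    # days since most recent purchase
    frequency=("amount", "size"),   # number of transactions
    monetary=("amount", "sum"),     # total spend
)

# Simple rule-based segments; a real model would typically score quantiles.
rfm["segment"] = "active"
rfm.loc[rfm["recency"] > 60, "segment"] = "lost"
rfm.loc[(rfm["frequency"] >= 3) & (rfm["monetary"] >= 300), "segment"] = "most valuable"
print(rfm["segment"].to_dict())
```

Scoring R, F and M by quantile and summing the scores is the more common production variant; the hard thresholds above just make the segment logic explicit.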

Environment: Python 2.X, SQL Server 2005 Enterprise, MS Visio, MS-Office, MS Excel, MS PowerPoint, MS Word, Macros, Tableau, Jira, HPQC
