
Data Scientist Resume

Wilmington, OH

SUMMARY

  • Extensive experience in building Data Science solutions using Machine Learning, Statistical Modeling, Data Mining, Natural Language Processing (NLP), and Data Visualization.
  • Theoretical foundations and practical hands-on projects related to (i) supervised learning (linear and logistic regression, boosted decision trees, GBM, Support Vector Machines, neural networks, NLP), (ii) unsupervised learning (clustering, k-means, DBSCAN, Expectation Maximization, dimensionality reduction, recommender systems), (iii) probability & statistics, experiment analysis, principal component and factor analysis, confidence intervals, A/B testing, and (iv) algorithms and data structures.
  • Experience in building various machine learning models using algorithms such as Gradient Descent and KNN, and ensembles such as Random Forest, AdaBoost, and Gradient Boosting Trees.
  • Experience in Natural Language Processing (NLP) and time series analysis and forecasting using ARIMA models in Python and Confidential (see the forecasting sketch after this list).
  • Experience using Spark and Amazon Machine Learning (AML) to build ML models, with experience across cloud platforms (Azure and AWS).
  • Expert at working within enterprise data warehouse platforms and distributed computing platforms such as Hadoop.
  • Demonstrated ability to apply relevant techniques to drive business impact and help with optimization, causal inference, and choice modeling.
  • Network with business stakeholders to develop a pipeline of data science projects aligned with business strategies. Translate complex and ambiguous business problems into project charters identifying technical risks and project scope.
  • Experienced in agile/iterative development processes to drive timely and impactful data science deliverables.
  • Implement statistical and machine learning models, large-scale, cloud-based data processing pipelines, and off-the-shelf solutions for test and evaluation; interpret data to assess algorithm performance.
  • Experience in the software development environment, Agile, and code management/versioning (e.g. Git).
  • Design, train, and apply statistical and mathematical models and machine learning techniques to create scalable solutions for predictive learning, forecasting, and optimization.
  • Develop high-quality, secure code implementing models and algorithms as application programming interfaces or other service-oriented software implementations.
  • Experience working with engineers in designing scalable data science flows and implementing them into production.
  • Excellent communication and presentation skills and ability to explain technical concepts in simple terms to business stakeholders.
  • Experienced in data visualization using Tableau, Weka, Power BI.
  • Extensive hands-on experience in navigating complex relational datasets in both structured and semi-structured formats.
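
For illustration, a minimal sketch of the ARIMA forecasting workflow referenced above, using statsmodels; the synthetic series and the (p, d, q) order are assumptions, not project specifics:

    # Minimal ARIMA forecasting sketch (illustrative assumptions throughout).
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Synthetic monthly series standing in for a real business metric.
    rng = np.random.default_rng(42)
    idx = pd.date_range("2020-01-01", periods=48, freq="MS")
    series = pd.Series(100 + 0.5 * np.arange(48) + rng.normal(0, 2, 48), index=idx)

    # Fit ARIMA(1, 1, 1) and forecast the next 6 months.
    fitted = ARIMA(series, order=(1, 1, 1)).fit()
    print(fitted.forecast(steps=6))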

TECHNICAL SKILLS

Languages: Python, Confidential, JavaScript, Scala

Packages: pandas, NumPy, Keras, TensorFlow, Seaborn, SciPy, TextBlob, matplotlib, scikit-learn, Beautiful Soup, Scrapy, spaCy, OpenCV (cv2), rpy2, tidyverse, ggplot2, caret, dplyr, RWeka, gmodels, RCurl, C50, twitteR, NLTK, Pattern, reshape2, rjson, plyr

Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL

Data Modeling Tools: Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner

Big Data Technologies: Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Spark, Splunk

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra, Cosmos DB

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

ETL Tools: Informatica PowerCenter, SSIS.

Version Control Tools: SVN, GitHub, Azure Repos

Project Execution Methodologies: Kimball data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD), Agile Scrum Methodology

BI Tools: Tableau, Tableau Server, SAS, Tableau Reader, QlikView

Operating Systems: Windows, Linux, Unix, macOS

PROFESSIONAL EXPERIENCE

Confidential, Wilmington, OH

Data Scientist

Responsibilities:

  • Remained highly involved throughout all phases of the project, i.e., project planning and problem definition, data engineering, data collection and analysis, model development and selection, and model training, evaluation, and deployment.
  • Gathered requirements from the business, reviewed business requirements, and analyzed data sources.
  • Performed data collection, data cleaning, feature scaling, feature engineering, validation, visualization, and interpretation; reported findings and developed strategic uses of data with Python libraries such as NumPy, pandas, SciPy, and scikit-learn.
  • Implemented various statistical techniques to manipulate the data, such as missing-data imputation, principal component analysis, sampling, and t-SNE for visualizing high-dimensional data.
  • Worked with customer churn models, including Random Forest and lasso regression, along with pre-processing of the data.
  • Explored and visualized the data to get descriptive statistics and inferential statistics for a better understanding of the dataset.
  • Built predictive models including Support Vector Machine, Decision Tree, Naive Bayes classifier, and Neural Network, plus ensemble methods, to evaluate how the likelihood to recommend would change across customer groups under a different set of services, using Python scikit-learn.
  • Implemented the training process using cross-validation and test sets, evaluated the results based on different performance metrics, collected feedback, and retrained the model to improve performance.
  • Worked with the NLTK library for NLP data processing and pattern discovery.
  • Ensured that the model had a low false positive rate; performed text classification and sentiment analysis for unstructured and semi-structured data.
  • Used Jupyter Notebook to create and share documents that contain live code, equations, visualizations, and explanatory text.
  • Worked on loading the data from MySQL to HBase where necessary using Sqoop.
  • Strengthened the business and helped support clients by using data to describe and model the outcomes of investment and business decisions.
  • Integrated and prepared large, varied datasets, implemented specialized database and computing environments, and communicated results.
  • Used the SMOTE sampling technique to deal with the class imbalance in training data (see the sketch after this list).
  • Implemented Tokenization, Embedding, and N-gram vectorization on text data.
  • Built a sentiment classification model to predict sentiment for news articles and tweets in real time.
  • Used hyper-parameter optimization techniques such as Grid Search, Random Search, and Hyperopt.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and Spark concepts.
  • Encoded and decoded JSON objects using Spark to create and modify the data frames in Apache Spark.
  • Developed Spark applications using Spark and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats for analyzing & transforming the data to uncover insights into the customer usage patterns.
  • Divided the application into two parts, training and inference; used SageMaker and AWS Batch for the training section and AWS Lambda for the inference part.
  • Packaged code and required resources into a single deployable package using shell scripts, Docker containers, CloudFormation, and Python scripts.
  • Participated in a multi-team effort to build an energy-specific language model, similar to Google's BERT, using attention models.
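
For illustration, a minimal sketch of the SMOTE oversampling and hyper-parameter grid search referenced above, using imbalanced-learn and scikit-learn; the synthetic data, model, and parameter grid are assumptions, not project specifics:

    # Sketch: SMOTE inside a pipeline so oversampling touches only training folds.
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Imbalanced synthetic dataset (~5% positive class).
    X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    pipe = Pipeline([
        ("smote", SMOTE(random_state=0)),
        ("clf", RandomForestClassifier(random_state=0)),
    ])
    grid = GridSearchCV(
        pipe,
        param_grid={"clf__n_estimators": [100, 300], "clf__max_depth": [None, 10]},
        scoring="f1",
        cv=5,
    )
    grid.fit(X_tr, y_tr)
    print(grid.best_params_, grid.score(X_te, y_te))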

Confidential, Minnetonka, MN

Data Scientist

Responsibilities:

  • Developed reports and dashboards in Splunk; utilized the Splunk Machine Learning Toolkit to build a clustering model for detecting patterns in logs.
  • Implemented data exploration to analyze patterns and to select features using SparkSQL and other PySpark libraries.
  • Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume, using the scikit-learn package in Python and MATLAB.
  • Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks such as Caffe and Neon.
  • Conducted a hybrid of Hierarchical and K-means Cluster Analysis using IBM SPSS and identified meaningful segments of customers through a discovery approach.
  • Created Informatica mappings, mapplets, and reusable transformations.
  • Worked with the NLTK library for NLP data processing and pattern discovery; categorized comments into positive and negative clusters from different social networking sites using sentiment analysis and text analytics.
  • Analyzed traffic patterns by calculating autocorrelation with different time lags.
  • Addressed overfitting by implementing regularization methods such as L2 and L1.
  • Used Principal Component Analysis in feature engineering to analyze high-dimensional data.
  • Performed Multinomial Logistic Regression, Random Forest, Decision Tree, and SVM to classify whether a package would be delivered on time for a new route.
  • Implemented different models like Logistic Regression, Random Forest, and Gradient-Boosted Trees to predict whether a given die would pass or fail the test.
  • Performed data analysis by using Hive to retrieve the data from the Hadoop cluster, SQL to retrieve data from the Oracle database, and ETL for data transformation.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models (see the sketch after this list).
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Developed a MapReduce pipeline for feature extraction using Hive and Pig.
  • Collected data needs and requirements by interacting with other departments.
  • Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
  • Developed customized Hive UDFs and UDAFs in Java, with JDBC connectivity to Hive, and developed and executed Pig scripts and Pig UDFs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala, and Python.
  • Built large-scale data processing systems as part of data warehousing solutions and worked with unstructured data mining on NoSQL.
  • Implemented Pig scripts to convert the data from Avro to text file format; created external Hive tables for analytical querying on the data present in HDFS.
  • Queried the SQL database for customer production issue resolutions.
  • Constructed models using statistical techniques and machine learning classification models like SVM and Random Forest.
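
For illustration, a minimal sketch of building and evaluating a model with Spark MLlib as referenced above; the toy DataFrame, feature columns, and model choice are assumptions, not project specifics:

    # Sketch: assemble features and fit/evaluate a Spark MLlib model.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()
    df = spark.createDataFrame(
        [(1.0, 0.5, 0.0), (2.0, 1.5, 1.0), (0.5, 0.2, 0.0), (3.0, 2.5, 1.0)] * 50,
        ["f1", "f2", "label"],
    )
    train, test = df.randomSplit([0.8, 0.2], seed=42)

    # Raw columns -> feature vector -> logistic regression.
    pipeline = Pipeline(stages=[
        VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
        LogisticRegression(featuresCol="features", labelCol="label"),
    ])
    model = pipeline.fit(train)
    auc = BinaryClassificationEvaluator(labelCol="label").evaluate(model.transform(test))
    print(f"Test AUC: {auc:.3f}")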

Confidential, Newark NJ

Data Scientist

Responsibilities:

  • Worked on data requirements analysis for transforming data according to business requirements; worked on data cleaning and reshaping, and generated segmented subsets using NumPy and pandas in Python (see the sketch after this list).
  • Worked with large amounts of structured and unstructured data; generated a cost-benefit analysis to quantify the model implementation compared with the former situation.
  • Designed and Developed Scripts, Data Import/Export, Data Conversions and Data Cleansing.
  • Worked on different data formats such as Flat files, SQL files, Databases, XML schema, CSV files.
  • Involved in dimensional modeling, identifying the facts and dimensions. Developed SQL scripts for creating tables, sequences, triggers, views, and materialized views.
  • Wrote Python scripts for file transfer and file manipulation.
  • Conducted data analyses of the relationship between customer and client, achieving a 15% more accurate prediction of performance than previous years using Confidential.
  • Presented the results in the form of graphs and reports using Tableau.
  • Gained strong hands-on experience with the Hadoop ecosystem, including HDFS and Hive.
  • Developed Data Science content involving Data Manipulation & Visualization, Web Scraping, Machine Learning, Python programming, and SQL for Data Extraction.
  • Developed analytical systems and data structures; gathered and manipulated data using statistical techniques.
  • Designed a suite of interactive dashboards that provided an opportunity to scale and measure HR department statistics, which was not possible earlier, and to schedule and publish reports.
  • Created data presentations to reduce bias and tell the true story of people, pulling millions of rows of data using SQL and performing exploratory data analysis.
  • Applied a breadth of knowledge in programming (Confidential, Python); descriptive, inferential, and experimental-design statistics; advanced mathematics; and database functionality.
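
For illustration, a minimal sketch of the cleaning, reshaping, and segmentation work referenced above, using pandas and NumPy; the column names, sample data, and thresholds are assumptions, not project specifics:

    # Sketch: impute, reshape wide -> long, and segment with pandas/NumPy.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "customer": ["a", "b", "c", "d"],
        "region": ["east", "west", "east", "west"],
        "q1_sales": [100.0, np.nan, 250.0, 80.0],
        "q2_sales": [120.0, 90.0, np.nan, 85.0],
    })

    # Clean: impute missing sales with each column's median.
    sales_cols = ["q1_sales", "q2_sales"]
    df[sales_cols] = df[sales_cols].fillna(df[sales_cols].median())

    # Reshape: wide quarterly columns into a long format for analysis.
    long_df = df.melt(id_vars=["customer", "region"], value_vars=sales_cols,
                      var_name="quarter", value_name="sales")

    # Segment: label rows by sales relative to the overall median.
    long_df["segment"] = np.where(long_df["sales"] >= long_df["sales"].median(),
                                  "high", "low")
    print(long_df)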

Environment: Confidential, Python, HDFS, Hive, Tableau, spaCy, Big Data, Git, NLP & Machine Learning algorithms

Confidential, Bridgewater NJ

Data Scientist

Responsibilities:

  • Built the Decision System by creating the algorithm based on business data.
  • Performed data ETL by collecting, exporting, merging, and massaging data from multiple sources.
  • Used the Simulated Annealing optimization technique and Decision Tree ML concepts; made wide use of statistical concepts including the Central Limit Theorem, probability, and probability distributions.
  • Built Decision Trees in Python to represent segmentation of data and identify key variables to be used in predictive modeling.
  • Evaluated models using cross-validation and confusion matrices (see the sketch after this list).
  • Addressed overfitting by implementing cross-validation methods like K-Fold.
  • Assisted the business with claims analysis by building Instant Decision Rules with the help of A/B testing in the Claims Adjudication System.
  • Managed project planning and deliverables for several projects focused on Advanced Analytics, Big Data, and Digital Analytics streams.
  • Provided support in solution development for data science, advanced analytics, and digital analytics projects.
  • Implemented machine learning techniques and interpreted statistical results into ready-to-consume form for senior management and clients.
  • Provided pre-sales support to the team for RFPs, RFIs, and client presentations.
  • Played a vital role in claims analysis, assisting the business in decision making for claims processing.
  • Generated reports on declined claims using Tableau.
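
For illustration, a minimal sketch of the K-fold cross-validation and confusion-matrix evaluation referenced above; the synthetic data and model choice are assumptions, not project specifics:

    # Sketch: out-of-fold evaluation with K-fold CV and a confusion matrix.
    from sklearn.datasets import make_classification
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, random_state=0)
    clf = DecisionTreeClassifier(max_depth=5, random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    # Per-fold accuracy guards against overfitting to a single split.
    print("Fold accuracies:", cross_val_score(clf, X, y, cv=cv).round(3))

    # Confusion matrix built from out-of-fold predictions.
    y_pred = cross_val_predict(clf, X, y, cv=cv)
    print(confusion_matrix(y, y_pred))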
