Data Scientist / Machine Learning Engineer Resume

Dallas, TX

SUMMARY

  • Data Scientist with 7 years of experience executing data-driven solutions, with in-depth knowledge of Data Analytics, Text Mining, Machine Learning (ML), Predictive Modelling, and Natural Language Processing (NLP)
  • Proficient in Deep Learning and Artificial Neural Networks for NLP, such as Convolutional Neural Networks, Recursive Neural Networks, and Recurrent Neural Networks
  • Highly competent with a wide variety of data science programming languages and Big Data tools such as Python, R, SQL, Tableau, Scikit-learn, Hadoop, Spark, and Hive
  • Built models with TensorFlow and acquainted with top-level frameworks such as Keras, Theano and PyTorch
  • Processed large data sets in Python with libraries such as NumPy, SciPy and Pandas for data analysis and numerical computations to ready the data for ML algorithms
  • Developed predictive models using Random Forest, Boosted Trees, Naïve Bayes, SVM, Logistic Regression, and Neural Networks
  • Experience with relational and non-relational databases such as MySQL, SQLite, MongoDB, and Cassandra. Implemented a POC using the Spark SQL and MLlib libraries
  • Solid mathematical background in Statistics, Probability, Differentiation and Integration, Linear Algebra, and Geometry
  • Acquainted with all aspects of the Software Development Lifecycle (SDLC), from requirement analysis through design, coding, testing, deployment, and maintenance, in both Agile and Waterfall methodologies
  • Strong technical expertise in creating advanced machine learning algorithms, validation techniques, predictive data modeling, data mining algorithms, topic modeling, and sentiment analysis on text data to offer new perspectives on conventional business thinking
  • Extremely organized, with a demonstrated ability to handle several tasks and assignments simultaneously within the scheduled time
  • Interpreted and communicated use cases, patterns, results, validity, and metrics to other teams and management in a clear, visual way

TECHNICAL SKILLS

LANGUAGES: Python | R

DATABASE: SQL | NoSQL | Microsoft Access | Oracle | Teradata

BIG DATA TECHNOLOGIES: Hadoop | MapReduce | Hive | Pig | Kafka | Spark | Sqoop

TOOLS AND UTILITIES: Jupyter | Git | RStudio | Tableau | PyCharm | Spyder | Visual Studio | PostgreSQL | MySQL | SQLite | Microsoft SQL Server | MongoDB | Cassandra | Neo4j | JSON | MS Access | MLlib | Redshift | HBase | SQL Server Management Studio (SSMS) | SQL Server Reporting Services (SSRS) | SQL Server Integration Services (SSIS) | Crystal Reports | Excel Power Pivot

MACHINE LEARNING: Logistic Regression | Linear Regression | Support Vector Machines | Decision Trees | Random Forests | Ensemble Models | K-Nearest Neighbors | Gradient Boosting | Naïve Bayes | K-Means Clustering | Hierarchical Clustering | Density-Based Clustering | Gaussian Mixtures | Principal Component Analysis | Natural Language Processing (NLP)

DEEP LEARNING: Artificial Neural Networks | Convolutional Neural Networks | Multi-Layer Perceptron | Recursive Neural Networks | Recurrent Neural Networks | LSTM | GRU | Softmax Classifier | Backpropagation | Chain Rule | Dropout

LIBRARIES: NumPy | SciPy | Pandas | Scikit-learn | Theano | TensorFlow | Keras | PyTorch | caret | Statsmodels | XGBoost | NLTK | dplyr | nnet | glmnet | H2O | mboost | MATLAB Neural Network Toolbox | MATLAB Signal and Image Processing Toolbox

GRAPH VISUALIZATION: Tableau | Seaborn | Plotly | ggplot2 | Graphviz | QlikView | Geoplotlib

CLOUD SERVICES: Azure

METHODOLOGIES: Agile | Waterfall

PROFESSIONAL EXPERIENCE

Confidential, Dallas, TX

Data Scientist / Machine Learning Engineer

Responsibilities:

  • Built advanced Machine Learning classification and regression models such as XGBoost, KNN, and SVM Regression, along with clustering algorithms such as Hierarchical Clustering and DBSCAN
  • Performed NLP using techniques such as Word2Vec, FastText, Bag of Words, TF-IDF, and Doc2Vec
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database
  • Imported data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS
  • Created a distributed TensorFlow environment across multiple devices (CPUs and GPUs) and ran the workloads in parallel
  • Created various types of data visualizations using Tableau and other libraries like Matplotlib, Seaborn, ggplot2
  • Collaborated with internal business partners to identify needs and recommend improvements

Tools used: Python | Scikit-Learn | Git | NLP | HDFS | Teradata | Hive | Spark | MapReduce | Keras | TensorFlow | Oracle DB | SQL Server | PySpark | NumPy | Pandas | Tableau | Matplotlib | Seaborn

Confidential, Atlanta, GA

Data Scientist

Responsibilities:

  • Performed sentiment analysis on customer email feedback to determine the tone behind a series of words, using neural network techniques such as Long Short-Term Memory (LSTM) cells in Recurrent Neural Networks (RNN)
  • Deployed Keras and TensorFlow for the NLP implementation and trained models using a cyclical learning rate schedule
  • Used Long Short-Term Memory (LSTM) networks in PyTorch to analyze time series data
  • Utilized t-distributed Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA) to deal with the curse of dimensionality
  • Created Neo4j graph visualizations of data flow from source to product and performed data lineage searches
  • Proficient with statistical metrics such as F-Score, AUC/ROC, Confusion Matrix, and RMSE for evaluating the performance of different models
  • Generated ETL mappings, sessions, and workflows based on business user requirements to load data from source files and RDBMS tables into target tables
  • Extensively used HiveQL and Spark SQL queries to extract meaningful data and load it into an external Hive table
  • Implemented the API's CRUD functionality to handle query requests to the Neo4j database
  • Performed data visualization using various libraries and designed dashboards with Tableau; generated complex reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders

Tools used: Python | SQL | AWS | PyTorch | NLP | Spark SQL | Oracle Database | Keras | TensorFlow | ETL | NumPy | SciPy | Pandas | Matplotlib | Seaborn | Scikit-Learn | Neo4j | Tableau

Confidential

Sr. Data Analyst

Responsibilities:

  • Developed a churn model for the marketing team to reduce the customer churn rate
  • Analyzed very large data sets to develop insights that increase traffic monetization and merchandise sales without compromising shopper experience
  • Built a Proof of Concept (POC) by researching the user behavior and historical trends and developed a fraud detection model strategy using Random Forests and Decision Trees
  • Experimented with multiple classification algorithms, such as Logistic Regression, Support Vector Machine (SVM), Random Forest, AdaBoost, and Gradient Boosting, using Python and Scikit-Learn, and evaluated their performance on customer discount optimization
  • Analyzed data to identify glitches and cleaned it to reduce distortions
  • Applied multiple Machine Learning (ML) and Data Mining techniques to improve the quality of product ads and personalized recommendations
  • Developed NLP models with Deep Learning algorithms for analyzing text, improving on the existing dictionary-based approaches
  • Addressed overfitting by implementing regularization methods like L1 and L2 in algorithms
  • Collaborated in building a high-performance, low-latency system to manage high-velocity data streams
  • Employed statistical methods such as hypothesis testing, t-tests, confidence intervals, and error measurements
  • Applied various data manipulation techniques in statistical analysis, such as missing data imputation, indexing, merging, and sampling
  • Performed Exploratory Data Analysis (EDA) to maximize insight into the dataset, detect outliers, and extract important variables numerically and graphically
  • Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce
  • Developed Hive UDFs to bring all customer emails into a structured format
  • Created different charts such as Heatmaps, Bar charts, Line charts
  • Created different visualizations in Tableau using Bar charts, Line charts, Pie charts, Maps, Scatter Plot charts, and Table reports

Tools used: Python | NumPy | Pandas | Matplotlib | Seaborn | Scikit-Learn | Tableau | NLP | Neural Networks | SAP DB | Oracle Database | SQL | AWS | HDFS | Git | Excel | PySpark-ML | Theano
