Data Scientist / Machine Learning Engineer Resume
Herndon, VA
SUMMARY:
- Professional Data Scientist/Data Analyst with 4+ years of experience in Data Science and Analytics, including Machine Learning, Deep Learning, Data Mining, and Statistical Analysis.
- Involved in the entire data science project life cycle, including data extraction, data cleaning, statistical modeling, and data visualization on large sets of structured and unstructured data.
- Experienced with machine learning algorithms such as logistic regression, ensemble methods, XGBoost, KNN, SVM, neural networks, and linear regression, and with clustering algorithms such as k-means and DBSCAN.
- Built recommendation systems using collaborative filtering and content-based filtering.
- Deep understanding of state-of-the-art machine learning and deep learning algorithms, techniques and best practices.
- Experienced with Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
- Experienced in developing, implementing, debugging, and extending machine learning algorithms, both shallow and deep.
- Experienced in applying machine learning and deep learning techniques to build models and analyze large-scale data.
- Strong skills in linear algebra, probability theory, and calculus.
- Implemented bagging and boosting to enhance model performance (see the sketch at the end of this summary).
- Strong skills in statistical methodologies such as A/B testing, experimental design, hypothesis testing, and ANOVA.
- Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, Python (matplotlib), and R 3.0 (ggplot2, caret).
- Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMSs such as MySQL and NoSQL databases such as MongoDB.
- Developed API libraries and coded business logic using C# and XML, and designed web pages using the .NET Framework, PHP, Python, Django, HTML, AJAX, AngularJS, and Node.js.
- Strong experience in image recognition and Big Data technologies such as Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS, and Hive.
- Experience in visualization tools such as Tableau 9.x/10.x for creating dashboards.
- Excellent understanding of Agile and Scrum development methodology.
- Used version control tools such as Git 2.x and build tools such as Apache Maven and Ant.
- Passionate about gleaning insightful information from massive data assets and developing a culture of sound, data-driven decision making.
- Experienced in the full software development life cycle (SDLC) using Agile, DevOps, and Scrum methodologies, including creating requirements and test plans.
- Skilled in Advanced Regression Modeling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.
- Proficient in Predictive Modeling, Data Mining Methods, Factor Analysis, ANOVA, hypothesis testing, normal distributions, and other advanced statistical and econometric techniques.
- Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Social Network Analysis, Cluster Analysis, and Neural Networks.
- Experienced in Machine Learning and Statistical Analysis with Python Scikit-Learn.
- Experienced in using Python for data loading, extraction, and manipulation; worked with Python libraries such as Matplotlib, NumPy, SciPy, and Pandas for data analysis.
- Worked with tools and frameworks such as R, Python, Theano, TensorFlow, H2O, Keras, and MATLAB to develop neural networks and cluster analyses.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Strong C/C++ and SQL programming skills, with experience working with functions, packages, and triggers.
- Extensively worked on Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK, and Scikit-learn).
- Strong proficiency in JavaScript and good knowledge of Node.js frameworks such as Express.
- Worked with NoSQL databases including MongoDB, HBase, and Cassandra.
- Experienced in Visual Basic for Applications (VBA), C#, and the .NET Framework for application development.
- Experienced in Big Data with Hadoop, HDFS, MapReduce, and Spark.
- Experienced in data integration validation and data quality controls for ETL processes.
- Proficient in Tableau, Adobe Analytics and R-Shiny data visualization tools to analyze and obtain insights into large datasets, create visually powerful and actionable interactive reports and dashboards.
- Automated recurring reports using SQL and Python and visualized them on BI platforms such as Tableau.
- Worked in development environments with Git and VMs.
- Ability to maintain a fun, casual, professional and productive team atmosphere.
- Excellent communication skills; work successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.
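The sketch below illustrates the bagging and boosting comparison mentioned in this summary. It is a minimal, self-contained example on synthetic data with illustrative hyperparameters, not code drawn from any project listed here.

```python
# Minimal sketch: bagging vs. boosting on synthetic data (all values illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification problem.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

models = {
    # Bagging: trains decision trees on bootstrap samples and averages them (reduces variance).
    "bagging": BaggingClassifier(n_estimators=100, random_state=42),
    # Boosting: fits trees sequentially, each correcting its predecessors' errors (reduces bias).
    "boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```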
PROFESSIONAL EXPERIENCE:
Confidential, Herndon, VA
Data Scientist / Machine Learning Engineer
Responsibilities:
- Implemented machine learning, computer vision, deep learning, and neural network algorithms using TensorFlow and Keras, and designed prediction models using data mining techniques in Python with libraries such as NumPy, SciPy, Matplotlib, Pandas, and Scikit-learn.
- Used pandas, NumPy, Seaborn, SciPy, matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms.
- Worked with text feature engineering techniques such as n-grams, TF-IDF, and word2vec (see the sketch after this list).
- Applied support vector machines (SVMs) and their kernels, such as the polynomial and RBF kernels, to machine learning problems.
- Worked on imbalanced datasets, choosing metrics appropriate to class imbalance.
- Worked with deep neural networks, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).
- Developed low-latency applications and interpretable models using machine learning algorithms.
- Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.
- Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- Programmed a utility in Python that used multiple packages (SciPy, NumPy, pandas).
- Implemented classification using supervised algorithms such as Logistic Regression, SVM, Decision Trees, KNN, and Naive Bayes.
- Responsible for the design and development of advanced R/Python programs to transform and harmonize data sets in preparation for modeling.
- Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
- Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Implemented Agile Methodology for building an internal application.
- Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Powerball, and Smart View.
- Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms for the clients.
- Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
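The sketch below illustrates the text feature engineering and SVM work referenced in this list: n-gram TF-IDF features feeding an RBF-kernel SVM. The toy corpus, labels, and hyperparameters are hypothetical placeholders, not project data.

```python
# Minimal sketch: n-gram TF-IDF features feeding an RBF-kernel SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus and sentiment labels (1 = positive, 0 = negative).
docs = ["great product, fast shipping", "terrible support",
        "loved the quality", "never arrived"]
labels = [1, 0, 1, 0]

pipe = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),       # unigram + bigram TF-IDF features
    SVC(kernel="rbf", C=1.0, gamma="scale"),   # RBF-kernel SVM; C/gamma illustrative
)
pipe.fit(docs, labels)
print(pipe.predict(["support was great"]))
```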
Environment: Python, MLlib, regression, PCA, t-SNE, cluster analysis, SQL, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, CNNs, RNNs, Oracle 12c, Netezza, MySQL Server, SSRS, T-SQL, Tableau, Teradata, random forest, OLAP, Azure, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS, Linux.
Confidential, Woodbury, New York
Machine Learning Engineer
Responsibilities:
- Performed data profiling to learn the behavior patterns of various students across features of USMLE examinations, using Tableau, Adobe Analytics, and Python Matplotlib.
- Evaluated models using cross-validation, the log-loss function, ROC curves, and AUC for feature selection, and worked with Elastic technologies such as Elasticsearch and Kibana.
- Addressed overfitting by implementing regularization methods such as L2 and L1, and dropout in neural networks.
- Implemented statistical modeling with the XGBoost machine learning package in Python to determine the predicted probabilities of each model (see the sketch after this list).
- Worked with different performance metrics such as log-loss, AUC, confusion matrices, and F1-score for classification, and mean squared error and mean absolute error for regression problems.
- Worked with text feature engineering techniques such as n-grams, TF-IDF, and word2vec.
- Created master data for modeling by combining various tables and derived fields from client data, students' letters of recommendation (LORs), essays, and various performance metrics.
- Formulated a basis for variable selection and used grid search with K-fold cross-validation to find optimal hyperparameters.
- Utilized boosting algorithms to build a model for predictive analysis of the behavior of students who took the USMLE exam and applied for residency.
- Used NumPy, SciPy, pandas, NLTK (the Natural Language Toolkit), and matplotlib to build the model.
- Extracted data from HDFS using Hive and Presto, performed data analysis using Spark with Scala, PySpark, and Redshift, carried out feature selection, and created nonparametric models in Spark.
- Applied various artificial intelligence (AI)/machine learning algorithms and statistical modeling techniques such as decision trees, text analytics, image and text recognition using OCR tools like ABBYY, natural language processing (NLP), supervised and unsupervised learning, and regression models.
- Used Principal Component Analysis and t-SNE in feature engineering to analyze high-dimensional data.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python, and built models using deep learning frameworks.
- Created deep learning models using TensorFlow and Keras, combining all tests into a single normalized score to predict students' residency attainment.
- Used a OneVsRest classifier, which fits one classifier per class against all other classes, for multiclass classification problems.
- Applied various machine learning algorithms and statistical models such as Decision Trees, text analytics, sentiment analysis, Naive Bayes, Logistic Regression, and Linear Regression using Python to determine the accuracy rate of each model.
- Created and designed reports that used gathered metrics to infer and draw logical conclusions about past and future behavior, with cloud-based products such as Azure ML Studio and Dataiku.
- Generated various models using different machine learning and deep learning frameworks and tuned the best-performing model using Signal Hub and AWS SageMaker/Azure Databricks.
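The sketch below illustrates the XGBoost probability-scoring step referenced in this list: train a classifier, extract predicted probabilities, and evaluate with log-loss and AUC. The data is synthetic and deliberately imbalanced, and all hyperparameters are illustrative.

```python
# Minimal sketch: XGBoost predicted probabilities scored with log-loss and AUC.
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic, deliberately imbalanced binary problem (~10% positives).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]  # predicted probability of the positive class
print(f"log-loss: {log_loss(y_te, proba):.3f}  AUC: {roc_auc_score(y_te, proba):.3f}")
```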
Environment: Python 2.x/3.x, Scikit-learn, NumPy, Pandas, SciPy, Dask, Hive, AWS, Linux, Tableau Desktop, Microsoft Excel, NLP, deep learning frameworks such as TensorFlow and Keras, boosting algorithms, etc.
Confidential
Data Scientist/Machine Learning Engineer
Responsibilities:
- Extracted the data from hive tables by writing efficient Hive queries.
- Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
- Applied various machine learning algorithms and statistical modeling techniques such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering to identify volume, using the Scikit-learn package in Python and MATLAB.
- Performed data cleaning and feature selection using the MLlib package in PySpark, and worked with deep learning frameworks such as Caffe and Keras.
- Developed Spark/Scala, Python, and R code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources. Used the K-means clustering technique to identify outliers and classify unlabeled data (see the sketch after this list).
- Evaluated models using Cross Validation, Log loss function, ROC curves and used AUC for feature selection and elastic technologies like Elasticsearch, Kibana etc.
- Worked with the NLTK library for NLP data processing and pattern finding.
- Addressed overfitting by implementing regularization methods such as L2 and L1.
- Used Principal Component Analysis in feature engineering to analyze high dimensional data.
- Performed data analysis by using Hive to retrieve the data from Hadoop cluster, SQL to retrieve data from Oracle database and used ETL for data transformation.
- Used MLlib, Spark's Machine learning library to build and evaluate different models.
- Developed MapReduce pipeline for feature extraction using Hive and Pig.
- Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
- Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
- Communicated results to the operations team to support the best decisions.
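The sketch below illustrates the K-means outlier identification referenced in this list: points far from their assigned cluster center get flagged. The data, the number of clusters, and the 99th-percentile cutoff are all illustrative assumptions.

```python
# Minimal sketch: flagging outliers by distance to the nearest K-means centroid.
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 5-dimensional data.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 5))

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Distance from each point to its assigned cluster center.
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

cutoff = np.percentile(dist, 99)  # assumption: top 1% of distances are outliers
outliers = np.where(dist > cutoff)[0]
print(f"flagged {len(outliers)} of {len(X)} points as outliers")
```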
Environment: Python, SQL, SQL Server, SSRS, PL/SQL, T-SQL, Tableau, MLlib, regression, cluster analysis, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, Teradata, random forest, OLAP, Azure, MariaDB, SAP CRM, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS.
Confidential
Data Analyst
Responsibilities:
- Gathered business requirements, handled definition and design of the data sourcing, and worked with the data warehouse architect on the development of logical data models.
- Collaborated with Data Engineers to filter data as per the project requirements.
- Implemented various statistical techniques to manipulate the data (missing data imputation, principal component analysis, and sampling).
- Applied different dimensionality reduction techniques, such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), to the feature matrix.
- Identified outliers and inconsistencies in the data by conducting exploratory data analysis (EDA) using Python NumPy and Seaborn to gain insights into the data and validate each feature.
- Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
- Performed feature engineering, including feature intersection generation, feature normalization, and label encoding, with Scikit-learn preprocessing.
- Developed and implemented predictive models using machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN.
- Used clustering techniques such as DBSCAN, K-means, K-means++, and hierarchical clustering for customer profiling to design insurance plans according to customers' behavior patterns.
- Used grid search and random search to find the best hyperparameters for the model, and the K-fold cross-validation technique to train the model for best results (see the sketch after this list).
- Worked with customer churn models, including random forest regression and lasso regression, along with pre-processing of the data.
- Designed rich data visualizations to model data into human-readable form with Matplotlib.
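The sketch below illustrates the grid search with K-fold cross-validation referenced in this list. The estimator, parameter grid, and synthetic data are illustrative assumptions, not taken from the project.

```python
# Minimal sketch: hyperparameter tuning via grid search + stratified K-fold CV.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic binary classification data.
X, y = make_classification(n_samples=1000, random_state=1)

param_grid = {"n_estimators": [100, 300], "max_depth": [5, 10, None]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(RandomForestClassifier(random_state=1),
                      param_grid, cv=cv, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_, f"mean AUC = {search.best_score_:.3f}")
```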
Environment: Python, Agile, SQL Server, SOA, SSIS, Pandas, NumPy, SSRS, ETL, UNIX, Neural Networks, Scikit-learn.