Software Engineer Resume
SUMMARY:
- Python engineer with a background in data science and big data; 1+ years of hands-on experience building machine learning models, big data pipelines, and statistical analyses.
- Skilled in Python, R, Jupyter Notebook, big data (Spark, Hadoop, MapReduce), machine learning, statistics, SQL, problem solving, and programming.
SKILLS:
Languages: Python, R, SQL
Python: scikit-learn, Jupyter, pandas, NumPy, Matplotlib, Seaborn, IDLE, Spyder
R: RMD, R Shiny, RStudio, tidyr, dplyr, lattice graphics, caret, ggplot2, MASS, base, grid, cluster, e1071, class, ROCR, randomForest, glmnet, leaps, car
Big data: Spark, Hadoop, MapReduce
Statistical and Machine Learning Topics:
Data Wrangling and Exploratory Data Analysis: inferential statistics, descriptive statistics, statistical graphics, plot-based data analysis, sampling error analysis, hypothesis testing, A/B testing
Regression: K-Nearest Neighbors (KNN), simple linear regression, multiple linear regression, interaction terms
Classification: naive Bayes classifier, classification with KNN, logistic regression, support vector machines (SVM), tree-based methods (decision trees), boosting, bagging, and random forests
Unsupervised Learning and Clustering: Principal Component Analysis (PCA), K-means clustering and hierarchical clustering
Model Selection and Regularization: ridge regression, lasso, dimensionality reduction (PCA), stepwise/forward selection, ANOVA, nested models, prediction accuracy tests, cross-validation, bootstrapping
Tools/Other: Confidential SPSS Statistics tool, REST APIs, JSON, CSV, MySQL, Git, IntelliJ, Microsoft Office, Microsoft Excel, Unix
WORK EXPERIENCE:
Software Engineer
Confidential
Responsibilities:
- Analyzed the airline dataset using SparkContext; created a Resilient Distributed Dataset (RDD) from the airline data
- Explored the data using lambda functions
- Computed the average distance travelled per flight by applying MapReduce operations
- Computed the average delay using the aggregate function
- Created a frequency histogram of delays using countByValue and GraphX
- Used Jupyter Notebook to prepare the report
Tools/Technologies: Python, Spark, lambda functions, MapReduce, Jupyter Notebook
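The map/reduce aggregations above can be sketched in plain Python with toy data (the actual work used Spark RDDs on a confidential dataset; the records and values here are hypothetical):

```python
from functools import reduce
from collections import Counter

# Toy airline records: (flight_id, distance_miles, delay_minutes).
# Hypothetical sample data standing in for the confidential dataset.
flights = [
    ("AA1", 2475, 10),
    ("UA2", 337, -5),
    ("DL3", 1096, 25),
    ("AA4", 2475, 0),
]

# Map step: extract distances; reduce step: sum them; then average.
distances = list(map(lambda rec: rec[1], flights))
avg_distance = reduce(lambda a, b: a + b, distances) / len(distances)

# Average delay computed with the same map/reduce pattern.
delays = list(map(lambda rec: rec[2], flights))
avg_delay = reduce(lambda a, b: a + b, delays) / len(delays)

# countByValue analogue: frequency of each delay value (histogram input).
delay_counts = Counter(delays)

print(avg_distance, avg_delay, delay_counts)
```

In Spark the same logic maps onto `rdd.map(...)`, `rdd.reduce(...)`, and `rdd.countByValue()`, executed in parallel across partitions.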
Software Engineer Intern
Confidential
Responsibilities:
- Explored the dataset using the NumPy, pandas, Matplotlib, Seaborn, scikit-learn, and SciPy libraries; visualized each variable's distribution with histograms and plotted a correlation matrix with Matplotlib and Seaborn to look for strong relationships between variables
- Unsupervised learning: applied random forest and k-nearest neighbors to define an outlier (fraud) detection method; used the IsolationForest and LocalOutlierFactor estimators
- Fitted and compared both models
- Calculated fraud prediction errors, model accuracy, and confusion matrices for both models; concluded that random forest predicts fraud with better precision than k-nearest neighbors
- Used Jupyter Notebook to prepare the report
Tools/Technologies: Python, Spark, lambda functions, MapReduce, Jupyter Notebook, NumPy, pandas, Matplotlib, Seaborn, scikit-learn, SciPy, IsolationForest, LocalOutlierFactor, CSV
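A minimal sketch of the outlier-based fraud detection step, using the IsolationForest and LocalOutlierFactor estimators named above on synthetic data (the real dataset is confidential; the cluster sizes and thresholds here are assumptions):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic transactions: a dense cluster of normal points plus a few
# far-off points standing in for fraudulent records (hypothetical data).
rng = np.random.RandomState(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
frauds = rng.uniform(low=6, high=8, size=(5, 2))
X = np.vstack([normal, frauds])

# IsolationForest isolates anomalies with random splits; -1 marks outliers.
iso = IsolationForest(contamination=0.025, random_state=42)
iso_labels = iso.fit_predict(X)

# LocalOutlierFactor flags points whose local density is unusually low.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.025)
lof_labels = lof.fit_predict(X)

# The injected fraud rows (the last 5) should mostly be flagged as -1.
print((iso_labels[-5:] == -1).sum(), (lof_labels[-5:] == -1).sum())
```

Comparing the two label vectors against known fraud labels yields the prediction errors and confusion matrices mentioned above.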
Confidential
Python Engineer
Responsibilities:
- Explored the data using pandas
- Created density plots for each variable using the Seaborn library
- Developed clustering methods for show and no-show medical appointments:
- Principal Component Analysis (PCA) using the scikit-learn library
- Random forest using scikit-learn and Seaborn
- Created a confusion matrix and evaluated how well the model fitted the dataset
- Used cross-validation to estimate test errors
Tools/Technologies: Python, Jupyter Notebook, NumPy, pandas, Seaborn, scikit-learn
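The PCA + random forest + cross-validation pipeline above can be sketched with scikit-learn on synthetic data (the appointment dataset is confidential; `make_classification` and all parameter values here are stand-in assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the show/no-show appointment data (hypothetical).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)

# PCA: project the features onto the top two principal components.
X2 = PCA(n_components=2).fit_transform(X)

# Random forest fit on the reduced data, evaluated with a confusion matrix.
X_tr, X_te, y_tr, y_te = train_test_split(X2, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
cm = confusion_matrix(y_te, rf.predict(X_te))

# Cross-validation to estimate generalization accuracy.
cv_acc = cross_val_score(rf, X2, y, cv=5).mean()
print(cm, cv_acc)
```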
Confidential
Python Engineer
Responsibilities:
- Explored and cleaned the data
- Performed univariate and unsupervised analyses, including principal component analysis (PCA)
- Built a logistic regression model using multiple predictors as input
- Built a random forest model using the categorized income variable and plotted variable importance
- Built a support vector machine (SVM) model with various parameter choices and plotted variable importance
- Built a k-nearest neighbors model
- Summarized the statistical models in terms of accuracy/error and sensitivity/specificity
- Used bootstrapping to compare the performance of the logistic regression, random forest, and SVM models
Tools/Technologies: R, RMD, cluster, e1071, class, ROCR, caret, ggplot2, PCA, logistic regression, random forest, SVM, KNN, bootstrapping
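The bootstrap comparison of logistic regression, random forest, and SVM described above can be sketched as follows (the original work was done in R; this is an equivalent scikit-learn sketch on synthetic data, with all parameter values assumed):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.utils import resample

# Hypothetical data in place of the confidential income dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=1)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=1),
    "svm": SVC(),
}

# Bootstrap: refit each model on resampled training sets and score on the
# out-of-bag rows, yielding a distribution of accuracies per model.
scores = {name: [] for name in models}
for b in range(10):
    idx = resample(np.arange(len(X)), random_state=b)
    oob = np.setdiff1d(np.arange(len(X)), idx)
    for name, model in models.items():
        model.fit(X[idx], y[idx])
        scores[name].append(model.score(X[oob], y[oob]))

mean_acc = {name: float(np.mean(s)) for name, s in scores.items()}
print(mean_acc)
```

Comparing the mean out-of-bag accuracies (and their spread) across the three models is the bootstrap comparison summarized in the bullet above.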
Confidential
Python Engineer
Responsibilities:
- Data exploration: loaded, cleaned, and summarized the data
- Selected optimal models using exhaustive, forward, and backward selection methods
- Selected an optimal set of variables for developing clustering models; described differences and similarities between the attributes deemed important in each case
- Used cross-validation to estimate test errors with different numbers of variables
- Used lasso and ridge regularized approaches; compared the resulting models in terms of the number of variables and their effects via regression subset selection and resampling
- Principal Component Analysis (PCA): merged the red and white wine datasets, plotted the data projection onto the first two principal components (biplots and similar plots), and built a PCA-based model to assess wine quality
Tools/Technologies: R, RMD; R libraries: glmnet, leaps, ggplot2, MASS, corrplot, car; data exploration: exhaustive, forward, and backward selection; cross-validation; regularized approaches: lasso and ridge
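The lasso-versus-ridge comparison above (number of retained variables under each penalty, with cross-validated penalty strength) can be sketched as follows (the original analysis used R's glmnet; this is an equivalent scikit-learn sketch on synthetic stand-in data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

# Synthetic regression data standing in for the wine datasets (hypothetical).
X, y = make_regression(n_samples=200, n_features=12, n_informative=4,
                       noise=5.0, random_state=0)

# Cross-validated lasso: the L1 penalty drives some coefficients exactly
# to zero, performing variable selection.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)

# Cross-validated ridge: the L2 penalty shrinks coefficients toward zero
# but never eliminates them, so every variable is retained.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X, y)

n_lasso = int(np.sum(lasso.coef_ != 0))
n_ridge = int(np.sum(ridge.coef_ != 0))
print(n_lasso, n_ridge)
```

The coefficient counts make the comparison in the bullet concrete: lasso typically keeps a subset of the variables, while ridge keeps all of them with shrunken effects.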
Confidential
Python Engineer
Responsibilities:
- Cleaned the dataset by imputing missing values with the series mean
- Checked the linear regression model's assumptions; transformed the data using reflected and logarithmic transformations
- Split the data into training and test sets; developed a linear regression model on the training set using predictors and an interaction term
- Cross-validated the linear regression model
Tools/Technologies: Confidential SPSS Statistics tool, logarithmic transformation, linear regression, cross-validation
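The workflow above (mean imputation, log transformation, a model with an interaction term, train/test split, cross-validation) can be sketched in Python (the original was done in SPSS; the data here is synthetic and all coefficients are assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical right-skewed data standing in for the confidential dataset.
rng = np.random.RandomState(0)
x1 = rng.uniform(1, 10, 200)
x2 = rng.uniform(1, 10, 200)
y = np.exp(0.3 * x1 + 0.1 * x1 * x2 + rng.normal(0, 0.2, 200))

# Impute missing values with the series mean (as in the original workflow).
x1[::25] = np.nan
x1 = np.where(np.isnan(x1), np.nanmean(x1), x1)

# Log-transform the skewed response so residuals look more normal.
log_y = np.log(y)

# Design matrix with both predictors and an interaction term.
X = np.column_stack([x1, x2, x1 * x2])

# Train/test split, fit, and cross-validate.
X_tr, X_te, y_tr, y_te = train_test_split(X, log_y, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
r2_test = model.score(X_te, y_te)
r2_cv = cross_val_score(model, X, log_y, cv=5).mean()
print(r2_test, r2_cv)
```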