Software Engineer Resume

SUMMARY:

  • Python engineer with a background in data science and big data, and 1+ years of hands-on experience building machine learning models, working with big data, and performing statistical analysis.
  • Skilled in Python, R, Jupyter Notebook, big data (Spark, Hadoop, MapReduce), machine learning, statistics, SQL, and problem solving.

SKILLS:

Languages: Python, R, SQL

Python: scikit-learn, Jupyter, pandas, NumPy, Matplotlib, seaborn, IDLE, Spyder

R: R Markdown (RMD), Shiny, RStudio, tidyr, dplyr, lattice graphics, caret, ggplot2, MASS, base, grid, cluster, e1071, class, ROCR, randomForest, glmnet, leaps, car

Big data: Spark, Hadoop, MapReduce

Statistical Topics and Machine Learning:

Data Wrangling, Exploratory Data Analysis: Inferential Statistics, Descriptive Statistics, Statistical Graphics, Plot Data Analysis, Sampling Error Analysis, Hypothesis testing, A/B testing

Regression: K-Nearest Neighbors (KNN), simple linear regression, multiple linear regression, interaction terms

Classification: naive Bayes classifier, classification with KNN, logistic regression, support vector machines (SVM), tree-based methods (decision trees), boosting, bagging, and random forests

Unsupervised Learning and Clustering: Principal Component Analysis (PCA), K-means clustering and hierarchical clustering

Model Selection and Regularization: ridge regression, lasso, dimensionality reduction (PCA), stepwise/forward selection, ANOVA, nested models, prediction accuracy tests, cross-validation, bootstrapping

Tools/Other: Confidential SPSS Statistics tool, REST APIs, JSON, CSV, MySQL, Git, IntelliJ, Microsoft Office, Microsoft Excel, Unix

WORK EXPERIENCE:

Software Engineer

Confidential

Responsibilities:

  • Analyzed the airline dataset using SparkContext; created a Resilient Distributed Dataset (RDD) from it
  • Explored the data using lambda functions
  • Computed the average distance travelled per flight by applying map-reduce operations
  • Computed the average delay using the aggregate function
  • Created a frequency histogram of delays using countByValue and GraphX
  • Used Jupyter Notebook to prepare the report
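
The map-reduce, aggregate, and countByValue steps listed above can be sketched in plain Python. This is only an illustrative stand-in for the Spark RDD operations, using the standard library's `map`, `functools.reduce`, and `collections.Counter`; the `flights` records and their field layout are hypothetical and are not data from the project.

```python
from collections import Counter
from functools import reduce

# Hypothetical (distance_miles, delay_minutes) records, assumed for illustration.
flights = [(300, 5), (1200, 0), (450, 15), (300, 5), (2000, 30)]

# Map step: turn each record into a (distance, 1) pair.
pairs = map(lambda rec: (rec[0], 1), flights)

# Reduce step: sum distances and counts pairwise, then divide.
total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), pairs)
avg_distance = total / count

# Aggregate-style fold: carry a (sum, count) accumulator over the delays.
delay_sum, delay_n = reduce(
    lambda acc, rec: (acc[0] + rec[1], acc[1] + 1), flights, (0, 0)
)
avg_delay = delay_sum / delay_n

# countByValue analogue: how often each delay value occurs.
delay_histogram = Counter(rec[1] for rec in flights)

print(avg_distance, avg_delay, dict(delay_histogram))
```

Carrying a (sum, count) pair through the reduction keeps each step associative, which is the property a distributed reduce relies on.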

Tools/Technologies: Python, Spark, lambda functions, MapReduce, Jupyter Notebook

Software Engineer Intern

Confidential

Responsibilities:

  • Explored the dataset using numpy, pandas, matplotlib, seaborn, sklearn, and scipy; visualized each variable’s distribution with histograms and plotted a correlation matrix with matplotlib and seaborn to look for strong relationships between variables
  • Unsupervised learning: defined an outlier (fraud) detection method using a random forest approach and a k-nearest neighbor approach, via scikit-learn's IsolationForest and LocalOutlierFactor classes
  • Fit and compared both models
  • Calculated fraud prediction errors, model accuracy, and a confusion matrix for both models; concluded that the random forest predicts fraud with better precision than k-nearest neighbor
  • Used Jupyter Notebook to prepare the report
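
A minimal sketch of the outlier-detection comparison described above, assuming scikit-learn and NumPy are installed. The synthetic data, the `contamination` value, and `n_neighbors=20` are assumptions chosen for illustration; they are not the dataset or settings used in the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)

# Hypothetical data: 200 "normal" points plus 10 far-away "fraud" points.
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
fraud = rng.uniform(low=6.0, high=8.0, size=(10, 2))
X = np.vstack([normal, fraud])
y_true = np.array([0] * 200 + [1] * 10)  # 1 marks an outlier

contamination = 10 / 210  # assumed fraction of outliers

detectors = {
    "IsolationForest": IsolationForest(
        contamination=contamination, random_state=0
    ),
    "LocalOutlierFactor": LocalOutlierFactor(
        n_neighbors=20, contamination=contamination
    ),
}

results = {}
for name, model in detectors.items():
    # fit_predict returns -1 for outliers and 1 for inliers; map to 1/0.
    y_pred = (model.fit_predict(X) == -1).astype(int)
    results[name] = {
        "errors": int((y_pred != y_true).sum()),
        "confusion": confusion_matrix(y_true, y_pred, labels=[0, 1]),
    }

for name, res in results.items():
    print(name, "errors:", res["errors"])
    print(res["confusion"])
```

Both detectors are unsupervised, so the labels in `y_true` are used only for scoring the predictions, never for fitting.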

Tools/Technologies: Python, Spark, lambda functions, MapReduce, Jupyter Notebook, numpy, pandas, matplotlib, seaborn, sklearn, scipy, IsolationForest, LocalOutlierFactor, CSV

Confidential

Python engineer

Responsibilities:

  • Data exploration using pandas
  • Created a density plot for each variable using the seaborn library
  • Developed clustering methods for show and no-show medical appointments:
  • Principal Component Analysis (PCA) using the scikit-learn library
  • Random forest using scikit-learn and seaborn
  • Created a confusion matrix and evaluated how well the model fit the dataset
  • Used cross-validation to estimate test errors

Tools/Technologies: Python, Jupyter Notebook, numpy, pandas, seaborn, scikit-learn

Confidential

Python engineer

Responsibilities:

  • Data exploration - cleaning the data
  • Performed univariate and unsupervised analysis - principal components analysis (PCA)
  • Built a logistic regression model using multiple predictors as input
  • Built a random forest model using the categorized income; plotted variable importance
  • Built a support vector machine (SVM) model, trying various parameters; plotted variable importance
  • Built a K-nearest neighbor (KNN) model
  • Summarized statistical models in terms of accuracy/error, sensitivity/specificity
  • Bootstrapping - compared performance of logistic regression, random forest, and SVM models

Tools/Technologies: R, RMD, cluster, e1071, class, ROCR, caret, ggplot2, PCA, logistic regression, random forest, SVM, KNN, bootstrapping

Confidential

Python engineer

Responsibilities:

  • Data exploration: loading, cleaning, and summarizing the data
  • Selected optimal models using exhaustive, forward, and backward selection methods.
  • Selected optimal set of variables for developing clustering models. Described differences and similarities between attributes deemed important in each case.
  • Used cross-validation method to estimate test errors with different numbers of variables.
  • Used lasso and ridge regularization. Compared the resulting models in terms of number of variables and their effects by regression subset selection and resampling.
  • Principal Component Analysis (PCA) - merged red and white wine datasets; plotted (biplot and similar plots) data projection for the first two principal components; built PCA model to determine wine quality

Tools/Technologies: R, RMD; R libraries: glmnet, leaps, ggplot2, MASS, corrplot, car; Data exploration: exhaustive, forward, and backward selection; cross-validation; regularized approaches: lasso and ridge

Confidential

Python engineer

Responsibilities:

  • Cleaned up the dataset by imputing missing values using the series mean
  • Checked the linear regression model’s assumptions. Transformed the data using reflected and logarithmic transformations.
  • Split the data into training and test sets; developed a linear regression model on the training set using predictors and an interaction term
  • Cross-validated the linear regression model
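
The steps above were carried out in the SPSS Statistics tool. As a rough Python stand-in, assuming NumPy is installed, they could look like the sketch below. All data is synthetic, and the variable names, coefficients, split sizes, and the `reflect_log` helper are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical predictors; a few x2 values are knocked out to mimic missing data.
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
x2[:5] = np.nan

# Mean imputation: replace each missing value with the series mean.
x2_filled = np.where(np.isnan(x2), np.nanmean(x2), x2)

def reflect_log(v):
    """Reflected log transform for negatively skewed data: log(max + 1 - v)."""
    return np.log(v.max() + 1.0 - v)

# Example of the transform on an assumed negatively skewed variable.
skewed = 10.0 - rng.exponential(1.0, n)
transformed = reflect_log(skewed)

# Synthetic response with an interaction effect (illustrative coefficients).
y = 1.0 + 2.0 * x1 - 1.0 * x2_filled + 0.5 * x1 * x2_filled + rng.normal(0, 0.1, n)

# Design matrix: intercept, both predictors, and their interaction term.
X = np.column_stack([np.ones(n), x1, x2_filled, x1 * x2_filled])

# Train/test split, then an ordinary least-squares fit on the training rows.
train, test = np.arange(0, 80), np.arange(80, n)
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
test_mse = float(np.mean((X[test] @ beta - y[test]) ** 2))

# 5-fold cross-validation on the training rows.
folds = np.array_split(rng.permutation(train), 5)
cv_mse = []
for k in range(5):
    fit_idx = np.concatenate([folds[j] for j in range(5) if j != k])
    b, *_ = np.linalg.lstsq(X[fit_idx], y[fit_idx], rcond=None)
    cv_mse.append(float(np.mean((X[folds[k]] @ b - y[folds[k]]) ** 2)))

print("test MSE:", test_mse, "mean CV MSE:", sum(cv_mse) / len(cv_mse))
```

The reflection step flips a negatively skewed variable so that the log transform, which compresses a long right tail, can be applied to it.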

Tools/Technologies: Confidential SPSS Statistics tool, logarithmic transformation, linear regression, cross-validation
