Machine Learning / Python Data Scientist Resume
New Jersey
SUMMARY
- 11+ years of experience in software development: evaluating functional requirements, project planning, estimation, design, development, peer code review, testing, deployment, and application support.
- 3+ years of experience in Machine Learning, Deep Learning, and Data Mining with large datasets of structured and unstructured data, Data Validation, Data Visualization, and Predictive Modeling.
- Domain expertise in building comprehensive analytical solutions in the Marketing, Sales, and Banking industries.
- Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling, testing and validation, and data visualization.
- Proficient in Machine Learning algorithms and Predictive Modeling, including Regression Models, Decision Trees, Random Forests, Sentiment Analysis, Naïve Bayes Classifiers, SVM, and Ensemble Models.
- Proficient in Statistical Methodologies including Hypothesis Testing, ANOVA, Time Series, Principal Component Analysis, Factor Analysis, Cluster Analysis, and Discriminant Analysis.
- Experience in machine learning algorithms in R and Python.
- Expertise in Exploratory Data Analysis for feature engineering, missing-value imputation, and outlier detection to reduce noise in data using R and Python.
- Worked with statistical functions in NumPy, visualization using Matplotlib, and Pandas for organizing data.
- Used scikit-learn packages in Python for predictions.
- Utilized web scraping techniques to extract and organize competitor data, and applied predictive analysis (machine learning and data mining techniques) to forecast.
- Hands-on experience in data visualization with Python and R: line and scatter plots, bar charts, histograms, pie charts, dot charts, box plots, time series, error bars, multiple chart types, multiple axes, subplots, etc.
- Expert in writing SQL queries and JOIN statements; developed stored procedures, triggers, functions, and packages.
- Experience in working with version control systems like Git.
- Containerized applications and deployed them to Azure Kubernetes Service via CI/CD.
- Utilized R to identify trends and relationships between different variables and their impact on the response variable; drew appropriate conclusions and translated analytical insights into planned strategies.
- Expert in automating scripts and writing production-quality code in Python, R, and SQL.
- Extensively used Python libraries including Pandas, NumPy, SciPy, PyMySQL, SQLAlchemy, XlsxWriter, OpenPyXL, TabPy, scikit-learn (sklearn), Seaborn, and Pickle.
- Ability to present complex data and analytics to non-analytical audiences.
- Extensive experience in developing online and batch applications.
- Expertise in Performance Tuning and Query Optimization.
- Experience in the database migration from VSAM files to DB2 Tables.
- Good understanding of delivery processes such as Agile & Waterfall
- Extensive experience in testing software products / applications including development of test plans, test cases and execution of tests (Unit, system, regression, functional etc.).
- Good exposure to quality assurance procedures and processes in software product development life cycle - ISO/CMM Level 5.
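The EDA bullets above mention missing-value imputation and outlier detection to reduce noise using Python. A minimal sketch of that kind of cleaning step with Pandas (the column name, sample values, and IQR threshold are invented for illustration, not taken from the original projects):

```python
import numpy as np
import pandas as pd

def clean_numeric_column(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Impute missing values with the median, then clip IQR outliers.

    Outliers are defined as points outside [Q1 - k*IQR, Q3 + k*IQR].
    """
    out = df.copy()
    # Median imputation for missing values
    out[col] = out[col].fillna(out[col].median())
    # IQR-based outlier bounds
    q1, q3 = out[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Clip extreme values back to the bounds to reduce noise
    out[col] = out[col].clip(lower, upper)
    return out

# Example: a small frame with one missing value and one extreme outlier
df = pd.DataFrame({"spend": [10.0, 12.0, 11.0, np.nan, 13.0, 500.0]})
cleaned = clean_numeric_column(df, "spend")
```

Clipping (rather than dropping) keeps the row count stable, which matters when the cleaned frame is joined back to other customer attributes.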
TECHNICAL SKILLS
Languages: Python, COBOL, PL1, JCL, SQL, REXX
Statistical Language: R
Machine Learning: Regression Analysis, Bayesian Methods, Decision Trees, Random Forests, Support Vector Machines, Neural Networks, Sentiment Analysis, K-Means Clustering, Natural Language Processing (NLP), RNN, LSTM, Seq2Seq Models, word2vec
Python Libraries / Frameworks: NumPy, Pandas, SciPy, scikit-learn, Seaborn, PyTorch, PySpark, Matplotlib, ggplot2, BeautifulSoup4, Pickle, Flask, Keras, TensorFlow
Databases: DB2, IMS/DB, MySQL, SQL Server
OLTP: CICS
Utilities &Tools: Endevor, File-Manager, VSAM, SPUFI, Trace Master, FILEAID, Changeman, Xpeditor, CA7.
Operating System: Windows, MVS/ESA
Platforms: Unix, Windows, MacOS
Statistical Methods: Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Autocorrelation
PROFESSIONAL EXPERIENCE
Confidential, New Jersey
Machine Learning/ Python Data Scientist
Responsibilities:
- Analyzed and prepared data, identifying patterns in the dataset by applying historical models; collaborated with senior Data Scientists to build an understanding of the data.
- Performed data manipulation, data preparation, normalization, and predictive modeling; improved efficiency and accuracy by evaluating models in Python and R.
- The project focused on customer segmentation through machine learning and statistical modeling, including building predictive models and generating data products to support segmentation.
- Used Pandas, NumPy, and scikit-learn in Python to perform exploratory analysis and develop various machine learning models such as Random Forest.
- Handled missing data in the dataset using the Imputer method in the scikit-learn library.
- Performed categorical variable analysis using the LabelEncoder, fit_transform, and OneHotEncoder methods in the scikit-learn library.
- Responsible for the design and development of advanced Python programs to prepare, transform, and harmonize data sets for modeling.
- Defined a generic classification function, which takes a model as input and determines the Accuracy and Cross-Validation scores
- Partnered with the Sales and Marketing teams and collaborated with a cross-functional team to frame and answer important data questions, prototyping and experimenting with ML algorithms and integrating them into production systems for different business needs.
- Worked on multiple datasets containing two billion values of structured and unstructured data about web application usage and online customer surveys.
- Segmented the customers based on demographics using K-means Clustering
- Explored different regression and ensemble models in machine learning to perform forecasting
- Presented dashboards to higher management for deeper insights using Power BI.
- Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user making a referral.
- Applied boosting methods to the predictive model to improve its efficiency.
- Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using Python and Power BI.
- Collaborated with project managers and business owners to understand their organizational processes and help design the necessary reports.
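The generic classification function described above (a model in, accuracy and cross-validation scores out) could look roughly like the sketch below; the synthetic dataset and the particular models are illustrative stand-ins, not the original project code:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score

def evaluate_model(model, X, y, cv=5):
    """Fit the model, then report training accuracy and the mean CV score."""
    model.fit(X, y)
    accuracy = accuracy_score(y, model.predict(X))
    # cross_val_score clones the model and refits it on each fold
    cv_scores = cross_val_score(model, X, y, cv=cv)
    return accuracy, cv_scores.mean()

# Synthetic stand-in for the customer dataset (shape is illustrative)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
for clf in (RandomForestClassifier(random_state=0), LogisticRegression(max_iter=1000)):
    acc, cv_mean = evaluate_model(clf, X, y)
```

Comparing training accuracy against the cross-validation mean is a quick overfitting check: a large gap between the two suggests the model memorizes rather than generalizes.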
Environment: Python, PyCharm, Jupyter Notebook, Spyder, R, MySQL, Git, Power BI, k-Means clustering, Hierarchical clustering, PCA.
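The demographic K-means segmentation mentioned in this section can be sketched as follows; the feature names, distributions, and cluster count are invented for illustration only:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented demographic features: age, income, store visits per month
rng = np.random.default_rng(0)
demographics = np.vstack([
    rng.normal([25, 30_000, 12], [3, 4_000, 2], size=(50, 3)),  # younger, frequent
    rng.normal([45, 80_000, 4], [5, 9_000, 1], size=(50, 3)),   # older, affluent
    rng.normal([35, 50_000, 8], [4, 6_000, 2], size=(50, 3)),   # mid-range
])

# Standardize so income does not dominate the Euclidean distance
X = StandardScaler().fit_transform(demographics)

# Segment customers into k clusters; labels_ gives each customer's segment
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
segments = kmeans.labels_
```

Standardizing before clustering is the key design choice here: without it, a raw-dollar income column would swamp age and visit frequency in the distance computation.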
Confidential
Machine Learning / Python -Data Scientist
Responsibilities:
- Involved in gathering requirements while uncovering and defining multiple dimensions.
- Extracted data from one or more source files and Databases.
- Participated in continuous interaction with the Revenue team for obtaining the data and data quality.
- Accomplished multiple tasks from collecting data to organizing and interpreting statistical information; utilized web scraping techniques to extract and organize competitor data.
- Explored the raw data through Exploratory Data Analysis (classification, splitting, cross-validation).
- Converted raw data to processed data by merging and finding outliers, errors, trends, missing values, and distributions; performed data munging and feature engineering to turn the raw data into valuable insights.
- Utilized various techniques like Histogram, Bar plot, Pie-Chart, Scatter plot, Box plots to determine the condition of the data.
- Conducted data exploration to look for trends, patterns, grouping, and deviations in the data to understand the data diagnostics.
- Visualized time series data to identify seasonality, trend, and noise.
- Checked stationarity with unit root tests such as the Augmented Dickey-Fuller (ADF) test.
- Modeled the data with AR, MA, and ARMA models and performed time series forecasting in Python.
- Performed time series forecasting with Keras and TensorFlow.
- Built various models, validating (training) and evaluating (testing) them.
- Analyzed raw data, drew conclusions, and developed recommendations.
- Performed model tuning by diagnosing underfitting vs. overfitting and applying regularization, hyperparameter tuning, and cross-validation.
- Performed NLP on customer feedback using NLTK: created a bag-of-words model and fitted a Naïve Bayes classifier; evaluated accuracy with a confusion matrix.
- Implemented Recurrent Neural Networks (RNNs) in PyTorch; performed binary text classification using words and multi-class text classification using characters.
- Performed sentiment analysis using word embeddings and language translation using sequence-to-sequence models.
- Persisted models using Pickle and exposed them through a Flask API.
- Containerized the application and deployed it into Azure Kubernetes Service via CI/CD.
- Advised on suitable methodologies and suggested improvements; carried out specific data processing and statistical techniques; created output to explain data analysis, data visualization, and statistical modeling results.
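The bag-of-words plus Naïve Bayes workflow described above can be sketched as below. This uses scikit-learn's CountVectorizer in place of NLTK for brevity, and the feedback texts and labels are invented examples, not project data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative feedback corpus (invented examples)
texts = [
    "great service and friendly staff",
    "terrible experience, very slow support",
    "loved the product, works great",
    "awful quality, would not recommend",
    "excellent support team, very helpful",
    "slow delivery and poor packaging",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

# Bag-of-words features: raw token counts per document
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Fit Naive Bayes and summarize performance with a confusion matrix
clf = MultinomialNB().fit(X, labels)
pred = clf.predict(X)
cm = confusion_matrix(labels, pred)
```

In a real setting the confusion matrix would be computed on a held-out test split rather than on the training documents shown here.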
Environment: Python, PyCharm, Jupyter Notebook, NLP, RNN, LSTM, Keras, PyTorch, Seq2Seq, word2vec, GloVe, Cookiecutter, Spyder, R, MySQL, Git, Flask, TensorFlow, Azure Kubernetes Service, Azure DevOps, Power BI.
Confidential
COBOL/DB2 developer - Project Leader
Responsibilities:
- Worked with business analyst and product manager for gathering the requirements.
- Performed estimation for the design, implementation, and testing of tasks using the BNYM Client Management Tool (CMT).
- Identified complex business scenarios during the design phase and found optimal solutions for them.
- Converted requirements into High-Level Design (HLD) and Technical Design Documents (TDD).
- Worked with the Data Architect and DBA to create the required schemas.
- Coordinated with Business Users and Technical/Test Leads as required to ensure quality and timeliness of deliverables.
- Providing guidance and mentorship and knowledge transition to the team members of the project.
- Preparation of status reports, project metrics and tracking of tasks assigned to the team members.
- Involved in reviewing the offshore deliverables.
- Tracked Systems Integration Testing (SIT) defects and fixed issues on time, ensuring their completion before the project was delivered to the UAT phase.
- Created and maintained mainframe execution JCL (jobs) to support a system replacement project in User Acceptance Testing (UAT) and Production environments.
- Performed peer review of the various work completed by the team members based on BNYM coding/testing standard documents to ensure quality of deliverables.
- Interacted with region owners to verify connectivity and availability to perform UAT.
- Prepared release note documents for various releases; involved in installation and post-deployment of the project.
- Provided on-call support to analyze and fix failed production batch jobs; involved in enhancing online processing modules.
- Worked on many root cause analyses of the issues to provide the optimum solution.
- Significantly involved in executing the enhancement of programs for a Data Migration Project.
- Involved in cost saving initiatives/ efforts & performance tuning.
- Worked on cost-reduction projects such as VSAM-to-IAM and VSAM-to-DB2 table conversions.
- Worked on module changes that format and send reconciliation reports/files to users or downstream applications for further processing at the end of the day via batch process.
- Preparing the Knowledge documents for future reference in the project.
Environment: COBOL, JCL, DB2, TSO/ISPF, File-Aid, SPUFI, ENDEVOR, XPEDITOR, IDCAMS, IEBGENER, SORT, ICETOOL, CICS, ESP Scheduler.