Data Scientist Resume
St. Louis, MO
SUMMARY
- Data Scientist with 5 years of experience transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data, with expertise across a variety of industries including Banking, Retail, and Agriculture.
- Expert in Data Science process life cycle: Data Acquisition, Data Preparation, Modeling (Feature Engineering, Model Evaluation) and Deployment.
- Experienced in statistical techniques including hypothesis testing, Principal Component Analysis (PCA), ANOVA, sampling distributions, chi-square tests, time-series analysis, discriminant analysis, Bayesian inference, and multivariate analysis.
- Efficient in preprocessing data, including data cleaning, correlation analysis, imputation, visualization, feature scaling, and dimensionality reduction, using Python data science packages (scikit-learn, Pandas, NumPy).
- Expertise in building machine learning models using algorithms such as Linear Regression, Logistic Regression, Naive Bayes, Support Vector Machines (SVM), Decision Trees, KNN, K-Means Clustering, and ensemble methods (Bagging, Gradient Boosting).
- Experience in Text mining, Topic modeling, Natural Language Processing (NLP), Content Classification, Sentiment analysis, Market Basket Analysis, Recommendation systems, Entity recognition.
- Applied text pre-processing and normalization techniques, such as tokenization, POS tagging, and parsing. Expertise using NLP techniques (BOW, TF-IDF, Word2Vec) and toolkits such as NLTK.
- Experienced in tuning models using Grid Search, Randomized Grid Search, K-Fold Cross Validation.
- Strong understanding of artificial neural networks, convolutional neural networks, and deep learning.
- Skilled in statistical methods including exploratory data analysis, regression analysis, regularized linear models, time-series analysis, cluster analysis, goodness of fit, Monte Carlo simulation, sampling, cross-validation, ANOVA, and A/B testing.
- Working experience in Natural Language Processing (NLP); deep understanding of statistics, linear algebra, calculus, and optimization algorithms such as gradient descent.
- Familiar with key data science concepts (statistics, data visualization, machine learning, etc.). Experienced in Python, R, and PySpark programming for statistical and quantitative analysis.
- Knowledge of Time Series Analysis using ARIMA.
- Experience in building production-quality, large-scale deployments of applications involving natural language processing and machine learning algorithms.
- Experience with high-performance computing (cluster computing on AWS with PySpark) and building real-time analytics with Kibana and Elasticsearch. Knowledge of Tableau and Power BI.
- Exposure to AI and deep learning platforms such as TensorFlow, Keras, AWS ML, and Azure ML Studio.
- Experience working with Big Data tools such as Hive and Apache Spark (PySpark).
- Extensive experience working with RDBMS such as SQL Server, MySQL, and NoSQL databases such as MongoDB.
- Generated data visualizations using tools such as Tableau, Python Matplotlib, Python Seaborn, R.
- Knowledge and experience working in Agile environments, including the Scrum process; used project management tools like Jira and version control tools such as Git/GitHub.
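As a minimal sketch of the model-tuning workflow named above (Grid Search with K-Fold Cross Validation in scikit-learn); the dataset and parameter grid here are purely illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

X, y = load_iris(return_X_y=True)

# Illustrative parameter grid; real grids depend on the problem.
param_grid = {"C": [0.1, 1.0, 10.0]}

# Exhaustively search the grid, scoring each candidate with 5-fold CV.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```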
TECHNICAL SKILLS
Data Sources: Oracle, MySQL, Postgres, Amazon Redshift
Statistical Methods: Hypothesis Testing, ANOVA, Principal Component Analysis (PCA), Time Series, Correlation (Chi-square test, covariance), Multivariate Analysis, Bayes' Law.
Machine Learning: Linear Regression, Logistic Regression, Naive Bayes, Decision Trees, Random Forest, Support Vector Machines (SVM), K-Means Clustering, K-Nearest Neighbors (KNN), Gradient Boosting Trees, AdaBoost, PCA, LDA, Natural Language Processing
Deep Learning: Artificial Neural Networks, Convolutional Neural Networks, RNN, Deep Learning on AWS
Hadoop Ecosystem: Hadoop, Spark, MapReduce, HiveQL, HDFS, Sqoop, Pig Latin
Data Visualization: Tableau, Python (Matplotlib, Seaborn), R (ggplot2), Power BI
Languages: Python (NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn), R, SQL, Spark, Java, C++
Operating Systems: UNIX Shell Scripting (via PuTTY client), Linux, Windows.
Other tools and technologies: TensorFlow, Keras, AWS ML, GCP, NLTK, MS Office Suite, Google Analytics, GitHub, AWS (EC2, S3, Redshift, EMR, Lambda)
PROFESSIONAL EXPERIENCE
Confidential, St Louis, MO
Data Scientist
Responsibilities:
- Implemented a model using Neural Networks and Linear Mixed Models in TensorFlow and SageMaker to predict the seeding rate for a field based on yield, soil properties, and region information, which increased crop yield by 5 bushels/acre per field on average.
- Created data pipelines in PySpark and Python to convert unstructured data from APIs and AWS S3 buckets into structured data, reducing data availability time from a month to 15 minutes.
- Recovered missing soil information by analyzing soil data with geospatial interpolation techniques, increasing soil coverage for a field from 15% to 80%.
- Implemented a Deep Neural Network, using Random Forest for feature selection, and provided recommendations for dealers to select the best hybrid for each region based on research trials.
- Performed agglomerative cluster analysis on spatial datasets from GPS crop-growth sensors, fertilizer-usage sensors, and high-resolution satellite or aerial imagery to divide fields into management zones for grid-sampled fertilizer application, reducing fertilizer costs by 30% per field.
- Automated ETL processes, simplifying data wrangling and reducing processing time by as much as 40%.
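The management-zone approach above can be sketched with scikit-learn's AgglomerativeClustering; the synthetic field measurements and zone count below are illustrative stand-ins, not the original data:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(42)
# Synthetic stand-in for per-grid-cell field measurements:
# (x, y) position plus one sensor reading per cell.
points = rng.uniform(0, 100, size=(200, 3))

# Divide the field into 4 management zones (illustrative count).
zones = AgglomerativeClustering(n_clusters=4).fit_predict(points)
print(np.bincount(zones))  # grid cells assigned to each zone
```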
Confidential, Richmond, VA
Data Scientist/ Analyst
Responsibilities:
- Worked on marketing campaigns: identified customer segments and performed A/B testing and statistical analysis on data spread across multiple data sources to target the right customers, helping increase revenue by 20% over the previous year.
- Created a data pipeline to extract data from S3 buckets, transform it using Databricks and Spark, store it in a data lake, and query it with Hive on SQL DB, serving results over ~50 TB of data, increasing revenue by $200K annually and reducing time to insight by 50%.
- Troubleshot ETL failures and performed manual loads using SQL stored procedures.
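Campaign A/B testing like the above is often run as a two-proportion chi-square test; the conversion counts here are invented for illustration:

```python
from scipy.stats import chi2_contingency

# Hypothetical campaign results: [converted, not converted] per variant.
table = [[120, 880],   # variant A
         [150, 850]]   # variant B

# Chi-square test of independence between variant and conversion.
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
# A small p-value suggests the variants' conversion rates differ.
```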
Confidential, Dekalb, IL
Research Assistant
Responsibilities:
- Worked under a professor researching Blockchain Technology and Robotic Process Automation and their implementation in various industries, and assisted with teaching.
- Wrote a paper on the evolution of Data Operations and its current practice across industries.
Confidential
Data Analyst/ Engineer
Responsibilities:
- Designed and developed complex Oracle PL/SQL scripts, mappings, stored procedures, packages, and hourly, biweekly, and monthly batch programs to meet business requirements.
- Analyzed data to drive the growth of a retail company and improved forecasting, reducing backorders to retail partners by 17%.
- Collected and cleansed unstructured data from flat files into structured data for modeling and analytics, used for demand forecasting, BI research, and visually impactful Tableau dashboards.
- Assisted data scientists by providing data from multiple sources such as flat files, Oracle, and data warehouses for inventory-availability optimization, which increased sales performance by 21%.
- Spearheaded in-depth analysis of Inventory Replenishment Optimization that led to a 14% decrease in operating costs.
- Implemented an optimization solution that reduced unnecessary shipping costs by analyzing customer location details, saving the company $322,000 annually.
- Implemented a test-automation project using Selenium and TestNG, which reduced testing time from 20 days to 5 days.
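A shipping-cost optimization of the kind described above can be sketched as a small transportation problem with SciPy's linear-programming solver; the warehouse costs, supplies, and demands below are made up for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical per-unit shipping costs: 2 warehouses x 3 customer regions.
costs = np.array([[4.0, 6.0, 9.0],
                  [5.0, 3.0, 7.0]])
supply = [60, 50]      # units available per warehouse
demand = [30, 40, 40]  # units required per region

c = costs.ravel()  # decision variables: units shipped per (warehouse, region)
# Supply constraints: each warehouse ships at most its stock.
A_ub = [[1, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1]]
# Demand constraints: each region receives exactly its demand.
A_eq = [[1, 0, 0, 1, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 1]]

res = linprog(c, A_ub=A_ub, b_ub=supply, A_eq=A_eq, b_eq=demand)
plan = res.x.reshape(2, 3)
print(plan)
print("minimum shipping cost:", res.fun)
```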
