Sr. Data Scientist Resume
Secaucus, NJ
SUMMARY
- Results-driven IT professional with strong expertise as a Data Scientist and a passion for delivering valuable insights through analytical functions, data retrieval methods and action-oriented solutions to complex business problems.
- Data Professional with 8+ years of experience in Data Analysis, Machine learning, Artificial Intelligence, Data Visualization, ETL, Data Warehousing, Cloud services and Big Data Ecosystem.
- Demonstrated expertise in all phases of the CRISP-DM methodology, including business requirements, data collection, data modeling, model development and model deployment for Data Science and ML projects.
- Able to synthesize quantitative information and communicate it effectively to business stakeholders.
- Ability to analyze unstructured data from various sources like Google Maps API, Yelp, ArcGIS and competitor websites using Python web-scraping techniques.
- Strong experience in Customer Analytics. Collaborated with Operations, Finance, Marketing, CRM and Web Analytics teams on multiple business initiatives by building Machine Learning models.
- Proficient in writing and executing SQL queries in Spark Context and Snowflake.
- Hands-on experience in PySpark: creating DataFrames, applying transformations and actions, and building reports and data mining pipelines. Knowledge of Kafka streaming.
- Experience processing Big Data in the Hadoop architecture, leveraging HDFS and ecosystem components such as Hive, Spark and Impala.
- Hands-on experience joining multiple data sources such as Oracle, Teradata, SQL Server, AWS Redshift and Snowflake during the data collection phase of model building.
- In-depth understanding of choosing evaluation metrics for both classification and regression algorithms.
- Experience in leveraging high-compute AWS EC2 instances to speed up the feature selection and hyperparameter tuning stages of Machine Learning modeling.
- Extensive working knowledge of datasets with linear and non-linear relationships. Expert in feature engineering and statistical analysis.
- Hands-on experience in building time series forecasting models using the SARIMAX and Prophet algorithms.
- Ability to generate insights from data visualizations in Power BI and Tableau and present them to business partners.
- Good understanding of text processing and image processing concepts, and of Computer Vision and Natural Language Processing algorithms. Knowledge of AWS SageMaker and SparkML services.
- Resilient problem-solving mindset and strong research capabilities.
TECHNICAL SKILLS
Methodologies: Waterfall, Agile and CRISP-DM
Machine Learning: Linear regression, Logistic Regression, Random Forests, Cross Validation, Naïve Bayes, K-Means Clustering and Model Selection, Feature Selection, Constraint Programming, Lookalike Modeling, Churn Prediction, Hyper-Parameter Tuning, NLP, TF-IDF, CNN and LSTM
BI Tools: Jupyter Notebooks, SAP BEx Analyzer, Business Objects, Microsoft Excel, Tableau, Power BI and ESRI ArcGIS
Programming: Python (Pandas, NumPy, Scikit-Learn, SciPy, Matplotlib, BeautifulSoup, Statsmodels, PySpark, Keras, TensorFlow, NLTK, OpenCV, Skimage, PyTorch and Flask), SQL, HTML, XML and CSS
Cloud: AWS EC2, S3, EMR, Lambda, CloudWatch, DynamoDB, IAM, Redshift and Snowflake
Databases: MS SQL Server, Oracle DB2, 1010data, Teradata, DynamoDB and MongoDB
Big Data Ecosystem: HDFS, Hue, Hive, Spark, Sqoop, Pig and Impala
PROFESSIONAL EXPERIENCE
Confidential - Secaucus, NJ
Sr. Data Scientist
Responsibilities:
- Incorporated ESRI ArcGIS data variables to build an XGBoost machine learning model that predicts annual store sales for new prospect locations across the country.
- Achieved an R² of 79.05, a 30% improvement in cross-validation performance over the consultant's solution.
- Replaced the consulting firm's solution, which cost the company $100K, with an in-house model.
- Engineered walk score, bike score and livability score features for each store location. Used web scraping techniques to extract and analyze competitor data.
- Used satellite images from the Google Maps API as source data to develop an alternative computer vision neural network model in Python and Keras that classifies locations into worst, average and best performing categories, complementing the current in-house model.
Environment: Python, Scikit-Learn, Pandas, NumPy, Matplotlib, re, BeautifulSoup, ESRI ArcGIS, Keras, OpenCV, Skimage, Google Maps API, SQL, Snowflake, AWS EC2, AWS IAM, Flask, Shell, Linux, MS Excel
Confidential - Secaucus, NJ
Sr. Data Scientist
Responsibilities:
- Participated in the organization's quarterly finance sales forecast consensus meeting with executives.
- Aggregated terabytes of transaction data from the data lake for each channel using the Spark SQL API.
- Developed time series forecasting models for every channel (B&M, Web and ADP) using Python and Statsmodels.
- Selected a parsimonious model by iterating between the Prophet and SARIMAX algorithms and choosing the candidate with the lowest information criteria (AIC and BIC).
Environment: Python, Pandas, NumPy, Matplotlib, Statsmodels, Prophet, MS Excel, Spark
Confidential - Secaucus, NJ
Sr. Data Scientist
Responsibilities:
- Developed critical reports at the sales, customer and SKU levels to support informed decisions for the Store-in-Store business initiative. Leveraged Snowflake for faster data retrieval.
- Devised a monthly ADP incremental analysis report, tracking metrics such as customer penetration, subscription cancellation and probability of being active in ADP for both Store and Web channels.
- Leveraged AWS EC2 instances to speed up the feature selection process in the Machine Learning model building pipeline.
- Generated leads using the Google Maps API for the Operations team to follow up on a new business initiative targeting small commercial businesses across the country.
- Created dashboards to analyze customer shopping habits and sales transfer for closed stores using advanced SQL queries and visualization tools such as Power BI.
Confidential - Secaucus, NJ
Sr. Data Scientist
Responsibilities:
- Restructured labor schedules for all B&M stores by combining constraint programming with genetic algorithms, encoding hard and soft constraints to build an optimization machine learning model.
- Estimated an ROI of $16M annually from placing the right talent in the right selling time intervals and reducing labor at the lowest-performing stores.
- Overhauled and automated the end-to-end payroll process, from generating and organizing schedules to emailing them to store, district and regional managers, saving hundreds of man-hours.
- Deployed the machine learning model using Flask on an AWS EC2 instance so the Ops team could create schedules for stores.
- Integrated employee data from Kronos and foot-traffic IoT data from RetailNext in Spark using PySpark.
- Collaborated with business stakeholders from the Operations, Finance and Business Intelligence teams on developing the model, which increased productivity and cut unnecessary costs.
Environment: Python, Scikit-Learn, Pandas, NumPy, Matplotlib, PySchedule, SQL, Snowflake, AWS EC2, AWS IAM, Flask, Shell, Linux, MS Excel, Spark
