Sr. Data Scientist Resume

Secaucus, NJ

SUMMARY

  • Results-driven IT professional with proven, strong expertise as a Data Scientist and a passion for delivering valuable insights through analytical functions and data retrieval methods, and for implementing action-oriented solutions to complex business problems.
  • Data Professional with 8+ years of experience in Data Analysis, Machine learning, Artificial Intelligence, Data Visualization, ETL, Data Warehousing, Cloud services and Big Data Ecosystem.
  • Demonstrated expertise in all phases of the CRISP-DM methodology, including Business Requirements, Data Collection, Data Modeling, Model Development and Model Deployment, for Data Science and ML projects.
  • Able to synthesize quantitative information and communicate it effectively to business stakeholders.
  • Ability to analyze unstructured data from various sources like Google Maps API, Yelp, ArcGIS and competitor websites using Python web-scraping techniques.
  • Strong experience in Customer Analytics. Collaborated with Operations, Finance, Marketing, CRM and Web Analytics teams on multiple business initiatives by building Machine Learning models.
  • Proficient in writing and executing SQL queries in Spark Context and Snowflake.
  • Hands-on experience in PySpark: creating DataFrames, applying transformations and actions, and building reports and data mining pipelines. Knowledge of Kafka streaming.
  • Experience processing Big Data in the Hadoop architecture, leveraging HDFS and ecosystem components such as Hive, Spark and Impala.
  • Hands-on experience joining multiple data sources such as Oracle, Teradata, SQL Server, AWS Redshift and Snowflake during the data collection phase of model building.
  • In-depth understanding of metric selection for both classification and regression algorithms.
  • Experience leveraging high-compute AWS EC2 instances to speed up the feature selection and hyperparameter tuning stages of machine learning modeling.
  • Extensive working knowledge of datasets with linear and non-linear relationships. Expert in feature engineering and statistical analysis.
  • Hands-on experience in building Time Series Forecasting models using SARIMAX and PROPHET algorithms.
  • Ability to generate insights for business partners from data visualizations built in Power BI and Tableau.
  • Good understanding of text processing and image processing concepts, and of computer vision and natural language processing algorithms. Knowledge of AWS SageMaker and SparkML services.
  • Resilient problem-solving mindset and strong research capabilities.

TECHNICAL SKILLS

Methodologies: Waterfall, Agile and CRISP-DM

Machine Learning: Linear regression, Logistic Regression, Random Forests, Cross Validation, Naïve Bayes, K-Means Clustering and Model Selection, Feature Selection, Constraint Programming, Lookalike Modeling, Churn Prediction, Hyper-Parameter Tuning, NLP, TF-IDF, CNN and LSTM

BI Tools: Jupyter Notebooks, SAP BEx Analyzer, Business Objects, Microsoft Excel, Tableau, Power BI and ESRI ArcGIS

Programming: Python (Pandas, NumPy, Scikit-Learn, SciPy, Matplotlib, BeautifulSoup, statsmodels, PySpark, Keras, TensorFlow, NLTK, OpenCV, Skimage, PyTorch and Flask), SQL, HTML, XML and CSS

Cloud: AWS EC2, S3, EMR, Lambda, CloudWatch, DynamoDB, IAM, Redshift and Snowflake

Databases: MS SQL Server, Oracle DB2, 1010data, Teradata, DynamoDB and MongoDB

Big Data Ecosystem: HDFS, Hue, Hive, Spark, Sqoop, Pig and Impala

PROFESSIONAL EXPERIENCE

Confidential - Secaucus, NJ

Sr. Data Scientist

Responsibilities:

  • Incorporated ESRI ArcGIS data variables to build an XGBoost machine learning model that predicts annual store sales for new prospect locations across the country.
  • Achieved an R² of 79.05 versus the consultant’s solution, with a 30% improvement in cross-validation performance.
  • Replaced the consulting firm’s solution, which cost the company $100K, with the in-house model.
  • Feature engineered walk score, bike score and livability score variables for each store location. Utilized web scraping techniques to extract and analyze competitor data.
  • Used satellite images from the Google Maps API as source data to develop an alternative computer vision neural network model in Python and Keras that classifies worst/average/best performing categories and complements the current in-house model.

Environment: Python, Scikit-Learn, Pandas, NumPy, Matplotlib, re, BeautifulSoup, ESRI ArcGIS, Keras, OpenCV, Skimage, Google Maps API, SQL, Snowflake, AWS EC2, AWS IAM, Flask, Shell, Linux, MS Excel

Confidential - Secaucus, NJ

Sr. Data Scientist

Responsibilities:

  • Participated in the organization's quarterly finance sales forecast consensus meetings with executives.
  • Aggregated terabytes of transaction data from the data lake for each channel using the Spark SQL API.
  • Developed time-series forecasting machine learning models for every channel (B&M, Web and ADP) using Python and statsmodels.
  • Selected a parsimonious model by iterating between the Prophet and SARIMAX algorithms and choosing the lowest information criteria scores, such as AIC and BIC.

Environment: Python, Pandas, NumPy, Matplotlib, statsmodels, Prophet, MS Excel, Spark

Confidential - Secaucus, NJ

Sr. Data Scientist

Responsibilities:

  • Developed critical reports at the Sales, Customer and SKU levels to support informed decisions for the Store-in-Store business initiative. Leveraged Snowflake for faster data retrieval.
  • Devised a monthly ADP incremental analysis report, tracking metrics such as customer penetration, subscription cancellation and probability of being active in ADP for both Store and Web channels.
  • Leveraged AWS EC2 instances to speed up the feature selection process in the machine learning model building pipeline.
  • Generated leads using the Google Maps API for the Operations team to follow up on a new business initiative targeting small commercial businesses across the country.
  • Created dashboards to analyze customer shopping habits and sales transfer for closed stores using advanced SQL querying and visualization tools like Power BI.

Confidential - Secaucus, NJ

Sr. Data Scientist

Responsibilities:

  • Restructured labor schedules for all B&M stores by combining constraint programming with genetic algorithms to encode hard and soft constraints in an optimization machine learning model.
  • Estimated an annual ROI of $16M from placing the right talent in the right selling time intervals and reducing labor at the lowest-performing stores.
  • Overhauled and automated the end-to-end payroll process, from schedule generation and organization through emailing schedules to store, district and regional managers, saving hundreds of man-hours.
  • Deployed the machine learning model using Flask on an AWS EC2 instance for the Ops team to create schedules for stores.
  • Integrated employee data from Kronos and foot traffic IoT data from RetailNext in Spark using PySpark.
  • Collaborated with business stakeholders from Operations, Finance and Business Intelligence teams on the development of the model which increased productivity and cut unnecessary costs.

Environment: Python, Scikit-Learn, Pandas, NumPy, Matplotlib, PySchedule, SQL, Snowflake, AWS EC2, AWS IAM, Flask, Shell, Linux, MS Excel, Spark
