Dataset: shawneil/hackathon
Multimodal deep learning model for predicting Amazon product prices from images, text, and metadata
| Metric | Value | Benchmark |
|---|---|---|
| SMAPE | 36.5% | Top 3% (Competition) |
| MAE | $5.82 | -22.5% vs baseline |
| MAPE | 28.4% | Industry-leading |
| R² | 0.847 | ~85% of price variance explained |
| Median Error | $3.21 | Robust predictions |
Training Data: 75,000 Amazon products
Architecture: CLIP ViT-L/14 + Enhanced Multi-head Attention + 40+ Features
Parameters: 395M total, 78M trainable (19.8%)
```bash
pip install torch torchvision open_clip_torch peft pillow
pip install huggingface_hub datasets transformers
```
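```python
from huggingface_hub import hf_hub_download
import open_clip
import torch

# OptimizedCLIPPriceModel is not shipped in any pip package: copy its
# definition from the GitHub repo into your session before running this.

# Build the CLIP ViT-L/14 backbone that the price model wraps
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14', pretrained='openai'
)

# Download model checkpoint
model_path = hf_hub_download(
    repo_id="shawneil/Amazon-ml-Challenge-Model",
    filename="best_model.pt"
)

# Load model (see GitHub repo for complete model definition)
model = OptimizedCLIPPriceModel(clip_model)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()
```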
```python
from PIL import Image
import open_clip
import torch

# Load CLIP preprocessing transforms and tokenizer (ViT-L/14, OpenAI weights)
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-L-14', pretrained='openai'
)
tokenizer = open_clip.get_tokenizer('ViT-L-14')

# Prepare inputs (convert to RGB so PNGs with alpha or grayscale images also work)
image = Image.open("product_image.jpg").convert("RGB")
image_tensor = preprocess(image).unsqueeze(0)

text = "Premium Organic Coffee Beans, 16 oz, Medium Roast"
text_tokens = tokenizer([text])

# Extract 40+ features (see feature engineering guide)
features = extract_features(text)  # your feature extraction function
features_tensor = torch.tensor(features, dtype=torch.float32).unsqueeze(0)

# Predict price (`model` is the OptimizedCLIPPriceModel loaded above)
with torch.no_grad():
    predicted_price = model(image_tensor, text_tokens, features_tensor)
print(f"Predicted Price: ${predicted_price.item():.2f}")
```
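The inference example leaves `extract_features` to the reader. The hypothetical extractor below illustrates the expected interface: a product title goes in, and a fixed-length list of floats comes out. It covers only two of the feature families the model card mentions, quantities and quality keywords. The keyword list, unit conversions, and feature order are all assumptions. The released checkpoint expects its own 40+ features in its own order, so this toy 7-value vector will not work with it.

```python
import math
import re

# Hypothetical subset of the 40+ features; keywords, units, and order are assumptions
QUALITY_KEYWORDS = ("premium", "organic", "gourmet", "natural")
UNIT_TO_OUNCES = {"oz": 1.0, "lb": 16.0, "g": 1 / 28.349523125, "kg": 1000 / 28.349523125}
QUANTITY_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(oz|lbs?|kg|g)\b")


def extract_features(text):
    """Map a product title to a fixed-length list of floats (7 values here)."""
    lower = text.lower()
    match = QUANTITY_RE.search(lower)  # first "<number> <unit>", e.g. "16 oz"
    ounces = 0.0
    if match:
        unit = match.group(2).rstrip("s")  # "lbs" -> "lb"
        ounces = float(match.group(1)) * UNIT_TO_OUNCES[unit]
    features = [
        math.log1p(ounces),          # log-scaled pack size in ounces
        1.0 if match else 0.0,       # was any quantity found?
        float(len(text.split())),    # title length in words
    ]
    # One binary flag per quality keyword
    features += [1.0 if kw in lower else 0.0 for kw in QUALITY_KEYWORDS]
    return features


print(extract_features("Premium Organic Coffee Beans, 16 oz, Medium Roast"))
```

The list always has the same length, which matters because the model's feature branch is sized for a fixed input width. A title with no quantity still produces a full vector, with zeros in the quantity slots.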
```text
Product Image (512×512) ──┐
                          ├──> CLIP Vision (ViT-L/14) ───┐
Product Text ─────────────┼──> CLIP Text Transformer ────┤
                          │                              ├──> Feature Attention ──> Enhanced Head ──> Price
40+ Features ─────────────┘                              │    (Self-Attn + Gate)    (Dual-path +
(Quantities, Categories,                                 │                           Cross-Attn)
 Brands, Quality, etc.)                                  │
```
```json
{
  "epochs": 3,
  "batch_size": 32,
  "gradient_accumulation": 2,
  "effective_batch_size": 64,
  "learning_rate": {
    "vision": 1e-6,
    "text": 1e-6,
    "head": 1e-4
  },
  "optimizer": "AdamW (betas=(0.9, 0.999), weight_decay=0.01)",
  "scheduler": "CosineAnnealingLR with warmup (500 steps)",
  "gradient_clip": 0.5,
  "mixed_precision": "fp16"
}
```
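The config combines a 500-step warmup, cosine annealing, three per-group learning rates, and gradient accumulation. The sketch below makes that arithmetic concrete. It counts optimizer updates (one per `gradient_accumulation` micro-batches) and computes a learning-rate multiplier. The warmup is assumed to be linear and the cosine is assumed to decay to zero. The sketch also assumes all 75,000 products are training examples; if some are held out for validation, the step count shrinks accordingly. All function names are hypothetical.

```python
import math

# Values taken from the training config above
EPOCHS = 3
BATCH_SIZE = 32
GRAD_ACCUM = 2
WARMUP_STEPS = 500
BASE_LRS = {"vision": 1e-6, "text": 1e-6, "head": 1e-4}

# Assumption: every one of the 75,000 products is a training example
NUM_SAMPLES = 75_000


def total_optimizer_steps(num_samples, batch_size=BATCH_SIZE,
                          grad_accum=GRAD_ACCUM, epochs=EPOCHS):
    """One optimizer update per `grad_accum` micro-batches, summed over epochs."""
    micro_batches = math.ceil(num_samples / batch_size)
    return math.ceil(micro_batches / grad_accum) * epochs


def lr_multiplier(step, total_steps, warmup_steps=WARMUP_STEPS):
    """Assumed shape: linear warmup to 1.0, then cosine decay to 0.0."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


def group_lrs(step, total_steps):
    """Learning rate of each parameter group at a given optimizer step."""
    m = lr_multiplier(step, total_steps)
    return {name: base * m for name, base in BASE_LRS.items()}


total = total_optimizer_steps(NUM_SAMPLES)
print(total)  # 3516
print(group_lrs(2000, total))
```

The same multiplier can be used in PyTorch by passing `lr_lambda=lambda s: lr_multiplier(s, total)` to `torch.optim.lr_scheduler.LambdaLR`. That scheduler scales every parameter group's base learning rate by the returned value, so the 100× gap between the backbone and head learning rates holds throughout training. With gradient accumulation, call `scheduler.step()` once per optimizer update, not once per micro-batch.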
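$$
\mathcal{L}_{\text{total}} = 0.05\,\mathcal{L}_{\text{Huber}} + 0.05\,\mathcal{L}_{\text{MSE}} + 0.65\,\mathcal{L}_{\text{SMAPE}} + 0.15\,\mathcal{L}_{\text{PercentageError}} + 0.05\,\mathcal{L}_{\text{WeightedMAE}} + 0.05\,\mathcal{L}_{\text{Quantile}}
$$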
Where:
- SMAPE: Primary competition metric (65% weight)
- Percentage Error: Relative error focus (15%)
- Huber: Robust regression (δ = 0.8)
- Weighted MAE: Price-aware weighting (1/price)
- Quantile: Median regression (τ = 0.5)
- MSE: Standard regression baseline
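The weighted loss can be written out term by term. Below is a plain-Python reference sketch for one batch of prices; actual training code would use differentiable `torch` operations instead. Where the description is silent, the sketch makes three assumptions. SMAPE is expressed as a fraction rather than a percentage. Weighted MAE normalizes its 1/price weights so they sum to one. The quantile term is the standard pinball loss. `combined_loss` and `LOSS_WEIGHTS` are hypothetical names.

```python
# Weights from the loss formula above (they sum to 1.0)
LOSS_WEIGHTS = {"huber": 0.05, "mse": 0.05, "smape": 0.65,
                "pct": 0.15, "wmae": 0.05, "quantile": 0.05}


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def combined_loss(y_true, y_pred, delta=0.8, tau=0.5, eps=1e-8):
    """Return (weighted total, per-term dict) for one batch of prices."""
    pairs = list(zip(y_true, y_pred))
    residuals = [y - p for y, p in pairs]

    # Huber: quadratic for |r| <= delta, linear beyond it
    huber = _mean(0.5 * r * r if abs(r) <= delta else delta * (abs(r) - 0.5 * delta)
                  for r in residuals)
    mse = _mean(r * r for r in residuals)
    # SMAPE as a fraction in [0, 2]; eps guards against 0/0
    smape = _mean(abs(y - p) / ((abs(y) + abs(p)) / 2 + eps) for y, p in pairs)
    # Percentage error relative to the true price
    pct = _mean(abs(y - p) / (abs(y) + eps) for y, p in pairs)
    # Assumption: 1/price weights normalised to sum to one
    weights = [1.0 / (abs(y) + eps) for y in y_true]
    wmae = sum(w * abs(r) for w, r in zip(weights, residuals)) / sum(weights)
    # Pinball loss at quantile tau; tau = 0.5 equals half the absolute error
    quantile = _mean(max(tau * r, (tau - 1) * r) for r in residuals)

    parts = {"huber": huber, "mse": mse, "smape": smape,
             "pct": pct, "wmae": wmae, "quantile": quantile}
    return sum(LOSS_WEIGHTS[k] * v for k, v in parts.items()), parts


total, parts = combined_loss([10.0, 20.0], [8.0, 25.0])
print(round(total, 4), parts)
```

The terms are on different scales. Huber, MSE, and weighted MAE are measured in the target's units, while SMAPE and percentage error are scale-free. On raw dollar prices, as in this example, the 5% MSE term (0.725) contributes more than the 65% SMAPE term (about 0.144). The effective balance therefore depends on how prices are scaled before the loss is applied.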
| Category | % of Data | SMAPE | MAE | Best Range |
|---|---|---|---|---|
| Food & Beverages | 40% | 34.8% | $5.12 | $5-$25 |
| Electronics | 15% | 39.1% | $8.94 | $25-$100 |
| Beauty | 20% | 35.6% | $4.87 | $10-$50 |
| Health | 15% | 37.3% | $6.24 | $15-$40 |
| Spices | 5% | 33.2% | $3.91 | $5-$15 |
| Other | 5% | 42.7% | $7.18 | Varies |
Best Performance: Low to mid-price items ($5-$50) covering 88% of products
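SMAPE and MAE are averages over products, so the overall figures should roughly equal the per-category values weighted by each category's share of the data. The quick check below uses only numbers from the table. It assumes the "% of Data" column gives each category's share of the evaluated items.

```python
# (share of data, SMAPE %, MAE $) per category, copied from the table above
CATEGORY_STATS = {
    "Food & Beverages": (0.40, 34.8, 5.12),
    "Electronics": (0.15, 39.1, 8.94),
    "Beauty": (0.20, 35.6, 4.87),
    "Health": (0.15, 37.3, 6.24),
    "Spices": (0.05, 33.2, 3.91),
    "Other": (0.05, 42.7, 7.18),
}


def share_weighted(stats, column):
    """Share-weighted average of one metric column (1 = SMAPE, 2 = MAE)."""
    total_share = sum(row[0] for row in stats.values())
    return sum(row[0] * row[column] for row in stats.values()) / total_share


print(f"SMAPE ≈ {share_weighted(CATEGORY_STATS, 1):.1f}%")  # SMAPE ≈ 36.3%
print(f"MAE ≈ ${share_weighted(CATEGORY_STATS, 2):.2f}")    # MAE ≈ $5.85
```

This gives roughly 36.3% SMAPE and $5.85 MAE, against the reported 36.5% and $5.82. The small gap plausibly comes from the category shares being rounded to whole percentages.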
| Version | Date | SMAPE | Changes |
|---|---|---|---|
| v2.0 | 2025-01 | 36.5% | Enhanced features + architecture |
| v1.0 | 2025-01 | 45.8% | Baseline with 17 features |
| v0.1 | 2024-12 | 52.3% | CLIP-only (frozen) |
```bibtex
@misc{rodrigues2025amazon,
  title={Amazon Product Price Prediction using Multimodal Deep Learning},
  author={Rodrigues, Shawneil},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/shawneil/Amazon-ml-Challenge-Model}},
  note={SMAPE: 36.5\%}
}
```
MIT License - See LICENSE
Built with ❤️ using PyTorch, CLIP, and smart feature engineering
From 52.3% to 36.5% SMAPE - Multimodal learning at its best
Base model: openai/clip-vit-large-patch14