How to use castorini/azbert-base with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("fill-mask", model="castorini/azbert-base")

# Load model directly
from transformers import AutoTokenizer, AutoModelForPreTraining
tokenizer = AutoTokenizer.from_pretrained("castorini/azbert-base")
model = AutoModelForPreTraining.from_pretrained("castorini/azbert-base")
About
Here we share a pretrained BERT model that is aware of math tokens. Math tokens are treated specially and tokenized using pya0, which adds only a small number of new tokens for LaTeX markup (the total vocabulary is just 31,061).
This model was trained on 4 x 2 Tesla V100 GPUs with a total batch size of 64, using 2.7 million sentence pairs from Math StackExchange, for 7 epochs.
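To put the training figures in perspective, here is a rough back-of-the-envelope calculation. It assumes that "4 x 2" means 8 GPUs in total, that the batch is split evenly across them, and that no gradient accumulation is used; none of these is stated above, and 2.7 million is itself a rounded figure.

```python
import math

num_gpus = 4 * 2           # assumption: "4 x 2" means 8 GPUs in total
total_batch_size = 64      # from the text
num_pairs = 2_700_000      # "2.7 million" in the text, a rounded figure
epochs = 7                 # from the text

# Assumption: the total batch is divided evenly across GPUs.
per_gpu_batch = total_batch_size // num_gpus
# Assumption: one optimizer step per batch (no gradient accumulation).
steps_per_epoch = math.ceil(num_pairs / total_batch_size)
total_steps = steps_per_epoch * epochs

print(per_gpu_batch, steps_per_epoch, total_steps)
```

Under these assumptions each GPU sees 8 sentence pairs per step, and training runs for roughly 42,000 steps per epoch, or about 295,000 steps overall.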
Usage
Download and try it out
pip install pya0==0.3.2
wget https://vault.cs.uwaterloo.ca/s/gqstFZmWHCLGXe3/download -O ckpt.tar.gz
mkdir -p ckpt
tar xzf ckpt.tar.gz -C ckpt --strip-components=1
python test.py --test_file test.txt
Test file format
Modify the test examples in test.txt to experiment with the model.
The test file is tab-separated. The first column lists additional positions to mask in the right-side sentence (useful for masking tokens inside math markup). A zero means no additional mask positions.
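The format described above could be read with a small parser like the following. This is only a sketch and not the code used by test.py: it assumes that multiple positions are written as comma-separated integers and that the remaining columns hold the sentences, neither of which is spelled out here. The function name parse_test_line is hypothetical.

```python
from typing import List, Tuple


def parse_test_line(line: str) -> Tuple[List[int], List[str]]:
    """Split one tab-separated line into (extra mask positions, sentences)."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 2:
        raise ValueError("expected a positions column followed by at least one sentence")
    # Assumption: positions are comma-separated integers.
    positions = [int(p) for p in columns[0].split(",") if p.strip()]
    # A lone zero means "no additional mask positions".
    if positions == [0]:
        positions = []
    return positions, columns[1:]


# Illustrative line only; it is not taken from the real test.txt.
example = "2,5\tThe left-side sentence.\tThe right-side sentence with $x^2$."
print(parse_test_line(example))
```

Keeping the "zero means none" rule inside the parser lets downstream masking code treat an empty list uniformly.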
Example output
Upload to Hugging Face
This repo is hosted on GitHub and only mirrored on Hugging Face.
To upload to Hugging Face, use the upload2hgf.sh script.
Before running this script, be sure to check:
- checkpoints for model and tokenizer are created under the ./ckpt folder
- model contains all the files needed: config.json and pytorch_model.bin
- tokenizer contains all the files needed: added_tokens.json, special_tokens_map.json, tokenizer_config.json, vocab.txt and tokenizer.json
- no tokenizer_file field in tokenizer_config.json (sometimes it is located locally at ~/.cache)
- git-lfs is installed
- having a git remote named hgf that references https://huggingface.co/castorini/azbert-base
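The file-related items in this checklist can be automated with a short script. The sketch below is not part of upload2hgf.sh. The function name find_upload_problems and the ckpt/model and ckpt/tokenizer paths are hypothetical, since the subfolder layout of ./ckpt is not given here. It does not verify the hgf git remote.

```python
import json
import shutil
from pathlib import Path
from typing import List

MODEL_FILES = ["config.json", "pytorch_model.bin"]
TOKENIZER_FILES = [
    "added_tokens.json",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
    "tokenizer.json",
]


def find_upload_problems(model_dir: str, tokenizer_dir: str) -> List[str]:
    """Return readable problems; an empty list means the file checks pass."""
    problems = []
    checks = ((Path(model_dir), MODEL_FILES), (Path(tokenizer_dir), TOKENIZER_FILES))
    for folder, names in checks:
        for name in names:
            if not (folder / name).is_file():
                problems.append(f"missing {folder / name}")
    # The checklist requires that tokenizer_config.json has no tokenizer_file field.
    config_path = Path(tokenizer_dir) / "tokenizer_config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        if "tokenizer_file" in config:
            problems.append("tokenizer_config.json contains a tokenizer_file field")
    return problems


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the checkpoints actually live.
    for problem in find_upload_problems("ckpt/model", "ckpt/tokenizer"):
        print(problem)
    if shutil.which("git-lfs") is None:
        print("git-lfs not found on PATH")
```

Returning a list of problems instead of raising on the first one makes it possible to fix everything in a single pass before running the upload script.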
