Hacker News
Pile-T5 (eleuther.ai)
59 points by tosh on April 15, 2024 | 15 comments



Now that's a blast from the past - T5! I always thought it was underused, but it's also from so long ago now, and even Google has moved past it to UL2 etc AFAIK. What's the use-case here for reproducing it? Aren't there already many good code models?


My group is currently working on a T5 model (and tokenizer) for HTML, as there are very few (if any) tokenizers that work well with HTML!

You can try using GPT-4's tokenizer on your own HTML inputs below [1] ... there's definitely room for improvement!

[1] https://tiktokenizer.vercel.app
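
For anyone curious, here's a minimal sketch of the same check done programmatically with the tiktoken package (cl100k_base is GPT-4's encoding; the HTML snippet is just an example):

    import tiktoken

    # cl100k_base is the encoding used by GPT-4
    enc = tiktoken.get_encoding("cl100k_base")

    html = '<div class="card"><a href="https://example.com">link</a></div>'
    tokens = enc.encode(html)

    print(len(tokens))                        # total token count
    print([enc.decode([t]) for t in tokens])  # how the markup gets split up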


A lot of embedding models are built on top of T5's encoder; this offers a new option.

The modularity of the enc-dec approach is useful - you can insert additional models in between (e.g. a diffusion model), you can use different encoders for different modalities, etc.
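
As an illustration, a minimal embedding sketch using only the encoder half via transformers' T5EncoderModel (the mean pooling here is a choice made for the example, not how any particular embedding model necessarily does it):

    import torch
    from transformers import AutoTokenizer, T5EncoderModel

    tok = AutoTokenizer.from_pretrained("t5-base")
    model = T5EncoderModel.from_pretrained("t5-base")  # decoder is never loaded

    batch = tok(["a sentence to embed"], return_tensors="pt", padding=True)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq, 768)

    # Mean-pool over non-padding positions to get one vector per input
    mask = batch["attention_mask"].unsqueeze(-1)
    embedding = (hidden * mask).sum(1) / mask.sum(1)
    print(embedding.shape)  # torch.Size([1, 768])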


Blast from the past? Like 2 years ago?


Gather round, children, and let Grandpa tell you a tale from way back in the day, when we thought 24GB was a lot of VRAM...


My nearly new 48GB A6000 is basically an ancient artifact relative to the exploding size of MoE requirements.


T5 was released in 2019.


Isn't that two years ago?


Two years already? This locked-down year of 2020 really messes with my perception of time.


I've downloaded the data for every model on Hugging Face and, if nothing else, flan-t5-xxl is the largest encoder-style (discriminative) model in terms of total file size. To be clear, T5 is an encoder-decoder model. But the shift in focus to decoder-only has meant that encoder and encoder-decoder models have been, for lack of a better word, neglected in terms of training focus.
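
In case it's useful to anyone, a rough sketch of ranking repos by total file size with huggingface_hub (the filter and limit are illustrative, not what I actually ran):

    from huggingface_hub import HfApi

    api = HfApi()
    sizes = {}
    for m in api.list_models(filter="text2text-generation", limit=200):
        info = api.model_info(m.id, files_metadata=True)
        sizes[m.id] = sum(f.size or 0 for f in info.siblings)

    # Biggest repos first
    for repo, nbytes in sorted(sizes.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{repo}: {nbytes / 1e9:.1f} GB")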


I figured it would be interesting to see the scale of things, and likewise for the image models.


A new T5:

* Llama tokenizer

* the Pile dataset

* trained on 2T tokens (2x the original T5)

* better overall, and especially better at coding
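
A minimal loading sketch, assuming the HF-format checkpoints work with the standard seq2seq auto classes (the "EleutherAI/pile-t5-large" repo name is from the release; double-check the model card for exact usage):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Repo name taken from the Pile-T5 release; verify against the model card
    tok = AutoTokenizer.from_pretrained("EleutherAI/pile-t5-large")
    model = AutoModelForSeq2SeqLM.from_pretrained("EleutherAI/pile-t5-large")

    inputs = tok("def fizzbuzz(n):", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))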



I was using it for translation with somewhat wobbly results. Would love to see a translation fine-tune.
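
For context, original T5 does translation through a task prefix; here's a sketch of that stock usage (Pile-T5 would presumably need the translation fine-tune you mention before this works well):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tok = AutoTokenizer.from_pretrained("t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

    # "translate English to German:" is one of T5's pretraining task prefixes
    inputs = tok("translate English to German: The house is wonderful.",
                 return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    print(tok.decode(out[0], skip_special_tokens=True))  # e.g. "Das Haus ist wunderbar."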


Can someone provide some context? Please explain what this is. Thanks!



