Hacker News
Pile-T5 (eleuther.ai)
59 points by tosh on April 15, 2024 | 15 comments



Now that's a blast from the past - T5! I always thought it was underused, but it's also from so long ago now, and even Google has moved past it to UL2 etc AFAIK. What's the use-case here for reproducing it? Aren't there already many good code models?


My group is currently working on a T5 model (and tokenizer) for HTML, as there are very few (if any) tokenizers that work well with HTML!

You can try using GPT-4's tokenizer on your own HTML inputs below [1] ... there's definitely room for improvement!

[1] https://tiktokenizer.vercel.app
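
For anyone curious, here's a minimal sketch of the same check done programmatically with the tiktoken package (cl100k_base is GPT-4's encoding; the HTML snippet is just an example):

    import tiktoken

    # cl100k_base is the encoding used by GPT-4
    enc = tiktoken.get_encoding("cl100k_base")

    html = '<div class="card"><a href="https://example.com">link</a></div>'
    tokens = enc.encode(html)

    print(len(tokens))                        # total token count
    print([enc.decode([t]) for t in tokens])  # how the markup gets split up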


A lot of embedding models are built on top of T5's encoder; this offers a new option.

The modularity of the enc-dec approach is useful - you can insert additional models in between (e.g. a diffusion model), you can use different encoders for different modalities, etc.
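
As an illustration, a minimal embedding sketch using only the encoder half via transformers' T5EncoderModel (the mean pooling here is a choice made for the example, not how any particular embedding model necessarily does it):

    import torch
    from transformers import AutoTokenizer, T5EncoderModel

    tok = AutoTokenizer.from_pretrained("t5-base")
    model = T5EncoderModel.from_pretrained("t5-base")  # decoder is never loaded

    batch = tok(["a sentence to embed"], return_tensors="pt", padding=True)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq, 768)

    # Mean-pool over non-padding positions to get one vector per input
    mask = batch["attention_mask"].unsqueeze(-1)
    embedding = (hidden * mask).sum(1) / mask.sum(1)
    print(embedding.shape)  # torch.Size([1, 768])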


Blast from the past? Like 2 years ago?


Gather round, children, and let Grandpa tell you a tale from way back in the day, when we thought 24GB was a lot of VRAM...


My nearly new 48GB A6000 is basically an ancient artifact relative to the exploding size of MoE requirements.


T5 was released in 2019.


Isn't that two years ago?


Two years already? This locked-down year of 2020 really messes with my perception of time.


I've downloaded the data for every model on Hugging Face and, if nothing else, flan-t5-xxl is the largest encoder-style (discriminative) model in terms of total file size. To be clear, T5 is an encoder-decoder model. But the shift in focus to decoder-only has meant that encoder and encoder-decoder models have been, for lack of a better word, neglected in terms of training focus.
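
In case it's useful to anyone, a rough sketch of ranking repos by total file size with huggingface_hub (the filter and limit are illustrative, not what I actually ran):

    from huggingface_hub import HfApi

    api = HfApi()
    sizes = {}
    for m in api.list_models(filter="text2text-generation", limit=200):
        info = api.model_info(m.id, files_metadata=True)
        sizes[m.id] = sum(f.size or 0 for f in info.siblings)

    # Biggest repos first
    for repo, nbytes in sorted(sizes.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{repo}: {nbytes / 1e9:.1f} GB")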


I figured it would be interesting to see the scale of things, and likewise for the image models.


A new T5:

* Llama tokenizer

* the Pile dataset

* trained on 2T tokens (2x the original T5)

* better overall, and especially better at coding
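
A minimal loading sketch, assuming the HF-format checkpoints work with the standard seq2seq auto classes (the "EleutherAI/pile-t5-large" repo name is from the release; double-check the model card for exact usage):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Repo name taken from the Pile-T5 release; verify against the model card
    tok = AutoTokenizer.from_pretrained("EleutherAI/pile-t5-large")
    model = AutoModelForSeq2SeqLM.from_pretrained("EleutherAI/pile-t5-large")

    inputs = tok("def fizzbuzz(n):", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))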



I was using it for translation with somewhat wobbly results. Would love to see a translation fine-tune.
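
For context, original T5 does translation through a task prefix; here's a sketch of that stock usage (Pile-T5 would presumably need the translation fine-tune you mention before this works well):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tok = AutoTokenizer.from_pretrained("t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

    # "translate English to German:" is one of T5's pretraining task prefixes
    inputs = tok("translate English to German: The house is wonderful.",
                 return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    print(tok.decode(out[0], skip_special_tokens=True))  # e.g. "Das Haus ist wunderbar."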


Can someone provide some context? Please explain what this is. Thanks!



