I want to use a multimodal model for manga translation, analysis, and tagging.
If this gives me the "aschually as a ethical safe harmless assistant I can't ..." spiel on anything mildly mature, that would be very disappointing. I'll run a test with Berserk and see how it goes.
I'm not a big believer in abliteration, it seems to always hurt performance. Safety should be handled by a separate system, no need to cripple the actual LLM.
The multimodal models aren't good for this. Refusals aren't the issue (they're fine with BERSERK, though occasionally they'll refuse for copyright). The issue is the tech isn't there yet.
You'll want custom models to segment the manga (panels, speech bubbles), OCR the text, and translate (Gemma punches above its weight for that part).
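One fiddly step in that pipeline that's easy to get wrong is ordering the detected speech bubbles: manga reads right-to-left, top-to-bottom. A minimal sketch of that ordering step in plain Python, assuming a detector has already given you normalized bubble coordinates (the `Bubble` class and `row_tolerance` threshold are my own illustration, not any specific library's API):

```python
from dataclasses import dataclass

@dataclass
class Bubble:
    x: float        # left edge, normalized 0..1
    y: float        # top edge, normalized 0..1
    text: str = ""  # OCR'd text, filled in later

def reading_order(bubbles, row_tolerance=0.1):
    """Sort bubbles into manga reading order: rows top-to-bottom,
    and within each row right-to-left."""
    rows = []
    # Group bubbles into rows by similar y position.
    for b in sorted(bubbles, key=lambda b: b.y):
        for row in rows:
            if abs(row[0].y - b.y) < row_tolerance:
                row.append(b)
                break
        else:
            rows.append([b])
    # Within each row, rightmost bubble is read first.
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: -b.x))
    return ordered
```

Real pages with diagonal panel layouts need something smarter (ordering panels first, then bubbles within each panel), but a tolerance-based row grouping like this covers the common case.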
That said, I've been experimenting with using Pixtral for the analysis part with okay-ish results (feeding it individual panels along with the character names), but it'll still mix up characters when they're drawn differently.
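For what it's worth, the prompt shape that's worked least badly for me is listing the candidate characters up front and explicitly allowing "unsure", which cuts down on confident mis-identifications. A rough sketch of the prompt builder (the wording and structure here are just my setup, not anything Pixtral-specific):

```python
def panel_prompt(character_names, context=""):
    """Build an analysis prompt for a single manga panel image.
    character_names: names the model may match against by appearance.
    context: optional running story summary from previous panels."""
    names = ", ".join(character_names)
    parts = [f"Characters who may appear (match by appearance): {names}."]
    if context:
        parts.append(f"Story context so far: {context}")
    parts.append(
        "Describe this panel: who is present, what they are doing, "
        "and the emotional tone. If you are unsure which character "
        "is shown, say so rather than guessing."
    )
    return "\n".join(parts)
```

You'd send this alongside the cropped panel image; carrying forward a short running summary as `context` helps a bit with cross-panel consistency, though it doesn't fix the drawn-differently problem.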
> I'm not a big believer in abliteration, it seems to always hurt performance.
Agreed, it's fun to play with but it increases hallucinations. And for creative writing, it makes the model write more compliant characters (they'll give in too easily during negotiations rather than refuse, etc.).
Could probably be improved with more targeted abliteration.