I want to use a multimodal model for manga translation, analysis, and tagging.
If this gives me the "aschually as a ethical safe harmless assistant I can't ..." spiel on anything mildly mature, that would be very disappointing. I'll run a test with Berserk and see how it goes.
I'm not a big believer in abliteration, it seems to always hurt performance. Safety should be handled by a separate system, no need to cripple the actual LLM.
The multimodal models aren't good for this. Refusals aren't the issue (they're fine with BERSERK, though occasionally they'll refuse for copyright). The issue is the tech isn't there yet.
You'll want custom models to segment the manga (panels, speech bubbles), OCR the text, and translate (Gemma punches above its weight for that part).
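One fiddly step in that pipeline that's easy to get wrong is ordering the detected speech bubbles: manga reads right-to-left, top-to-bottom. A minimal sketch of that ordering step in plain Python, assuming a detector has already given you normalized bubble coordinates (the `Bubble` class and `row_tolerance` threshold are my own illustration, not any specific library's API):

```python
from dataclasses import dataclass

@dataclass
class Bubble:
    x: float        # left edge, normalized 0..1
    y: float        # top edge, normalized 0..1
    text: str = ""  # OCR'd text, filled in later

def reading_order(bubbles, row_tolerance=0.1):
    """Sort bubbles into manga reading order: rows top-to-bottom,
    and within each row right-to-left."""
    rows = []
    # Group bubbles into rows by similar y position.
    for b in sorted(bubbles, key=lambda b: b.y):
        for row in rows:
            if abs(row[0].y - b.y) < row_tolerance:
                row.append(b)
                break
        else:
            rows.append([b])
    # Within each row, rightmost bubble is read first.
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: -b.x))
    return ordered
```

Real pages with diagonal panel layouts need something smarter (ordering panels first, then bubbles within each panel), but a tolerance-based row grouping like this covers the common case.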
That said, I've been experimenting with using Pixtral for the analysis part with okay-ish results (feeding it individual panels along with the character names), but it'll still mix up characters when they're drawn differently.
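For what it's worth, the prompt shape that's worked least badly for me is listing the candidate characters up front and explicitly allowing "unsure", which cuts down on confident mis-identifications. A rough sketch of the prompt builder (the wording and structure here are just my setup, not anything Pixtral-specific):

```python
def panel_prompt(character_names, context=""):
    """Build an analysis prompt for a single manga panel image.
    character_names: names the model may match against by appearance.
    context: optional running story summary from previous panels."""
    names = ", ".join(character_names)
    parts = [f"Characters who may appear (match by appearance): {names}."]
    if context:
        parts.append(f"Story context so far: {context}")
    parts.append(
        "Describe this panel: who is present, what they are doing, "
        "and the emotional tone. If you are unsure which character "
        "is shown, say so rather than guessing."
    )
    return "\n".join(parts)
```

You'd send this alongside the cropped panel image; carrying forward a short running summary as `context` helps a bit with cross-panel consistency, though it doesn't fix the drawn-differently problem.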
> I'm not a big believer in abliteration, it seems to always hurt performance.
Agreed, it's fun to play with but it increases hallucinations. And for creative writing, it makes the model write more compliant characters (they'll give in too easily during negotiations rather than refuse, etc.).
Could probably be improved with more targeted abliteration.