h-jones's comments | Hacker News

Anyone know how this compares to GROBID [1]? I'm looking at alternatives to GROBID as I'm not super pleased with its outputs. GROBID has a lot of great features for journal papers (reference extraction / parsing), but I'm only interested in cleanly extracting the body. Also considering nougat [2] but I haven't tried it yet.

[1] https://github.com/kermitt2/grobid

[2] https://github.com/facebookresearch/nougat


Right, I'm in a similar situation: I'm trying to read journal papers in the terminal. Previously I'd considered using pdf2htmlEX [0] to generate a layout-preserving HTML5 + CSS version of the PDF, then rendering it with browsh [1] (unfortunately terminal browsers like w3m don't support HTML5 + CSS). Either nougat or MinerU seems like a better option.

[0] https://pdf2htmlex.github.io/pdf2htmlEX/

[1] https://www.brow.sh/


If you're looking for research along these directions, Melanie Mitchell at the Santa Fe Institute explores these areas. There are better references from her, but this is what came to mind: https://medium.com/p/can-a-computer-ever-learn-to-talk-cf47d....


The PyTorch docs give a pretty good overview of AMP here https://pytorch.org/tutorials/recipes/recipes/amp_recipe.htm... and an overview of which operations cast to which dtype can be found here https://pytorch.org/docs/stable/amp.html#autocast-op-referen....
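
A minimal sketch of the autocast + GradScaler pattern from that recipe, with a toy model and random data standing in for a real training loop:

    import torch
    from torch import nn

    device = "cuda"
    model = nn.Linear(512, 10).to(device)     # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    scaler = torch.cuda.amp.GradScaler()

    for _ in range(10):                        # toy loop over random batches
        x = torch.randn(32, 512, device=device)
        y = torch.randint(0, 10, (32,), device=device)
        optimizer.zero_grad()
        # Forward pass under autocast: eligible ops run in float16,
        # per the op reference linked above.
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(x), y)
        # Scale the loss so small fp16 gradients don't underflow;
        # step/update handle unscaling and rescaling.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()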

Edit: Fixed second link.


This helps lessen issues like inference speed / requirements, but it doesn't address the environmental impact of training.


That's only for pretraining a model. Very few groups pretrain models, since it is so expensive in terms of GPU time. Fine-tuning a model for a specific task typically only takes a few hours. E.g. I regularly train multitask syntax models (POS tagging, lemmatization, morphological tagging, dependency relations, topological fields), and that takes just a few hours on a consumer-level RTX 2060 Super.
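
For a sense of scale, here's a rough sketch of a single-task version (POS tagging) using the Hugging Face transformers library; the base model, label count, and sentence are made up, and the multitask setup above would add one head per task on a shared encoder:

    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    num_pos_tags = 17  # illustrative tag-set size
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=num_pos_tags
    ).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # One toy training step on a made-up sentence; real fine-tuning
    # just repeats this over a treebank for a few epochs.
    enc = tokenizer("The cat sat on the mat .", return_tensors="pt").to("cuda")
    labels = torch.zeros_like(enc["input_ids"])  # dummy gold tags
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()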

Unfortunately, distillation of smaller models can take a fair bit of time. However, there is a lot of recent work on making distillation more efficient, e.g. by not just training on the label distributions of a teacher model, but also by learning to emulate the teacher's attention, hidden layer outputs, etc.
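
A rough sketch of what such a combined objective can look like; the temperature, weighting, and tensor shapes are purely illustrative, not taken from any particular paper:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits,
                          student_hidden, teacher_hidden,
                          T=2.0, alpha=0.5):
        # Soft-label term: KL divergence between temperature-softened
        # teacher and student output distributions.
        kd = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hidden-state matching term; assumes the hidden sizes already
        # match (otherwise a learned projection is usually inserted).
        hs = F.mse_loss(student_hidden, teacher_hidden)
        return alpha * kd + (1 - alpha) * hs

    # Toy call with random tensors, just to show the shapes involved.
    loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10),
                             torch.randn(8, 32, 768), torch.randn(8, 32, 768))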


Is there one model that you use more frequently than others as a base for these disparate fine tuning tasks? Basically, are there any that are particularly flexible?


In general, BERT would be the most common one. RoBERTa is the same architecture but trained for longer and on more data, which turns out to work better. T5 is a larger model, which works better on many tasks but is more expensive.


Thanks for the summary! I'm familiar with BERT, but less so the different variants, so that's quite helpful. I'll take a look at how RoBERTa works.


So far, of the models that run on GPUs with 8-16 GiB of VRAM, XLM-RoBERTa has been the best for these specific tasks. It worked better than the multilingual BERT model and language-specific BERT models by quite a wide margin.


Great, thanks very much for the pointer, especially the VRAM context; I'm looking to fine-tune on 2080 Tis rather than V100s/A100s, so that's really good to know.


The environmental impact of training deep models is a tiny fraction of the impact of, e.g., mining Bitcoin.


Also, it's tiny compared to the server costs of big tech companies like Facebook and Google.


The environmental impact of training is generally considered to be a one-time cost.


How does this square with the fact that the DDR5 is clocked at 4800 MHz and the DDR4 at 3200? Would we not expect a 50% improvement in transfer rates from a 50% increase in clock? I really don't know.

There are even 4800 MHz DDR4 DIMMs available now, though they are niche.

EDIT: DDR4 is 3200 not 3800.
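
For the theoretical peak, at least, bandwidth per 64-bit channel scales linearly with the transfer rate, so the 50% would carry through on paper, even if real-world latency and timings complicate things. A quick back-of-the-envelope check:

    # Peak bandwidth per 64-bit channel: transfers/s * 8 bytes per transfer.
    def peak_gb_per_s(mega_transfers_per_s):
        return mega_transfers_per_s * 8 / 1000

    ddr4 = peak_gb_per_s(3200)   # 25.6 GB/s
    ddr5 = peak_gb_per_s(4800)   # 38.4 GB/s
    print(ddr5 / ddr4)           # 1.5 -> 50% higher theoretical peak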


There are plenty of published papers detailing methods for detecting deepfakes.


We do have methods to detect deepfakes, and they often work very well (at least for any given generator, perhaps not across multiple generators), but you're right: at some point deepfakes will be indistinguishable from real media. At that point I think the problem is largely outside the domain of computer science, and we will need to start redefining what "trust" looks like.


Anyone else find it mildly(/extremely) distressing that a large part of our industry is dedicated to creating technology that provides little-to-no value to society, but creates enormous opportunities for great societal damage?

We then release the tech to the wild without any proposed way to address the problems, instead saying "the problems this causes are outside the domain of CS, so I won't bother considering them; besides, if I didn't create these problems someone else would have".


I hope I didn't make it seem like I was pushing the solution onto others when I said that it will move outside the domain of computer science. My research is in the area of detecting deepfakes.

You're definitely right about how little we attend to the consequences of our advancements, especially in the field of AI/ML. This area is fraught with ethical and moral issues that have taken a backseat to the ideal of progress and I think we are making a mistake there.


If there were a B and C that would mitigate the bad things in a more powerful A, then fine.

But otherwise, if you delay progress, what if in the end a more powerful A eventually makes any B and C irrelevant anyway? All you're doing is kicking the can down the road, maybe?



I think differential geometry may be the closest exception to this: not only is the notation often incredibly dense and subtle (e.g. the spacing between indices when raising and lowering), but everyone also seems to have their own favorite take on any given notation.


I actually really enjoy using it; why do you not like it? It has its quirks, but for me they almost make the creative process more enjoyable.

