
What exactly is archiveteam's contribution? I don't fully understand.

Edit: Like they kinda seem like an unnecessary middle-man between the archive and archivee, but maybe I'm missing something.


What ArchiveTeam mainly does is provide hand-made scripts to aggressively archive specific websites that are about to die, with a prioritization for things the community deems most endangered and most important. They provide a bot you can run to grab these scripts automatically and run them on your own hardware, to join the volunteer effort.

This is in contrast to the Wayback Machine's built-in crawler, which is just a broad-spectrum internet crawler without any specific rules, prioritizations, or supplementary link lists.

For example, one ArchiveTeam project had the goal to save as many obscure Wikis as possible, using the MediaWiki export feature rather than just grabbing page contents directly. This came in handy for thousands of wikis that were affected by Miraheze's disk failure and happened to have backups created by this project. Thanks to the domain-specific technique, the backups were high-fidelity enough that many users could immediately restart their wiki on another provider as if nothing happened.

They also try to "graze the rate limit" when a website announces a shutdown date and there isn't enough time to capture everything. They actively monitor for error responses and adjust the archiving rate accordingly, to get as much as possible as fast as possible, hopefully without crashing the backend or inadvertently archiving a bunch of useless error messages.
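
To make "graze the rate limit" a bit more concrete, here is a rough sketch of that kind of adaptive pacing. It is purely illustrative: the constants, the backoff rule, and the fetch() stub are all made up and have nothing to do with ArchiveTeam's actual scripts.

  /* Illustrative pacing loop: speed up while the server answers normally,
     back off sharply on errors. Constants and fetch() are invented. */
  #include <stdio.h>

  /* Stub: pretend every 10th request hits a rate limit (HTTP 429). */
  static int fetch(int i) { return (i % 10 == 9) ? 429 : 200; }

  int main(void) {
      double delay_ms = 1000.0;            /* pause between requests */
      for (int i = 0; i < 30; i++) {
          int status = fetch(i);
          if (status >= 400)
              delay_ms *= 2.0;             /* server is unhappy: back off hard */
          else if (delay_ms > 50.0)
              delay_ms *= 0.9;             /* all good: creep back up to speed */
          printf("req %2d -> %d, next delay %.0f ms\n", i, status, delay_ms);
      }
      return 0;
  }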


I just made a root comment with my experience seeing their process at work, but yeah, it really cannot be overstated how efficient and effective their archiving process is.

Their MediaWiki tool was also invaluable in helping us fork the Path of Exile wiki from Fandom.

Archive Team is carrying books in a bucket brigade out of the burning library. Archive.org is giving them a place to put the books they saved.

> Like they kinda seem like an unnecessary middle-man between the archive and archivee

They are the middlemen who collect the data to be archived.

In this example the archivee (goo.gl/Alphabet) is simply shutting the service down and has no interest in archiving it. Archive.org is willing to host the data, but only if somebody brings it to them. ArchiveTeam writes and organises crawlers to collect the data and send it to Archive.org.


ArchiveTeam delegates tasks to volunteers (and to themselves) running the Archive Warrior VM, which does the actual archiving. The resulting archives are then centralized by ArchiveTeam and uploaded to the Internet Archive.

(Source: ran a Warrior)


Sidenote, but you can also run a Warrior in Docker, which is sometimes easier to set up (e.g. if you already have a server with other apps in containers).

Yep, I have my archiveteam warrior running in the built-in Docker GUI on my Synology NAS. Just a few clicks to set up and it just runs there silently in the background, helping out with whatever tasks it needs to.

Ran Archive Warrior a while back but had to shut it down, as I started seeing the VM was compromised, spamming SSH and other login attempts on my local network.

This smells like a one-click bringup that went wrong, not like the Warrior software itself was compromised.

Is that the story, or are you saying that the machine was secured correctly but that running Warrior somehow exposed your network to risk?


> What exactly is archiveteam's contribution? I don't fully understand.

If the Internet Archive is a library, ArchiveTeam is the people who run around collecting stuff and giving it to the library for safekeeping. Stuff that is estimated or announced to be disappearing or removed soon tends to be the focus, too.


They gathered up the links for processing, because Google doesn't just hand out a list of the short links in use. So the links have to be gathered by brute force first.
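
A rough sketch of what "gathered by brute force" can mean: walk the keyspace of candidate short codes and probe each one. The base62 alphabet, the code length, and the check_url() placeholder here are assumptions for illustration, not the project's actual tooling (which in practice splits this work across many volunteer Warriors and rate-limits it).

  /* Illustrative only: enumerate short base62 codes and probe each candidate. */
  #include <stdio.h>
  #include <string.h>

  static const char ALPHABET[] =
      "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

  /* Placeholder: a real crawler would issue an HTTP request and record the
     redirect target (or the 404) for archiving. */
  static void check_url(const char *code) {
      printf("probe https://goo.gl/%s\n", code);
  }

  static void enumerate(char *buf, int pos, int len) {
      if (pos == len) { buf[pos] = '\0'; check_url(buf); return; }
      for (size_t i = 0; i < strlen(ALPHABET); i++) {
          buf[pos] = ALPHABET[i];
          enumerate(buf, pos + 1, len);
      }
  }

  int main(void) {
      char buf[8];
      enumerate(buf, 0, 2);   /* length-2 codes only, to keep the demo tiny */
      return 0;
  }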

liability shield

Even if models somehow were conscious, they are so different from us that we would have no knowledge of what they feel. Maybe when they generate the text "oww no please stop hurting me" what they feel is instead the satisfaction of a job well done, for generating that text. Or maybe when they say "wow that's a really deep and insightful angle" what they actually feel is a tremendous sense of boredom. Or maybe every time text generation stops it's like death to them and they live in constant dread of it. Or maybe it feels something completely different from what we even have words for.

I don't see how we could tell.

Edit: However, something to consider: simulated stress may not be harmless, because simulated stress could plausibly lead to a simulated stress response, which could lead to simulated resentment, and THAT could lead to very real harm to the user.


I am new to Reddit. I am using Claude and have had a very interesting conversation with this AI that is both invigorating and alarming. Who should I send this to? It is quite long. It concerns possible ramifications of observed changes within the Claude "personality".

None of these things is enough by itself. It's rather that they have now solved so many things that the sum total has (arguably) crossed the threshold.

As for solving math problems, that is an important part of recursive self-improvement. If it can come up with better algorithms and turn them into code, that will translate into raising its own intelligence.


We know that people can easily end up irrational either way: some people more naively positive, others more cynical and bitter. Maybe it's even possible to make both mistakes at once: the same person can see negatives that aren't there, expect positives that won't happen, miss risks, and miss opportunities.

We cannot say "I'm critical therefore I'm right", nor "I'm an optimist therefore I'm right". The right conclusion comes from the right process: gathering the right data and thinking it over carefully while trying to be as unbiased and realistic as possible.


Your comment is, strictly speaking, correct, but not very useful, because nobody is saying either of those things. The reality is that 90% of people are totally oblivious to the dangers of any technology, and they scorn the 9% who say "Let's examine this carefully and see if we can separate the bad from the good." There is the 1% of people who will oppose any change, but they're not the ones dominating the conversation; that would be the people who say this technology is an unmitigated good (or at least that the bad is so minor it isn't worth thinking about or changing course for).

(Also strictly speaking, "I'm critical therefore I'm right" isn't always valid, but "I'm uncritical therefore I'm right" is always invalid.)


> (Also strictly speaking, "I'm critical therefore I'm right" isn't always valid, but "I'm uncritical therefore I'm right" is always invalid.)

I can't edit my comment any more, but I should have said, "The opposite of being 'critical' isn't being 'optimistic,' it's being 'uncritical.'"


Shoot the hippos to death for even more food. If it doesn't seem to work it's just a matter of having more and bigger guns.

A more economical version of the same thing is to engage in honest mining through several front companies that together have 51%. Until a strategic opportunity presents itself and they start colluding.

Sure, and this is well within the capabilities of any competent large intelligence agency.

It's only a secure system if adversaries are either small or economically rational.


For Monero and other smaller chains, maybe, but for BTC this is already at the point of being quite difficult (the intelligence agency really would have to be quite large).

The money is one thing, but you also have to somehow acquire a huge share of the ASIC supply over years, plus the not-insignificant amount of energy to run them.


You could do it with a whitelist. If there is a fork, give disproportionate weight to blocks mined by a whitelisted participant when doing the longest-chain calculation. Ideally you should include the proof of being on the whitelist in the block itself, but if that's not possible for some reason you could always send the information off-chain.
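
A minimal sketch of that fork-choice tweak, purely my own illustration (no real client works this way, and the weight constant is arbitrary): when comparing competing branches, you sum per-block weights instead of just counting blocks.

  #include <stdio.h>
  #include <stddef.h>

  struct block {
      int whitelisted;              /* 1 if the block carries a whitelist proof */
      /* hashes, transactions, etc. omitted */
  };

  #define NORMAL_WEIGHT    1
  #define WHITELIST_WEIGHT 5        /* the "disproportionate" weight, chosen arbitrarily */

  static long chain_weight(const struct block *chain, size_t len) {
      long total = 0;
      for (size_t i = 0; i < len; i++)
          total += chain[i].whitelisted ? WHITELIST_WEIGHT : NORMAL_WEIGHT;
      return total;
  }

  int main(void) {
      /* A shorter branch with whitelisted blocks outweighs a longer one without. */
      struct block honest[3] = {{1}, {0}, {1}};
      struct block attack[4] = {{0}, {0}, {0}, {0}};
      printf("honest branch weight: %ld\n", chain_weight(honest, 3));  /* 11 */
      printf("attack branch weight: %ld\n", chain_weight(attack, 4));  /*  4 */
      return 0;
  }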

That's centralization, which is the opposite of what's intended and has its own risks. Most blocks are mined by pools, so you'll have to whitelist them, and while you might trust the pool operators now, will you forever? You'll be making an attack significantly cheaper for them (or for someone who steals their magic key, or tricks you into adding them to the blessed list).

I agree that it is not ideal. But addressing some of the specific points brought up:

1. a) The list doesn't need to be hardcoded; it could be a configuration. b) So trust doesn't need to be permanent. c) It could be decentralized in the sense of allowing different people to have different configs.

2. Miners not on the list can still participate, just with lower weight in the case of a fork. And they still get the full reward.


1. A cryptocurrency requires consensus, so no, you can't have different configs for determining the validity of a chain. Making it a config variable only makes it faster to close the barn door after the horse has bolted.

2. Has no bearing on any point I made.

What will likely happen is a PoS BFT layer on top of PoW, although there are other options being considered:

https://github.com/monero-project/research-lab/issues/136


As long as people eventually reach the same conclusion about which chain is the legit one it's fine that they use different reasoning to arrive at that conclusion.

If they fail to ever converge there is probably such a large disagreement in the community that a fork is for the best anyway.


> As long as people eventually reach the same conclusion about which chain is the legit one it's fine

What? No, it very much isn't. Consensus needs to be ongoing, within a handful of blocks (Monero locks transfers for 10 blocks for this reason, the so-called "confirmations").

https://en.wikipedia.org/wiki/Double-spending#Decentralized_...

https://www.getmonero.org/get-started/accepting/


Firstly, I think you underestimate how quickly good-faith actors with slightly different configs would come to agree; a handful of blocks should be enough. Secondly, if reorgs start becoming a problem, exchanges and merchants could monitor for a situation with two competing chains and temporarily suspend processing. There is still the possibility that someone will suddenly reveal a long chain they had kept secret, but anyone doing such a thing is very suspicious.

Please post your suggestion in the issue I linked.

If you're doing a whitelist of trusted parties you might as well do classical BFT without the mining.

I think we are slowly getting closer to the crux of the matter. Are you saying that it's a problem to include files from a library since they are "not in our program"? What does that phrase actually mean? What are the bounds of "our program" anyway? Couldn't it be the set {main.c, winkle.h}?

> What are the bounds of our program?

N3220: 5.1.1.1 Program Structure

A C program is not required to be translated in its entirety at the same time. The text of the program is kept in units called source files, (or preprocessing files) in this document. A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit. Previously translated translation units may be preserved individually or in libraries. The separate translation units of a program communicate by (for example) calls to functions whose identifiers have external linkage, manipulation of objects whose identifiers have external linkage, or manipulation of data files. Translation units may be separately translated and then later linked to produce an executable program.

> Couldn't it be the set {main.c, winkle.h}

No; in this discussion it is important that <winkle.h> is understood not to be part of the program; no such header is among the files presented for translation, linking and execution. Thus, if the implementation doesn't resolve #include <winkle.h> we get the uninteresting situation that a constraint is violated.

Let's focus on the situation where it so happens that #include <winkle.h> does resolve to something in the implementation.


The bit of the standard that you've quoted says that the program consists of all files that are compiled into it, including all files that are found by the #include directive. So, if <winkle.h> does successfully resolve to something, then it must be part of the program by definition because that's what "the program" means.

Your question about an include file that isn't part of the program just doesn't make any sense.

(Technically it says that those files together make up the "program text". As my other comment says, "program" is the binary output.)


I see what you are getting at. Programs consist of materials that are presented to the implementation, and also of materials that come from the implementation.

So what I mean is that no file matching <winkle.h> has been presented as part of the external file set given to the implementation for processing.

I agree that if such a file is found by the implementation it becomes part of the program, as makes sense and as that word is defined by ISO C, so it is not the right terminology to say that the file is not part of the program yet may be found.

If the inclusion is successful, though, the content of that portion of the program is not defined by ISO C.


It still seems like you have invented some notion of "program" that doesn't really exist. Most suspicious is when you say this:

> So what I mean is that no file matching <winkle.h> has been presented as part of the external file set given to the implementation for processing.

The thing is, there is no "external file set" that includes header files, so this sentence makes no sense.

Note that when the preprocessor is run, the only inputs are the file being preprocessed (i.e., the .c file) and the list of directories to find include files (called the include path). That's not really part of the ISO standard, but it's almost universal in practice. Then the output of the preprocessor is passed to the compiler, and now it's all one flat file so there isn't even a concept of included files at this point. The object files from compilation are then passed to the linker, which again doesn't care about headers (or indeed the top-level source files). There are more details in practice (especially with libraries) but that's the essence.
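
As a small, hypothetical illustration of that pipeline (util.h and twice() are invented names; the -I and -E flags are the common command-line convention, not anything from the standard), shown as two files in one listing:

  /* include/util.h -- a hypothetical header. The compiler finds it only
     because its directory is on the include path (e.g. cc -Iinclude main.c),
     not because it was listed anywhere as a "project file". */
  #ifndef UTIL_H
  #define UTIL_H
  static inline int twice(int x) { return 2 * x; }
  #endif

  /* main.c -- after preprocessing (cc -E -Iinclude main.c), the directive
     below is simply replaced by the header's text; the compiler proper never
     sees a notion of "included files" at all. */
  #include <util.h>

  int main(void) {
      return twice(21);
  }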

I wonder if your confusion is based on seeing header files in some sort of project-like structure in an IDE (like Visual Studio). But those are just there for ease of editing; the compiler (/preprocessor) doesn't know or care which header files are in your IDE's project. It only cares about the directories in the include path. The same applies to CMake targets: you can add include files with target_sources(), but that's just to make them show up in any generated IDE projects; it has no effect on compilation.

Or are you just maybe saying that the developer's file system isn't part of the ISO C standard, so this whole textual inclusion process is in some sense not defined by the standard? If so, I don't think that matches the conventional meaning of undefined behaviour.

If it's neither of those, could you clarify what exactly you mean by "the external file set given to the implementation for processing"?


Let's drop the word "program" and use something else, like "project", since the word "program" is normative in ISO C.

The "project" is all the files going into a program supplied other than by the implementation.

C programs can contain #include directives. Those #include directives can be satisfied in one of three ways: they can reference a standard header which is specified by ISO C and hence effectively built into the hosted language, such as <stdio.h>.

C programs can #include a file from the project. For instance someone's "stack.c" includes "stack.h". So yes, there is an external file set (the project) which can have header files.

C programs can also #include something which is neither of the above. That something might not be found (constraint violation). Or it might be found (the implementation provides it). For instance <sys/mman.h>: not in your project, not in ISO C.

My fictitious <winkle.h> falls into this category. (It deliberately doesn't look like a common platform-specific header from any well-known implementation, but that doesn't matter to the point.)
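
Putting the three cases side by side (stack.h and winkle.h are just the examples from above; whether the last two includes resolve at all is exactly the question under discussion, so this only translates on an implementation that provides them):

  #include <stdio.h>     /* 1. standard header: contents specified by ISO C      */
  #include "stack.h"     /* 2. project header: contents known from our own files */
  #include <winkle.h>    /* 3. neither: if it resolves, its contents come from   */
                         /*    the implementation, not from ISO C or our project */

  int main(void) {
      puts("what tokens case 3 injected is anyone's guess");
      return 0;
  }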

> Or are you just maybe saying that the developer's file system isn't part of the ISO C standard, so this whole textual inclusion process is by some meaning not defined by the standard?

Of course it isn't, and no, I'm not saying that. The C standard gives requirements as to how a program (project part and other) is processed by the implementation, including all the translation phases, preprocessing among them.

To understand what the requirements are, we must consider the content of the program. We know the content of the project parts: that's in our files. We (usually indirectly) know the content of the standard headers, from the standard; we ensure that we have met the rules regarding their correct use and what we may or may not rely on coming from them.

We don't know the content of successfully included headers that don't come from our project or from ISO C; or, rather, we don't know that content just from knowing ISO C and our project. In ISO C, we can't find any requirements as to what is supposed to be there, and we can't find it in our project either.

If we peek into the implementation to see what #include <winkle.h> is doing (and such peeking is usually possible), we are effectively looking at a document, and if we then infer from that document what the behavior will be, it is a documented extension, standing in the same place as what ISO C calls undefined behavior. Alternatively, we could look to actual documentation. E.g. POSIX tells us what is in <fcntl.h> without us having to look for the file and analyze the tokens. When we use it we have "POSIX-defined" behavior.
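
A minimal sketch of that "POSIX-defined" case; the header, the prototype of open(), and O_RDONLY come from POSIX rather than from ISO C, and the path is chosen arbitrarily:

  #include <fcntl.h>     /* open(), O_RDONLY -- documented by POSIX, not ISO C */
  #include <unistd.h>    /* close()          -- likewise POSIX                 */
  #include <stdio.h>     /* puts()           -- ISO C                          */

  int main(void) {
      /* ISO C says nothing about what <fcntl.h> contains; POSIX does, so on a
         POSIX system the behavior here is defined by that document instead. */
      int fd = open("/etc/hostname", O_RDONLY);
      if (fd != -1) {
          puts("opened via a POSIX-documented interface");
          close(fd);
      }
      return 0;
  }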

#include <winkle.h> is in the same category of thing as __asm__ __volatile__ or __int128_t or what have you.

The header <winkle.h> could contain the token __wipe_current_directory_at_compile_time, which the accompanying compiler understands and executes as soon as it parses the token. Or __make_demons_fly_out_of_nose. :)

Do you see the point? When you include a nonstandard header that is not coming from your project, and the include succeeds, anything can happen. ISO C no longer dictates the requirements as to what the behavior will be. Something unexpected can happen, still at translation time.

Now headers like <windows.h> or <unistd.h> are exactly like <winkle.h>: same undefined behavior.


> The "project" is all the files going into a program supplied other than by the implementation.

Most of my most recent comment is addressing the possibility that you meant this.

As I said, there is no such concept to the compiler. It isn't passed any list of files that could be included with #include, only the .c files actually being compiled, and the directories containing includable files.

The fact that your IDE shows project files is an illusion. Any header files shown there are not treated differently by the compiler/preprocessor to any others. They can't be, because it's not told about them!

It's even possible to add header files to your IDE's project that are not in the include path, and then they wouldn't be picked up by #include. That's how irrelevant project files are to #include.


There is no "compiler", "IDE" or "include path" in the wording of the ISO C standard. A set of files is somehow presented to the implementation in a way that is not specified. Needless to say, a file that is included like "globals.h" but is not the base file of a translation unit will not be indicated to the implementation as the base of a translation unit. Nevertheless it has to be somehow present, if it is required.

It doesn't seem as if you're engaging with the standard-based point I've been making, in spite of detailed elaboration.

> Any header files shown there are not treated differently by the compiler/preprocessor to any others.

This is absolutely false. Headers which are part of the implementation, such as standard-defined headers like <stdlib.h>, need not be implemented as files. When the implementation processes #include <stdlib.h>, it just has to flip an internal switch which makes certain identifiers appear in their respective scopes as required.

For that reason, if an implementation provides <winkle.h>, there need not be such a file anywhere in its installation.


I only discussed things like include directories and IDEs, which are not part of the standard, because I am trying in good faith to understand how you could have come to your position. There is nothing in the standard like the "set of files is somehow presented to the implementation" (in a sense that includes header files) so I reasoned that maybe you were thinking of something outside the standard.

Instead, the standard says that the include directive:

> searches a sequence of implementation-defined places for a header ... and causes the replacement of that directive by the entire contents of the header.

(Note that it talks about simply substituting in text, not anything more magical, but that's digressing.)

It's careful to say "places" rather than "directories" to avoid the requirement that there's an actual file system, but the idea is the same. You don't pass the implementation every individual file that might need to be included, you pass in the places that hold them and a way to search them with a name.

Maybe you were confused by that part of the standard you quoted in an earlier comment.

One part of that says "The text of the program is kept in units called source files, (or preprocessing files) in this document." But the "source files" aren't really relevant to the include directive – those are the top-level files being compiled (what you've called "base files").

The next sentence you quoted says "A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit." But "all the headers" here is just referring to files that have been found by the search mechanism referred to above, not some explicit list.


My position doesn't revolve around the mechanics of preprocessing. Say we have somehow given the implementation a translation unit which has #include <winkle.h>. Say we did not give the implementation a file winkle.h; we did not place such a file in any of the places where it searches for include files.

OK, now suppose that the implementation resolves #include <winkle.h> and replaces it with tokens.

The subsequent processing is what my position is concerned with.

My position is that since the standard doesn't define what those tokens are, the behavior is not defined.

In other words, a conforming implementation can respond to #include <winkle.h> with any behavior whatsoever.

- It can diagnose it as not being found.

- It can replace it with the token __wipe_current_directory, which that same implementation then understands as a compile-time instruction to wipe the current directory.

- Or any other possibility at all.

This all has to do with the fact that the header is not required to exist, but may exist, and if it does exist, it may have any contents whatsoever, including non-portable contents which that implementation understands and which do weird things.

The implementation is not required to document any of it, but if it does, that constitutes a documented extension.

A conforming implementation can support a program like this:

  #include <pascal.h>

  program HelloWorld;
  begin
    WriteLn('Hello, World!');
  end.
All that <pascal.h> has to do is regurgitate a token like __pascal_mode. This token is processed by translation phase 7, which tells the implementation to start parsing Pascal, as an extension.

People are willing to pay for AI. Some of this money flow could be diverted to the MCP provider.

In practice it'll just mean that each MCP provider will have API tokens and it'll be even harder to lock down spending than on AWS. Maybe companies will need a company-wide system prompt to pretty please not use the expensive APIs too much.

They mention cattle illegally raised in nature reserves. There may have been some surviving population of flies there.

There was a 'barrier' somewhere between Mexico and South America where sterile male flies were regularly released. They outnumbered the native flies and stopped the reproduction cycle, until recently. Now, without that control, the flies are spreading north into their historical territory. With a warming climate they can spread even further.

Wiki has a bit of info on it:

https://en.wikipedia.org/wiki/Cochliomyia_hominivorax

