I always find committing to LLVM very nerve-racking, because of the post-commit CI testing. LLVM supports so many architectures that more often than not something I write will fail on one of them. And the only way for me to find out is to commit it, wait for a buildbot to fail (which can take a few hours, during which I really can't leave my computer lest I leave trunk broken on some buildbot, which is a big faux pas), revert it, and then figure out what went wrong. I'm hoping that at some point this will be improved, such that I can run the whole buildbot army on my commit before putting it on trunk.
I get the feeling that there are parts of the community that feel the same way. I'm hoping that the planned move to github will naturally cascade into pre-commit checks.
A big problem is having enough hardware to get CI throughput one or two orders of magnitude above what it is now. Unfortunately that's not an easy thing considering how "heavy" it is to build and test the full toolchain.
Rust uses https://github.com/rust-community/bors, which maintains a linear queue of PRs; each PR can only land after being rebased onto the commits ahead of it in the queue and subsequently passing tests.
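Conceptually it's not much more than a loop that tests each PR on top of whatever has already landed. Here's a toy Python sketch of that idea (hypothetical, not bors' actual code; the "repo" is just a list of commit labels so the sketch runs standalone):

```python
# Toy sketch of a bors-style linear merge queue; not bors' actual code.

def run_tests(tree):
    """Stand-in for a full build-and-test run; here, pretend everything passes."""
    return True

def process_queue(prs, trunk):
    """Land PRs strictly one at a time, each tested on top of everything before it."""
    for pr in prs:
        candidate = trunk + [pr]      # conceptually: the PR rebased onto current trunk
        if run_tests(candidate):      # test the exact tree that would become trunk
            trunk = candidate         # only then does it land
        else:
            print(f"rejecting {pr}: tests failed on the rebased tree")
    return trunk

if __name__ == "__main__":
    print(process_queue(["pr-1", "pr-2"], ["initial-commit"]))
```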
dlang uses several. One notable one is the "autotester" written by Brad Roberts. It's built to ensure that changes which break the D compiler (as caught by the test suite) never land as commits.
Most of the .NET repos are structured to use inner and outer loop testing with Jenkins. Most tests on most architectures are run in the inner loop, which are kicked off in parallel as soon as you make a pull request to one of the .NET repositories on Github.
Some repositories, like CoreCLR, have outer loop testing that runs on a separate schedule (nightly, I think), but those tests are far less likely to break and are more devoted to finding rare and difficult-to-compute edge cases.
Interesting to see screenshots of LCOV. I'm hoping to get an intern to work on test coverage this summer, and I wondered whether LCOV is still current. Looks like the latest release is from December 2016.
I can say that for C and C++, the compilation is very often parallelized at the translation unit (file) level, by starting multiple instances of the compiler either locally or over a network with something like distcc.
This is simple and effective enough that there wouldn't be much gain in parallelizing the compilers: all the cores are already busy most of the time.
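As a rough illustration of that setup (a sketch assuming gcc is on the PATH and a directory of .c files; this is just the make -j / distcc idea in miniature, not what either tool actually does):

```python
# Minimal sketch of translation-unit-level parallelism: one compiler process
# per .c file, run concurrently. The compiler itself stays single-threaded;
# the parallelism comes from running many instances at once.

import glob
import subprocess
from concurrent.futures import ThreadPoolExecutor

def compile_one(source):
    obj = source.replace(".c", ".o")
    subprocess.run(["gcc", "-c", source, "-o", obj], check=True)
    return obj

sources = glob.glob("*.c")
with ThreadPoolExecutor(max_workers=8) as pool:
    objects = list(pool.map(compile_one, sources))
print(objects)
```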
Certainly the Roslyn C# compiler is highly parallel. All files are parsed in parallel, then all classes are bound (semantically analyzed) in parallel, then the IL serialization phase is sequential.
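Schematically, that phase structure looks something like the following (a Python sketch of the shape of the pipeline, not Roslyn's actual code; the three phase functions are just stubs):

```python
# Sketch of the phase structure described above: parallel parse, parallel bind,
# sequential IL emission. The phase functions are hypothetical stand-ins.

from concurrent.futures import ThreadPoolExecutor

def parse_file(path):      return f"syntax-tree({path})"
def bind_class(tree):      return f"bound({tree})"
def emit_il(bound_class):  return len(bound_class)   # pretend this appends to the output image

files = ["a.cs", "b.cs", "c.cs"]
with ThreadPoolExecutor() as pool:
    trees = list(pool.map(parse_file, files))    # phase 1: parse in parallel
    bound = list(pool.map(bind_class, trees))    # phase 2: bind in parallel

image_size = sum(emit_il(b) for b in bound)      # phase 3: serialization is sequential
print(image_size)
```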
I wouldn't say that's what most people mean by parallel, but in that case I think you're better off building a layer on top of the compiler for that.
For instance, provided deterministic compilation, you could keep a networked cache of compiled libraries that would be delivered as needed.
Trying to be network-parallel at any finer level is probably a waste of time -- network and (de)serialization overhead would eat away all the advantages.
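A minimal sketch of that cache idea, assuming deterministic compilation so a hash of the source plus flags fully identifies the output (hypothetical; real tools in this space such as ccache or Bazel's remote cache are far more careful):

```python
# Sketch of a networked cache for deterministic compilation: the key is a hash
# of the source text plus the compiler flags, so identical inputs can be served
# from the cache instead of being recompiled. The in-memory dict stands in for
# a networked key/value store.

import hashlib

class CompileCache:
    def __init__(self):
        self.store = {}

    def key(self, source_text, flags):
        h = hashlib.sha256()
        h.update(source_text.encode())
        h.update(" ".join(flags).encode())
        return h.hexdigest()

    def get_or_build(self, source_text, flags, compile_fn):
        k = self.key(source_text, flags)
        if k not in self.store:               # miss: compile once, then everyone reuses it
            self.store[k] = compile_fn(source_text, flags)
        return self.store[k]

cache = CompileCache()
obj = cache.get_or_build("int main(){}", ["-O2"], lambda s, f: b"fake-object-code")
print(len(obj))
```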
It's its size that makes it difficult to move. Some major ecosystem pieces are designed around the svn infrastructure. When the will to make a change arrived, it seemed natural to migrate not just to a different VCS but to a different host. And this seemed to spawn a new debate: monorepo vs multi-repo. [still open AFAIK]
At the recent 2016 US Dev Conf, there was a consensus to move to git and that the new host would be github.
Really subjective IMO part: In general, there are tons of really smart folks working on really awesome stuff in LLVM+clang+etc. There's a handful of folks also focusing on the general "plumbing" software within and among those projects. The meta-plumbing job of the dev infrastructure is "kinda interesting" to several folks who want to improve the way the project is developed. But "kinda interesting" doesn't pay the bills, so it's a second (or nth) responsibility for the folks volunteering to work on it. Add to that the "no good deed goes unpunished" rule (they'll get the responsibility and blame after making a sweeping change), and any such change will require extreme patience and caution.
Yes! Such work is all done on a volunteer basis, when we find time for it on top of the "real work" (bug fixes, features, ...). Infrastructure work isn't always rewarding; mostly you just get the upset people yelling at you :)
For a project like LLVM, it just doesn't matter too much. git-svn or plain svn works pretty well for most people.
Certainly it matters, and it'll move eventually, but I'd rather see time spent on better testing tools than on a "better" VCS.
When I moved GCC from CVS to SVN, it made life a bit easier, but it wasn't a revolutionary change.
Which is funny, considering how often people argue about VCS systems.
Before we moved to Git, Roslyn was on TFS, which was basically Perforce/CVS/SVN.
You're absolutely right that the distinction among the former VCSs is minimal. However, Git offers value that was transformative compared to them. Namely:
1. Git allows you to easily switch between multiple work items while keeping track of the work done in each item.
2. Git allows you to easily integrate with people who have significantly diverged clones of your tree without too much trouble.
3. Git allows you to easily work offline.
(1) is definitely the largest benefit, but was mitigated with tools like g5 when I was at Google. However, the Google gravity well has its own drawbacks.
(2) is very important if you want to host rapid release schedules with divergence of features. It's especially useful if you want to have long stabilization periods and low-risk bug fixes delivered in parallel to multiple customers.
(3) is pretty self-explanatory, but most people underestimate how much downtime their VCS has. I'd bet that, for most people, it's significantly less than five nines of availability. Not only is that wasted time, it's frustrating because it's usually consecutive and removes entire working days at random.
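To put rough numbers on the availability claim (plain arithmetic, nothing VCS-specific):

```python
# Downtime per year implied by each availability level ("number of nines").
minutes_per_year = 365 * 24 * 60

for nines in range(2, 6):
    availability = 1 - 10 ** -nines
    downtime = minutes_per_year * (1 - availability)
    print(f"{nines} nines ({availability:.3%} up): ~{downtime:.0f} minutes of downtime/year")
```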
I take it you haven't actually used the tool that was mentioned in the comment you replied to, namely git-svn? My use of svn to interface with projects using Subversion has essentially entirely been replaced by git-svn, and I can say it is essentially impossible for someone who has used it to not realize that offline work, at least, now works like git. Taking a step back: at some point what you run on the server is just a storage format; unless you used some of the more advanced Subversion features (at which point you might actually like using it), it generally maps pretty directly to git semantics, and essentially all other functionality differences are mere porcelain.
Not a big surprise, considering that SVN is "CVS done right".
Compare that to Git's author: "Subversion has been the most pointless project ever started. There is no way to do cvs right."
Git and Hg (+ the many tools that surround them: GitHub, Bitbucket, Gerrit, GitLab, etc.) have a model that makes community contribution far easier than CVS and SVN.
The community contribution concepts in git are great, but it is confusing to then mention GitHub: their modus operandi is to provide tooling that makes things easier which are only hard if you insist on misusing git as if it were Subversion, for example by having a single centralized repository with multiple committers, requiring complex and annoying access control and public key management. If someone had built tooling like GitHub around Subversion, and then encouraged use of svk (note the "k"; this was a replacement client for Subversion that supported offline operation and had better merging support, but which worked with any svn server), things would have felt much more reasonable before. The irony is that if you follow the actual git workflow used by Linus for Linux (where everyone has their own repository, rather than at best their own branch and at worst trying to share master), you shouldn't even need any of that for git :/
Yes, and the centralized version comes from a tree that, in the Linux workflow, only one person can modify. You submit patches via email or pull requests (literal ones, to pull from a repository); you don't share commit bits on a centralized repository.
Until semi-recently, Git[1] wouldn't let you do a shallow checkout and still do useful things. For a large project, for most purposes, downloading all of history is pointless and immensely wasteful. SVN handles that just fine, and people who want git locally can use git-svn.
(edit: LLVM is surprisingly small, actually - a git clone comes in at just under 900MB. For more painful examples, though, see repos that commit(ted) binaries, or the scale of Android's repos)
[1]: AFAIK Mercurial still has no built-in support, though extensions exist. Which is probably the right choice for Mercurial.
>LLVM is surprisingly small, actually - a git clone comes in at just under 900MB
That's a little bit on the small side, but it's still very manageable.
For comparison, Linux's .git folder comes in at 1.3GB on my computer, and LibreOffice's repo, which has git history going back to the year 2000, weighs some 3.6 GB.
I can happily say that I haven't had any performance or space problems dealing with either of those full repos, even on my fairly weak laptop.