opinions sought: costs & benefits of switching to self-hosted Git server?
#131
Open
opened 2 years ago by mbutterick
·
17 comments
For those who have firsthand experience with running a self-hosted Git server — pros? Cons? Is there a solution for CI?
For everyone: what would be the negative impact on you if pollen (and all my other software) were moved to this hypothetical new server? I would not be changing the license or anything else. Just the server where the canonical source is hosted.
If that happened, I would be open to leaving pollen-users here at GitHub. Though I don’t have a strong feeling either way.
As for why. (Not that it matters.) I was willing to reserve judgment after the Microsoft acquisition of GitHub. Since then I have found myself holding my nose at most of the so-called improvements.
This week I tried Copilot, which is the most putrid yet — you install a ~keylogger~ Visual Studio plugin on your machine and get terrible code in return. It seems inevitable that in the same way social-media sites rapidly evolved into funnels for personal data to be sold to advertisers, the main business of GitHub will be collecting code for their AI training and other collateral purposes.
I also think that Copilot is a massive violation of the open-source licenses I use. I further question whether I can even meaningfully comply with the open-source licenses of underlying software I incorporate while hosting code on GitHub, because I’m feeding that code into the maw of something that will violate the license. (No, I don’t literally expect to be sued, but the handwaving around these issues is far from encouraging.)
(I’d also be willing to consider a GitHub alternative like GitLab, though they also seem to be heading down the AI rabbit hole.)
I know nothing about SourceHut, but it seems quite popular as an alternative to Git{Hub/Lab}.
---- Eugene
I second the SourceHut recommendation, though I don't (yet) use it myself. It's run by Drew DeVault, who agrees with you about GitHub and GitLab, and it includes CI.
I see that he wrote this week about Copilot as a form of “open source laundering”. I generally agree with his argument, though I think his suggested solutions are unworkable:
This idea is similar to the GDPR’s “right to be forgotten”. But it’s impossible to retroactively remove code that has already been incorporated into the model without retraining it from scratch. Also, I expect there would be a negative-selection effect where owners of better code would be more likely to opt out, thereby making the model dumber. (Though it wouldn’t surprise me if private enterprise repos have been exempted from the model so far, and opting out will be a service sold to them later on.)
Impossible. First, the material emitted by the model comes from different places and there’s no guarantee that the licenses are legally compatible. Second, this would put Microsoft in the position of giving legal advice to zillions of users. They are already passing that buck (about which more below).
Impossible. Without copyleft code, the model would starve to death.
I’m sure Microsoft’s view is that the owners of the projects are being compensated already with all the goodies on GitHub that they don’t pay market rates for. The “license permitting this use” is already baked into the GitHub terms of service.
Even assuming that training an AI with certain software code counts as fair use under US copyright (as GitHub’s former CEO has claimed), that’s a long way from claiming that every output of that system also qualifies as fair use. Microsoft has not made this claim — and will not, because they can’t guarantee the behavior of a probabilistic system — so they explicitly pass this risk onto Copilot users:
Therefore, the good news (?) — I expect that Copilot will be banned in most companies due to the possibility of some junior engineer nonchalantly embedding IP violations in the enterprise codebase.
In the meantime, Microsoft’s fair-use argument creates a bigger problem. DeVault suggests a nuclear option: “don’t use GitHub and your code will not make it into the model”. That’s also what I’m proposing here. But if AI training qualifies as fair use for code that appears on GitHub, it qualifies as fair use for code that appears anywhere. Just as Google indexes all the web pages, GitHub could train its AI on the code displayed on GitLab, Gogs, Gitea, etc. The cynical endpoint of this line of thinking is that one might as well leave code on GitHub because it’s going to be absorbed anyhow.
Ironically the other likely outcome of Copilot is a surge of Copilot-generated code being released by the world’s laziest programmers. This tsunami of idiocy will crash again on GitHub’s shores, where it will be reabsorbed into the model, creating a process heretofore unknown in computer science: recursive stupidity.
What an accomplishment.
[I am not anyone’s lawyer and no one should take this comment as legal advice.]
Bradley Kuhn of Software Freedom Conservancy on the ramifications of AI for open-source software. Bradley reaches several of the same points, though with more factual & legal detail.
I cannot speak knowledgeably about point 1, but for 2 and 3:
I use sourcehut for personal projects and have been quite happy with it. It is unobtrusive and seems to have all the features I need without any chaff. It offers a mailing list service too, which works nicely from email though is kinda minimal as a web forum.
Thank you for the suggestions. I have cloned Pollen to Sourcehut and changed the canonical repo on the Racket package server:
https://git.sr.ht/~mbutterick/pollen
Apparently this would become the new mailing list:
https://lists.sr.ht/~mbutterick/pollen-discuss
I invite Sourcehut fans to inspect this repo & either flag any mistakes or make suggestions for improvement before the switch is thrown.
After that, I suppose the right move is to put the GitHub Pollen repo into “archived” mode.
My further thoughts on the legality of Copilot and the (perhaps) futility of avoiding its maw, though ethics count too.
Sourcehut uses Git over HTTP. Racket added support for these URLs in version 8.1 with a private git+https prefix. Regardless of the wisdom of this workaround, because Pollen (and my other Racket packages) support versions of Racket before 8.1, AFAICT I need a source-hosting service that supports traditional .git URLs. As it stands, users of versions of Racket before 8.1 will not be able to install Pollen.
In the meantime I have reverted the package server to use the GitHub repo (see #132).
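To make the compatibility problem concrete, here is a minimal sketch in Python of the decision described above. It is not Racket's actual resolution logic. It assumes only what this comment states: a URL ending in .git is accepted as-is, and any other HTTP(S) Git URL needs the git+https prefix, which requires Racket 8.1 or later. The function name and the example.com URL are hypothetical.

```python
def racket_git_source(repo_url, racket_version):
    """Return a package source string for a Git repo served over HTTP(S).

    racket_version is a (major, minor) tuple, e.g. (8, 1).
    Hypothetical helper: it encodes the rule described in this thread,
    not the real behavior of the Racket package manager.
    """
    if not repo_url.startswith(("http://", "https://")):
        raise ValueError("expected an http:// or https:// URL")
    # A traditional URL ending in .git needs no special prefix.
    if repo_url.endswith(".git"):
        return repo_url
    # Otherwise the git+ prefix is needed, which (per this thread)
    # only exists from Racket 8.1 onward.
    if racket_version >= (8, 1):
        return "git+" + repo_url
    raise ValueError("no usable source form before Racket 8.1")


print(racket_git_source("https://git.sr.ht/~mbutterick/pollen", (8, 1)))
print(racket_git_source("https://example.com/user/pollen.git", (7, 9)))
```

Under these assumptions, the Sourcehut URL has no form that an older Racket can use, which is the reason for reverting.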
I have cloned pollen to Codeberg and changed the canonical repo on the Racket package server:
https://codeberg.org/mbutterick/pollen
Swift and Racket (among others) use Discourse. I’m thinking of putting up a self-hosted instance as a replacement for pollen-users. Pros/cons from those who have fiddled with it?
(So far Codeberg seems to be cooperating with the Racket package server, so I plan to stick with it. But it seems wise to permanently divorce the talking-about-software functionality from the Git hosting.)
2 questions:
or
Yes, Codeberg also has an “issues” feature. So in principle we could make a pollen-users over there. (But all the current messages would be left behind.) I suppose increasingly I lean toward separating the two tasks. It’s easier to relocate a Git repo than a discussion system. (Pollen originally had a mailing list hosted by Google, which was shut down abruptly, which is how we ended up here.)
I considered that. If I’m going to have the discussion list hosted elsewhere, I’d rather a) host it myself using b) an open-source system with a track record, and c) do it in a way that allows me to consolidate other discussions (related to Quad, Beautiful Racket, etc.) because all those projects will be leaving GitHub too.
I’ve put up a Discourse server at https://forums.matthewbutterick.com with an area for Pollen discussion. I invite members of pollen-users to inspect this server. Absent any objections or unforeseen wrinkles, I will put pollen-users into read-only mode by the end of July 2022 and we will move the party to the new server.
Codeberg is just a hosted Gitea instance. So I thought: why not just put up my own Gitea server, if I could get it working in 30 min or less. I could and I did.
https://git.matthewbutterick.com/mbutterick/pollen
It would be possible to migrate pollen-users to this server, sort of. The thread messages would be migrated to a new repo. But they wouldn’t be attributed to users on the new server. Still, because pollen-users was always something of an off-label use of GitHub, there isn’t much reason to persist with that idiosyncrasy, now that there is a Discourse server.