opinions sought: costs & benefits of switching to self-hosted Git server? #131

Open
opened 2 years ago by mbutterick · 17 comments
mbutterick commented 2 years ago (Migrated from github.com)
  1. For those who have firsthand experience with running a self-hosted Git server — pros? Cons? Is there a solution for CI?

  2. For everyone — what would be the negative impact on you if pollen (and all my other software) were moved to this hypothetical new server? I would not be changing the license or anything else. Just the server where the canonical source is hosted.

  3. If that happened, I would be open to leaving pollen-users here at GitHub. Though I don’t have a strong feeling either way.

As for why. (Not that it matters.) I was willing to reserve judgment after the Microsoft acquisition of GitHub. Since then I have found myself holding my nose at most of the so-called improvements.

This week I tried Copilot, which is the most putrid yet — you install a ~keylogger~ Visual Studio plugin on your machine and get terrible code in return. It seems inevitable that in the same way social-media sites rapidly evolved into funnels for personal data to be sold to advertisers, the main business of GitHub will be collecting code for their AI training and other collateral purposes.

I also think that Copilot is a massive violation of the open-source licenses I use. I further question whether I can even meaningfully *comply* with the open-source licenses of underlying software I incorporate while hosting code on GitHub, because I’m feeding that code into the maw of something that will violate the license. (No, I don’t literally expect to be sued, but the [handwaving](https://docs.github.com/en/copilot/overview-of-github-copilot/about-github-copilot#using-github-copilot) around these issues is far from encouraging.)
mbutterick commented 2 years ago (Migrated from github.com)

(I’d also be willing to consider a GitHub alternative like GitLab, though they also seem to be heading down the [AI rabbit hole](https://about.gitlab.com/handbook/engineering/incubation/ai-assist/).)
sorawee commented 2 years ago (Migrated from github.com)

I know nothing about [SourceHut](https://git.sr.ht/), but it seems quite popular as an alternative to Git{Hub/Lab}.
wallingf commented 2 years ago (Migrated from github.com)

• For those who have firsthand experience with running a self-hosted Git server — pros? Cons? Is there a solution for CI?

 If you have a server that is reliably up, then users probably
 won't notice much difference at the git level.  We would give
 up GitHub-specific features, but I try not to use those anyway.

 I cannot comment on the CI issue with any generality.

• For everyone — what would be the negative impact on you if pollen (and all my other software) were moved to this hypothetical new server? I would not be changing the license or anything else. Just the server where the canonical source is hosted.

 There would be no negative impact on me.

• If that happened, I would be open to leaving pollen-users here at GitHub. Though I don’t have a strong feeling either way.

 I don’t have a strong feeling, either.  I do like having a mailing
 list.

 I completely understand your concerns.  As a university prof, I
 am not at all happy with Copilot as a free service available to
 students.  At least now, pre-Copilot, students have to do the work
 of finding someone else's code to copy and modify.

---- Eugene

otherjoel commented 2 years ago (Migrated from github.com)
  1. If you held a gun to my head right now and said I had to self-host a git project with CI, I would try [gitea](https://gitea.io/en-us/) with [metroline](https://github.com/metroline/metroline)
  2. No impact
  3. Probably better to keep everything in one place but not much opinion either way

I second the SourceHut recommendation, though I don't (yet) use it myself. It's run by Drew DeVault, who agrees with you about GitHub and GitLab, and it includes CI.
mbutterick commented 2 years ago (Migrated from github.com)

> Drew DeVault, who agrees with you about GitHub and GitLab

I see that he wrote this week about Copilot as a form of “[open source laundering](https://drewdevault.com/2022/06/23/Copilot-GPL-washing.html)”. I generally agree with his argument, though I think his suggested solutions are unworkable:

> Allow GitHub users and repositories to opt-out of being incorporated into the model. Better, allow them to opt-in. Do not tie this flag into unrelated projects like Software Heritage and the Internet Archive.

This idea is similar to the GDPR’s “right to be forgotten”. But it’s impossible to retroactively remove code that has already been incorporated into the model without retraining it from scratch. Also, I expect there would be a negative-selection effect where owners of better code would be more likely to opt out, thereby making the model dumber. (Though it wouldn’t surprise me if private enterprise repos have been exempted from the model so far, and opting out will be a service sold to them later on.)

> Track the software licenses which are incorporated into the model and inform users of their obligations with respect to those licenses.

Impossible. First, the material emitted by the model comes from different places and there’s no guarantee that the licenses are legally compatible. Second, this would put Microsoft in the position of giving legal advice to zillions of users. They are already passing that buck (about which more below).

> Remove copyleft code from the model entirely, unless you want to make the model and its support code free software as well.

Impossible. Without copyleft code, the model would starve to death.

> Consider compensating the copyright owners of free software projects incorporated into the model with a margin from the Copilot usage fees, in exchange for a license permitting this use.

I’m sure Microsoft’s view is that the owners of the projects **are** being compensated already with all the goodies on GitHub that they don’t pay market rates for. The “license permitting this use” is already baked into the GitHub [terms of service](https://docs.github.com/en/site-policy/github-terms/github-terms-of-service#4-license-grant-to-us).


Even assuming that training an AI with certain software code counts as fair use under US copyright (as GitHub’s [former CEO has claimed](https://news.ycombinator.com/item?id=27678354)), that’s a long way from claiming that every _output_ of that system also qualifies as fair use. Microsoft has not made this claim (and will not, because they can’t guarantee the behavior of a probabilistic system), so they explicitly pass this risk [onto Copilot users](https://docs.github.com/en/copilot/overview-of-github-copilot/about-github-copilot#using-github-copilot):

> We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn't write yourself. These precautions include rigorous testing, **IP scanning** …

Therefore, the good news (?) — I expect that Copilot will be banned in most companies due to the possibility of some junior engineer nonchalantly embedding IP violations in the enterprise codebase.

In the meantime, Microsoft’s fair-use argument creates a bigger problem. Devault suggests a nuclear option: “don’t use GitHub and your code will not make it into the model”. That’s also what I’m proposing here. But if AI training qualifies as fair use for code that appears on GitHub, it qualifies as fair use for code that appears **anywhere**. Just as Google indexes all the web pages, GitHub could train its AI on the code displayed on GitLab, Gogs, Gitea, etc. The cynical endpoint of this line of thinking is that one might as well leave code on GitHub because it’s going to be absorbed anyhow.

Ironically the other likely outcome of Copilot is a surge of Copilot-generated code being released by the world’s laziest programmers. This tsunami of idiocy will crash again on GitHub’s shores, where it will be reabsorbed into the model, creating a process heretofore unknown in computer science: recursive stupidity.

What an accomplishment.

[I am not anyone’s lawyer and no one should take this comment as legal advice.]

mbutterick commented 2 years ago (Migrated from github.com)

Bradley Kuhn of Software Freedom Conservancy on the [ramifications of AI for open-source software](https://sfconservancy.org/blog/2022/feb/03/github-copilot-copyleft-gpl/). Bradley reaches several of the same points, though with more factual & legal detail.
zachmandeville commented 2 years ago (Migrated from github.com)

I cannot speak knowledgeably about point 1, but for 2 and 3:

  2. Moving off github would have no negative impact on me as a pollen user. I would see it as a strong positive.
  3. I would love if pollen-users was moved to a different host, if it was simple and made sense to do so. I prefer to use github as little as possible, for a host of reasons including the ones stated in the initial message.

I use sourcehut for personal projects and have been quite happy with it. It is unobtrusive and seems to have all the features I need without any chaff. It offers a mailing list service too, which works nicely from email though is kinda minimal as a web forum.

mbutterick commented 2 years ago (Migrated from github.com)

Thank you for the suggestions. I have cloned Pollen to Sourcehut and changed the canonical repo on the Racket package server:

https://git.sr.ht/~mbutterick/pollen

Apparently this would become the new mailing list:

https://lists.sr.ht/~mbutterick/pollen-discuss

I invite Sourcehut fans to inspect this repo & either flag any mistakes or make suggestions for improvement before the switch is thrown.

After that, I suppose the right move is to put the GitHub Pollen repo into “archived” mode.

mbutterick commented 2 years ago (Migrated from github.com)

[My further thoughts](https://matthewbutterick.com/chron/this-copilot-is-stupid-and-wants-to-kill-me.html) on the legality of Copilot and the (perhaps) futility of avoiding its maw, though ethics count too.
mbutterick commented 2 years ago (Migrated from github.com)

Sourcehut uses Git over HTTP. Racket [added support](https://github.com/racket/racket/commit/2606ae3d8ea95e1b644f9400e156ec2ed781a582) for these URLs in version 8.1 with a private `git+https` prefix. Regardless of the wisdom of this workaround, because Pollen (and my other Racket packages) support versions of Racket before 8.1, AFAICT I need a source-hosting service that supports traditional `.git` URLs. As it stands, users of versions of Racket before 8.1 will not be able to install Pollen.
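
To make the compatibility constraint above concrete, here is a minimal Python sketch. It is not part of the original thread and is not Racket's actual package-source inference, which may cover more cases. It only encodes the two situations the comment names: a `git+https` prefix, which per the comment needs Racket 8.1 or later, and a traditional URL whose path ends in `.git`. The function names and the `example.org` URL are hypothetical.

```python
from urllib.parse import urlsplit

# Prefix that, per the comment above, Racket only understands from 8.1 on.
PREFIXED_SCHEME = "git+https"


def requires_racket_8_1(source_url: str) -> bool:
    """Return True if the URL relies on the git+https prefix."""
    return urlsplit(source_url).scheme.lower() == PREFIXED_SCHEME


def is_traditional_git_url(source_url: str) -> bool:
    """Return True for a plain http(s) URL whose path ends in .git."""
    parts = urlsplit(source_url)
    return parts.scheme.lower() in {"http", "https"} and parts.path.endswith(".git")


if __name__ == "__main__":
    urls = (
        # Prefix form of the Sourcehut repo mentioned in this thread.
        "git+https://git.sr.ht/~mbutterick/pollen",
        # Hypothetical traditional URL, for illustration only.
        "https://example.org/someuser/project.git",
    )
    for url in urls:
        print(url, requires_racket_8_1(url), is_traditional_git_url(url))
```

Under these assumptions, a package that must install on Racket versions before 8.1 would need a source URL for which the second check passes, which is the requirement the comment describes.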
mbutterick commented 2 years ago (Migrated from github.com)

In the meantime I have reverted the package server to use the GitHub repo (see #132)

mbutterick commented 2 years ago (Migrated from github.com)

I have cloned pollen to Codeberg and changed the canonical repo on the Racket package server:

https://codeberg.org/mbutterick/pollen

mbutterick commented 2 years ago (Migrated from github.com)

Swift and Racket (among others) use [Discourse](https://www.discourse.org/). I’m thinking of putting up a self-hosted instance as a replacement for `pollen-users`. Pros/cons from those who have fiddled with it?

(So far Codeberg seems to be cooperating with the Racket package server, so I plan to stick with it. But it seems wise to permanently divorce the talking-about-software functionality from the Git hosting.)
pmarinov commented 2 years ago (Migrated from github.com)

2 questions:

  • Does Codeberg provide a feature similar to the one we use here on GitHub for pollen-users?

or

  • Why not use the mailing list at https://lists.sr.ht/~mbutterick/pollen-discuss?
mbutterick commented 2 years ago (Migrated from github.com)

> Does Codeberg provide a feature similar to the one we use here on GitHub for pollen-users?

Yes, Codeberg also has an “issues” feature. So in principle we could make a pollen-users over there. (But all the current messages would be left behind.) I suppose increasingly I lean toward separating the two tasks. It’s easier to relocate a Git repo than a discussion system. (Pollen originally had a mailing list hosted by Google, which was shut down abruptly, which is how we ended up here.)

> Why not use the mailing list at https://lists.sr.ht/~mbutterick/pollen-discuss?

I considered that. If I’m going to have the discussion list hosted elsewhere, I’d rather a) host it myself using b) an open-source system with a track record, and c) do it in a way that allows me to consolidate other discussions (related to Quad, Beautiful Racket, etc.) because all those projects will be leaving GitHub too.

mbutterick commented 2 years ago (Migrated from github.com)

I’ve put up a Discourse server at https://forums.matthewbutterick.com with an area for Pollen discussion. I invite members of pollen-users to inspect this server. Absent any objections or unforeseen wrinkles, I will put pollen-users into read-only mode by the end of July 2022 and we will move the party to the new server.

mbutterick commented 2 years ago (Migrated from github.com)

Codeberg is just a hosted Gitea instance. So I thought: why not just put up my own Gitea server, if I could get it working in 30 min or less. I could and I did.

https://git.matthewbutterick.com/mbutterick/pollen

It would be possible to migrate pollen-users to this server, sort of. The thread messages would be migrated to a new repo. But they wouldn’t be attributed to users on the new server. Still, because pollen-users was always something of an off-label use of GitHub, there isn’t much reason to persist with that idiosyncrasy, now that there is a Discourse server.

This repo is archived. You cannot comment on issues.
Reference: mbutterick/pollen-users#131