[Toybox] Microsoft github took down the xz repo.

Rob Landley rob at landley.net
Tue Apr 16 06:02:27 PDT 2024


On 4/15/24 03:53, Jarno Mäkipää wrote:
> On Sun, Apr 14, 2024 at 9:14 AM Oliver Webb via Toybox
> <toybox at lists.landley.net> wrote:
>>
>> To revive an old thread with new technical info I stumbled upon:
>>
>> On Saturday, March 30th, 2024 at 15:58, Rob Landley <rob at landley.net> wrote:
>>
>> > I set up gitea for Jeff on a j-core internal server, and it was fine except it
>> > used a BUNCH of memory and cpu for very few users. Running cgi on dreamhost's
>> > servers is a bother at the best of times (I don't want to worry about exploits),
>> > and the available memory/CPU there is wind-up toy levels.
>> >
>> > My website is a bunch of static pages rsynced into place, some of which use
>> > xbithack to enable a crude #include syntax, and that's about what the server can
>> > handle.
>>
>> Going through the list of "minimal tools" on https://suckless.org/rocks/,

Not really a fan of that site. I did a roadmap section on them long ago
(https://landley.net/toybox/roadmap.html#sbase), but I'm trying to implement
mostly compatible versions of things that already exist, and they're trying to
invent new things that didn't previously exist because https://xkcd.com/927/
which I mostly consider fragmentation rather than helping, and I try not to
encourage them.

>> I stumbled
>> upon a git frontend called stagit (https://git.codemadness.org/stagit/file/README.html)
>> which the suckless project uses as its git frontend.

When microsoft bought github I mirrored my repo on my website so you could pull
it from there, but doing that doesn't have any web interface so I did a quick
and dirty bash script to upload the "git format-patch" of each commit, with
symlinks from the 12 character hash to the full hash (because doing _each_ one
was an insanely slow exercise in inode exhaustion).
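
For anyone wanting to reproduce the idea, here is a minimal sketch of that kind
of export. It is NOT the actual scripts/git-static-index.sh: the function name
export_commits and the flat output layout are assumptions made up for
illustration. It writes one "git format-patch" file per commit, named by the
full hash, plus a 12 character symlink pointing at it.

```shell
# Hypothetical sketch, not the real toybox script.
# export_commits REPO OUTDIR: one patch file per commit plus a short-hash symlink.
export_commits() {
  repo=$1
  out=$2
  mkdir -p "$out" || return 1
  git -C "$repo" rev-list HEAD | while read -r hash; do
    # Skip commits exported by an earlier run so reruns only add new files.
    if [ -e "$out/$hash" ]; then
      continue
    fi
    git -C "$repo" format-patch -1 --stdout "$hash" > "$out/$hash"
    # Relative symlink: the 12 character name resolves to the full-hash file.
    short=$(printf %s "$hash" | cut -c1-12)
    ln -sf "$hash" "$out/$short"
  done
}
```

Running something like "export_commits . www/git/commits" and then rsyncing the
output directory only transfers the files for new commits, since existing ones
are never rewritten.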

You're once again telling me what I did was not good enough for you, and that I
am wrong, and must change to suit you.

>> But to have a solution, you must have a problem. The 2 main issues I have with the current git management
>> are the fact

I'm very tired.

>> there doesn't seem to be a way to clone the current repo directly from landley.net (Making Microsoft
>> GitHub the middleman).


$ git annotate www/header.html | grep -w git
fb47b0120	(Rob Landley	2021-09-12 14:33:36 -0500	30)          <li><a
href=https://landley.net/toybox/git>local</a></li>
$ git show fb47b0120
commit fb47b0120f7aa73c0821a8c55e15540d83baed01
Author: Rob Landley <rob at landley.net>
Date:   Sun Sep 12 14:33:36 2021 -0500

    Add a local git mirror (todo item since github was acquired)...

diff --git a/www/git/index.html b/www/git/index.html
new file mode 100644
index 00000000..bade8d1b
--- /dev/null
+++ b/www/git/index.html
@@ -0,0 +1 @@
+Not browseable: <b>git clone https://landley.net/toybox/git</b>

$ git log scripts/git-static-index.sh
commit 990e0e7a40e4509c7987a190febe5d867f412af6
Author: Rob Landley <rob at landley.net>
Date:   Sat Dec 24 06:34:11 2022 -0600

    Script to put something browseable in https://landley.net/toybox/git

https://landley.net/notes-2022.html#22-12-2022

>> And the fact I can't browse the source code without github or android code search acting as
>> the middleman

I do not have source tree snapshots up. Kinda hard to do in a static manner
without uploading rather a LOT of files (and even if you upload each version of
"git log" for each file and create an index file for each commit with the ls -lR
of the whole tree linking to the relevant version, the URLs to the files are
ugly. I can do it, but don't really want to? Linking to individual lines of the
file while also having the raw text kinda implies uploading two versions and I
just dowanna. Oh, and dreamhost's server config doesn't have sane file
associations for all the types so if I put up a .c file it wants to DOWNLOAD it
instead of displaying it as text and trying to .htaccess that is more of a pain
than I'm up for, so I would wind up having blah.c.txt and blah.c.html files and
that's just ugly...)
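
To make the blah.c.txt approach concrete, here is a rough sketch of a static
snapshot of a single commit. Everything in it is an assumption for
illustration: the snapshot_tree name, the tree/ subdirectory, and the flat
index.html are invented, and it assumes filenames contain no newlines or HTML
special characters. It covers only the plain-text copies, not per-file history
or line anchors.

```shell
# Hypothetical sketch: static snapshot of HEAD with every file served as NAME.txt.
# snapshot_tree REPO OUTDIR
snapshot_tree() {
  repo=$1
  out=$2
  mkdir -p "$out/tree" || return 1
  # Extract the committed tree (not the working copy) into OUTDIR/tree.
  git -C "$repo" archive HEAD | tar -x -f - -C "$out/tree" || return 1
  (
    cd "$out/tree" || exit 1
    echo '<ul>'
    # sort consumes the whole file list before the loop starts renaming,
    # so find never sees the .txt names created below.
    find . -type f | sort | while read -r f; do
      f=${f#./}
      mv "$f" "$f.txt"
      printf '<li><a href="tree/%s.txt">%s</a></li>\n' "$f" "$f"
    done
    echo '</ul>'
  ) > "$out/index.html"
}
```

The .txt suffix is there only to dodge the server's file associations, which
is exactly the ugliness described above.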

Plus, syntax highlighting: you'd THINK there would be some nice linux syntax
highlighting packages out there but not counting "use vi" (which doesn't work
for me anyway, :syntax = "E319: Sorry, the command is not available in this
version")...

Searching around I found https://github.com/alecthomas/chroma which is very
proud that it's written in "pure go"... except it's a wrapper for a python
library, and python's runtime is written in C, so DEFINE PURE...

Digging into the aforementioned python (don't get me started) library, the
"python-pygments" package installs the man page for a command "pygmentize",
and the bash completion for the command pygmentize, but does not install the
actual command in the $PATH (or anywhere, according to dpkg-query -L
python-pygments).

That's the point at which I gave up and decided to give my talk using the github
page to highlight the code, which txlf projected onto a screen with the room
lights on and everything was so washed out you couldn't see any colors anyway.

*shrug* Yes, I've thought about it. It's on the todo list. No, I don't see how
your static whatsis is going to suck significantly less with this server config
and what it WILL do is demand I install lots of prerequisite packages on my
development laptop to generate the stuff it rsyncs, and then probably not
minimize the rsync change each time but instead want to push lots of irrelevant
regenerated changes through phone tethering. You know what half the rsync
slowness is? Dreamhost's server at the far end hasn't got all the files cache
hot so it has to fault them back in to compare them, and its storage is ok about
streaming data but TERRIBLE at seeking. They're cheap, don't meter bandwidth,
and I haven't had to migrate in forever...

>> The first of these seems near impossible to solve with _only_ static webpages, since a git server is not
>> a static thing. I don't know yet tho.
> 
> Git repo with read-only access over http(s) is a static thing.

It's been built into git since at least 2013:

https://git-scm.com/book/en/v2/Git-on-the-Server-Getting-Git-on-a-Server
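
Roughly, the static ("dumb") HTTP setup looks like the sketch below. The
function name publish_static_repo and the paths are made up for illustration,
and this is not necessarily how the landley.net mirror is generated. The one
load-bearing step is "git update-server-info", which writes the info/refs and
objects/info/packs files a client reads when there is no git-aware server on
the other end.

```shell
# Hypothetical sketch: prepare a bare repo that a plain static web server can serve.
# publish_static_repo SRC DEST
publish_static_repo() {
  src=$1
  dest=$2
  git clone --quiet --bare "$src" "$dest" || return 1
  # Regenerate the index files the dumb HTTP transport fetches.
  # This has to be rerun after every push into DEST.
  git -C "$dest" update-server-info
}
```

After that, copying DEST under the web root (rsync works) is enough for "git
clone https://..." to work read-only. The post-update hook sample that ships
with git runs the same update-server-info command.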

>> The second one seems easier though, copying or maybe symlinking stuff from the source directory with "find"
>> in a pipeline with bash to make a simple, browse-able tree would probably take a few dozen lines
>> at most. It could also probably just be an rsync command if you don't want to worry about listing out directory
>> contents.
>>
>> Both of these problems are remediable now, but in a year they might not be (ProtonMail just said "If you
>> don't sign in for a long enough time we will delete all your data" like Google drive is doing, it's not
>> hard to imagine Microsoft GitHub doing a similar thing with accounts they locked out by their 2FA crusade)

Yes, that's why I've been publishing my repo through my own website for years.

The subtler issue is that sha1sum isn't _that_ hard to induce hash collisions in
with crypto miner style setups (https://valerieaurora.org/hash.html) let alone
full state actor kit, and Microsoft is quite happy to cash checks from not even
the five eyes but
https://www.latimes.com/business/technology/story/2019-10-09/github-ice-contract-employee-oppose
and such, meaning what I pull from the repo and what I pushed INTO the repo...
microsoft had custody of the data in between, and I don't trust it.

So my website has tarballs and a current git repo, in case anybody feels like
comparing. And I don't pull from outside sources, I "git am" human readable
patches. (And I have multiple local historical backups on really quite small usb
sticks and sd cards and "yes, that hard drive from 2004 wrapped in an old
washcloth still works with this USB adapter", just so anybody trying something
could never quite be SURE. Can't STOP shenanigans, but I can be enough of a
messy pain it's probably not worth bothering...)

(Yes I'm aware of signing commits, no I'm not convinced it would help. Maybe,
but I haven't read through it and spent an hour in a room with a relevant expert
explaining how and why it does what. I'd rather not run a fixed pipe through
enemy territory in the first place than trust the pipe to be untappable.)

>> Rob, are you interested in future-proofing the codebase from whatever GitHub
>> and AOSP decide to do?

https://en.wikipedia.org/wiki/Teaching_grandmother_to_suck_eggs

Rob

