argh

author: B. Watson <yalhcru@gmail.com> 2018-09-22 21:12:35 -0400
committer: B. Watson <yalhcru@gmail.com> 2018-09-22 21:12:35 -0400
commit: 1ca6c7902e03ec8d5cc7b39ffba57bd1560eabae (patch)
tree: 750a690f0d9f1bf401b8cb7ecd4087b54623a19e
parent: 0693dcb548569f118d1d7eefdf2874ef56334eb2 (diff)
download: sbostuff-1ca6c7902e03ec8d5cc7b39ffba57bd1560eabae.tar.gz
3 files changed, 80 insertions, 24 deletions
diff --git a/sbodl b/sbodl
index 7dd35f4..def7c7e 100755
--- a/sbodl
+++ b/sbodl
@@ -90,7 +90,7 @@ for dl in $DL; do
 	else
 		wget $WGETARGS $EXTRAWGETARGS "$dl" || die "Download failed"
 		if [ -e "$FILE" ]; then
-			mv "$FILE" "$CACHEDIR"
+			mv -b "$FILE" "$CACHEDIR"
 			ln -s "$CACHEDIR/$FILE" "$FILE"
 		fi
 	fi
@@ -105,6 +105,5 @@ for dl in $DL; do
 	else
 		echo "WARN: can't find downloaded file $FILE"
 	fi
-	echo
 	shift
 done
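
Note on the "mv -b" change above: with GNU coreutils, -b makes mv keep an
existing destination file as a backup (by default with a "~" suffix) rather
than overwriting it. The following is only a sketch of that behaviour,
assuming GNU mv. The directory and file names are invented for the demo and
are not the ones sbodl uses.

```shell
#!/bin/sh
# Demo of GNU "mv -b": an existing destination file is kept as a backup.
# All paths are hypothetical; sbodl uses its own $CACHEDIR and $FILE.

# Make sure the default backup naming (simple "~" suffix) is in effect.
unset VERSION_CONTROL SIMPLE_BACKUP_SUFFIX

demo=$(mktemp -d) || exit 1
cache="$demo/cache"
mkdir "$cache"

echo "old copy" > "$cache/foo.tar.gz"   # file already present in the cache
echo "new copy" > "$demo/foo.tar.gz"    # freshly downloaded file

# Without -b, the cached file would be silently replaced.
mv -b "$demo/foo.tar.gz" "$cache"

new_content=$(cat "$cache/foo.tar.gz")
backup_content=$(cat "$cache/foo.tar.gz~")
echo "cache copy:  $new_content"
echo "backup copy: $backup_content"

rm -rf "$demo"
```
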
diff --git a/sbosrcarch b/sbosrcarch
index f5da6d4..0b32cb9 100755
--- a/sbosrcarch
+++ b/sbosrcarch
@@ -1192,8 +1192,8 @@ sub handle_info_file {
 		my $filename = url_to_filename($url);
 		print ": $url\n";
 
-		if(exists($url_rewrite_hacks{$category/$prgnam})) {
-			$url = $url_rewrite_hacks{$category/$prgnam}->($url);
+		if(exists($url_rewrite_hacks{"$category/$prgnam"})) {
+			$url = $url_rewrite_hacks{"$category/$prgnam"}->($url);
 		}
 
 		if(already_exists($filename, $category, $prgnam, $md5)) {
diff --git a/sbosrcarch.faq b/sbosrcarch.faq
index 19b30ac..88b0e2c 100644
--- a/sbosrcarch.faq
+++ b/sbosrcarch.faq
@@ -32,12 +32,12 @@ A: The archive server is operated by Darren Austin, aka "Tadgy"
    archive healthy.
 
    The best way to contact us is using an IRC client to connect to
-   Freenode and join the ##slackware or ##slackbuilds channel.
+   Freenode and join the ##slackware or #slackbuilds channel.
 
    We can also be reached by email:
 
    B. Watson <yalhcru@gmail.com>
-   TODO: make sure Tadgy is OK with his email being here!
+   Darren Austin <mirrors (at) slackware.uk>
 
    Please read this entire FAQ before asking us questions. Chances are,
    you'll find the answer here. If not, or if the answer isn't clear
@@ -95,10 +95,10 @@ A: Several answers here:
      you'll see a list of build name directories. Each of these will
      contain the source file(s) for the build.
 
-     Example: you can't download the source to system/atari800, so
-     you go to the by-name page, click on "system", then "atari800".
-     There you'll see the file you wanted, atari800-3.1.0.tar.gz (unless
-     it's been updated since I wrote this).
+     Example: you can't download the source to system/atari800
+     from its original URL, so you go to the by-name page, click on
+     "system", then "atari800".  There you'll see the file you wanted,
+     atari800-3.1.0.tar.gz (unless it's been updated since I wrote this).
 
    - With a download tool like wget or curl. You could do this using the
      same by-name tree as you would for manual lookups, but it's better to
@@ -244,8 +244,7 @@ A: Multiple answers:
    can remove and blacklist it. It is not our intention to violate
    anyone's license.
 
-Q: Why does the status page say the by-name tree is missing 4 files, but
-   the by-md5sum tree is 100% complete?
+Q: Why do some of the by-name directories have filenames ending in ".x86_64"?
 
 A: This is due to a design flaw in the archive structure. We assumed that
    download filenames would either be unique within an .info file, or else
@@ -258,12 +257,14 @@ A: This is due to a design flaw in the archive structure. We assumed that
       DOWNLOAD_x86_64="https://www.perforce.com/downloads/perforce/r18.1/bin.linux26x86_64/p4"
 
    Notice that both URLs end in "/p4". The directory parts of the URL are
-   different, but the filenames are the same.
+   different, but the filenames are the same. In the archive, the 32-bit
+   download will be called "p4" and the 64-bit one will be "p4.x86_64".
 
    The archive script successfully downloads these files and stores them
    in the by-md5 tree in the correct directories. But when it tries to
    store them in the by-name tree, it's trying to save two files in the
-   same directory with the same name. The second one overwrites the first.
+   same directory with the same name. If it didn't use a different name,
+   the second one would overwrite the first.
 
    The current list of builds affected by this is:
 
@@ -272,10 +273,6 @@ A: This is due to a design flaw in the archive structure. We assumed that
       development/p4d
       libraries/p4api
 
-   Eventually this will get more-or-less fixed: for these builds (and only
-   these), there will probably be separate x86/ and x86_64/ subdirectories
-   (e.g. development/p4/x86/p4).
-
 Q: I'm a SlackBuild maintainer, and the download URL for one of my builds
    has disappeared. Can I use the archive URL as the DOWNLOAD in my .info
    file?
@@ -299,12 +296,64 @@ A: Yes, but only as a temporary measure or a last resort.
    every source file to have two working URLs: the original plus the
    sbosrcarch one.
 
+Q: I'm a SlackBuild maintainer, and one of my builds keeps showing up
+   on the sbosrcarch STATUS as missing. How can I prevent this?
+
+A: This usually happens for one of these reasons:
+
+   1. You made a mistake in your submission. Double-check the DOWNLOAD URL(s)
+      and MD5SUM(s) in the .info file. If they're wrong, resubmit your build.
+
+   2. The filename in the download URL is "unversioned", meaning the version
+      number isn't part of the filename (e.g. "thingy-latest.tar.gz"). At
+      some point after you last updated your .info file, but before the
+      SBo public update, the file changed on the server. Actually, this
+      occasionally happens even for files that have the version number
+      in the filename: upstream makes a mistake (leaving a file out of the
+      tarball, for instance) and a day or so later, they fix it without
+      changing the version number. When the archiver downloads the file,
+      it checks the md5sum against your .info file and sees a mismatch,
+      so it won't archive the file.
+
+   3. Upstream made a new release after you updated your build, but before
+      the SBo public update, and they removed the old version from their
+      server (or, possibly, moved it to a different location like /archives/
+      or /old-versions/). When the archiver tries to download the file, it
+      gets a '404 Not Found' error.
+
+   For (2) and (3), the problem is really the same: the web is a moving
+   target. Your download URLs and their md5sums were valid, but they got
+   changed on the server sometime after you submitted your build.
+
+   The solution is the same for both: find somewhere else to host your
+   source downloads. Either use your own web or ftp server if you have
+   one, or ask on the mailing list and someone will probably volunteer
+   to host it for you. Once you have the file(s) hosted somewhere,
+   update your .info file to point to the new location.
+
+   Before you do this, make sure the license allows you to: if it
+   doesn't allow redistribution, you can't host the download somewhere
+   else... and neither can we, so the build should be added to the
+   sbosrcarch blacklist (let us know if this is the case).
+
+   4. The file on the server is 'protected', because the server checks
+      the HTTP Referer and/or User-agent fields in the request. Typically
+      this means the download will work when using a browser, but will
+      fail when using wget or curl. Usually when this happens, one of
+      the sbosrcarch operators will manually download the file and add
+      it to the archive within a day or two. If not, let us know and
+      we'll get to it ASAP. Again, check the license of the download
+      file: if redistribution is not allowed, it should be added to the
+      blacklist and not kept in the archive.
+
 Q: How do I create my own archive?
 
 A: Two choices:
 
-   - Mirror the directory the usual way, using wget or rsync. Using
-     rsync is better!
+   - Mirror the directory the usual way, with rsync. Using wget
+     would be possible, but it would use about twice the bandwidth and
+     storage. This is because rsync can preserve hard links (wget can't),
+     and sbosrcarch makes extensive use of them.
 
    - Get a copy of the sbosrcarch script and run it on your web server.
      This will be more work on your part, but your archive will be
@@ -313,10 +362,6 @@ A: Two choices:
 
      The script lives here:
 
-     http://urchlay.naptime.net/repos/sbostuff/plain/sbosrcarch
-
-     ...or, it'd be better to use git:
-
      git clone git://urchlay.naptime.net/sbostuff.git
 
      It's written in perl, and has extensive documentation. Run it as
@@ -327,3 +372,15 @@ A: Two choices:
      entry in it) and I'd like it to include all the archives eventually.
      Also I'm pretty good at troubleshooting, if you're having problems
      with the script.
+
+Q: How much disk space will I need for my archive mirror/instance?
+
+A: Currently (2018-06-26), the archive is 93GB. The by-name and by-md5 trees
+   also seem to be 93GB apiece, but that's because hardlinks are used between
+   the two trees.
+
+   If you're using the sbosrcarch script to create your archive, you can
+   run a smaller (incomplete) archive. The config file (sbosrcarch.conf)
+   has a "maxfilemegs" setting. Any file larger than this won't be
+   downloaded and archived. You can also blacklist builds (or whole
+   categories) to save space.
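
The hard link behaviour mentioned in the mirroring and disk space answers
above can be seen with a small experiment. This is only a sketch: the
directory layout and file names are invented for the demo and are not the
real archive layout.

```shell
#!/bin/sh
# Two directory trees sharing one file through a hard link.
# Paths are hypothetical stand-ins for the by-name and by-md5 trees.
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/by-name/system/foo" "$demo/by-md5/a/b"

# Create a 1MB file in one tree, then hard-link it into the other.
dd if=/dev/urandom of="$demo/by-name/system/foo/foo.tar.gz" \
   bs=1024 count=1024 2>/dev/null
ln "$demo/by-name/system/foo/foo.tar.gz" "$demo/by-md5/a/b/foo.tar.gz"

# The second field of "ls -l" is the link count: 2 names, 1 copy on disk.
links=$(ls -l "$demo/by-md5/a/b/foo.tar.gz" | awk '{print $2}')
echo "link count: $links"

# Measured separately, each tree appears to hold the whole file...
du -sk "$demo/by-name"
du -sk "$demo/by-md5"
# ...but measured together, the file is only counted once.
du -sk "$demo"

rm -rf "$demo"
```

When mirroring with rsync, hard links are only preserved if the -H
(--hard-links) option is given. It is not implied by -a.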
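
The md5sum mismatch described in reason (2) of the FAQ answer about builds
missing from STATUS can be reproduced by hand. This is a rough sketch,
assuming md5sum from GNU coreutils. The check_md5 function, the file name and
the file contents are made up for the demo. A real check would compare against
the MD5SUM value in the .info file.

```shell
#!/bin/sh
# Sketch: a file that changes on the server no longer matches the md5sum
# recorded earlier. Names and contents are hypothetical.

# check_md5 FILE EXPECTED: succeeds only if FILE's md5sum equals EXPECTED.
check_md5() {
    actual=$(md5sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

demo=$(mktemp -d) || exit 1
file="$demo/thingy-latest.tar.gz"

# The file as it was when the maintainer wrote the .info file.
printf 'release contents\n' > "$file"
recorded=$(md5sum "$file" | cut -d' ' -f1)

if check_md5 "$file" "$recorded"; then before=match; else before=mismatch; fi

# Upstream replaces the file without changing its name.
printf 'release contents, fixed\n' > "$file"

if check_md5 "$file" "$recorded"; then after=match; else after=mismatch; fi

echo "before upstream change: $before"
echo "after upstream change:  $after"

rm -rf "$demo"
```
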