Q: What is sbosrcarch?

A: sbosrcarch is "The SlackBuilds.Org Source Archive". It contains copies
   of the source files listed in the .info files for all (or almost all)
   the builds on SlackBuilds.org.

   sbosrcarch is also the name of the software that created and maintains
   the archive (more about this later, near the end of this FAQ).

Q: What is sbosrcarch for?

A: It's intended to be a backup location for source files that can't be
   downloaded from their original URLs. This happens mainly for these
   reasons:

   - The upstream web site goes down, is moved, or has connectivity
     issues (intermittent or long-term).
   - Upstream moves or removes the source when they release a new version.

   Also, the archive is hosted on a fast, well-connected host. Sometimes
   you might choose to use the archive just for faster downloads.

   A side benefit of the archiving process is that the archive maintenance
   software produces a log of failed downloads, which can then be sent
   to the slackbuilds-users mailing list and/or the build's maintainer so
   the broken download can be fixed quickly.

Q: Who is responsible for sbosrcarch?

A: The archive server is operated by Darren Austin, aka "Tadgy"
   on Freenode IRC. The archive script was written by B. Watson, aka
   "Urchlay" on Freenode. Both of us keep an eye on the logs and keep the
   archive healthy.

   The best way to contact us is using an IRC client to connect to
   Freenode and join the ##slackware or #slackbuilds channel.

   We can also be reached by email:

   B. Watson <yalhcru@gmail.com>
   Darren Austin <mirrors (at) slackware.uk>

   Please read this entire FAQ before asking us questions. Chances are,
   you'll find the answer here. If not, or if the answer isn't clear
   enough, we'll be happy to help.

   Note that the SlackBuilds.org team is NOT responsible for the
   archive. PLEASE don't bother them with questions about sbosrcarch,
   they're already busy enough maintaining the actual SlackBuilds site!
   Same goes for individual build maintainers.

Q: Why create a giant archive like this? Isn't it better to fix the
   SlackBuilds whose sources can't be downloaded?

A: Sort of. Yes, if a SlackBuild references a no-longer-existing
   source download URL, it should be updated. Usually the SlackBuild
   maintainer is responsible for this. Sometimes the SBo admins take
   care of it instead. Sometimes, it takes longer than expected to
   update a SlackBuild: the new version uses a different build system,
   or requires some dependency to be updated first, or the maintainer
   is too busy with Real Life and can't spare the time just at the moment.

   Once the build is updated, it still doesn't appear instantly on the
   site. It has to sit in the "pending" queue until it's been reviewed by
   the admins, and then in the "ready" queue until the next public update.

   The SBo update process is complex, and requires coordination between
   the various admins. Generally this means that site updates ("Public
   www update" in the git log) only happen once a week.

   During the time it takes for the SlackBuild to get updated for the
   new download URL (and possibly new version), users won't be able to
   download the source as listed on the SBo site.

   That's what the archive is mainly intended for. It's a fallback,
   a stop-gap solution, that allows builds to keep working during the
   period between the source disappearing and the build being updated.
   Usually this is only a week or less, but sometimes things slip through
   the cracks...

Q: How do I use the archive?

A: Several answers here:

   - Using a tool that supports the archive, such as sbopkg or sbotools.

     This is by far the easiest way: they automatically use the archive
     if they need to, without you having to do any extra work.

   - Manually with a web browser. The easy way is to start at:

     http://slackware.uk/sbosrcarch/by-name/

     ...which shows a list of category directories (academic, accessibility,
     audio, etc). Choose a category, then within the category
     you'll see a list of build name directories. Each of these will
     contain the source file(s) for the build.

     Example: you can't download the source to system/atari800
     from its original URL, so you go to the by-name page, click on
     "system", then "atari800".  There you'll see the file you wanted,
     atari800-3.1.0.tar.gz (unless it's been updated since I wrote this).

   - With a download tool like wget or curl. You could do this using the
     same by-name tree as you would for manual lookups, but it's better to
     do this by md5sum. The base URL for this is:

     http://slackware.uk/sbosrcarch/by-md5/

     In the build's .info file, take the 'filename' part of each download
     URL. Example: "atari800-3.1.0.tar.gz", where the link is
     http://downloads.sourceforge.net/project/atari800/atari800/3.1.0/atari800-3.1.0.tar.gz

     Now take the MD5SUM (or MD5SUM_x86_64 if you're using DOWNLOAD_x86_64),
     and use its first two characters as two nested subdirectory names
     (one character each), followed by the full md5sum. Example: we have

     MD5SUM="354f8756a7f33cf5b7a56377d1759e41"

     in the .info file. The directory for this would be:

     3/5/354f8756a7f33cf5b7a56377d1759e41

     Add this to the base URL and get:

     http://slackware.uk/sbosrcarch/by-md5/3/5/354f8756a7f33cf5b7a56377d1759e41/

     Now add the filename part from DOWNLOAD or DOWNLOAD_x86_64, and you get:

     http://slackware.uk/sbosrcarch/by-md5/3/5/354f8756a7f33cf5b7a56377d1759e41/atari800-3.1.0.tar.gz

     This is the exact URL for the file, if it's actually present in the
     archive. Most likely, it will be, and your download will succeed. If
     the download fails, the file's not in the archive.

     Of course, all these steps should be automated. You'll end up writing
     a script in your favorite language to do the job. Or:

   - Using the sbosrc script

     Same as above, except someone's already written it for you. Download
     it here:

     http://urchlay.naptime.net/repos/sbostuff/plain/sbosrc

     ...or, it'd be better to use git:

     git clone git://urchlay.naptime.net/sbostuff.git

     Make it executable (chmod +x) and place it somewhere on your $PATH,
     such as /usr/local/bin.

     Whenever you need to download something from the archive, change
     to the directory containing the .info file (same place as the
     .SlackBuild) and just run:

     sbosrc

     ...which will check the current architecture (32-bit or 64-bit),
     parse the info file, calculate the URL as above, and download the
     file to the current directory.
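
   The by-md5 lookup described above can be sketched in a few lines of
   Python. This is only an illustration of the URL layout, not the sbosrc
   script itself: the function names parse_info and archive_urls are
   made up for this example, and the fallback to DOWNLOAD when
   DOWNLOAD_x86_64 is empty is an assumption about how a 64-bit user
   would want it to behave.

```python
import re
from urllib.parse import urlsplit

BY_MD5_BASE = "http://slackware.uk/sbosrcarch/by-md5/"


def parse_info(text):
    """Parse KEY="value ..." assignments from .info text into lists of values."""
    fields = {}
    # [^"]* also spans newlines, so backslash-continued values are captured whole.
    for key, value in re.findall(r'^(\w+)="([^"]*)"', text, flags=re.MULTILINE):
        # Drop the continuation backslashes, then split on whitespace.
        fields[key] = value.replace("\\", " ").split()
    return fields


def archive_urls(info_text, use_x86_64=False):
    """Build the by-md5 archive URL for each download listed in the .info text."""
    fields = parse_info(info_text)
    # Assumption: fall back to DOWNLOAD/MD5SUM when DOWNLOAD_x86_64 is empty.
    suffix = "_x86_64" if use_x86_64 and fields.get("DOWNLOAD_x86_64") else ""
    downloads = fields.get("DOWNLOAD" + suffix, [])
    md5sums = fields.get("MD5SUM" + suffix, [])
    urls = []
    for download, md5 in zip(downloads, md5sums):
        # The filename is the last path component of the download URL.
        filename = urlsplit(download).path.rsplit("/", 1)[-1]
        # The first two md5 characters become two levels of subdirectory.
        urls.append(f"{BY_MD5_BASE}{md5[0]}/{md5[1]}/{md5}/{filename}")
    return urls


if __name__ == "__main__":
    sample = (
        'DOWNLOAD="http://downloads.sourceforge.net/project/atari800/'
        'atari800/3.1.0/atari800-3.1.0.tar.gz"\n'
        'MD5SUM="354f8756a7f33cf5b7a56377d1759e41"\n'
    )
    for url in archive_urls(sample):
        print(url)
```

   Run on the atari800 example, this prints the same by-md5 URL that was
   worked out by hand above. Whether the file is actually present in the
   archive can only be found out by trying the download.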

Q: I need a specific older version of a source file, not the latest
   version that's packaged on SBo. Will the archive have it?

A: Probably not. Old versions don't disappear immediately when new
   ones are archived, but they do get purged monthly... or, almost:
   old files are deleted on the 30th of every month, and February is
   only 28 or 29 days long!

   Use the by-md5 tree if you're looking for an old version, since some
   builds use unversioned filenames (in the by-name tree, the new file
   overwrites the old one).

   If you know the exact filename and/or md5sum, you can always try a
   google search for them. Use "quotes" around the filename.

Q: How do I know it's safe to use files downloaded from the archive?

A: The same way you know it's safe to use any file you downloaded for
   use with a SlackBuild: check the downloaded file's md5sum against
   the MD5SUM line in the build's .info file.
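
   If you'd rather script the check than run md5sum by hand, a minimal
   sketch in Python could look like this. The function names are invented
   for the example; the expected value is whatever the MD5SUM line in the
   .info file says.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large tarballs don't fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_info_md5(path, expected_md5):
    """True if the file's md5sum equals the MD5SUM value from the .info file."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

   For the atari800 example, matches_info_md5("atari800-3.1.0.tar.gz",
   "354f8756a7f33cf5b7a56377d1759e41") should be true for a good download,
   no matter which server the file came from.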

Q: How do I use the archive with automated tools such as sbopkg and sbotools?

A: For sbopkg and sbotools, you just run them normally. They'll automatically
   search the archive if a source download fails.

Q: How complete is the archive?

A: Currently (2018-06-26), the by-md5 tree is 100% complete. This does
   NOT count blacklisted sources (see next question).

   For a more up-to-date answer, see the archive status page:

   http://slackware.uk/sbosrcarch/STATUS

   This gets updated nightly.

Q: Why are some sources missing from the archive?

A: Multiple answers:

   - The archiver couldn't download the file. Maybe the site was down
     when it tried, or the upstream developers removed the file. Generally
     this will require the build's maintainer to fix the .info file or
     update the SlackBuild to a newer version (that actually exists).
     In some cases, the archive operator will find the file and manually
     add it to the archive.

   - The archiver downloaded the file, but the download's md5sum doesn't
     match. The build maintainer will have to fix the .info file. We
     won't archive any files we can't verify by md5sum.

   - There is some software that can't be automatically downloaded
     (requires account creation on the upstream site) or whose license
     doesn't allow us to redistribute it.

     The classic example of both is development/jdk: Oracle's license
     requires that users download the file directly from their site and
     doesn't allow us (or anyone else) to offer it for download. Also,
     downloading from Oracle requires creating an Oracle account, so
     the archiver couldn't auto-download it even if it were allowed.

     Sources we can't download are blacklisted by the archiver, and
     don't count towards the completion percentage on the status page.
     The current blacklist is:

       academic/novocraft
       academic/wehi-weasel
       development/amd-app-sdk
       development/decklink-sdk
       development/jdk
       development/J-Link
       development/sqlcl
       development/sqldeveloper
       office/treesheets
       system/displaylink
       system/elo-mt-usb
       system/oracle-instantclient-devel
       system/oracle-xe
       system/oracle-instantclient-basic

   If you find a file in the archive that shouldn't be there due to
   its license not allowing redistribution, PLEASE let us know so we
   can remove and blacklist it. It is not our intention to violate
   anyone's license.

Q: Why do some of the by-name directories have filenames ending in ".x86_64"?

A: This is due to a design flaw in the archive structure. We assumed that
   download filenames would either be unique within an .info file, or else
   that 2 files with the same filename were in fact the same file.

   For 4 of the SlackBuilds, this turns out to be a bad assumption. Example:
   development/p4's .info file has this:

      DOWNLOAD="https://www.perforce.com/downloads/perforce/r18.1/bin.linux26x86/p4"
      DOWNLOAD_x86_64="https://www.perforce.com/downloads/perforce/r18.1/bin.linux26x86_64/p4"

   Notice that both URLs end in "/p4". The directory parts of the URL are
   different, but the filenames are the same. In the archive, the 32-bit
   download will be called "p4" and the 64-bit one will be "p4.x86_64".

   The archive script successfully downloads these files and stores them
   in the by-md5 tree in the correct directories. But when it tries to
   store them in the by-name tree, it's trying to save two files in the
   same directory with the same name. If it didn't use a different name,
   the second one would overwrite the first.

   The current list of builds affected by this is:

      academic/ucsc-blat
      development/p4
      development/p4d
      libraries/p4api
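
   The naming rule can be illustrated with a short Python sketch. This is
   not the archive script's actual code (that is written in perl); it is
   a guess at the behavior described above, with made-up function names.
   For simplicity it ignores the case where both URLs point to the very
   same file.

```python
from urllib.parse import urlsplit


def url_filename(url):
    """Return the last path component of a download URL."""
    return urlsplit(url).path.rsplit("/", 1)[-1]


def by_name_filenames(downloads, downloads_x86_64):
    """Map each download URL to the filename used in the by-name directory."""
    names = {}
    taken = set()
    for url in downloads:
        name = url_filename(url)
        names[url] = name
        taken.add(name)
    for url in downloads_x86_64:
        name = url_filename(url)
        # A 64-bit file whose name clashes with a 32-bit one gets a suffix,
        # so that it doesn't overwrite the 32-bit file.
        if name in taken:
            name += ".x86_64"
        names[url] = name
    return names
```

   Given the two development/p4 URLs above, this maps the 32-bit download
   to "p4" and the 64-bit one to "p4.x86_64".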

Q: I'm a SlackBuild maintainer, and the download URL for one of my builds
   has disappeared. Can I use the archive URL as the DOWNLOAD in my .info
   file?

A: Yes, but only as a temporary measure or a last resort.

   It's better to do one of these:

   - Find another copy of the source. Try a google search for the exact
     filename (in "quotes"), or the md5sum.

   - Host the source yourself, if you have access to a web or ftp server.

   - Ask on the slackbuilds-users mailing list. Someone will probably
     volunteer to host the source for you, provided you have a copy of
     it to send them (and if you don't, hey, there's this handy source
     archive you can probably get it from...)

   Using the archive as the DOWNLOAD results in less redundancy. Nobody
   is currently mirroring the archive that we know of. Ideally, we want
   every source file to have two working URLs: the original plus the
   sbosrcarch one.

Q: I'm a SlackBuild maintainer, and one of my builds keeps showing up
   on the sbosrcarch STATUS as missing. How can I prevent this?

A: This usually happens for one of these reasons:

   1. You made a mistake in your submission. Double-check the DOWNLOAD URL(s)
      and MD5SUM(s) in the .info file. If they're wrong, resubmit your build.

   2. The filename in the download URL is "unversioned", meaning the version
      number isn't part of the filename (e.g. "thingy-latest.tar.gz"). At
      some point after you last updated your .info file, but before the
      SBo public update, the file changed on the server. Actually, this
      occasionally happens even for files that have the version number
      in the filename: upstream makes a mistake (leaves a file out of the
      tarball for instance) and a day or so later, they fix it without
      changing the version number. When the archiver downloads the file,
      it checks the md5sum against your .info file and sees a mismatch,
      so it won't archive the file.

   3. Upstream made a new release after you updated your build, but before
      the SBo public update, and they removed the old version from their
      server (or, possibly, moved it to a different location like /archives/
      or /old-versions/). When the archiver tries to download the file, it
      gets a '404 Not Found' error.

   For (2) and (3), the problem is really the same: the web is a moving
   target. Your download URLs and their md5sums were valid, but they got
   changed on the server sometime after you submitted your build.

   The solution is the same for both: find somewhere else to host your
   source downloads. Either use your own web or ftp server if you have
   one, or ask on the mailing list and someone will probably volunteer
   to host it for you. Once you have the file(s) hosted somewhere,
   update your .info file to point to the new location.

   Before you do this, make sure the license allows you to: if it
   doesn't allow redistribution, you can't host the download somewhere
   else... and neither can we, so the build should be added to the
   sbosrcarch blacklist (let us know if this is the case).

   4. The file on the server is 'protected', because the server checks
      the HTTP Referer and/or User-agent fields in the request. Typically
      this means the download will work when using a browser, but will
      fail when using wget or curl. Usually when this happens, one of
      the sbosrcarch operators will manually download the file and add
      it to the archive within a day or two. If not, let us know and
      we'll get to it ASAP. Again, check the license of the download
      file: if redistribution is not allowed, it should be added to the
      blacklist and not kept in the archive.

Q: How do I create my own archive?

A: Two choices:

   - Mirror the directory the usual way, with rsync. Using wget
     would be possible, but it would use about twice the bandwidth and
     storage. This is because rsync can preserve hard links (with its
     -H option), which sbosrcarch makes extensive use of.

   - Get a copy of the sbosrcarch script and run it on your web server.
     This will be more work on your part, but your archive will be
     independent: it'll keep updating itself even if the original archive
     at slackware.uk goes away someday.

     The script lives here:

     git clone git://urchlay.naptime.net/sbostuff.git

     It's written in perl, and has extensive documentation. Run it as
     "sbosrcarch --help" to see the docs.

     If you're thinking about running a sbosrcarch instance, please
     contact me (yalhcru@gmail.com). I've got a list (with only one
     entry in it) and I'd like it to include all the archives eventually.
     Also I'm pretty good at troubleshooting, if you're having problems
     with the script.

Q: How much disk space will I need for my archive mirror/instance?

A: Currently (2018-06-26), the archive is 93GB. The by-name and by-md5 trees
   each appear to be 93GB when measured on their own, but that's because
   the two trees share the same files via hard links. The total is still
   93GB.
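
   To see why hard links make the two trees look twice as big as they
   are, here is a rough Python sketch that adds up file sizes while
   counting each underlying file (inode) only once. The tree_size
   function is invented for this example, and it sums apparent file
   sizes rather than allocated disk blocks, so treat the numbers as
   approximate.

```python
import os


def tree_size(root, seen=None):
    """Sum file sizes under root, counting each hard-linked file only once."""
    if seen is None:
        seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            # Hard links to the same file share a (device, inode) pair.
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_size
    return total
```

   Calling tree_size() separately on by-name and on by-md5 reports the
   full size for each. Calling it once on the directory that contains
   both (or passing the same "seen" set to both calls) reports the real
   combined size, because the second tree contributes only files not
   already counted.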

   If you're using the sbosrcarch script to create your archive, you can
   run a smaller (incomplete) archive. The config file (sbosrcarch.conf)
   has a "maxfilemegs" setting. Any file larger than this won't be
   downloaded and archived. You can also blacklist builds (or whole
   categories) to save space.