Q: What is sbosrcarch?

A: sbosrcarch is "The SlackBuilds.Org Source Archive". It contains copies of the source files listed in the .info files for all (or almost all) the builds on SlackBuilds.org.

sbosrcarch is also the name of the software that created and maintains the archive (more about this later, near the end of this FAQ).

Q: What is sbosrcarch for?

A: It's intended to be a backup location for source files that can't be downloaded. This happens mainly for these reasons:

- The upstream web site goes down, is moved, or has connectivity issues (intermittent or long-term).

- Upstream moves or removes the source, when they release a new version.

Also, the archive is hosted on a fast, well-connected host. Sometimes you might choose to use the archive just for faster downloads.

A side benefit of the archiving process is that the archive maintenance software produces a log of failed downloads, which can then be sent to the slackbuilds-users mailing list and/or build maintainer so it can be fixed quickly.

Q: Who is responsible for sbosrcarch?

A: The archive server is operated by Darren Austin, aka "Tadgy" on Freenode IRC. The archive script was written by B. Watson, aka "Urchlay" on Freenode. Both of us keep an eye on the logs and keep the archive healthy.

The best way to contact us is using an IRC client to connect to Freenode and join the ##slackware or #slackbuilds channel. We can also be reached by email:

  B. Watson
  Darren Austin

Please read this entire FAQ before asking us questions. Chances are, you'll find the answer here. If not, or if the answer isn't clear enough, we'll be happy to help.

Note that the SlackBuilds.org team is NOT responsible for the archive. PLEASE don't bother them with questions about sbosrcarch, they're already busy enough maintaining the actual SlackBuilds site! Same goes for individual build maintainers.

Q: Why create a giant archive like this? Isn't it better to fix the SlackBuilds whose sources can't be downloaded?

A: Sort-of.
Yes, if a SlackBuild references a no-longer-existing source download URL, it should be updated. Usually the SlackBuild maintainer is responsible for this. Sometimes the SBo admins take care of it instead.

Sometimes, it takes longer than expected to update a SlackBuild: the new version uses a different build system, or requires some dependency to be updated first, or the maintainer is too busy with Real Life and can't spare the time just at the moment.

Once the build is updated, it still doesn't appear instantly on the site. It has to sit in the "pending" queue until it's been reviewed by the admins, and then in the "ready" queue until the next public update.

The SBo update process is complex, and requires coordination between the various admins. Generally this means that site updates ("Public www update" in the git log) only happen once a week.

During the time it takes for the SlackBuild to get updated for the new download URL (and possibly new version), users won't be able to download the source as listed on the SBo site.

That's what the archive is mainly intended for. It's a fallback, a stop-gap solution, that allows builds to keep working during the period between the source disappearing and the build being updated. Usually this is only a week or less, but sometimes things slip through the cracks...

Q: How do I use the archive?

A: Several answers here:

- Using a tool that supports the archive, such as sbopkg or sbotools.

  This is by far the easiest way: they automatically use the archive if they need to, without you having to do any extra work.

- Manually with a web browser.

  The easy way is to start at:

    http://slackware.uk/sbosrcarch/by-name/

  ...which shows a list of category directories (academic, accessibility, audio, etc). Choose a category, then within the category you'll see a list of build name directories. Each of these will contain the source file(s) for the build.
  Example: you can't download the source to system/atari800 from its original URL, so you go to the by-name page, click on "system", then "atari800". There you'll see the file you wanted, atari800-3.1.0.tar.gz (unless it's been updated since I wrote this).

- With a download tool like wget or curl.

  You could do this using the same by-name tree as you would for manual lookups, but it's better to do this by md5sum. The base URL for this is:

    http://slackware.uk/sbosrcarch/by-md5/

  In the build's .info file, take the 'filename' part of each download URL. Example: "atari800-3.1.0.tar.gz", where the link is

    http://downloads.sourceforge.net/project/atari800/atari800/3.1.0/atari800-3.1.0.tar.gz

  Now take the MD5SUM (or MD5SUM_x86_64 if you're using DOWNLOAD_x86_64), and use the first two characters as subdirectory names, followed by the full md5sum.

  Example: we have MD5SUM="354f8756a7f33cf5b7a56377d1759e41" in the .info file. The directory for this would be:

    3/5/354f8756a7f33cf5b7a56377d1759e41

  Add this to the base URL and get:

    http://slackware.uk/sbosrcarch/by-md5/3/5/354f8756a7f33cf5b7a56377d1759e41/

  Now add the filename part from DOWNLOAD or DOWNLOAD_x86_64, and you get:

    http://slackware.uk/sbosrcarch/by-md5/3/5/354f8756a7f33cf5b7a56377d1759e41/atari800-3.1.0.tar.gz

  This is the exact URL for the file, if it's actually present in the archive. Most likely, it will be, and your download will succeed. If the download fails, the file's not in the archive.

  Of course, all these steps should be automated. You'll end up writing a script in your favorite language to do the job. Or:

- Using the sbosrc script

  Same as above, except someone's already written it for you. Download it here:

    http://urchlay.naptime.net/repos/sbostuff/plain/sbosrc

  ...or, it'd be better to use git:

    git clone git://urchlay.naptime.net/sbostuff.git

  Make it executable (chmod +x) and place it somewhere on your $PATH, such as /usr/local/bin.
  Whenever you need to download something from the archive, change to the directory containing the .info file (same place as the .SlackBuild) and just run:

    sbosrc

  ...which will check the current architecture (32-bit or 64-bit), parse the info file, calculate the URL as above, and download the file to the current directory.

Q: I need a specific older version of a source file, not the latest version that's packaged on SBo. Will the archive have it?

A: Probably not. Old versions don't disappear immediately when new ones are archived, but they do get purged monthly... or, almost: old files are deleted on the 30th of every month, and February is only 28 or 29 days long!

Use the by-md5 tree if you're looking for an old version, since some builds use unversioned filenames (new one will overwrite the old, in the by-name tree).

If you know the exact filename and/or md5sum, you can always try a google search for them. Use "quotes" around the filename.

Q: How do I know it's safe to use files downloaded from the archive?

A: The same way you know it's safe to use any file you downloaded for use with a SlackBuild: check the downloaded file's md5sum against the MD5SUM line in the build's .info file.

Q: How do I use the archive with automated tools such as sbopkg and sbotools?

A: For sbopkg and sbotools, you just run them normally. They'll automatically search the archive, if a source download fails.

Q: How complete is the archive?

A: Currently (2018-06-26), the by-md5 tree is 100% complete. This does NOT count blacklisted sources (see next question).

For a more up-to-date answer, see the archive status page:

  http://slackware.uk/sbosrcarch/STATUS

This gets updated nightly.

Q: Why are some sources missing from the archive?

A: Multiple answers:

- The archiver couldn't download the file. Maybe the site was down when it tried, or the upstream developers removed the file.
  Generally this will require the build's maintainer to fix the .info file or update the SlackBuild to a newer version (that actually exists). In some cases, the archive operator will find the file and manually add it to the archive.

- The archiver downloaded the file, but the download's md5sum doesn't match. The build maintainer will have to fix the .info file. We won't archive any files we can't verify by md5sum.

- There is some software that can't be automatically downloaded (requires account creation on the upstream site) or whose license doesn't allow us to redistribute it.

  The classic example of both is development/jdk: Oracle's license requires that users download the file directly from their site and doesn't allow us (or anyone else) to offer it for download. Also, downloading from Oracle requires creating an Oracle account, so the archiver couldn't auto-download it even if it were allowed.

  Sources we can't download are blacklisted by the archiver, and don't count towards the completion percentage on the status page. The current blacklist is:

    academic/novocraft
    academic/wehi-weasel
    development/amd-app-sdk
    development/decklink-sdk
    development/jdk
    development/J-Link
    development/sqlcl
    development/sqldeveloper
    office/treesheets
    system/displaylink
    system/elo-mt-usb
    system/oracle-instantclient-devel
    system/oracle-xe
    system/oracle-instantclient-basic

If you find a file in the archive that shouldn't be there due to its license not allowing redistribution, PLEASE let us know so we can remove and blacklist it. It is not our intention to violate anyone's license.

Q: Why do some of the by-name directories have filenames ending in ".x86_64"?

A: This is due to a design flaw in the archive structure. We assumed that download filenames would either be unique within an .info file, or else that 2 files with the same filename were in fact the same file. For 4 of the SlackBuilds, this turns out to be a bad assumption.
Example: development/p4's .info file has this:

  DOWNLOAD="https://www.perforce.com/downloads/perforce/r18.1/bin.linux26x86/p4"
  DOWNLOAD_x86_64="https://www.perforce.com/downloads/perforce/r18.1/bin.linux26x86_64/p4"

Notice that both URLs end in "/p4". The directory parts of the URL are different, but the filenames are the same. In the archive, the 32-bit download will be called "p4" and the 64-bit one will be "p4.x86_64".

The archive script successfully downloads these files and stores them in the by-md5 tree in the correct directories. But when it tries to store them in the by-name tree, it's trying to save two files in the same directory with the same name. If it didn't use a different name, the second one would overwrite the first.

The current list of builds affected by this is:

  academic/ucsc-blat
  development/p4
  development/p4d
  libraries/p4api

Q: I'm a SlackBuild maintainer, and the download URL for one of my builds has disappeared. Can I use the archive URL as the DOWNLOAD in my .info file?

A: Yes, but only as a temporary measure or a last resort. It's better to do one of these:

- Find another copy of the source. Try a google search for the exact filename (in "quotes"), or the md5sum.

- Host the source yourself, if you have access to a web or ftp server.

- Ask on the slackbuilds-users mailing list. Someone will probably volunteer to host the source for you, provided you have a copy of it to send them (and if you don't, hey, there's this handy source archive you can probably get it from...)

Using the archive as the DOWNLOAD results in less redundancy. Nobody is currently mirroring the archive that we know of. Ideally, we want every source file to have two working URLs: the original plus the sbosrcarch one.

Q: I'm a SlackBuild maintainer, and one of my builds keeps showing up on the sbosrcarch STATUS as missing. How can I prevent this?

A: This usually happens for one of these reasons:

1. You made a mistake in your submission.
   Double-check the DOWNLOAD URL(s) and MD5SUM(s) in the .info file. If they're wrong, resubmit your build.

2. The filename in the download URL is "unversioned", meaning the version number isn't part of the filename (e.g. "thingy-latest.tar.gz"). At some point after you last updated your .info file, but before the SBo public update, the file changed on the server.

   Actually, this occasionally happens even for files that have the version number in the filename: upstream makes a mistake (leaves a file out of the tarball, for instance) and a day or so later, they fix it without changing the version number.

   When the archiver downloads the file, it checks the md5sum against your .info file and sees a mismatch, so it won't archive the file.

3. Upstream made a new release after you updated your build, but before the SBo public update, and they removed the old version from their server (or, possibly, moved it to a different location like /archives/ or /old-versions/). When the archiver tries to download the file, it gets a '404 Not Found' error.

For (2) and (3), the problem is really the same: the web is a moving target. Your download URLs and their md5sums were valid, but they got changed on the server sometime after you submitted your build.

The solution is the same for both: find somewhere else to host your source downloads. Either use your own web or ftp server if you have one, or ask on the mailing list and someone will probably volunteer to host it for you. Once you have the file(s) hosted somewhere, update your .info file to point to the new location.

Before you do this, make sure the license allows you to: if it doesn't allow redistribution, you can't host the download somewhere else... and neither can we, so the build should be added to the sbosrcarch blacklist (let us know if this is the case).

4. The file on the server is 'protected', because the server checks the HTTP Referer and/or User-agent fields in the request.
   Typically this means the download will work when using a browser, but will fail when using wget or curl.

   Usually when this happens, one of the sbosrcarch operators will manually download the file and add it to the archive within a day or two. If not, let us know and we'll get to it ASAP.

   Again, check the license of the download file: if redistribution is not allowed, it should be added to the blacklist and not kept in the archive.

Q: How do I create my own archive?

A: Two choices:

- Mirror the directory the usual way, with rsync. Using wget would be possible, but it would use about twice the bandwidth and storage. This is because rsync supports hard links, which sbosrcarch makes extensive use of.

- Get a copy of the sbosrcarch script and run it on your web server. This will be more work on your part, but your archive will be independent: it'll keep updating itself even if the original archive at slackware.uk goes away someday.

  The script lives here:

    git clone git://urchlay.naptime.net/sbostuff.git

  It's written in perl, and has extensive documentation. Run it as "sbosrcarch --help" to see the docs.

If you're thinking about running a sbosrcarch instance, please contact me (yalhcru@gmail.com). I've got a list (with only one entry in it) and I'd like it to include all the archives eventually. Also I'm pretty good at troubleshooting, if you're having problems with the script.

Q: How much disk space will I need for my archive mirror/instance?

A: Currently (2018-06-26), the archive is 93GB. The by-name and by-md5 trees also seem to be 93GB apiece, but that's because hardlinks are used between the two trees.

If you're using the sbosrcarch script to create your archive, you can run a smaller (incomplete) archive. The config file (sbosrcarch.conf) has a "maxfilemegs" setting. Any file larger than this won't be downloaded and archived. You can also blacklist builds (or whole categories) to save space.
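
Appendix: a sketch of the by-md5 lookup

The by-md5 steps described under "How do I use the archive?" (take the filename part of the download URL, use the first two characters of the md5sum as subdirectories, then the full md5sum, then the filename) can be sketched in Python roughly as below. This is only an illustration, not the sbosrc script and not how sbopkg or sbotools do it. The function names archive_url and md5_matches are made up for this example; the base URL and the atari800 values are the ones quoted earlier in this FAQ.

```python
import hashlib
import posixpath
from urllib.parse import urlsplit

# Base URL of the by-md5 tree, as given in this FAQ.
BY_MD5_BASE = "http://slackware.uk/sbosrcarch/by-md5/"


def archive_url(download_url: str, md5sum: str) -> str:
    """Build the by-md5 archive URL for one DOWNLOAD/MD5SUM pair.

    Hypothetical helper: it only computes the URL, it does not check
    whether the file is actually present in the archive.
    """
    md5sum = md5sum.strip().lower()
    if len(md5sum) != 32 or any(c not in "0123456789abcdef" for c in md5sum):
        raise ValueError(f"not an md5sum: {md5sum!r}")

    # The 'filename' part is the last path component of the upstream URL.
    filename = posixpath.basename(urlsplit(download_url).path)
    if not filename:
        raise ValueError(f"no filename in URL: {download_url!r}")

    # First two characters become subdirectories, then the full md5sum.
    return f"{BY_MD5_BASE}{md5sum[0]}/{md5sum[1]}/{md5sum}/{filename}"


def md5_matches(data: bytes, expected: str) -> bool:
    """Compare downloaded bytes against the MD5SUM from the .info file."""
    return hashlib.md5(data).hexdigest() == expected.strip().lower()


if __name__ == "__main__":
    print(archive_url(
        "http://downloads.sourceforge.net/project/atari800/atari800/3.1.0/atari800-3.1.0.tar.gz",
        "354f8756a7f33cf5b7a56377d1759e41",
    ))
```

Whatever fetches the resulting URL (wget, curl, or your own code), the downloaded file should still be checked with something like md5_matches before it's used, same as for any other SlackBuild source.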