Hi all. I figured I'd post this to the larger thread so others can see it as well...
We can definitely use the Atari community's help to improve the preservation project, and you don't need to be highly technical to contribute. Even just knowing how to use a spreadsheet and being willing to scour the existing public website, magazine PDFs, etc. would be extremely valuable. See the items highlighted in red bold-italics below for specific areas where we could use some help, and please PM me if you'd like to get involved.
The longer version
We have a very small group of people assisting with the core preservation project work and the bulk of what we do is the actual dumping of media, scanning, performing time-consuming analysis, scouring old publications and doing other research to identify release dates, similar titles from a time period, similar protection types, etc. This information not only helps in cataloging everything but can also be valuable when we encounter rare disks or tapes with data corruption and need to manually attempt to fix them. For example, a non-trivial number of ATX and CAS files do not come from a single raw dump but are actually manual reconstructions from several raw dumps that all have data corruption in different parts of the disk or tape. We've created and maintain various tools to help us in these endeavors.
I wrote the a8preservation.com site itself completely from scratch, and I have been slowly enhancing it to expose the results of our research and to experiment with new capabilities that don't exist (at least to my knowledge) in the Atari community today. For example...
1. Rather than simply assigning a single category to a title, the site uses a tagging system which allows multiple tag associations per title. The site's search feature can filter based on the presence of one or more tags and even the absence of tags. This is how the Browse Software page is driven. For example, clicking the Text Adventures - Fantasy category is really a search for all titles that have the "Game", "Adventure", "Text" and "Fantasy" tags and do NOT have the "Graphics" tag. These tags are also used to generate the "You may also be interested in..." section at the bottom of each title page. There are even tags for all the different vendor disk protection variants we have identified so far.
We can use help identifying and fixing gaps in the tagging of our titles and releases.
2. Our Publications section has entries for most major Atari magazines. If you click into a particular issue, you will get a list of articles. Some articles also have the article text itself available.
Software review articles are of particular significance because we can discover information such as details of a particular release, how many disks were included, etc. The database currently has the ability to establish a relationship between a title and its software reviews. If this relationship exists, and we have the text of the article in the database, the site performs AI-based sentiment analysis of the review texts and generates an aggregate "critics' score" for the title. An example of this can be seen on the page for Jumpman.
We also have the ability to establish a relationship between titles and magazine ads (as also seen on the Jumpman page). This is another data point we use to identify release dates for a title if we don't have other indicators or if the other indicators are misleading (e.g. copyright dates that come from the release of the title on another platform rather than the actual Atari release date).
We can use help identifying missing article entries from each magazine issue, identifying missing ads, creating formatted article text from magazine PDFs, etc.
3. We still have gaps in our release metadata such as number of players, memory requirements, supported controllers, BASIC requirements, etc. These are not only displayed on the website but also drive the generation of the filenames that we distribute.
We can use help identifying and resolving gaps in our release metadata.
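To make the tag search from point 1 a bit more concrete, here is a rough Python sketch of filtering on both the presence and the absence of tags. This is only an illustration, not the site's actual code: the sample titles, the `matches_tags` and `search` names, and the in-memory catalog are all made up for the example.

```python
from typing import Iterable


def matches_tags(title_tags: set[str],
                 required: Iterable[str],
                 excluded: Iterable[str] = ()) -> bool:
    """True if the title carries every required tag and none of the excluded ones."""
    return set(required) <= title_tags and title_tags.isdisjoint(excluded)


def search(catalog: dict[str, set[str]],
           required: Iterable[str],
           excluded: Iterable[str] = ()) -> list[str]:
    """Return the names of all matching titles, sorted alphabetically."""
    required, excluded = list(required), list(excluded)
    return sorted(name for name, tags in catalog.items()
                  if matches_tags(tags, required, excluded))


# Hypothetical sample data, invented purely for this example.
catalog = {
    "Example Text Quest": {"Game", "Adventure", "Text", "Fantasy"},
    "Example Illustrated Quest": {"Game", "Adventure", "Text", "Fantasy", "Graphics"},
    "Example Space Shooter": {"Game", "Arcade", "Sci-Fi"},
}

# Equivalent of the "Text Adventures - Fantasy" category described above.
results = search(catalog,
                 required=["Game", "Adventure", "Text", "Fantasy"],
                 excluded=["Graphics"])
print(results)  # ['Example Text Quest']
```

The illustrated title is filtered out only because it carries the "Graphics" tag, which is why gaps in the tagging directly affect what shows up in each category.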
Why is the above work useful to the community?
Well, there are two main reasons I have right now...
1. I've been slowly working on a public REST API for the site. It's still being actively refined but is already driving portions of the website today. This API could be used by the community in various ways. An obvious example would be an emulator getting the data it needs to configure itself accurately from the file CRC/MD5 when a disk, cassette or cartridge is loaded.
2. I am not currently aware of a data-driven estimation of title rarity. Others have made a great effort to estimate this manually based on their impressive knowledge of the Atari scene, but I'd like to try something more deterministic. We've been accumulating dumps for 7-8 years now. I think we have a number of data points that could go into a rarity calculation, including how many dumps of that title we've seen, whether the media was mass produced, how many magazine reviews exist, how many publication ads exist, etc.
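For the emulator example in point 1, the client-side half is simple: compute the CRC and MD5 of the loaded file and use them as lookup keys. Below is a minimal Python sketch using only the standard library. It assumes the hashes are taken over the whole file as-is, which may not match how the site actually computes them, and the function names are made up for the example. The API call itself is left out because the endpoints are still being refined.

```python
import hashlib
import zlib


def media_fingerprint(data: bytes) -> dict[str, str]:
    """Return the CRC-32 and MD5 of a media image as lowercase hex strings."""
    crc = zlib.crc32(data) & 0xFFFFFFFF  # mask to an unsigned 32-bit value
    return {
        "crc32": f"{crc:08x}",
        "md5": hashlib.md5(data).hexdigest(),
    }


def fingerprint_file(path: str) -> dict[str, str]:
    """Fingerprint a disk, cassette or cartridge image stored on disk."""
    with open(path, "rb") as handle:
        return media_fingerprint(handle.read())


# Sanity check against the standard CRC-32 test input "123456789".
print(media_fingerprint(b"123456789")["crc32"])  # cbf43926
```

An emulator could call `fingerprint_file` when media is loaded and then ask the API for the matching release metadata (memory requirements, controllers, and so on) to configure itself.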
Sorry for the long post but it's been a while since I've shared what we've been doing in the background besides releasing a new collection of media dumps every few months. If you have read up to this point, thanks for your attention and please consider donating some time 😉