
ATX support for long sectors WIP


ijor


I'm implementing a few enhancements to the ATX file format. One of them is potentially a bit delicate in terms of backwards compatibility. Especially now that there are several implementations in embedded SIO2XX hardware where changes are not as easy to implement. So I would like some feedback from the developers before establishing the final format.

 

One of the copy protections used on the Atari is so-called big or long sectors. Sectors on a single density track normally have a size of 128 bytes. Long sectors are formatted with a physical size of 256 bytes, or sometimes more. The drive doesn't expect this and produces an error (the exact behavior depends on the firmware), and the protection check can verify this. ATX images already specify this type of error, but they don't include the full data of the sector beyond the first 128 bytes. In most cases the full content of a long sector is not important because the computer normally can only access the first 128 bytes. But there are a couple of cases where the full content is relevant, hence the idea of extending the format to support including the full sector data.

 

There are a couple of ways to implement this that I am considering. The simplest would be to include the full content at the normal sector data location in the track record. ATX images already support recording the exact physical size of long sectors, and that can be easily extended to signal that the full content is stored. The problem with this method is that, conceivably, it might break backwards compatibility with software that expects the data for every sector to always be 128 bytes. I'm not concerned about older software not being able to access the full sector data. That of course can't be helped. My concern is older software being confused by the larger track data record and not being able to process the track correctly at all.

 

The other way is to implement a new out of band chunk and store the full sector data (or the rest of the sector data) in that new chunk. This probably would have a lower risk of breaking backwards compatibility. But out of band processing is always a bit of a PITA, and this is definitely not the most efficient method.

 

The other enhancements would be to optionally store the recorded bitrate and support for higher densities. These are simpler to implement, at least from the compatibility point of view.

 

Edited by ijor

Given the difficulty of obtaining some of the hardware that may be affected, it may be best just to post a test disk with boot sectors stored as 256 bytes and have people check whether their hardware can boot it. The simplest method would be to store the extra sector data in-band, but there may be implementations that don't handle this, particularly ones written before the chunk nature of the ATX format was revealed. The case that's most likely to break is such a sector stored at the end of the sector data chunk with weak sector info after it.

 

Should a problem be discovered, another option would be to just take the incompatibility and include a small Python script to strip the info for older parsers.

 


Are there many disks that use this type of protection?  If the number is small, then the potential impact on older devices is also minimal, and I'd vote for the more straightforward (inline) implementation for the sake of simplicity.  As phaeron points out, there are always mitigation mechanisms in those cases anyway.  (I'm wondering: since sector data is indicated as an offset to the data chunk, it's hard to imagine an implementation that wouldn't simply ignore/skip the additional data.)

 

Quote

ATX images already support recording the exact physical size of long sectors,

You mean by the Extended Sector chunk, right?  Is it safe to assume that the actual size of the physical sector will be one of the values currently defined (128, 256, 512, 1024), or might they be an arbitrary size?

 

Edited by jamm

21 minutes ago, jamm said:

Are there many disks that use this type of protection?  If the number is small, then the potential impact on older devices is also minimal, and I'd vote for the more straightforward (inline) implementation for the sake of simplicity.  As phaeron points out, there are always mitigation mechanisms in those cases anyway.

All double density disks have an extra 128 bytes in each boot sector that are usually ignored but occasionally contain valuable data. In particular, ATR8000 CP/M disks have the same physical geometry as a standard 180K double density disk but store a full 256 bytes of valid data in these sectors. From the standpoint of a dumping tool, it would be important to be able to dump the full sectors without having to guess as to whether they are relevant.

 

21 minutes ago, jamm said:

(I'm wondering: since sector data is indicated as an offset to the data chunk, it's hard to imagine an implementation that wouldn't simply ignore/skip the additional data.)

Before the VAPI library was open sourced, we had to reverse engineer the ATX file format and got some parts of it wrong. One of them was the overall structure of a track, which we had assumed was a predefined order, but was actually a chunk based format. As a result, some older parsers may not have the correct algorithm to determine the location of weak sector info after the sector data chunk. For instance, versions of Altirra prior to 2.60 don't use the chunk size; they estimate the location of the weak sector data as 128 bytes past the highest sector data offset. I'm not terribly concerned about five year old versions of my emulator, but there may be more recent programs or devices that still have issues.
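
To make the failure mode concrete, here is a minimal sketch comparing the two ways of locating whatever follows the sector data. It uses a deliberately simplified, hypothetical layout (plain offsets, no chunk headers), not the real ATX structures; the function names and numbers are illustrative assumptions only.

```python
def legacy_next_offset(sector_data_offsets, assumed_size=128):
    """Heuristic described above: assume every sector occupies 128 bytes."""
    return max(sector_data_offsets) + assumed_size


def chunk_based_next_offset(data_start, data_size):
    """Chunk-aware approach: the next item begins where the recorded size says."""
    return data_start + data_size


# Hypothetical track: sector data begins at offset 200 and holds three
# sectors, the last of which is stored in-band as a 256-byte long sector.
data_start = 200
offsets = [200, 328, 456]
data_size = 128 + 128 + 256

legacy = legacy_next_offset(offsets)
correct = chunk_based_next_offset(data_start, data_size)
print(legacy, correct, correct - legacy)
```

Under these assumptions the heuristic lands 128 bytes short, inside the tail of the long sector, which is exactly the "long sector last, weak sector info after it" case.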

 

21 minutes ago, jamm said:

You mean by the Extended Sector chunk, right?  Is it safe to assume that the actual size of the physical sector will be one of the values currently defined (128, 256, 512, 1024), or might they be an arbitrary size?

The address field has only two bits to encode the sector size, so those four sizes are the only ones possible. Even if the sector is encoded as some other size, such as with a missing/truncated/overlapped data field, that's still the sector's size from the FDC's point of view.
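
As a small illustration of why a two-bit field gives exactly the sizes listed, the usual convention is that the size code N selects 128 shifted left by N. The function name below is made up for this sketch.

```python
def sector_size_from_code(size_code: int) -> int:
    """Map the two-bit size code from the address field to a byte count."""
    return 128 << (size_code & 0x03)


sizes = [sector_size_from_code(code) for code in range(4)]
print(sizes)
```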

 


2 hours ago, phaeron said:

All double density disks have an extra 128 bytes in each boot sector that are usually ignored but occasionally contain valuable data.

All except those formatted by the 1050 Duplicator. For double density, all known 1050 Duplicator firmware actually formatted sectors 1-3 as 128 byte physical (MFM) sectors, which created wonderful density detection issues in other drives. :) (i.e. the US Doubler would have a 3/18ths chance of thinking the disk was enhanced density, depending on which sector on track 0 it happened to see first...)

 

Not a lovely trait to preserve, but I guess it's a unique fingerprint to ID what formatted those disks.


I'm implementing this shortly. In the meantime, I would like to compile a list of all the ATX image creators.

 

So everybody that implements a tool that creates ATX images, please record your creator signature here. Remember that this is a 16-bit field in the ATX file header. I'll include the list when publishing the new track definitions for the new enhancements. Thanks.

 


I leave this field empty ($0000) in RespeQt. I guess this is not good practice?

Maybe you can tell me which free creator signature we should use for RespeQt once you have all the signatures from the other tools.

 

EDIT: also, if you open an ATX file and write part of it inside RespeQt, I keep the original header.

So actually, it keeps the original creator signature. This is probably something to change also.

Edited by ebiguy

On 9/25/2020 at 8:45 AM, ebiguy said:

Maybe you can tell me which free creator signature we should use for RespeQt once you have all the signatures from the other tools.

 

The field is 16 bits so as you could imagine most values are still free. You might want to choose two letters that resemble the RespeQt name somehow?


On 9/25/2020 at 4:45 AM, ebiguy said:

EDIT: also, if you open an ATX file and write part of it inside RespeQt, I keep the original header.

So actually, it keeps the original creator signature. This is probably something to change also.

 

10 hours ago, ijor said:

 

The field is 16 bits so as you could imagine most values are still free. You might want to choose two letters that resemble the RespeQt name somehow?

 

Would it be useful to reserve a few of the high bits for other information (assuming that all existing signatures have them blank)?

For example, the top 2 bits could be a counter, where 00 = original signature (backwards compatible with all existing sigs if they have them blank), 01 = ATX updated once from original ATX, sig from 2nd creator, 10 = ATX updated twice, sig from 3rd creator, 11 = ATX updated 3 or more times, sig from most recent creator.

 

This doesn't preserve the history of who the earlier creators were, but it does provide the information that it was not an original ATX off of a floppy, but that it is an ATX generated from a previous ATX.  

 

Just a thought.


Too late to edit my post above, but to simplify the description of my proposed modification to the spec:

 

bits 15:14 = number of times ATX has been modified, pegs at 3

bits 13:12 = reserved for future use

bits 11:0 = signature of creator

 

This allows for 4K unique signatures, is backwards compatible with existing ATX files (assuming they all have sigs with 0s in the high nibble), reserves a couple of bits for future use, and allows one to tell if the ATX is an original "rip", or if it was modified from a previous ATX.
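
A rough sketch of how a tool might handle the proposed layout (bits 15:14 counter, 13:12 reserved, 11:0 signature). This only illustrates the proposal in this post, not anything in the current ATX format, and the helper names are invented for the example.

```python
COUNT_SHIFT = 14       # bits 15:14 hold the modification counter
COUNT_MAX = 3          # counter pegs at 3
RESERVED_MASK = 0x3000 # bits 13:12, preserved untouched
SIG_MASK = 0x0FFF      # bits 11:0 hold the creator signature


def unpack_creator(field: int):
    """Return (modification count, signature) from the 16-bit field."""
    return (field >> COUNT_SHIFT) & 0x3, field & SIG_MASK


def mark_modified(field: int, new_signature: int) -> int:
    """Bump the saturating counter and record the modifying tool's signature."""
    count, _ = unpack_creator(field)
    count = min(count + 1, COUNT_MAX)
    return (count << COUNT_SHIFT) | (field & RESERVED_MASK) | (new_signature & SIG_MASK)


original = 0x0002                    # an existing signature with a blank high nibble
once = mark_modified(original, 0x123)
many = once
for _ in range(5):                   # further edits leave the counter pegged
    many = mark_modified(many, 0x123)
print(hex(once), hex(many))
```

An untouched file such as `0x0002` still reads as count 0 with its original signature, which is the backwards compatibility the proposal relies on.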

 

If this idea is useful, maybe use 3 bits for counter instead of 2?  That would give 0-6 & 7+ instead of 0-2 & 3+.

 

I doubt that there would ever be 4K creators - 11 bits for 2K is probably sufficient.  Heck, I doubt that there would even be 1K creators!

 


I like the idea of knowing if the file has been modified, but I don't know that knowing it's been modified X times is useful.

 

I think it'd be of more practical use to have an MD5 hash stored when the original ATX is created and then have a second MD5 hash stored when/if the ATX is updated along with an ATX creator ID for that second value.  That would give us verification that the file has not been corrupted and that it's been modified.

Of course, that would require either an extra header value, or use of some of the reserved bits, or a separate ATX record.

 

 


4 minutes ago, jamm said:

I like the idea of knowing if the file has been modified, but I don't know that knowing it's been modified X times is useful.

 

I was thinking that a small counter tells you whether the ATX is:

 

An original ATX ripped from a floppy.

An ATX that was modified from an original ATX.

An ATX that was modified from a non-original ATX.

 


18 minutes ago, jamm said:

I like the idea of knowing if the file has been modified, but I don't know that knowing it's been modified X times is useful.

 

I think it'd be of more practical use to have an MD5 hash stored when the original ATX is created and then have a second MD5 hash stored when/if the ATX is updated along with an ATX creator ID for that second value.  That would give us verification that the file has not been corrupted and that it's been modified.

Of course, that would require either an extra header value, or use of some of the reserved bits, or a separate ATX record.

 

 

That's a good idea from a preservation standpoint, and I think it's something that all of the archival formats should adopt.

 

I would even take it one step further and keep a delta of changes, adding to the file as needed. Although arguably this would be useful to very few people (e.g. those who want to use copy protected disks and keep their high scores, or somebody who accidentally uses their original Ultima I character disk to store a character). :)

 

-Thom


Altirra -3.00: creator $0002 version 0

Altirra 3.00+: creator $5441 ("AT") version 0

a8rawconv -0.93: creator $0002 version 0

a8rawconv 0.94+: creator $5241 ("AR") version 0
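
For anyone checking their own value: the mnemonic codes above read as two ASCII letters when the 16-bit field is taken as little-endian bytes, which is an assumption here but is consistent with $5441 showing up as "AT". A quick sketch, with a made-up helper name:

```python
import struct


def creator_mnemonic(creator: int) -> str:
    """Show a creator code as two letters if printable, else as a hex value."""
    raw = struct.pack("<H", creator)  # low byte first
    if all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return f"${creator:04X}"


for code in (0x0002, 0x5441, 0x5241):
    print(f"${code:04X} -> {creator_mnemonic(code)}")
```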

 

I don't think keeping non-mnemonic numeric codes for creators is a good idea as there's not good precedent for keeping such a global registry. The best example we have is cartridge type IDs and it still isn't managed in a way that anyone could easily allocate a new ID and disseminate that information to everyone who needs to implement it. Also, such a system makes it impossible for tools to identify creators added after the tool was made. Best action IMO would be to deprecate the 16-bit creator fields and just put it in a string metadata chunk at the end of the file.

 

MD5s cannot be incrementally updated by design and thus using them would necessarily require that metadata to be outside of the region covered by the MD5s. You can do the exclusion trick for one MD5 but the moment you have two MD5s covering each other you are screwed. IMO integrity checksums don't belong in the file -- we have this for the .CAR format and it's only been a headache while not catching anything, when it's even verified at all. Traditional archive formats are better for both integrity checking and compression, and if you really want to be able to still use the files directly it's better to support transparent gzipping on the files.
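
For reference, the "exclusion trick" for a single embedded hash can be sketched like this: the digest field is treated as zeros while hashing, so storing the digest doesn't change what it covers. The field offset and file layout are hypothetical and not part of ATX.

```python
import hashlib

DIGEST_LEN = 16  # size of an MD5 digest in bytes


def digest_excluding_field(image: bytes, field_offset: int) -> bytes:
    """Hash the image with the digest field itself treated as zeros."""
    zeroed = image[:field_offset] + bytes(DIGEST_LEN) + image[field_offset + DIGEST_LEN:]
    return hashlib.md5(zeroed).digest()


def embed_digest(image: bytes, field_offset: int) -> bytes:
    """Return a copy of the image with the digest written into its field."""
    digest = digest_excluding_field(image, field_offset)
    return image[:field_offset] + digest + image[field_offset + DIGEST_LEN:]


def verify_digest(image: bytes, field_offset: int) -> bool:
    """Check the stored digest against a fresh one computed the same way."""
    stored = image[field_offset:field_offset + DIGEST_LEN]
    return stored == digest_excluding_field(image, field_offset)


image = bytes(range(64))  # stand-in for a disk image
signed = embed_digest(image, field_offset=8)
tampered = signed[:40] + b"\xff" + signed[41:]
print(verify_digest(signed, 8), verify_digest(tampered, 8))
```

With a second hash field inside the region covered by the first, each digest would depend on the other's stored value, which is the circularity described above.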

 

A general problem with metadata is that it's hard for automated processes to figure out how to deal with it when resaving the disk. If an emulator or disk editing tool needs to update or rewrite a file, it's not convenient to pop up a big dialog asking how to handle the metadata. Best and simplest way I know of to handle this is what PNG does and just have one bit for each metadata item indicating whether to keep or drop that item on save.
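
The PNG mechanism referred to here is the "safe-to-copy" bit: bit 5 of the fourth byte of the chunk type, i.e. whether the last letter is lowercase. A sketch of applying the same rule when resaving, using invented four-letter tags that are not real ATX or PNG chunk names:

```python
def is_safe_to_copy(chunk_type: bytes) -> bool:
    """PNG rule: a lowercase fourth letter (bit 5 set) means safe to copy."""
    return bool(chunk_type[3] & 0x20)


def items_to_keep(items):
    """Keep only metadata items flagged as safe to carry across a resave."""
    return [(tag, payload) for tag, payload in items if is_safe_to_copy(tag)]


# Hypothetical metadata items: a creator string that can be carried along,
# and a hash that becomes stale once the image content changes.
metadata = [
    (b"crEt", b"RespeQt"),
    (b"haSH", b"\x00" * 16),
]
kept = items_to_keep(metadata)
print(kept)
```

The point of the design is that the saving tool needs no knowledge of what each item means and no dialog; the item itself says whether it survives an edit.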

 


3 hours ago, phaeron said:

Best action IMO would be to deprecate the 16-bit creator fields and just put it in a string metadata chunk at the end of the file.

Mnemonic IDs make sense to me. Since these are unlikely to have a purpose in real-time use (i.e. change the behavior of an ATX reader) and are more for tracking down potential bugs, IMO, why not use the already-established "Data Record" structure of ATX files and either create a new "Creator" data record or use the existing "Host Data" record?  That strikes me as more consistent than adding an extra block of metadata at the end of the file.

 

Quote

MD5s cannot be incrementally updated by design and thus using them would necessarily require that metadata to be outside of the region covered by the MD5s. You can do the exclusion trick for one MD5 but the moment you have two MD5s covering each other you are screwed.

True enough, although I do think it'd still be useful from the software preservation aspect.  I think the calculation would only be interesting in the case where an archiving program is creating an image from its source.  Any modifications after that don't necessarily have to be documented by a flag or such, since it'd be evident there was a change from the hash mismatch.

 

The hash doesn't make much sense for programs that simply create an ATX as another form of "blank Atari disk image" in lieu of ATR (I believe RespeQt is able to do this), so those might simply leave it blank.

 

Quote

A general problem with metadata is that it's hard for automated processes to figure out how to deal with it when resaving the disk.

Let's take inventory of what current metadata falls into that category:

  • Creator code
  • MD5 hash (potentially)

I can't think of anything else, really.  Although there's other metadata in the file, one would expect those other values to change as needed (e.g. if the sector count or density were to change, etc.)

 

I can't think of any cases where a "MODIFIER ID" would be useful.  For simplicity's sake, I would argue that if a program modifies an ATX file, it should change the CREATOR ID to its own value, and then optionally change the MD5 hash if that were also included.  It could also leave the MD5 alone: just the fact that it no longer matches is indication enough that we no longer have the original file.

 

In that usage, the MD5 becomes "The hash value when the image was created from the source media", rather than "The currently correct hash value".  In that sense, it works a bit more as the currently-unused "IMAGE ID/IMAGE VERSION" header fields.

 

Again, I'm assuming that no one is really going to care about any hash value except for cases of software preservation where they want to make sure they have the original or same disk image as was originally created.  If the MD5 doesn't match, then who modified it and why seems unimportant - they're going to keep looking for the original/correct one.

 

Or it could be even simpler and leave all that out... :)

 

 

 

 

Edited by jamm
