apersson850 Posted September 14, 2020 Share Posted September 14, 2020 Using MOVB, it's simple. The problem is we want to use MOV for efficiency. Quote Link to comment Share on other sites More sharing options...
+mizapf Posted September 14, 2020 Share Posted September 14, 2020 People still think too much in terms of 8 bit. Byte manipulation is expensive on the 16-bit TMS9900. As seen from the CPU, the memory is a sequence of 16-bit values, and every move across word boundaries will necessarily involve changing the memory contents, and not only the location. Quote Link to comment Share on other sites More sharing options...
wierd_w Posted September 14, 2020 Share Posted September 14, 2020 (I was being pulled literally 5 different directions when I was writing the prior post. Take that with some salt. I am home now, and no longer have such demands placed on me.) The intention was that you could pad your "Odd number of bytes" data with a preceding 1 byte pad, to make it even-aligned data. Then you can copy using only whole-word-copy operations. If you retain "even aligned with padding", and don't try to reclaim the RAM, you lose 1 byte for every odd lengthed data element. However, I was not contemplating that there is a restriction on the word vs memory situation. (EG, that the CPU addresses at the word level, so every single byte operation has to be chunked up and manipulated so that it correlates with a whole word being read or written.) Again, I was being literally pulled in 5 directions. This sadly, means you cannot use "odd address start" as a signal, since you have to write whole words at a time, with even alignment, to get the value-add of MOV. It might be better to have the data element itself contain a header specifying how large the element is, then back-pad to even length, artificially imposing even-alignment, and just waste the memory with the padding for the performance boost. (Unless I am mistaken again, and you totally CAN do a whole word operation in one cycle on an odd numbered location that straddles a word boundary. In which case, you can reclaim the padding by overwriting it on subsequent writes, and end up wasting only a single byte at the end of the chain.) Quote Link to comment Share on other sites More sharing options...
apersson850 Posted September 14, 2020 Share Posted September 14, 2020 No, you can't do word operations on odd addresses. If you know the hardware details of the TMS 9900 CPU, you know that it has only 15 address bits. Thus it's always addressing words at even addresses. If you do a byte access, then the hardware reads the whole word, since it can't access memory with less granularity than that, then shifts the read data internally if necessary, to make it look the same if the byte originally came from the left or right half of the word. That's why I keep repeating that it's not a question about if there's an odd or even number of bytes to move. It's about if the source and the destination are differently aligned that there's no cure that will allow you to use MOV, none that doesn't cost much more than just using MOVB. But it is of course possible to write code that can do MOV, if the source and destination are equally aligned, and revert to using MOVB if they aren't. Then at least half of the time it will be faster. Otherwise you have to figure out algorithms that don't need to move data from odd to even alignment. 1 Quote Link to comment Share on other sites More sharing options...
wierd_w Posted September 14, 2020 Share Posted September 14, 2020 yes, that is congruent with my (now) current understanding. The issue is that you want to use MOV, but your data is of an odd length. (and or, exists at a location that is an odd number, and thus straddles a word. Which only happens if a data element is an odd number of bytes somewhere in memory before it. In either case, padding it, and enforcing even alignment affords the speedup.) This can cause problems in at least two ways. First, if your source is not even aligned, then your reads will straddle a word boundary too, not just the destination. You would have to identify this, do a MOVB to read the bottom half of the word your data is straddling, then read your file out as if you had chopped the first byte off of it, at even aligned word boundaries until the end of the record using MOV. The second is if you are writing to a location that is not even aligned, in which you would have to do similar. These two conditions could be either-or, or BOTH at the same time. As such, you have to check for all 3 conditions, with the OTHER condition being straight up MOV only copy (because data at both source and destination are even aligned.) If this data is being read from a ROM, I would suggest a combination of padding in the ROM (to force even alignment), with a header or some other construct to know if the data was padded, so it can be treated properly. (Or better yet, require that your data format always be an even number of bytes.) With the padding already in the data structure, you can always trust that the data is even aligned, and not worry about it. The only issue would be to ensure that your initial program can determine if it has been loaded into an odd numbered address, pad itself to fix the alignment, then continue. I would think the best option would be to strive to assure that your data never is anything other than word boundary aligned from the beginning. Then you wouldn't have to invest instruction cycles on testing to see if it is not (because you can presume not, because you know exactly where cartridge memory is, and control how data is mapped there-- Assuming this is a ROM), and you wouldn't have to waste instruction cycles on wastefully slow copy operations that could run foul of compounding conditions of source, destination, or both not being word aligned, which could easily pile up if you are allowing data that does not meet the boundary alignment condition. I think the best solution is a combination of environmental sanitization at init (check only once at first execution if program was loaded outside of word alignment, then fix it if it was), followed by strict data type rules. Quote Link to comment Share on other sites More sharing options...
BillG Posted September 14, 2020 Share Posted September 14, 2020 If the size of the block is large enough, it may be worthwhile to try to copy words at a time. 00001 * 00002 * R0 = source address 00003 * R1 = destination address 00004 * R2 = count 00005 * 0000 00006 Move 0000 0282 0006 00007 ci R2,6 ; Enough to get clever? 0004 1404 (000E) 00008 jhe TryWord 00009 0006 00010 ByteLoop 0006 DC70 00011 movb *R0+,*R1+ ; Copy a byte at a time 0008 0602 00012 dec R2 ; More? 000A 16FD (0006) 00013 jne ByteLoop 00014 000C 00015 MoveDone 000C 045B 00016 b *R11 ; Return 00017 000E 00018 TryWord 000E 0203 0001 00019 li R3,1 ; Load mask for LSB 00020 0012 C100 00021 mov R0,R4 ; Are they aligned? 0014 2901 00022 xor R1,R4 0016 2503 00023 czc R3,R4 0018 16F6 (0006) 00024 jne ByteLoop ; No, do a byte at a time 00025 001A 24C0 00026 czc R0,R3 ; Word aligned? 001C 1302 (0022) 00027 jeq Aligned 00028 001E DC70 00029 movb *R0+,*R1+ ; Copy the first byte 0020 0602 00030 dec R2 00031 0022 00032 Aligned 0022 C102 00033 mov R2,R4 ; Save count 00034 0024 0912 00035 srl R2,1 ; Convert to number of words 00036 0026 00037 WordLoop 0026 CC70 00038 mov *R0+,*R1+ ; Copy a word 0028 0602 00039 dec R2 002A 16FD (0026) 00040 jne WordLoop 00041 002C 24C4 00042 czc R4,R3 ; One byte left over? 002E 13EE (000C) 00043 jeq MoveDone ; No 00044 0030 DC70 00045 movb *R0+,*R1+ ; Copy the last byte 00046 0032 045B 00047 b *R11 ; Return 00048 00049 * 00050 * Just to see the overhead of combining bytes into words 00051 * 0034 00052 NotAligned 0034 24C0 00053 czc R0,R3 ; Is source aligned? 0036 160F (0056) 00054 jne DestAligned 00055 0038 C102 00056 mov R2,R4 ; Save count 00057 003A 0912 00058 srl R2,1 ; Convert to number of words 00059 003C D170 00060 movb *R0+,R5 ; Get unaligned byte 00061 003E 00062 SrcLoop 003E C1B0 00063 mov *R0+,R6 ; Get next word 00064 0040 06C6 00065 swpb R6 ; Assemble dest word 0042 C1C6 00066 mov R6,R7 0044 D185 00067 movb R5,R6 00068 0046 CC46 00069 mov R6,*R1+ ; Store word 00070 0048 D147 00071 movb R7,R5 ; Ready for next 00072 004A 0602 00073 dec R2 004C 16F8 (003E) 00074 jne SrcLoop 00075 004E 24C4 00076 czc R4,R3 ; One byte left over? 0050 13DD (000C) 00077 jeq MoveDone ; No 00078 0052 DC45 00079 movb R5,*R1+ ; Store the last byte 00080 0054 045B 00081 b *R11 ; Return 00082 0056 00083 DestAligned 00084 * 00085 * Why bother? The overhead is too bad. 00086 * 1 Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 14, 2020 Share Posted September 14, 2020 24 minutes ago, BillG said: If the size of the block is large enough, it may be worthwhile to try to copy words at a time. Spoiler 00001 * 00002 * R0 = source address 00003 * R1 = destination address 00004 * R2 = count 00005 * 0000 00006 Move 0000 0282 0006 00007 ci R2,6 ; Enough to get clever? 0004 1404 (000E) 00008 jhe TryWord 00009 0006 00010 ByteLoop 0006 DC70 00011 movb *R0+,*R1+ ; Copy a byte at a time 0008 0602 00012 dec R2 ; More? 000A 16FD (0006) 00013 jne ByteLoop 00014 000C 00015 MoveDone 000C 045B 00016 b *R11 ; Return 00017 000E 00018 TryWord 000E 0203 0001 00019 li R3,1 ; Load mask for LSB 00020 0012 C100 00021 mov R0,R4 ; Are they aligned? 0014 2901 00022 xor R1,R4 0016 2503 00023 czc R3,R4 0018 16F6 (0006) 00024 jne ByteLoop ; No, do a byte at a time 00025 001A 24C0 00026 czc R0,R3 ; Word aligned? 001C 1302 (0022) 00027 jeq Aligned 00028 001E DC70 00029 movb *R0+,*R1+ ; Copy the first byte 0020 0602 00030 dec R2 00031 0022 00032 Aligned 0022 C102 00033 mov R2,R4 ; Save count 00034 0024 0912 00035 srl R2,1 ; Convert to number of words 00036 0026 00037 WordLoop 0026 CC70 00038 mov *R0+,*R1+ ; Copy a word 0028 0602 00039 dec R2 002A 16FD (0026) 00040 jne WordLoop 00041 002C 24C4 00042 czc R4,R3 ; One byte left over? 002E 13EE (000C) 00043 jeq MoveDone ; No 00044 0030 DC70 00045 movb *R0+,*R1+ ; Copy the last byte 00046 0032 045B 00047 b *R11 ; Return 00048 00049 * 00050 * Just to see the overhead of combining bytes into words 00051 * 0034 00052 NotAligned 0034 24C0 00053 czc R0,R3 ; Is source aligned? 0036 160F (0056) 00054 jne DestAligned 00055 0038 C102 00056 mov R2,R4 ; Save count 00057 003A 0912 00058 srl R2,1 ; Convert to number of words 00059 003C D170 00060 movb *R0+,R5 ; Get unaligned byte 00061 003E 00062 SrcLoop 003E C1B0 00063 mov *R0+,R6 ; Get next word 00064 0040 06C6 00065 swpb R6 ; Assemble dest word 0042 C1C6 00066 mov R6,R7 0044 D185 00067 movb R5,R6 00068 0046 CC46 00069 mov R6,*R1+ ; Store word 00070 0048 D147 00071 movb R7,R5 ; Ready for next 00072 004A 0602 00073 dec R2 004C 16F8 (003E) 00074 jne SrcLoop 00075 004E 24C4 00076 czc R4,R3 ; One byte left over? 0050 13DD (000C) 00077 jeq MoveDone ; No 00078 0052 DC45 00079 movb R5,*R1+ ; Store the last byte 00080 0054 045B 00081 b *R11 ; Return 00082 0056 00083 DestAligned 00084 * 00085 * Why bother? The overhead is too bad. 00086 * Very nice, but all of your CZC instructions after the first one need the operands reversed, I think. ...lee Quote Link to comment Share on other sites More sharing options...
BillG Posted September 14, 2020 Share Posted September 14, 2020 I thought that CZC is just fancy 9900-ese for do an AND, set the EQ flag appropriately and throw away the result. If not, then you are right that the order matters. Quote Link to comment Share on other sites More sharing options...
+Lee Stewart Posted September 14, 2020 Share Posted September 14, 2020 21 minutes ago, BillG said: I thought that CZC is just fancy 9900-ese for do an AND, set the EQ flag appropriately and throw away the result. If not, then you are right that the order matters. Yeah, it looks for zeros in the destination operand that correspond to ones in the source operand. ...lee Quote Link to comment Share on other sites More sharing options...
apersson850 Posted September 14, 2020 Share Posted September 14, 2020 5 hours ago, wierd_w said: I would think the best option would be to strive to assure that your data never is anything other than word boundary aligned from the beginning. Obviously, but that wasn't the case in the original task here. Note that there's no problem with an odd number of bytes to transfer. You can always handle the odd one with a MOVB. Note that there's no problem with data starting at an odd address, or ending at an even, or both for that matter, as long as the destination does too. You can always handle the single bytes at the beginning and/or end by a MOVB, and handle the brunt of the data with MOV. No, the problem is only when the source starts at an even address and the destination at an odd, or the opposite. In such a case, whatever you do will be less efficient than simply do the whole transfer using MOVB for every byte. Quote Link to comment Share on other sites More sharing options...
+FarmerPotato Posted September 15, 2020 Share Posted September 15, 2020 17 hours ago, apersson850 said: No, the problem is only when the source starts at an even address and the destination at an odd, or the opposite. In such a case, whatever you do will be less efficient than simply do the whole transfer using MOVB for every byte. I did a cycle analysis of 4 versions of the inner loop. Assume the source is even the dest is odd. Assume registers and code are in PAD (no wait state) and data is in expansion memory or cartridge. MOV MOV4 MOVB MOVB4 ALU 48 43 13 34 PAD 40 37 9 24 MEM 8 8 3 12 Cycles 240 224 68 212 per byte 60 56 68 53 where Cycles = 2*ALU + 2*PAD + (2+W)*MEM MOVB4 wins I did not realize that even MOV does read-before-write. The source and destination evaluation work the same--both do a read from the effective address. Code MOVB loop: movb *r0+,*r1+ dec r2 jne loop MOVB x4, loop unrolled 4 times loop: movb *r0+,*r1+ movb *r0+,*r1+ movb *r0+,*r1+ movb *r0+,*r1+ ai r2,-4 jne loop MOV copy: * cache first byte movb *src+,r3 loop: * fill two src words: * grab a word mov *src+,r4 movb r4,@r3lb mov r3,*dst+ dect cnt jeq done swpb r4 * grab a word mov *src+,r3 movb r3,@r4lb mov r4,*dst+ swpb r3 dect cnt jne loop done: MOV4 copy: * cache first byte movb *src+,r3 loop: * fill two src words: * grab a word mov *src+,r4 movb r4,@r3lb mov r3,*dst+ swpb r4 * grab a word mov *src+,r3 movb r3,@r4lb mov r4,*dst+ swpb r3 ai r2,-4 jne loop done: To see all the equates and stuff, open the attached file. copy.a99 Quote Link to comment Share on other sites More sharing options...
Tursi Posted September 15, 2020 Share Posted September 15, 2020 (edited) But MOV and MOV4 will write to the wrong address... assuming DST is odd, it's not incremented before the first MOV, only (src) is, so the mov r3,*dst+ will actually write to (dst-1). Actually, since the assumption was src is even, and a movb is used to increment it, the MOV will actually read from the wrong address too, and you'll get two copies of the first byte written starting at (dst-1). The gotcha that hit many of us was remembering that MOV is incapable of accessing an odd address, it will always truncate to 15 bits. * assume src = A000 (bytes 11,22,33,44), dst = B001 * cache first byte movb *src+,r3 * read A000, r3=11xx, src=A001 loop: * fill two src words: * grab a word mov *src+,r4 * read A000, R4=1122, src=A003 movb r4,@r3lb * MSB 11, r3=1111 mov r3,*dst+ * write 1111 to B000, dst=B003 Even though the addresses are right, MOV is not capable of accessing them. I decided to run it a little farther, cause it seemed silly to me that you'd miss that... and as long as you can overlook the first byte being early... the data does land in the right place. Sorry about that! Fire alarm testing today, that's my excuse. Edited September 15, 2020 by Tursi 1 Quote Link to comment Share on other sites More sharing options...
+FarmerPotato Posted September 15, 2020 Share Posted September 15, 2020 39 minutes ago, Tursi said: But MOV and MOV4 will write to the wrong address... assuming DST is odd, it's not incremented before the first MOV, only (src) is, so the mov r3,*dst+ will actually write to (dst-1). Actually, since the assumption was src is even, and a movb is used to increment it, the MOV will actually read from the wrong address too, and you'll get two copies of the first byte written starting at (dst-1). The gotcha that hit many of us was remembering that MOV is incapable of accessing an odd address, it will always truncate to 15 bits. * assume src = A000 (bytes 11,22,33,44), dst = B001 * cache first byte movb *src+,r3 * read A000, r3=11xx, src=A001 loop: * fill two src words: * grab a word mov *src+,r4 * read A000, R4=1122, src=A003 movb r4,@r3lb * MSB 11, r3=1111 mov r3,*dst+ * write 1111 to B000, dst=B003 Even though the addresses are right, MOV is not capable of accessing them. I decided to run it a little farther, cause it seemed silly to me that you'd miss that... and as long as you can overlook the first byte being early... the data does land in the right place. Sorry about that! Fire alarm testing today, that's my excuse. Thanks for your understanding. I goofed up and posted a src odd, dst even copy. The first MOVB aligns the source pointer to even, while caching the first byte. I see that it gives the same result whether src is odd or even. But if dst is odd, the result is even. My test code was: li src,>a001 li dst,>b000 li r2,>400 bl @copy Quote Link to comment Share on other sites More sharing options...
apersson850 Posted September 16, 2020 Share Posted September 16, 2020 I haven't actually done it, but my gut feeling is that you make two separate routines. One with MOV, one with MOVB. Start with a test figuring out which to use. For the MOV routine, handle an odd byte at the start and/or end with one or two MOVB. Quote Link to comment Share on other sites More sharing options...
+TheBF Posted September 16, 2020 Share Posted September 16, 2020 8 hours ago, apersson850 said: I haven't actually done it, but my gut feeling is that you make two separate routines. One with MOV, one with MOVB. Start with a test figuring out which to use. For the MOV routine, handle an odd byte at the start and/or end with one or two MOVB. I think your gut is good. An alternative is write both and call the correct code where needed, cuz you kept your data organized. Patient: "Doctor it hurts when I do this" Doctor: "Then don't do that!" Quote Link to comment Share on other sites More sharing options...
Asmusr Posted September 16, 2020 Author Share Posted September 16, 2020 Are you suggesting to solve the problem by solving it somewhere else? ? 1 Quote Link to comment Share on other sites More sharing options...
apersson850 Posted September 16, 2020 Share Posted September 16, 2020 5 hours ago, TheBF said: An alternative is write both and call the correct code where needed, cuz you kept your data organized. Well, that is what I meant. If the program must work with both aligned and non-aligned data movement, then write a procedure, which contains both methods, and at each invocation, determine which one to call. Then you all the time get the best performance, depending on the circumstances. And if you run out of memory, you can revert to MOVB only. 2 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.