+InsaneMultitasker Posted April 5, 2013 Share Posted April 5, 2013 (edited) I finally got around to implementing a table-driven CRC-16 routine. Table-driven is usually touted as faster, so I wanted to test that claim. With it I have been able to drop overall CRC validation of the Geneve operating system from 8.5s to just under 3.0s. The following code takes advantage of the 9995 internal RAM. The 128K data file is loaded from disk and stored in RAM (not internal 9995 ram!) for the CRC loop. I'm looking for any ideas that might further optimize the loop at label NEXTBYTE. I suspect it's about as tight as possible but would appreciate an outside opinion or two ******************************************************************************** * CRC16s 2013 March 31 * * Approximate execution speed under emulation, 128K file: * 1. 8.5s CRC calculated 'old faithful' way one byte at a time * 2. 4.5s CRC calculated with table; primary loop in local SRAM page * 3. 3.0s CRC calculated with table; primary loop in 9995 scratchpad * WS EQU >F000 9995 workspace * MAIN3 LWPI WS LET'S LOAD OUR WORKSPACE LIMI 0 turn off those interrupts! * * Copy CRC calculation code to u9995 for fastest execution * Called @>F020 after file is loaded into RAM * LI R1,STARTCHIPCODE LI R2,>F020 LI R7,SAVE9995 CLR R8 CHIPLP MOV *R2,*R7+ ultra-conservative - save scratchpad MOV *R1+,*R2+ then move code INC R8 # of words are we overwriting CI R1,ENDCHIPCODE end of code? JNE CHIPLP not yet MOV R8,@CT9995 yes. save the word count for exit[/font] *----------------------------=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= * Create 16-bit CRC table in memory * GENERATECRC16 LI R7,CRCTABLE512 point to start of table LI R8,0 0 to 255 LOOPI MOV R8,R10 get next CRC value into RESULT (R10) SLA R10,8 RESult shifted LEFT 8 bits CLR R1 for table, always start as if cleared XOR R1,R10 R1^R10->r10 bug, earlier reversed operands ):- *inner loop starts: LI R9,8 now for our inner loop LOOPJ MOV R10,R2 R2=temp ANDI R2,>8000 SLA R10,1 CI R2,>8000 temp have msbit set? JNE SKIP no LI R3,>1021 XOR R3,R10 yes, XOR it SKIP DEC R9 any left? JNE LOOPJ yes *end inner loop MOV R10,*R7+ move crc value into table INC R8 +1 CI R7,CRCTABLE512+512 last entry? JNE LOOPI * * Set up registers to loop through the 128K MDOS file * GOODFILE2 LI R3,16 number of 8k pages LI R4,PGETBL+8 Using memory starting at >10000 (first page above 64K) LI R5,CRCLIST location to store CRC bytes LI R9,MDOSCRC location to put the MDOS embedded CRC (from memory) * BL @>F020 ** call into on-board RAM - calculate CRC from 128K byte file * EXIT LI R1,SAVE9995 restore 9995 scratchpad saved earlier LI R2,>F020 REST1 MOV *R1+,*R2+ DEC @CT9995 JNE REST1 BLWP @0 Thank MDOS for working so well, then exit *------------------------------------------------------------------------------- * CRC Calculation - Table driven * approx 66 bytes (0xF020-F062) - careful of internal WS's at 0xF080+ * Destroys: R0,R1,R3,R4,R7,R8 * * STARTCHIPCODE MOVB *R4+,@>F111 get a page of data LI R0,>2000 CLR R8 clear the CRC * * COMPUTE THE CRC using our CRC table (generated earlier) * NEXTBYTE NOSKIP MOVB *R0+,R1 GET character from buffer XOR R8,R1 XOR the MSByte! SRL R1,7 we want a word index. [Can we ignore the Lsbit, one less bit shift?] * ANDI R1,>FFFE for now lets mask it [Yes! uP doesnt care] MOV @CRCTABLE512(R1),R1 get the table value (16-bit) SLA R8,8 Shift the old result 8 bits XOR R1,R8 and xor with table entry. R8 is the new value SKIPIT CI R0,>4000 end of this page? JL NEXTBYTE no, get another[/font] MOV R8,*R5+ yes,add CRC to the list DEC R3 any pages left? JNE STARTCHIPCODE RT *** remove if we don't move to on-chip! ENDCHIPCODE ****************************************************** * end of code, data follows * H1021 DATA >1021 crc polynomial (16-bit) CRCTABLE512 BSS 520 CRC table buffer. Pad a few bytes for stupid programmer errors SAVE9995 BSS 256 scratchpad storage for u9995 CT9995 DATA 0 number of words used *--------------------------------------------------------- END Edited April 5, 2013 by InsaneMultitasker Quote Link to comment Share on other sites More sharing options...
matthew180 Posted April 5, 2013 Share Posted April 5, 2013 I never looked into the details for CRC, but it looks like you are processing 1-byte at a time. Is is possible to read a 16-bit value from buffer and operate on that instead? Also, the CI R0,>4000 would be a candidate for replacement, if only to using a register counter that is setup outside of the loop, then replace the CI with DEC Rx and JNE. Also, if you can't do a 16-bit value, then you could unroll the loop twice, i.e. do MOVB *R0+,R1 twice inside the loop and decrement the loop counter twice. 1 Quote Link to comment Share on other sites More sharing options...
+InsaneMultitasker Posted April 5, 2013 Author Share Posted April 5, 2013 I never looked into the details for CRC, but it looks like you are processing 1-byte at a time. Is is possible to read a 16-bit value from buffer and operate on that instead? Also, the CI R0,>4000 would be a candidate for replacement, if only to using a register counter that is setup outside of the loop, then replace the CI with DEC Rx and JNE. Also, if you can't do a 16-bit value, then you could unroll the loop twice, i.e. do MOVB *R0+,R1 twice inside the loop and decrement the loop counter twice. Thank you. Good call on the DEC / JNE. That will be easy to implement. I can also easily unroll the loop once, maybe twice in the processor space, to reduce the loop count. The CRC routine that seems "common" in TI terminal emulators operates on a byte of data, with the 8 bits calculated in real-time. The table pre-calculation at the start of my program loosely corresponds to the bit-wise code. The look-up table is also byte-oriented but it utilizes properties of mod 2 and polynomial division, thereby eliminating a lot of the bit-shifting in the common calculation. ( I don't have the common code handy to post for comparison.) Alas, I don't see a way to operate on a 16-bit value given the CRC's mathematical properties. However, it occurs to me that the memory at 0x0000-0x1fff is RAM on this machine, so I might eek out some speed if I create the CRC table at offset 0x0000, and eliminate the base address. [color="#000000"]MOV [/color][color="#006666"]@CRCTABLE512[/color][color="#666600"]([/color][color="#000000"]R1[/color][color="#666600"]),[/color][color="#000000"]R1 [/color][color="#000088"]get[/color][color="#000000"] the table value [/color][color="#666600"]([/color][color="#006666"]16[/color][color="#666600"]-[/color][color="#000000"]bit[/color][color="#666600"])[/color] would become MOV *R1,R1 I don't remember if XOR allows indirect register access. If so, I should be able to change [b]from 3 instructions:[/b] [color="#000000"] MOV *R1[/color][color="#666600"],[/color][color="#000000"]R1 [/color][color="#000088"]get[/color][color="#000000"] the table value [/color][color="#666600"]([/color][color="#006666"]16[/color][color="#666600"]-[/color][color="#000000"]bit[/color][color="#666600"])[/color][color="#000000"] SLA R8[/color][color="#666600"],[/color][color="#006666"]8[/color] [color="#660066"]Shift[/color][color="#000000"] the old result [/color][color="#006666"]8[/color][color="#000000"] bits XOR R1[/color][color="#666600"],[/color][color="#000000"]R8 [/color][color="#000088"]and[/color][color="#000000"] xor [/color][color="#000088"]with[/color][color="#000000"] table entry[/color][color="#666600"].[/color][color="#000000"] R8 [/color][color="#000088"]is[/color][color="#000000"] the [/color][color="#000088"]new[/color][color="#000000"] value [/color] [b]to 2 instructions:[/b] [color="#000000"] SLA R8[/color][color="#666600"],[/color][color="#006666"]8[/color] [color="#660066"]Shift[/color][color="#000000"] the old result [/color][color="#006666"]8[/color][color="#000000"] bits XOR *R1[/color][color="#666600"],[/color][color="#000000"]R8 [/color][color="#000088"]and[/color][color="#000000"] xor [/color][color="#000088"]with[/color][color="#000000"] table entry[/color][color="#666600"].[/color][color="#000000"] R8 [/color][color="#000088"]is[/color][color="#000000"] the [/color][color="#000088"]new[/color][color="#000000"] value [/color] Guess I have a few things to try this weekend Quote Link to comment Share on other sites More sharing options...
+InsaneMultitasker Posted April 5, 2013 Author Share Posted April 5, 2013 I don't know why the AA editor keeps translating color and fonts in the 'code' section. Here is the snippet. MOV @CRCTABLE512(R1),R1 get the table value (16-bit) would become MOV *R1,R1 I don't remember if XOR allows indirect register access. If so, I should be able to change from 3 instructions: MOV *R1,R1 get the table value (16-bit) SLA R8,8 Shift the old result 8 bits XOR R1,R8 and xor with table entry. R8 is the new value to 2 instructions: SLA R8,8 Shift the old result 8 bits XOR *R1,R8 and xor with table entry. R8 is the new value Quote Link to comment Share on other sites More sharing options...
matthew180 Posted April 6, 2013 Share Posted April 6, 2013 Also keep in mind that SWPB is only 10 cycles and the shifts are 12+2*C and for a shift of 8 that would be 12+16=28 clocks. It looks like the shift is doing something else, but if you can change out the shifts, that should help too. Quote Link to comment Share on other sites More sharing options...
+InsaneMultitasker Posted April 7, 2013 Author Share Posted April 7, 2013 Unrolling the computation helped as did combining the table lookup with the XOR. When I originally wrote the code, I had reversed the source and destination. XOR did not like a symbolic address in the destination; however, using one in the source was acceptable. I had replaced one of the shift instructions with SWPB and ANDI until I read the 9995 manual more closely I'd say this one is conquered for now. thank you 1 Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.