apersson850
Everything posted by apersson850
-
I didn't expect it to work, but I tried making the array large enough to accommodate numbers up to 10000. The p-system responded "Can't allocate global data", which is what I expected.
-
Well, frequency is how often a period is repeated, so you need two data elements to create one period. 1 Hz is one signal per second, but it takes two data output operations, one to turn the output on and one to turn it off, to create that frequency. Thus it takes two updates per second to get 1 Hz. So it would be correct to say that you can achieve 14 kHz by removing one of the three data elements. That also implies that you are reloading the array frequently. My calculations were based on the assumption that the array is at least 100 elements long, so that the time for the reload doesn't really matter. That's what gave me the 19.7 kHz figure.
-
What kind of data are you measuring? If you have data that alternates from on to off, then on again, on the same pin each cycle, then the duty cycle of your signal should be roughly 50%. It obviously isn't.

I get 120 cycles for the first method and 92 for the second, in the inner loop, assuming everything is in slow memory. With everything in fast memory, it's 64 vs. 48. 48 cycles is equivalent to 31.25 kHz, since you have to run through the loop twice to output first a "one", then a "zero". By pre-loading a register with PIO, you can change the MOVB to output to *R5 instead of @PIO, saving eight more cycles, and thus reaching 37.5 kHz. If we assume the workspace is in fast memory and the rest in slow memory, I get 100 vs. 76 cycles. The latter implies 19.7 kHz.

An interesting observation is that by "simply" adding 16-bit wide zero wait-state RAM to the machine, you roughly double the speed of the computer, in the cases where you make things easy for yourself and let both code and workspace reside in expansion RAM. Especially when writing software that works together with a higher-level language, like Extended BASIC or Pascal, it's valuable to be able to leave the RAM pad as it is, to avoid messing up things you shouldn't, and still have full speed from the CPU. For software that frequently accesses VDP RAM and the like, the impact is of course smaller.
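As a sanity check on those figures, here's a small back-of-the-envelope sketch, assuming the TMS 9900's 3 MHz clock and the cycle counts quoted above (the function name is mine, just for illustration):

```python
# Rough toggle-frequency estimate for the output loop on a 3 MHz TMS 9900.
# Two passes through the loop produce one full period (one "on", one "off").
CLOCK_HZ = 3_000_000

def toggle_freq(cycles_per_pass):
    """Square-wave frequency when each loop pass flips the output once."""
    return CLOCK_HZ / (2 * cycles_per_pass)

print(toggle_freq(48))  # everything in fast memory, MOVB to @PIO -> 31250.0 Hz
print(toggle_freq(40))  # MOVB to *R5 saves 8 cycles            -> 37500.0 Hz
print(toggle_freq(76))  # workspace fast, code in slow memory   -> ~19.7 kHz
```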
-
Being crystal controlled, the variation from the hardware should be negligible. But playing with the scope is something I endorse fully, so I'll not object any more!
-
Considering that a standard TI 99/4A runs most of your own assembly code in memory which has four wait states per access, you can add roughly 10 more cycles per instruction, and then you land at almost exactly what you estimate. On the other hand, 14 cycles per instruction gives you about 214,000 instructions/s, so a modification to all 16-bit RAM is quite valuable when it comes to performance. Due to a more efficient internal design, the TMS 9995 is capable of executing the same instructions in about 1/3 of the clock cycles. But it must always read memory byte by byte, except when using the internal scratch pad RAM inside the chip. Vorticon, you know you can just count the clock cycles if you want to, right? You don't have to measure.
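The same kind of estimate works for instruction throughput. A sketch, again assuming the 3 MHz clock and the "roughly 10 extra cycles per instruction" wait-state penalty mentioned above:

```python
# Rough instruction-rate estimate for a TMS 9900 at 3 MHz (assumed clock).
CLOCK_HZ = 3_000_000

def instr_per_sec(cycles_per_instr):
    """Average instructions per second for a given average cycle count."""
    return CLOCK_HZ / cycles_per_instr

# All 16-bit zero-wait-state RAM: ~14 cycles per average instruction.
print(round(instr_per_sec(14)))       # ~214286 instructions/s
# Standard 8-bit expansion RAM adds roughly 10 cycles per instruction.
print(round(instr_per_sec(14 + 10)))  # 125000 instructions/s
```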
-
When accessing bytes (MOVB), indexed access is slightly slower than indirect with auto-increment as well. Then an INC or DEC plus a JMP is roughly like 1½ MOV, but a C is like a MOV. So you reduce the loop time to roughly 5/7 by counting downwards instead.

Accessing array items one by one will be much slower, that's correct. Perhaps you want the output to cycle through as fast as possible. Otherwise you may have dual arrays, either after each other or interleaved. The second data item could be the duration you want the output to hold the same data.

Here's the original function (just slightly modified):

GPLWS  EQU  >83E0
* Bla, bla...
       LI   R5,PIO
REDO   CLR  R3
       LI   R2,VERTEX
       MOVB *R2+,@GPLWS+7     GET NUMBER OF VERTICES IN ARRAY
LOOP   MOVB *R2+,*R5          SEND ARRAY BYTE TO PIO
       DEC  R3
       JNE  LOOP
       JMP  REDO

And here assuming a delay count (byte) follows each data item:

GPLWS  EQU  >83E0
* Bla, bla...
       LI   R5,PIO
REDO   CLR  R3
       CLR  R4
       LI   R2,VERTEX
       MOVB *R2+,@GPLWS+7     GET NUMBER OF VERTICES IN ARRAY
LOOP   MOVB *R2+,*R5          SEND ARRAY BYTE TO PIO
       MOVB *R2+,@GPLWS+9     FETCH DELAY COUNT
DELAY  DEC  R4
       JNE  DELAY
       DEC  R3
       JNE  LOOP
       JMP  REDO
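The interleaved layout the second listing expects can be sketched like this (a hypothetical host-side helper, with names of my own; the leading count byte matches what the routine reads into R3):

```python
# Sketch of the interleaved array layout: a count byte, then pairs of
# (output byte, delay count), as consumed by the second listing above.
def interleave(data, delays):
    """Build [count, data0, delay0, data1, delay1, ...]."""
    assert len(data) == len(delays), "one delay per data byte"
    out = [len(data)]
    for value, delay in zip(data, delays):
        out += [value, delay]
    return out

print(interleave([0x01, 0x00, 0x01], [5, 5, 10]))  # [3, 1, 5, 0, 5, 1, 10]
```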
-
As long as we stay within the range of signed integers, as we do here, it doesn't make any difference.
-
No, it doesn't run anything from the GPL interpreter. It does of course read data from GROM, but uses its own code to do that. The p-code card doesn't only have 48 K GROM; it also has 12 K ROM with assembly code inside. That ROM contains startup routines, the BIOS and the main part of the PME, i.e. the P-Machine Emulator. The PME is flexible enough to run p-code directly from RAM as well as from memory-mapped auto-incrementing memory, i.e. VDP RAM or GROM. Thus p-code can be run from GROM on the p-code card without first moving it to RAM.

Overall, the p-system doesn't use much of the console's code. Floating point arithmetic is an exception. But even things like DSR calls are handled differently; they don't use DSRLNK or its equivalent in the console. Printing on the screen is different too, since the p-system always emulates an 80-column screen by sideways scrolling. It has its own screen image in low memory expansion, and uses the VDP screen area simply as a viewport into the 80-column screen. This of course makes all print-on-screen routines different. It also has its own file handling system, which is why you can implement a RAMdisk for the p-system by supporting sector read/write only.
-
Hahaha, that was a good one!
-
Sure. Here it is. Note that this is "proof of concept", so it's only for a RAMdisk at CRU base 1500H, with 512 K memory, and it implements only sector read/write for unit #11 (DSK5). That's the only thing Pascal uses. Then it also requires one installer to put the code on the card and another to convince the p-system that there are more than the normal three blocked devices. A more flexible driver could be written, if desired, so you don't need to change it for different configurations. This is based on the DSR I have for my own RAM-disk, which simply uses available memory, not used by the p-system, to implement a simple RAMdisk.
-
I entered the Pascal code as it is on the Rosetta page, using a set to keep track of the primes. It seems sets aren't too efficiently implemented in Pascal, probably because they are represented by a bit in a packed field, so there's a lot of packing and unpacking. I changed from a set to an array of booleans instead. That reduced the time from 10.558 s to 4.663 s. Then I suspected some time could be saved in setting up the array, so I used the faster system intrinsic moveleft to do that. Now the time is 2.869 s. With this solution, each boolean is stored as an integer, so 1000 elements require 2000 bytes.
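For reference, the set-to-flag-array change boils down to a classic sieve with a boolean array. A minimal sketch (my own names, not the Rosetta code):

```python
# Sieve of Eratosthenes with a flag array, mirroring the array-of-booleans
# version described above (the pre-initialization is what moveleft sped up).
def primes_up_to(n):
    """Return all primes <= n."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Mark all multiples of p as composite, starting at p*p.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]

print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```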
-
After writing my own DSR for the Horizon 4000 card I now have, it works fine with the p-code system. Which confirms that there's nothing wrong with my card, but it's the ROS 8 that's at fault. It's too much geared towards the standard 99/4A operating system.
-
No, the p-code card doesn't run the GPL interpreter. But it uses GROM chips to store p-code, assembly code that's transferred to RAM before execution, and the data that makes up OS:, a ROM-disk volume which contains the operating system in the file SYSTEM.PASCAL and some other auxiliary files. So it only uses the same chip technology as GPL, not GPL itself.
-
Ehh, this was a long time ago, but doesn't XB also store some data/code in low RAM when you do CALL INIT? Or is that Editor/Assembler only?
-
Generally, when passing an array like this, it's better to store the number of relevant elements in the first element of the array. If you store that in array(0), then you don't have to know in advance how large the array is. That's good, since it implies that the same assembly routine can handle any size of array. You know that you always have array(0), so you can always look there to find out if there are seven elements in the array, or 389 or whatever. With the byte array method I suggested, you have to send the byte count as a 16-bit value (two bytes), if you need an array larger than 255 bytes. But again, sending the size first will allow you to know how far down the array you should index.
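The length-prefix convention described above can be sketched like this (a hypothetical host-side illustration with my own names; element 0 carries the count, so the routine never needs to know the size in advance):

```python
# Sketch of the length-prefixed byte array convention: the first byte
# holds the element count, the data bytes follow.
def pack_array(values):
    """Build a byte array whose first byte is the element count."""
    if len(values) > 255:
        raise ValueError("count won't fit in one byte; use a 16-bit prefix")
    return bytes([len(values)]) + bytes(values)

def unpack_array(buf):
    """Read the count from element 0, then exactly that many data bytes."""
    count = buf[0]
    return list(buf[1:1 + count])

packed = pack_array([10, 20, 30])
print(unpack_array(packed))  # [10, 20, 30]
```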
-
First, I wouldn't write this code using the GPLWS if you have big problems saving values and restoring them again. Define your own workspace and use that. Go back to GPLWS just before returning to BASIC. I know this will run slower on machines that have the standard memory expansion, but it doesn't matter in this case, as the assembly code is small and the BASIC code running around it is slower anyway.

Second, for the array passing, I'd add a specific call to the assembly program which returns the address of a memory area where the control array should be stored. CALL LINK("PIOMEM",ADDR) will come back with the start address of the array (which is a byte array). Then you do

FOR I=0 TO N
CALL LOAD(ADDR+I,VALUE(I))
NEXT I

Now you've populated the byte array with values from the array VALUE. After that, you call an assembly routine which will output this array to the port. If the number of values changes, you'll have to pass the length of the byte array too. Perhaps you also want to include some kind of time stamp for how long you want each value to remain on the port.

I've done almost exactly the same thing once, so this is nothing I dreamed up just now. You can have an exercise in Swedish through these two links: Number 1 Number 2 There are two articles I wrote in 1987 about something similar. I implemented a traffic crossing, with traffic lights, controlled by BASIC and Forth. The Forth code isn't in the magazine, but was published through the Forth special interest group within Programbiten. But there are two routines, one which sets the outputs in the port and one that reads the two auxiliary inputs. Look for "Styr och ställ" (Control and set) in the magazines.
-
If you use it for saving return addresses only, it doesn't matter much. But if the stack is used for other things too, then it's sometimes handy to have the stack pointer actually pointing at the top of the stack all the time, not at the next free space above the top. With the TMS 9900 you can easily address the top of the stack in either case, but if you let the stack grow downwards, as I indicated, then the top of the stack is accessed by *SP. If you let it grow upwards and want to use auto-increment, then you have to refer to the top of the stack as @-2(SP). Doable, but slower, and it consumes more space.

When allocating a frame on the stack, i.e. a larger piece of data, it's also easy to refer to the items in that record by indexing from the stack pointer with positive offsets. You can do the opposite, but at least to me it's easier to think in positive terms. You push a frame by

       AI   SP,-ITEMSIZE

When you have transferred the data to the stack, you reach the top element by *SP and items further down by @OFFSET(SP). You have use for such data on the stack when traversing graphs, for example.
-
DORG can be convenient. You can map out a memory area without actually populating it with data. For example, if you want to use the fast RAM in the console, you can have a DORG >8300 directive, below which you lay out data and perhaps code that you plan to have in that memory area. When your program starts, you'll have to move the proper data into that space, but you already have the symbols correct.
-
If you want to reserve a memory area, you can place BSS or BES anywhere in your program. The computer doesn't care if your data is in the middle of the code or at either end. As long as you don't start executing the data as code, it doesn't matter. But that's something you have to avoid wherever you put your data.

BSS (Block Starting with Symbol) defines a symbol at the first address of the block. You then increase the address to go further into the data. BES (Block Ending with Symbol) is the same, but the symbol is defined at the end of the data. If you have a stack that you want to grow towards lower addresses, then you want the address you initially load into the stack pointer to be at the end (high address) of the data. A push is then done with

       DECT SP
       MOV  @DATA,*SP

and a pop with

       MOV  *SP+,@DATA

As there's no pre-decrement, only post-increment, this works better than having the stack grow in the other direction, as the stack pointer here always points at the current top entry on the stack. You don't need BES to define such a stack, since you can do

DUMMY       BSS  stackspace
STACKBOTTOM EQU  $

but it's convenient.

Anyway, placing player data at >F000 is completely arbitrary. You can have it wherever you like, as long as you know where you have it. And as long as your program (game) fits in the memory as a single item, i.e. isn't bigger than 24 Kbytes, there's no need to worry about absolute locations. Just let the program and data be relocatable, and let the loader/linker stuff it where it wants to. The only things you should AORG in such a case are variables and stuff you want to optimize access to, so that you can direct them to fast memory. I never bother with that on my TI, but then I have fast memory everywhere.
-
A. The space >F000 - >FFFF is actually 4 Kbytes. Since a mob is eight bytes, that's room for 512 mobs. Where to put them doesn't really matter; you should store them somewhere.
B. Relocatable code is more flexible. Instead of you keeping track of where everything goes in memory, you let the loader do that. The assembler will generate relocatable addresses to both code and data from your symbols, and the loader will convert them to absolute addresses when the code is loaded. That you used absolute code with Mini Memory is because you had to force the code to go into that module, not wherever the loader preferred.
C. There's no conflict between code and data for relocatable code. It's rather the opposite. The assembler/loader will keep track, but if you generate absolute code, it's up to you entirely.
D. If you store something at >F000, then that's by definition an absolute location. It's up to you to make sure you don't mess that area up. A combination of absolute and relocatable code is the most difficult to keep track of, except for when storing data in reserved areas, like the scratch pad RAM. Such a place is pretty easy to keep track of.
E. The scratch pad RAM is too small to store large data structures in. Workspaces, frequently accessed data and sometimes small chunks of time-critical code are best placed there. For that, the AORG directive is good, provided your loader can handle it properly.
F. I would analyze which data to put in fast memory from the beginning. Starting to mess with that later is hardly the most efficient way. You should have at least a reasonably good idea of what will be most frequently used already when you start.
-
Here the ABC 80 was the most popular at that time. Made by Metric (software), Dataindustrier (hardware) and Luxor (mass production).
-
sprite help - clarifying how the command works
apersson850 replied to digdugnate's topic in TI-99/4A Development
But isn't the thing asked here about how to move a graphic block that's not defined as a sprite, or several sprites, but just as normal graphics? That's completely different.
-
Methods for speeding up CALL LINK
apersson850 replied to senior_falcon's topic in TI-99/4A Development
Well, it's not really a calculator company only, but the fact that the calculator division got the responsibility for the 99/4 and 99/4A gives the same result. And we have to admit that the calculation accuracy and capability of the TI 99/4A, using floating point, is significantly better than almost every other competing design. But why they didn't also provide integers, when they have a 16-bit CPU handling 16-bit values naturally, that's a mystery. But looking at the whole VDP-RAM-only, GPL-powered thing, it's obvious that speed wasn't the first thing they were thinking about.
-
I meant that the user interrupt is pretty inefficient, since it's part of the computer's full interrupt routine, which checks for several other things too, not only the user interrupt. If you could modify the interrupt vector directly, which you can't as long as it's in ROM, then you could make an interrupt routine dedicated to driving a clock, for example.
-
The problem with a standard 99/4A is that you can't change the interrupt vectors, as they are in ROM. Thus you are much more limited in what you can do with interrupts. There's a user-defined interrupt you can use, but it's pretty inefficient compared to a "real" interrupt. If you have a console modified like mine, you can change the vectors, but that's not standard. If you tried something similar on the 99/4A, you'd also find that things like disk access imply that no interrupts are serviced, so the clock would lose time there. I have a real-time clock implemented in hardware in my machine instead. It keeps track of time even when the computer isn't on.
