insomnia

Members
  • Content Count: 91
  • Joined
  • Last visited
  • Days Won: 1

insomnia last won the day on March 15 2012

insomnia had the most liked content!

Community Reputation

193 Excellent

About insomnia

  • Rank
    Star Raider

Profile Information

  • Gender
    Male
  • Location
    Pittsburgh, PA


  1. I saw there was a lot of impact from the problem of using a "movb" instruction to test for zero, so I rushed out a patch to fix it. Jedimatt42 has already written a good description of the problem, so I won't repeat it here. I also realized that there was no changelog entry for 1.18, so I fixed that too. I also changed the installer code to remove the need for a "tree" command, and we now automatically build libgcc.

Tschack909, I tried to help you out, but I don't have enough information. I looked at the assembly, and I couldn't find anything that looked like a problem.

So here are the changes for the latest patch:
  • Use the "ci" instruction when testing for 16-bit zero values
  • Use the combination of "jeq" and "cb" to test for 8-bit zero values

The "ci" test is pretty self-explanatory, so there's not much to say about that. However, I wanted to talk a little more about the 8-bit test, since I thought it was kind of neat. Assume we have a value in r1 we want to test for zero. This is the code that the compiler will now emit:

  >1300       jeq 0
  >9801 fffe  cb @$-1, r1

The "jeq" instruction is effectively a no-op. Since it's more likely for some random instruction to result in a non-zero value, we save two clocks by not taking the jump. In either case, we proceed to the "cb" instruction, which uses the zero byte of the "jeq" instruction word as one of its arguments and tests it against the value in r1 as expected. So by using this combo, we have a bytewise test for zero that's only two bytes and 8 clocks longer than a "ci" instruction. Neat!

At any rate, thanks for finding a bug, and let me know if you find any problems.
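As a sanity check on why the combo works: "jeq" with a displacement of 0 encodes as the word >1300, so the byte immediately before the "cb" is guaranteed to be >00, and comparing the char in r1's high byte against it is a zero test. A quick Python sketch of that byte-level reasoning (helper name is mine):

```python
# "jeq 0" encodes as the 16-bit instruction word 0x1300
jeq_word = 0x1300

# The byte at $-1 (just before the "cb") is the low byte of that word
zero_byte = jeq_word & 0xFF
assert zero_byte == 0x00

def char_is_zero(reg16):
    """Mimic 'cb @$-1, r1': compare the char held in the high byte
    of a 16-bit register against the guaranteed-zero byte."""
    return (reg16 >> 8) & 0xFF == zero_byte
```

For example, `char_is_zero(0x0000)` is True while `char_is_zero(0x4200)` is False, matching what the emitted instruction pair computes.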
  2. A new patch is now available in the first post, and the installer has been updated as well. This patch has only two fixes:
  • Fixed 16-bit signed right shift
  • Fixed 32-bit unsigned right shift

The 16-bit shift was shifting in the wrong direction, and the 32-bit shift was performing signed shifts. TheMole found the 16-bit shift bug, and I found the other one after doing some automated testing. Hopefully that's the last of these kinds of errors.

Making new releases takes no effort at all, so don't worry about that. Also, I appreciate bug reports whether they are drip-fed or not. Thanks for doing these tests; clearly my own testing has some gaps.
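On the 32-bit fix: an unsigned right shift must be logical, not arithmetic. A sketch of the difference, using Python masking to emulate 32-bit registers (helper names are mine):

```python
MASK32 = 0xFFFFFFFF

def lsr32(x, n):
    # logical right shift: vacated bits fill with zero
    return (x & MASK32) >> n

def asr32(x, n):
    # arithmetic right shift: vacated bits copy the sign bit
    x &= MASK32
    if x & 0x80000000:
        x |= ~MASK32  # sign-extend into Python's unbounded int
    return (x >> n) & MASK32
```

The buggy compiler effectively used the arithmetic form for unsigned values, so a value with the top bit set came back sign-extended instead of zero-filled.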
  3. OK, I'm really embarrassed about this bug. It turns out that instead of emitting a right shift, we are emitting a left shift. As you discovered, this only affects signed integers. I can only assume this was due to a cut-and-paste error somewhere that escaped testing. This is a one-line fix, but I'll confirm that the rest of the shift operations work as intended before making another patch.
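Concretely, here is what that mix-up does to a signed value (Python's `>>` on ints is an arithmetic shift, so it models the intended behavior):

```python
val = -8

# what a signed right shift by one should produce
assert val >> 1 == -4

# what the buggy compiler effectively computed instead: a left shift
assert val << 1 == -16
```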
  4. OK, it's patch time! The first post has been edited to include the new patches and an updated GCC installer. The compiler is fairly mature at this point, so there aren't many changes here:
  • More strict checks for addresses in BL commands
  • Optimization for 32-bit left shift by 8 bits
  • Optimization for 32-bit logical right shift by 8 bits
  • Fixed 32-bit right shift by more than 16 bits
  • Fixed 8-bit multiplies

The big news here is fixing 8-bit multiplies. This was first seen by TheMole, so thank you for the find. There are also some edge cases that got worked on in 32-bit shifts. These are nothing major, and were not likely to have been seen by many people. I found these problems and the BL addressing-mode bug while working on my OS project. I don't want to derail this thread, so look here for details on that: http://atariage.com/forums/topic/282474-linux-like-os-for-the-ti

Anyway, thanks to everyone for continuing to find bugs. I love to swat 'em, so don't hold back if you find one.

PS: My blog is majorly out of date, I should do something about that...
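On the shift-by-more-than-16 fix: once the count reaches 16, the low word of a 32-bit value can no longer contribute, so a logical right shift degenerates to shifting the high word alone. A quick check of that reasoning (helper name is mine):

```python
def lsr32_big(x, n):
    # 32-bit logical right shift for n >= 16: the low word is
    # discarded entirely; only the high word, shifted by n - 16, survives
    assert n >= 16
    return ((x & 0xFFFFFFFF) >> 16) >> (n - 16)

# matches a full-width shift
assert lsr32_big(0x12345678, 24) == (0x12345678 >> 24) == 0x12
```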
  5. The linker can already handle non-contiguous memory. The problem is that the program loader needs to copy that code from its storage location to its active location. The crt0 code in the example programs can handle this for basic cartridges and EA5 files, but more complex designs need additional code written. Unfortunately, the compiler will never natively support SAMS, and we really don't want that anyway. SAMS support is something that needs to live in something like an OS layer. We would never be able to make the compiler smart enough to handle all the use cases, and we don't want to have two compilation outputs (one for a system with a SAMS card, one without). I'm slowly working on a replacement OS for the TI which automatically handles SAMS usage. It isn't ready yet, and might not be useful for your project anyway.
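The loader's job described above amounts to a scatter copy driven by a section table. A minimal sketch of that idea, with an entirely hypothetical table layout (storage offset, load address, length):

```python
# Hypothetical section table; real crt0 code would read this from the image
sections = [(0x0000, 0x2000, 3),   # copy 3 bytes to the active code area
            (0x0003, 0xA000, 2)]   # copy 2 bytes to the data area

storage = bytes([1, 2, 3, 4, 5])   # the image as stored (cart/EA5 file)
ram = {}                           # stand-in for the active address space

# crt0-style scatter copy: move each section to its active location
for src, dst, length in sections:
    for i in range(length):
        ram[dst + i] = storage[src + i]
```

After the copy, `ram[0x2000]` holds the first code byte and `ram[0xA000]` the first data byte, which is the state the program expects before it starts running.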
  6. It has made math totally cool.
  7. I'm currently working on a fix for this. The problem is that char values stored in registers occupy the upper byte. When multiplying two register-stored values, the result is in the upper word of the 32-bit result. When multiplying memory-stored values, the char-to-int promotion occurs as expected, and the result is in the lower word of the 32-bit result. For extra fun, when multiplying a register-stored and a memory-stored value, the result lands in the middle two bytes of the 32-bit result. It's difficult to control how values are stored, so currently char multiplication can give unexpected results (as has been shown already).

I'm working on a fix to force all char multiplications to first be converted to register-storage format, and to extract the correct portion of the result. Here's the math behind this:

  Result = (A * 2^8) * (B * 2^8) = (A * B) * 2^16

Before the operation completes, the resulting value needs to be converted to register-storage format, with the value stored in the upper byte of the result register. Here's an example of what I'm trying to achieve:

  mpy  r1, r2   * Value A stored in register R1, value B in R2, result in [r2,r3]
                * The resulting char value is stored in the low byte of R2
  swpb r2       * Move result to high byte of R2 to convert to register-stored format

The problem I'm having is that I can't get the compiler to recognize that the low word of the 32-bit result is clobbered by the multiply. This means that any value stored in R3 in the example above will be destroyed by the multiplication, but the compiler will assume it's still valid. That error value will propagate through the code and have unpredictable results. I can't just promote all char values to int values either; the value we want would then be stored in the low byte of the 32-bit result, and we would still need to convert to register-stored format. Bear with me, a fix should be coming soon...
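The placement argument can be checked numerically: with A and B in the high bytes of their registers, the multiply of A·2^8 by B·2^8 yields (A·B)·2^16, which lands entirely in the high word of the 32-bit result. A quick Python check:

```python
A, B = 7, 9

# register-stored chars live in the high byte of a 16-bit register
regA = (A << 8) & 0xFFFF
regB = (B << 8) & 0xFFFF

# mpy produces a 32-bit product split across [high word, low word] ~ [r2, r3]
prod = (regA * regB) & 0xFFFFFFFF
hi_word, lo_word = prod >> 16, prod & 0xFFFF

assert hi_word == A * B   # the product sits in the high word (r2)
assert lo_word == 0       # the low word (r3) is overwritten by the multiply
# the char result is the low byte of r2; swpb would move it to the high byte
assert hi_word & 0xFF == (A * B) & 0xFF
```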
  8. The floppy driver is written to directly access the FDC registers, specifically the TI FDC. As you suspected, this means a new driver will need to be written for each hardware variant (Myarc, CorComp, etc.). This may sound like an unnecessary hassle, but the driver is only 600 lines of well-commented code, and about 100 of those lines are just symbol definitions. Additionally, it looks like most FDC and HDC manufacturers took inspiration from the TI FDC, so there are a lot of similarities to exploit. Heck, there may even be common code that can be made device-independent.

The reason for ignoring the DSR was that it didn't provide the features I needed, was slow, and needlessly stepped on the scratchpad and VDP memory. Additionally, the DSR code only works with the TI disk format. I wanted to use a different filesystem that allowed for deep directory trees, support for zero-length files, and softlinks (not yet available; I'll come back to it though). I also wanted a sector cache to speed up common operations. The only way to do that with the DSR would be to embed the CFS volume in the TI volume: the DSR would treat CFS as a big file, and then another layer would do the actual filesystem work. I didn't time it, but I suspect that doing things this way would increase access time by about 200 to 300 percent. No thank you.

I'm using sector-by-sector access, as that's how the hardware wants to operate. File operations ride on top of this and function in the normal Unix-y way. I'm not using inodes though. Standard Unix filesystems use a fixed number of inodes that subtract from storage space. Instead, I'm dynamically allocating file metadata records as needed to maximize space.

Sorry, I've gone off topic a bit. Let me try to reel it back. Yes, each piece of hardware will need its own driver, but there is a lot of similarity between devices of the same type that can be exploited. Plus it's more fun this way.

Other topics: I read the AMS overlay document, and that is an approach to paging I have not seen before. I'm not sure it's any better than what people have been using so far, but it's worth more time to think about. So far, I'm liking plan 99, and I think the parallels with Plan 9 are a good thing. I would report some progress here, but I got bit by what looks like a compiler bug in my bank-switch code, so I'm working to fix that first.
  9. First off, I should answer some of these questions.

@retroclouds RE: scheduler preemption
I'm using the VDP interrupt to periodically check the process queue to see if there is a higher-priority process that is ready to run. If so, the currently running process is packed up and the new one is started. Additionally, each time a syscall is invoked (basically any IPC or file I/O operation), the system pauses the calling process and starts the next ready, highest-priority process. In some pathological cases, it is possible for a process to starve same- or lower-priority processes, but that's a risk with other operating systems as well. In all cases, higher-priority processes are started when expected.

@TheMole RE: additional developers
Once I get the major system design decisions made and more of a framework put together, it would definitely be helpful to bring on additional help. I'm typically used to being a solo developer, but in the interest of time and feature coverage, it's probably a good idea to get additional hands on the project at some point.

@TeBF RE: Tinix
I thought about the same name, but unfortunately someone already has a Minix-derived system by that name. Oh well.

@gemintronic RE: Contiki
Don't worry about it. I looked into porting Contiki a while ago, and it just wasn't a good fit for the TI. A LOT of compromises were taken to make that OS work. It's a microkernel with stackless processes. The whole project has the feel of a demo program: impressive, but highly tuned for a single purpose. We have enough resources to make something with a more familiar feel and a lot more capability.

@Vorticon RE: user programming
Good question. For now I expect cross development on a PC to be the only way to go. I would eventually like to get self-hosted development working (C, assembly, Pascal, BASIC, etc.), but that's a ways off.

So last night I looked at the AMS card and started thinking about how best to use that resource. Specifically, how do we maximize memory for user processes? Even though we have up to 1MB of RAM available, we don't have a memory management unit (MMU) or facilities in the processor to detect reads or writes to unmapped space. We would need to add wrapper code to every read or write operation to guarantee that the intended memory is available before the operation takes place. This looked even worse as I tried to figure out a way to implement it. I looked at the source for uClinux, a variant of Linux which supports processors without MMUs. They resolve the problem by not allowing virtual memory or similar systems like what I was originally looking at. This wasn't too surprising, since I wasn't very far from making the same decision.

So here's the memory model for user processes:

  0x2000 - 0x3fff : Application code, 8KB paged
  0xa000 - 0xfdff : Application memory, 23.5 KB
  0xfe00 - 0xffff : Process stack, 512 bytes

If more memory is needed for data, we can use a floppy or a temp file residing in AMS space. This should be OK. I've already got infrastructure set up for handling paged kernel code, which should be extendable for user code. In my experience, there's usually about an 80/20 split between code and data. So for 24 KB of data, we would typically have a 120 KB executable file, and with a total of 1 MB of RAM available, we can simultaneously run about 8 of these executables. Alternatively, we should be able to run up to 128 lightweight processes (4 KB code, 4 KB data). Not too shabby.

So the next step for me is to get AMS mapping working, and to find a good way to do block management so the kernel and applications can share memory resources. Temp file management will probably come after that. Finally, I think application loaders will be good to work on. At that point, an alpha release doesn't seem so far away.
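The capacity math above can be spot-checked; a quick sketch of the arithmetic, assuming the 80/20 code/data split and whole-process granularity:

```python
ram_kb = 1024                  # 1 MB of AMS memory

# "heavy" process: 24 KB data; an 80/20 code/data split means code is 4x data
data_kb = 24
code_kb = data_kb * 4
exe_kb = code_kb + data_kb
assert exe_kb == 120           # the ~120 KB executable from the post
assert ram_kb // exe_kb == 8   # about 8 such processes fit in 1 MB

# lightweight process: 4 KB code + 4 KB data
assert ram_kb // (4 + 4) == 128
```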
  10. So I've mentioned on here a few times that I've been working on a new operating system for the TI. Development has gotten to the point where I think it's time to do some kind of announcement. Additionally, I've read the suggestion that something like Contiki should be ported one too many times. We can do better, folks.

I'm taking inspiration from Linux for several reasons:
  • It's clearly a successful design.
  • If it's similar enough, I might be able to port some code instead of writing everything from scratch.
  • Writing user programs will be easier, since there is no shortage of good references out there.

Keep in mind, I'm nowhere close to an alpha release, but I did want to make sure I got some kind of input that what I'm working on might be of interest to someone besides me. The design philosophy I've been following is that while TI wrote a bunch of neat code for the time, modern developers should only be limited by the constraints of the hardware, not by software decisions of the past. So that means backwards compatibility is out. Developing against raw hardware is in.

I'm assuming this as a base platform:
  • Console
  • 1MB AMS card
  • Two 96KB floppy drives
  • RS232 card

Here's the current feature list:
  • Kernel runs from a cart image
  • Device driver API
  • Drivers for: text screen, keyboard, floppy drive, /dev/null, /dev/zero, /dev/random
  • POSIX-compatible filesystem API
  • CFS filesystem (a simple filesystem similar to ext2)
  • Preemptive multitasking
  • Priority scheduling
  • Mutexes
  • Semaphores
  • Shared memory
  • Syscall API
  • Timers

Conspicuously missing features:
  • User shell
  • Use of AMS for process memory separation
  • Loading executables (EA5 and ELF)
  • Shell scripting
  • Some kind of catchy name

As of today, the kernel is about 12KB and has about 10,000 lines of code (not including libc or libgcc). The code will eventually be put up on GitHub after I clean it up a bit. I need to impose a coding standard and make sure all dead code has been removed properly.

So, what does everyone think of this project? Is this something that sounds interesting or possibly useful? Is there something I seem to have overlooked? Deep-seated opposition to the very concept? Let me know. I'm trying to work on documentation, but I suspect it will always lag the current code state. If there's some aspect of this thing someone would like more information on, I'll be happy to share. Thanks for listening.
  11. Oh yes, the TMS9900 is definitely "special", and at times really annoying. Anyway, there is a libgcc included with the gcc patches. To build it, run "make all-target-libgcc; make install". This will build the library and install it in a location GCC can find. It may be necessary to include "-lgcc" in your linker options. It's been a while since I've made a project that needed this, so I might be forgetting something. If you have any problems, let me know and I will try to help.
  12. Thanks for the replies everyone. From what I'm hearing, what I'm trying to do is possible, I just need to look harder for my bugs. Well, I was looking for a challenge...
  13. So I'm trying to write a tiny operating system for the TI, and I'd like to avoid as many of TI's original design decisions as possible. This is mostly for the challenge of it, and to see if whatever I can come up with is better than what the original engineers made. This would also let me make maximum use of the limited hardware resources.

My most basic problem is that I need to do my development on an emulator, since my actual hardware is packed away at the moment. So my first question is whether there are any TI-99/4A emulators which permit direct access to the FD1771 floppy controller. I'm pretty sure Classic99 does not do this, but I think MESS does.

I tried writing a really basic driver for the FD1771 and ran it in MESS. The problem I had there was that any attempt to read from the disk failed. I was able to get a lot of functions working properly, but when I tried to read a sector, I only got the first byte. After that I got "disk not ready" errors. I've been using Thierry Nouspikel's code as a starting point, along with dumps of the floppy controller firmware and the FD1771 datasheet.

That's a bunch of text. Here's the short version:
1) Does anyone know of an emulator that emulates the floppy controller hardware?
2) Has anyone come across code which directly interfaces with the floppy controller, bypassing the DSR?

I've included the code I'm using under the spoiler tag.
  14. Yep, sounds like a compiler bug. If you can point me to your source code, I'll take a look.
  15. There's a working implementation in the tms9900 binutils package in binutils-2.19.1/opcodes/tms9900-dis.c. Look for the print_insn_tms9900 function. This function does basically the same thing mizapf describes. Here's some pseudocode:

  index = (opcode >> 12) & 0x0F
  switch(index)
  {
  case 0:
    index = (opcode >> 8) & 0x0F
    switch(index)
    {
    case 0, 1, 12, 13, 14, 15:
      format[] = {"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""}
      break
    case 2, 3:
      index = (opcode >> 4) & 0x1F
      format[] = {"li",   "", "ai",   "", "andi", "", "ori",  "", "ci",   "",
                  "stwp", "", "stst", "", "lwpi", "", "limi", "", "",     "",
                  "idle", "", "rset", "", "rtwp", "", "ckon", "", "ckof", "",
                  "lrex", ""}
      break
    case 4, 5, 6, 7:
      index = (opcode >> 6) & 0x0F
      format[] = {"blwp", "b", "x", "clr", "neg", "inv", "inc", "inct",
                  "dec", "dect", "bl", "swpb", "seto", "abs", "", ""}
      break
    case 8, 9, 10, 11:
      format[] = {"", "", "", "", "", "", "", "", "sra", "srl", "sla", "src"}
      break
    }
    break
  case 1:
    index = (opcode >> 8) & 0x0F
    format[] = {"jmp", "jlt", "jle", "jeq", "jhe", "jgt", "jne", "jnc",
                "joc", "jno", "jl", "jh", "jop", "sbo", "sbz", "tb"}
    break
  case 2, 3:
    index = (opcode >> 10) & 0x07
    format[] = {"coc", "czc", "xor", "xop", "ldcr", "stcr", "mpy", "div"}
    break
  default:
    format[] = {"", "", "", "", "szc", "szcb", "s", "sb",
                "c", "cb", "a", "ab", "mov", "movb", "soc", "socb"}
    break
  }

  if(format[index] != "")
    decode as format[index]
  else
    invalid instruction

There's more to it than this, but it should give you somewhere to start. Good luck!
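A minimal runnable version of that dispatch, covering only the top-level cases (the 0x0xxx sub-switch is omitted, and the function name is mine):

```python
def decode_mnemonic(opcode):
    """Sketch of the print_insn_tms9900 dispatch: map a 16-bit opcode
    word to its mnemonic, or None if the pattern is invalid."""
    top = (opcode >> 12) & 0x0F
    if top == 1:            # jumps and single-bit CRU ops
        table = ["jmp", "jlt", "jle", "jeq", "jhe", "jgt", "jne", "jnc",
                 "joc", "jno", "jl", "jh", "jop", "sbo", "sbz", "tb"]
        return table[(opcode >> 8) & 0x0F]
    if top in (2, 3):       # two-operand ops with special fields
        table = ["coc", "czc", "xor", "xop", "ldcr", "stcr", "mpy", "div"]
        return table[(opcode >> 10) & 0x07]
    if top >= 4:            # general two-operand instructions
        table = ["", "", "", "", "szc", "szcb", "s", "sb",
                 "c", "cb", "a", "ab", "mov", "movb", "soc", "socb"]
        return table[top] or None
    return None             # the 0x0xxx cases are left out of this sketch
```

Spot checks against well-known encodings: `decode_mnemonic(0x1300)` gives "jeq", `decode_mnemonic(0x3800)` gives "mpy", and `decode_mnemonic(0xC000)` gives "mov".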