
Lots of Assembler Questions


kl99


Thank you again for your support. TIcode99 is starting to show off the first fruits.

It can create an HTML version of any assembler source file, with much-needed proper syntax highlighting.

If you open both the original .a99 file and the .html file of the same name in Mozilla Firefox, you should see the same content, using the same fonts and sizes.

ROM-4A_Acomplete.a99

ROM-4A_Acomplete.html

 

However the .html version features syntax highlighting and some tooltips when you hover. You can also click on any Symbol Reference to jump to the Definition in the code.

 

Here is a snapshot comparison:

post-27826-0-92030300-1518076694.png post-27826-0-82338800-1518076693.png

 

Now I am working on generating the proper address column to the left of each instruction.

ROM-4A_Acomplete - WithAddresses.html

However, I am not yet calculating the instruction byte length correctly for every type of instruction.

Does anyone have a good reference for this?


Hi Lee,

nice to hear from you. :)
I will. I still want to include more features, but nothing prevents an interim release of the set of files.

The hard part was making an Intermediate Representation of all the segments of an instruction.
Tweaking the html generation is now like putting sugar on the cake.

 

Thinking big, a later step would be to process the source code file the way the CPU does and perform the instructions, line by line, in a TI-99 simulation sandbox.

So instead of running a TI-99 simulator/emulator from a compiled binary for the system ROMs/GROMs, this simulator would run a TI-99 directly from the assembler source code (and later GPL source code).

The final goal would be to be able to debug the TI operating system in this simulator with a full context of what is available in the source code files, with labels, symbols, expressions, references, comments and all its things.


The simulator has reached the development stage. So far I am processing these instructions with success:

IDT
TITL
DEF
EQU
B
BL
MOV
INCT

 

I am ignoring DATA, BYTE and RORG so far.

 

The simulator is already jumping to KSCAN and Branch Linking to PUTSTK.
It feels a bit like Dr. Frankenstein bringing something to life.

I am eager to continue on this.

 

For now I am representing memory as an ArrayList, and each address in memory can hold an object instead of only a byte value.

This will slow it down, but the purpose is full debugging, not full speed.

Should >83E0 be set as the Workspace Register Pointer when the computer starts to execute the first line?

I assume the CPU starts with PC being at 0, right?

Does the actual CPU detect DATA/BYTE/TEXT statements, and for those only increase the PC and go on to the next instruction?


When you turn the computer on, it goes through the reset vector at address zero (hardware logic makes sure this happens). There you find the workspace and the address of the first instruction.

The CPU will execute everything it encounters as instructions. Data generating statements like DATA, TEXT etc. aren't there, once the program is assembled. Thus the CPU has no way of knowing if it's supposed to be instructions or data. That's up to the programmer to keep track of.

Instructions are one word long (always on even byte boundary). If instructions contain either a memory address or immediate data, they are two words long. If they contain two addresses, they are three words long. That's the max length for the TMS 9900.

So if you decode the instruction as an immediate one (Load Immediate, Compare Immediate etc.), it's two words long. If it has general addressing (MOVe, Add, Subtract etc.), then you have to look at the addressing mode bits. Register, register indirect and register indirect with auto-increment add no additional word, but symbolic and indexed do.
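As a rough illustration of the length rule described above, here is a minimal Python sketch. The function name `instruction_length` is made up for this example. It relies on the standard TMS 9900 opcode layout, in which addressing mode 2 (symbolic or indexed) is the only mode that appends an address word, and it makes no attempt to reject undefined opcodes.

```python
def instruction_length(word: int) -> int:
    """Length in bytes of the TMS 9900 instruction whose first word is `word`."""
    word &= 0xFFFF

    def extra(t: int) -> int:
        # Addressing mode 2 (symbolic or indexed) is followed by one address word.
        return 2 if t == 2 else 0

    ts = (word >> 4) & 0x3    # source addressing mode bits
    td = (word >> 10) & 0x3   # destination addressing mode bits

    if word >= 0x4000:            # two general operands: MOV, A, S, C, SOC, SZC and byte forms
        return 2 + extra(ts) + extra(td)
    if word >= 0x2000:            # COC, CZC, XOR, XOP, LDCR, STCR, MPY, DIV
        return 2 + extra(ts)
    if word >= 0x0800:            # shifts, jumps and single-bit CRU instructions
        return 2
    if word >= 0x0400:            # one general operand: B, BL, BLWP, CLR, INC, INCT, ...
        return 2 + extra(ts)
    if 0x0200 <= word < 0x0320:   # immediate group: LI, AI, ANDI, ORI, CI, LWPI, LIMI
        # STWP (>02A0) and STST (>02C0) sit in this range but take a register only.
        return 2 if (word & 0xFFE0) in (0x02A0, 0x02C0) else 4
    return 2                      # IDLE, RSET, RTWP, CKON, CKOF, LREX and undefined words


print(instruction_length(0xC120))  # MOV @addr,R4  -> 4
print(instruction_length(0xC820))  # MOV @a,@b     -> 6
print(instruction_length(0x05C4))  # INCT R4       -> 2
```

A source-level tool would normally derive the length from the parsed operands instead of the encoded word, but the same rule applies: count one extra word per symbolic or indexed operand and one for an immediate value.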


thanks. that was the missing piece in the puzzle.

0000 83E0 DATA >83E0 RESET vector
0002 0024 DATA >0024

 

0000 contains the Workspace Register Pointer

0002 contains the address of the first instruction

I always wondered how it got past those first data words, and why Classic99 doesn't stop when you set a breakpoint at 0000. It seems to abstract this hardware logic away.
Thank you a lot for this info.

I think I handle most of the lengths correctly. I still have to check the assembler directives and the DATA and TEXT statements to calculate their correct value.


Well, you don't run through 0000, so setting a breakpoint there will not accomplish anything. The hardware effectively does a BLWP @0000 when the computer is reset. Actually executing a BLWP @0000 instruction will accomplish the same thing, but the reset is then in software only. The hardware reset is normally carried through to all chips needing a physical reset signal when active.
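For a simulator, the reset described above boils down to reading two words instead of executing them. The following Python sketch shows that step only; the names `read_word` and `reset` are hypothetical, and the sketch leaves out the part of BLWP that saves the old WP, PC and status into R13 to R15 of the new workspace.

```python
def read_word(memory: bytearray, address: int) -> int:
    """Read a big-endian 16-bit word; the lowest address bit is ignored."""
    address &= 0xFFFE
    return (memory[address] << 8) | memory[address + 1]


def reset(memory: bytearray) -> dict:
    """Load WP and PC from the reset vector at >0000 and >0002."""
    return {"WP": read_word(memory, 0x0000), "PC": read_word(memory, 0x0002)}


memory = bytearray(0x10000)
# The two vector words from the ROM listing: DATA >83E0 and DATA >0024.
memory[0x0000:0x0004] = bytes([0x83, 0xE0, 0x00, 0x24])
cpu = reset(memory)
print(f"WP=>{cpu['WP']:04X} PC=>{cpu['PC']:04X}")  # WP=>83E0 PC=>0024
```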


Correct, the vector data is not executed, only read, so a breakpoint won't stop there. But the RESET (and, I think, LOAD) vectors are not checked for read breakpoints in Classic99, they're hard coded. I'll have to think about whether I hate that...technically (and practically) neither case is triggered by software.


At least it is clear to me now. Please don't take it the wrong way: Classic99 is a great help for debugging, and I need it to get this project anywhere.

Since the second word is the address of the first instruction, and it is a data statement, which I was ignoring so far, I missed the actual branch to >0024, and my simulator therefore took another path. ;)

I will fix it and check the correct "route".

 

The other 2 topics that popped up:

 

1. Byte versus Word instructions

 

The memory representation for now is an ArrayList of size 65536; however, I store the object at the target address when there is a MOV or similar instruction.

This is wrong. For word operations I either have to store the object at both byte addresses, with a flag marking the LSB, or reduce the array to 32768 words and handle byte operations specially.
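A minimal sketch of the second option (32768 word cells with byte accesses handled specially) could look like this in Python. The class name `WordMemory` and its methods are invented for illustration, and the cells hold plain integers here; storing a richer object per cell would follow the same addressing, but a byte write would then have to decide what happens to that object. On the TMS 9900 the byte at the even address is the most significant byte of the word.

```python
class WordMemory:
    """64 KiB address space stored as 32768 cells, one per 16-bit word."""

    def __init__(self) -> None:
        self.cells = [0] * 0x8000

    def read_word(self, address: int) -> int:
        # Odd addresses round down to the word that contains them.
        return self.cells[(address & 0xFFFF) >> 1]

    def write_word(self, address: int, value: int) -> None:
        self.cells[(address & 0xFFFF) >> 1] = value & 0xFFFF

    def read_byte(self, address: int) -> int:
        word = self.read_word(address)
        # Even address: high byte. Odd address: low byte.
        return word >> 8 if address % 2 == 0 else word & 0xFF

    def write_byte(self, address: int, value: int) -> None:
        word = self.read_word(address)
        if address % 2 == 0:
            word = ((value & 0xFF) << 8) | (word & 0x00FF)
        else:
            word = (word & 0xFF00) | (value & 0xFF)
        self.write_word(address, word)


mem = WordMemory()
mem.write_word(0x8372, 0x1234)
mem.write_byte(0x8373, 0xFF)        # replaces only the low byte
print(hex(mem.read_word(0x8372)))   # 0x12ff
```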

 

2. Object gets destroyed when modified

 

Since I want to provide the user with as much information as possible about what is in memory, I want to store the object in memory instead of only its value.

Example:

MOV @STKADD,R4

Let's assume R4 contains 0 (still initialized).

After this instruction I store an object representing the operand @STKADD in R4 (actually at the memory address that R4 occupies).

So far so good: instead of only seeing >8373 as the value in R4, the user will see that R4 contains STKADD (which means Stack Address, afaik).

SRL R4,8

However, the next instruction logically shifts the content of R4 to the right by 8 bits.

So I wondered: how do I avoid losing track of what is stored in R4 after processing this instruction?

The only solution is to start storing an Expression of objects in R4.

R4 contains an expression representing: SRL(@STKADD, 8 )

 

I could extend this expression whenever R4 is modified until R4 is freshly assigned with something that is not at all based on the current Value of R4.

 

This way the user will not only see the actual value the expression or parts of it is resolving to but a history of how the current value evolved.
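The expression idea can be sketched as a small tree whose nodes know both their value and how they were derived. This is only an assumption about how such a structure might look: the class names `Sym` and `Op` are hypothetical, only SRL is implemented, and the value >8373 is simply the one used in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """A leaf: a named operand together with its resolved 16-bit value."""
    name: str
    value: int

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Op:
    """An inner node: an operation applied to an earlier expression."""
    mnemonic: str
    operand: object   # a Sym or another Op
    count: int

    @property
    def value(self) -> int:
        if self.mnemonic == "SRL":   # logical shift right on a 16-bit word
            return (self.operand.value & 0xFFFF) >> self.count
        raise NotImplementedError(self.mnemonic)

    def __str__(self) -> str:
        return f"{self.mnemonic}({self.operand}, {self.count})"


r4 = Sym("@STKADD", 0x8373)    # after MOV @STKADD,R4
r4 = Op("SRL", r4, 8)          # after SRL R4,8
print(r4, "=", f">{r4.value:04X}")   # SRL(@STKADD, 8) = >0083
```

Assigning a fresh `Sym` to the register is what resets the history, which matches the rule of extending the expression only while the new value still depends on the old one.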

Edited by kl99

I think I got the automatic address number generation right, at least for these files I was not able to find a single offset from the address number shown in TI-99 Intern from Heiner Martin.

These 3 files represent the original commented source code by TI for Rom 0 (>0000 - >1FFF) for the TI-99/4A, now with added address numbers.

 

ROM-4A_Acomplete.html

ROM-4A_Bcomplete.html

ROM-4A_Ccomplete.html

 

@mizapf: I am not sure whether I understood it right? There is a big bug in the operating system?

 


I understand the thought. Even if it's interesting from a pedagogic view, what do you do when later R4 happens to be executed? Remember that in the TMS 9900 architecture, registers are just an arbitrary sequence of bytes in memory. You can execute them as instructions if you like. This is sometimes actually done on purpose, and sometimes by accident.

 

Regarding memory representation: The TMS 9900 is a 16-bit instruction CPU, which can address instructions on even addresses only. If you ask it to execute an odd address, it will reduce it to the nearest lower even address. But as data, you can address bytes at any address. Words only at even addresses, though.

 

Another issue I see with your "content" idea is memory-mapped devices. The content at the Video Display Processor Read Data address (VDPRD) will change by itself, and you will not know for sure what's in there. Well, the data you may know, but how it's derived depends on how the VDP is set up.


@mizapf: I am not sure whether I understood it right? There is a big bug in the operating system?

 

Sorry, that was a bit too cryptic of me. :) And I did not even think about it more deeply but just expressed my immediate thought, so don't take that comment as an educated conclusion.

 

The halting problem is a famous problem in theoretical computer science. It basically says that there is no machine (algorithm, program, rule set etc.) that can tell whether some given program will halt for a given input. This is an undecidable question.

 

Example:

 

 

void main(int c) {
   while (c!=42) { };
}

 

If you pass 42 to this program, it will halt; otherwise it will cycle forever. We as intelligent beings are able to immediately see that this program will halt for 42. The halting problem is formulated as: given a pair (program, input), decide whether the program will halt for the input. It is proven that this is impossible for a machine in the general case (and, since time does not matter in mathematics, forever in this universe).

 

This is not the scenario that you described. From a corollary to the halting problem, you can deduce that it is impossible for an algorithm to say what another program is doing. Or, in other words - and nota bene, in the general case, not in constrained areas - it is not possible for an algorithm to deduce the effects or the purposes of some portion of code. Informally, you cannot write a program that deduces the meaning of code (I always say: in the general case).

 

This is where I threw in my comment, since you seemed to try keeping track of the meaning of some value, e.g. as a stack address. For the computer, addresses are values. A "stack address" is already an interpretation. Suppose, as another example, you pass a value for an upper bound of a loop. This value will be decremented for every pass of the loop. There is no feasible way to keep the idea of "loop bound" throughout the computation.

 

I just wanted to make you aware that the issue that you just found may be the first one of an ugly long list of issues on that track. If you were able to solve it in full generality, you would possibly be able to decide the halting problem - which is impossible.


 


 

 

 

Thank you for this very clear explanation.

 

Brief sidetrack if possible: (I don't want to hijack the thread)

 

It occurs to me that the halting problem may be a better Turing test than "THE" Turing Test.

And... if a neural net machine was constructed that could decide that a C program like yours will not halt,

does that negate the proof of the halting problem being impossible for a machine?

 

Sunday morning thoughts


Another word of caution. It is attractive to draw conclusions from such theoretical theorems. For example, is it a proof of human mental superiority over machines that we can analyse programs, and the computer cannot? If we consider that human neurons are just a kind of biological circuit and the processing in the brain can be modeled as a massive parallel computation, does that mean that there is a transcendent soul?

 

The theory makes unrealistic assumptions, in particular that we have unlimited computing space and time. You can show that there is a solution to the halting problem when the space is bounded (because there is only a limited number of states in the machine). However, due to the tremendous amount of information in a typical computer, ranging up to terabytes, the number of states is so high that an enumeration would take more time than the universe has existed so far.

 

That is, for all realistic environments the problem reduces from impossible to infeasible.
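The bounded-space argument can be made concrete with a small Python sketch. It assumes a deterministic machine whose complete state fits in one hashable value; the names `halts`, `loop_until_42` and `count_up` are invented for this example. If the same state ever comes up twice, the machine is in a cycle and will never halt, and with finitely many states one of the two outcomes must occur.

```python
from typing import Callable, Hashable, Optional


def halts(step: Callable[[Hashable], Optional[Hashable]], state: Hashable) -> bool:
    """Decide halting for a deterministic machine with finitely many states.

    `step` returns the next state, or None once the machine has halted.
    """
    seen = set()
    while state is not None:
        if state in seen:      # a repeated state means an endless cycle
            return False
        seen.add(state)
        state = step(state)
    return True


def loop_until_42(c: int) -> Optional[int]:
    # Models the C example: the loop body never changes c.
    return None if c == 42 else c


def count_up(c: int) -> Optional[int]:
    # An 8-bit counter that stops when it reaches 42.
    return None if c == 42 else (c + 1) % 256


print(halts(loop_until_42, 42))  # True
print(halts(loop_until_42, 7))   # False
print(halts(count_up, 43))       # True, after wrapping around through 255
```

The `seen` set is exactly where the infeasibility shows up: for a real machine it would have to hold an astronomically large number of states.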

 

I don't want to discourage Klaus from continuing on his way. This was more like an educated reflex from me (if there is such a thing). Maybe Klaus can achieve what he wants because there is a clear scope for how much shall be representable in his environment. The thing to keep in mind is that the farther you reach out, the more likely it is that you end up in a jungle of issues and get stuck.


I worked a lot on the project on the weekend. It is really a great feeling you get when building something.

From having two routes for the project (the Simulator and the "Text/Html Compiler" using an Intermediate Representation), I introduced a third route on Sunday: a Comparer of two source files.

 

Let me summarize how it came about:

After being satisfied with the generated HTML files from the commented source code (ROM-4A_Acomplete.a99, ROM-4A_Bcomplete.a99, ROM-4A_Ccomplete.a99), I went back to my collection of other source files.

 

con4ar0.txt

The Cyc DVD features a very nice file called con4ar0.txt in the path \vendors\ti\internal\consrc, which also represents the source code for ROM 0 of the TI-99/4A, i.e. memory range >0000 - >1FFF.

So we are talking about a file that covers the same memory range and should basically compile to the same binary as the combination of the three files above.

Nevertheless, the symbol names often differ. And in many places ROM-4A_A.a99 only has the resolved value where con4ar0.txt has a nicely named symbol.

 

con4ar0.txt has a different but familiar-looking format and appears to be a listing output file from some TI-990 assembler.

The original source lines got truncated because they were indented by adding line numbers, memory addresses and compiled values to the left. That sadly cut off some of the valuable comments.

After playing a bit in Notepad++ I got the file back to what I assume was the original source format, so I was able to run it through the Text/Html Compiler.

 

The Text Compiler is meant to recreate a perfect copy of the scanned file from the Intermediate Representation, even though the various strings (">1000", ">0C", "124", "R4") are read in as proper objects with a value instead of a (not-so-intelligent) string representation.

It takes leading zeros into account, stores whether a number was in hexadecimal or decimal format in the original file, and preserves the spacing and positioning of label, opcode, operands and comments.
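One way such a round trip might work is sketched below in Python: the token keeps the numeric value plus just enough of the original spelling to print it back unchanged. `NumberToken` and its fields are hypothetical names, not TIcode99's actual classes, and the sketch assumes uppercase hex digits and no sign.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberToken:
    value: int     # the resolved number
    is_hex: bool   # True if the source used the ">" prefix
    digits: int    # digit count in the source, including leading zeros

    @classmethod
    def parse(cls, text: str) -> "NumberToken":
        if text.startswith(">"):
            return cls(int(text[1:], 16), True, len(text) - 1)
        return cls(int(text, 10), False, len(text))

    def render(self) -> str:
        """Reproduce the original spelling from the stored attributes."""
        if self.is_hex:
            return ">" + format(self.value, "X").zfill(self.digits)
        return str(self.value).zfill(self.digits)


for text in (">1000", ">0C", "124"):
    token = NumberToken.parse(text)
    print(text, token.value, token.render() == text)
```

Because the value and the formatting are stored separately, a later rendering pass can change one without touching the other.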

 

con4ar0.txt brought some minor challenges:

+ last line in text file doesn't end with line break

+ spaces after the last non-whitespace Character in the line

+ first use of BYTE and EVEN opcodes and the requirements of their effect on the Memory address line.

 

After taking care of those, the Text Compiler recreated a .txt file that was identical to the scanned con4ar0.txt as well.

The Html Compiler created a good-looking file with the memory addresses matching the ones from TI Intern, at least on all occasions I have checked.

 

After starting a manual comparison of ROM-4A_A.a99 with con4ar0.txt, I immediately saw the need to introduce some rendering options to allow a proper comparison:

 

+ Option to turn off keeping leading zero digits in numbers

+ Settings how to render Workspace Registers, whether always use R as Prefix or never or keep the original way from the scanned line.

+ Settings how to render Numbers, whether to always render as Hexadecimal or always as Decimal or keep the original way from the scanned line.

+ Option to not render Empty Lines

+ Option to not render Comment Lines

+ Option to not render Comments

+ Option to Merge lines where Labels are defined in a dedicated line, with the actual opcode following in the line below, to one line only. (tricky one)

 

It required some intensive changes, but the Text/Html Compiler can now recreate the Text/Html file based on the given rendering options.

The two source code files already look a lot more comparable with the right set of rendering options.
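To illustrate how a few of the rendering options listed above could be applied at output time, here is a hedged Python sketch. `RenderOptions`, `render_register` and `render_number` are hypothetical names, and only the register-prefix, radix and leading-zero options are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderOptions:
    register_prefix: str = "keep"    # "keep", "always" or "never"
    number_radix: str = "keep"       # "keep", "hex" or "dec"
    keep_leading_zeros: bool = True


def render_register(number: int, had_prefix: bool, opts: RenderOptions) -> str:
    """Render a workspace register as R4 or 4, depending on the options."""
    if opts.register_prefix == "keep":
        use_prefix = had_prefix
    else:
        use_prefix = opts.register_prefix == "always"
    return f"R{number}" if use_prefix else str(number)


def render_number(value: int, was_hex: bool, digits: int, opts: RenderOptions) -> str:
    """Render a number in the requested radix, optionally dropping leading zeros."""
    as_hex = was_hex if opts.number_radix == "keep" else opts.number_radix == "hex"
    # The original width only makes sense if the radix did not change.
    width = digits if (opts.keep_leading_zeros and as_hex == was_hex) else 1
    body = format(value, "X") if as_hex else str(value)
    return (">" if as_hex else "") + body.zfill(width)


compare = RenderOptions(register_prefix="never", number_radix="hex",
                        keep_leading_zeros=False)
print(render_register(4, True, compare))      # 4
print(render_number(12, True, 2, compare))    # >C
print(render_number(124, False, 3, compare))  # >7C
```

Rendering both files with the same options object is what makes their text directly comparable.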

 

The last thing I am currently working on, which I hope clears out most of the remaining diffs, is an option to resolve symbolic references to their actual values.

Besides those, I only see that one file might use DATA to define a word while the other uses BYTE twice to define the same thing. So this might also be something to integrate into the rendering options.

Edited by kl99

Whether you used DATA or BYTE, or perhaps TEXT, doesn't matter to the final result. It's just a question about how the programmer thinks about the various data pieces, when he designs the program.

 

Opcodes are normally binary data which the CPU can decode and execute as instructions. Mnemonics like CI, MOVB and MPY are representations of these opcodes, in a form that is easier for humans to remember. But BYTE and EVEN aren't opcodes. They don't represent instructions the CPU can execute. Rather, they are assembler directives, controlling how the assembler generates its output.

 

Symbolic values can be absolute or relative. At assembly time, only absolute values can be computed. The relative ones aren't actually known until the program is loaded into memory, and may be different from one instance to another.

 

Consider this example

     .PROC START,1 
SP   .EQU 10
     MOV  R5,53
     MOV *SP+,R2

BOO   CI   R2,562
      JHE  BAA
      A    R7,R2
      INC  R4
      JMP  BOO

BAA   MPY  R2,R9

LAB1  .EQU BAA-BOO
LAB2  .EQU LAB1+START

This is a piece of relocatable code. It can be loaded anywhere in memory by the loader.

Here, LAB1 can always be calculated, since that's the difference between two positions in the same code segment. That distance is always the same.

But LAB2 can't be calculated until the code is actually loaded in memory, as the real value of START isn't known until load time.
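The difference between LAB1 and LAB2 can be modelled by tagging each value as absolute or relocatable. The Python sketch below is an illustration under assumptions: the class `Value` is hypothetical, and the offsets chosen for BOO and BAA are made up, except that their distance of 12 bytes matches the five instructions between the two labels (CI with its immediate word is 4 bytes, the other four are 2 bytes each).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    offset: int
    relocatable: bool = False   # True: offset is relative to the load address

    def __sub__(self, other: "Value") -> "Value":
        if self.relocatable and other.relocatable:
            # Two positions in the same segment: the load address cancels out.
            return Value(self.offset - other.offset, False)
        if other.relocatable:
            raise ValueError("absolute minus relocatable is not meaningful")
        return Value(self.offset - other.offset, self.relocatable)

    def __add__(self, other: "Value") -> "Value":
        if self.relocatable and other.relocatable:
            raise ValueError("cannot add two relocatable values")
        return Value(self.offset + other.offset, self.relocatable or other.relocatable)

    def resolve(self, load_address: int) -> int:
        base = load_address if self.relocatable else 0
        return (base + self.offset) & 0xFFFF


START = Value(0, relocatable=True)
BOO = Value(6, relocatable=True)     # assumed offset
BAA = Value(18, relocatable=True)    # assumed offset, 12 bytes after BOO

LAB1 = BAA - BOO      # absolute: known at assembly time
LAB2 = LAB1 + START   # relocatable: needs the load address
print(LAB1, hex(LAB2.resolve(0xA000)))
```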

 

I realized just now that, by habit, I used the syntax of the p-system's assembler, but you get the idea anyway. The .PROC directive declares an assembly procedure, here with one parameter. It is similar to the DEF directive in the E/A assembler.


Whether you used DATA or BYTE, or perhaps TEXT, doesn't matter to the final result. It's just a question about how the programmer thinks about the various data pieces, when he designs the program.

 

This is not about executing such statements, but about aligning the representation of the data with the given rendering options when recreating a text or HTML source file.

 

The user of TIcode99 should be able to select:

- I want to keep the opcode that was used in the original source lines.

- I want to render always as DATA

- I want to render always as BYTE

 

This way I can eliminate the false alarm of a difference when comparing two source code files, by aligning their render mode to either DATA or BYTE.
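A sketch of such a DATA/BYTE alignment in Python follows. The function names `normalize` and `render` are invented here, and the sketch makes two simplifying assumptions: an odd number of bytes is left as BYTE, and word alignment is ignored (DATA places its words on an even address, so a real converter would need to check the address before merging bytes).

```python
def normalize(directive: str, values: list, mode: str = "keep"):
    """Convert a DATA or BYTE statement to the requested mode where possible."""
    if mode == "BYTE" and directive == "DATA":
        split = []
        for word in values:
            split += [(word >> 8) & 0xFF, word & 0xFF]   # high byte first
        return "BYTE", split
    if mode == "DATA" and directive == "BYTE" and len(values) % 2 == 0:
        pairs = zip(values[0::2], values[1::2])
        return "DATA", [(hi << 8) | lo for hi, lo in pairs]
    return directive, list(values)   # "keep", or a conversion that does not fit


def render(directive: str, values: list) -> str:
    width = 4 if directive == "DATA" else 2
    return directive + " " + ",".join(">" + format(v, "X").zfill(width) for v in values)


print(render(*normalize("DATA", [0x83E0], "BYTE")))        # BYTE >83,>E0
print(render(*normalize("BYTE", [0x83, 0xE0], "DATA")))    # DATA >83E0
```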

 

That is the whole purpose of those rendering Options.

 

The user can run his original source code file through TIcode99 with the option: render all numbers as hexadecimal with 4 digits.

And voilà, he gets a new file where all the decimal numbers are suddenly hexadecimal and use 4 digits.

Same for "R" Prefix usage of Workspace Registers.

And that will be true for all the other Rendering Options.

 

With the Intermediate Representation of the Source Code File I can do a lot of things, and that is why there are currently multiple ways to use that:

- start a TI Simulator

- Recreate the Source Code File given the provided Rendering Options

- Create an Html Version given the provided Rendering Options using syntax highlighting on a deep level and tooltips.

- Compare two source code files, with options for which things to ignore in that comparison.

 

This list is not final and I could think of many more useful side-products:

- Source Code Editor / Application to develop in an Application with syntax Highlighting and with much knowledge on the meaning of the code

- Compile into Tms9900 Object Code

- sort of High Level Assembler Source Code that is generated from Assembler Source Code file, and compiles back to it

...


...

+ Settings how to render Workspace Registers, whether always use R as Prefix or never or keep the original way from the scanned line.

...

 

As indicated before by @mizapf (I think), prefixing register numbers with ‘R’ is arbitrary and must be effected with EQUates. Though the ‘R’ prefix is a widespread mnemonic, it is certainly not universal. TI’s TI Forth developers, for example, used ‘TEMP’ as the prefix for the first eight Forth system registers (‘TEMP0’ – ‘TEMP7’) and non-numeric mnemonics for the remaining eight (‘U’, ‘SP’, ‘W’, ‘LINK’, ‘CRU’, ‘IP’, ‘R’, ‘NEXT’). They vacillated in the use of prefixes for naming the registers for other workspaces—probably reflecting different programmers. They often just used the register numbers—which is what you would get for the E/A Assembler listing by not choosing the ‘R’ option.

 

Though I certainly continued the use of unique register names for the system registers in fbForth, I did change all of the occurrences of ‘TEMP’ to ‘R’. Even retaining the mnemonic names for Forth registers 8 – 15 is fraught with danger for the programmer (me in this case) because it can make debugging difficult when a register has the wrong name, e.g., it has a name indicating it is a Forth system register, but, in fact, it is referencing a different workspace. I must admit that has tripped me up more than once. This is one of the dangers @mizapf mentioned of attaching meaning to code.

 

All of that said, I am enjoying reading of your progress with this ambitious project. Please, carry on. :)

 

...lee


After a long day in the office, a question:

The left side shows the EA manual syntax definition, while the right side shows the Definition from the 990/9900 Assembly Language Reference Manual.

 

[<label>] b XOP b <gas>,<xop> b [<comment>] different: [<label>] b XOP b <gas>,<cnt> b [<comment>]

 

[<label>] b MOVB b (gas),(gad) b [<comment>] different: [<label>] b MOVB <gas>,<gad> b [<comment>]
[<label>] b STST b (wa) b [<comment>] different: [<label>] b STST b <wa> b [<comment>]
[<label>] b STWP b (wa) b [<comment>] different: [<label>] b STWP b <wa> b [<comment>]
[<label>] b SWPB b (gas) b [<comment>] different: [<label>] b SWPB b <gas> b [<comment>]

[<label>] b ANDI b (wa),(iop) b [<comment>] different: [<label>] b ANDI b <wa>,<iop> b [<comment>]
[<label>] b ORI b (wa),(iop) b [<comment>] different: [<label>] b ORI b <wa>,<iop> b [<comment>]
[<label>] b XOR b (gas),(wad) b [<comment>] different: [<label>] b XOR b <gas>,<wad> b [<comment>]
[<label>] b INV b (gas) b [<comment>] different: [<label>] b INV b <gas> b [<comment>]
[<label>] b CLR b (gas) b [<comment>] different: [<label>] b CLR b <gas> b [<comment>]
[<label>] b SETO b (gas) b [<comment>] different: [<label>] b SETO b <gas> b [<comment>]
[<label>] b SOC b (gas),(gad) b [<comment>] different: [<label>] b SOC b <gas>,<gad> b [<comment>]
[<label>] b SOCB b (gas),(gad) b [<comment>] different: [<label>] b SOCB b <gas>,<gad> b [<comment>]
[<label>] b SZC b (gas),(gad) b [<comment>] different: [<label>] b SZC b <gas>,<gad> b [<comment>]
[<label>] b SZCB b (gas),(gad) b [<comment>] different: [<label>] b SZCB b <gas>,<gad> b [<comment>]

[<label>] b SRA b (wa),(scnt) b [<comment>] different: [<label>] b SRA b <wa>,<scnt> b [<comment>]
[<label>] b SRL b (wa),(scnt) b [<comment>] different: [<label>] b SRL b <wa>,<scnt> b [<comment>]
[<label>] b SLA b (wa),(scnt) b [<comment>] different: [<label>] b SLA b <wa>,<scnt> b [<comment>]
[<label>] b SRC b (wa),(scnt) b [<comment>] different: [<label>] b SRC b <wa>,<scnt> b [<comment>]

 

Is there any difference between <gas> and (gas) ?

Somewhere the manual says that () indicates “the contents of”.

Does this refer to the syntax definitions at all, or to something else?

 

I am currently trying to understand how to resolve all these expressions of operands that use Symbols to their actual value.

It seems sometimes they refer to the memory address at which the label is defined.

And sometimes the operand refers not to that memory address but to the EQU value.

Still lots to learn it seems :)


Interesting ... I never noticed that. The parentheses are only used for the semantic definition (execution results), while the angle brackets occur in the syntax definition. Thus, the parentheses in the syntax definition are not correct in my opinion.

 

Since I always use the documents from The Cyc, I initially thought it was a mistake in those only, but it is like this in the original printed manual.

Then I thought the round parentheses might have a different meaning that would help me understand the resolving better, but it seems not.

At least I will ask Mike Wright to streamline it in their PHM 3055 Editor Assembler Manual.

I also want the credits for the program author and the manual author to be corrected with regard to Susan Jean Bailey in his edition.

From talking with her on Facebook and from her postings on Facebook TI99ers, it seems she got overruled a lot back in the day in a male-dominated TI company.

 

Also, I have discovered some mistakes in TI Intern; once my Comparer is developed, I will have a complete list of its source code mistakes.

 

The days at work are intense this week, so I will not get much time to work on TIcode99.

 
