VBXE speed


64 replies to this topic

#26 Rybags OFFLINE  

Rybags

    Quadrunner

  • 14,971 posts
  • Location:Australia

Posted Wed May 10, 2017 8:08 PM

Yes - incredible savings potential.  Generally source data will be in some sort of array anyhow so easily picked/placed by a blit whereas with the CPU it's lots of work since the target address will be +1 a few times then +21 from the first one for the next BCB.  For several objects not a problem but I imagine moving polygons like you have would be into the hundreds.



#27 phaeron OFFLINE  

phaeron

    River Patroller

  • 2,196 posts
  • Location:USA

Posted Wed May 10, 2017 9:39 PM

There is currently an emulation limitation in Altirra where it will only "flush" blit lists at the end of a scanline. For short blits like this, this is slower than the real hardware, which does not have this artifact. However, it's still the case that you will get much better performance with blit lists instead of individual blits. You don't want to use the CPU to copy the left and right edges into the blit list -- put your ledge and redge lists in VRAM and let the blitter do the copies. This is a prerequisite anyway for doing the edge stepping on the blitter.



#28 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Wed May 10, 2017 10:31 PM

Thanks guys. We had wondered why Altirra showed such yellow bands that looked like they were aligned and synced somehow.

Phaeron, how accurate would you say the VBXE emulation in Altirra is?

#29 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Wed May 10, 2017 10:36 PM

Right now the CPU builds the edge lists (thanks to the blitter not having fractional steps ;)), which are aligned in the VRAM window.

Then two copy BCBs copy the right edge and the color into the scanline BCB list.

Unfortunately, as the blitter has no second source channel, the CPU calculates the sizex for the span length and writes it into the blit list.

Then VBXE blits the poly.

I don't see how to do that with the blitter....

#30 Rybags OFFLINE  

Rybags

    Quadrunner

  • 14,971 posts
  • Location:Australia

Posted Wed May 10, 2017 11:35 PM

Yeah, the blitter has its limitations.  It's got the basic add function but without carry it's not greatly useful.

 

Are you using "constant source data"?  Refer to fx1.24.pdf, page 37:

 


The Blitter and constant source data
If the result of the following equation:
(blt_and_mask==0)
is true, then the source data is CONSTANT – it is independent from the source area and
its value is equal to blt_xor_mask. The Blitter will skip the phase of fetching the source
data, and the entire operation will be performed quicker. Filling VRAM with a constant
value is twice as fast as copying.

 

Set the AND mask in the BCB to 00 which instructs the blit to not fetch source data, instead whatever is in the XOR mask is used as the fill data.
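
A rough model of that rule in Python, as a sketch only. The function name `blit_copy` and the flat `bytearray` standing in for VRAM are assumptions made for illustration, not VBXE register names. It applies the AND mask and then the XOR mask to each source byte, and skips the source fetch entirely when the AND mask is 0.

```python
def blit_copy(vram, src, dst, count, and_mask=0xFF, xor_mask=0x00):
    """Copy-mode blit over a flat VRAM model; returns the number of source reads."""
    reads = 0
    for i in range(count):
        if and_mask == 0:
            # Constant source: no fetch, the data is simply the XOR mask.
            data = xor_mask
        else:
            data = (vram[src + i] & and_mask) ^ xor_mask
            reads += 1
        vram[dst + i] = data
    return reads


vram = bytearray(range(16)) + bytearray(16)

# Plain copy: every byte needs a source read.
copy_reads = blit_copy(vram, src=0, dst=16, count=8)

# Constant fill: AND mask 0, fill value in the XOR mask, no source reads.
fill_reads = blit_copy(vram, src=0, dst=24, count=8,
                       and_mask=0x00, xor_mask=0x2A)
```

In this model the fill does half the memory accesses of the copy, which matches the "twice as fast" statement in the quoted manual text.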



#31 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Thu May 11, 2017 1:01 AM

I simply copied the data from the edge buffer into the blit list with a standard copy.

The colors of the spans are filled in with XOR.

Not sure if it makes sense to blit constant array values into the XOR mask and then blit in burst mode into the blit list.

Sounds like a stupid operation ;) but if that works.

#32 Rybags OFFLINE  

Rybags

    Quadrunner

  • 14,971 posts
  • Location:Australia

Posted Thu May 11, 2017 1:18 AM

Do you mean XOR as in to force the fast fill mode... not XOR as in XOR to plot then second XOR to unplot?

 

I imagine doing it that way would be very slow... fastest method would probably be to just have a single fill/erase blit that wipes all possible memory that the polygons can occupy each time.



#33 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Thu May 11, 2017 1:43 AM

I mean AND #0 EOR #tablevalue.

 

I need to copy several values into the blit list: start position, face color, and span size.



#34 phaeron OFFLINE  

phaeron

    River Patroller

  • 2,196 posts
  • Location:USA

Posted Thu May 11, 2017 10:19 PM

Heaven/TQA wrote: "Phaeron, how accurate would you say the VBXE emulation in Altirra is?"

 

Probably about 90%. Attribute map is probably the biggest issue as attribute map collision is not implemented and attribute map cells narrower than 8 pixels do not work authentically -- they are clamped to 8 pixels wide instead of rendering narrower and then running out of data. MEMAC, overlay, and blitter should be feature complete. The emulation is not cycle exact and you will encounter small differences if you attempt to race the beam very tightly. Also, MEMAC cycles are not counted against the blitter.

 

Another thing to keep in mind is that the emulator emulates core version 1.24. The current version is 1.26 and there have been some changes to overlay priority. Since there are multiple versions in the wild, you will probably want to try to work on both and maybe even 1.09 as well.

 

Heaven/TQA wrote: "Right now the CPU builds the edge lists (thanks to the blitter not having fractional steps ;)), which are aligned in the VRAM window. Then two copy BCBs copy the right edge and the color into the scanline BCB list. Unfortunately, as the blitter has no second source channel, the CPU calculates the sizex for the span length and writes it into the blit list. Then VBXE blits the poly. I don't see how to do that with the blitter...."

 

Assuming you always have left <= right, use one blit to copy right into width, a second blit to add left into it with XOR $FF, and a third blit to add constant $01. A + (B XOR $FF) + 1 = A - B and the three blits cost 8 VBXE cycles per entry.
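
The three-blit subtraction can be checked with a small Python model. This is a sketch under assumptions: the `blit` helper and its parameters are made up for illustration, and the lists stand in for byte arrays in VRAM. Each pass applies the AND mask, then the XOR mask, and either copies the result or adds it to the destination modulo 256 (add without carry).

```python
def blit(dst, src=None, add=False, and_mask=0xFF, xor_mask=0x00):
    """One blit pass over byte lists: copy mode, or add mode without carry."""
    for i in range(len(dst)):
        fetched = src[i] if src is not None else 0
        data = (fetched & and_mask) ^ xor_mask
        dst[i] = (dst[i] + data) & 0xFF if add else data


left = [10, 0, 200, 37]
right = [20, 5, 200, 255]
width = [0] * len(left)

blit(width, right)                                # 1: width = right
blit(width, left, add=True, xor_mask=0xFF)        # 2: width += left XOR $FF
blit(width, add=True, and_mask=0x00, xor_mask=1)  # 3: width += constant $01
```

After the three passes `width` holds right minus left for every entry, which is the two's complement identity from the post applied one byte at a time.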

 

Add with carry CAN be done in a rather expensive way: create a 64K lookup table and do one blit per operation to look up the result from the two bytes. I think it's also possible to do it with 7-bit math instead of 8-bit math, by using the 8th bit as a carry bit. It can be shifted down by ANDing with $80 during a stencil blit on top of $7F, leaving either $7F or $80. It takes a lot of steps to do all this, but keep in mind that given a big enough blit the blitter is still more than 20x faster than the 6502.
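
A sketch of the 7-bit idea in Python, under stated assumptions. The helper names are hypothetical, the stencil pass is modeled as "masked source bytes that are zero leave the destination alone", and the final add of $81 is one assumed way to turn $7F/$80 into 0/1 that the post does not spell out.

```python
def stencil(dst, src, and_mask=0xFF):
    """Stencil-style pass: zero (after masking) source bytes leave dst untouched."""
    for i, s in enumerate(src):
        data = s & and_mask
        if data:
            dst[i] = data


def add_const(dst, value):
    """Add a constant to every byte, without carry."""
    for i in range(len(dst)):
        dst[i] = (dst[i] + value) & 0xFF


a = [100, 5, 127, 64]    # operands restricted to 7 bits (0..127)
b = [100, 6, 127, 63]
total = [(x + y) & 0xFF for x, y in zip(a, b)]   # bit 7 now acts as the carry

carry = [0x7F] * len(total)
stencil(carry, total, and_mask=0x80)  # leaves $7F (no carry) or $80 (carry)
add_const(carry, 0x81)                # assumed extra step: $7F -> 0, $80 -> 1
low7 = [t & 0x7F for t in total]      # low 7 bits of each sum
```

Each sum is then `carry * 128 + low7`, so the carry list can be added into the next 7-bit digit with another add pass.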



#35 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Thu May 11, 2017 10:42 PM

As I am not a maths guy.... I thought combinations of all those logic operations might be used to do some more complex maths... ;)

Thx.

#36 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Tue May 16, 2017 3:01 PM

Ok... another issue regarding stepx,y esp source.

The docs are somewhat misleading.

Assume a texture line 0-4095.

And I want to scale that on a size of maybe 64.

Which would mean a stepx of 4096/64 = 256

But stepx works in ranges -128 to 127.

And stepy gets added after each line.

My sizex is 64.

#37 phaeron OFFLINE  

phaeron

    River Patroller

  • 2,196 posts
  • Location:USA

Posted Tue May 16, 2017 3:15 PM

Yes, if you are trying to downscale a 4096x1 image to 64x1, the X step is too large to fit. What you can do is interpret it as 1x4096 and use Y step instead. Both the X step and Y step are controllable, so you are not required to have X and Y match your actual X and Y in the bitmap.

 

Using the blitter step to scale will only get you integer factors, though, so it's not going to work for texture mapping if that's what you're thinking.
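
The X/Y swap can be illustrated with a small address model in Python. This is a sketch, not the hardware's exact behavior: `source_offsets` is a made-up helper, the sizes are plain counts rather than register values, and only the signed 8-bit limit on the X step is enforced. The step of 256 is just an example of a value that does not fit in the X step.

```python
def source_offsets(size_x, size_y, step_x, step_y):
    """Offsets from the source start that a stepped blit would read, row by row."""
    if not -128 <= step_x <= 127:
        raise ValueError("step_x must fit in a signed 8-bit value")
    return [y * step_y + x * step_x
            for y in range(size_y)
            for x in range(size_x)]


# Treat the strip as 1 pixel wide and 16 rows tall: the large step goes in Y.
tall = source_offsets(size_x=1, size_y=16, step_x=1, step_y=256)

# The same sampling as a single wide row is rejected: 256 overflows step_x.
try:
    source_offsets(size_x=16, size_y=1, step_x=256, step_y=0)
    wide_ok = True
except ValueError:
    wide_ok = False
```

Presumably the destination would then use a Y step of 1 so that the sampled pixels land on consecutive bytes. The result is still an integer decimation, as noted above.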



#38 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Tue May 16, 2017 9:52 PM

Well.... you can have the fraction bits inside the texture.... I'm thinking of duplicating each pixel 256x to gain an 8-bit fraction?
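
A quick Python sketch of the replication idea, with hypothetical helper names and a tiny 4-texel texture as made-up data. Once every texel is repeated 256 times, an integer step of N over the replicated data behaves like a step of N/256 texels.

```python
def replicate(texture, factor):
    """Repeat every texel `factor` times in a row."""
    return [t for t in texture for _ in range(factor)]


def sample(data, step, count):
    """Read `count` values, advancing by an integer `step` each time."""
    return [data[i * step] for i in range(count)]


texture = [10, 20, 30, 40]
wide = replicate(texture, 256)      # 4 texels become 1024 bytes

stretched = sample(wide, 128, 8)    # 128/256 = 0.5 texel per output pixel
shrunk = sample(wide, 384, 3)       # 384/256 = 1.5 texels per output pixel
```

The cost is memory: the replicated data is 256 times the size of the original, and steps above 127 would need the Y step rather than the X step.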

#39 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Tue May 16, 2017 9:57 PM

Phaeron... when are those offsets added? After each pixel? When blitting one line (sizex=64 and sizey=0), why does it seem to add the source stepx and stepy to the source start? (Luckily it does, but....)

#40 Rybags OFFLINE  

Rybags

    Quadrunner

  • 14,971 posts
  • Location:Australia

Posted Tue May 16, 2017 10:36 PM

Have to wonder though - would just having "zoom tables" be more memory efficient than replicating an object massively oversized?

 

I guess a zoom table for a 100 pixel wide object could potentially be 10,000 entries or make it 20,000 if you want to represent each size from 1 pixel to 200 wide.

Doing it the 256 per pixel way to allow that method of pick and place would come to 25,600 bytes.

I suppose it comes down to what zoom factors, is there more emphasis on enlarge or reduce mode?  The advantage of a zoom table might be that you only need one and it can be used for multiple objects, though you'd probably need to align each object on some address boundary.
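
One way a zoom table could be built, sketched in Python. The layout (one list of source columns per target width) and the name `zoom_table` are assumptions for illustration, not something from the VBXE docs.

```python
def zoom_table(src_width, max_width):
    """For each target width 1..max_width, the source column for each output pixel."""
    return {w: [x * src_width // w for x in range(w)]
            for w in range(1, max_width + 1)}


table = zoom_table(100, 200)

# Total entries if each row stores only the pixels it actually needs.
entries = sum(len(row) for row in table.values())
```

Storing only the used entries gives 20,100 for widths 1 to 200, close to the 20,000 estimate above, and the same table serves any object that is 100 pixels wide.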

 

Another possibility - with graphics card textures on PCs, not sure if it's still done - they keep multiple copies of each texture, each one 50% reduced from the previous.  Once the displayed size drops to 50% it starts using the next reduced size.


Edited by Rybags, Tue May 16, 2017 10:37 PM.


#41 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Wed May 17, 2017 1:41 AM

Applying zoom... does it make the blit slower or faster?

 

I am not playing with the idea of texture mapping yet... :) just trying to "map" a 1D texture, so to say, and trying to find a way to use the 12-bit step y to get the "fraction" ability somehow.



#42 Rybags OFFLINE  

Rybags

    Quadrunner

  • 14,971 posts
  • Location:Australia

Posted Wed May 17, 2017 2:13 AM

Native VBXE zoom horizontally is faster than a normal copy operation since you have N replications where only the first one needs the source read.

As Avery said earlier, Y zoom has no such advantage since the data can't be buffered so has to be re-read each time.



#43 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Wed May 17, 2017 2:39 AM

ah ok... so that's true with zoom-x?



#44 Rybags OFFLINE  

Rybags

    Quadrunner

  • 14,971 posts
  • Location:Australia

Posted Wed May 17, 2017 4:23 AM

Zoom-X is easy since it's doing the copy/fill operation in a linear fashion and the "current" value is being repeated.

Zoom-Y not quite so since the blits are usually in a raster fashion so after pixel (0,0) is copied, pixel (0,1)  might be after another hundred or more have been moved.

 

Since Step_X only allows a signed 8-bit value, it isn't really feasible to use the blit in a "sideways" mode.



#45 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Wed May 17, 2017 4:41 AM

Yeah... the only use case for stepx that comes to mind is flipping sprites...



#46 Rybags OFFLINE  

Rybags

    Quadrunner

  • 14,971 posts
  • Location:Australia

Posted Wed May 17, 2017 6:13 AM

Step_x has a use case for size reduction, but it again comes down to lack of any fraction ability.  So in a real situation you'd need that replicated source data to allow a decent variety of sizes.



#47 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Wed May 17, 2017 6:53 AM

Any ideas why Electron did not implement fractions? The ST blitter has them, the Amiga blitter has them... the Lynx blitter has them...



#48 Rybags OFFLINE  

Rybags

    Quadrunner

  • 14,971 posts
  • Location:Australia

Posted Wed May 17, 2017 8:21 AM

ST and Amiga have them?  I didn't think so.  Can't say I've ever seen stuff on either that uses the sort of variable sizing that would allow.

 

As for why it's not implemented, I suppose it wouldn't fit.  Though the simple fractional stuff, where it's just a phase-accumulator type thing like SID uses, would be fairly cheap.
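
For reference, a phase accumulator of that kind is only a few lines. This Python sketch assumes a 16-bit accumulator in 8.8 fixed point and shows what a hypothetical fractional step could look like; it is not something the VBXE blitter does.

```python
def phase_accumulate(step_8_8, count):
    """Source indices produced by an 8.8 fixed-point phase accumulator."""
    acc = 0
    out = []
    for _ in range(count):
        out.append(acc >> 8)              # integer part selects the pixel
        acc = (acc + step_8_8) & 0xFFFF   # 16-bit accumulator wraps around
    return out


half = phase_accumulate(0x0080, 6)          # step 0.5: each pixel used twice
one_and_half = phase_accumulate(0x0180, 4)  # step 1.5: skips every third pixel
```

In hardware this amounts to one adder and one register per axis, which is why it would be cheap compared with replicated source data or lookup tables.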

Plenty of things VBXE didn't get that we want... if I had the skills I'd do a core that sacrificed a few display features to give more coprocessing type aids.



#49 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Wed May 17, 2017 8:25 AM

Word-size increments would fit 8.8 ;)



#50 Heaven/TQA OFFLINE  

Heaven/TQA

    Quadrunner

  • Topic Starter
  • 10,143 posts
  • Location:Baden-Württemberg, Germany

Posted Wed May 17, 2017 8:26 AM

I think the purpose of VBXE was to get the damned "color RAM" to the A8... ;) Other stuff might have been added later.





