
music player with low cpu usage?



If using RLE, the obvious solution would be to do separate arrays per register, as you'll obviously get decent runs for AUDCTL, fair to good ones for AUDF, and probably not so good ones for AUDC if envelopes are used.

Maybe the ideal would be a combination of RLE and some sort of delta system - often AUDF and AUDC registers will just change value by 1 up or down.
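A minimal sketch of how that per-register RLE could be consumed at playback time, assuming a hypothetical export format of (run length, value) pairs with a zero length marking the end of the stream; only AUDCTL is shown, the other eight registers would get the same treatment with their own stream, counter and pointer. The self-modifying stream pointer mimics the style of the lz4 fetcher quoted later in the thread.

; hypothetical RLE stream for one register: (run length, value) pairs,
; run length 0 terminates the stream
audctl_stream
        .byte 200, $00          ; 200 frames of AUDCTL = $00
        .byte 50,  $50          ; then 50 frames of $50, and so on
        .byte 0

audctl_run .byte 1              ; frames left in the current run
audctl_val .byte 0              ; value currently held in AUDCTL

init_audctl                     ; point the fetcher at the start of the stream
        lda #<audctl_stream
        sta stream_ptr
        lda #>audctl_stream
        sta stream_ptr+1
        rts

play_audctl                     ; call once per frame (e.g. from the VBI)
        dec audctl_run
        bne write               ; still inside the current run
        jsr get_byte            ; next run length
        tay                     ; set flags on the fetched length
        beq done                ; length 0 = end of stream, keep the last value
        sta audctl_run
        jsr get_byte            ; value for this run
        sta audctl_val
write   lda audctl_val
        sta $d208               ; AUDCTL
done    rts

get_byte                        ; fetch the next stream byte, self-modifying pointer
        lda $ffff
stream_ptr equ *-2
        inc stream_ptr
        bne no_carry
        inc stream_ptr+1
no_carry
        rts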


Nice work with the dumper ;) Packing is a must, as you have pointed out.

 

What Rybags meant - probably - is that RLE should be perfect for AUDCTL, as it doesn't change much and therefore you can pack maybe 250 bytes into 2.

And the 4 AUDC channels most probably use envelopes with the same waveform and different volume, so delta packing the 4 bits should be enough. Or at least, RLE is ill-suited there, as you will almost never have the same value two or more times in a row. So it might even end up bigger, as you have to escape certain byte values.

AUDFx COULD be suitable for RLE as well.

 

So, Heaven, the next test for you is a packed and depacked data stream, to check how many cycles/scanlines the depacker takes :)


So, if your musicians need to use RMT for creating music, and you have the possibility to change the code and do optimisation there, someone should make the instruments change "interleaved". You could do that with 2 or even 4 channels. Then you adjust the player to play channels 1 & 3 on frame A and channels 0 & 2 on frame B, and so on.

Avoiding "Vibrato" and other automated FX helps to keep the player small.


Oh well, I thought about putting PLA as the first instruction (to allow running it from BASIC) and then making the program exit after each pass. I don't know if it will be very hard to recreate the actual "tempo" of the song.

 

Good idea or waste of time, what do you think?
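A minimal sketch of that idea, with play_frame as an assumed routine that advances the dump by one tick: the PLA discards the argument count that BASIC's USR pushes, and waiting on RTCLOK ($14, incremented every VBLANK) is one way to keep the original tempo when the routine is called from a BASIC loop.

player_entry
        pla                     ; discard the USR argument count pushed by BASIC
        lda $14                 ; RTCLOK low byte, incremented every VBLANK
tick    cmp $14
        beq tick                ; spin until the next frame tick
        jsr play_frame          ; assumed: advance the dump/stream by one frame
        rts                     ; back to BASIC; call USR again for the next frame

From BASIC, something like 10 X=USR(1536):GOTO 10 would then keep the song going, assuming (hypothetically) the routine were assembled at $0600.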


 

Nah, overkill for me (aside from a great Motorhead track). My idea is to not have any RMT player on the A8 at all:
Have the RMT composer app itself output the whole 9-byte dump stream and then:
a) take each byte stream and convert the values to the movement size (note 1)
b) Huffman analyse the frequencies of the resulting values (note 2)
c) encode each stream using the table produced.
Alternatives would be to make a) optional (leave the values as-is) and/or to replace b) & c) with run-length encoding.
note 1: i.e. new value - old value; the deltas are kept to -127 -> +127,
so -128 (b10000000) can be used as an escape to have the following byte set the new value directly.
note 2: a bit of extra memory is wasted on saving the table in addition to the encoded song data.
A small playback routine can then get the next byte from each of the 9 streams and pump these to the POKEY registers.
Benefit... more CPU time for your super screen/sprite code. Potential space savings by the compression on the raw Pokey data and size of playback routine.
Disadvantages... probably a PITA to implement... even the compressed data may end up larger than the actual RMT song/pattern/instrument data :(
Other considerations... sound effects?.... 50/60Hz need different initial outputs?
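A minimal sketch of the note 1 scheme for one of the nine streams, reusing the same hypothetical get_byte fetcher as in the RLE sketch above: anything other than $80 is treated as a signed delta added to the previous value, and $80 escapes to a literal byte.

decode_register
        jsr get_byte            ; next encoded byte
        cmp #$80
        beq literal             ; -128 is the escape code
        clc
        adc cur_value           ; signed delta wraps naturally mod 256
        jmp keep
literal jsr get_byte            ; escape: the following byte is the new value itself
keep    sta cur_value
        rts

cur_value .byte 0               ; last decoded value for this register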

 

Yeah... look at how many cycles per frame a stereo RMT track needs... a 9-byte RLE-encoded stream could be the way to go. I will definitely try...

 

Right now RMT costs me 1 FPS. Oxyron used a similar method on the C64 in some productions, btw.

 

Re: sound fx... right now we don't even have a sound fx library (not counting RMT)... I have no clue how games like RoF, Koronis Rift, Alley Cat, Star Raiders etc. programmed their sound. I am a total newbie in this field... Does anyone have experience in programming "explosions" etc.?

 

 

 

Forgive me if I sound dull, but what are RLE & 9-byte streams? I'm really finding it difficult to understand what is being explained. Don't get me wrong, I am up on POKEY, I just don't know some of these modern terms. I've been away from 8-bit programming for some time and so am a bit rusty, though some things are coming back to me.

I do understand some of what is being said, in that you're trying to update the sound registers at an evenly measured interval, which of course can be done via a DLI. This is the same as for graphics, when you're processing some kind of visual effect that takes multiple frames to compute. The sound method merely needs a flag: when it is set, the DLI or VBI stores the values into the AUDio registers. Obviously, you'll need to extract the AUDio register stores from the sound monitor's play routine, and when that has finished, set the flag which tells the DLI or VBI to store the sound values.

The graphics method is similar: you just switch between 2 different screens, like page flipping, displaying one whilst altering the 2nd and displaying the 2nd when updating the 1st, for nice smooth changes. Off the top of my head, something like this is probably necessary with sound too, unless the sound play routine already takes it into consideration.
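A minimal sketch of that flag idea (all names are assumptions): the play/compute code fills a 9-byte shadow buffer and sets a flag, and the VBI or DLI copies the whole buffer to POKEY in one go, so the registers always change together on a frame boundary.

vbi_sound
        lda update_flag
        beq no_update           ; nothing new was prepared this frame
        ldx #8
copy9   lda pokey_shadow,x
        sta $d200,x             ; AUDF1..AUDC4 plus AUDCTL ($D200-$D208)
        dex
        bpl copy9
        lda #0
        sta update_flag         ; buffer consumed, ready for the next update
no_update
        rts

update_flag  .byte 0
pokey_shadow .byte 0,0,0,0,0,0,0,0,0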


Basically, what we're trying to do is, instead of using complex music routines, simply use a stream of data which gets fed into POKEY.

 

Look, RMT creates music with its own patterns and algorithms, instruments and so on, but at the end of the day it simply writes values into the 9 POKEY registers 50 or 60 times per second.

 

Now... think of recording this stream into memory: instead of running the complex music routines we just store the already recorded ("precalced") values into the POKEY registers. Normally precalculating happens with gfx; think of sprite animations which get stored frame by frame. We do the same for audio data. An example of audio data sampling is digitized music.

 

Now... writing data into the 9 POKEY audio registers 50 times per second costs a lot of memory for storage... in my example nearly 27kb for 40 sec.

 

Now, when you look into the stream you recognise that you can pack the data stream to make it shorter.

 

The DLI example was to show how many CPU cycles we save by using a stream instead of the RMT replay routines.
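A minimal sketch of exactly that, with the stream address in an assumed free zero-page pointer: once per frame the next nine dumped bytes go straight into POKEY and the pointer moves on; there is no replay logic at all.

stream  equ $80                 ; assumed free zero-page pointer to the dump data
play_frame                      ; call once per frame (VBI or DLI)
        ldy #0
copy    lda (stream),y          ; next byte of this frame's dump
        sta $d200,y             ; AUDF1,AUDC1,...,AUDF4,AUDC4,AUDCTL
        iny
        cpy #9
        bne copy
        tya                     ; Y = 9: advance the pointer by one frame
        clc
        adc stream
        sta stream
        bcc done
        inc stream+1
done    rts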


Ah, right, I got you. In my opinion this way obviously reduces CPU time but uses memory excessively. This is usually the method for digitized samples. Perhaps an extra variable can be included in order to know how many frames to wait before updating the AUDio registers again.


And to reduce the memory needed we can pack the data... There are several ways, like run-length encoding ("RLE": instead of storing byte 0 a hundred times, why not only store 100,0?), or delta encoding (storing only the +/- difference to the previous value and packing that cleverly), or more complex stuff like LZ4.
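A tiny made-up AUDF1 fragment (values invented purely for illustration) showing the same eight frames in the three representations:

raw     .byte $40,$40,$40,$40,$40,$41,$42,$42   ; 8 frames as dumped: 8 bytes
rle     .byte 5,$40, 1,$41, 2,$42               ; (count,value) pairs: 6 bytes
deltas  .byte $40,0,0,0,0,1,1,0                 ; differences to the previous value;
                                                ; not shorter by itself, but the
                                                ; runs of zeros then RLE-pack well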


When I put my first RMT in the source code I thought that it was using a 50 Hz update frequency, but it didn't!

Stream-wise, doing the music at 50 Hz would reduce the quality of RMT music a bit.

But of course it has a lot of advantages to play at constant 50 Hz. For example it can be done with constant time in a raster program.



The stream method would work with 100Hz, 200Hz and all other freqs. It's just a matter of memory.

 

FYI, for a quick and not really useful test I ZIPped the dat file from Heaven.

From 27k to 736 bytes. Don't know how many bytes are gzip header :)

 

WAIT... maybe not so useless. Maybe a kind of double buffer would work: decoding the next 5 seconds with inflate while playing :)

Guess Heaven (or someone other than me) needs to produce dump data for a more complex song, which might not pack that well.
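A minimal sketch of the consumer side of that double-buffer idea (all names and addresses are assumptions): the VBI pulls nine bytes per frame out of a page-aligned 256-byte ring buffer while the mainline depacker keeps refilling it ahead of the read position.

ringbuf equ $3000               ; assumed page-aligned 256-byte ring buffer
rd_idx  .byte 0                 ; consumer index; the depacker maintains its own write index

consume_frame                   ; called once per frame from the VBI
        ldx rd_idx
        ldy #0
cploop  lda ringbuf,x           ; next depacked dump byte
        sta $d200,y             ; straight into AUDF1..AUDCTL
        inx                     ; 8-bit index wraps at 256 automatically
        iny
        cpy #9
        bne cploop
        stx rd_idx
        rts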


Creature... that sounds promising... You can simply export another song and alter the addresses in my source. Not a big deal.

 

Too much trouble for a quicky.

I have no MADS and no 7z on my Windows machine, so I'll pass and wait for your experiments :)

 

 

 

EDIT:

I'd rather try out the ring-buffer thingy. In the end I would prefer that for ease of use.


The crux of the matter, for this to be most useful, comes down to compression. What I was intending to do was to compress the individual patterns into a list of streams and reconstruct the song from those, so I would have a very simple, minimal player with hopefully not-too-large compressed streams. Once that's been explored, we'll have a better idea of what the trade-off of memory for performance will be compared to the play routines, frequency tables and song data.



What is there to explore? The A8 has no maths co-processor, so any calculation means lost CPU cycles.

It's as I wrote above: use a "runtime" that only acts on changes and does nothing when no change is needed.

Starting an instrument on different channels one VBI cycle apart is the way to go.

Also, no automated features such as portamento or vibrato are allowed.

 

For something like this (the video example embedded in the original post), you only need an update every 20-40 VBI cycles...
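One way to read that "only act on changes" runtime, sketched with the same hypothetical get_byte stream fetcher as before and an assumed stream format of (register index, value) pairs terminated by $FF, followed by the number of frames to idle until the next event; between events the routine does nothing at all.

evwait  .byte 1                 ; frames until the next event

play_events                     ; call once per VBI
        dec evwait
        bne done                ; still idling, nothing changes this frame
next    jsr get_byte            ; register index, or $FF = end of this event
        cmp #$ff
        beq get_delay
        tax
        jsr get_byte            ; new value for that register
        sta $d200,x
        jmp next                ; several registers may change in one event
get_delay
        jsr get_byte            ; frames until the next event (at least 1)
        sta evwait
done    rts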


Here's an example of packing channel 1 of horror.rmt with LZ4...

 

 

It's a little harder than I thought... I need to alter Fox's unlz4, especially the store_byte routine... and I have no time.

 

Basically my idea is to alter the depacker to write directly into the POKEY register, so we'd have 9 depack routines, one for each POKEY register. Not sure if that would be any faster... ;)

 

I guess Tezz's way might be the way to go.

 

 

.proc lz4_depacker

unlz4             jsr    lz4_get_byte           ; token: high nibble = length of literals
                  sta    token
                  lsr
                  lsr
                  lsr
                  lsr
                  beq    read_offset            ; there is no literal
                  cmp    #$0f                   ; $0f means extended length bytes follow
                  jsr    getLength
literals          jsr    lz4_get_byte           ; copy the literal bytes to the output
                  jsr    store
                  bne    literals
read_offset       jsr    lz4_get_byte           ; low byte of the match offset
                  tay
                  sec
                  eor    #$ff
                  adc    lz4_dest               ; match source = output position - offset
                  sta    lz4_src
                  tya
                  php
                  jsr    lz4_get_byte           ; high byte of the match offset
                  plp
                  bne    not_done
                  tay
                  beq    unlz4_done             ; offset $0000 ends the depacking
not_done          eor    #$ff
                  adc    lz4_dest+1
                  sta    lz4_src+1
                  ; c=1
                  lda    #$ff
token             equ    *-1
                  and    #$0f                   ; low nibble of the token = match length
                  adc    #$03                   ; 3+1=4 (minimum match length)
                  cmp    #$13
                  jsr    getLength

@                 lda    $ffff                  ; copy the match from already written output
lz4_src           equ    *-2
                  inc    lz4_src
                  bne    @+
                  inc    lz4_src+1
@                 jsr    store
                  bne    @-1
                  beq    unlz4                  ; always
store             sta    $ffff                  ; write one depacked byte to the output buffer
lz4_dest          equ    *-2
                  sta    $d200                  ; experiment: pump the byte straight into POKEY ($D200 = AUDF1)
                  inc    lz4_dest
                  bne    @+
                  inc    lz4_dest+1
@                 dec    lenL                   ; count down the remaining length
                  bne    @+
                  dec    lenH
@
unlz4_done        rts
getLength_next    jsr    lz4_get_byte           ; accumulate extended length bytes
                  tay
                  clc
                  adc    #$00
lenL              equ    *-1
                  bcc    @+
                  inc    lenH
@                 iny
getLength         sta    lenL
                  beq    getLength_next         ; another length byte follows
                  tay
                  beq    @+
                  inc    lenH
@                 rts
lenH              .byte  $00

lz4_get_byte      lda    $ffff                  ; fetch the next byte of the packed stream
lz4_source        equ    *-2
                  inc    lz4_source
                  bne    @+
                  inc    lz4_source+1
@                 rts
.endp

[attached screenshot: post-528-0-55013000-1422715066_thumb.png]

