
CloneSpy resurrected!


Thomas Jentzsch


Since I switched to Windows 10 (yes, I confess), it became more and more complicated to run my old 16-bit CloneSpy program. So I adapted it to 32-bit, and now it works under Windows 10 too. :)

 

Additionally, I created a little program (CloneSpy2CSV) which converts the CloneSpy output (here based on the latest Atarimania ROM collection V11.0 minus multi-game ROMs) into CSV format (the input file must be named clones.txt!). So you can now load it into your favorite spreadsheet and format it there. Mine is LibreOffice (Clones*.ods), so I also attached my spreadsheet with some nice coloring (see example pic).

 

Some results are quite interesting...

 

EDIT: Added V2.3 with some bugfixes, now also including results for homebrews up to 2012.

post-45-0-00520300-1474374582_thumb.png

CloneSpy 2.0.zip

CloneSpy V2.3.zip

Edited by Thomas Jentzsch

Some explanation, first the basics (if you know CloneSpy already, you can skip this):

 

CloneSpy identifies relationships between ROMs. To do that, it compares the ROMs' contents. Text compare tools are not suitable here, and neither are simple diff tools (like DOS fc). Instead, CloneSpy looks for identical (or very similar) byte sequences all over the ROM. This way, code or data which was just moved around in the ROM is identified as well.
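
To illustrate why position-independent matching finds what a positional diff misses, here is a minimal Python sketch. It is not CloneSpy's actual algorithm (which is not described in this thread); it assumes a simple stand-in technique, comparing the sets of all 8-byte windows of two files. The function names, the window size and the two tiny "ROMs" are hypothetical.

```python
def ngrams(data: bytes, n: int = 8) -> set[bytes]:
    """Return the set of all n-byte windows that occur anywhere in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def containment(a: bytes, b: bytes, n: int = 8) -> float:
    """Percentage of b's n-byte windows that also occur somewhere in a."""
    windows_b = ngrams(b, n)
    if not windows_b:
        return 0.0
    return 100.0 * len(windows_b & ngrams(a, n)) / len(windows_b)


# Two hypothetical "ROMs": the same two blocks, stored in swapped order.
block1 = bytes(range(0, 64))
block2 = bytes(range(128, 192))
rom_a = block1 + block2
rom_b = block2 + block1

# A byte-by-byte comparison at equal offsets (roughly what a positional
# diff does) finds no matching bytes at all.
same_position = sum(x == y for x, y in zip(rom_a, rom_b))

# The window-based comparison still sees that almost everything is shared;
# only the windows straddling the block boundary are new.
moved = containment(rom_a, rom_b)
unrelated = containment(rom_a, bytes(range(64, 128)))

print(same_position, round(moved), unrelated)
```

Here `containment(a, b)` is deliberately asymmetric: it measures how much of `b` can be found in `a`, which is one plausible reading of the percentages in the spreadsheet.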

 

By doing that, CloneSpy can identify:

  • hacks/pirates and which ROMs they are (most likely) based on
  • relationships between ROMs which seem unrelated
  • differences between various versions of the same game (e.g. NTSC and PAL)
  • development history of various prototypes of a ROM
  • ROMs which may contain other, yet not identified games
  • ...

So CloneSpy can provide valuable, immediate information without having to analyze ROMs by hand. Of course the results are only (very reliable) indicators; to know the details you still have to look into the ROM.

 

Here is how to read the output in the spreadsheet (examples from the screenshot above):

  • ROMs are grouped whenever a significant (in this case >33%) relation is found.
  • Within each group, CloneSpy tries to create sub groups of the most related ROMs again.
  • The first row of each group shows two numbers, the first is the counter for the ROM groups, the second for the total ROMs listed so far (e.g. 172/788 means that 172 ROM groups containing 788 ROMs have been listed so far)
  • The top row continues with the aliases of the ROMs (A..ZZ)
  • In the left column you see the names of the ROMs
  • Besides each ROM there is a letter, which serves as an alias for the top group row
  • If e.g. row A, column F reads 77%, this means that ROM A contains 77% of the data found in F.
  • Usually you find a very similar value in the symmetric cell, e.g. row F, column A reads 78%
  • If the two values differ significantly (e.g. by 20%), this indicates that the ROM with the higher value contains extra data. E.g. a combination of 50/95 indicates that there might be a second ROM hidden inside the ROM with the larger number. If the values differ less, e.g. by only 5 or 10%, then the ROMs with the lower values are usually earlier versions of the same ROM (e.g. see Sinistar)
  • Values of 99% might be an artifact of dumping; the ROMs could be completely identical.
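
The reading rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: the thread does not say whether one direction above 33% is enough to group two ROMs (assumed here), and the cut-offs in `read_pair` are loose translations of the "20%" and "5 or 10%" hints, not values taken from CloneSpy. All names and the sample numbers are hypothetical.

```python
def group_roms(names, relation, threshold=33.0):
    """Group ROMs linked by any relation above threshold (assumed rule)."""
    parent = {name: name for name in names}

    def find(x):
        # Walk up to the group representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (row, col), percent in relation.items():
        if percent > threshold:
            parent[find(row)] = find(col)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


def read_pair(value_a, value_b):
    """Interpret a cell and its symmetric cell (assumed cut-offs)."""
    if min(value_a, value_b) >= 99:
        return "possibly identical"
    gap = abs(value_a - value_b)
    if gap >= 20:
        return "higher-valued ROM may contain extra data"
    if gap >= 5:
        return "lower-valued ROM may be an earlier version"
    return "similar ROMs"


# Hypothetical relation table: (row, column) -> percentage.
relation = {
    ("A", "B"): 77, ("B", "A"): 78,
    ("B", "C"): 40, ("C", "B"): 35,
    ("A", "D"): 10, ("D", "A"): 12,
}
groups = group_roms(["A", "B", "C", "D"], relation)
print(groups)
print(read_pair(77, 78), "|", read_pair(50, 95))
```

Note that grouping is transitive in this sketch: A and C end up in the same group through B even if their direct relation is weak.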

The output used to come from a simple DOS console program with coloring (which got lost in Windows' console emulation). Now I have switched to CSV output, which you can load into a spreadsheet and then color with conditional formatting. In my case (LibreOffice, available for all relevant platforms) red values mean the two ROMs are very closely related. With orange, yellow and green the relation decreases; blue (or uncolored) values mean that the relation is very low (which could be just coincidence) or nonexistent.

 

So what does this mean for the Pac-Kong/Spider Kong/whatever example above?

  1. There seem to be five sub groups of ROMs within the group (A..C, D/E, F..I, J/K and L..N)
  2. Within each sub group the ROMs differ very, very little, probably by less than 10 bytes. In the last sub group, they even seem 100% identical. So it is pretty clear that all ROMs within each sub group come from the same source.
  3. Between the sub groups there are varying relations which may allow us to create a "family tree" of ROMs. The result has to be speculative, but may be interesting nevertheless.

So let's try this family tree:

  • A..C is most related to D/E so it is either a parent, a sibling or a descendant of D/E
  • D/E is most related to J/K, followed by A..C and F..I. Probably J/K is a parent, and A..C and F..I are siblings of D/E (IMO multiple parents are very unlikely)
  • F..I is most related to J/K (its parent?)
  • ...

Here is my speculative(!) result:

L..N --- J/K --- F..I  
          |
          +----- D/E --- A..C

Of course it could be the other way around too. Or "something completely different"... :)
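
One mechanical way to get a first draft of such a tree is to keep only the strongest links between sub groups, i.e. a maximum spanning tree. The Python sketch below does this with Kruskal's algorithm. It is a swapped-in technique, not something CloneSpy does, and the percentages are made up for illustration (chosen only to match the ordering described in the list above); the real values are in the screenshot.

```python
def strongest_links(similarity):
    """Kruskal-style maximum spanning tree over an undirected similarity map."""
    nodes = sorted({name for pair in similarity for name in pair})
    parent = {name: name for name in nodes}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    tree = []
    # Visit links from strongest to weakest; keep a link only if it
    # connects two sub groups that are not connected yet.
    for (a, b), _score in sorted(similarity.items(), key=lambda kv: -kv[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


# HYPOTHETICAL similarity values between the five sub groups.
similarity = {
    ("L..N", "J/K"): 90,
    ("J/K", "F..I"): 85,
    ("J/K", "D/E"): 80,
    ("D/E", "A..C"): 75,
    ("D/E", "F..I"): 70,
    ("A..C", "J/K"): 60,
    ("A..C", "F..I"): 55,
    ("L..N", "D/E"): 50,
}
tree = strongest_links(similarity)
for a, b in tree:
    print(a, "---", b)
```

The result is an undirected tree, so it cannot say which end of a link is the parent. That part stays speculative and needs a look into the ROMs themselves.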

 

I hope this makes sense. You can run the CloneSpy program against any set of ROMs you want to analyze. So whenever a new ROM shows up, you can check if and how much it is related to other ROMs. There are a few parameters which allow you to adapt the analysis to your needs (try "clonespy.exe ?" for help). CloneSpy2CSV.exe just expects a file named "clones.txt" and needs no parameters.

 

Enjoy!


As of now it only checks files up to a size of 32k. For larger files, only the first 32k are checked.

 

I could increase the size (with some extra refactoring), but then the current algorithm might become quite slow. How large are the files you want to check?



I was just curious if your algorithm could be used for ROMs from different systems... I sort of assumed you'd tuned it for 6502 opcode/address mode sizes or something, but maybe not? :)

Edited by R.Cade

I wonder how related small batari basic roms appear to be?

And great tool ;-) I once wrote something similar + simpler for C64 cracks to see who imported/copied whom in what order ;-)

 

I think to analyze BASIC programs the same way you would have to detect and specifically exclude the RUNTIME component.

 

The same would need to be done analyzing C64 BASIC programs - code for the BASIC ROM should be left out of the comparison.
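
A minimal sketch of what "excluding the runtime" could look like, assuming a window-based comparison and assuming you have a copy of the shared runtime bytes to filter against. Neither is confirmed for CloneSpy; the function names and the toy data are hypothetical.

```python
def windows(data: bytes, n: int = 8) -> set[bytes]:
    """Return the set of all n-byte windows that occur anywhere in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def containment_without_runtime(a: bytes, b: bytes, runtime: bytes,
                                n: int = 8) -> float:
    """Percentage of b's non-runtime windows that also occur in a."""
    runtime_windows = windows(runtime, n)
    own_b = windows(b, n) - runtime_windows
    if not own_b:
        return 0.0
    own_a = windows(a, n) - runtime_windows
    return 100.0 * len(own_b & own_a) / len(own_b)


# Hypothetical data: two games that share nothing except the runtime.
runtime = bytes(range(100, 200))
game_a = runtime + bytes(range(0, 40))
game_b = runtime + bytes(range(40, 80))

# Without filtering, the shared runtime alone makes the games look related.
raw = 100.0 * len(windows(game_a) & windows(game_b)) / len(windows(game_b))

# With the runtime windows removed, only game-specific data is compared.
filtered = containment_without_runtime(game_a, game_b, runtime)

print(round(raw), filtered)
```

In practice the runtime differs between versions and can be interleaved with game code, so a real filter would likely need one reference image per runtime version.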


 

I wonder how related small batari basic roms appear to be?

If someone has a collection of those, I can check them.

 

Probably they will form just one big group.

 

Yup. I tested the old "hacks and homebrews v. 1.2b (sorted)" (you can find it here).

I only ran the utility in the "homebrews" directory and the result is 141 unique files out of 463 total.

post-10599-0-09803600-1474485476_thumb.png

By looking at the resulting Clones.txt file, you can see a huge group of 190 related roms.

post-10599-0-47682600-1474485472_thumb.png

Clones.zip

I couldn't generate the csv file out of it using Clones2CSV. I got a "Runtime error 216". Don't know if that's due to my setup or if the utility just can't handle such a large table.

post-10599-0-33326300-1474485473_thumb.png

It works without problems on the rom collection I use in Stella and Harmony cart, which is the atarimania.com one plus a selection of homebrews and hacks for a total of over 2400 roms, but the largest group in there is about 40 roms.


Yea, the group size limit was 130 (copied from old 16 bit code which had to stay within 64k data). I increased it to 1024.

 

New program and spreadsheet attached. In the picture you can identify some quite large clusters. The sub groups do not seem optimal. Probably there should be only two big clusters plus some smaller sub groups.

 

A few BB games (especially Cave In) are quite isolated, which indicates that there is a lot of additional content inside. Also I suppose that there are different BB versions showing in the clusters.

ClonesHH.zip

Clones2CSV V1.1.zip

post-45-0-56624500-1474487945_thumb.png

Edited by Thomas Jentzsch

New program and spreadsheet attached. In the picture you can identify some quite large clusters. The sub groups seem not optimal. Probably there should be only two big clusters plus some smaller sub groups.

 

I can't zoom in and see the titles for some reason. Is there a way to attach a better image? I wonder if it got resized.

 

-John


Yup. I tested the old "hacks and homebrews v. 1.2b (sorted)" (you can find it here).

I only ran the utility in the "homebrews" directory and the result is 141 unique files out of 463 total.

 

 

Yeah the RUNTIME is confusing this program though Alex, so your results are largely meaningless without excluding it for BASIC programs.


I found a second BB group, which uses a multi sprite kernel. This kernel is very different from the usual playfield kernel, which is used by the vast majority of BB games. That's why it has its own group.

 

BTW: Most playfield kernel BB games are within just two sub groups (probably using two variations of that kernel, their years also seem to indicate that). Their BB framework content groups them together. And the lower the additional content, the more they are clustered.

 

E.g. if two unrelated ROMs share 70% of their data (in both directions!), then this means that only ~30% additional content comes from the developer. The 70% is mostly the BB framework code and data.

 

Personal rule of thumb: The lower the additional content, the less work was probably put into a game, the less quality it probably has.

post-45-0-36923600-1474529264_thumb.png
