Thomas Jentzsch, posted September 20, 2016 (edited September 23, 2016)

Since I switched to Windows 10 (yes, I confess), it has become more and more complicated to run my old 16-bit CloneSpy program. So I ported it to 32-bit, and now it works under Windows 10 too.

Additionally, I created a little program (CloneSpy2CSV) which converts the CloneSpy output (here based on the latest Atarimania ROM collection V11.0, minus multi-game ROMs) into CSV format (the input file must be named clones.txt!). So you can now load it into your favorite spreadsheet and format it there. Mine is LibreOffice (Clones*.ods), so I also attached my spreadsheet with some nice coloring (see example pic). Some results are quite interesting...

EDIT: Added V2.3 with some bug fixes, now also including results for homebrews up to 2012.

Attachments: CloneSpy 2.0.zip, CloneSpy V2.3.zip
alex_79, posted September 21, 2016

Tried it, and it worked flawlessly in Wine on my Linux machine (both CloneSpy and the CSV conversion utility). Thanks for the update!
Thomas Jentzsch, posted September 21, 2016

Some explanation, first the basics (if you know CloneSpy already, you can skip this):

CloneSpy identifies relationships between ROMs. To do that, it compares the ROMs with each other. Neither text compare tools nor simple diff tools (like DOS fc) are suitable here. Instead, CloneSpy looks for identical (or very similar) byte sequences all over the ROM. This way, code or data which was just moved around in the ROM is identified too. By doing that, CloneSpy can identify:
- hacks/pirates and which ROMs they are (most likely) based on
- relationships between ROMs which seem unrelated
- differences between various versions of the same game (e.g. NTSC and PAL)
- the development history of various prototypes of a ROM
- ROMs which may contain other, not yet identified games
- ...

So CloneSpy can provide valuable information immediately, without having to analyze ROMs by hand. Of course the results are only (very reliable) indicators; to know the details you still have to look into the ROM.

Here is how to read the output in the spreadsheet (examples from the screenshot above):
- ROMs are grouped whenever a significant (in this case >33%) relation is found. Within each group, CloneSpy tries to create sub groups of the most related ROMs again.
- The first row of each group shows two numbers: the first is the counter for the ROM groups, the second for the total ROMs listed so far (e.g. 172/788 means that 172 ROM groups containing 788 ROMs have been listed so far).
- The top row continues with the aliases of the ROMs (A..ZZ).
- The left column shows the names of the ROMs. Beside each ROM there is a letter, which serves as its alias in the top row.
- If e.g. row A, column F reads 77%, this means that ROM A contains 77% of the data found in F. Usually you find a very similar value in the symmetric cell, e.g. row F, column A reads 78%.
- If the two values differ significantly (e.g. by 20%), this indicates that the ROM with the higher value contains extra data. E.g. a combination of 50/95 indicates that there might be a 2nd ROM hidden inside the ROM with the larger number.
- If the values differ less, e.g. by only 5 or 10%, then the ROMs with the lower values are usually earlier versions of the same ROM (e.g. see Sinistar).
- Values of 99% might be the result of dumping; the ROMs could be completely identical.

The output used to come from a simple DOS console program with coloring (which got lost in Windows' console emulation). Now I have switched to CSV output, which you can put into a spreadsheet and then color with conditional formatting. In my case (LibreOffice, available for all relevant platforms) red values mean the two ROMs are very closely related. With orange, yellow and green the relations decrease; blue (or no) values mean that the relation is very low (possibly just coincidence) or non-existent.

So what does this mean for the Pac-Kong/Spider Kong/whatever example above?

There seem to be five sub groups of ROMs within the group (A..C, D/E, F..I, J/K and L..N). Within each sub group the ROMs differ very, very little, probably by less than 10 bytes. In the last sub group, they even seem 100% identical. So it is pretty clear that all ROMs within each sub group come from the same source.

Between the sub groups there are varying relations, which may allow creating a "family tree" of ROMs. The result has to be speculative, but may be interesting nevertheless. So let's try this family tree:
- A..C is most related to D/E, so it is either a parent, a sibling or a descendant of D/E.
- D/E is most related to J/K, followed by A..C and F..I. Probably J/K is a parent, and A..C and F..I are siblings of D/E (IMO multiple parents are very unlikely).
- F..I is most related to J/K (its parent?)
- ...

Here is my speculative(!) result:

    L..N --- J/K --- F..I
              |
              +----- D/E --- A..C

Of course it could be the other way around too. Or "something completely different"... I hope this makes sense.

You can run the CloneSpy program against any set of ROMs you want to analyze. So whenever a new ROM shows up, you can check if and how closely it is related to other ROMs. There are a few parameters which allow you to adapt the analysis to your needs (try "clonespy.exe ?" for help). CloneSpy2CSV.exe just expects a file named "clones.txt" and needs no parameters.

Enjoy!
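The reading rules above can be sketched in code. This is only a minimal illustration under assumptions, not CloneSpy's actual implementation: `relation[(a, b)]` is a hypothetical table holding the percentage of b's data found in a (row a, column b), groups are formed transitively whenever either direction exceeds 33%, and the 5- and 20-point cut-offs in the hypothetical `interpret_pair` are my reading of the rules of thumb above.

```python
from itertools import combinations

GROUP_THRESHOLD = 33.0  # percent, the ">33%" grouping cut-off mentioned above


def group_roms(names, relation):
    """Group ROMs transitively whenever either direction exceeds the threshold.

    relation[(a, b)] is assumed to hold the percentage of b's data found in a,
    i.e. the cell in row a, column b of the spreadsheet.
    """
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        score = max(relation.get((a, b), 0.0), relation.get((b, a), 0.0))
        if score > GROUP_THRESHOLD:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Only keep real groups; ROMs without any significant relation are dropped.
    return [g for g in groups.values() if len(g) > 1]


def interpret_pair(row_value, col_value):
    """Rule-of-thumb reading of two symmetric cells, e.g. A/F = 77 and F/A = 78."""
    high, low = max(row_value, col_value), min(row_value, col_value)
    if high <= GROUP_THRESHOLD:
        return "unrelated"  # possibly just coincidence
    if low >= 99.0:
        return "possibly identical"  # differences may just come from dumping
    if high - low >= 20.0:
        return "extra data in higher"  # maybe even a second, hidden ROM
    if high - low >= 5.0:
        return "lower is earlier version"
    return "same source"


relation = {("A", "F"): 77.0, ("F", "A"): 78.0, ("A", "X"): 5.0}
print(group_roms(["A", "F", "X"], relation))  # [['A', 'F']]
print(interpret_pair(50, 95))                  # extra data in higher
```

Union-find is used here simply because "group whenever a significant relation is found" chains naturally (A~B and B~C puts all three in one group); the sub-grouping CloneSpy does inside each group is not modeled.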
enthusi, posted September 21, 2016 (edited September 21, 2016)

I wonder how related small batari Basic ROMs appear to be. And great tool! I once wrote something similar (and simpler) for C64 cracks, to see who imported/copied whom, and in what order.
Thomas Jentzsch, posted September 21, 2016

> I wonder how related small batari basic roms appear to be?

If someone has a collection of those, I can check them. Probably they will form just one big group.
R.Cade, posted September 21, 2016

Does it only work on Atari ROMs, or on any files?
Thomas Jentzsch, posted September 21, 2016

As of now, it only checks files up to a size of 32K. For larger files, only the first 32K are checked. I could increase the size (with some extra refactoring), but then the current algorithm might become quite slow. How large are the files you want to check?
R.Cade, posted September 21, 2016 (edited September 21, 2016)

> As of now it only checks files up to a size of 32k. For larger files, only the first 32k are checked. I could increase the size (with some extra refactoring), but then the current algorithm might become quite slow. How large are the files you want to check?

I was just curious whether your algorithm could be used for ROMs from different systems... I sort of assumed you'd tuned it for 6502 opcode/address mode sizes or something, but maybe not?
Thomas Jentzsch, posted September 21, 2016

Nope, it is pretty generic. I am more or less emulating an LZSS compression after run-length encoding (RLE).
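To make "LZSS after RLE" a bit more concrete, here is a rough sketch of the general idea, not CloneSpy's code: both ROMs are first run-length encoded into (byte, run length) tokens, then B is scanned greedily, LZSS-style, for the longest token run that also occurs in A. The bytes covered by such matches give the percentage of B found in A. `MIN_MATCH = 4` tokens is an arbitrary assumption, and the naive search is only meant for small files.

```python
MIN_MATCH = 4  # assumed minimum match length, counted in RLE tokens


def rle(data: bytes) -> list[tuple[int, int]]:
    """Run-length encode data into (byte value, run length) tokens."""
    tokens: list[tuple[int, int]] = []
    for value in data:
        if tokens and tokens[-1][0] == value:
            tokens[-1] = (value, tokens[-1][1] + 1)
        else:
            tokens.append((value, 1))
    return tokens


def coverage(a: bytes, b: bytes) -> float:
    """Percentage of b's bytes covered by LZSS-style matches found in a."""
    ta, tb = rle(a), rle(b)
    # Index every MIN_MATCH-token window of a by its start positions.
    index = {}
    for i in range(len(ta) - MIN_MATCH + 1):
        index.setdefault(tuple(ta[i:i + MIN_MATCH]), []).append(i)

    covered = 0
    j = 0
    while j < len(tb):
        best = 0
        # Try every candidate start in a and extend the match as far as possible.
        for i in index.get(tuple(tb[j:j + MIN_MATCH]), []):
            n = MIN_MATCH
            while i + n < len(ta) and j + n < len(tb) and ta[i + n] == tb[j + n]:
                n += 1
            best = max(best, n)
        if best:
            covered += sum(run for _, run in tb[j:j + best])  # back-reference into a
            j += best
        else:
            j += 1  # literal token: not found in a
    return 100.0 * covered / len(b) if b else 0.0


a = bytes(range(64))
b = a[:32] + bytes(range(100, 132))
print(coverage(a, b))  # 50.0
```

Because matches are searched anywhere in A, code or data that was merely moved around still counts as shared, which is the property described for CloneSpy earlier in the thread.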
R.Cade, posted September 21, 2016

Might be interesting to remove the file size limit and throw it at some other ROMs, just for the heck of it.
Thomas Jentzsch, posted September 21, 2016

You can do that manually. Take a good archive tool like RAR or ZIP, add the 1st file and write down the size. Then add the 2nd file and check the size increase. Basically, that's what CloneSpy is doing for every file combination.
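Here is that manual experiment in code, with Python's zlib standing in for RAR or ZIP. It is a sketch of the archive-size trick only, not of CloneSpy itself; the hypothetical `contained_percent` formula (bytes saved by compressing b after a, relative to b compressed alone) is my assumption of how to turn the size increase into a percentage. Deflate can only reference the previous 32 KB, so this only works for pairs whose combined size stays within about 32 KB.

```python
import random
import zlib


def csize(data: bytes) -> int:
    """Size of data after zlib (deflate) compression at the highest level."""
    return len(zlib.compress(data, 9))


def contained_percent(a: bytes, b: bytes) -> float:
    """Estimate how much of b is already contained in a (0..100)."""
    increase = csize(a + b) - csize(a)  # "add the 2nd file, check the size increase"
    alone = csize(b)                    # what b costs without any help from a
    return max(0.0, 100.0 * (alone - increase) / alone)


rng = random.Random(1)
rom_a = bytes(rng.getrandbits(8) for _ in range(4096))
rom_b = bytes(rng.getrandbits(8) for _ in range(4096))
print(round(contained_percent(rom_a, rom_a)))  # close to 100
print(round(contained_percent(rom_a, rom_b)))  # close to 0
```

Calling it with the arguments swapped gives the other direction, i.e. the symmetric cell in the spreadsheet.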
Mr SQL, posted September 21, 2016

> I wonder how related small batari basic roms appear to be? And great tool I once wrote something similar + simpler for C64 cracks to see who imported/copied whom in what order

I think to analyze BASIC programs the same way, you would have to detect and specifically exclude the runtime component. The same would need to be done when analyzing C64 BASIC programs: the code of the BASIC ROM should be left out of the comparison.
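A minimal sketch of that idea, assuming you already have the runtime (or BASIC ROM) code as exact byte blocks: strip every occurrence before running the comparison. `strip_runtime` and `min_len` are hypothetical names. One caveat: a runtime assembled at different addresses, or a different version of it, will not match byte for byte, so this only removes identical copies.

```python
def strip_runtime(rom: bytes, runtime_chunks: list[bytes], min_len: int = 32) -> bytes:
    """Remove every exact occurrence of known runtime blocks from a ROM image.

    Chunks shorter than min_len are ignored, so that short and common
    byte patterns are not removed from the game's own code by accident.
    Longer chunks are removed first, so they win over chunks they contain.
    """
    for chunk in sorted(runtime_chunks, key=len, reverse=True):
        if len(chunk) >= min_len:
            rom = rom.replace(chunk, b"")
    return rom


runtime = bytes(range(40))             # stand-in for a runtime/kernel block
game = b"GAME" + runtime + b"DATA"     # ROM = own code + runtime
print(strip_runtime(game, [runtime]))  # b'GAMEDATA'
```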
Thomas Jentzsch, posted September 21, 2016

> Might be interesting to remove the file limit and throw it at some other ROMs for the heck of it.

Here is a version which supports files up to 1 MB. I hope that size is sufficient for your needs.

Attachment: CloneSpy V2.1.zip
R.Cade, posted September 21, 2016

Nice! I ran it on a set of Atari 5200 ROMs, just for the heck of it. Jr. Pac-Man is 46% of Ms. Pac-Man.
Thomas Jentzsch, posted September 21, 2016

And vice versa?
R.Cade, posted September 21, 2016

I assume so... it groups them.

;--------
33
41 Jr Pac-Man (1984) (Atari) 14292
   Ms. Pac-Man (1982) (Atari) 14414 (7638) 46.66% (47.01%, 46.31%)
42 Ms. Pac-Man (1982) (Atari) 14414
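The numbers in that output seem to fit the archive-size idea described earlier in the thread. Assuming (my guess, not documented) that 14414 is the measured size of Ms. Pac-Man and (7638) is the size increase when it is added after Jr. Pac-Man, the 47.01% value is the share of Ms. Pac-Man that did not have to be stored again, and 46.66% looks like the mean of the two directional values:

```python
# Values copied from the CloneSpy output above; their meaning is an assumption.
ms_size, increase = 14414, 7638

ms_in_jr = 100 * (ms_size - increase) / ms_size  # share of Ms. Pac-Man found in Jr. Pac-Man
mean = (47.01 + 46.31) / 2                       # average of both directions

print(f"{ms_in_jr:.2f}% {mean:.2f}%")  # 47.01% 46.66%
```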
alex_79, posted September 21, 2016

> > I wonder how related small batari basic roms appear to be?
> If someone has a collection of those, I can check them. Probably they will form just one big group.

Yup. I tested the old "hacks and homebrews v. 1.2b (sorted)" collection (you can find it here). I only ran the utility in the "homebrews" directory, and the result is 141 unique files out of 463 total. Looking at the resulting Clones.txt file, you can see a huge group of 190 related ROMs.

Attachment: Clones.zip

I couldn't generate the CSV file from it using Clones2CSV; I got a "Runtime error 216". I don't know if that's due to my setup or if the utility just can't handle such a large table. It works without problems on the ROM collection I use in Stella and on the Harmony cart, which is the atarimania.com one plus a selection of homebrews and hacks, for a total of over 2400 ROMs, but the largest group in there has about 40 ROMs.
GroovyBee, posted September 21, 2016

It doesn't seem to like file names of the form name.extension1.extension2.
Thomas Jentzsch, posted September 21, 2016 (edited September 21, 2016)

Yeah, the group size limit was 130 (copied from the old 16-bit code, which had to stay within 64K of data). I increased it to 1024. New program and spreadsheet attached.

In the picture you can identify some quite large clusters. The sub groups don't seem optimal; probably there should be only two big clusters plus some smaller sub groups. A few BB games (especially Cave In) are quite isolated, which indicates that there is a lot of additional content inside. Also, I suppose that different BB versions show up in the clusters.

Attachments: ClonesHH.zip, Clones2CSV V1.1.zip
Propane13, posted September 21, 2016

> New program and spreadsheet attached. In the picture you can identify some quite large clusters. The sub groups seem not optimal. Probably there should be only two big clusters plus some smaller sub groups.

I can't zoom in and see the titles for some reason. Is there a way to attach a better image? I wonder if it got resized.

-John
Thomas Jentzsch, posted September 21, 2016 (edited September 21, 2016)

It is a screenshot. My monitor isn't bigger, so I had to resize it to 24%. Can't you load the spreadsheet? LibreOffice is free and easy to install.
Propane13, posted September 21, 2016

Ah, I see. I didn't realize the ODS file was a spreadsheet. It's been a while.
Mr SQL, posted September 21, 2016

> Yup. I tested the old "hacks and homebrews v. 1.2b (sorted)" (you can find it here). I only ran the utility in the "homebrews" directory and the result is 141 unique files out of 463 total.

Yeah, the runtime is confusing this program though, Alex, so your results are largely meaningless unless you exclude it for BASIC programs.
enthusi, posted September 22, 2016

Actually, my point was exactly that: how much BASIC programs are alike because they use the BASIC routines.
Thomas Jentzsch, posted September 22, 2016

I found a second BB group, which uses a multi-sprite kernel. This kernel is very different from the usual playfield kernel, which by far most BB games use. That's why it has its own group.

BTW: Most playfield-kernel BB games are within just two sub groups (probably using two variations of that kernel; their years also seem to indicate that). Their BB framework content groups them together, and the lower the additional content, the more tightly they are clustered. E.g. if two unrelated ROMs share 70% of their data (in both directions!), then only ~30% additional content comes from the developer. The 70% is mostly BB framework code and data.

Personal rule of thumb: the lower the additional content, the less work was probably put into a game, and the lower its quality probably is.