AI masters 49 Atari 2600 games without instructions

Mr SQL · February 26, 2015

Same here, speculation by the guy who lost doesn't prove anything.

I find Kasparov's experiments after IBM refused a rematch enlightening in this regard:

In what Rasskin-Gutman explains as Moravec’s Paradox, in chess, as in so many things, what computers are good at is where humans are weak, and vice versa. This gave me an idea for an experiment. What if instead of human versus machine we played as partners? My brainchild saw the light of day in a match in 1998 in León, Spain, and we called it “Advanced Chess.” Each player had a PC at hand running the chess software of his choice during the game. The idea was to create the highest level of chess ever played, a synthesis of the best of man and machine.

Although I had prepared for the unusual format, my match against the Bulgarian Veselin Topalov, until recently the world’s number one ranked player, was full of strange sensations.... A month earlier I had defeated the Bulgarian in a match of “regular” rapid chess 4–0. Our advanced chess match ended in a 3–3 draw. My advantage in calculating tactics had been nullified by the machine.

http://www.nybooks.com/articles/archives/2010/feb/11/the-chess-master-and-the-computer/

I invite anyone to draw their own conclusions playing my KC Munchkin port for the Atari - it is the same idea where you and the computer play together as a team:

The game offers an opportunity to experience Transhumanism, an amalgam of player and machine; the game is hard but via cooperative teamwork with the CPU you can rack up points and flip the score, something that never happened even once in my test groups with the AI player partner disabled. These results mirror Kasparov's experiment.

If Deep-Blue or Deep-Q achieved transhumanism bereft of a human component, then, like Jentzsch, I would truly like to see that marvelous code.

Edited February 26, 2015 by Mr SQL

Big Player · February 26, 2015

Can you give some details about the meaning of the three bars?

E.g. what does it mean that for Double Dunk the blue bar is very short compared to the gray one? And why is the thin line shorter than the gray one?

Here is the text under that table. I was hoping someone could explain the Double Dunk bar to me.

The performance of DQN is normalized with respect to a professional human games tester (that is, 100% level) and random play (that is, 0% level). Note that the normalized performance of DQN, expressed as a percentage, is calculated as: 100 × (DQN score − random play score)/(human score − random play score). It can be seen that DQN outperforms competing methods (also see Extended Data Table 2) in almost all the games, and performs at a level that is broadly comparable with or superior to a professional human games tester (that is, operationalized as a level of 75% or above) in the majority of games. Audio output was disabled for both human players and agents. Error bars indicate s.d. across the 30 evaluation episodes, starting with different initial conditions.

And here is extended data table 2:

Best Linear Learner is the best result obtained by a linear function approximator on different types of hand designed features¹². Contingency (SARSA) agent figures are the results obtained in ref. 15. Note the figures in the last column indicate the performance of DQN relative to the human games tester, expressed as a percentage, that is, 100 × (DQN score − random play score)/(human score − random play score).
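The normalization quoted in both captions can be sketched in a few lines of Python. This is only an illustration: the function name and the sample scores are made up here and are not taken from the paper's tables. It does show why a bar can look odd in a game where the scores are negative, since what matters is where the agent sits between random play and the human tester, not the raw score.

```python
def normalized_score(agent: float, random_play: float, human: float) -> float:
    """Return 100 * (agent - random) / (human - random), as in the caption."""
    if human == random_play:
        # The formula is undefined when human and random play score the same.
        raise ValueError("human and random play scores must differ")
    return 100.0 * (agent - random_play) / (human - random_play)


# Hypothetical numbers, for illustration only.
print(normalized_score(agent=75.0, random_play=0.0, human=100.0))     # 75.0
# Negative raw scores still normalize sensibly: halfway between
# random play (-20) and the human tester (-10) is 50%.
print(normalized_score(agent=-15.0, random_play=-20.0, human=-10.0))  # 50.0
```

With this formula, 0% means "no better than random play" and 100% means "as good as the human tester", whatever the raw score range of the game is.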

+SpiceWare · February 26, 2015

I find Kasparov's experiments after IBM refused a rematch enlightening in this regard:

IBM's refusal and dismantling is suspicious - but without proof, I cannot consider IBM guilty of foul play.

Thomas Jentzsch · February 26, 2015

And since IBM dismantled Deep Blue immediately afterward, this claim cannot be verified, one way or the other. Regardless of any bias on Kasparov's part, that's a bit suspicious to me.

His claim is solely based on the matches he played against that machine. So it should be easy to check. I am very sure, if there had been anything suspicious, he would have been backed by other grand masters.

Thomas Jentzsch · February 26, 2015

At Video Pinball the differences between random play and all other values (except DQN) are very, very low. From those it looks like there is very little influence on the player's side. But then DQN might have found some strategy.

On the other side the 0 random play results of Freeway (and Enduro too) are indeed surprising.

Mr SQL · February 26, 2015

His claim is solely based on the matches he played against that machine. So it should be easy to check. I am very sure, if there had been anything suspicious, he would have been backed by other grand masters.

Tom, the team of human chessmasters and programmers networked to Deep Blue could have made some of the moves instead of just reprogramming the machine between moves.

He asked for a rematch with a Chess Federation referee watching them - IBM preferred to dismantle.

Did they resist the very human temptation to make a few moves, becoming one with the machine in the process? His later experiments suggest to me they did not.

Thomas Jentzsch · February 26, 2015

I get you. There are things which suggest something might have been wrong. But still, how can you call it a fact?

+Nathan Strum · February 26, 2015

So, the AI learned how to play these games entirely without human input, correct?

I wonder how much better they would have done if they could have watched a human play a game first, and based their learning off that?

LS_Dracon · February 27, 2015

Here's a good video about HillClimbing :

http://www.youtube.com/watch?v=boTeFM-CVFw

Actually all his videos about AI are great.

Cobra Commander · February 27, 2015

This is one of the "mysteries and questions" I had when I was a kid. I didn't know what was in ICs at an early age. I thought there was magic in them and went on a spree testing all sorts of AI scenarios. Those random number generators really had me fooled! Games behaved differently each time I played them, but stuck true to what they were programmed for.

Later on I sadly discovered it all came down to the transistors following elaborate instructions.

I used to joke that electronics were little pipes filled with smoke. When something fails, or someone screws up, they "Let the smoke out."

Mr SQL · February 27, 2015

I get you. There are things which suggest something might have been wrong. But still, how can you call it a fact?

Easy - there's something wrong with the process to begin with, scout's honour notwithstanding:

It's not possible for a group of Chessmasters and programmers to coordinate to "reprogram" the chess computer between moves without already being in cyborg territory! The only fair match against a computer has to be just against a computer; they can only reprogram between games - between moves is tantamount to team play because you are helping the machine come up with the next move. The ref would call foul just observing the process. How exactly are the chessmasters "advising" if not saying "if Kasparov moves here, adjust the algorithm so that the computer will make this move. If he moves to this spot, do this other move"?

No doubt Deep Q could be reprogrammed in similar fashion between each round of play and between each different game with specific handlers but that's the human programmers doing the learning.

Thomas Jentzsch · February 27, 2015

Sorry, I don't get you. Are you even answering my question?

Mr SQL · February 27, 2015

Sorry, I don't get you. Are you even answering my question?

I thought I was; let's try it the other way around.

Facts:

1. The match was supposed to be between a computer and a human.

2. A group of chessmasters and programmers assisted the computer between moves.

Does this make sense to you? :-D

Thomas Jentzsch · February 27, 2015

Makes sense.

But where is the evidence for #2?

Mr SQL · February 27, 2015

http://science.slashdot.org/story/02/11/18/1810222/behind-deep-blue

The programmer discussion on Slashdot is pretty interesting; the book Behind Deep Blue by the author of Deep Blue asserts that a bug in the software forced Deep Blue to make a few human moves unaccounted for by its algorithms, thus defeating Kasparov... no really.

Passing the Turing test because of a bug in the software or via networked humans assisting between moves?

Even if the networked humans only advised DB between games that's still assisting and this magic bug belies the idea they weren't assisting between moves.

Facts:

1. The match was supposed to be between a computer and a human.

2. Either a magic bug or a group of chessmasters and programmers assisted the computer between moves.

Have you ever had a bug like that in your code?

Thomas Jentzsch · February 27, 2015

I have had all kinds of bugs in my code.

Especially when coding more or less complex algorithms, some results are sometimes surprising. And sometimes a coding bug produces better results than the correct code (e.g. this happened to me while developing DOOD). And that was dead simple code compared to what was inside Deep Blue.

If I followed your logic and arbitrarily (IMO) limited the possible reasons to just two options, I would probably now call it a fact that the bug was the reason.

Edited February 27, 2015 by Thomas Jentzsch

fiddlepaddle · February 27, 2015

Also, and especially, when you get into parallel processing and multiple processors and threaded code, bugs can be completely counter-intuitive, and then with heuristic (learning) algorithms you have surprising solutions that absolutely short-circuit the human programming or thinking process.

Race conditions can cause thrashing/extreme slow-downs or crashes, and even illogical results, from a common sense perspective. I can sure understand how one or more bugs could require manual intervention ("Hey Bill! It's stuck again...hit reset!")

Another simple example of a bug surprising to many people: even though we all know that area=pi*r*r, when you actually calculate this area on a digital computer using this formula, the result cannot be completely accurate, since pi is an irrational number and the machine stores only a finite number of digits; so if you subsequently attempt to compare that calculated area to a number you obtained through another means that you know IS completely accurate, your test may fail (i.e., IF AREA == x THEN), despite what appears to be perfect logic.
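A small Python sketch of that comparison problem, assuming the usual 64-bit floating-point numbers. The helper names `circle_area` and `areas_match` and the reference value 3.14159 are made up for this illustration. The usual workaround is to compare within a tolerance instead of testing for exact equality; `math.isclose` from the standard library does that.

```python
import math


def circle_area(r: float) -> float:
    """Area from the textbook formula, using the finite-precision math.pi."""
    return math.pi * r * r


def areas_match(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Tolerance-based comparison instead of an exact == test."""
    return math.isclose(a, b, rel_tol=rel_tol)


# The classic surprise: the exact test fails even though the maths is right.
print(0.1 + 0.2 == 0.3)            # False
print(areas_match(0.1 + 0.2, 0.3)) # True

# Comparing a computed area with a reference value obtained another way
# (here a hypothetical hand-entered 3.14159 for r = 1).
area = circle_area(1.0)
print(area == 3.14159)                          # False
print(areas_match(area, 3.14159, rel_tol=1e-5)) # True
```

How tight the tolerance should be depends on how accurate the "other means" really is; 1e-5 here only reflects the six digits of the made-up reference value.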
