
Microsoft uses AI to score 999 990 in Ms. Pac-Man, and we should all be worried

Brendyn Lotz

15th June 2017

Earlier this year Microsoft acquired Canadian deep learning startup Maluuba, and that team has been hard at work playing Ms. Pac-Man.

Over the last few months the team has been using a branch of artificial intelligence known as reinforcement learning to teach a machine to play the Atari 2600 version of Ms. Pac-Man. Yesterday the team announced its AI bot had achieved a score of 999 990 – the highest score possible.

Cool, so an AI beat a video game and got a score four times better than any human has ever achieved. What’s the point? Well, the real feat of genius is the use of reinforcement learning, which is explained rather well in the video below.

For those who can’t watch the video, here’s a brief explanation. The team broke the game down into “agents” for pellets, ghosts, edible ghosts and fruit, with each agent given a reward or a punishment. The agents then send their suggested moves to an aggregator, which determines the move most likely to succeed based on the possible reward or punishment. Simply put, Maluuba broke the game down into smaller pieces, which made it easier to master, and then used a reward system to determine the best way to move.
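To make the agent-and-aggregator idea concrete, here is a minimal sketch in Python. It is not Maluuba's code: the `Agent` class, the `aggregate` function, the reward weights and the preference numbers are all invented for illustration, and the real system is considerably more sophisticated. The sketch only shows the general shape of the idea, where each agent scores the possible moves for its own object and an aggregator picks the move with the best combined score.

```python
from dataclasses import dataclass

ACTIONS = ("up", "down", "left", "right")


@dataclass
class Agent:
    """A hypothetical agent tied to one game object (not Maluuba's actual design)."""
    name: str
    weight: float        # positive = reward (pellet, fruit), negative = punishment (ghost)
    preferences: dict    # action -> how strongly that action leads toward the object (0..1)

    def vote(self):
        # Scale each preference by the reward or punishment attached to this object.
        return {a: self.weight * self.preferences.get(a, 0.0) for a in ACTIONS}


def aggregate(agents):
    """Sum every agent's votes and return the action with the highest total."""
    totals = {a: 0.0 for a in ACTIONS}
    for agent in agents:
        for action, value in agent.vote().items():
            totals[action] += value
    best = max(ACTIONS, key=lambda a: totals[a])
    return best, totals


# Made-up numbers: a pellet lies to the left, but so does a ghost.
agents = [
    Agent("pellet", 10.0, {"left": 0.9, "up": 0.2}),
    Agent("fruit", 50.0, {"right": 0.3}),
    Agent("ghost", -1000.0, {"left": 0.05}),
]
best_action, totals = aggregate(agents)
print(best_action, totals)
```

With these invented numbers the ghost's punishment outweighs the pellet's reward on the left, so the aggregator steers toward the fruit instead.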

If that sounds very human to you, that’s because it is.

An associate professor of computer science at McGill University told Microsoft that the method Maluuba used was very similar to theories about how the brain works, and suggested that this might be a step toward “more general intelligence”.

If the professor is right, this seemingly insignificant feat of teaching a machine to play a game could put more humans at risk of losing their jobs to AI. The reason is that reinforcement learning tackles a harder problem than supervised learning.

To explain this as simply as possible, Microsoft writes, “An AI-based system that uses supervised learning would learn how to come up with a proper response in a conversation by feeding it examples of good and bad responses. A reinforcement learning system, on the other hand, would be expected to learn appropriate responses from only high-level feedback, such as a person saying she enjoyed the conversation–a much more difficult task.”
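The difference Microsoft describes can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the canned responses, the `feedback` function standing in for a person's reaction, and the epsilon-greedy trial-and-error loop are all invented here, and this is one of the simplest forms of reinforcement learning rather than anything Maluuba built. The point is only that the supervised learner is handed the labels, while the reinforcement learner has to discover them from rewards.

```python
import random

RESPONSES = ["Hello! How can I help?", "Go away.", "Tell me more about that."]

# Supervised learning: every example arrives with an explicit good (1) / bad (0) label.
LABELLED = {"Hello! How can I help?": 1, "Go away.": 0, "Tell me more about that.": 1}


def learn_from_labels(labelled):
    """The labels directly tell the learner which responses are good."""
    return {response for response, label in labelled.items() if label == 1}


def feedback(response):
    """Hypothetical stand-in for a person saying whether they enjoyed the chat."""
    return 0.0 if response == "Go away." else 1.0


def learn_from_feedback(episodes=200, epsilon=0.2, seed=0):
    """Reinforcement learning: no labels, only a reward after each attempt."""
    rng = random.Random(seed)
    value = {r: 0.0 for r in RESPONSES}   # running average reward per response
    count = {r: 0 for r in RESPONSES}
    for _ in range(episodes):
        if rng.random() < epsilon:
            choice = rng.choice(RESPONSES)          # occasionally try something at random
        else:
            choice = max(RESPONSES, key=value.get)  # otherwise use the best guess so far
        reward = feedback(choice)
        count[choice] += 1
        value[choice] += (reward - value[choice]) / count[choice]
    return value


good_by_label = learn_from_labels(LABELLED)
learned = learn_from_feedback()
print(good_by_label)
print(learned)
```

The supervised learner gets its answer in one pass, while the reinforcement learner needs many attempts and only ever learns about the responses it actually tries.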

The team says that the same technology could be used in a firm’s sales division to make more accurate predictions about potential customers and when to contact them for the best chance of success. While the team says that this would free up a salesperson to focus on making a sale, we have our tin foil hats on and suspect that it’s a short hop and a jump from AI doing the heavy lifting to AI just doing the job on its own.

Sure, a robot might not be able to speak well (right now), but chatbots are becoming more common, and wouldn’t you rather instant message a salesperson than talk to them on the phone?

If you think we’re being a bit paranoid, check out this video from Kurzgesagt, which lays out why our dream of AI doing the rubbish work while we enjoy high-level jobs is flawed, very flawed.

[Source – Microsoft] [Image – CC BY SA 3.0 Alexisrael]
