Motivation. Suppose you have to choose which route to take driving to work every day. You devise a complicated strategy, but one day your neighbor and coworker, who drives to the same workplace every day, says “I don’t have a strategy, I just take this one route every day.” Wouldn’t it be regretful if it turned out that, in total, your routes took more time than your co-worker’s single route?

def. Online Decision Making Game.

  • Player has $N$ actions to choose from, indexed $i = 1, \dots, N$.
  • At time $t$:
    1. Player constructs a distribution $p^t = (p_1^t, \dots, p_N^t)$ over the set of actions.
    2. Adversary chooses a loss $\ell_i^t \in \{0, 1\}$ for every action $i$, where $0$ represents no loss and $1$ represents a loss. (In our example, think of it as traffic conditions causing a delay.)
    3. Player’s distribution is realized into an action $i^t \sim p^t$. The player incurs a loss of $\ell_{i^t}^t$.
  • The player’s goal is to minimize total loss, which we will define shortly.
  • We play for time $t = 1, \dots, T$. The number of iterations $T$ is predetermined. (A small sketch of this protocol follows below.)
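To make the protocol concrete, here is a minimal Python sketch of one run of the game. The function `play_game` and the callables `choose_distribution` / `choose_losses` are illustrative names of my own, standing in for the player and the adversary; they are not from any library.

```python
import random

def play_game(choose_distribution, choose_losses, num_actions, T, rng=random):
    """Run the online decision-making game for T rounds.

    choose_distribution(t, history) -> probability vector over the N actions (step 1)
    choose_losses(t, history)       -> list of 0/1 losses, one per action    (step 2)
    """
    total_loss = 0
    history = []  # list of (realized action, loss vector) pairs
    for t in range(T):
        p = choose_distribution(t, history)
        losses = choose_losses(t, history)
        action = rng.choices(range(num_actions), weights=p)[0]  # step 3: realize the action
        total_loss += losses[action]
        history.append((action, losses))
    return total_loss

# Example: a player that always plays the uniform distribution against random traffic.
uniform = lambda t, h: [1 / 3] * 3
traffic = lambda t, h: [random.randint(0, 1) for _ in range(3)]
print(play_game(uniform, traffic, num_actions=3, T=5))
```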

We characterize the loss if we always chose action $i$ (like how our neighbor always takes the same route) for time $T$ as:

$$L_i^T = \sum_{t=1}^T \ell_i^t$$

Thus we can also define:

  • $L_{\min}^T = \min_i L_i^T$ is the minimum total loss if we could only choose one action every time.
  • $L_A^T = \sum_{t=1}^T \sum_{i=1}^N p_i^t \ell_i^t$ is the total expected loss of the player playing the strategy of algorithm $A$, i.e. the distributions $p^1, \dots, p^T$.

def. External Regret. For a player playing strategy $A$, the regret for this strategy is:

$$R_A^T = L_A^T - L_{\min}^T$$

i.e. the difference between the total loss of the algorithm and the total loss of the best one-action-every-time strategy.
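As a quick sanity check of these definitions, the sketch below computes every $L_i^T$, takes $L_{\min}^T$, and subtracts it from the player’s realized total loss. The names `external_regret`, `loss_matrix`, and `player_losses` are my own.

```python
def external_regret(loss_matrix, player_losses):
    """loss_matrix[t][i]: loss of action i at time t; player_losses[t]: loss the player incurred at time t."""
    T, N = len(loss_matrix), len(loss_matrix[0])
    # L_i^T: total loss of playing action i at every step
    per_action = [sum(loss_matrix[t][i] for t in range(T)) for i in range(N)]
    L_min = min(per_action)        # loss of the best single action in hindsight
    L_alg = sum(player_losses)     # total loss actually incurred by the player
    return L_alg - L_min           # external regret

# Two actions, three rounds: the player always picked action 0.
print(external_regret([[1, 0], [1, 0], [0, 0]], [1, 1, 0]))  # -> 2
```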

Algorithms for Online Games

alg. Greedy algorithm. This algorithm chooses, at each step, the action whose cumulative loss so far, $L_i^{t-1}$, is smallest.

  1. Initially: play action $i^1 = 1$ (all cumulative losses are zero, so the tie-break below picks the first action).
  2. At time $t$, choose the action with the minimum possible cumulative loss so far. In other words: $i^t \in \arg\min_i L_i^{t-1}$. (A sketch of this rule follows below.)
  • Break ties deterministically, choosing the action with the lowest index.
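A minimal sketch of the greedy rule, assuming (as in the game definition) that the player observes the full loss vector after every round. The name `run_greedy` is mine.

```python
def run_greedy(loss_matrix):
    """Greedy rule: at each step play the action with the smallest cumulative loss so far.
    loss_matrix[t][i] is the loss of action i at time t; returns the total loss incurred."""
    N = len(loss_matrix[0])
    cumulative = [0] * N                     # L_i^{t-1} for every action i
    total_loss = 0
    for losses in loss_matrix:
        # min() returns the first minimizer, so ties break toward the lowest index
        action = min(range(N), key=lambda i: cumulative[i])
        total_loss += losses[action]
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return total_loss
```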

Motivation. This algorithm is really bad: because it is deterministic, an adversary who knows our rule can place a loss of $1$ on exactly the action we are about to pick every round, so we lose every round while the best fixed action loses at most about a $1/N$ fraction of the rounds. Instead, we can try to confuse our adversary by mixing our strategy, i.e. using a randomized algorithm.

alg. Randomized Weighted Majority.

  1. Initially, set $w_i^1 = 1$ and play each action $i$ with probability $p_i^1 = 1/N$, for all $i = 1, \dots, N$.
  2. At time $t + 1$, update every weight:
  • $w_i^{t+1} = w_i^t \cdot (1 - \eta)^{\ell_i^t}$, where $\eta \in (0, 1/2]$ is the discount factor.
  3. Calculate this new weight for every action $i$, and then play action $i$ with probability $p_i^{t+1} = w_i^{t+1} / W^{t+1}$, as sketched below,
  • where $W^{t+1} = \sum_{j=1}^N w_j^{t+1}$.
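Below is a minimal sketch of this update, assuming 0/1 losses; `run_rwm` and `eta` (the discount factor $\eta$) are names of my own, and the function returns the expected total loss rather than sampling actions.

```python
def run_rwm(loss_matrix, eta=0.1):
    """Randomized Weighted Majority: w_i <- w_i * (1 - eta)^{loss_i} after every round.
    loss_matrix[t][i] is the 0/1 loss of action i at time t; returns the expected total loss."""
    N = len(loss_matrix[0])
    weights = [1.0] * N                                    # w_i^1 = 1 for all i
    expected_loss = 0.0
    for losses in loss_matrix:
        W = sum(weights)
        probs = [w / W for w in weights]                   # p_i^t = w_i^t / W^t
        expected_loss += sum(p * l for p, l in zip(probs, losses))       # F^t, expected loss this round
        weights = [w * (1 - eta) ** l for w, l in zip(weights, losses)]  # discount the losers
    return expected_loss
```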

This is a much better algorithm; in fact we can show how small its regret is.

thm. (Regret Bound of Randomized Weighted Majority) For $\eta \le 1/2$, the loss of the RWM algorithm satisfies:

$$L_{RWM}^T \le (1 + \eta)\, L_{\min}^T + \frac{\ln N}{\eta}$$

Proof. Let $F^t = \frac{\sum_{i : \ell_i^t = 1} w_i^t}{W^t}$ denote the fraction of the total weight that is discounted at time $t$ because those actions incurred a loss. We show this is equal to the expected loss of RWM at timestep $t$:

$$F^t = \sum_{i : \ell_i^t = 1} \frac{w_i^t}{W^t} = \sum_{i=1}^N p_i^t \ell_i^t = \mathbb{E}\left[\ell_{i^t}^t\right]$$

Now, we can express $W^{t+1}$ in terms of $W^t$ and $F^t$ by splitting the summation into the actions that incurred a loss and those that didn’t:

$$W^{t+1} = \sum_{i : \ell_i^t = 1} (1 - \eta)\, w_i^t + \sum_{i : \ell_i^t = 0} w_i^t = W^t - \eta F^t W^t = W^t (1 - \eta F^t)$$

We can now construct the inequality. Unrolling the recurrence gives $W^{T+1} = N \prod_{t=1}^T (1 - \eta F^t)$ (since $W^1 = N$), and the total weight is always at least the weight of the best fixed action $i^*$, which is discounted exactly $L_{\min}^T$ times:

$$(1 - \eta)^{L_{\min}^T} = w_{i^*}^{T+1} \le W^{T+1} = N \prod_{t=1}^T (1 - \eta F^t)$$

And with some algebra and the inequalities $\ln(1 - x) \le -x$ (applied on the right) and $-\ln(1 - \eta) \le \eta + \eta^2$ for $\eta \le 1/2$ (applied on the left), taking logarithms of both sides gives:

$$L_{\min}^T \ln(1 - \eta) \le \ln N + \sum_{t=1}^T \ln(1 - \eta F^t) \le \ln N - \eta \sum_{t=1}^T F^t = \ln N - \eta L_{RWM}^T$$

Rearranging,

$$\eta L_{RWM}^T \le \ln N - L_{\min}^T \ln(1 - \eta) \le \ln N + (\eta + \eta^2) L_{\min}^T \;\implies\; L_{RWM}^T \le (1 + \eta) L_{\min}^T + \frac{\ln N}{\eta}. \qquad \blacksquare$$
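As an illustrative check (not part of the proof), the sketch below compares RWM’s expected loss against the bound $(1+\eta)L_{\min}^T + \ln N / \eta$ on a random 0/1 loss sequence; it reuses the hypothetical `run_rwm` sketch from above, and the parameter values are arbitrary.

```python
import math, random

def check_bound(T=1000, N=10, eta=0.1, seed=0):
    rng = random.Random(seed)
    # Random 0/1 losses; the bound holds for any loss sequence, adversarial or not.
    loss_matrix = [[rng.randint(0, 1) for _ in range(N)] for _ in range(T)]
    L_min = min(sum(loss_matrix[t][i] for t in range(T)) for i in range(N))
    L_rwm = run_rwm(loss_matrix, eta=eta)          # expected loss of RWM (sketch above)
    bound = (1 + eta) * L_min + math.log(N) / eta
    print(f"L_RWM = {L_rwm:.1f} <= bound = {bound:.1f} (L_min = {L_min})")

check_bound()
```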