Motivation. Suppose you have to choose which route to take every day driving to work. You devise a complicated strategy, but one day your neighbor and coworker, who takes the same route to work every day, says “I don’t have a strategy, I just take this one route every day.” Wouldn’t you regret it if, in total, your routes took more time than your coworker’s single route?
def. Online Decision Making Game.
- Player has $N$ actions to choose from,
- At time $t = 1, 2, \dots, T$:
- Player constructs a distribution $p^t$ over the set of actions $\{1, \dots, N\}$
- Adversary chooses a loss for each action, $\ell^t_i \in \{0, 1\}$, for every action $i$, where $\ell^t_i = 0$ represents no loss and $\ell^t_i = 1$ represents a loss. (In our example, think of it as traffic conditions causing a delay.)
- Player’s distribution is realized into an action $i^t \sim p^t$. The player incurs a loss of $\ell^t_{i^t}$.
- The player’s goal is to minimize total loss, which we will define shortly.
- We play for time $T$. The number of iterations is predetermined. (A sketch of this protocol in code follows below.)
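To make the protocol concrete, here is a minimal sketch of the game loop in Python. The `play_game` and `UniformPlayer` names and the uniformly random adversary are illustrative assumptions, not part of the definition above.

```python
import random

def play_game(player, adversary, N, T):
    """Run the online decision-making game for T rounds over N actions."""
    total_loss = 0
    for t in range(T):
        p = player.distribution()                        # player commits to a distribution p^t
        losses = adversary(t)                            # adversary fixes a 0/1 loss for every action
        action = random.choices(range(N), weights=p)[0]  # p^t is realized into an action
        total_loss += losses[action]                     # player incurs that action's loss
        player.observe(losses)                           # player then sees the full loss vector
    return total_loss

class UniformPlayer:
    """Toy player that ignores feedback and plays uniformly at random."""
    def __init__(self, N):
        self.N = N
    def distribution(self):
        return [1.0 / self.N] * self.N
    def observe(self, losses):
        pass

if __name__ == "__main__":
    N, T = 4, 100
    adversary = lambda t: [random.randint(0, 1) for _ in range(N)]  # illustrative random adversary
    print(play_game(UniformPlayer(N), adversary, N, T))
```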
We characterize the loss if we always chose action $i$ (like how our neighbor always takes the same route) for time $T$ as:
$$L^T_i = \sum_{t=1}^{T} \ell^t_i$$
Thus we can also define:
- $L^T_{\min} = \min_i L^T_i$ is the minimum total loss if we could only choose one action every time
- $L^T_A = \sum_{t=1}^{T} \sum_{i=1}^{N} p^t_i \ell^t_i$ is the total (expected) loss of the player playing the strategy of algorithm $A$, i.e. the distributions $p^1, \dots, p^T$.
def. External Regret. For a player playing strategy $A$, the regret for this strategy is:
$$R_A = L^T_A - L^T_{\min}$$
i.e. the difference between the total loss of the algorithm and the best total loss of a one-action-every-time strategy.
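As a concrete illustration (a sketch with hypothetical helper names), $L^T_i$, $L^T_{\min}$, $L^T_A$, and the external regret can be computed directly from the per-round loss vectors and the player’s distributions:

```python
def total_losses(loss_vectors):
    """L^T_i: cumulative loss of always playing action i, for every i."""
    N = len(loss_vectors[0])
    return [sum(l[i] for l in loss_vectors) for i in range(N)]

def external_regret(loss_vectors, distributions):
    """Expected total loss of the player minus the loss of the best fixed action."""
    L = total_losses(loss_vectors)
    L_min = min(L)                                        # best one-action-every-time loss
    L_A = sum(sum(p_i * l_i for p_i, l_i in zip(p, l))    # expected loss summed over rounds
              for p, l in zip(distributions, loss_vectors))
    return L_A - L_min

# Tiny example: two actions, three rounds, player always mixes 50/50.
losses = [[1, 0], [1, 0], [0, 1]]
dists = [[0.5, 0.5]] * 3
print(external_regret(losses, dists))   # L_A = 1.5, L_min = 1, regret = 0.5
```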
Algorithms for Online Games
alg. Greedy algorithm. This algorithm chooses the action whose one-action-every-time loss so far is smallest.
- Initially: $L^0_i = 0$ for every action $i$, so the first action played is $x^1 = 1$.
- At time $t$, choose $x^t$ such that we take the minimum possible $L^{t-1}_i$. In other words: $x^t = \arg\min_i L^{t-1}_i$.
- Break ties deterministically, choosing the action with the lowest index. (A sketch in code follows below.)
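A minimal sketch of the greedy rule, assuming the same player interface as the game-loop sketch above (the class name is hypothetical):

```python
class GreedyPlayer:
    """Deterministically plays the action whose cumulative loss so far is smallest."""
    def __init__(self, N):
        self.N = N
        self.cum_loss = [0] * N               # L^{t-1}_i for every action i

    def distribution(self):
        # min() returns the first minimizer, so ties break toward the lowest index
        best = min(range(self.N), key=lambda i: self.cum_loss[i])
        p = [0.0] * self.N
        p[best] = 1.0                         # deterministic choice encoded as a point mass
        return p

    def observe(self, losses):
        for i, l in enumerate(losses):
            self.cum_loss[i] += l
```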
Motivation. This algorithm is really bad: because it is deterministic, the adversary can force it to incur a loss at every single step, while the best fixed action loses at most a $1/N$ fraction of the rounds. Instead, we can try to confuse our adversary by mixing our strategy, i.e. using a randomized algorithm.
alg. Randomized Weighted Majority.
- Initially, set $w^1_i = 1$ and play action $i$ with probability $p^1_i = 1/N$ for all $i$
- At time $t$:
- Update $w^t_i = w^{t-1}_i \cdot (1-\epsilon)^{\ell^{t-1}_i}$, where $\epsilon$ is the discount factor
- Calculate this new weight for every action $i$, and then play action $i$ with probability $p^t_i = \frac{w^t_i}{W^t}$,
- where $W^t = \sum_{i=1}^{N} w^t_i$. (A sketch in code follows below.)
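A minimal sketch of RWM with the same assumed player interface as above:

```python
class RWMPlayer:
    """Randomized Weighted Majority: multiplicatively discounts the weight of losing actions."""
    def __init__(self, N, eps):
        self.eps = eps                        # discount factor epsilon
        self.weights = [1.0] * N              # w^1_i = 1 for every action i

    def distribution(self):
        W = sum(self.weights)                 # W^t = sum_i w^t_i
        return [w / W for w in self.weights]  # p^t_i = w^t_i / W^t

    def observe(self, losses):
        # w^{t+1}_i = w^t_i * (1 - eps)^{l^t_i}; with 0/1 losses only losers get discounted
        self.weights = [w * (1.0 - self.eps) ** l
                        for w, l in zip(self.weights, losses)]
```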
This is a much better algorithm; in fact we can show how small its regret is.
thm. (Regret Bound of Randomized Weighted Majority) For $\epsilon \le \frac{1}{2}$, the loss of the RWM algorithm satisfies:
$$L^T_{RWM} \le (1+\epsilon)\, L^T_{\min} + \frac{\ln N}{\epsilon}$$
Proof. Let $F^t$ denote the fraction of the total weight at time $t$ that is discounted because the corresponding actions incurred a loss. We show this is equal to the expected loss of the algorithm at timestep $t$:
$$F^t = \frac{\sum_{i:\,\ell^t_i = 1} w^t_i}{W^t} = \sum_{i:\,\ell^t_i = 1} p^t_i = \sum_{i=1}^{N} p^t_i \ell^t_i$$
Now, we can express $W^{t+1}$ in terms of $W^t$ and $F^t$, by splitting the summation into actions that incurred a loss and those that didn’t:
$$W^{t+1} = \sum_{i:\,\ell^t_i = 0} w^t_i + (1-\epsilon) \sum_{i:\,\ell^t_i = 1} w^t_i = (1 - F^t)\, W^t + (1-\epsilon)\, F^t W^t = W^t (1 - \epsilon F^t)$$
We can now construct the inequality: the final total weight is at least the final weight of the best fixed action $i^*$ (the one achieving $L^T_{\min}$), so
$$(1-\epsilon)^{L^T_{\min}} = w^{T+1}_{i^*} \le W^{T+1} = N \prod_{t=1}^{T} (1 - \epsilon F^t)$$
And with some algebra and the inequality $\ln(1-x) \le -x$: taking logarithms of both sides gives
$$L^T_{\min} \ln(1-\epsilon) \le \ln N + \sum_{t=1}^{T} \ln(1 - \epsilon F^t) \le \ln N - \epsilon \sum_{t=1}^{T} F^t = \ln N - \epsilon\, L^T_{RWM}$$
Rearranging, and using $-\ln(1-\epsilon) \le \epsilon + \epsilon^2$ for $\epsilon \le \frac{1}{2}$, we obtain
$$L^T_{RWM} \le \frac{\ln N}{\epsilon} - \frac{\ln(1-\epsilon)}{\epsilon}\, L^T_{\min} \le (1+\epsilon)\, L^T_{\min} + \frac{\ln N}{\epsilon}$$
■
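As a rough sanity check of the bound, here is a sketch reusing the hypothetical `RWMPlayer` and `total_losses` helpers from above, against a random (not worst-case) illustrative adversary; the theorem bounds the expected loss, which is what we accumulate here.

```python
import math
import random

N, T, eps = 10, 2000, 0.1
player = RWMPlayer(N, eps)
loss_history, L_rwm = [], 0.0

for t in range(T):
    p = player.distribution()
    losses = [random.randint(0, 1) for _ in range(N)]        # illustrative adversary
    L_rwm += sum(p_i * l_i for p_i, l_i in zip(p, losses))   # expected loss this round
    loss_history.append(losses)
    player.observe(losses)

L_min = min(total_losses(loss_history))
bound = (1 + eps) * L_min + math.log(N) / eps
print(f"L_RWM = {L_rwm:.1f}, L_min = {L_min}, bound = {bound:.1f}")  # L_RWM should not exceed bound
```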