Notation
- : a point
- : which center is assigned to point
- : centers
- : number of centers we want
K-Max
Q. K-Max. K-Clustering with cost as maximum distance from center
alg. K-Max approximation.
- Choose any point as one center
- Choose the point most distant from any chosen center points
- Repeat until you have any many centers as you wish
- Tight 2-approximation
- Use triangle inequality
K-Means
Q. K-Means. K-Clustering with cost total squared distance (equiv. to cost mean squared distance):
alg. Lloyd’s Approximation Algorithm. (=K-Means Approximation)
- Initialize centers arbitrarily
- Divide into clusters
- Recompute new centers as the mean point of each cluster
- Repeat as many times as you want Analysis
- Guaranteed that every iteration will only decrease cost (=will converge)
- Converged centers are not guaranteed to be global optimum
alg. K-Means++ Approximation Algorithm.
- Initialize centers by…
- Choose any first center
- Choose randomly another point as the next center, but the probabiliy of choosing point is proportional to distance between and
- Repeat until we have all centers
- Run Llyod’s algorithm
- Re-initialize, re-run Llyod for trials Analysis
- with probability
- i.e. after trials…
- the best trial run will very likely have cost