Notation

  • : a point
  • : which center is assigned to point
  • : centers
    • : number of centers we want

K-Max

Q. K-Max. K-Clustering with cost as maximum distance from center

alg. K-Max approximation.

  1. Choose any point as one center
  2. Choose the point most distant from any chosen center points
  3. Repeat until you have any many centers as you wish
  • Tight 2-approximation
    • Use triangle inequality

K-Means

Q. K-Means. K-Clustering with cost total squared distance (equiv. to cost mean squared distance):

alg. Lloyd’s Approximation Algorithm. (=K-Means Approximation)

  1. Initialize centers arbitrarily
  2. Divide into clusters
  3. Recompute new centers as the mean point of each cluster
  4. Repeat as many times as you want Analysis
  • Guaranteed that every iteration will only decrease cost (=will converge)
  • Converged centers are not guaranteed to be global optimum

alg. K-Means++ Approximation Algorithm.

  1. Initialize centers by…
    1. Choose any first center
    2. Choose randomly another point as the next center, but the probabiliy of choosing point is proportional to distance between and
    3. Repeat until we have all centers
  2. Run Llyod’s algorithm
  3. Re-initialize, re-run Llyod for trials Analysis
  • with probability
    • i.e. after trials…
    • the best trial run will very likely have cost