The k-Nearest Neighbour (kNN) algorithm is a simple classification method. Given a training set \(\mathcal{D}=((\mathbf{x}^{1}, y^{1}), \dots, (\mathbf{x}^{\ell}, y^{\ell}))\), \(\mathbf{x}^{i}\in\mathbb{R}^n, y^{i}\in\mathbb{R}\), we define a distance measure between two data points, such as the squared Euclidean distance:
\[d(\mathbf{x}^{i}, \mathbf{x}^{j}) = \|\mathbf{x}^{i} - \mathbf{x}^{j}\|^2 = \sum\limits_{k=1}^n(\mathbf{x}^i_k-\mathbf{x}^j_k)^2\]
This distance lets us classify a new point \(\mathbf{x}\) by a majority vote among its \(k\) nearest training points.
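As a concrete sketch of this rule (an illustrative assumption, not code from the text; the function name and interface are made up here), a minimal NumPy implementation could look as follows:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Classify a single point x by a majority vote among its k nearest
    training points (illustrative helper, not from the original text)."""
    # Squared Euclidean distance from x to every training point.
    dists = np.sum((X_train - x) ** 2, axis=1)
    # Indices of the k nearest neighbours.
    nearest = np.argsort(dists)[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Toy usage: two well-separated classes.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.0]), k=3))  # -> 1
```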
Probabilistic Interpretation
We model \(y\sim p\), where \(p(y)\) is the fraction of points \(\mathbf{x}^i\) in \(N_k(\mathbf{x})\) (the \(k\) nearest points to \(\mathbf{x}\)) for which \(y^i=y\). In machine learning notation we write this as \(p(y | \mathbf{x},\mathcal{D})\) and predict
\[\hat{y} = \arg\max_y p(y | \mathbf{x}, \mathcal{D})\]
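Written out, the estimated conditional distribution is simply the fraction of neighbours carrying each label:
\[p(y | \mathbf{x}, \mathcal{D}) = \frac{1}{k}\sum\limits_{i:\,\mathbf{x}^i \in N_k(\mathbf{x})} \mathbf{1}\left[y^i = y\right]\]
A small sketch of this probabilistic view (again an illustrative assumption rather than code from the text), returning the estimated \(p(y | \mathbf{x}, \mathcal{D})\) alongside the arg max prediction:

```python
import numpy as np
from collections import Counter

def knn_posterior(X_train, y_train, x, k=5):
    """Estimate p(y | x, D) as the label fractions among the k nearest
    neighbours and predict via arg max (illustrative sketch)."""
    dists = np.sum((X_train - x) ** 2, axis=1)   # squared Euclidean distances
    nearest = np.argsort(dists)[:k]              # k nearest neighbours
    counts = Counter(y_train[nearest])
    probs = {label: c / k for label, c in counts.items()}  # fractions = p(y | x, D)
    y_hat = max(probs, key=probs.get)                       # arg max_y p(y | x, D)
    return probs, y_hat
```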
This makes clear that kNN is a purely discriminative model: it models only the conditional distribution \(p(y | \mathbf{x}, \mathcal{D})\) and says nothing about how the inputs \(\mathbf{x}\) themselves are generated.