Definition
- KNN (k-nearest neighbours) is a supervised machine learning algorithm used for classification and regression.
- It predicts the label of a new data point by looking at the 'k' nearest data points in the training dataset and using a majority vote (classification) or an average (regression).
- It is also a non-parametric model: it makes no assumptions about the data distribution.
- It is an instance-based learning (lazy learning) algorithm → it does not build an explicit model during training; instead, it stores the training data and only computes when making predictions.
- [In other words, whenever a new data point arrives, KNN looks at its K nearest neighbours (the closest points in the training data) and predicts based on them. For classification it uses majority voting, i.e. the class that appears most often among the neighbours is taken as the final answer; for regression it returns the average of the neighbours' values. It is an instance-based learning algorithm, also called lazy learning, because it does not train a model at training time; instead it stores all the training data and, when a prediction is needed, computes distances and produces the answer. In simple words: who our nearest neighbours are decides what prediction we make.]
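A minimal from-scratch sketch of that idea (the helper name `knn_predict` and the tiny dataset are hypothetical, chosen just for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
    """Predict for x_new from its k nearest training points (Euclidean distance)."""
    # Distance from x_new to every stored training point (no model was "trained")
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest neighbours
    nearest = np.argsort(distances)[:k]
    if task == "classification":
        # Majority vote among the neighbours' labels
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # Regression: average of the neighbours' target values
    return y_train[nearest].mean()

# Tiny worked example: the new point's 3 nearest neighbours are all class 0
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [6.0, 5.0], [7.0, 7.0]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.5, 3.0]), k=3))  # -> 0
```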
Hyperparameters
1. K (number of neighbours)
    - Small K: sensitive to noise (overfitting)
    - Large K: smoother decision boundary (underfitting)
    - Tune using cross-validation
2. Distance Metric
    - Common choices: Euclidean, Manhattan, Minkowski
3. Weights
    - uniform: all neighbours have equal weight
    - distance: closer neighbours have higher influence
4. Algorithm (used for searching neighbours efficiently)
    - brute: simple, slower for large datasets
    - kd_tree or ball_tree: tree-based indexes for faster neighbour search (ball_tree copes better with higher dimensions)
    - auto: automatically chooses the best option
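These four knobs map directly onto scikit-learn's `KNeighborsClassifier`; a minimal sketch, assuming scikit-learn is the library in use:

```python
from sklearn.neighbors import KNeighborsClassifier

# The four hyperparameters above, set explicitly
model = KNeighborsClassifier(
    n_neighbors=5,       # K: how many neighbours to consult
    metric="minkowski",  # distance metric; with p=2 this is Euclidean
    p=2,
    weights="distance",  # closer neighbours get higher influence
    algorithm="auto",    # let the library choose brute / kd_tree / ball_tree
)
# model.fit(X_train, y_train) then model.predict(X_new) work as usual,
# assuming X_train / y_train are arrays like those in the earlier sketch.
```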
When to Use KNN
- When the dataset isn't too large (since KNN is computationally heavy at prediction time)
- When decision boundaries are irregular
- When interpretability is important
- For recommendation systems, medical diagnosis, pattern recognition
Advantages
- Simple, intuitive, easy to implement
- No training phase: good for streaming/online data
- Works well with small to medium datasets
- Naturally handles multi-class classification
Disadvantages
- Slow at prediction: must compute the distance to every training sample
- Memory heavy: the entire training set must be stored
- Curse of dimensionality: performance drops in high dimensions
Best Practices
- Normalize/standardize features (distance metrics are scale-sensitive)
- Use feature selection to reduce dimensionality
- Use cross-validation to choose the best K.
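A sketch tying these tips together with scikit-learn (the iris dataset and the K grid here are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardize first: distances are meaningless across mixed feature scales
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Pick the best K by 5-fold cross-validation
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```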