sklearn.linear_model.RANSACRegressor
- 
class sklearn.linear_model.RANSACRegressor(base_estimator=None, *, min_samples=None, residual_threshold=None, is_data_valid=None, is_model_valid=None, max_trials=100, max_skips=inf, stop_n_inliers=inf, stop_score=inf, stop_probability=0.99, loss='absolute_loss', random_state=None)[source]
- 
RANSAC (RANdom SAmple Consensus) algorithm. RANSAC is an iterative algorithm for the robust estimation of parameters from a subset of inliers from the complete data set. Read more in the User Guide. - Parameters
- 
- 
base_estimator : object, default=None
- 
Base estimator object which implements the following methods: - 
fit(X, y): Fit model to given training data and target values.
- 
score(X, y): Returns the score on the given test data (for regressors, typically the coefficient of determination R^2), which is used for the stop criterion defined by stop_score. Additionally, the score is used to decide which of two equally large consensus sets is chosen as the better one.
- 
predict(X): Returns predicted values using the linear model, which are used to compute the residual error with the loss function.
 If base_estimator is None, then LinearRegression is used for target values of dtype float. Note that the current implementation only supports regression estimators. 
- 
- 
min_samples : int (>= 1) or float ([0, 1]), default=None
- 
Minimum number of samples chosen randomly from original data. Treated as an absolute number of samples for min_samples >= 1, treated as a relative number ceil(min_samples * X.shape[0]) for min_samples < 1. This is typically chosen as the minimal number of samples necessary to estimate the given base_estimator. By default a sklearn.linear_model.LinearRegression() estimator is assumed and min_samples is chosen as X.shape[1] + 1.
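The absolute/relative rule described for min_samples can be sketched in a few lines. This is only an illustration of the rule as stated here, not the library's internal code; resolve_min_samples is a hypothetical helper name.

```python
import math


def resolve_min_samples(min_samples, n_samples, n_features):
    """Hypothetical helper: turn min_samples into an absolute sample count."""
    if min_samples is None:
        # Default described above for a LinearRegression base estimator.
        return n_features + 1
    if min_samples >= 1:
        # Values >= 1 are read as an absolute number of samples.
        return int(min_samples)
    # Values < 1 are read as a fraction of the rows of X, rounded up.
    return math.ceil(min_samples * n_samples)


print(resolve_min_samples(None, n_samples=200, n_features=2))  # 3
print(resolve_min_samples(10, n_samples=200, n_features=2))    # 10
print(resolve_min_samples(0.25, n_samples=200, n_features=2))  # 50
```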
- 
residual_threshold : float, default=None
- 
Maximum residual for a data sample to be classified as an inlier. By default the threshold is chosen as the MAD (median absolute deviation) of the target values y.
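As a rough illustration of the default threshold, the median absolute deviation of the targets can be computed with NumPy as below. mad_threshold is a hypothetical helper written for this sketch, and the toy y values are made up.

```python
import numpy as np


def mad_threshold(y):
    """Median absolute deviation of the target values (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    # Median of the absolute deviations from the median of y.
    return np.median(np.abs(y - np.median(y)))


y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(mad_threshold(y))  # 1.0
```

Because the MAD is computed from y itself and not from model residuals, it may be a poor default when the targets span a wide range; passing an explicit residual_threshold is often the safer choice.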
- 
is_data_valid : callable, default=None
- 
This function is called with the randomly selected data before the model is fitted to it: is_data_valid(X, y). If its return value is False the current randomly chosen sub-sample is skipped.
- 
is_model_valid : callable, default=None
- 
This function is called with the estimated model and the randomly selected data: is_model_valid(model, X, y). If its return value is False the current randomly chosen sub-sample is skipped. Rejecting samples with this function is computationally costlier than with is_data_valid. is_model_valid should therefore only be used if the estimated model is needed for making the rejection decision.
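A minimal sketch of the two validation hooks, assuming synthetic one-dimensional data with a known slope of 2. The function names data_is_valid and model_is_valid and the acceptance rules are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-5, 5, size=(100, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.1, size=100)
y[:10] += 50.0  # turn the first ten samples into gross outliers


def data_is_valid(X_subset, y_subset):
    # Cheap check that runs before fitting: require some spread in x.
    return (X_subset[:, 0].max() - X_subset[:, 0].min()) > 1.0


def model_is_valid(model, X_subset, y_subset):
    # Needs the fitted model: accept only positive slopes.
    return model.coef_[0] > 0.0


reg = RANSACRegressor(
    is_data_valid=data_is_valid,
    is_model_valid=model_is_valid,
    residual_threshold=1.0,
    random_state=0,
).fit(X, y)

print(reg.estimator_.coef_)
print(reg.n_skips_invalid_data_, reg.n_skips_invalid_model_)
```

The spread check lives in data_is_valid because it needs no fitted model, which follows the cost advice in the is_model_valid description.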
- 
max_trials : int, default=100
- 
Maximum number of iterations for random sample selection. 
- 
max_skips : int, default=np.inf
- 
Maximum number of iterations that can be skipped due to finding zero inliers or invalid data defined by is_data_valid or invalid models defined by is_model_valid. New in version 0.19. 
- 
stop_n_inliers : int, default=np.inf
- 
Stop iteration if at least this number of inliers are found. 
- 
stop_score : float, default=np.inf
- 
Stop iteration if score is greater than or equal to this threshold. 
- 
stop_probability : float in range [0, 1], default=0.99
- 
RANSAC iteration stops once the probability that at least one outlier-free set of the training data has been sampled reaches this value. This requires generating at least N samples (iterations): N >= log(1 - probability) / log(1 - e**m), where the probability (confidence) is typically set to a high value such as 0.99 (the default), e is the current fraction of inliers w.r.t. the total number of samples, and m is the number of samples drawn per iteration (min_samples). 
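The bound on N can be evaluated directly. The sketch below assumes that m is the number of samples drawn per trial (min_samples) and handles the degenerate cases by hand; required_trials is a hypothetical helper, not part of scikit-learn.

```python
import math


def required_trials(n_inliers, n_samples, min_samples, probability=0.99):
    """Smallest integer N with N >= log(1 - probability) / log(1 - e**m)."""
    e = n_inliers / n_samples  # current fraction of inliers
    if e == 0.0:
        return math.inf  # an outlier-free subset can never be drawn
    p_clean = e ** min_samples  # chance that one random subset is outlier-free
    if p_clean >= 1.0 or probability <= 0.0:
        return 0  # every subset is clean, or no confidence is requested
    if probability >= 1.0:
        return math.inf  # certainty would need infinitely many draws
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - p_clean))


print(required_trials(50, 100, min_samples=2))  # 17
print(required_trials(90, 100, min_samples=2))  # 3
```

With half of the data being inliers and two samples per trial, 17 trials give 99% confidence, while a 90% inlier fraction needs only 3. This is why the iteration can stop well before max_trials once a large consensus set has been found.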
- 
loss : str or callable, default='absolute_loss'
- 
String inputs, "absolute_loss" and "squared_loss" are supported which find the absolute loss and squared loss per sample respectively. If loss is a callable, then it should be a function that takes two arrays as inputs, the true and predicted value and returns a 1-D array with the i-th value of the array corresponding to the loss on X[i]. If the loss on a sample is greater than the residual_threshold, then this sample is classified as an outlier. New in version 0.18. 
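A callable loss might look like the sketch below, which reimplements a per-sample absolute residual by hand. The data is synthetic and the function name absolute_residual is made up for this example.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor


def absolute_residual(y_true, y_pred):
    # Return one loss value per sample as a 1-D array.
    return np.abs(np.ravel(y_true) - np.ravel(y_pred))


rng = np.random.RandomState(0)
X = rng.uniform(-5, 5, size=(100, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.1, size=100)
y[:5] += 40.0  # five gross outliers

reg = RANSACRegressor(
    loss=absolute_residual,
    residual_threshold=1.0,  # samples with loss above this count as outliers
    random_state=0,
).fit(X, y)

print(reg.inlier_mask_[:5])
print(reg.estimator_.coef_)
```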
- 
random_state : int, RandomState instance, default=None
- 
The generator used to draw the random sub-samples. Pass an int for reproducible output across multiple function calls. See Glossary. 
 
- 
- Attributes
- 
- 
estimator_ : object
- 
Best fitted model (copy of the base_estimator object).
- 
n_trials_ : int
- 
Number of random selection trials until one of the stop criteria is met. It is always <= max_trials.
- 
inlier_mask_ : bool array of shape [n_samples]
- 
Boolean mask of inliers classified as True.
- 
n_skips_no_inliers_ : int
- 
Number of iterations skipped due to finding zero inliers. New in version 0.19. 
- 
n_skips_invalid_data_ : int
- 
Number of iterations skipped due to invalid data defined by is_data_valid. New in version 0.19. 
- 
n_skips_invalid_model_ : int
- 
Number of iterations skipped due to an invalid model defined by is_model_valid. New in version 0.19. 
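The fitted attributes listed above can be inspected after calling fit. A short sketch on synthetic data from make_regression:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RANSACRegressor

X, y = make_regression(n_samples=200, n_features=2, noise=4.0, random_state=0)
reg = RANSACRegressor(random_state=0).fit(X, y)

# The final model, refit on the best consensus set.
print(type(reg.estimator_).__name__)

# How many random sub-samples were drawn before a stop criterion was met.
print(reg.n_trials_)

# Boolean mask over the training samples; True marks an inlier.
X_inliers = X[reg.inlier_mask_]
X_outliers = X[~reg.inlier_mask_]
print(X_inliers.shape[0], X_outliers.shape[0])

# Skip counters, broken down by cause.
print(reg.n_skips_no_inliers_, reg.n_skips_invalid_data_, reg.n_skips_invalid_model_)
```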
 
- 
 Examples
>>> from sklearn.linear_model import RANSACRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(
...     n_samples=200, n_features=2, noise=4.0, random_state=0)
>>> reg = RANSACRegressor(random_state=0).fit(X, y)
>>> reg.score(X, y)
0.9885...
>>> reg.predict(X[:1,])
array([-31.9417...])
 Methods
fit(X, y[, sample_weight]): Fit estimator using RANSAC algorithm.
get_params([deep]): Get parameters for this estimator.
predict(X): Predict using the estimated model.
score(X, y): Returns the score of the prediction.
set_params(**params): Set the parameters of this estimator.
- 
fit(X, y, sample_weight=None)[source]
- 
Fit estimator using RANSAC algorithm. - Parameters
- 
- 
X : array-like or sparse matrix, shape [n_samples, n_features]
- 
Training data. 
- 
y : array-like of shape (n_samples,) or (n_samples, n_targets)
- 
Target values. 
- 
sample_weight : array-like of shape (n_samples,), default=None
- 
Individual weights for each sample. Raises an error if sample_weight is passed and the fit method of base_estimator does not support it. New in version 0.18. 
 
- 
- Raises
- 
- ValueError
- 
If no valid consensus set could be found. This occurs if is_data_valid and is_model_valid return False for all max_trials randomly chosen sub-samples.
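The failure case can be reproduced with a validator that rejects every sub-sample. This is a contrived sketch; reject_everything is a made-up function used only to force the error.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RANSACRegressor

X, y = make_regression(n_samples=50, n_features=2, noise=1.0, random_state=0)


def reject_everything(X_subset, y_subset):
    # Contrived validator: no sub-sample is ever accepted.
    return False


reg = RANSACRegressor(is_data_valid=reject_everything, max_trials=10, random_state=0)

try:
    reg.fit(X, y)
    failed = False
except ValueError as exc:
    failed = True
    print("RANSAC failed:", exc)
```

In practice it is usually worth catching this error and falling back to a looser residual_threshold or a less strict validator.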
 
 
 - 
get_params(deep=True)[source]
- 
Get parameters for this estimator. - Parameters
- 
- 
deep : bool, default=True
- 
If True, will return the parameters for this estimator and contained subobjects that are estimators. 
 
- 
- Returns
- 
- 
params : dict
- 
Parameter names mapped to their values. 
 
- 
 
 - 
predict(X)[source]
- 
Predict using the estimated model. This is a wrapper for estimator_.predict(X). - Parameters
- 
- 
X : numpy array of shape [n_samples, n_features]
 
- 
- Returns
- 
- 
y : array, shape = [n_samples] or [n_samples, n_targets]
- 
Returns predicted values. 
 
- 
 
 - 
score(X, y)[source]
- 
Returns the score of the prediction. This is a wrapper for estimator_.score(X, y). - Parameters
- 
- 
X : numpy array or sparse matrix of shape [n_samples, n_features]
- 
Training data. 
- 
y : array, shape = [n_samples] or [n_samples, n_targets]
- 
Target values. 
 
- 
- Returns
- 
- 
z : float
- 
Score of the prediction. 
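Since predict and score are described as wrappers around the fitted estimator_, calling them on the RANSAC object or on estimator_ directly should agree. A quick check on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RANSACRegressor

X, y = make_regression(n_samples=200, n_features=2, noise=4.0, random_state=0)
reg = RANSACRegressor(random_state=0).fit(X, y)

# Predictions from the wrapper and from the inner model.
same_pred = np.allclose(reg.predict(X), reg.estimator_.predict(X))

# Scores from the wrapper and from the inner model.
same_score = np.isclose(reg.score(X, y), reg.estimator_.score(X, y))

print(same_pred, same_score)
```

Note that the score is computed on whatever data is passed in, outliers included, so it can be lower than the score on the inliers alone.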
 
- 
 
 - 
set_params(**params)[source]
- 
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object. - Parameters
- 
- 
**params : dict
- 
Estimator parameters. 
 
- 
- Returns
- 
- 
self : estimator instance
- 
Estimator instance. 
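A short sketch of both forms of set_params: directly on the estimator, and through a Pipeline using the <component>__<parameter> syntax. The step name ransacregressor is the one make_pipeline derives from the class name.

```python
from sklearn.linear_model import RANSACRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are set by name, and the instance is returned.
reg = RANSACRegressor()
returned = reg.set_params(max_trials=50, stop_probability=0.95)
print(returned is reg)  # True

# Nested object: address the RANSAC step inside a pipeline.
pipe = make_pipeline(StandardScaler(), RANSACRegressor())
pipe.set_params(ransacregressor__max_trials=200)
print(pipe.get_params()["ransacregressor__max_trials"])  # 200
```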
 
- 
 
 
    © 2007–2020 The scikit-learn developers
Licensed under the 3-clause BSD License.
    https://scikit-learn.org/0.24/modules/generated/sklearn.linear_model.RANSACRegressor.html