Pipelines

In this lesson we learn how to handle numeric and categorical features together in a single processing pipeline:

Imputing missing values
Scaling numeric features
Encoding categorical features

Preprocessing

Before any model, the numeric and categorical features usually need preprocessing, such as scaling, encoding, and filling in missing values, and a single feature may need several of these steps.

Pipeline

A pipeline chains these separate steps into a single sequence that ends with the model, so that the whole chain can be treated as if it were the model itself. This way the model and all of its preprocessing are fitted and tuned together as one unit, which matters because changing either the preprocessing or the model's parameters affects the final result: the prediction.

The first advantage is conciseness. Instead of writing separate code for each step, like this:

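from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

step_1 = SimpleImputer()
X_train = step_1.fit_transform(X_train)

step_2 = StandardScaler()
X_train = step_2.fit_transform(X_train)

model = SGDRegressor()
model.fit(X_train, y_train)
# At prediction time every step must be re-applied by hand, in order:
# model.predict(step_2.transform(step_1.transform(X_new)))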

A pipeline instead groups the steps into a single chain in which each step is declared and given a name:

from sklearn.pipeline import Pipeline

pipe = Pipeline(
    steps=[
        ('imputer', SimpleImputer()),
        ('scaler', StandardScaler()),
        ('regressor', SGDRegressor()),
    ]
)

A single call to fit() on your data then fits the whole chain of steps at once:

pipe.fit(X_train, y_train)

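During fit, each intermediate step is fitted and its output is passed on to the next step. At prediction or evaluation time, the pipeline calls transform() on each fitted intermediate step automatically.

Likewise, when predicting there is no need to remember the individual transformations and apply them one by one; they are stored together as a single unit:

pipe.predict(X_new)

The same goes for evaluation: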

pipe.score(X_test, y_test)
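To make the equivalence concrete, here is a small sketch on synthetic data (the arrays are made up for illustration). It checks that pipe.predict returns the same result as pushing the data through each fitted intermediate step's transform() and then calling predict() on the last step.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with a few gaps so the imputer has work to do (illustrative only)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
X_train[::10, 0] = np.nan
y_train = np.nan_to_num(X_train[:, 0]) * 2.0 + X_train[:, 1]
X_new = rng.normal(size=(5, 3))

pipe = Pipeline(
    steps=[
        ("imputer", SimpleImputer()),
        ("scaler", StandardScaler()),
        ("regressor", SGDRegressor(random_state=0)),
    ]
)
pipe.fit(X_train, y_train)

# Manual equivalent of pipe.predict: transform through every fitted
# intermediate step, then predict with the final step
Xt = X_new
for _, step in pipe.steps[:-1]:
    Xt = step.transform(Xt)
manual_pred = pipe.steps[-1][1].predict(Xt)

print(np.allclose(manual_pred, pipe.predict(X_new)))
```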

What is the design philosophy behind pipeline components?

Here we move from a single model to the pipeline, in which every step is one of two kinds, with no third option. In scikit-learn (sklearn), each unit in a pipeline is called an estimator, and it is either:

Transformer:
- computes quantities from the data: .fit(X)
- transforms the data using the computed quantities: .transform(X)
Predictor:
- fits the data: .fit(X, y)
- predicts outcomes: .predict(X)
- is evaluated: .score(X, y)
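The two roles can be sketched with a hypothetical transformer, MeanCenterer (not part of scikit-learn; written here only for illustration), that follows the same fit/transform contract. Because it inherits from TransformerMixin and BaseEstimator, it can sit in a pipeline in front of a predictor.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class MeanCenterer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer used only to illustrate the fit/transform contract."""

    def fit(self, X, y=None):
        # fit: compute a statistic from the data and store it (trailing underscore)
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self  # returning self is the scikit-learn convention

    def transform(self, X):
        # transform: apply the statistic learned in fit
        return np.asarray(X, dtype=float) - self.mean_


X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])  # y = 2x + 1

centerer = MeanCenterer().fit(X)
print(centerer.mean_)  # [2.5]
print(centerer.transform(X).ravel())

# Following the estimator API lets it be a non-final pipeline step
pipe = Pipeline([("center", MeanCenterer()), ("model", LinearRegression())])
pipe.fit(X, y)
print(pipe.predict([[5.0]]))  # approximately [11.]
```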

(Figure: the transformer and the predictor are both estimators.)

Let's look at each in detail:

Transformer

Note: transformers may add or remove columns (features), but they never add or remove samples (rows).

Here are some examples of transformers and what each of their two methods does:

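Transformer	Estimate: `fit`	Transform: `transform`
`SimpleImputer(strategy='mean')`	Computes the mean of each column.	Fills missing values with that mean.
`StandardScaler`	Computes the mean and standard deviation of each column.	Standardizes the data: subtracts the mean and divides by the standard deviation (zero mean, unit variance).
`OneHotEncoder`	Finds the unique categories of each feature and assigns each a one-hot code.	Replaces each sample's category with its one-hot (0/1) columns.
`OrdinalEncoder`	Finds the unique categories of each feature and assigns each an integer code (in sorted order by default).	Replaces each category with its integer code.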

Predictor

Predictors are models such as LinearRegression, SGDRegressor, and KNeighborsClassifier.

Their methods are:

Fitting: fit(X, y) fits the model to the data using the chosen machine-learning algorithm. This is where the learning (training) happens.
Prediction: predict(X) applies the already-fitted model to new samples; no learning takes place here.
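A minimal sketch of the predictor interface, using LinearRegression on made-up data that follow y = 2x + 1 exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X.ravel() + 1.0  # exact linear relationship y = 2x + 1

model = LinearRegression()
model.fit(X, y)                       # learning happens here
print(model.coef_, model.intercept_)  # approximately [2.] and 1.0
print(model.predict([[10.0]]))        # approximately [21.]; no learning here
print(model.score(X, y))              # R² on this same data, approximately 1.0
```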

Library structure

This explains how the library is organized, roughly as follows:

"""
sklearn/
├── preprocessing/       <-- (Transformers)
│   ├── StandardScaler
│   ├── OneHotEncoder
│   └── ...
├── impute/              <-- (Transformers)
│   ├── SimpleImputer
│   └── ...
├── linear_model/        <-- (Predictors)
│   ├── LinearRegression
│   ├── SGDRegressor
│   └── ...
├── neighbors/           <-- (Predictors)
│   ├── KNeighborsClassifier
│   └── ...
├── ensemble/            <-- (Predictors)
│   ├── RandomForestClassifier
│   └── ...
└── pipeline/            <-- (The Glue)
    └── Pipeline
"""

The classes are then imported like this:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

Pipeline summary

Component	Method	Purpose
Estimator	`fit()`	Learns the parameters from the data.
Transformer	`transform()`	Applies the learned or computed parameters to the data.
Predictor	`predict()`	Uses the learned parameters to make predictions.
Predictor	`score()`	Evaluates the goodness of fit (higher is better).

Here is an example of how this works in practice:

1. Load the dataset

We need to define the data and the target. Here we build a regression model.

from sklearn.datasets import fetch_openml

ames_housing = fetch_openml(name="house_prices", as_frame=True)
data = ames_housing.data
target = ames_housing.target

Let's look at the first rows of the dataframe.

data.head()

	Id	MSSubClass	MSZoning	LotFrontage	LotArea	Street	Alley	LotShape	LandContour	Utilities	...	PoolQC	Fence	MiscFeature	MoSold	YrSold	SaleType	SaleCondition
0	1	60	RL	65.0	8450	Pave	NaN	Reg	Lvl	AllPub	...	NaN	NaN	NaN	2	2008	WD	Normal
1	2	20	RL	80.0	9600	Pave	NaN	Reg	Lvl	AllPub	...	NaN	NaN	NaN	5	2007	WD	Normal
2	3	60	RL	68.0	11250	Pave	NaN	IR1	Lvl	AllPub	...	NaN	NaN	NaN	9	2008	WD	Normal
3	4	70	RL	60.0	9550	Pave	NaN	IR1	Lvl	AllPub	...	NaN	NaN	NaN	2	2006	WD	Abnorml
4	5	60	RL	84.0	14260	Pave	NaN	IR1	Lvl	AllPub	...	NaN	NaN	NaN	12	2008	WD	Normal

5 rows × 80 columns

2. Select a subset of the features

For simplicity, we select a few features and work only with this subset of the data:

numeric_features = ["LotArea", "FullBath", "HalfBath"]
categorical_features = ["Neighborhood", "HouseStyle"]
data = data[numeric_features + categorical_features]

3. Preprocessing

Raw data is rarely ready to be used directly by machine-learning models. We usually need processing steps such as:

Handling missing values.
Encoding categorical variables (categorical encoding).
Scaling numeric variables (scaling).

To keep things simple, we apply these steps only to the columns selected above.

a. Handling missing values (imputation)

Missing values are a common problem: most algorithms do not accept gaps (NaN) in the data.

Imputation strategies (SimpleImputer):

mean: for numeric data that is roughly normally distributed (no strong skew or outliers).
median: better for numeric data that contains outliers.
most_frequent: often used for categorical data.
constant: replaces missing values with a given value set via fill_value (for example "Unknown" or 0).
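A small sketch comparing the four strategies on a made-up numeric column with one outlier and a made-up categorical column (the values are illustrative only):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Numeric column with one outlier (100.0) and one missing value
num = np.array([[1.0], [2.0], [3.0], [100.0], [np.nan]])
# Categorical column (object dtype) with one missing value
cat = np.array([["red"], ["blue"], ["red"], [np.nan]], dtype=object)

mean_fill = SimpleImputer(strategy="mean").fit(num).statistics_      # [26.5]
median_fill = SimpleImputer(strategy="median").fit(num).statistics_  # [2.5]
mode_filled = SimpleImputer(strategy="most_frequent").fit_transform(cat).ravel()
const_filled = SimpleImputer(strategy="constant", fill_value="Unknown").fit_transform(cat).ravel()

print(mean_fill, median_fill)
print(mode_filled)   # the gap becomes 'red'
print(const_filled)  # the gap becomes 'Unknown'
```

The mean is pulled towards the outlier, while the median stays near the bulk of the values, which is why median is the safer choice when outliers are present.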

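How do we integrate it into the pipeline? When SimpleImputer is a step in a Pipeline, its statistics (such as the median) are learned only from the training data when the pipeline is fitted, and those same learned values are then used to fill gaps in the test data. This prevents data leakage.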
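The following sketch, with made-up arrays, shows that the median learned during fit comes from the training data only and is reused to fill a gap in the test data, even though the test values are very different:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

X_train = np.array([[1.0], [2.0], [3.0], [np.nan]])
y_train = np.array([1.0, 2.0, 3.0, 2.0])
X_test = np.array([[np.nan], [50.0], [60.0]])  # very different from the training values

pipe = Pipeline([("imputer", SimpleImputer(strategy="median")), ("model", LinearRegression())])
pipe.fit(X_train, y_train)

# Median learned from the training data only: median(1, 2, 3) = 2
print(pipe.named_steps["imputer"].statistics_)
# The test gap is filled with the *training* median (2.0), not a test statistic
test_filled = pipe[:-1].transform(X_test).ravel()
print(test_filled)
```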

3.1 Numeric features

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

numeric_transformer = Pipeline(
    steps=[
        ("imputer", SimpleImputer(strategy="median")),
        (
            "scaler",
            StandardScaler(),
        ),
    ]
)

3.2 Categorical features

from sklearn.preprocessing import OneHotEncoder

categorical_transformer = OneHotEncoder(handle_unknown="ignore")

4. Map each transformation to its columns

Three points to remember:

ColumnTransformer is used to apply given transformations to selected features (columns).
FeatureUnion concatenates the outputs of several transformers into a composite feature space.
Finally, to transform the target (for example, a log transform of y), use TransformedTargetRegressor.
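A sketch of the last two points on synthetic data (the data, and the choice of PCA as the second transformer, are illustrative assumptions): FeatureUnion places the outputs of two transformers side by side, and TransformedTargetRegressor fits the regressor on log(y) and maps predictions back with exp.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 5.0, size=(50, 3))
y = np.exp(X[:, 0] + 0.5 * X[:, 1])  # positive, skewed target (illustrative)

# FeatureUnion: both transformers see the same X; their outputs are concatenated
union = FeatureUnion([("scaled", StandardScaler()), ("pca", PCA(n_components=2))])
combined = union.fit_transform(X)
print(combined.shape)  # (50, 5): 3 scaled columns + 2 PCA components

# TransformedTargetRegressor: learn on log(y), predict back on the original scale
ttr = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)
ttr.fit(X, y)
print(ttr.predict(X[:3]))
```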

from sklearn.compose import ColumnTransformer

preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features),
    ]
)

5. Define the pipeline

Define the regression model.

Since the whole pipeline is sometimes called "the model", we name the final estimator predictor instead, to avoid confusion.

from sklearn.linear_model import SGDRegressor

predictor = SGDRegressor(loss="squared_error")

Chain the preprocessing steps with the predictor.

pipe = Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("regressor", predictor),
    ]
)
pipe

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('num',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(strategy='median')),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  ['LotArea', 'FullBath',
                                                   'HalfBath']),
                                                 ('cat',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['Neighborhood',
                                                   'HouseStyle'])])),
                ('regressor', SGDRegressor())])
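Because every step has a name, any nested parameter can be reached with the step__parameter syntax, which is what allows preprocessing and model to be tuned together. The sketch below uses a small synthetic DataFrame as a stand-in for the housing data (the values and the parameter grid are assumptions for illustration) and searches over the imputation strategy and the regularization strength alpha with GridSearchCV.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Small synthetic stand-in for the housing data (hypothetical values)
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "LotArea": rng.normal(10000, 2000, n),
    "FullBath": rng.integers(1, 4, n),
    "Neighborhood": rng.choice(["A", "B", "C"], n),
})
X.loc[::15, "LotArea"] = np.nan
y = 0.01 * X["LotArea"].fillna(10000) + 20 * X["FullBath"] + rng.normal(0, 5, n)

numeric = Pipeline([("imputer", SimpleImputer()), ("scaler", StandardScaler())])
preprocessor = ColumnTransformer([
    ("num", numeric, ["LotArea", "FullBath"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Neighborhood"]),
])
pipe = Pipeline([("preprocessor", preprocessor), ("regressor", SGDRegressor(random_state=0))])

# Step names joined by "__" reach parameters at any depth of the pipeline
param_grid = {
    "preprocessor__num__imputer__strategy": ["mean", "median"],
    "regressor__alpha": [1e-4, 1e-2],
}
search = GridSearchCV(pipe, param_grid, cv=3).fit(X, y)
print(search.best_params_)
```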


Measuring generalization

One of the most important concepts in machine learning is generalization: how well the model performs on new data it did not see during training.

A serious mistake many beginners make is evaluating the model on the same data it was trained on. This gives misleading, overly optimistic results.

# WRONG: training and evaluating on the same data
# pipe.fit(data, target)
# pipe.score(data, target)
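The optimism is easy to demonstrate. This sketch deliberately swaps in an unrestricted DecisionTreeRegressor on synthetic noisy data, since such a tree can memorize its training set: its R² on the training data is essentially perfect, while its R² on held-out data is noticeably lower.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)  # signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)  # no depth limit

train_r2 = tree.score(X_train, y_train)  # on data the tree has memorized
test_r2 = tree.score(X_test, y_test)     # on unseen data
print(f"train R² = {train_r2:.3f}, test R² = {test_r2:.3f}")
```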

Holding out a test set

The data should be split into two parts:

Training set: used to train the model.
Test set: kept completely hidden from the model and used only at the end to evaluate its performance.

from sklearn.model_selection import train_test_split

# Split the data: 80% for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=42
)

print(f"Shape of X_train: {X_train.shape}")
print(f"Shape of X_test: {X_test.shape}")

Shape of X_train: (1168, 5)
Shape of X_test: (292, 5)

Now fit the pipeline. In Jupyter, notice how the colour of the pipeline diagram changes once it has been fitted.

pipe.fit(X_train, y_train)


6. Evaluate performance on the test data

score = pipe.score(X_test, y_test)
print(f"R² Score on Test Set: {score:.3f}")

R² Score on Test Set: 0.663

Note: model evaluation deserves a separate discussion; it is shown here only for completeness.

Another way to build a pipeline: `make_pipeline`

Bonus: alternatively, we can use the functional helpers make_column_transformer and make_pipeline. They neither require nor allow naming the estimators by hand; instead, each step is named automatically after its class name in lowercase.

from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline

numeric_transformer = make_pipeline(
    SimpleImputer(strategy="median"), StandardScaler()
)
categorical_transformer = OneHotEncoder(handle_unknown="ignore")

preprocessor = make_column_transformer(
    (numeric_transformer, numeric_features),
    (categorical_transformer, categorical_features),
)
pipe = make_pipeline(preprocessor, SGDRegressor())
pipe.fit(X_train, y_train)
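To see the generated names, here is a sketch with a reduced column list (LotArea and Neighborhood only, chosen for brevity). No data is needed, since the names are assigned when the pipeline is built:

```python
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

pipe = make_pipeline(
    make_column_transformer(
        (make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), ["LotArea"]),
        (OneHotEncoder(handle_unknown="ignore"), ["Neighborhood"]),
    ),
    SGDRegressor(),
)
print(list(pipe.named_steps))  # ['columntransformer', 'sgdregressor']

ct = pipe.named_steps["columntransformer"]
print([name for name, _, _ in ct.transformers])  # ['pipeline', 'onehotencoder']

# Parameter paths follow the generated names
pipe.set_params(sgdregressor__alpha=1e-3)
```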

Source: the official scikit-learn documentation on API design.
For more, see: Predictive machine learning pipeline with mixed data types.
