parquet_dataset
Parquet datasets, for use with raw data stored in the ".parquet" format.
Classes:
Name | Description |
---|---|
BaseParquetDataset | Base Parquet Dataset class. |
ParquetDataset | Parquet Dataset class, to be analyzed via the AutoAnalyzer. |
AnalyzedParquetDataset | Analyzed Parquet Dataset class to be created from an existing analyzed schema. |
FittedParquetDataset | Fitted Parquet Dataset class to be created from an existing fitted schema. |
BaseParquetDataset
Base Parquet Dataset class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
split_name | str | A split name, for example "train". Only used in the visualization platform XpViz. | required |
identifier_name | str | A key to group each dataset. Only used in the visualization platform XpViz. | required |
path | str | The directory artifact path. | required |
storage_options | dict[str, Any] | Optional storage options to stream data from a cloud storage instance. | required |
Attributes:
Name | Type | Description |
---|---|---|
split_name | str | |
identifier_name | str | |
path | str | |
storage_options | dict[str, object] | |
ParquetDataset
Parquet Dataset class, to be analyzed via the AutoAnalyzer.
Methods:
Name | Description |
---|---|
analyze | Analyze the dataset and create an Analyzed Schema. |
analyze(*forced_type: Feature, target_names: list[str] | None = None) -> AnalyzedParquetDataset
Analyze the dataset and create an Analyzed Schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
forced_type | Feature | Feature objects that force a custom feature type for specific column names in the Arrow Table. | () |
target_names | list[str] \| None | Optional list of column names indicating which columns should be considered targets. | None |
Returns:
Type | Description |
---|---|
AnalyzedParquetDataset | The analyzed dataset, a parquet dataset with an analyzed schema attached. |
Source code in src/xpdeep/dataset/parquet_dataset.py
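To make the roles of `forced_type` and `target_names` concrete, here is a small sketch of the idea: infer a type per column, let explicitly passed features override the inferred type by column name, and flag the target columns. Everything in it is an assumption for illustration: `SketchFeature`, `sketch_analyze`, the two type labels, and the naive inference rule are hypothetical and say nothing about how the AutoAnalyzer actually infers types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchFeature:
    """Hypothetical feature descriptor: column name, type label, target flag."""

    name: str
    feature_type: str
    is_target: bool = False


def sketch_analyze(columns, *forced_type, target_names=None):
    """Build a toy schema: forced types win over the naive inferred type."""
    forced = {feature.name: feature for feature in forced_type}
    targets = set(target_names or [])
    schema = []
    for name, values in columns.items():
        if name in forced:
            # An explicitly forced feature overrides inference for this column.
            feature_type = forced[name].feature_type
        elif all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            feature_type = "numerical"
        else:
            feature_type = "categorical"
        schema.append(SketchFeature(name, feature_type, name in targets))
    return schema


columns = {
    "age": [31, 45, 27],
    "zip_code": [75001, 69002, 13001],
    "label": ["a", "b", "a"],
}
schema = sketch_analyze(
    columns,
    SketchFeature("zip_code", "categorical"),  # integers, but not a quantity
    target_names=["label"],
)
for feature in schema:
    print(feature)
```

The `zip_code` column shows why an override is useful: its values are integers, so naive inference would treat them as numerical, while forcing the type keeps them categorical.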
AnalyzedParquetDataset
Analyzed Parquet Dataset class to be created from an existing analyzed schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
analyzed_schema | AnalyzedSchema | | required |
Methods:
Name | Description |
---|---|
fit | Create a Fitted Parquet Dataset object. |
Attributes:
Name | Type | Description |
---|---|---|
analyzed_schema | AnalyzedSchema | |
analyzed_schema: AnalyzedSchema = field(kw_only=True)
fit() -> FittedParquetDataset
Create a Fitted Parquet Dataset object.
Source code in src/xpdeep/dataset/parquet_dataset.py
FittedParquetDataset
Fitted Parquet Dataset class to be created from an existing fitted schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fitted_schema | FittedSchema | | required |
Attributes:
Name | Type | Description |
---|---|---|
fitted_schema | FittedSchema | |
artifact_id | str | Get artifact id. |