parquet_dataset
Parquet datasets, to be used with raw data stored in the ".parquet" format.
Classes:

| Name | Description |
|---|---|
| BaseParquetDataset | Base Parquet Dataset class. |
| ParquetDataset | Parquet Dataset class, to be analyzed via the AutoAnalyzer. |
| AnalyzedParquetDataset | Analyzed Parquet Dataset class, created from an existing analyzed schema. |
| FittedParquetDataset | Fitted Parquet Dataset class, created from an existing fitted schema. |

BaseParquetDataset

Base Parquet Dataset class.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| split_name | str | A split name, for example "train". Only used by the XpViz visualization platform. | required |
| identifier_name | str | A key used to group datasets together. Only used by the XpViz visualization platform. | required |
| path | str | The directory artifact path. | required |
| storage_options | dict[str, Any] | Optional storage options for streaming data from a cloud storage instance. | required |

Attributes:
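
| Name | Type | Description |
|---|---|---|
| split_name | str | The split name (for example "train"), only used by XpViz. |
| identifier_name | str | The key grouping each dataset, only used by XpViz. |
| path | str | The directory artifact path. |
| storage_options | dict[str, object] | Storage options for streaming data from a cloud storage instance. |
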
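To make the four constructor parameters concrete, the sketch below mirrors them in a plain dataclass. `BaseParquetDatasetSketch` is a hypothetical stand-in, not the xpdeep class. The `storage_options` keys (`key`, `secret`, `client_kwargs`) are options understood by the s3fs filesystem. Using them here assumes the options are forwarded to an fsspec-style backend, which this page does not confirm.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class BaseParquetDatasetSketch:
    """Hypothetical stand-in mirroring the four documented constructor parameters."""

    split_name: str  # e.g. "train"; only used by XpViz
    identifier_name: str  # groups datasets together in XpViz
    path: str  # directory artifact path
    storage_options: dict[str, Any]  # settings/credentials for cloud storage


# Local data: no cloud credentials are needed, so the options are empty.
local_train = BaseParquetDatasetSketch("train", "churn_data", "/data/churn/train", {})

# Assumed S3 example: "key", "secret" and "client_kwargs" are options understood by s3fs.
remote_test = BaseParquetDatasetSketch(
    split_name="test",
    identifier_name="churn_data",  # same key, so XpViz can group it with the train split
    path="s3://my-bucket/churn/test",
    storage_options={
        "key": "MY_ACCESS_KEY_ID",
        "secret": "MY_SECRET_ACCESS_KEY",
        "client_kwargs": {"endpoint_url": "https://s3.example.com"},
    },
)
```

Sharing one `identifier_name` across splits reflects the documented purpose of that parameter: grouping datasets in XpViz.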
          
ParquetDataset

Parquet Dataset class, to be analyzed via the AutoAnalyzer.
Methods:

| Name | Description |
|---|---|
| analyze | Analyze the dataset and create an analyzed schema. |

analyze(*forced_type: Feature, target_names: list[str] | None = None) -> AnalyzedParquetDataset

Analyze the dataset and create an analyzed schema.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| forced_type | Feature | Feature objects that force a custom feature type on specific column names in the Arrow Table. | () |
| target_names | list[str] \| None | Optional list of column names to treat as targets. | None |

Returns:

| Type | Description |
|---|---|
| AnalyzedParquetDataset | The analyzed dataset: a Parquet dataset with an analyzed schema attached. |

Source code in src/xpdeep/dataset/parquet_dataset.py
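This page does not say how the AutoAnalyzer combines `forced_type` and `target_names`. The toy model below works under one plausible reading of the parameter descriptions. Types are first inferred per column. Each forced feature then overrides the inferred type of the column it names. Finally, the listed target columns are flagged. `ForcedFeature`, `toy_analyze` and the type labels "numerical" and "categorical" are hypothetical, not xpdeep APIs. The inferred types are passed in rather than read from a Parquet file.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ForcedFeature:
    """Hypothetical stand-in for a Feature: a column name plus the type to force on it."""

    name: str
    feature_type: str


def toy_analyze(
    inferred: dict[str, str],
    *forced_type: ForcedFeature,
    target_names: list[str] | None = None,
) -> dict[str, dict[str, object]]:
    """Toy analysis: start from inferred types, override by column name, flag targets."""
    targets = set(target_names or [])
    missing = targets - set(inferred)
    if missing:
        raise KeyError(f"target columns not in the table: {sorted(missing)}")

    schema: dict[str, dict[str, object]] = {
        column: {"type": column_type, "is_target": column in targets}
        for column, column_type in inferred.items()
    }
    for feature in forced_type:
        if feature.name not in schema:
            raise KeyError(f"forced column not in the table: {feature.name}")
        schema[feature.name]["type"] = feature.feature_type  # forced type wins over inference
    return schema


# A numeric-looking zip code is forced to categorical; "churn" is marked as the target.
schema = toy_analyze(
    {"age": "numerical", "zip_code": "numerical", "churn": "categorical"},
    ForcedFeature("zip_code", "categorical"),
    target_names=["churn"],
)
print(schema["zip_code"])  # {'type': 'categorical', 'is_target': False}
```

The variadic `*forced_type` followed by a keyword-only `target_names` matches the documented signature. With that layout, targets must always be passed by name.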
              
AnalyzedParquetDataset

Analyzed Parquet Dataset class, created from an existing analyzed schema.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| analyzed_schema | AnalyzedSchema | The existing analyzed schema to attach to the dataset. | required |

Methods:

| Name | Description |
|---|---|
| fit | Create a Fitted Parquet Dataset object. |

Attributes:

| Name | Type | Description |
|---|---|---|
| analyzed_schema | AnalyzedSchema | The analyzed schema attached to the dataset. |

analyzed_schema: AnalyzedSchema = field(kw_only=True)

fit() -> FittedParquetDataset

Create a Fitted Parquet Dataset object.
Source code in src/xpdeep/dataset/parquet_dataset.py
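According to the signatures on this page, the classes form a one-way pipeline. `ParquetDataset.analyze` returns an `AnalyzedParquetDataset`, and its `fit` returns a `FittedParquetDataset`. The sketch below models that progression with stand-in classes, so each stage exposes only the step that leads to the next one. Several details are hypothetical: all `*Sketch` names, the schema contents, the simplified `analyze` signature (column types are passed in instead of inferred), what `fit` does, and how `artifact_id` is computed. None of them is the xpdeep implementation.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyzedSchemaSketch:
    """Hypothetical analyzed schema: column name -> feature type."""

    columns: dict[str, str]


@dataclass(frozen=True)
class FittedSchemaSketch:
    """Hypothetical fitted schema (real fitted state is not modeled)."""

    columns: dict[str, str]


@dataclass
class BaseDatasetSketch:
    """Fields shared by every stage, mirroring the base-class constructor."""

    split_name: str
    identifier_name: str
    path: str
    storage_options: dict[str, object]


@dataclass
class ParquetDatasetSketch(BaseDatasetSketch):
    def analyze(self, column_types: dict[str, str]) -> AnalyzedParquetDatasetSketch:
        # Assumption: the real analyze() infers types from the data; here they are passed in.
        return AnalyzedParquetDatasetSketch(
            self.split_name, self.identifier_name, self.path, self.storage_options,
            analyzed_schema=AnalyzedSchemaSketch(dict(column_types)),
        )


@dataclass
class AnalyzedParquetDatasetSketch(BaseDatasetSketch):
    analyzed_schema: AnalyzedSchemaSketch

    def fit(self) -> FittedParquetDatasetSketch:
        # Assumption: fitting turns the analyzed schema into a fitted one.
        return FittedParquetDatasetSketch(
            self.split_name, self.identifier_name, self.path, self.storage_options,
            fitted_schema=FittedSchemaSketch(dict(self.analyzed_schema.columns)),
        )


@dataclass
class FittedParquetDatasetSketch(BaseDatasetSketch):
    fitted_schema: FittedSchemaSketch

    @property
    def artifact_id(self) -> str:
        # Hypothetical: a stable ID derived from the path and the fitted columns.
        payload = json.dumps(
            {"path": self.path, "columns": self.fitted_schema.columns}, sort_keys=True
        )
        return hashlib.sha256(payload.encode()).hexdigest()[:16]


dataset = ParquetDatasetSketch("train", "my_dataset", "/data/my_dataset/train", {})
analyzed = dataset.analyze({"age": "numerical", "churn": "categorical"})
fitted = analyzed.fit()
print(fitted.fitted_schema.columns)  # {'age': 'numerical', 'churn': 'categorical'}
print(len(fitted.artifact_id))  # 16
```

Returning a new object at each stage, rather than mutating one dataset in place, means a caller cannot call `fit` before `analyze`. That ordering is also what the documented return types imply.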
              
FittedParquetDataset

Fitted Parquet Dataset class, created from an existing fitted schema.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| fitted_schema | FittedSchema | The existing fitted schema to attach to the dataset. | required |

Attributes:

| Name | Type | Description |
|---|---|---|
| fitted_schema | FittedSchema | The fitted schema attached to the dataset. |
| artifact_id | str | Get the artifact ID. |