upload

Tools to upload a dataset to the xpdeep server.

Functions:

Name      Description
upload    Upload raw data so that it can be iterated over on the xpdeep server side. Raw data can be local or in the customer cloud.

upload(directory_name: str, *, relative_paths: bool = True, **dataset_paths: str) -> DirectoryArtifact

Upload raw data so that it can be iterated over on the xpdeep server side. Raw data can be local or in the customer cloud.

Parameters:

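directory_name (str, required)
    Name of your artifact directory.

relative_paths (bool, default: True)
    Whether the values in dataset_paths are relative paths. If True (the default), each path is resolved to an absolute path before upload; if False, the paths are used as given and are expected to be absolute.

dataset_paths (str, default: {})
    Local paths to the data, passed as keyword arguments (one keyword per path). See https://huggingface.co/docs/datasets/filesystems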

Returns:

Type                 Description
DirectoryArtifact    A link representing the uploaded data (size, checksum, etc.).

Raises:

Type                 Description
FileNotFoundError    If any of the given paths does not exist.
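
The signature makes relative_paths keyword-only and collects every other keyword argument into dataset_paths. The sketch below illustrates that calling convention only. It uses a hypothetical stand-in function, fake_upload, that copies the signature but performs no upload and is not the xpdeep implementation; the keys train and test and the file names are illustrative assumptions.

```python
def fake_upload(directory_name: str, *, relative_paths: bool = True, **dataset_paths: str) -> dict:
    """Hypothetical stand-in with the same signature as upload; it only echoes its arguments."""
    return {
        "directory_name": directory_name,
        "relative_paths": relative_paths,
        "dataset_paths": dataset_paths,
    }


# Each extra keyword argument becomes one entry of dataset_paths.
call = fake_upload("my_dataset", train="data/train.parquet", test="data/test.parquet")
print(call["dataset_paths"])

# relative_paths cannot be passed positionally because of the bare `*` in the signature.
positional_error = None
try:
    fake_upload("my_dataset", False, train="data/train.parquet")
except TypeError as error:
    positional_error = error
    print("TypeError:", error)
```

With the real function, the same shape of call would presumably look like upload("my_dataset", train=..., test=...), after the client and project have been initialized as the decorators in the source require.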

Source code in src/xpdeep/dataset/upload.py
@initialized_client_verification
@initialized_project_verification
def upload(directory_name: str, *, relative_paths: bool = True, **dataset_paths: str) -> DirectoryArtifact:
    """Upload raw data to be iterated from on the xpdeep server side. Raw data can be local or in the customer cloud.

    Parameters
    ----------
    directory_name : str
        Name of your artifact directory.
    relative_paths : bool
        Indicate if dataset_paths should be considered as relative paths (default: True) or as absolute.
    dataset_paths : str
        Local paths to the data. https://huggingface.co/docs/datasets/filesystems

    Returns
    -------
    DirectoryArtifact
        A link representing the uploaded data (size, checksum, etc...).

    Raises
    ------
    FileNotFoundError
        If no local files are found.
    """
    dataset_paths = {key: Path(value) for key, value in dataset_paths.items()}

    absolute_paths: dict[str, Path] = (
        dataset_paths if not relative_paths else {file: path.resolve() for file, path in dataset_paths.items()}
    )

    for path in absolute_paths.values():
        if not path.exists():
            message = f"file {path} does not exist"
            raise FileNotFoundError(message)

    relative_path = {file: path.parts[-1] for file, path in absolute_paths.items()}

    directory_artifact = DirectoryArtifact.create_directory_artifact(directory_name, relative_path)

    _copy_files_to_remote(directory_artifact, absolute_paths)

    return DirectoryArtifact.from_dict(directory_artifact.to_dict())
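
The local path handling in the listing above (convert to Path, optionally resolve, check existence, keep the final path component as the remote name) can be tried without an xpdeep server. The following is a minimal standalone sketch under stated assumptions: prepare_paths is a hypothetical helper, not part of xpdeep; it uses Path.name where the source uses path.parts[-1]; and it stops before the artifact creation and remote copy steps.

```python
import tempfile
from pathlib import Path


def prepare_paths(*, relative_paths: bool = True, **dataset_paths: str) -> tuple[dict[str, Path], dict[str, str]]:
    """Hypothetical helper mirroring the local path handling of upload (no upload is performed)."""
    paths = {key: Path(value) for key, value in dataset_paths.items()}

    # Resolve against the current working directory only when paths are declared relative.
    absolute = {key: path.resolve() for key, path in paths.items()} if relative_paths else paths

    # Fail on the first path that does not exist.
    for path in absolute.values():
        if not path.exists():
            message = f"file {path} does not exist"
            raise FileNotFoundError(message)

    # Only the final path component is kept as the name inside the artifact directory.
    remote_names = {key: path.name for key, path in absolute.items()}
    return absolute, remote_names


with tempfile.TemporaryDirectory() as tmp:
    train_file = Path(tmp) / "train.csv"
    train_file.write_text("x,y\n1,2\n")

    absolute, remote_names = prepare_paths(relative_paths=False, train=str(train_file))
    print(remote_names)

    # A path that does not exist triggers the same error type as in the source.
    missing_raised = False
    try:
        prepare_paths(relative_paths=False, train=str(Path(tmp) / "missing.csv"))
    except FileNotFoundError:
        missing_raised = True
```

One consequence of keeping only the final path component is that two keywords pointing at files with the same name in different directories map to the same remote name.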