Is there a way to register a custom DataPipe functional method in the .pyi?

Kiuk_Chung · October 6, 2022, 6:23pm

If I implemented a custom DataPipe (iter or map) as:

```python
from torch.utils.data import IterDataPipe, functional_datapipe

@functional_datapipe("foo")
class FooPipe(IterDataPipe[str]):
    ...
```

Is there a way I can add the functional method foo to torchdata.datapipes.iter.IterDataPipe so that auto-completion works in an IDE?

I see that the functional methods of the DataPipes that ship with torchdata are declared in torchdata.datapipes.iter|map.__init__.pyi, which is why I get nice TAB-completion for them in an IDE or interpreter.

Additionally, for DataPipes that are meant to be used at the head of the chain (e.g. those that don't take a source pipe), is there a way to access them functionally? See the example below:

```python
from torch.utils.data import IterDataPipe, functional_datapipe

@functional_datapipe("s3_list")
class CustomS3ListDatapipe(IterDataPipe[str]):
    def __init__(self, bucket: str, key: str):
        # no source DataPipe required
        ...

# is there any way I can use CustomS3ListDatapipe as (not currently supported):
import torchdata.datapipes.iter as pipe

pipe.s3_list(bucket="foo", key="bar").map(...).load_files_by_s3()

# instead of

CustomS3ListDatapipe(bucket="foo", key="bar").map(...).load_files_by_s3()
```

nivek · October 7, 2022, 8:46pm

Adding custom DataPipe functional method to .pyi

It is currently not possible, because the .pyi files are generated ahead of time. If you are building from source, you could add your custom DataPipe to torchdata.datapipes.iter and run python setup.py develop to regenerate the relevant .pyi file. I understand that this is cumbersome and unlikely to be useful for you.
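To illustrate why an IDE cannot see these names without a pre-generated stub, here is a toy, pure-Python sketch of runtime registration. All names (`ToyIterPipe`, `toy_functional`, `ListPipe`, `FooPipe`) are hypothetical and this is not torchdata's implementation; it only loosely mirrors the registry-plus-`__getattr__` pattern. The decorator records the class in a dict, and attribute lookup falls back to that dict at runtime, so the method never exists in the source code that static analysis reads.

```python
from typing import Callable, Dict, Iterator


class ToyIterPipe:
    """Toy stand-in for IterDataPipe (hypothetical, simplified)."""

    # Registry of functional names -> pipe classes, filled by the decorator.
    functions: Dict[str, Callable] = {}

    def __getattr__(self, name: str):
        # Only reached when normal attribute lookup fails, i.e. for names
        # that were registered at runtime rather than defined on the class.
        if name in ToyIterPipe.functions:
            cls = ToyIterPipe.functions[name]
            # Pass `self` as the source pipe, mimicking a bound method call.
            return lambda *args, **kwargs: cls(self, *args, **kwargs)
        raise AttributeError(name)


def toy_functional(name: str):
    """Hypothetical decorator mimicking @functional_datapipe's registration."""
    def decorator(cls):
        ToyIterPipe.functions[name] = cls
        return cls
    return decorator


class ListPipe(ToyIterPipe):
    def __init__(self, items):
        self.items = items

    def __iter__(self) -> Iterator:
        return iter(self.items)


@toy_functional("foo")
class FooPipe(ToyIterPipe):
    def __init__(self, source):
        self.source = source

    def __iter__(self) -> Iterator[str]:
        for x in self.source:
            yield f"foo:{x}"


dp = ListPipe(["a", "b"]).foo()
print(list(dp))          # ['foo:a', 'foo:b']
print("foo" in dir(dp))  # False: "foo" only exists in the runtime registry
```

The call resolves fine when the code runs, but a tool that only reads source files or stubs has no declaration of `foo` to offer, which is the gap the generated .pyi fills for the built-in DataPipes.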

A better approach we have considered is to update the .pyi file whenever register_datapipe_as_function is called, but we have not found the time to implement it yet. Since the @functional_datapipe decorator calls register_datapipe_as_function, this would make user-defined DataPipes available for IDE auto-completion as well.

We are happy to accept PRs or suggestions related to this issue. See the related open issue here.

Using pipe.s3_list instead of the class name

We likely will not provide that option, since @functional_datapipe is meant for methods that are called on an existing DataPipe, and using the class name only costs a few more characters. Let us know if there is a reason it would be needed.
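If the functional style is still wanted for source DataPipes, one user-side workaround (not something torchdata provides) is an ordinary factory function in your own module, e.g. a hypothetical `mypipes.py` imported as `pipe`. The class below is a stand-in for the question's `CustomS3ListDatapipe`; it yields fake S3 URLs instead of listing a bucket, and in real code it would subclass IterDataPipe so `.map(...)` could be chained.

```python
from typing import Iterator


class CustomS3ListDatapipe:
    """Stand-in for the question's source DataPipe (would subclass IterDataPipe).

    Yields fake S3 URLs instead of actually listing a bucket.
    """

    def __init__(self, bucket: str, key: str):
        self.bucket = bucket
        self.key = key

    def __iter__(self) -> Iterator[str]:
        for i in range(2):
            yield f"s3://{self.bucket}/{self.key}/part-{i}"


# Hypothetical user-side "functional" entry point. Because it is an ordinary,
# statically defined function with type hints, IDEs can auto-complete it and
# infer its return type, unlike names registered dynamically at runtime.
def s3_list(bucket: str, key: str) -> CustomS3ListDatapipe:
    return CustomS3ListDatapipe(bucket=bucket, key=key)


print(list(s3_list(bucket="foo", key="bar")))
# ['s3://foo/bar/part-0', 's3://foo/bar/part-1']
```

A plain function sidesteps the .pyi problem entirely for source pipes, at the cost of living in your own namespace rather than torchdata.datapipes.iter.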

Kiuk_Chung · October 10, 2022, 5:40pm

Thanks for pointing me to the related issue! As for #2, I was mostly just wondering whether it's possible for consistency's sake: for DataPipes that do not take a source pipe as a constructor argument, if there were a way to register them as functions in torchdata.datapipes.iter (or map), we could go purely functional. But it's not a big deal if it isn't supported.