pypeln - Python data pipeline library


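Pypeln is a library for running concurrent tasks on Processes, Threads, and asyncio.Tasks. It aims to offer an elegant API for workloads that need concurrency but are not large enough to justify Spark or Dask.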

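from pypeln import process as pr
stage = pr.map(task1, data, workers=3, maxsize=4)  # task1 runs in 3 worker processes
stage = pr.filter(task2, stage, workers=2)  # keeps the items for which task2 is truthy
data = list(stage)  # iterating the final stage collects the results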

Or, using threads:

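from pypeln import thread as th
stage = th.map(task1, data, workers=3, maxsize=4)  # task1 runs in 3 worker threads
stage = th.filter(task2, stage, workers=2)  # keeps the items for which task2 is truthy
data = list(stage)  # iterating the final stage collects the results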

Or, using asyncio tasks:

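from pypeln import asyncio_task as aio
stage = aio.map(async_task1, data, workers=3, maxsize=4)  # async_task1 is a coroutine function
stage = aio.filter(async_task2, stage, workers=2)  # keeps the items for which async_task2 is truthy
data = list(stage)  # iterating the final stage collects the results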
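
To give a rough feel for what a stage with `workers` and `maxsize` does, here is a minimal standard-library sketch of a thread-backed map stage. It is not Pypeln's implementation: `threaded_map`, `_DONE`, and `square` are hypothetical names made up for this illustration, and it assumes that `maxsize` bounds the queues between producer and workers.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream (hypothetical helper)


def threaded_map(func, iterable, workers=3, maxsize=4):
    """Illustrative stand-in for a thread-backed map stage.

    Yields results lazily, in completion order (input order is NOT preserved).
    """
    inputs = queue.Queue(maxsize=maxsize)   # bounded: the feeder blocks when full
    outputs = queue.Queue(maxsize=maxsize)  # bounded: workers block when full

    def feeder():
        for item in iterable:
            inputs.put(item)
        for _ in range(workers):
            inputs.put(_DONE)  # one stop signal per worker

    def worker():
        try:
            while True:
                item = inputs.get()
                if item is _DONE:
                    return
                outputs.put(func(item))
        finally:
            # Always report completion so the consumer cannot hang;
            # an exception in func simply ends this worker in this sketch.
            outputs.put(_DONE)

    threads = [threading.Thread(target=feeder, daemon=True)]
    threads += [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()

    finished = 0
    while finished < workers:
        result = outputs.get()
        if result is _DONE:
            finished += 1
        else:
            yield result


def square(x):  # hypothetical task
    return x * x


if __name__ == "__main__":
    stage = threaded_map(square, range(10), workers=3, maxsize=4)
    print(sorted(stage))
```

Because the generator only pulls from the output queue while it is being iterated, the bounded queues apply back-pressure: the feeder and the workers pause whenever the consumer falls behind.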

The different worker types can be chained together:

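data = aio.map(f1, data, workers=100)  # asyncio tasks
data = th.flat_map(f2, data, workers=10)  # threads; each result of f2 is flattened into the stream
data = filter(f3, data)  # Python's built-in filter
data = pr.map(f4, data, workers=5, maxsize=200)  # processes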
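
For comparison, the same idea of mixing worker types can be approximated with the standard library alone. The sketch below is only an analogy, not how Pypeln works internally: `bounded_map`, `double`, `is_multiple_of_four`, `describe`, and `run_pipeline` are hypothetical names, and unlike a streaming pipeline the asyncio step here collects all of its results before the next step starts.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def double(x):  # hypothetical async task, in the role of f1
    await asyncio.sleep(0)  # stands in for real I/O
    return x * 2


def is_multiple_of_four(x):  # hypothetical predicate, in the role of f3
    return x % 4 == 0


def describe(x):  # hypothetical task, in the role of f4
    return f"value={x}"


async def bounded_map(func, items, workers):
    """Run func over items with at most `workers` coroutines active at once."""
    sem = asyncio.Semaphore(workers)

    async def run(item):
        async with sem:
            return await func(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run(item) for item in items))


def run_pipeline(data):
    doubled = asyncio.run(bounded_map(double, data, workers=100))  # asyncio step
    kept = filter(is_multiple_of_four, doubled)                    # built-in filter
    with ThreadPoolExecutor(max_workers=5) as pool:                # thread step
        return list(pool.map(describe, kept))


if __name__ == "__main__":
    print(run_pipeline(range(6)))
```

The amount of glue code needed even for this small case is the kind of boilerplate a uniform `map` / `filter` / `flat_map` interface across worker types is meant to remove.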

```