Linux: why does multiprocessing use only a single core after I import numpy?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me) at StackOverflow.
Original question: http://stackoverflow.com/questions/15639779/
Why does multiprocessing use only a single core after I import numpy?
Asked by ali_m
I am not sure whether this counts more as an OS issue, but I thought I would ask here in case anyone has some insight from the Python end of things.
I've been trying to parallelise a CPU-heavy `for` loop using `joblib`, but I find that instead of each worker process being assigned to a different core, I end up with all of them being assigned to the same core and no performance gain.
Here's a very trivial example...
```python
from joblib import Parallel, delayed
import numpy as np

def testfunc(data):
    # some very boneheaded CPU work
    for nn in xrange(1000):
        for ii in data[0, :]:
            for jj in data[1, :]:
                ii * jj

def run(niter=10):
    data = (np.random.randn(2, 100) for ii in xrange(niter))
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')
    results = pool(delayed(testfunc)(dd) for dd in data)

if __name__ == '__main__':
    run()
```
...and here's what I see in `htop` while this script is running:
I'm running Ubuntu 12.10 (3.5.0-26) on a laptop with 4 cores. Clearly `joblib.Parallel` is spawning separate processes for the different workers, but is there any way that I can make these processes execute on different cores?
Accepted answer by ali_m
After some more googling I found the answer here.
It turns out that certain Python modules (`numpy`, `scipy`, `tables`, `pandas`, `skimage`...) mess with core affinity on import. As far as I can tell, this problem seems to be specifically caused by them linking against multithreaded OpenBLAS libraries.
A workaround is to reset the task affinity using:

```python
os.system("taskset -p 0xff %d" % os.getpid())
```
With this line pasted in after the module imports, my example now runs on all cores:
My experience so far has been that this doesn't seem to have any negative effect on `numpy`'s performance, although this is probably machine- and task-specific.
Update:
There are also two ways to disable the CPU affinity-resetting behaviour of OpenBLAS itself. At run-time you can use the environment variable `OPENBLAS_MAIN_FREE` (or `GOTOBLAS_MAIN_FREE`), for example:

```shell
OPENBLAS_MAIN_FREE=1 python myscript.py
```
Or alternatively, if you're compiling OpenBLAS from source you can permanently disable it at build-time by editing `Makefile.rule` to contain the line:

```
NO_AFFINITY=1
```
Answered by NPE
This appears to be a common problem with Python on Ubuntu, and is not specific to `joblib`:
- Both multiprocessing.map and joblib use only 1 cpu after upgrade from Ubuntu 10.10 to 12.04
- Python multiprocessing utilizes only one core
- multiprocessing.Pool processes locked to a single core
I would suggest experimenting with CPU affinity (`taskset`).
Answered by WoJ
Python 3 now exposes the methods to directly set the affinity:
```python
>>> import os
>>> os.sched_getaffinity(0)
{0, 1, 2, 3}
>>> os.sched_setaffinity(0, {1, 3})
>>> os.sched_getaffinity(0)
{1, 3}
>>> x = {i for i in range(10)}
>>> x
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> os.sched_setaffinity(0, x)
>>> os.sched_getaffinity(0)
{0, 1, 2, 3}
```
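Building on this, each task in a pool can pin the worker running it to a specific core. The sketch below is an illustration (not from the original answer) and assumes a Linux system; it takes candidate cores from the process's current affinity mask rather than `range(os.cpu_count())`, so it also works in containers where only a subset of CPUs is allowed:

```python
import multiprocessing as mp
import os

def pin_to_core(core_id):
    # Restrict the calling worker process to a single core (Linux-only),
    # then report the mask it actually ended up with.
    os.sched_setaffinity(0, {core_id})
    return sorted(os.sched_getaffinity(0))

if __name__ == "__main__":
    # Use up to four cores from the set we are actually allowed to run on.
    cores = sorted(os.sched_getaffinity(0))[:4]
    with mp.Pool(len(cores)) as pool:
        print(pool.map(pin_to_core, cores))  # e.g. [[0], [1], [2], [3]]
```

Note that `os.sched_setaffinity` raises `OSError` if asked for a CPU outside the process's allowed set, which is why the candidate list is derived from `os.sched_getaffinity(0)`.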