为什么在 Linux 上导入 numpy 后 multiprocessing 只使用一个核心?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/15639779/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

时间:2020-08-06 22:33:59  来源:igfitidea

Why does multiprocessing use only a single core after I import numpy?

python linux numpy multiprocessing blas

提问by ali_m

I am not sure whether this counts more as an OS issue, but I thought I would ask here in case anyone has some insight from the Python end of things.

我不确定这是否更像是一个操作系统问题,但我想我会在这里问,以防有人对 Python 的事情有一些了解。

I've been trying to parallelise a CPU-heavy forloop using joblib, but I find that instead of each worker process being assigned to a different core, I end up with all of them being assigned to the same core and no performance gain.

我一直在尝试使用 joblib 并行化一个 CPU 密集型的 for 循环,但我发现并不是每个工作进程被分配到不同的内核,而是所有进程都被分配到了同一个内核,没有任何性能提升。

Here's a very trivial example...

这是一个非常简单的例子......

from joblib import Parallel, delayed
import numpy as np

def testfunc(data):
    # some very boneheaded CPU work
    for nn in range(1000):
        for ii in data[0, :]:
            for jj in data[1, :]:
                ii * jj

def run(niter=10):
    data = (np.random.randn(2, 100) for ii in range(niter))
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')
    results = pool(delayed(testfunc)(dd) for dd in data)

if __name__ == '__main__':
    run()

...and here's what I see in htop while this script is running:

……这是此脚本运行时我在 htop 中看到的内容:

[screenshot: htop]

I'm running Ubuntu 12.10 (3.5.0-26) on a laptop with 4 cores. Clearly joblib.Parallel is spawning separate processes for the different workers, but is there any way that I can make these processes execute on different cores?

我在一台 4 核的笔记本电脑上运行 Ubuntu 12.10 (3.5.0-26)。显然 joblib.Parallel 为不同的 worker 生成了单独的进程,但有什么办法可以让这些进程在不同的内核上执行?

采纳答案by ali_m

After some more googling I found the answer here.

经过更多的谷歌搜索,我在这里找到了答案。

It turns out that certain Python modules (numpy, scipy, tables, pandas, skimage...) mess with core affinity on import. As far as I can tell, this problem seems to be specifically caused by them linking against multithreaded OpenBLAS libraries.

事实证明,某些 Python 模块(numpy、scipy、tables、pandas、skimage……)会在导入时改变 CPU 核心亲和性(core affinity)。据我所知,这个问题似乎是由它们链接到多线程 OpenBLAS 库引起的。
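On Linux this is easy to observe directly. A minimal check (assuming a numpy build linked against an affinity-resetting OpenBLAS; on unaffected builds the two sets will simply be identical):

```python
import os

# CPU set this process may run on, before any BLAS-linked import (Linux-only API)
before = os.sched_getaffinity(0)

import numpy as np  # an affected OpenBLAS-linked build may shrink the set here

after = os.sched_getaffinity(0)
print(before, after)
```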

A workaround is to reset the task affinity using

一种解决方法是使用以下命令重置任务的 CPU 亲和性:

import os
os.system("taskset -p 0xff %d" % os.getpid())

With this line pasted in after the module imports, my example now runs on all cores:

将这一行粘贴到模块导入之后,我的示例现在可以在所有内核上运行:
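For context, here is a fuller sketch of the workaround. The 0xff mask allows CPUs 0-7; the kernel accepts mask bits for CPUs that don't exist as long as at least one bit matches a real CPU, so it is safe on a 4-core machine too. taskset is Linux-specific:

```python
import os

# (BLAS-linked imports such as numpy/scipy would go here)

# Reset this process's affinity to CPUs 0-7 via the taskset utility
os.system("taskset -p 0xff %d" % os.getpid())

# Verify: the process should again be allowed on every available core
print(os.sched_getaffinity(0))
```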

htop_workaround

htop_workaround

My experience so far has been that this doesn't seem to have any negative effect on numpy's performance, although this is probably machine- and task-specific.

到目前为止,我的经验是这似乎对 numpy 的性能没有任何负面影响,尽管这可能因机器和任务而异。

Update:

更新:

There are also two ways to disable the CPU affinity-resetting behaviour of OpenBLAS itself. At run-time you can use the environment variable OPENBLAS_MAIN_FREE (or GOTOBLAS_MAIN_FREE), for example

还有两种方法可以禁用 OpenBLAS 本身的 CPU 亲和性重置行为。在运行时,您可以使用环境变量 OPENBLAS_MAIN_FREE(或 GOTOBLAS_MAIN_FREE),例如

OPENBLAS_MAIN_FREE=1 python myscript.py
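The same can be arranged from inside the script itself, provided the variable is set before the first import of numpy. A sketch (the variable only has an effect with an OpenBLAS-linked build):

```python
import os

# Must be in the environment before OpenBLAS is loaded,
# i.e. before the first `import numpy`
os.environ["OPENBLAS_MAIN_FREE"] = "1"

import numpy as np  # OpenBLAS now leaves the CPU affinity alone
```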

Or alternatively, if you're compiling OpenBLAS from source you can permanently disable it at build-time by editing Makefile.rule to contain the line

或者,如果您从源代码编译 OpenBLAS,您可以在构建时编辑 Makefile.rule,使其包含下面这一行,从而永久禁用该行为:

NO_AFFINITY=1

回答by WoJ

Python 3 now exposes the methods to directly set the affinity:

Python 3 现在提供了直接设置 CPU 亲和性的方法:

>>> import os
>>> os.sched_getaffinity(0)
{0, 1, 2, 3}
>>> os.sched_setaffinity(0, {1, 3})
>>> os.sched_getaffinity(0)
{1, 3}
>>> x = {i for i in range(10)}
>>> x
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> os.sched_setaffinity(0, x)
>>> os.sched_getaffinity(0)
{0, 1, 2, 3}