Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/11889118/

Time: 2020-08-06 14:02:59  Source: igfitidea

Get GNU Octave to work with a multicore processor. (Multithreading)

Tags: linux, multithreading, multicore, octave

Asked by Eric Leschinski

I want to be able to program multiple threads with GNU Octave so that it will utilize multiple processors.

I installed GNU Octave on Fedora 17 Linux and did the following:

yum install octave

This installed the latest version of Octave, 3.6.2, on my computer. It works great, but multiplying two huge matrices bogs down the single CPU that Octave uses. It would be nice if the matrix multiplication used all of the cores, since in this case the CPU is obviously the bottleneck.

Can Octave fully utilize multi-core processors and run on multiple threads? Is there a library or compile-time flag for this?

Accepted answer by Eric Leschinski

Solution

Octave itself is a single-threaded application that runs on one core. You can get Octave to use libraries such as ATLAS that utilize multiple cores. So while Octave itself only uses one core, when it encounters a heavy operation it calls functions in ATLAS that utilize many CPUs.

I was able to do this. First, compile ATLAS from source and make it available on your system so that Octave can find it and use its library functions. ATLAS tunes itself to your system and number of cores. When you build Octave from source and specify ATLAS, Octave uses it, so when Octave performs a heavy operation such as a huge matrix multiplication, ATLAS decides how many CPUs to use.

I was unable to get this to work on Fedora, but on Gentoo I could.

I used these two links: ftp://ftp.gnu.org/gnu/octave/

http://math-atlas.sourceforge.net/

I ran the following Octave code before and after installing ATLAS:

tic
bigMatrixA = rand(3000000,80);
bigMatrixB = rand(80,30);
bigMatrixC = bigMatrixA * bigMatrixB;
toc
disp("done");

The matrix multiplication runs much faster using multiple processors. By the timings below, it was roughly 6 times faster than on a single core:

Without Atlas: Elapsed time is 3.22819 seconds.
With Atlas:    Elapsed time is 0.529 seconds.
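
A single tic/toc measurement can be noisy, so a comparison like the one above is easier to trust when the multiplication is repeated a few times. The following is a minimal sketch of that idea, not the script used for the timings above; the matrix sizes and the repeat count are assumptions chosen so that it finishes quickly.

```octave
% Minimal timing sketch. The sizes and the repeat count are illustrative
% assumptions, smaller than the benchmark above so that it finishes quickly.
num_rows = 100000;   % assumed size, not from the original post
inner = 80;
num_cols = 30;
repeats = 5;         % assumed number of repetitions

A = rand(num_rows, inner);
B = rand(inner, num_cols);

elapsed = zeros(1, repeats);
for k = 1:repeats
  t0 = tic;
  C = A * B;               % handled by whichever BLAS Octave is linked against
  elapsed(k) = toc(t0);
end

printf("cores reported by nproc: %d\n", nproc());
printf("best of %d runs:   %.4f s\n", repeats, min(elapsed));
printf("median of %d runs: %.4f s\n", repeats, median(elapsed));
```

Running the same script before and after switching BLAS libraries, and comparing the best or median time, gives a steadier picture than one run.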

The three libraries I am using that speed things up are blas-atlas, cblas-atlas, and lapack-atlas.

If Octave can use these instead of the default blas and lapack libraries, then it will utilize multiple cores.
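
To see which libraries a given Octave build actually picked up, it may help to ask Octave itself. This is a small sketch that assumes a recent Octave release in which `version` accepts the `-blas` and `-lapack` options; the 3.6.2 release discussed here may not support them.

```octave
% Report which BLAS and LAPACK the running Octave is linked against.
% Assumption: a recent Octave release where version() accepts these options.
blas_info = version("-blas");
lapack_info = version("-lapack");
printf("BLAS:   %s\n", blas_info);
printf("LAPACK: %s\n", lapack_info);
```

If the reported BLAS is still the reference implementation, the multi-threaded library was probably not found at build or load time.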

Getting Octave to compile from source with ATLAS is not easy and takes some programming skill.

Drawbacks to using ATLAS:

ATLAS incurs a lot of overhead when it splits your Octave program's work across multiple threads. It certainly goes much faster if all you are doing is huge matrix multiplications, but most commands can't be multi-threaded by ATLAS. If extracting every bit of processing power and speed from your cores is the top priority, you'll have much better luck writing your program to run in parallel with itself: split it into 8 equivalent programs that each work on 1/8th of the problem, run them all simultaneously, and reassemble the results when all are done.
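
The split-and-reassemble idea can be sketched as follows, assuming the work is a matrix product whose rows are independent. The matrix sizes are illustrative assumptions, and the chunks run one after another here purely to show the bookkeeping; in practice each chunk would run in its own Octave process.

```octave
% Sketch of the "split into 8 pieces" idea for a row-independent product.
% The chunks run sequentially here; each would be a separate process in practice.
num_chunks = 8;                 % matches the 8 programs in the text
A = rand(8000, 80);             % illustrative sizes (assumption)
B = rand(80, 30);

% Row boundaries for num_chunks nearly equal slices of A.
edges = round(linspace(0, size(A, 1), num_chunks + 1));

partial = cell(num_chunks, 1);
for k = 1:num_chunks
  idx = (edges(k) + 1):edges(k + 1);   % rows owned by chunk k
  partial{k} = A(idx, :) * B;          % 1/8th of the problem
end

% Reassemble the partial results in their original order.
C = vertcat(partial{:});

% Compare against the unsplit computation.
reference = A * B;
max_diff = max(abs(C(:) - reference(:)));
printf("max difference from unsplit result: %g\n", max_diff);
```

This only works this cleanly because no chunk needs another chunk's output; problems with dependencies between pieces need more coordination.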

ATLAS helps a single-threaded Octave program behave a little more like a multi-threaded application, but it is no silver bullet. ATLAS won't make your single-threaded Octave program max out your 2-, 4-, 6-, or 8-core processor. You'll notice a performance boost, but it will leave you searching for a better way to use all of the processor. The answer is writing your program to run in parallel with itself, and this takes a lot of programming skill.

Suggestion

Put your energy into vectorizing your heaviest operations and distributing the work over n simultaneously running threads. If you are waiting too long for a process to run, the lowest-hanging fruit for speeding it up is most likely a more efficient algorithm or data structure.
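
As a small illustration of what vectorizing means in Octave, the sketch below computes a row-wise sum of squares twice, once with an explicit loop and once with a single call over the whole matrix. The computation and the matrix size are made-up examples, not taken from the original post.

```octave
% Loop versus vectorized form of the same computation (illustrative example):
% the sum of squares of each row of X.
X = rand(20000, 50);
num_rows = size(X, 1);

% Loop version: one interpreted iteration per row.
tic;
loop_result = zeros(num_rows, 1);
for i = 1:num_rows
  loop_result(i) = sum(X(i, :) .^ 2);
end
loop_time = toc;

% Vectorized version: a single call over the whole matrix.
tic;
vec_result = sum(X .^ 2, 2);
vec_time = toc;

max_diff = max(abs(loop_result - vec_result));
printf("loop: %.4f s, vectorized: %.4f s\n", loop_time, vec_time);
printf("max difference: %g\n", max_diff);
```

The vectorized form usually wins because the per-row work happens inside compiled library code rather than in the interpreter.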

Answer by Twonky

On Octave-Forge there are two packages dealing with parallel computing:

It is also possible to spawn subprocesses using the fork() function.

Answer by fala

As suggested by Eric, I tried using ATLAS, and it improved my performance 3x (in a neural network learning application where the main cost is matrix multiplication). Surprisingly, it still seemed to use only one core. After further research I stumbled upon OpenBLAS, which used multiple cores out of the box and improved performance by a further 2x (I had only 2 cores, though). If you want to squeeze out more, you can also try MKL, but it takes a lot of disk space because of its dependencies.

I was using Arch Linux with the packages community/atlas-lapack-base and aur/openblas-lapack. Installing either of them switched the default library used by Octave.

Here is a nice benchmark comparing those libraries: http://www.tcm.phy.cam.ac.uk/~mjr/linpack/
