Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/14674463/

Date: 2020-08-06 18:54:33  Source: igfitidea

Why doesn't perf report cache misses?

Tags: linux, caching, optimization, profiling, perf

Asked by static_rtti

According to perf tutorials, perf stat is supposed to report cache misses using hardware counters. However, on my system (up-to-date Arch Linux), it doesn't:

[joel@panda goog]$ perf stat ./hash

 Performance counter stats for './hash':

    869.447863 task-clock                #    0.997 CPUs utilized          
            92 context-switches          #    0.106 K/sec                  
             4 cpu-migrations            #    0.005 K/sec                  
         1,041 page-faults               #    0.001 M/sec                  
 2,628,646,296 cycles                    #    3.023 GHz                    
   819,269,992 stalled-cycles-frontend   #   31.17% frontend cycles idle   
   132,355,435 stalled-cycles-backend    #    5.04% backend  cycles idle   
 4,515,152,198 instructions              #    1.72  insns per cycle        
                                         #    0.18  stalled cycles per insn
 1,060,739,808 branches                  # 1220.015 M/sec                  
     2,653,157 branch-misses             #    0.25% of all branches        

   0.871766141 seconds time elapsed

What am I missing? I already searched the man page and the web, but didn't find anything obvious.

Edit: my CPU is an Intel i5 2300K, if that matters.

Accepted answer by amdn

On my system, an Intel Xeon X5570 @ 2.93 GHz, I was able to get perf stat to report cache references and misses by requesting those events explicitly, like this:

perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations sleep 5
Performance counter stats for 'sleep 5':

         10573 cache-references                                            
          1949 cache-misses              #   18.434 % of all cache refs    
       1077328 cycles                    #    0.000 GHz                    
        715248 instructions              #    0.66  insns per cycle        
        151188 branches                                                    
           154 faults                                                      
             0 migrations                                                  

   5.002776842 seconds time elapsed
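
The percentage perf prints next to cache-misses is simply misses divided by references. As a minimal sketch (not part of the original answer), the Python below recomputes it from the counts shown above; the helper names parse_counts and miss_ratio are made up for illustration, and the parsing assumes the plain "COUNT EVENT" line layout seen in this output.

```python
import re

# Counts copied from the 'sleep 5' output in the answer above.
PERF_OUTPUT = """
         10573 cache-references
          1949 cache-misses              #   18.434 % of all cache refs
       1077328 cycles                    #    0.000 GHz
"""


def parse_counts(text):
    """Map event name -> integer count for each 'COUNT EVENT' line (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\d,]+)\s+([A-Za-z][\w-]*)", line)
        if match:
            # perf may print thousands separators, so strip commas first.
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


def miss_ratio(counts):
    """Cache misses as a percentage of cache references."""
    return 100.0 * counts["cache-misses"] / counts["cache-references"]


counts = parse_counts(PERF_OUTPUT)
print(f"{miss_ratio(counts):.3f} % of all cache refs")  # prints "18.434 % of all cache refs"
```

The result agrees with the figure perf itself reported for this run.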

The default set of events did not include cache events, which matches your results. I don't know why:

perf stat -B sleep 5

Performance counter stats for 'sleep 5':

      0.344308 task-clock                #    0.000 CPUs utilized          
             1 context-switches          #    0.003 M/sec                  
             0 CPU-migrations            #    0.000 M/sec                  
           154 page-faults               #    0.447 M/sec                  
        977183 cycles                    #    2.838 GHz                    
        586878 stalled-cycles-frontend   #   60.06% frontend cycles idle   
        430497 stalled-cycles-backend    #   44.05% backend  cycles idle   
        720815 instructions              #    0.74  insns per cycle        
                                         #    0.81  stalled cycles per insn
        152217 branches                  #  442.095 M/sec                  
          7646 branch-misses             #    5.02% of all branches        

   5.002763199 seconds time elapsed

Answer by SamGamgee

I spent a few minutes trying to understand perf. I found the cache misses by first recording and then reporting the data (both steps use perf tools).

To see a list of events:

perf list

For example, in order to check the last-level-cache load misses, you will need to use the event LLC-loads-misses, like this:

perf record -e LLC-loads-misses ./your_program

Then report the results:

perf report -v

Answer by acgtyrant

In the latest source code, the default event set again does not include cache-misses and cache-references:

struct perf_event_attr default_attrs[] = {

  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK      },
  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES    },
  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS      },
  { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS     },

  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES      },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_BACKEND  },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS        },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES       },

};

So the man page and most web resources are out of date so far.
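
Since the default set leaves the cache events out, they have to be named with -e, as the accepted answer does. The following is only a sketch of building such a command line from a script; the function name perf_stat_command is hypothetical, and the event names are the ones used earlier in this thread (check perf list for what your CPU actually supports).

```python
import shlex

# Event names taken from the accepted answer's command line.
CACHE_EVENTS = ("cache-references", "cache-misses")


def perf_stat_command(program_args, events=CACHE_EVENTS):
    """Build a 'perf stat' argument list that requests events explicitly (hypothetical helper)."""
    # '--' separates perf's own options from the profiled command.
    return ["perf", "stat", "-e", ",".join(events), "--", *program_args]


cmd = perf_stat_command(["./hash"])
print(shlex.join(cmd))  # prints "perf stat -e cache-references,cache-misses -- ./hash"
```

The resulting list can be passed to subprocess.run on a machine where perf is installed and the counters are accessible.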