llvm-mca – LLVM Machine Code Analyzer
llvm-mca takes assembly code as its input, so it fits naturally at the end of a compiler pipeline. For example, you can compile code with clang, emit assembly, and pipe it directly into llvm-mca for analysis.
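The exact command line is not given here. The following is a minimal sketch of that pipeline driven from Python with subprocess, assuming clang and llvm-mca are both on PATH. The helper names mca_pipeline and run_mca are hypothetical, and the x86_64-unknown-unknown triple and btver2 CPU are illustrative choices, not requirements.

```python
import subprocess


def mca_pipeline(source, target="x86_64-unknown-unknown", mcpu="btver2"):
    """Build the argv lists for `clang ... -S -o - | llvm-mca ...` (hypothetical helper)."""
    # -S emits assembly instead of an object file; "-o -" writes it to stdout
    clang_cmd = ["clang", source, "-O2", f"--target={target}", "-S", "-o", "-"]
    # With no input file, llvm-mca reads assembly from stdin; the triple and
    # CPU select the scheduling model to simulate (illustrative defaults)
    mca_cmd = ["llvm-mca", f"-mtriple={target}", f"-mcpu={mcpu}"]
    return clang_cmd, mca_cmd


def run_mca(source, **kwargs):
    """Compile `source` with clang and return llvm-mca's text report."""
    clang_cmd, mca_cmd = mca_pipeline(source, **kwargs)
    # Step 1: capture clang's assembly output as text
    asm = subprocess.run(clang_cmd, check=True, capture_output=True, text=True).stdout
    # Step 2: feed that assembly to llvm-mca on stdin and return its report
    report = subprocess.run(mca_cmd, input=asm, check=True, capture_output=True, text=True)
    return report.stdout
```

Calling run_mca("foo.c") is roughly equivalent to the shell pipeline clang foo.c -O2 --target=x86_64-unknown-unknown -S -o - | llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2. Passing -mtriple explicitly is meant to keep llvm-mca from assuming the host triple when it differs from the compile target.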
Given an assembly code sequence, llvm-mca estimates the Instructions Per Cycle (IPC) as well as hardware resource pressure. A delta between the Dispatch Width and the theoretical maximum uOps per Cycle (computed by dividing the number of uOps in a single iteration by the Block RThroughput) indicates a performance bottleneck caused by a lack of hardware resources.
The resource pressure view reports, for every processor resource unit available on the target, the average number of resource cycles consumed per iteration by the instructions.
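The arithmetic behind these metrics can be sketched in a few lines. The Python functions below are hypothetical helpers, not part of llvm-mca: they compute the theoretical maximum uOps per cycle, its delta from the Dispatch Width, and the per-iteration resource pressure from total cycle counts. The resource names and numbers in the usage note are made up for illustration.

```python
def theoretical_max_uops_per_cycle(uops_per_iteration, block_rthroughput):
    """uOps of a single iteration divided by the Block RThroughput."""
    if block_rthroughput <= 0:
        raise ValueError("Block RThroughput must be positive")
    return uops_per_iteration / block_rthroughput


def dispatch_gap(dispatch_width, uops_per_iteration, block_rthroughput):
    """Delta between Dispatch Width and the theoretical maximum uOps per cycle.

    A positive delta suggests the block cannot keep dispatch saturated,
    pointing to a bottleneck in hardware resources.
    """
    return dispatch_width - theoretical_max_uops_per_cycle(
        uops_per_iteration, block_rthroughput
    )


def resource_pressure(total_cycles_by_resource, iterations):
    """Average resource cycles consumed per iteration, per resource unit."""
    return {
        resource: cycles / iterations
        for resource, cycles in total_cycles_by_resource.items()
    }
```

For instance, with a hypothetical Dispatch Width of 2, 6 uOps per iteration and a Block RThroughput of 4.0, the theoretical maximum is 6 / 4.0 = 1.5 uOps per cycle, a delta of 0.5. Likewise, 300 cycles spent on a hypothetical resource unit P0 over 100 iterations averages 3.0 resource cycles per iteration.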
Source: llvm.org