Why is Cantabile 30x slower than Forte? (TL;DR: it’s not)

The other day I received an email from someone saying that they’d decided to go with Brainspawn Forte instead of Cantabile because Forte’s CPU load was so much lower. He included the following two screenshots that show both products running with 10 instances of Omnisphere with Forte’s CPU load at 1% and Cantabile’s at nearly 35%.

Forte at 1%:

Cantabile at 34.5%:

The short answer: they’re measuring completely different things.

Read on for the longer answer…

Levelling the Playing Field

Before diving into this too deeply the first thing I did was to level the playing field a little:

Cantabile creates audio ports for all audio channels supported by the plugin. For Omnisphere this means 8 stereo output ports on each instance. To bring this back to a single stereo output I removed the 7 additional ports on each instance and this dropped the load by a little — not much.
By default Cantabile’s multi-processor support runs in “Compatible” mode. In this mode instances of the same plugin are processed in serial and not across multiple CPU cores. With 10x instances of the one plugin this is an obvious disadvantage. Switching to “Aggressive” mode also dropped the load.

Even after these changes, though, the differences were still quite dramatic (roughly 1% vs 20%).

How Cantabile Measures Load

Before continuing it’s important to understand how Cantabile measures load. It’s pretty simple — it’s the amount of time it takes to process one audio cycle expressed as a fraction of the duration of the audio buffer.

Load = TimeTakenToProcess / AudioBufferDuration

With a buffer size of 128 samples at 44,100Hz the duration of the audio buffer in milliseconds is:

AudioBufferDuration = 128 / 44100 * 1000 = 2.9ms

Now suppose it takes 1ms to process one audio cycle (plugin processing, audio mixing, MIDI processing etc…) then the load displayed by Cantabile will be:

Load = 1.0 / 2.9 = 0.345 (or 34.5%)

(It’s only by sheer coincidence that this example matches the figure in the screenshot above.)
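The arithmetic above can be reproduced in a few lines. This is only a sketch of the formula as described here, not Cantabile’s actual code; the function names are made up for illustration.

```python
def buffer_duration_ms(buffer_samples: int, sample_rate_hz: float) -> float:
    """Duration of one audio buffer in milliseconds."""
    return buffer_samples / sample_rate_hz * 1000.0


def time_load(time_taken_ms: float, buffer_samples: int, sample_rate_hz: float) -> float:
    """Time taken to process one cycle as a fraction of the buffer duration."""
    return time_taken_ms / buffer_duration_ms(buffer_samples, sample_rate_hz)


# The example from the text: 128 samples at 44,100Hz, 1ms of processing.
duration = buffer_duration_ms(128, 44100)
load = time_load(1.0, 128, 44100)
print(f"buffer = {duration:.1f}ms, load = {load:.1%}")  # buffer = 2.9ms, load = 34.5%
```

Note that the load depends as much on the buffer size as on the processing time: halve the buffer and the same 1ms of work shows as roughly twice the load.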

Also, Cantabile is pessimistic with what it shows: it updates once per second and shows the highest load over the previous second. It could show the minimum or the average but that’d be cheating and would miss spikes.
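To see why showing the peak matters, here is a rough sketch of a peak-hold meter. The class name and the way the window is tracked are my own assumptions for illustration, not how Cantabile is actually implemented.

```python
class PeakLoadMeter:
    """Hypothetical meter: displays the highest load seen in each window."""

    def __init__(self, window_s: float = 1.0):
        self.window_s = window_s
        self.displayed = 0.0       # value shown to the user
        self._peak = 0.0           # highest load in the current window
        self._window_start = None  # time the current window began

    def add(self, now_s: float, load: float) -> float:
        if self._window_start is None:
            self._window_start = now_s
        self._peak = max(self._peak, load)
        # Once a full window has elapsed, publish its peak and start over.
        if now_s - self._window_start >= self.window_s:
            self.displayed = self._peak
            self._peak = 0.0
            self._window_start = now_s
        return self.displayed


# Simulate about 1.2 seconds of 2.9ms audio cycles with a single spike.
meter = PeakLoadMeter()
loads = [0.9 if i == 100 else 0.2 for i in range(400)]
for i, cycle_load in enumerate(loads):
    meter.add(i * 0.0029, cycle_load)

average = sum(loads) / len(loads)
print(f"displayed peak = {meter.displayed:.0%}, average = {average:.0%}")
```

An averaging meter would report about 20% here and hide the one cycle that came close to a dropout; the peak meter reports 90%.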

Time Load vs CPU Utilization

The figure Cantabile shows for load is what I call “Time Load”: it’s completely dependent on the time taken to process the audio cycle and the duration of the audio buffer.

Here’s the key point — this is only very loosely coupled to “CPU Utilization”.

If you’ve ever looked in Windows Task Manager you’ll notice it displays a CPU percentage. This is the CPU Utilization — the percentage of the time the CPU is busy doing real work vs time spent in an idle state waiting for things to do.

Consider this: when a program reads from disk it starts the read operation and then the CPU effectively goes to sleep until the disk has been read and the data returned. In this case time has passed but CPU Utilization has been essentially zero.

There are many situations like this where the processor runs out of things to do and goes idle — so many that most of the time most of the processor cores are idle.
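The difference between time passed and CPU time used is easy to demonstrate. The sketch below uses Python’s `time.perf_counter` (wall-clock time) and `time.process_time` (CPU time consumed by the process); the `measure`, `waiting` and `computing` helpers are just illustrative names, with a sleep standing in for a disk read.

```python
import time


def measure(work):
    """Return (wall-clock seconds, CPU seconds) spent running work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def waiting():
    # Stand-in for a blocking operation: time passes, the CPU is idle.
    time.sleep(0.2)


def computing():
    # Busy loop: time passes and the CPU is working the whole time.
    end = time.perf_counter() + 0.2
    while time.perf_counter() < end:
        pass


wall_wait, cpu_wait = measure(waiting)
wall_busy, cpu_busy = measure(computing)
print(f"waiting:   wall = {wall_wait:.3f}s, cpu = {cpu_wait:.3f}s")
print(f"computing: wall = {wall_busy:.3f}s, cpu = {cpu_busy:.3f}s")
```

Both calls take about the same amount of time, but only the second one uses any meaningful CPU. From the sound card’s point of view they are equally late.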

When it comes to realtime audio the important factor is time passed, not how busy the CPU is. With realtime audio, if the software doesn’t deliver the next buffer in time there’s nothing the sound card can do to fill in the blanks and you’ll get a glitch of some sort.

That’s why Cantabile displays time load and not CPU utilization.

Sure, a faster (or less busy) processor can help reduce that time load, but that’s just one of many factors determining how close you are to an audio dropout.

Experimenting with Forte

I couldn’t find any publicly documented information on exactly what Forte is measuring so I ran some experiments.

I wrote a proxy plugin to sit between Forte and Omnisphere and do the same timing measurement as Cantabile does. I disabled Forte’s multi-core support to prevent obscuring the numbers and ran up 10 instances of this proxied Omnisphere. The result was a 0.7% load on each instance. Given 10 instances, the time load is at least 7%. Forte was showing between 0 and 1%.
I wrote a plugin that deliberately sleeps for 2ms on each audio cycle. With a buffer length of 1.5ms (64 samples at 44,100Hz) a 2ms stall should produce a time load in excess of 100% even with just one instance. Forte with 10 instances showed 4% load. Cantabile showed well in excess of 1000% as expected. Also as expected both Forte and Cantabile were glitching heavily.
I fired up Forte with no plugins and started a CPU intensive operation in another program and Forte’s load meter went to 100%. Cantabile’s load meter went up but only by the amount that the other operation was impacting Cantabile’s audio processing. (In this case it went from 20% to about 40%).

That last point is the giveaway: Forte is most likely showing system-wide CPU utilization.
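As a sanity check on the sleep experiment, the expected time load can be worked out directly. This sketch assumes the instances are processed one after another on a single core (my assumption for the calculation); the function name is hypothetical.

```python
def serial_time_load(per_instance_ms: float, instances: int,
                     buffer_samples: int, sample_rate_hz: float) -> float:
    """Time load when every instance is processed in series (assumed)."""
    buffer_ms = buffer_samples / sample_rate_hz * 1000.0
    return per_instance_ms * instances / buffer_ms


# A 2ms stall per instance with a 64 sample buffer at 44,100Hz.
one_instance = serial_time_load(2.0, 1, 64, 44100)
ten_instances = serial_time_load(2.0, 10, 64, 44100)
print(f"1 instance:   {one_instance:.0%}")    # 138%
print(f"10 instances: {ten_instances:.0%}")   # 1378%
```

Anything over 100% means the buffer can’t be delivered in time, which matches the heavy glitching in both hosts. A reading of 4% makes no sense as a time load.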

What Does All This Mean?

OK, so what does all this mean? Not much really except that you can’t compare apples to oranges.

Cantabile and Forte are displaying completely different measurements and this is even reflected in the labelling — Forte labels it “CPU” while Cantabile labels it “Load” (in the monitor panel).

None of this is intended to dis Forte — I think it’s a fine product. In fact if someone at Brainspawn wants to write a clarification or response to any of the above I’ll happily post it here. The whole point is just to explain why two such similarly perceived numbers might be so radically different.

UPDATE: Response and confirmation from Brainspawn:

Why is Cantabile 30x slower than Forte? (TL;DR — it’s not)

Brad