[Stk] problems implementing polyphony...

Morgan Packard morgan at morganpackard.com
Wed Sep 19 07:35:16 PDT 2012


I'd love to see a performance-oriented, pure C++ synthesis library come to
prominence. STK gets partway there, with the inclusion of the vectorized
tick method, but in order to be performant and highly modular, parameter
setting needs to be vectorized as well. I've taken steps toward this goal
with my EZPlug wrapper for STK. In my own code, I use a sine oscillator
which can take an stk::Generator as a frequency input.

https://github.com/morganpackard/EZPlug/blob/master/EZPlug/EZPlugGenerators/SineWaveMod.h

This way, I can work at a bit of a higher level, more "patching style", but
without having the added layer of complexity and obfuscation that would
come with using PD. Creating an efficient FM synth becomes a matter of just
patching together the right sine waves and helper generators (Multiplier,
Adder, FixedValue).
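
To make the idea concrete, here is a minimal sketch of block-based
generators with a generator-driven frequency input. Everything in it
(BlockGenerator, SineMod, ScaleOffset, and this version of FixedValue) is
a hypothetical name invented for illustration; it is not the STK or EZPlug
API, and the real SineWaveMod.h may be organised quite differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical block-based interface, NOT the STK or EZPlug API.
struct BlockGenerator {
    virtual ~BlockGenerator() = default;
    // Fill `out` with out.size() samples: one virtual call per block.
    virtual void tick(std::vector<double>& out) = 0;
};

// Emits the same value for every sample (e.g. a fixed frequency in Hz).
struct FixedValue : BlockGenerator {
    explicit FixedValue(double v) : value(v) {}
    void tick(std::vector<double>& out) override {
        for (double& s : out) s = value;
    }
    double value;
};

// out = in * gain + offset, e.g. to map a [-1, 1] modulator onto Hz.
class ScaleOffset : public BlockGenerator {
public:
    ScaleOffset(BlockGenerator& in, double gain, double offset)
        : in_(in), gain_(gain), offset_(offset) {}
    void tick(std::vector<double>& out) override {
        in_.tick(out);
        for (double& s : out) s = s * gain_ + offset_;
    }
private:
    BlockGenerator& in_;
    double gain_;
    double offset_;
};

// Sine oscillator whose frequency is read per sample from another generator.
class SineMod : public BlockGenerator {
public:
    SineMod(BlockGenerator& freq, double sampleRate)
        : freq_(freq), sampleRate_(sampleRate) {
        assert(sampleRate > 0.0);
    }
    void tick(std::vector<double>& out) override {
        freqBuf_.resize(out.size());
        freq_.tick(freqBuf_);  // the frequency arrives as a whole block
        const double twoPi = 6.283185307179586;
        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] = std::sin(twoPi * phase_);
            phase_ += freqBuf_[i] / sampleRate_;  // phase measured in cycles
            phase_ -= std::floor(phase_);         // wrap into [0, 1)
        }
    }
private:
    BlockGenerator& freq_;
    double sampleRate_;
    double phase_ = 0.0;
    std::vector<double> freqBuf_;
};
```

With these pieces, a simple FM patch would be a SineMod carrier whose
frequency input is a ScaleOffset wrapped around a second SineMod, which in
turn reads its frequency from a FixedValue. The per-sample work stays inside
plain loops, and the virtual call cost is paid once per block rather than
once per sample.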

-Morgan

On Tue, Sep 18, 2012 at 6:37 PM, Stephen Sinclair
<sinclair at music.mcgill.ca> wrote:

> This is difficult.  I have been playing with the code and gcc options
> and gprof, and it seems there is no specific bottleneck.  HevyMetl is
> about 15% the speed of Clarinet.  I managed to get it down to about
> the same speed, but I had to do several things:
>
> - moved several functions into their headers for inlining
> (FileLoop::setRate, FileWvIn::tick, etc.. anything that is referenced
> from HevyMetl::tick.)
>
> - set some gcc options to force inlining of as much as possible,
> e.g. -Winline --param inline-unit-growth=65536 -finline-limit=65536
>
> - used link-time optimisation available in gcc 4.5 and up  (-flto on all
> code)
>
> - set -ffast-math
>
> Even then it is not quite as fast.  I found fmod used in FileLoop was
> a bit of a bottleneck.
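
On the fmod point: a rough sketch of one common alternative, assuming the
read position never moves by more than one loop length per step. The names
wrapFmod and wrapStep are made up for this example and are not taken from
FileLoop; whether the conditional version is actually faster would need to
be measured on the target hardware.

```cpp
#include <cassert>
#include <cmath>

// Reference wrap into [0, length) using std::fmod; works for any finite pos.
inline double wrapFmod(double pos, double length) {
    assert(length > 0.0);
    double r = std::fmod(pos, length);
    if (r < 0.0) {
        r += length;
        if (r >= length) r = 0.0;  // guard against rounding up to length
    }
    return r;
}

// Wrap with one compare and one add or subtract. Only valid when
// -length <= pos < 2 * length, i.e. the per-step increment is smaller in
// magnitude than the loop length (an assumption of this sketch).
inline double wrapStep(double pos, double length) {
    assert(length > 0.0 && pos >= -length && pos < 2.0 * length);
    if (pos >= length) {
        pos -= length;
    } else if (pos < 0.0) {
        pos += length;
        if (pos >= length) pos = 0.0;  // guard against rounding up to length
    }
    return pos;
}
```

For a playback rate well below the file length per sample, wrapStep gives
the same result as wrapFmod without the library call.
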
>
> In general I find it pretty surprising that gcc doesn't succeed in
> speeding this up more, but FileLoop seems to be a bit of a problem for
> reasons that aren't clear to me. I sprinkled the code with checks for
> denormals and came up empty.  I checked the assembler and used gprof
> and -Winline to make sure inlining was working as expected.
>
> Oh, I should mention this was on my fast desktop computer, not an ARM
> tablet, so proper profiling on the target hardware may be warranted.
>
> In any case, Morgan's right in that the vectorised versions are
> probably better to use "in production," since the whole "inline" thing
> in C/C++ is not supposed to be fully relied on for efficiency.  (e.g.
> the compiler might choose not to inline due to code size rather than
> speed.)  The per-sample tick functions are however important for
> certain algorithms, and generally useful as a teaching tool, so their
> presence in STK is desirable.  That said, usually a vectorised
> approach is preferred in application code.
>
> Just a couple of notes...
>
>
> On Tue, Sep 18, 2012 at 11:41 AM, Morgan Packard
> <morgan at morganpackard.com> wrote:
> > I did some of my own experimentation, which seems to point to method
> > calls themselves (even with all of the calculation inside them
> > commented out) being responsible for much of the cpu use of HevyMetl.
> >
> > I've been using STK all along with the assumption that all calls to
> > tick() without frames, or any other per-sample function call was going
> > to be significantly less efficient than operating on buffers. I'm aware
> > of the existence of inlining, but not savvy enough to understand if
> > it's happening or not, and under what conditions it can happen. It
> > seems suspicious to me to think that inlining could happen on pointers.
> > I mean, if you have a pointer to an stk::Generator, and you call tick()
> > on it, I don't see how the compiler could know ahead of time which
> > subclass of Generator it should be inlining.
>
> That's true, but most STK code uses the final child class, not the
> superclass, so the compiler should have all the information it needs.
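
A small sketch of the distinction, for my own benefit as much as anyone's.
Generator and Ramp here are hypothetical stand-ins, not the STK classes, and
whether a given compiler really inlines the second form depends on its
version and flags, so this shows only what the compiler is allowed to know
at each call site.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical base class, not stk::Generator.
struct Generator {
    virtual ~Generator() = default;
    virtual double tick() = 0;
};

// `final` tells the compiler no subclass can override tick() any further.
struct Ramp final : Generator {
    double value = 0.0;
    double tick() override { return value += 1.0; }
};

// Only the base type is visible here: each tick() normally goes through
// the vtable unless the compiler can prove the dynamic type.
inline double sumViaBase(Generator& g, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += g.tick();
    return sum;
}

// The concrete type G is a template parameter: when G is a final class the
// call can be resolved statically and becomes a candidate for inlining.
template <typename G>
inline double sumViaConcrete(G& g, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += g.tick();
    return sum;
}
```

Both functions compute the same result; the difference is only in how much
type information is available where tick() is called.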
>
> > I'd love to find out that all my meticulous buffer passing in order to
> > get reasonably performant code is unnecessary, but until I understand
> > otherwise, or better, I'm working with the assumption that method calls
> > are expensive and best to minimize. Another thing I like about
> > calculating samples in batches/buffers/stkframes is it allows me to use
> > Apple's accelerate framework, which offers some very nice performance
> > boosts.
> >
> > However, I certainly trust Gary's assertion that this HevyMetl ran just
> > fine on 90's machines, and I'm very curious about what has changed. Has
> > the code itself changed, breaking inlining? Is there something about
> > method calls on the Apple hardware that makes them much more expensive
> > than on Gary's 90's hardware?
>
> I think one experiment would be to compare run-times of
> HevyMetl::tick() for previous versions of STK.  My meticulous building
> of a git archive of the previous tarballs might finally pay off!
>
> https://github.com/radarsat1/stk/commits/upstream
>
> If a really old version does turn out to be faster, I was thinking
> maybe a well-crafted "git bisect" command might help get to the bottom
> of this.
>
> Steve
>
> _______________________________________________
> Stk mailing list
> Stk at ccrma.stanford.edu
> http://ccrma-mail.stanford.edu/mailman/listinfo/stk
>



-- 
===============
Morgan Packard
cell: (720) 891-0122
aim: mpackardatwork
twitter: @morganpackard