buf2nmf
buf2nmf(
@buffer ## llll (required)
@bases null
@activations null
@numcomponents 2
@iterations 100
@basesmode 0
@actmode 0
@resynth 1
@winsize 1024
@hopsize -1
@fftsize -1
@maxfftsize -1
@useseed 0
) -> llll
Decomposes a buffer into a number of spectral components using Non-Negative Matrix Factorization (NMF).
The algorithm takes a buffer in and divides it into a number of components, determined by the @numcomponents argument.
It works iteratively, by trying to find a combination of spectral templates (i.e., bases) and envelopes (i.e., activations) that yield the original magnitude spectrogram when added together.
By and large, there is no unique answer to this question (i.e. there are different ways of accounting for an evolving spectrum in terms of some set of templates and envelopes).
In its basic form, NMF is a form of unsupervised learning: it starts with some random data and then converges towards something that minimises the distance between its generated data and the original.
It tends to converge very quickly at first and then level out. Fewer iterations mean less processing, but also less predictable results.
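The iterative process described above can be sketched in plain Python. This is a minimal illustration, not the code behind buf2nmf: it assumes the classic multiplicative update rules for a Euclidean distance, and the names nmf, matmul and transpose are hypothetical helpers. The matrix v stands in for a magnitude spectrogram (bins x frames).

```python
import random

def matmul(a, b):
    """Matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, num_components=2, iterations=100, seed=0, eps=1e-9):
    """Factorise a non-negative matrix v into bases w and activations h.

    Assumed update rule: multiplicative updates for squared error.
    Returns w (bins x components), h (components x frames) and the
    squared error measured after each iteration.
    """
    rng = random.Random(seed)
    bins, frames = len(v), len(v[0])
    # Start from random, strictly positive bases and activations.
    w = [[rng.random() + eps for _ in range(num_components)] for _ in range(bins)]
    h = [[rng.random() + eps for _ in range(frames)] for _ in range(num_components)]
    errors = []
    for _ in range(iterations):
        # Activations: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][t] * num[k][t] / (den[k][t] + eps) for t in range(frames)]
             for k in range(num_components)]
        # Bases: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[f][k] * num[f][k] / (den[f][k] + eps) for k in range(num_components)]
             for f in range(bins)]
        approx = matmul(w, h)
        errors.append(sum((v[f][t] - approx[f][t]) ** 2
                          for f in range(bins) for t in range(frames)))
    return w, h, errors

# Toy "spectrogram": 3 bins x 4 frames, built from two overlapping patterns.
toy = [[1.0, 0.0, 2.0, 0.0],
       [0.0, 3.0, 0.0, 1.0],
       [1.0, 3.0, 2.0, 1.0]]
toy_w, toy_h, toy_errors = nmf(toy, num_components=2, iterations=100, seed=1)
```

Inspecting toy_errors shows the behaviour described above: the error drops sharply in the first iterations and then flattens. A different seed generally leads to a different, equally valid factorization.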
The bases and activations can be used to make a kind of vocoder based on what NMF has 'learned' from the original data. Alternatively, taking the matrix product of a basis and an activation will yield a synthetic magnitude spectrogram of a component (which could be reconstructed, given some phase information from somewhere).
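As a small illustration of the matrix product mentioned above: for a single component, multiplying a basis (one magnitude per frequency bin) by an activation (one gain per frame) is an outer product. The function name and the values below are hypothetical.

```python
def component_spectrogram(basis, activation):
    """Outer product: the spectral template scaled by each frame's gain."""
    return [[b * a for a in activation] for b in basis]

basis = [0.0, 1.0, 0.5]             # magnitude per frequency bin
activation = [1.0, 0.5, 0.0, 2.0]   # gain per analysis frame
spec = component_spectrogram(basis, activation)  # 3 bins x 4 frames
```

Each row of spec is the activation scaled by that bin's magnitude. The result carries no phase, which is why turning it back into audio needs phase information from elsewhere.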
Some additional options and flexibility can be found through combinations of the @bases and @activations arguments. These can be passed buffers containing pre-formed spectra (or envelopes) that will be used as seeds for the decomposition, providing more guided results. When the corresponding mode (@basesmode or @actmode) is set to 1, the supplied buffers won't be updated, so they become templates to match against instead. Note that having both @basesmode and @actmode set to 1 doesn't make sense, since nothing would be left to estimate, so an error is raised.
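To illustrate what matching against a template means, the sketch below keeps a set of bases fixed and only estimates the activations. It is an assumption-laden illustration in plain Python (multiplicative updates for squared error, hypothetical function name fit_activations), not the routine that buf2nmf runs.

```python
def fit_activations(v, w, iterations=200, eps=1e-9):
    """Estimate activations h for fixed bases w so that w @ h approximates v.

    v: bins x frames magnitudes, w: bins x components templates.
    w is never modified; only h is updated.
    """
    bins, frames, comps = len(v), len(v[0]), len(w[0])
    h = [[1.0] * frames for _ in range(comps)]
    for _ in range(iterations):
        # Current reconstruction from the fixed templates.
        approx = [[sum(w[f][k] * h[k][t] for k in range(comps))
                   for t in range(frames)] for f in range(bins)]
        for k in range(comps):
            for t in range(frames):
                num = sum(w[f][k] * v[f][t] for f in range(bins))
                den = sum(w[f][k] * approx[f][t] for f in range(bins)) + eps
                h[k][t] *= num / den
    return h

templates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # 3 bins x 2 components
mixture = [[1.0, 0.5, 2.0], [0.5, 3.0, 1.0], [1.5, 3.5, 3.0]]
gains = fit_activations(mixture, templates)              # 2 components x 3 frames
```

Here mixture was built from the templates with known gains, so gains ends up close to those values. The same idea applies the other way round when the activations are fixed and the bases are estimated.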
In this implementation, the components are reconstructed by masking the original spectrum, such that they will sum to yield the original sound.
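One common way to mask a spectrum so that the parts sum back to the whole is a ratio mask: each component receives a share of every time-frequency cell proportional to its estimated magnitude there. The sketch below assumes that approach; the exact mask used by buf2nmf is not stated here, and mask_components is a hypothetical name.

```python
def mask_components(spectrum, estimates):
    """Split a complex spectrogram into components using ratio masks.

    spectrum: bins x frames complex values (the original signal).
    estimates: one bins x frames magnitude estimate per component,
               e.g. the product of its basis and activation.
    """
    bins, frames = len(spectrum), len(spectrum[0])
    count = len(estimates)
    components = [[[0j] * frames for _ in range(bins)] for _ in range(count)]
    for f in range(bins):
        for t in range(frames):
            total = sum(est[f][t] for est in estimates)
            for k, est in enumerate(estimates):
                # Share of this cell owed to component k; split evenly
                # when no component claims any energy.
                mask = est[f][t] / total if total > 0 else 1.0 / count
                components[k][f][t] = mask * spectrum[f][t]
    return components

original = [[1 + 2j, 0.5 - 1j], [0j, 3 + 0j]]
estimate_a = [[1.0, 0.0], [0.0, 2.0]]
estimate_b = [[3.0, 1.0], [0.0, 2.0]]
parts = mask_components(original, [estimate_a, estimate_b])
```

Because the masks in each cell sum to 1, adding the components cell by cell gives back the original spectrum, phase included.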
The whole process can be related to a channel vocoder where, instead of fixed bandpass filters, we get more complex filter shapes that are learned from the data, and the activations correspond to channel envelopes.
Arguments
@buffer [llll]: Buffer to decompose (required).
@bases [llll/null]: Optional decomposition bases (i.e., spectral envelopes), as buffers. (default: null).
@activations [llll/null]: Optional decomposition activations (i.e., temporal envelopes), as buffers. (default: null).
@numcomponents [int]: Number of components the NMF algorithm will try to divide the spectrogram of the source into. (default: 2).
@iterations [int]: Number of iterations the factorization runs to reduce the error and adjust the final bases and activations. Higher numbers are more CPU expensive; lower numbers give less predictable quality. (default: 100).
@basesmode [int]: Bases mode. Ignored if @bases is null. (default: 0).
    0: If buffers are provided as @bases, they are used as the initial seed for the bases. The resulting bases will be derived from the seed ones.
    1: If buffers are provided as @bases, they are used as templates for the bases, and will be returned unchanged.
@actmode [int]: Activations mode. Ignored if @activations is null. (default: 0).
    0: If @activations is a buffer, it is used as the seed for the activations. Its dimensions should match the activation dimensions described under Output. The resulting activations will replace the seed ones.
    1: If @activations is a buffer, it is used as a template for the activations, and will not be changed. Its dimensions should match the activation dimensions described under Output.
@resynth [int]: Resynthesis mode. When on, the output includes the 'components' buffers. (default: 1).
    0: Off
    1: On
@winsize [int]: Window size. (default: 1024).
@hopsize [int]: Hop size. -1 is equivalent to @winsize / 2. (default: -1).
@fftsize [int]: FFT size. -1 is equivalent to @winsize snapped to the nearest equal or greater power of 2 (e.g. @winsize 1024 => @fftsize 1024, but @winsize 1000 also => @fftsize 1024). (default: -1).
@maxfftsize [int]: Max. FFT size. -1 is equivalent to whatever the initial FFT size is. (default: -1).
@useseed [int]: Use random seed for parameter initialization. (default: 0).
    0: Off
    1: On
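The -1 defaults above can be worked through with a short Python sketch. The function name is hypothetical, and integer division for the hop size is an assumption (the text only says @winsize / 2).

```python
def resolve_fft_params(winsize=1024, hopsize=-1, fftsize=-1, maxfftsize=-1):
    """Replace -1 placeholders with the defaults described above."""
    if hopsize == -1:
        hopsize = winsize // 2  # assumed integer division
    if fftsize == -1:
        # Smallest power of 2 that is equal to or greater than winsize.
        fftsize = 1 << (winsize - 1).bit_length()
    if maxfftsize == -1:
        maxfftsize = fftsize
    return winsize, hopsize, fftsize, maxfftsize

defaults = resolve_fft_params()               # all defaults
odd_window = resolve_fft_params(winsize=1000)  # non power-of-2 window
```

With the default @winsize of 1024 this gives a hop of 512 and an FFT size of 1024; a @winsize of 1000 gives a hop of 500 and the same FFT size of 1024.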
Output
The NMF result, as an llll of the following form:
[
[ 'components' <buffers> ] ## NMF components as buffers (only present when @resynth is 1)
[ 'bases' <buffers> ] ## NMF bases as buffers
[ 'activations' <buffers> ] ## NMF activations as buffers
]
'components': a list of buffers, one for each component (only present when @resynth is 1).
'bases': a list of buffers representing the spectral contour of each component in the form of a magnitude spectrum (called a basis in NMF lingo). Each basis will be (@fftsize / 2) + 1 samples long and have @numcomponents * numchannels channels.
'activations': a list of buffers representing the amplitude envelope of each component in the form of gains for each consecutive frame of the underlying spectrogram (called an activation in NMF lingo). Each activation will have (numsamples / @hopsize) + 1 samples and @numcomponents * numchannels channels.
Here numsamples and numchannels refer to the length and channel count of the source buffer.
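The sizes above can be checked with a little arithmetic. The helper below is a hypothetical sketch; it assumes integer division and that the hop and FFT sizes have already been resolved from their -1 defaults.

```python
def nmf_buffer_shapes(numsamples, numchannels, numcomponents=2,
                      hopsize=512, fftsize=1024):
    """Return (samples, channels) for the bases and activations buffers."""
    basis_len = fftsize // 2 + 1               # one value per frequency bin
    activation_len = numsamples // hopsize + 1  # one value per analysis frame
    channels = numcomponents * numchannels
    return {"bases": (basis_len, channels),
            "activations": (activation_len, channels)}

# One second of mono audio at 44100 Hz with the default settings.
shapes = nmf_buffer_shapes(numsamples=44100, numchannels=1)
```

For that mono input, each basis spans 513 bins and each activation 87 frames, with 2 channels in both cases.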