Hi Buggins! It sounds like you are dealing with many of the same issues as I am over on the Digital Theremin thread. So much of it, as you understand, comes down to SNR, so you need as many significant digits of data as possible, as cleanly as possible.
If you aren't averse to using a plate antenna, this is one way to multiply the sensitivity several times right at the source.
I would also highly recommend using a software CIC (cascaded integrator-comb) filter to kill mains hum and low-pass filter the data.
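A minimal sketch of a CIC decimator, assuming the raw pitch data arrives at a fixed rate. The 6 kHz sample rate, 60 Hz mains and the parameter values below are hypothetical. The point is that a CIC's response has nulls at multiples of fs / (R * M), so choosing R * M = fs / 60 (or fs / 50 for 50 Hz mains) notches the hum and its harmonics while also low-passing.

```python
import math


def cic_decimate(samples, order=3, decimation=10, diff_delay=10):
    """Order-N CIC decimator: N integrators at the input rate, keep every R-th
    sample, then N combs with differential delay M at the output rate.

    Frequency-response nulls fall at multiples of fs / (R * M).
    """
    integ = [0] * order                                   # integrator states
    comb_hist = [[0] * diff_delay for _ in range(order)]  # M-sample comb delay lines
    gain = (decimation * diff_delay) ** order             # DC gain, removed at the end
    out = []
    for n, x in enumerate(samples):
        # Integrator cascade runs at the full input rate.
        acc = x
        for i in range(order):
            integ[i] += acc
            acc = integ[i]
        # Decimate: only every R-th integrator output reaches the combs.
        if (n + 1) % decimation == 0:
            y = acc
            for hist in comb_hist:
                delayed = hist.pop(0)   # comb input from M output samples ago
                hist.append(y)
                y -= delayed
            out.append(y / gain)
    return out


# Hypothetical data: 6 kHz sample rate, DC level 1000 plus 60 Hz hum.
# R * M = 100 puts the nulls at 60 Hz and its harmonics; output rate is 600 Hz.
fs = 6000
raw = [round(1000 + 200 * math.sin(2 * math.pi * 60 * n / fs)) for n in range(2000)]
filtered = cic_decimate(raw, order=3, decimation=10, diff_delay=10)
print(filtered[-1])
```

Python integers never overflow, so the integrators can grow freely here. In fixed-width hardware or firmware the usual approach is two's-complement wraparound with registers at least N * log2(R * M) bits wider than the input.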
"Is it ok to use pitch antenna oscillator powered by 3.3V? Any problems with noise expected?" - Buggins
3.3V should be OK if you use a high Q to get the voltage swing at the antenna high enough (some tens of volts minimum). The 3.3V oscillator supply should be regulated independently of the digital stuff.
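A rough back-of-envelope for how much Q that takes, assuming a square-wave drive from the 3.3V rail into a series-resonant tank at resonance, where the antenna voltage is roughly Q times the drive's fundamental. The 40 V target is a hypothetical stand-in for "some tens of volts"; real tanks also have losses and loading this ignores.

```python
import math


def drive_fundamental(vdd):
    """Peak amplitude of the fundamental in a 0..vdd square wave: (4/pi) * (vdd/2)."""
    return 2 * vdd / math.pi


def antenna_peak(vdd, q):
    """Rough peak antenna voltage: Q times the drive fundamental (series tank at resonance)."""
    return q * drive_fundamental(vdd)


def q_needed(vdd, target_peak):
    """Q required to reach a target peak antenna voltage from a given supply."""
    return target_peak / drive_fundamental(vdd)


for q in (10, 20, 40):
    print(f"Q={q:>2}: ~{antenna_peak(3.3, q):.0f} V peak at the antenna")
print(f"Q for 40 V peak from 3.3 V: {q_needed(3.3, 40):.1f}")
```

Under these assumptions a Q of about 20 gets a 3.3V drive to roughly 42 V peak.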
"Can I have some benefits from increasing oscillator voltage to 5/9/12 Volts?"
Possibly. A higher supply voltage could let you handle a larger oscillator swing, and so give you somewhat better internal SNR in that stage, all other things being equal. IMO the returns diminish quickly, though, so it's probably not worth it, as higher voltages generally aren't available around digital circuitry.
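One way to see the diminishing returns, assuming the signal swing scales with the supply while the noise floor stays fixed (a simplification): the SNR gain is only 20 * log10 of the voltage ratio, and every ~6 dB buys roughly one extra bit.

```python
import math


def snr_gain_db(v_new, v_old=3.3):
    """SNR improvement in dB if the swing scales with supply and noise stays fixed (assumption)."""
    return 20 * math.log10(v_new / v_old)


for v in (5, 9, 12):
    gain = snr_gain_db(v)
    print(f"{v:>2} V: +{gain:.1f} dB (~{gain / 6.02:.1f} extra bits)")
```

Going all the way from 3.3V to 12V comes out to only about 11 dB, i.e. a bit under two extra bits.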
"My goal is to achieve pitch antenna sensitivity at least 1/100 semitones with hand far from antenna (~80cm)."
This is a reasonable goal, and very similar to my own. If you think about it, it's pretty amazing that physics allows this. One technique for longer distances would be to track the hand distance in software and apply more filtering at longer distances. I may try this at some point.
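A sketch of distance-dependent filtering, assuming you already have a distance estimate (for example derived from the pitch value itself). The one-pole smoother, the linear alpha mapping and all the numbers (0.1 m to 0.8 m, alpha from 0.5 to 0.02) are hypothetical choices for illustration, not a tested design.

```python
def smoothing_alpha(distance_m, alpha_near=0.5, alpha_far=0.02, d_near=0.1, d_far=0.8):
    """Map hand distance to a one-pole smoothing coefficient.

    Near the antenna (strong signal) alpha is large, so little filtering; far away
    (weak signal) alpha is small, so heavy filtering. Linear interpolation, clamped.
    """
    t = (distance_m - d_near) / (d_far - d_near)
    t = min(max(t, 0.0), 1.0)
    return alpha_near + t * (alpha_far - alpha_near)


class DistanceAdaptiveSmoother:
    """One-pole low-pass, y += alpha * (x - y), with alpha chosen per sample."""

    def __init__(self):
        self.y = None

    def update(self, x, distance_m):
        if self.y is None:
            self.y = x  # initialise on the first sample to avoid a startup ramp
        else:
            self.y += smoothing_alpha(distance_m) * (x - self.y)
        return self.y


s = DistanceAdaptiveSmoother()
s.update(0.0, 0.8)
print(s.update(1.0, 0.8))  # far from the antenna: moves only a small step toward the input
```

The trade-off is that heavier filtering at long distance also means more inertial delay there.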
"BTW, does someone know what is max latency value which still allows to play vibrato? I hope 1ms is good enough. I hate how my Theremini reacts on hand movements."
There's latency (transport delay) and there's bandwidth (inertial delay), and they are generally related, particularly in a first-order filter. They can be decoupled too, though, particularly in digital systems that do batch data processing. Assuming you don't have significant throughput times (transport delay), you're then talking bandwidth. I think 100Hz or even a little lower is probably OK, though I haven't experimented with this much. The Theremini bandwidth is something like 2Hz, which is way too low. I used to have an inertial target of 1kHz (1ms), but I now think that's overkill. 10ms of transport delay should be OK, and 10x that old inertial target (10ms, i.e. 100Hz) should probably be OK too. Inertial delay isn't as easily perceived IMO.
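To make the transport/inertial distinction concrete, here is a small sketch comparing the step response of a pure delay line with that of a first-order low-pass. The 1 kHz control rate (so one sample is 1 ms) and the 10-sample delay and time constant are hypothetical.

```python
import math


def transport_delay(x, delay):
    """Pure transport delay: the input shifted by `delay` samples, zero-filled."""
    return [0.0] * delay + list(x[:len(x) - delay])


def one_pole_lowpass(x, tau):
    """First-order IIR low-pass with a time constant of `tau` samples.

    For reference, a -3 dB cutoff fc at sample rate fs corresponds to
    tau = fs / (2 * pi * fc) samples.
    """
    a = 1.0 - math.exp(-1.0 / tau)
    y, out = 0.0, []
    for v in x:
        y += a * (v - y)
        out.append(y)
    return out


# Hypothetical 1 kHz control rate, so one sample = 1 ms.
step = [1.0] * 50
delayed = transport_delay(step, 10)    # nothing for 10 ms, then the full step
smoothed = one_pole_lowpass(step, 10)  # starts moving at once, ~63% after 10 ms
print(delayed[9], delayed[10], round(smoothed[9], 3))
```

The delay line does nothing and then jumps, while the low-pass responds immediately but only gradually reaches the target, which may be why inertial delay is harder to perceive.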
If you are using the offset heterodyning approach, you might be able to use it to do your linearization. If you set the variable and fixed frequencies correctly, you can get a pretty linear response in the exponential domain, i.e., you can feed the number directly to a linear numeric oscillator. The downside is that this makes further processing more mathematically difficult because it is non-linear: varying pitch sensitivity becomes a power function (and varying pitch offset becomes a multiplication).
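A sketch of why processing gets harder in the linear-frequency domain: the same two operations, pitch offset and pitch sensitivity, are a multiply and a power function there, but an add and a multiply in the log (pitch) domain. The 440 Hz pivot for sensitivity scaling is a hypothetical choice.

```python
import math

F_REF = 440.0  # hypothetical pivot pitch for sensitivity scaling


# Linear-frequency domain (what the heterodyne number would feed directly):
def transpose_linear(freq_hz, semitones):
    """Pitch offset becomes a multiplication by 2**(semitones/12)."""
    return freq_hz * 2 ** (semitones / 12)


def scale_sensitivity_linear(freq_hz, k, f_ref=F_REF):
    """Scaling pitch sensitivity by k about f_ref becomes a power function."""
    return f_ref * (freq_hz / f_ref) ** k


# Log (pitch) domain: the same operations are just add and multiply.
def to_semitones(freq_hz, f_ref=F_REF):
    return 12 * math.log2(freq_hz / f_ref)


def from_semitones(st, f_ref=F_REF):
    return f_ref * 2 ** (st / 12)


f = 660.0
print(transpose_linear(f, 12))                # offset: multiply
print(scale_sensitivity_linear(f, 1.5))       # sensitivity: power
print(from_semitones(1.5 * to_semitones(f)))  # same result via the log domain
```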
"With signal from heterodine in range 2..10KHz for working hand location range, it should be possible to have about 13 meaningful bits to be used for pitch value."
13 real bits should be enough if optimally arranged, but as you say, it all depends on how those 13 bits are arranged in the pitch space. And the ear can tolerate a fair amount of FM if the bits are dithered by noise in a way that isn't obvious. When the resolution is low I generally prefer dithered data over hard steps.
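To illustrate how much the arrangement matters, this compares two hypothetical ways of spreading 2^13 codes over an assumed 5-octave range: uniformly in pitch, or uniformly in frequency. Neither is claimed to be what a real heterodyne front end produces; they just bracket the cases.

```python
import math

CODES = 2 ** 13      # 13 meaningful bits
OCTAVES = 5          # hypothetical playing range
CENTS_TOTAL = 1200 * OCTAVES


def cents_per_code_log():
    """Codes spread uniformly in pitch (log frequency): the same step everywhere."""
    return CENTS_TOTAL / CODES


def cents_per_code_linear(code):
    """Codes spread uniformly in frequency across the range (f_lo normalised to 1).

    Returns the pitch step, in cents, between `code` and `code + 1`.
    """
    f_lo, f_hi = 1.0, 2.0 ** OCTAVES
    step = (f_hi - f_lo) / CODES
    f = f_lo + code * step
    return 1200 * math.log2((f + step) / f)


print(f"log-uniform:    {cents_per_code_log():.2f} cents per code everywhere")
print(f"linear, bottom: {cents_per_code_linear(0):.2f} cents per code")
print(f"linear, top:    {cents_per_code_linear(CODES - 1):.2f} cents per code")
```

With the log arrangement every step is about 0.73 cent, inside the 1/100 semitone goal; with the linear arrangement the bottom of the range is around 6.5 cents per step while the top is about 0.2 cent, so most of the resolution is wasted where it isn't needed.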
I suggest you start designing with real (or accurately simulated) hand-to-antenna capacitance data, and go from there. I use spreadsheets to do much of this sort of thing.
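The same spreadsheet-style table can be generated in a few lines of code. This sketch assumes a simple LC oscillator with the hand capacitance in parallel with the antenna; the 1 mH coil, 10 pF base capacitance and the k / d hand model are all placeholders, and the hand model in particular should be replaced with measured data.

```python
import math

L_TANK = 1e-3     # hypothetical antenna coil, 1 mH
C_BASE = 10e-12   # hypothetical antenna + stray capacitance, 10 pF


def hand_capacitance(d_m, k=0.02e-12):
    """Placeholder model, C_hand = k / d (farads). Replace with measured data."""
    return k / d_m


def osc_freq(c_hand, l=L_TANK, c_base=C_BASE):
    """LC resonant frequency f = 1 / (2*pi*sqrt(L*C)), hand capacitance added to the base."""
    return 1.0 / (2 * math.pi * math.sqrt(l * (c_base + c_hand)))


f_open = osc_freq(0.0)  # hand removed
print("d (m)   C_hand (pF)   f (Hz)       shift (Hz)")
for d in (0.05, 0.1, 0.2, 0.4, 0.8):
    c = hand_capacitance(d)
    f = osc_freq(c)
    print(f"{d:<7} {c * 1e12:<13.3f} {f:<12.0f} {f - f_open:.0f}")
```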
Also, you might want to simulate D-latch heterodyning to see if it is sufficient for your purposes. The timing is dictated by the latching oscillator rather than the digital system clock, so timing resolution isn't the best, and the resulting aliasing noise may be impossible to sufficiently remove via downstream filtering. Multiplicative heterodyning followed by an LPF can give you much better timing resolution, but has its own problems. Multiplying square waves via XOR requires at least a 3rd-order filter to get rid of ripple, and this filter restricts the operating range because of the amplitude variation it induces. Multiplying sine waves requires less filtering, but can be difficult to do in hardware. A high-speed A/D (or a high-speed sample-and-hold with a slow A/D) with sub-Nyquist sampling is another heterodyning possibility.
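A bare-bones D-latch heterodyning simulation, assuming ideal noise-free square waves and an ideal latch that samples the variable oscillator on each rising edge of a reference oscillator. The 1 MHz and 1.003 MHz frequencies are hypothetical. The beat frequency comes out right on average, but each output period is a whole number of reference periods, which is the timing-resolution limit described above.

```python
def square(phase_cycles):
    """Ideal square wave: 1 in the first half of each cycle, 0 in the second."""
    return 1 if (phase_cycles % 1.0) < 0.5 else 0


def d_latch_heterodyne(f_var, f_ref, n_edges):
    """Latch a square wave at f_var on each rising edge of a reference clock at f_ref.

    Output sample k is the variable oscillator's level at time k / f_ref.
    """
    return [square(f_var * k / f_ref) for k in range(n_edges)]


def rising_edges(bits):
    return [k for k in range(1, len(bits)) if bits[k - 1] == 0 and bits[k] == 1]


def beat_periods(bits):
    """Output periods, in reference-clock edges, between successive rising edges."""
    r = rising_edges(bits)
    return [b - a for a, b in zip(r, r[1:])]


def mean_beat_hz(bits, f_ref):
    """Average beat frequency from the first and last rising edges."""
    r = rising_edges(bits)
    return f_ref * (len(r) - 1) / (r[-1] - r[0])


# Hypothetical 1 MHz reference and 1.003 MHz variable oscillator: 3 kHz beat.
bits = d_latch_heterodyne(1_003_000, 1_000_000, 20_000)
print(sorted(set(beat_periods(bits))))  # individual periods snap to whole reference edges
print(round(mean_beat_hz(bits, 1_000_000)))
```

The ideal period here is 333.33 reference edges, so the latched output alternates between 333 and 334; that periodic pattern is the kind of error downstream filtering has to deal with.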