The VePro buffer size per instance only applies while the track is live, so there is not really any benefit to splitting instruments out into separate instances for that reason.
Also, there is no truth at all to the idea that Macs like more instances and PCs like fewer. In the past some people had trouble getting LogicPro to work with a single instance, but there is absolutely no truth to the claim that many instances perform better in terms of CPU, and these days with AU3 you can use a single instance in LogicPro much more easily than before. There are still some workflow considerations, which have nothing to do with CPU, for deciding whether to use many instances (one instance per instrument track), in LogicPro specifically. Pros and cons either way.
Regarding performance....
The way VePro is designed, each instance is a separate server under the covers, which means each instance consumes some overhead and resources. That alone means a single instance is better than many. But more important is the thread allocation. VePro has a setting for how many threads to use PER INSTANCE. If you use one and only one instance, then it's pretty easy to work out how many threads to allocate for your system based on the number of cores, and VePro will then make every effort within that instance to spread work across as many threads as your cores can optimally support. Simple guess: you have 12 cores, so set threads to 20. Two threads per core is a good starting point, and that leaves a few threads for your DAW and OSX. It's not always that simple, but generally in this case you're allocating as many threads to VePro as you can, and VePro will manage which tasks go to which threads and optimize core usage very well.
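To make the arithmetic behind that guess explicit, here is a minimal sketch. The function name and the reserve of 4 threads are my own assumptions, chosen so that 12 cores lands on the 20 mentioned above; it is a rule of thumb, not anything VePro computes for you.

```python
def suggested_threads(cores: int, threads_per_core: int = 2, reserved: int = 4) -> int:
    """Rough thread count for a single VePro instance.

    Assumption: 2 threads per core, minus a few held back for the DAW
    and the OS. The reserve of 4 is a guess, not a VePro setting.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Never go below one thread, even on very small machines.
    return max(1, cores * threads_per_core - reserved)


print(suggested_threads(12))  # 12 * 2 - 4 = 20
```

Treat the result as a starting point and adjust it after watching real CPU usage on your own system.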
Conversely, let's say you take the classic LogicPro approach of one instance per instrument track. That means you potentially have 30 or 50 or 100 VePro instances. In that case you would most likely set the thread allocation PER INSTANCE to 1, maybe 2. You would still get good core utilization, because at any given moment it's likely that at least as many tracks as you have cores are playing at once, thus making full use of all cores. This approach might impose a little more overhead because each instance is essentially its own server, but it's a small amount. The bigger concern in my mind is workflow: in this mode you don't have one mixer, one MIR stage, etc. So I feel this gives up a huge part of the VePro advantage, particularly if you aren't using a VePro slave.
However, what if you are in between those two scenarios? Not a single instance and not 100. What if you decide to use, let's say, 3 instances? Then how many threads should you configure VePro to use for each instance? Should you give each one 1/3 as many threads as you would a single instance? One might think so, but that only gives optimal multi-threading performance if all three instances are being used equally at all times, thus spreading the load across all cores. If there are moments during playback when 1 or 2 of the 3 instances happen to be mostly idle, then during those moments the one remaining busy instance is under-configured and is probably under-utilizing the cores, with not enough threads active.
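A toy model of that trade-off, just to put numbers on it. Everything here is an assumption for illustration: a 12-core machine treated as 24 hardware threads, a total budget of 20 threads, and the simplification that a busy instance can keep all of its configured threads working while an idle one uses none. Real VePro scheduling is more involved than this.

```python
def active_threads(threads_per_instance: int, busy_instances: int,
                   hardware_threads: int) -> int:
    """Threads doing useful work when only some instances are busy.

    Simplified model: each busy instance uses all of its configured
    threads, idle instances use none, and the total is capped by what
    the hardware can actually run at once.
    """
    return min(threads_per_instance * busy_instances, hardware_threads)


HARDWARE_THREADS = 24  # assumed: 12 cores x 2 threads per core

# One instance holding the whole 20-thread budget.
print(active_threads(20, 1, HARDWARE_THREADS))  # 20

# Three instances, budget split three ways (20 // 3 = 6 threads each).
print(active_threads(6, 3, HARDWARE_THREADS))   # 18 when all three are busy
print(active_threads(6, 1, HARDWARE_THREADS))   # 6 when only one is busy

# Fifty instances at 1 thread each, with 12 tracks playing at once.
print(active_threads(1, 12, HARDWARE_THREADS))  # 12
```

The middle case is the one to watch: with only one of the three instances busy, the model leaves most of the machine idle, which is the under-configuration described above.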
So IMHO, the current multi-thread scheme of VePro works very well for a single instance and very well for many instances, but is not ideal for a handful of instances, because the thread preference is set PER INSTANCE.
All that being said, I think you should make your choice based on workflow rather than CPU, because most of the time it probably doesn't make that much difference to the CPU.
Getting back to the comment about the buffer size: that setting only applies to live tracks. Non-live tracks are automatically bumped up to a very large buffer size whether you like it or not, and that's a good thing. So the end result is that for whatever track you are recording, you can set the buffer size to none or x1 or x2 or whatever you think you'll need for that part while you're recording it. All the other tracks playing back will be using a much, much larger buffer anyway. There is nothing gained by separating different instruments into different instances.
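For a rough feel of what those buffer settings cost on the live track, here is a small sketch. It assumes each step of the setting adds one DAW buffer's worth of latency (none = 0, x1 = 1, x2 = 2), and the 256-sample buffer at 48 kHz is just an example; check the behaviour against your own setup rather than taking these numbers as given.

```python
def added_latency_ms(multiplier: int, host_buffer_samples: int,
                     sample_rate_hz: int) -> float:
    """Extra latency on the live track, in milliseconds.

    Assumption: each buffer step adds one host buffer of delay.
    """
    return multiplier * host_buffer_samples / sample_rate_hz * 1000.0


# Example values only: 256-sample DAW buffer at 48 kHz.
for label, multiplier in (("none", 0), ("x1", 1), ("x2", 2)):
    ms = added_latency_ms(multiplier, 256, 48000)
    print(f"{label}: {ms:.2f} ms added")
```

Under this assumption only the track you are recording pays this cost, since everything else is already playing back on the larger buffer.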