Updating...
Found what I thought was a decent price on a card, factory refurbished by MSi, and stuffed it in my case after a little computer surgery (big card!): an MSi RTX 2060 Ventus GP 12GB OC. This was a somewhat later release by Nvidia and, I gather, falls in between the 2060 and the 2060 Super.
Mainly this was needed to drive a 4K monitor (still in the box), but I was hoping it would also give a big kick to ST. Again, it replaces the machine's stock GTX 745, which had a mild overclock in Afterburner. I have not yet OC'd the 2060, and actually am not sure if I need or want to, or how. Apparently this generation really stepped up the factory overclocking (GPU Boost 4.0?): putting a monitoring tool on it while running ST showed it going well past even the rated "boost" speeds. I gather it internally senses what it can get away with based on readings like GPU temperature.
Before I changed the card I timed some modules using 1.9.541a. Nothing fancy, just the normal stuff, but I did load up full size 2600MM mono files into L, R, G, B, and Ha for NB Accent, just to try to stress it and make clicking the stopwatch a little easier.
I was actually hoping for more of an improvement, and for more GPU/GPU-memory usage.
Though I realize from the earlier discussion that a lot of CPU work is also involved, along with back-and-forth between CPU and GPU, and that (plus the mobo it's on) may be bottlenecking me.
I saw little to no change in the performance of OptiDev, Wipe, HDR, and Color. Though perhaps if I changed the default settings, they might have required some, or more, GPU computation?
Surprisingly, SVD seemed to show the weakest improvement of the modules that supposedly really use GPU acceleration. The initial generation of the Deep Space Mask and the first uninterrupted calculation of synthetic deconvolution improved from 1:54.8 to 1:47.1, or 7%. Clicking a couple of samples to run spatial variance was a wash, as was deringing. Switching on intra-iteration + centroid improved from 3:23.3 to 3:14.8, 4% better.
Contrast defaults went from 21.5 to 18.0 seconds, improving 16%.
SS-Dimsmall on this data, seemingly the heaviest user of GPU, improved from 7:43.6 (yikes!) to 5:40.7, or 27% better.
Denoise, after hitting the Next button, lowered from 57.8 to 40.0 seconds, better by 31%.
Sharp, both stages (from Next through Init and Mask generation to finish), improved the most: 12 to 7 seconds for the first stage and 22.8 to 12.7 seconds for the second, better by about 43% overall.
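For anyone checking my math, the percentages above are just (old − new) / old. A quick sketch that converts the mm:ss stopwatch readings and recomputes a few of the speedups (module labels are my own shorthand):

```python
def to_seconds(t):
    """Convert a stopwatch string like '7:43.6' (or plain '57.8') to seconds."""
    if ":" in t:
        m, s = t.split(":")
        return int(m) * 60 + float(s)
    return float(t)

def improvement(old, new):
    """Percent reduction in runtime going from old to new, rounded."""
    a, b = to_seconds(old), to_seconds(new)
    return round(100 * (a - b) / a)

# A few of the timings from above: (before, after)
timings = {
    "SVD mask + first decon": ("1:54.8", "1:47.1"),
    "SVD intra-iteration + centroid": ("3:23.3", "3:14.8"),
    "Contrast": ("21.5", "18.0"),
    "Super Structure (Dimsmall)": ("7:43.6", "5:40.7"),
    "Denoise": ("57.8", "40.0"),
}
for module, (old, new) in timings.items():
    print(f"{module}: {improvement(old, new)}% faster")
```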
Maybe I'll notice things more at the typical resolutions I process at.

And again, it'll allow me to drive a big screen (we'll see if that chokes things down, though). I also noticed none of the overall "mini freezes" that would sometimes happen before when ST has the computer thinking too hard. That's nice, but again the overall shortening of module processing times was only fair, I'd say.
But then, it's only a Turing card, not top-end, nor any kind of Ampere.
Finally, as I noted earlier I believe, I still do not see significant use of dedicated GPU RAM. These test modules, with 5 pretty large files composed, maybe brushed 1.0GB used at a couple of points? The frequent comments that ST needs lots of "VRAM" were one of the reasons I was happy to get the special 12GB version, but I'm not seeing that it's being used. Any reason why? Again, my prior card with only 4GB was never threatened either. Maybe I'll load in a few JWST files and see what happens. Would the onboard GPU memory really only be utilized for, say, giant mosaics?
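For what it's worth, the way I've been eyeballing dedicated-memory use is by polling `nvidia-smi` while a module runs. A little sketch of that, assuming the standard `--query-gpu` CSV fields; the parsing helpers are just my own illustration:

```python
import subprocess

def parse_used_mib(csv_line):
    """Parse one line like '1024 MiB' from
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader` into an int."""
    return int(csv_line.strip().split()[0])

def peak_vram_mib(samples):
    """Peak dedicated-memory use over a list of sampled CSV lines."""
    return max(parse_used_mib(s) for s in samples)

if __name__ == "__main__":
    try:
        # One sample; loop this (or run `nvidia-smi -l 1`) while ST is crunching.
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader"],
            text=True)
        print(out.strip())
    except FileNotFoundError:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
```

Logging it once a second over a whole module run and taking the peak is what convinced me the 12GB is barely being touched.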
Anyway, now that I have this thing, ST needs heavier GPU usage.
