Another graphics card question

admin · Post by **admin** » Mon Sep 26, 2022 12:29 am

This is very interesting. I did not really expect LHR cards to be impacted this much and I am a admittedly a little skeptical that LHR would hobble OpenCL performance in this way...

Multi-core performance between a 5600X and 5900X should be definitely be noticeable in general usage.

The GPU version of StarTools still uses the CPU in many parts of algorithms where GPU acceleration is not possible or desirable (e.g. cases where GPU performance would be inferior to using the CPU). There are many such cases, usually involving complex branching and execution paths, or where it is not possible to efficiently parallelize calculations.

Modules in StarTools almost always consist of multiple steps, filters, convolutions, etc. It wholly depends on the constituent steps and algorithms how such a step is executed. For example, a convolution with a custom kernel (lots of brute-force multiplies, easy to parallelize) is much faster on the GPU, whereas a median filter (lots of branching, sorting, harder to parallelize efficiently) is much faster on the CPU.

Decon is indeed the most CPU and GPU demanding module, and you should see both CPU and GPU getting a workout, taking turns during every iteration.

Hope this helps,

riverpoet · Post by **riverpoet** » Tue Sep 27, 2022 12:05 am

Mike in Rancho wrote: ↑Sat Sep 24, 2022 8:37 pm Is that 3070 an older model, perhaps FE? I wasn't aware that you could get these in FHR anymore, but I should look. I don't know if I want to trust eBay though on something that could be a few hundy.

Older 3060, 3070, 3080 and 3080Ti are "FHR" alias full hash rate, and the newer ones have the limiter, LHR. All 3060Ti and 3070Ti are LHR. 3090 and 3090 TI do not have the limiter. You can get plenty of "FHR" cards on eBay these days are Ethereum mining is over, but the vast majority of them have been running 24/7 for years.
I'm looking for either 3090 or AMD 6900XT as my "astro processing card".

Mike in Rancho wrote: ↑Sat Sep 24, 2022 8:37 pm New 1.8 HDR is very heavily CPU dependent, without a lot of GPU offloading. Just the type of calculations required, according to Ivo. There's a big thread here on it where several of us ran timed experiments trying out various module parameters to see their effect on completion times.

Intresting, I'm still using 1.7 as I have the compulsive need to reprocess all older data when I "discover" some better processing technique, and I also miss Reveal DSO Core in 1.8 HDR and that is a very important step in my workflow.

Peter

hixx · Post by **hixx** » Tue Sep 27, 2022 11:41 am

Hi Peter,

I also miss Reveal DSO Core in 1.8 HDR and that is a very important step in my workflow.

As per inofficial manual / Guy's User note, there is a way to achieve the old V1.7 HDR settings in HDR 1.8.... :

Mimicking the Reveal DSO Core preset, as per
reduce Context Size, and only set Shadow Detail Boost, while leaving all other parameters (e.g. Gamma Highlight, Gamma Shadow and Highlights Detail Boost) alone. One difference though compared to 1.7, is that even at max settings, the detail recovered in 1.8 is much more carefully placed in its surroundings, and does not look artificial, while also perfectly conforming to the noise levels of its surroundings (the latter extends also covers the interaction with the Gamma Highlight/Shadow parameter - consistency is maintained).

You'll find more tips in the Inofficial manual (under downloads) or in User Notes (under Forum)
clear skies,
Jochen

admin · Post by **admin** » Wed Sep 28, 2022 7:27 am

hixx wrote: ↑Tue Sep 27, 2022 11:41 am Hi Peter,
I also miss Reveal DSO Core in 1.8 HDR and that is a very important step in my workflow.
As per inofficial manual / Guy's User note, there is a way to achieve the old V1.7 HDR settings in HDR 1.8.... :

Mimicking the Reveal DSO Core preset, as per
reduce Context Size, and only set Shadow Detail Boost, while leaving all other parameters (e.g. Gamma Highlight, Gamma Shadow and Highlights Detail Boost) alone. One difference though compared to 1.7, is that even at max settings, the detail recovered in 1.8 is much more carefully placed in its surroundings, and does not look artificial, while also perfectly conforming to the noise levels of its surroundings (the latter extends also covers the interaction with the Gamma Highlight/Shadow parameter - consistency is maintained).
You'll find more tips in the Inofficial manual (under downloads) or in User Notes (under Forum)
clear skies,
Jochen

I'll bring back a DSO Core preset in 1.9's HDR module.

Mike in Rancho · Post by **Mike in Rancho** » Thu Jan 26, 2023 7:13 pm

Updating...

Found what I thought was a decent price on a card, factory refurbished by MSi, and stuffed it in my case. After a little computer surgery (big card!). MSi RTX 2060 Ventus GP 12GB OC. This was a somewhat newer release by Nvidia and I guess falls in between the 2060 and 2060S.

Mainly this was needed to drive a 4K monitor (still in the box), but was hoping it would also give a big kick to ST. Again, replaces the machine's stock GTX 745 that had a mild overclock in Afterburner. I have not yet OC'd the 2060, and actually am not sure if I need/want to, or how? Apparently this generation really stepped up the factory overclocking (GPU Boost 4.0?), so putting a monitor on it while running ST showed it going way past even the rated "boost" speeds. I guess it internally senses what it can get away with based on sensors like GPU temp.

Before I changed the card I timed some modules using 1.9.541a. Nothing fancy, just the normal stuff, but I did load up full size 2600MM mono files into L, R, G, B, and Ha for NB Accent, just to try to stress it and make clicking the stopwatch a little easier.

I was actually hoping for more of an improvement and GPU/GPU memory usage.

Though I realize from the earlier discussion a lot of CPU work is also involved, along with back-and-forth, and that (plus the mobo it's on) may be bottlenecking me.

I saw little to no change in performance of OptiDev, Wipe, HDR, and Color. Though perhaps if I changed default settings it might have required some, or some more, GPU computations?

Surprisingly, SVD seemed to have the weakest performance improvement of the modules that supposedly really use GPU acceleration. The initial generation of the Deep Space Mask and first uninterrupted calculation of synthetic deconvolution improved from 1:54.8 to 1:47.1, or 7%. Clicking a couple samples to run spatial variance was a wash, as was deringing. Switching on intra-iteration + centroid improved from 3:23.3 to 3:14.8, 4% better.

Contrast defaults went from 21.5 to 18.0 seconds, improving 16%.

SS-Dimsmall on this data, seemingly the heaviest user of GPU, improved from 7:43.6 (yikes!) to 5:40.7, or 27% better.

Denoise, after hitting the Next button, lowered from 57.8 to 40.0 seconds, better by 31%.

Sharp, both stages from Next through Init and Mask generation to finish, improved the most. 12 to 7 seconds first stage, 22.8 to 12.7 seconds second stage, better by about 43%.

Maybe I'll notice things more at the typical resolutions I process at.

And again it'll allow me to drive a big screen (will see if that chokes things down though). I also noticed none of the overall "mini freezes" that would sometimes happen before when ST has the computer thinking too hard. That's nice, but again the overall shortening of module processing times was only fair, I'd say.

But, it's only a Turing and not top-end, or any kind of Ampere.

Finally, as I noted earlier I believe, I still do not notice significant use of dedicated GPU RAM. These test modules, with 5 pretty large files composed, maybe brushed 1.0GB used at a couple points? The frequent comments that ST needs big "VRAM" was one of the reasons I was happy to get the special 12GB version, but I'm not seeing that it's being used. Any reason why? Again my prior card with only 4GB was never threatened either. Maybe I'll load in a few JWST files and see what happens. Would the onboard GPU memory really only be utilized for, say, giant mosaics?

Anyway, now that I have this thing, ST needs heavier GPU usage.

admin · Post by **admin** » Sun Jan 29, 2023 10:09 pm

Mike in Rancho wrote: ↑Thu Jan 26, 2023 7:13 pm Updating...

Found what I thought was a decent price on a card, factory refurbished by MSi, and stuffed it in my case. After a little computer surgery (big card!). MSi RTX 2060 Ventus GP 12GB OC. This was a somewhat newer release by Nvidia and I guess falls in between the 2060 and 2060S.

Mainly this was needed to drive a 4K monitor (still in the box), but was hoping it would also give a big kick to ST. Again, replaces the machine's stock GTX 745 that had a mild overclock in Afterburner. I have not yet OC'd the 2060, and actually am not sure if I need/want to, or how? Apparently this generation really stepped up the factory overclocking (GPU Boost 4.0?), so putting a monitor on it while running ST showed it going way past even the rated "boost" speeds. I guess it internally senses what it can get away with based on sensors like GPU temp.

Before I changed the card I timed some modules using 1.9.541a. Nothing fancy, just the normal stuff, but I did load up full size 2600MM mono files into L, R, G, B, and Ha for NB Accent, just to try to stress it and make clicking the stopwatch a little easier.

I was actually hoping for more of an improvement and GPU/GPU memory usage.

Though I realize from the earlier discussion a lot of CPU work is also involved, along with back-and-forth, and that (plus the mobo it's on) may be bottlenecking me.

I saw little to no change in performance of OptiDev, Wipe, HDR, and Color. Though perhaps if I changed default settings it might have required some, or some more, GPU computations?

Surprisingly, SVD seemed to have the weakest performance improvement of the modules that supposedly really use GPU acceleration. The initial generation of the Deep Space Mask and first uninterrupted calculation of synthetic deconvolution improved from 1:54.8 to 1:47.1, or 7%. Clicking a couple samples to run spatial variance was a wash, as was deringing. Switching on intra-iteration + centroid improved from 3:23.3 to 3:14.8, 4% better.

Contrast defaults went from 21.5 to 18.0 seconds, improving 16%.

SS-Dimsmall on this data, seemingly the heaviest user of GPU, improved from 7:43.6 (yikes!) to 5:40.7, or 27% better.

Denoise, after hitting the Next button, lowered from 57.8 to 40.0 seconds, better by 31%.

Sharp, both stages from Next through Init and Mask generation to finish, improved the most. 12 to 7 seconds first stage, 22.8 to 12.7 seconds second stage, better by about 43%.

Maybe I'll notice things more at the typical resolutions I process at. And again it'll allow me to drive a big screen (will see if that chokes things down though). I also noticed none of the overall "mini freezes" that would sometimes happen before when ST has the computer thinking too hard. That's nice, but again the overall shortening of module processing times was only fair, I'd say.

But, it's only a Turing and not top-end, or any kind of Ampere.

Finally, as I noted earlier I believe, I still do not notice significant use of dedicated GPU RAM. These test modules, with 5 pretty large files composed, maybe brushed 1.0GB used at a couple points? The frequent comments that ST needs big "VRAM" was one of the reasons I was happy to get the special 12GB version, but I'm not seeing that it's being used. Any reason why? Again my prior card with only 4GB was never threatened either. Maybe I'll load in a few JWST files and see what happens. Would the onboard GPU memory really only be utilized for, say, giant mosaics?

Anyway, now that I have this thing, ST needs heavier GPU usage.

Thanks for the reports Mike. Great data points!
Lots of factors affect run times, and as your surmise, it appears your system is now increasingly bottlenecked by CPU and bus/RAM speeds.

The conditions under which you test can show dramatic differences in results.

The jump from a GTX 745 to a RTX 2060 in terms of compute power should be substantial, but if routines spend a lot of time in "CPU land", then total improvements may appear more limited than expected.

The Sharp datapoint is interesting actually. I might have a look at why this appears to be performing so much better than most other modules with you particular config. Thank you!

Official StarTools support forum

Another graphics card question

Re: Another graphics card question

Re: Another graphics card question

Re: Another graphics card question

Re: Another graphics card question

Re: Another graphics card question

Re: Another graphics card question