Not Suspending Properly (Universe BHspin v2)
Author | Message |
---|---|
Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
I have a task from your project whose app version is listed as "Universe BHspin v2 0.01", running on Windows 10 x64. It has a BUG! It does not suspend when I request it to. This overloads my CPU, which then causes problems with UI responsiveness and interaction. It also throws off benchmarks. Can you please fix your app? I may have to set your project to "No New Work" until you get it resolved. Thanks, Jacob |
Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
Do you have "Leave application in memory while suspended" set in the BOINC Manager options? Edit: I just checked, and on my Windows 7 machine it suspends correctly and releases the CPU cores. Krzysztof 'krzyszp' Piszczek Member of Radioactive@Home team My Patreon profile Universe@Home on YT |
Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
Yes, I have the "Leave non-GPU tasks in memory while suspended" option checked. I rely on it so that I don't waste work when I suspend and resume. I hope you can reproduce the problem. |
Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
Unfortunately I can't. Also, since 22 July 2016 you are the first person to report this behavior to me... |
Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
Maybe it's related to the specific task? Can you take a look? It is: universe_bh2_160803_59_3_20000_1-999999_190000_0 http://universeathome.pl/universe/result.php?resultid=19531577 http://universeathome.pl/universe/workunit.php?wuid=8665237 |
Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
It looks like another user finished it without problems... |
Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
What does another user finishing the task have to do with whether your app responds to suspend requests??? |
Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
Nothing really, but if something were wrong with the WU, I would see it in the database or in the result files. |
Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
This task: - did not suspend correctly (when I suspended BOINC or the task) - did not exit correctly (when I exited BOINC) - did not checkpoint anymore (with several hours of runtime wasted, in between BOINC exits) - did not abort correctly (when I aborted it, it continued to run until the heartbeat check failed). I don't know the cause, but something was really messed up with it. I aborted it. |
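The behavior Jacob expects (pause on suspend, checkpoint and exit on quit, stop at once on abort) can be pictured as a cooperative worker loop that checks its control flags between units of work. The sketch below is a self-contained Python simulation under that assumption only: ControlFlags, run, save_checkpoint and load_checkpoint are hypothetical names, not part of the BOINC API. Real BOINC applications are written against the BOINC C/C++ API (for example boinc_time_to_checkpoint() and boinc_checkpoint_completed()), whose runtime by default handles suspend and quit requests on the app's behalf, and the Universe@Home app may work quite differently.

```python
import json
import os
import tempfile
import time
from dataclasses import dataclass


@dataclass
class ControlFlags:
    """Hypothetical stand-in for the control requests a client sends a task."""
    suspended: bool = False
    quit_request: bool = False
    abort_request: bool = False


def save_checkpoint(path, state):
    """Write state to a temp file in the same directory, then atomically
    replace the old checkpoint so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Resume from an existing checkpoint, or start from step 0."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"j": 0}


def run(total_steps, get_flags, checkpoint_path,
        checkpoint_every=1000, poll_interval=1.0):
    """Do total_steps units of work, checking control flags before each one.

    get_flags: callable returning a ControlFlags (hypothetical interface).
    Returns "finished", "quit" or "aborted".
    """
    state = load_checkpoint(checkpoint_path)
    while state["j"] < total_steps:
        flags = get_flags()
        if flags.abort_request:
            return "aborted"  # stop immediately; no checkpoint needed
        if flags.quit_request:
            save_checkpoint(checkpoint_path, state)  # keep progress for restart
            return "quit"
        if flags.suspended:
            time.sleep(poll_interval)  # sleep, do not spin a CPU core
            continue
        state["j"] += 1  # one unit of "work"
        if state["j"] % checkpoint_every == 0:
            save_checkpoint(checkpoint_path, state)
    save_checkpoint(checkpoint_path, state)
    return "finished"
```

The suspended branch sleeps instead of busy-waiting, so a suspended task in this model does not hold a CPU core, and a quit request always leaves a checkpoint to resume from; those are exactly the properties the posters found missing.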
Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
It really is something with your computer's configuration. Others don't report such problems, and on my computers all of these functions work (on both Linux and Windows). |
Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
Thanks. I'll keep an eye out in case it happens again. Personally, I feel that the task somehow got into a bad state. |
Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
How long are these tasks supposed to take? Do they complete properly when resumed from a checkpoint? I ask because I have another one that is misbehaving. Please see all the details below. http://universeathome.pl/universe/result.php?resultid=19571218 "CPU time at last checkpoint" is 02:07:46 (2 hrs); "CPU time" is 14:32:06 (14.5 hrs); estimated time remaining is 03:19:50 (AND INCREASING); fraction done is 77.415%. I'm not sure whether this was resumed from a checkpoint. That's about 12.4 hrs without checkpointing and without completion. It is still using a full CPU core. Is that expected? I'm on a high-end i7-5960X CPU, using Windows 10 Insider Slow Build 14986. log.txt has several "making checkpoint" entries, but the last one was: "02:08:41 00:08:21 making checkpoint: j: 15000; iidd: 4305399". checkpoint.dat contains: "15000 4305399 0 1 2". error.dat contains: "error: in Renv_con() unknown Ka type: 1, iidd_old: 4436666 error: in Menv_con() unknown Ka type: 1, iidd_old: 4436666". error.dat2 and error.dat3 both contain: "warning: derivative from dlnRdt() not accurate, error: 1e+030, K: 8, 501481 warning: derivative from dlnRdlnM() not accurate, error: 1e+030, K: 8, 501481 warning: derivative from dlnRdt() not accurate, error: 1e+030, K: 8, 1081777 warning: derivative from dlnRdlnM() not accurate, error: 1e+030, K: 8, 1081777 warning: derivative from dlnRdt() not accurate, error: 1e+030, K: 8, 1748698 warning: derivative from dlnRdlnM() not accurate, error: 1e+030, K: 8, 1748698 warning: derivative from dlnRdt() not accurate, error: 1e+030, K: 8, 4203947 warning: derivative from dlnRdlnM() not accurate, error: 1e+030, K: 8, 4203947". |
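The figures in this report can be checked with a little arithmetic: subtracting the CPU time at the last checkpoint from the current CPU time gives the CPU time spent since that checkpoint. The sketch below also extracts j and iidd from the last "making checkpoint" log line and from checkpoint.dat so the two can be compared. It assumes the first two fields of checkpoint.dat are j and iidd (they match the log line here); the meaning of the remaining fields is unknown, so they are kept raw. All function names are hypothetical helpers, not part of the project's tools.

```python
import re

# Matches lines like "making checkpoint: j: 15000; iidd: 4305399" from log.txt
LOG_CHECKPOINT_RE = re.compile(r"making checkpoint: j: (\d+); iidd: (\d+)")


def hms_to_seconds(text):
    """Convert an 'HH:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def hours_since_checkpoint(cpu_at_checkpoint, cpu_now):
    """CPU hours accumulated since the last checkpoint."""
    return (hms_to_seconds(cpu_now) - hms_to_seconds(cpu_at_checkpoint)) / 3600


def parse_checkpoint_dat(text):
    """Assumption: the first two fields are j and iidd; the rest are kept raw."""
    fields = text.split()
    return {"j": int(fields[0]), "iidd": int(fields[1]), "rest": fields[2:]}


def last_logged_checkpoint(log_text):
    """Return (j, iidd) from the last 'making checkpoint' line, or None."""
    matches = LOG_CHECKPOINT_RE.findall(log_text)
    if not matches:
        return None
    j, iidd = matches[-1]
    return int(j), int(iidd)


gap = hours_since_checkpoint("02:07:46", "14:32:06")
print(f"{gap:.2f} h of CPU time since the last checkpoint")  # 12.41 h ...
log_line = "02:08:41 00:08:21 making checkpoint: j: 15000; iidd: 4305399"
ckpt = parse_checkpoint_dat("15000 4305399 0 1 2")
print(last_logged_checkpoint(log_line) == (ckpt["j"], ckpt["iidd"]))  # True
```

Here the log line and checkpoint.dat agree (j = 15000, iidd = 4305399), which is consistent with no checkpoint having been written after roughly the first two hours of CPU time.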
Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
You can definitely stop it. Something is wrong with the work unit (not the application). |
Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
Is it possible for you to isolate which tasks/batches are problematic, then cancel them server-side? |
Joined: 5 Nov 15 Posts: 1 Credit: 8,191,756 RAC: 0 |
I also got a task that didn't seem to end... It was "stuck" at 98% progress. http://universeathome.pl/universe/workunit.php?wuid=8416651 http://universeathome.pl/universe/result.php?resultid=18991326 It ran for 37,099.25 CPU seconds before I aborted it (it had timed out anyway), three times longer than my last successful task. Its progress wasn't saved when I exited BOINC. |
Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
I think we are now tracking these problems in another thread (http://universeathome.pl/universe/forum_thread.php?id=199). As far as I know, it is not yet fixed. I have set No New Tasks until a fix is confirmed, so as not to waste my resources. |
Joined: 29 May 15 Posts: 2 Credit: 3,523,519 RAC: 0 |
These jobs are still not suspending properly. My CPU runs at 99% even while the jobs show as suspended. I have to end them via Task Manager to reclaim my CPU. And please don't patronize me with settings suggestions. This is a bug that needs fixing. I am stopping all jobs until it is fixed. |
Joined: 29 May 15 Posts: 2 Credit: 3,523,519 RAC: 0 |
I set my main computer to No New Tasks, and it still gets tasks and still runs them at 100% when nothing should be running. Can one of you please fix your code? |
Joined: 4 Feb 15 Posts: 49 Credit: 15,956,546 RAC: 0 |
I had a few WUs do this as well, even downloading after the project was suspended and then taking forever to run. I had to abort them more than once before they stopped downloading. This happened on both Windows and Linux. Conan |
Joined: 2 Jun 16 Posts: 169 Credit: 317,253,046 RAC: 0 |
Quarks is also screwed up. Suspending the project did not stop the tasks from running at 100%. I had to detach the whole project. Please fix. |