Message boards : Number crunching : Extremely long Elapsed and Remaining Times


Fritzr

Joined: 1 Nov 15
Posts: 2
Credit: 3,128,019
RAC: 0
Message 2551 - Posted: 8 Jan 2018, 4:10:48 UTC

I now have 7 WUs with elapsed times of more than 1 day.
https://universeathome.pl/universe/result.php?resultid=31705335
https://universeathome.pl/universe/result.php?resultid=31705337
https://universeathome.pl/universe/result.php?resultid=30975828
https://universeathome.pl/universe/result.php?resultid=31705296
https://universeathome.pl/universe/result.php?resultid=30975696
https://universeathome.pl/universe/result.php?resultid=30975574
https://universeathome.pl/universe/result.php?resultid=31705255

Both elapsed time and remaining time keep increasing. The last task listed is the oldest, with 2d:04:11:35 elapsed and 92d:04:31:29 remaining.

Win 10, latest update
Intel 6700 CPU
BOINC 7.8.3 (x64), wxWidgets 3.0.1
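A rough way to spot tasks like these automatically, sketched here in Python as a hypothetical helper (not part of BOINC or the project): take two readings of a task's elapsed and remaining times, and flag it if it is past a limit (1 day here, matching the threshold above) while its remaining estimate has not gone down. It assumes the times are written in the "2d:04:11:35" style quoted above; the function names and the limit are illustrative only.

```python
import re

# Durations in the style quoted in the post, e.g. "2d:04:11:35" or "04:11:35".
# (Assumed format; the day part is optional.)
_DURATION = re.compile(r"^(?:(\d+)d:)?(\d+):(\d{2}):(\d{2})$")


def to_seconds(text: str) -> int:
    """Convert an 'Nd:HH:MM:SS' or 'HH:MM:SS' string to a number of seconds."""
    m = _DURATION.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = m.groups()
    return ((int(days or 0) * 24 + int(hours)) * 60 + int(minutes)) * 60 + int(seconds)


def looks_stuck(before, after, limit="1d:00:00:00") -> bool:
    """Flag a task that is past `limit` and whose remaining time did not shrink.

    `before` and `after` are (elapsed, remaining) string pairs taken from two
    readings of the same task, `after` being the later one.
    """
    elapsed0, remaining0 = (to_seconds(s) for s in before)
    elapsed1, remaining1 = (to_seconds(s) for s in after)
    return (
        elapsed1 > to_seconds(limit)   # running longer than the limit
        and elapsed1 > elapsed0        # the task really is accumulating time
        and remaining1 >= remaining0   # but the estimate is not coming down
    )
```

With the figures from the post, a second reading taken an hour later, looks_stuck(("2d:04:11:35", "92d:04:31:29"), ("2d:05:11:35", "92d:05:31:29")), returns True, while a normal task whose remaining time drops as elapsed time grows is not flagged.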
Jim1348

Joined: 28 Feb 15
Posts: 253
Credit: 200,562,581
RAC: 0
Message 2552 - Posted: 8 Jan 2018, 11:34:31 UTC - in response to Message 2551.  
Last modified: 8 Jan 2018, 11:36:12 UTC

mmonnin

Joined: 2 Jun 16
Posts: 169
Credit: 317,253,046
RAC: 142
Message 2553 - Posted: 8 Jan 2018, 18:35:48 UTC

There are posts going back more than two years about this issue with BHSpin (the first version). Just abort them.
Fritzr

Joined: 1 Nov 15
Posts: 2
Credit: 3,128,019
RAC: 0
Message 2554 - Posted: 9 Jan 2018, 5:22:16 UTC

I aborted those right after posting the original message, which I wrote to bring this problem back to the top.

Four more today. It's not as bad for me as for those with dedicated machines that are checked only at long intervals, since I generally check what's processing once or twice a day.
pututu

Joined: 7 Jun 16
Posts: 9
Credit: 121,795,337
RAC: 0
Message 2555 - Posted: 9 Jan 2018, 5:55:25 UTC

I think I'm seeing the same problem too.

Here is my observation. I experimented with a 12-core/24-thread Xeon system (Win 7) dedicated to running only the Universe project. Sometimes one or two WUs take forever to finish. When I suspend the project via BOINC Manager, I would expect CPU utilization to drop to 1% or less (it starts at 100% with all 24 threads loaded), but one or two threads are still running. Windows Task Manager shows one or two "BHspin2_1_windows_intel86.exe" tasks still running outside BOINC Manager's control. That makes a total of 25 or 26 BHspin2 tasks instead of 24.

Perhaps these one or two tasks running outside BOINC Manager's control are robbing CPU time from the one or two tasks that BOINC Manager does control, so those tasks never complete. I'm not sure whether anyone else is seeing the same thing, and I don't know what causes it.

Since yesterday I've been experimenting with running 22 of the 24 threads on that machine to see whether this makes any difference.
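To check the process count described above without eyeballing Task Manager, a sketch along these lines could compare the number of BHspin2 processes Windows reports against the number of tasks BOINC Manager shows as running (0 right after suspending the project). It relies on the standard Windows tasklist command with CSV output; the executable name is the one quoted above, while the helper names and the expected count are hypothetical and would need adjusting per machine.

```python
import csv
import io
import subprocess
import sys

# Executable name as reported in the post above.
IMAGE = "BHspin2_1_windows_intel86.exe"


def count_instances(tasklist_csv: str, image: str = IMAGE) -> int:
    """Count rows for `image` in the output of `tasklist /FO CSV /NH`."""
    count = 0
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Blank lines and the "INFO: No tasks are running..." message
        # do not match the image name and are skipped.
        if row and row[0].lower() == image.lower():
            count += 1
    return count


def check(expected: int, tasklist_csv: str) -> str:
    """Compare Windows' process count with what BOINC Manager shows."""
    found = count_instances(tasklist_csv)
    if found > expected:
        return (f"{found} running, {expected} expected: "
                f"{found - expected} possibly outside BOINC control")
    return f"{found} running, {expected} expected"


if __name__ == "__main__" and sys.platform == "win32":
    out = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {IMAGE}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Hypothetical: 0 expected because the project was just suspended.
    print(check(expected=0, tasklist_csv=out))
```

Run right after suspending the project, any non-zero count would point at the leftover processes described above.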
mmonnin

Joined: 2 Jun 16
Posts: 169
Credit: 317,253,046
RAC: 142
Message 2557 - Posted: 9 Jan 2018, 18:34:17 UTC - in response to Message 2555.  

Another issue. Maybe related.
pututu

Joined: 7 Jun 16
Posts: 9
Credit: 121,795,337
RAC: 0
Message 2563 - Posted: 10 Jan 2018, 15:19:40 UTC

Running the CPU with one full core free (22 of 24 threads in my case), I have not seen any task with extremely long times in more than three full days. Maybe I'm lucky, or maybe the problem has just disappeared. I'll monitor this closely.

Keeping my fingers crossed.

P.S. I used to see one or two very long tasks every one or two days.

Here is my rig: https://universeathome.pl/universe/results.php?hostid=41909

Last aborted WU was on Jan 5th.
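For anyone wanting to try the same "one core free" setup: BOINC Manager's computing preferences include a "Use at most X% of the CPUs" setting. The sketch below picks a whole percentage for a given number of threads, assuming (not verified here) that the client turns the percentage into a CPU count by rounding down; rounding the percentage up then guards against ending up one thread short. The function name is hypothetical.

```python
def cpu_percent_for(threads_to_use: int, total_threads: int) -> int:
    """Smallest whole percentage p such that floor(total * p / 100) == threads_to_use.

    Assumes the client rounds total_threads * p / 100 down to a whole CPU count.
    """
    if not 1 <= threads_to_use <= total_threads:
        raise ValueError("threads_to_use must be between 1 and total_threads")
    pct = -(-100 * threads_to_use // total_threads)  # integer ceiling of 100*use/total
    if total_threads * pct // 100 != threads_to_use:
        # Only possible with more than 100 threads, where 1% covers several CPUs.
        raise ValueError("no whole percentage gives exactly that many threads")
    return pct
```

For the 22-of-24 setup above, cpu_percent_for(22, 24) gives 92, and 24 × 92 / 100 = 22.08 rounds down to 22 threads.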
pututu

Joined: 7 Jun 16
Posts: 9
Credit: 121,795,337
RAC: 0
Message 2564 - Posted: 11 Jan 2018, 4:19:19 UTC

Just got one task with a very long run time... :(
Jim1348

Joined: 28 Feb 15
Posts: 253
Credit: 200,562,581
RAC: 0
Message 2565 - Posted: 11 Jan 2018, 13:58:17 UTC - in response to Message 2563.  

That is good information. I sometimes think I see patterns like that, but have not been able to nail them down. What I do know is that my Haswell processors (i7-4770 and i7-4790) do much better: they can go several weeks without a long runner, whereas my Ivy Bridge (i7-3770) and Ryzen 1700 chips go only a few days. All are on Ubuntu 16 or 17.





Copyright © 2024 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek