Message boards : Number crunching : 0.07 Universe BHspin v2 "Long Runner"
Jim1348

Joined: 28 Feb 15
Posts: 253
Credit: 200,562,581
RAC: 23
Message 3002 - Posted: 11 Aug 2018, 20:12:24 UTC

Two more work units stuck on 0.07. It appears that nothing has been fixed, in this regard at least.
https://universeathome.pl/universe/result.php?resultid=40628485
https://universeathome.pl/universe/result.php?resultid=40628487
Sesson

Joined: 24 Jul 18
Posts: 6
Credit: 28,586,427
RAC: 0
Message 3003 - Posted: 12 Aug 2018, 9:16:01 UTC

In one of the two workunits, the other computer completed the problematic task anyway. I would like to ask a question. Does the problem depend on the platform and/or the computer, or is it solely the result of some Monte Carlo method, where a random factor drives the program into a state that never halts? If that's the case, I would doubt the scientific value of our results, as they would be biased towards the cases where the tasks do halt.
Jim1348

Send message
Joined: 28 Feb 15
Posts: 253
Credit: 200,562,581
RAC: 23
Message 3004 - Posted: 12 Aug 2018, 11:51:10 UTC - in response to Message 3003.  
Last modified: 12 Aug 2018, 12:26:23 UTC

Those are good questions. Having investigated and posted on this over the years, I have tried running different numbers of work units at a time, and could not detect an effect.

As for OS or hardware, that is more likely, but I have been running only Ubuntu, usually on Intel, and don't know much about the long runners on Windows or AMD. I do see that the first of the two I posted this time was completed by a Windows 10 machine, so maybe that helps, though I think they are sometimes completed OK by other Linux machines. Someone who can see the overall statistics would have to determine that, though. We don't get much feedback from the developer, except that they can't find the problem. As for Monte Carlo, that could explain the randomness, but I have no idea whether they use it.

If I ever come up with a cure, I will post it. But it is curious that two work units were stuck this time, which almost suggests an interaction between them. I will investigate a bit further.

PS I have a Ryzen 1700 (Ubuntu 18.04), and will try 4 cores on it for a while.
Jim1348

Joined: 28 Feb 15
Posts: 253
Credit: 200,562,581
RAC: 23
Message 3030 - Posted: 4 Sep 2018, 22:34:27 UTC - in response to Message 3004.  

PS I have a Ryzen 1700 (Ubuntu 18.04), and will try 4 cores on it for a while.

FWIW, the Ryzen locked up on all cores after about a week, apparently because of BHSpin v2. I was away, so I could not delete the stuck work units as usual. Not only did the BHSpin work units lock up, but so did all the other cores running WCG and Rosetta, and even the one supporting a GTX 1070 on Folding.

When I returned, I found that one BHSpin work unit had errored, and a Rosetta work unit had also errored about an hour later, possibly because of the BHSpin failure, or maybe on its own. But a reboot allowed all the other stuck work units to complete normally.

I can normally go a month on my Intel machines without any stuck work units, so I think the Ryzen is even more susceptible.
Copyright © 2023 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek