Message boards : Number crunching : Those ever-restarting UV-reionization tasks...

Gunnar Hjern

Joined: 4 Nov 16
Posts: 20
Credit: 118,453,585
RAC: 0
Message 2287 - Posted: 6 Jul 2017, 0:40:28 UTC

Today I saw a UV task in the task list of one of my computers, and although its deadline was several months away, I was far too curious not to run it at once! :-)

Unfortunately, it behaved the same way as these tasks did a couple of months ago: it runs for about 14 minutes (although it is estimated to take about one minute), and then the same task starts all over again, and again, and again, and again.....

I took ten screenshots of that task's properties as it restarted over and over (see below), and the task had already restarted several times before that.

What makes these tasks restart instead of reporting themselves to the server???

Could anything be done about the very inaccurate runtime estimate, which says they should take only one minute, although in reality they take more than 14 minutes to complete?

It would be very interesting to run them, but now I don't even dare to run the normal BHspin tasks, because I might catch one or two UV-reionization tasks and then have them running forever without doing any good at all. :-(

//Gunnar

[Screenshots: ten captures of the task's properties, taken as it restarted repeatedly]
Aaron

Joined: 16 Apr 17
Posts: 36
Credit: 39,603,949
RAC: 0
Message 2288 - Posted: 6 Jul 2017, 1:02:59 UTC - in response to Message 2287.  
Last modified: 6 Jul 2017, 1:05:54 UTC

Until they fix this you could unselect the UV tasks in your Universe@Home preferences. That's what I did.

Also, the server says there are 211 in progress. I hope they're not all in this same status. It's equivalent to a power virus at that point.
Gunnar Hjern

Joined: 4 Nov 16
Posts: 20
Credit: 118,453,585
RAC: 0
Message 2289 - Posted: 6 Jul 2017, 2:14:27 UTC - in response to Message 2288.  

Hi Aaron!

Thanks for your advice on unselecting different task-types!
I've done so now, and allowed new tasks on the computer that is currently running U@H.

About those 211 tasks that are out there: I hope the admins can revoke them, so they won't become eternal show-stoppers for hundreds of CPU cores.

If the admins put out a notice when they have fixed the endless restarting, I'll gladly try again. That will have to be at a time when I can supervise the computers and stop the tasks if they misbehave.

//Gunnar
Aaron

Joined: 16 Apr 17
Posts: 36
Credit: 39,603,949
RAC: 0
Message 2290 - Posted: 6 Jul 2017, 3:32:08 UTC - in response to Message 2289.  

Hello to you as well Gunnar!

I've read other threads on here regarding issues with the BHSpin tasks as well. But, I've crunched over 8,000 of them in the last few months without finding a single one that keeps restarting itself.

Although, I do check in with my machines a few times a week, so it wouldn't be a very big waste compared to others that don't babysit their machines like I do. :-)
Jim1348

Joined: 28 Feb 15
Posts: 253
Credit: 200,562,581
RAC: 0
Message 2291 - Posted: 6 Jul 2017, 10:33:01 UTC - in response to Message 2290.  
Last modified: 6 Jul 2017, 11:02:33 UTC

Aaron wrote: "I've read other threads on here regarding issues with the BHSpin tasks as well. But, I've crunched over 8,000 of them in the last few months without finding a single one that keeps restarting itself."

I haven't had an error on BHspin v2 in several months either, though I used to see a long runner about once every two weeks. Maybe updating the Linux version helped? But they were easy to spot and abort if they ran over 24 hours, though if you left your machines entirely unattended they could be a problem, as they could tie up all your cores eventually.

Another possibility is that I also run GPU projects on that machine, currently with a GTX 1070. In the past it was Folding, which I know can interact with VirtualBox projects (in either direction, depending on the project). However, in the past couple of months I have switched to GPUGrid, which does not show any interaction. And it is a dedicated machine that runs only BOINC 24/7. So the simpler your setup, the fewer problems you are likely to have.

(Are your numbers correct? I have a total credit of 6,194,456, almost all BHspin v2 and have crunched or have in progress 219 of them, whereas your total credit is about the same, at 5,573,667. Though my numbers may be a little low. I am not sure the status page is correct.)
Aaron

Joined: 16 Apr 17
Posts: 36
Credit: 39,603,949
RAC: 0
Message 2292 - Posted: 6 Jul 2017, 15:46:31 UTC - in response to Message 2291.  

Jim1348 wrote: "I haven't had an error on BHspin v2 in several months either, though I used to see a long runner about once every two weeks. Maybe updating the Linux version helped? But they were easy to spot and abort if they ran over 24 hours, though if you left your machines entirely unattended they could be a problem, as they could tie up all your cores eventually."


I have seen some tasks take a long time to finish, but I haven't had one yet that didn't finish eventually. I think the longest I've seen a task run is around 50 hours, but that was on a Raspberry Pi 2 core (~900 MHz).

Jim1348 wrote: "(Are your numbers correct? I have a total credit of 6,194,456, almost all BHspin v2 and have crunched or have in progress 219 of them, whereas your total credit is about the same, at 5,573,667. Though my numbers may be a little low. I am not sure the status page is correct.)"


When you view your task list, it only shows recent tasks; after some time, they drop off the list. I know I receive 666.67 credits per task, so 5,600,000 / 666.67 ≈ 8,400.
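Aaron's back-of-the-envelope check can be written out as a tiny Python function. It is only a sketch and assumes every task awards the same fixed credit (666.67 per task, as Aaron reports for his BHspin v2 work); it is not how the project itself counts tasks, and the function name is hypothetical.

```python
def estimate_task_count(total_credit: float, credit_per_task: float) -> int:
    """Approximate number of completed tasks.

    Assumes (hypothetically) that every task awarded exactly the same
    fixed credit, so total credit divided by per-task credit gives the count.
    """
    if credit_per_task <= 0:
        raise ValueError("credit_per_task must be positive")
    # Round to the nearest whole task, since credit totals are not exact multiples.
    return round(total_credit / credit_per_task)


# Aaron's rounded figure: about 5.6 million credits at 666.67 credits per task.
print(estimate_task_count(5_600_000, 666.67))
```

Using the exact total Jim1348 quoted (5,573,667) instead of the rounded 5.6 million gives roughly 8,360 tasks, which is still consistent with "over 8,000".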
Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester

Joined: 4 Feb 15
Posts: 841
Credit: 144,180,465
RAC: 2
Message 2298 - Posted: 6 Jul 2017, 17:37:39 UTC

If you have self-restarting tasks, please abort them and send me a PM with a link to the host where it happens.

Sometimes Linux machines get some very long tasks. I don't know why they take so long (sometimes even 60 h), but the wingman on Windows completes the same tasks in "normal" time.

I think it's something to do with the Linux libraries.
Krzysztof 'krzyszp' Piszczek

Member of Radioactive@Home team
My Patreon profile
Universe@Home on YT
Conan

Joined: 4 Feb 15
Posts: 48
Credit: 15,956,546
RAC: 60
Message 2321 - Posted: 10 Aug 2017, 8:00:32 UTC

I have just aborted 3 ultraviolet work units; the system aborted 3 before that. They just keep restarting. They are short runners of less than 2 minutes.
I can see from the failed work units that everyone is having the same problem and they have been aborted or failed multiple times.

Conan





Copyright © 2024 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek