Message boards : Number crunching : Computer not receiving Work units

BetelgeuseFive

Joined: 8 Feb 16
Posts: 19
Credit: 22,093,689
RAC: 0
Message 1475 - Posted: 31 Aug 2016, 15:30:48 UTC

I recently switched both of my Raspberry Pis (ARM/Linux) to Ultraviolet reionization.
One of them has only received a couple of tasks, so it may not yet qualify as reliable.

But this host http://universeathome.pl/universe/show_host_detail.php?hostid=30574 has 47 consecutive valid tasks, so why is it not receiving new tasks?

Also, please check this unit (for the same host): http://universeathome.pl/universe/workunit.php?wuid=6629026. Why did it take 9 days for the second task for this unit to be sent?

Tom

ryan
Joined: 31 Jul 16
Posts: 4
Credit: 321,900
RAC: 0
Message 1476 - Posted: 31 Aug 2016, 16:31:13 UTC
Last modified: 31 Aug 2016, 16:37:24 UTC

My question is this: if my computer(s) have been labeled BAD computers and then receive little or no work, and are bypassed by the GOOD computers, will it not take an extremely long time to get the 10 valid WUs needed to transition to a GOOD computer? Especially with the month or more wait for a WU to be resent to a wingman, and the large number of CANCELLED BY SERVER WUs this project has, which seem to count against you? Just sayin... Thanks
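
To make the concern concrete, here is a minimal Python sketch of a "consecutive valid results" counter, which is roughly how the reliable-host rule mentioned in this thread (at least 10 correct tasks in a row) appears to behave. The function names, the outcome labels and the `resets` parameter are hypothetical, and whether a server-cancelled task really resets the streak on this project is an assumption, not something confirmed here.

```python
def consecutive_valid(outcomes, resets=("error", "invalid")):
    """Length of the current run of valid results.

    `outcomes` is a host's task history, oldest first. Any outcome listed in
    `resets` sets the streak back to zero; other outcomes (for example a
    pending result) leave it unchanged. Both the labels and the reset policy
    are assumptions made for illustration.
    """
    streak = 0
    for outcome in outcomes:
        if outcome == "valid":
            streak += 1
        elif outcome in resets:
            streak = 0
    return streak


def is_reliable(outcomes, threshold=10, resets=("error", "invalid")):
    """True once the host has at least `threshold` valid results in a row."""
    return consecutive_valid(outcomes, resets) >= threshold


history = ["valid"] * 8 + ["cancelled"] + ["valid"] * 5

# If cancellations are ignored, the 13 valid results count as one streak.
print(is_reliable(history))
# If cancellations count against the host, only the last 5 results count.
print(is_reliable(history, resets=("error", "invalid", "cancelled")))
```

Under the second policy a single cancelled task late in the run sends the host back to the start, which is the slow path described above.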

Steve Hawker*
Joined: 10 Mar 15
Posts: 11
Credit: 309,361
RAC: 0
Message 1477 - Posted: 31 Aug 2016, 22:48:07 UTC - in response to Message 1470.  
Last modified: 31 Aug 2016, 22:48:52 UTC

Definitely there is an HR problem...
There is another problem as well: about half of the tasks are created for "reliable hosts", which means that only hosts with at least 10 consecutive correct tasks get new jobs.


Well, that's not working either. I created a Linux VM to run UV tasks. A mere 2 errors from 405 more than meets your standard of reliability. But of course, no tasks are available despite the server status.

Quarkstar WUs are even harder to get, even though 57 are "available".

With the failing Win tasks, the rapid retirement/turnover of apps, retiring the Android app without warning, lack of OSX support and now this debacle, this project is rapidly becoming my least favorite.

Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 847
Credit: 144,180,465
RAC: 0
Message 1478 - Posted: 1 Sep 2016, 8:51:52 UTC - in response to Message 1477.  

Definitely there is an HR problem...
There is another problem as well: about half of the tasks are created for "reliable hosts", which means that only hosts with at least 10 consecutive correct tasks get new jobs.


Well, that's not working either. I created a Linux VM to run UV tasks. A mere 2 errors from 405 more than meets your standard of reliability. But of course, no tasks are available despite the server status.

Quarkstar WUs are even harder to get, even though 57 are "available".

With the failing Win tasks, the rapid retirement/turnover of apps, retiring the Android app without warning, lack of OSX support and now this debacle, this project is rapidly becoming my least favorite.

Firstly, the Android app was separate in the past because it computed a smaller number of simulations than the other systems; it is now available with the BHspin2 application.
QuarkStars has no new tasks available because the previous batch isn't finished, and we don't want to generate tasks without a real science target and waste your power. When all results come back and we finish analysing them, new tasks will be generated.

As you can see with the UV application, the combination of HR, reliable hosts and priorities doesn't work as expected (probably nobody has combined all these requirements before), and we are looking all the time for the proper balance between them.

Also, Mac support will be added soon, but I can't do everything at the same time, and adding another platform to HR tasks will (probably) make the current problem worse. At least I have to be sure how to resolve the current problem before I add the next one...

Apologies if not everything is working smoothly, but I'm trying my best in this situation...
Krzysztof 'krzyszp' Piszczek

Member of Radioactive@Home team
My Patreon profile
Universe@Home on YT

Henk Haneveld
Joined: 16 Apr 16
Posts: 15
Credit: 4,409,800
RAC: 0
Message 1479 - Posted: 1 Sep 2016, 9:47:17 UTC - in response to Message 1478.  

Perhaps you should stop generating new work for each of the applications until the ready-to-send queue for that application is completely empty. Hopefully this will flush out all the unsent results that seem to be stuck in there, or at least show how big this problem is.

Once the queue is empty, create a small batch of new work and then wait until the queue is empty again.
This will help you manage what is in progress and prevent users from having a growing number of results waiting for validation.

This is not a permanent solution, but it will give you time to fix the settings problem.
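
The drain-then-refill idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Result` class, the `hr_class` label and both function names are hypothetical, and the real scheduler matches work to hosts in a far more involved way.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    hr_class: str  # hypothetical label for the platform class a workunit is tied to


def drain(queue, active_classes):
    """Split the ready-to-send queue into sendable and stuck results.

    A result counts as sendable only if some active host belongs to its class.
    """
    sent = [r for r in queue if r.hr_class in active_classes]
    stuck = [r for r in queue if r.hr_class not in active_classes]
    return sent, stuck


def may_generate_batch(queue):
    """Gate new work on an empty queue, as proposed above."""
    return len(queue) == 0


queue = [
    Result("uv_001", "linux-x86"),
    Result("uv_002", "linux-arm"),
    Result("uv_003", "windows-x86"),
]
sent, stuck = drain(queue, active_classes={"linux-x86", "linux-arm"})

print([r.name for r in stuck])    # results no active host can take
print(may_generate_batch(stuck))  # stays False while anything is stuck
```

With no fresh work hiding them, whatever remains in `stuck` is the set of results that needs investigating.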

Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 847
Credit: 144,180,465
RAC: 0
Message 1481 - Posted: 1 Sep 2016, 13:31:53 UTC - in response to Message 1479.  

The new batches are helping computers to get reliable status (there are no special conditions for them except HR), so I need to generate them anyway.
Also, I see on the server side that the number of computers without tasks is going lower every day.
Krzysztof 'krzyszp' Piszczek

Henk Haneveld
Joined: 16 Apr 16
Posts: 15
Credit: 4,409,800
RAC: 0
Message 1482 - Posted: 1 Sep 2016, 14:29:05 UTC - in response to Message 1481.  

The new batches are helping computers to get reliable status (there are no special conditions for them except HR), so I need to generate them anyway.
Also, I see on the server side that the number of computers without tasks is going lower every day.

You are missing the point. It looks to me like there are a lot of older unsent results. At the same time, new work is generated and sent out.
If you let the queue run dry, then all those old results will flush from the queue or show that there is a problem with sending them.
You are avoiding fixing the problem that some results do not get sent out and stay in the queue forever.
It is better to have some users without work for a short while than to increase the problems indefinitely.

ryan
Joined: 31 Jul 16
Posts: 4
Credit: 321,900
RAC: 0
Message 1483 - Posted: 1 Sep 2016, 15:45:34 UTC - in response to Message 1478.  
Last modified: 1 Sep 2016, 15:47:06 UTC

Just want to say THANKS for all the work you do for the project, krzyszp.
ryan

Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 847
Credit: 144,180,465
RAC: 0
Message 1484 - Posted: 1 Sep 2016, 16:00:20 UTC - in response to Message 1482.  


You are missing the point. It looks to me like there are a lot of older unsent results. At the same time, new work is generated and sent out.
If you let the queue run dry, then all those old results will flush from the queue or show that there is a problem with sending them.
You are avoiding fixing the problem that some results do not get sent out and stay in the queue forever.
It is better to have some users without work for a short while than to increase the problems indefinitely.

No, I didn't miss the point. I'm constantly checking the database and filtering problematic WUs to see if there is a clear pattern. I don't need (at the moment) to drain the server to find them; SQL queries are doing it for me :)
Obviously it is still possible that it will be necessary, but I believe now is not the time to do it.
Krzysztof 'krzyszp' Piszczek

Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 847
Credit: 144,180,465
RAC: 0
Message 1485 - Posted: 1 Sep 2016, 16:04:23 UTC - in response to Message 1483.  

Just want to say THANKS for all the work you do for the project, krzyszp.
ryan

Thank you :)
I really enjoy doing something for a real science project :)
Krzysztof 'krzyszp' Piszczek

Steve Dodd
Joined: 3 Apr 15
Posts: 2
Credit: 27,565,438
RAC: 0
Message 1487 - Posted: 1 Sep 2016, 20:16:27 UTC

Well, I don't want to add to the discontent about not getting WUs (but I'm going to :) ). I can't seem to get any UV WUs. I get plenty of spin2. At one time I did get both UV and Quark, but nothing for a month. Did I get marked as unreliable somehow?

Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 847
Credit: 144,180,465
RAC: 0
Message 1488 - Posted: 1 Sep 2016, 20:33:40 UTC - in response to Message 1487.  

Either your computer is marked as not reliable, OR there is no proper wingman available for it, OR all tasks for the particular app are already assigned to other platforms (e.g. QuarkStars).
And I see you have BHspin2 on your machines.
Krzysztof 'krzyszp' Piszczek

Conan
Joined: 4 Feb 15
Posts: 49
Credit: 15,956,546
RAC: 0
Message 1494 - Posted: 2 Sep 2016, 0:30:00 UTC
Last modified: 2 Sep 2016, 0:45:38 UTC

Thanks Krzyszp for all the behind the scenes work that you are doing.

I noticed today that when I requested UV tasks (server shows over 79,000 at the moment) I get the message that there is no work available.
I doubt that 79,000 work units are pre-assigned to other platforms (I am using Linux), and I have gotten work before (the last lot a few days ago).
I have stopped BHSpinv2 at the moment as I wanted to build up my hour total on the UV tasks, but now I can't get them.

The 9 work units I received on the 30th are all still in Pending, as I am the only person to have run these work units. No work has been sent to any other volunteers (wingmen) as yet.

All the errors that I have (total 15 for UV and BHSpin out of hundreds run) are due to the server cancelling work units (even 1 that had started running), not due to any error on my host.

I will wait (nothing else to do really) and run some PrimeGrid in the meantime.

Thanks again
Conan

mmonnin
Joined: 2 Jun 16
Posts: 169
Credit: 317,253,046
RAC: 0
Message 1501 - Posted: 2 Sep 2016, 19:00:35 UTC - in response to Message 1472.  

For analysis purposes we need the whole batch of results in a particular series, because even if we have 95% of the series computed we still need to wait for the last 5% to start the analysis...
For some WUs it happens that the WU isn't computed in time by the first host, then it goes to a second one where the same happens, and sometimes for a few WUs we need to wait months... This is the reason why we started to use reliable hosts for some batches (in fact, every 4th batch is now based on reliable hosts), including all UV tasks (as they are short).


Is this really working? Or is the validation task not part of this? Many times tasks wait several days after my task has been sent back just to get the wingman task sent out. If science needs to be validated, waiting so long to send out work slows down progress. I'd imagine it also adds load to the servers with more work generations out there. I've still got 33 tasks that I completed over 2 weeks ago. The wingman timed out several days ago, but the task wasn't sent out immediately. I've seen other projects send out a duplicate wingman task within an hour of timeout, some several hours prior.

Papy3349
Joined: 28 Feb 15
Posts: 4
Credit: 25,011,420
RAC: 0
Message 1502 - Posted: 3 Sep 2016, 8:41:42 UTC

This morning (France)
State: All (1214) • In progress (128) • Validation pending (250) • Validation inconclusive (0) • Valid (781) • Invalid (0) • Error (55)
Application: All (1214) • Universe Ultraviolet reionization (273) • Universe BHspin (3) • Universe BHspin v2 (938) • Universe QuarkStars (0)
With:
Universe BHspin v2 (938) In progress (128) • Validation pending (172) • Validation inconclusive (0) • Valid (584) • Invalid (0) • Error (53)
Universe Ultraviolet reionization (273) In progress (0) • Validation pending (79) • Validation inconclusive (0) • Valid (192) • Invalid (0) • Error (2)

Oops!!! 10 minutes later, valid 781 becomes 779...

State: All (1213) • In progress (128) • Validation pending (251) • Validation inconclusive (0) • Valid (779) • Invalid (0) • Error (55)
Application: All (1213) • Universe Ultraviolet reionization (273) • Universe BHspin (3) • Universe BHspin v2 (937) • Universe QuarkStars (0)
With:
Universe Ultraviolet reionization (273) In progress (0) • Validation pending (79) • Validation inconclusive (0) • Valid (192) • Invalid (0) • Error (2)
Universe BHspin v2 (937) In progress (128) • Validation pending (172) • Validation inconclusive (0) • Valid (584) • Invalid (0) • Error (53)

BetelgeuseFive
Joined: 8 Feb 16
Posts: 19
Credit: 22,093,689
RAC: 0
Message 1503 - Posted: 3 Sep 2016, 13:08:23 UTC

Received lots of new tasks on both of my Raspberry Pis about an hour ago. If you made changes server side: it worked!

Thanks,

Tom

Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 847
Credit: 144,180,465
RAC: 0
Message 1504 - Posted: 3 Sep 2016, 14:46:18 UTC - in response to Message 1503.  

Received lots of new tasks on both of my Raspberry Pis about an hour ago. If you made changes server side: it worked!

Thanks,

Tom

Yes, manually reversing the priority for UV tasks helped.
Krzysztof 'krzyszp' Piszczek

Jim1348
Joined: 28 Feb 15
Posts: 253
Credit: 200,562,581
RAC: 0
Message 1505 - Posted: 3 Sep 2016, 15:17:17 UTC - in response to Message 1504.  

On my Ubuntu 16.04 machine (i7-4770), I am now getting both the BHspin v2 and the UV reionization tasks, so it is working here too.

Conan
Joined: 4 Feb 15
Posts: 49
Credit: 15,956,546
RAC: 0
Message 1510 - Posted: 4 Sep 2016, 12:32:43 UTC

Thanks Krzyszp, also received quite a bit of work for the UV application.

Thank you

Conan

[VENETO] boboviz
Joined: 21 Feb 15
Posts: 52
Credit: 318,272
RAC: 0
Message 1516 - Posted: 4 Sep 2016, 19:41:32 UTC - in response to Message 1510.  

Thanks Krzyszp, also received quite a bit of work for the UV application.


+1
But now: "Tasks are committed to other platforms"




Copyright © 2024 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek