1) Message boards : Number crunching : Computer not receiving Work units (Message 1552)
Posted 11 Sep 2016 by skgiven
Post:
Well, it was a guess, and I can neither remember how the trusted/rating system works nor think of anything else to try at this end.

If I look at my Consecutive valid tasks (1465) for the Ultraviolet reionization app (on one system) it doesn't suggest that it would have a low trust status with the server:
    Universe Ultraviolet reionization 0.02 x86_64-pc-linux-gnu
    Number of tasks completed 1466
    Max tasks per day 2465
    Number of tasks today 0
    Consecutive valid tasks 1465
    Average processing rate 0.14 GFLOPS
    Average turnaround time 0.10 days

The app details don't actually give the system's trust rating or allude to the parameters the server might be enforcing to select systems, and there is no mention of failures under the app. For example, the server could be enforcing a bandwidth restriction that requires low contention and fast upload/download transfer rates, or a turnaround or runtime of no more than x minutes.

Server says:
Universe Ultraviolet reionization 1256 3779 0.83 (0.12 - 4.69) 145

BOINC says:
No tasks are available for Universe Ultraviolet reionization

So, I reset the project from BM. Server says, Scheduler request failed: HTTP internal server error

My guess is that there is a bad pointer record on the server, but there could be other issues too. I don't know how that would stop me getting tasks from only the Ultraviolet reionization queue, but one thing's for sure: there is a server issue.

Removed and re-added project, but still no UR tasks.

2) Message boards : Number crunching : Computer not receiving Work units (Message 1549)
Posted 11 Sep 2016 by skgiven
Post:
So the server aborts are negating the system's credit rating, and good systems are getting no work?

My attempt to get work:
As I only had the QuarkStars & Ultraviolet reionization apps selected and wasn't getting tasks on any system, I added BHspin v2 (hoping that the system ratings are not applied to that queue) and received 16 new BHspin tasks on one system. I was banking on them completing quickly and improving the system's status, in the hope that I would then be able to switch back to the 2 apps and get work for Ultraviolet reionization. Unfortunately they downloaded with an estimated runtime of 10 min, which is now looking more like 4 h (with the remaining time going up and down). Guess it'll be a day or two before I get any Ultraviolet reionization tasks, if they are still around then. I'm assuming the computer status is calculated (and not displayed) on a system/project basis rather than an app basis...

Tried the same on another system and initially got the "Tasks are committed to other platforms" message - so that has been implemented. When I manually asked again I got one task that will take 6h to run... I suspended other CPU tasks and waited in vain for BM to ask for work despite increasing my cache. It's a pain at this end too.
3) Message boards : Number crunching : Computer not receiving Work units (Message 1547)
Posted 11 Sep 2016 by skgiven
Post:
Server Status

T0 Universe Ultraviolet reionization 1587 3799 0.96 (0.08 - 4.95) 129
T1 Universe Ultraviolet reionization 1585 3794 0.96 (0.08 - 4.95) 131
T2 Universe Ultraviolet reionization 1462 3914 0.96 (0.08 - 4.95) 131
T3 Universe Ultraviolet reionization 1382 3922 0.93 (0.08 - 4.95) 135

Boinc Manager:
T0 No tasks are available for Universe Ultraviolet reionization
T1 No tasks are available for Universe Ultraviolet reionization
T2 No tasks are available for Universe Ultraviolet reionization
T3 No tasks are available for Universe Ultraviolet reionization
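
A rough Python sketch of how the four server status snapshots above could be compared. The column meanings (unsent, in progress, average runtime with min and max, users) are an assumption based on the usual layout of a BOINC server status table and are not confirmed in this post. The names `LINE`, `parse_snapshot` and `SNAPSHOTS` are made up for illustration.

```python
import re

# Assumed column layout (not stated in the post): label, application,
# unsent, in progress, average runtime (min - max), users.
LINE = re.compile(
    r"^(?P<label>T\d+)\s+(?P<app>.+?)\s+(?P<unsent>\d+)\s+(?P<in_progress>\d+)\s+"
    r"(?P<avg>[\d.]+)\s+\((?P<low>[\d.]+)\s*-\s*(?P<high>[\d.]+)\)\s+(?P<users>\d+)$"
)

# The four snapshots copied from the post.
SNAPSHOTS = """\
T0 Universe Ultraviolet reionization 1587 3799 0.96 (0.08 - 4.95) 129
T1 Universe Ultraviolet reionization 1585 3794 0.96 (0.08 - 4.95) 131
T2 Universe Ultraviolet reionization 1462 3914 0.96 (0.08 - 4.95) 131
T3 Universe Ultraviolet reionization 1382 3922 0.93 (0.08 - 4.95) 135
"""


def parse_snapshot(line):
    """Split one status line into named fields; counts become integers."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised status line: {line!r}")
    row = match.groupdict()
    for key in ("unsent", "in_progress", "users"):
        row[key] = int(row[key])
    return row


rows = [parse_snapshot(line) for line in SNAPSHOTS.splitlines() if line.strip()]
unsent = [row["unsent"] for row in rows]

# If the first count changes between snapshots, the status page is live;
# if it stays above zero, the server is still listing work to hand out.
status_is_updating = len(set(unsent)) > 1
work_is_listed = all(count > 0 for count in unsent)
print(unsent, status_is_updating, work_is_listed)
```

Under that reading, the first count changes between snapshots and never reaches zero, which is why "No tasks are available" looks inconsistent with the server's own figures.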

It appears that the server status is updating and that there are tasks available. While I guess the server status might include a bad batch that was not deleted properly, it's more likely that I'm just not getting tasks on any of my systems, possibly because they don't match other systems' architectures (CPU Family), though I'm not sure of that, having just looked at a Xeon validate against an AMD Athlon. Lots of server-side changes make it difficult to follow what's going on. If we're not getting tasks because of CPU Family pairing (or mixed pairing), then an appropriate message would be helpful. Presently BM only gets sent the generic "No tasks are available" message, which is misleading, confusing and unhelpful (being helpful is the purpose of a message); hence all the posts here and no tasks being sent to many people's systems.

Noticed that I was getting tasks on one of my systems until 139 tasks were "Cancelled by server" on 4th Sept: 202 (0xca) EXIT_ABORTED_BY_PROJECT. Maybe there was a project change at that time? Before the server cancellations there were 522 consecutive successful tasks. Would having lots of errors against it prevent it getting new tasks? Others seem to be in the same situation but get tasks.
The tasks had a minimum quorum of 2 and an initial replication of 3, but the server was set to automatically abort the 3rd task if the first two were reported and validated (might only be if the 3rd task doesn't start). My tasks were cancelled within 2 h of being sent, which is too tight even for my 0.01 cache and average turnaround time of 0.10 days for those WUs.
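
The quorum behaviour described above can be sketched as a toy model in Python. This is only an illustration of the logic as I understand it, not the actual BOINC server code; the `Task` and `WorkUnit` classes and the host names are hypothetical, and the possible "only if the 3rd task hasn't started" condition is not modelled.

```python
from dataclasses import dataclass, field

MIN_QUORUM = 2           # from the post: minimum quorum of 2
INITIAL_REPLICATION = 3  # from the post: initial replication of 3


@dataclass
class Task:
    host: str
    state: str = "in_progress"  # in_progress | valid | cancelled


@dataclass
class WorkUnit:
    tasks: list = field(default_factory=list)

    def send(self, host):
        """Hand one replica of this work unit to a host."""
        task = Task(host)
        self.tasks.append(task)
        return task

    def report_valid(self, task):
        """Mark a replica as validated, then cancel any surplus replicas."""
        task.state = "valid"
        valid = sum(t.state == "valid" for t in self.tasks)
        if valid >= MIN_QUORUM:
            for t in self.tasks:
                if t.state == "in_progress":
                    # The host sees this as "Cancelled by server".
                    t.state = "cancelled"


wu = WorkUnit()
tasks = [wu.send(f"host_{i}") for i in range(INITIAL_REPLICATION)]
wu.report_valid(tasks[0])
wu.report_valid(tasks[1])
print([t.state for t in tasks])
```

In this model the third host loses its task as soon as the other two validate, regardless of how well that host has behaved, which matches a cancellation arriving shortly after the task was sent.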
Seems odd that you would send out a task on 24th Aug, which got reported the same day, but not send out a second task until 4th Sept, and then send two tasks only to abort one after 1.5 h. Guess you changed the parameters and they are produced in batches which initially had a 10-day turnaround.

Anyway, I've only received 1 task from 5 systems since 5th Sept, so is there anything I can do at my end to get tasks?
