1) Message boards : Number crunching : Download errors (Message 1016)
Posted 11 Jan 2016 by Profile Ananas
Post:
It affected only a very few workunits, and they were not even consecutive. All the ones I reported above have now been abandoned with "Too many errors (may have bug)", so it is certainly not a host issue.

Maybe the disk had temporarily filled up when it happened, or /tmp ran out of space, or something like that.
2) Message boards : Number crunching : Download errors (Message 1013)
Posted 11 Jan 2016 by Profile Ananas
Post:
universe_bh_332_4288_20000_1-999999_293100
universe_bh_332_3598_20000_1-999999_602100
universe_bh_332_3610_20000_1-999999_614100
universe_bh_332_3740_20000_1-999999_744100

WU download error: couldn't get input files:
3) Message boards : Number crunching : Excessively Long Estimated Finish Times (Message 996)
Posted 31 Dec 2015 by Profile Ananas
Post:
For the same batch of tasks, one of my machines shows 13 minutes, the other one 7 days... It's not a project settings problem; different versions of the manager calculate the parameters sent by the server differently...

You're right, the 7.x clients with the per-application correction factor handle things very differently from the previous BOINC versions.

For me, all the workunits I received in the past few weeks had fairly good estimated runtimes, but I disabled that per-application factor with a patch.

My current (project-wide, not per-app) DCF is 91, but my core client shows 8:19 hours for a result that takes 5:30 hours, so it seems to have adjusted the DCF not too badly. Currently this project and SETI Beta are the ones with the best runtime estimates, but keep in mind that I'm using a patched client.

p.s.: Don't get me wrong, I do think that the per-application correction factor is a good thing, but only if it works properly. I don't think that the current core client does it right.
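To make the numbers above concrete, here is a minimal Python sketch of the older, project-wide DCF model. It assumes the estimate is simply the work divided by the host speed, scaled by the DCF, and it uses a simplified asymmetric update (raise the DCF at once when a task runs longer than estimated, lower it only slowly otherwise). The function names and the 0.9/0.1 weights are illustrative assumptions, not the BOINC client's actual code.

```python
def hm_to_minutes(hm: str) -> int:
    """Convert an 'H:MM' duration string into whole minutes."""
    hours, minutes = hm.split(":")
    return int(hours) * 60 + int(minutes)


def estimated_seconds(fpops_est: float, flops: float, dcf: float) -> float:
    """Simplified estimate: raw runtime (work / speed) scaled by the DCF."""
    return fpops_est / flops * dcf


def update_dcf(dcf: float, estimated: float, actual: float) -> float:
    """Hypothetical asymmetric update: jump up at once, drift down slowly."""
    # The DCF that would have made this particular estimate exact.
    ideal = dcf * actual / estimated
    if ideal > dcf:
        return ideal                  # underestimate: correct immediately
    return 0.9 * dcf + 0.1 * ideal    # overestimate: move 10% of the way


# Numbers from the post: 8:19 shown for a result that took 5:30.
shown = hm_to_minutes("8:19")    # 499 minutes
actual = hm_to_minutes("5:30")   # 330 minutes
print(f"overestimate factor: {shown / actual:.2f}")  # 1.51
print(f"DCF after one such result: {update_dcf(1.0, shown, actual):.3f}")
```

The slow downward drift is why a client that once saw a very long task keeps overestimating for a while, even when later tasks are short.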
4) Message boards : Number crunching : Something is messy on the project prefs page (Message 989)
Posted 29 Dec 2015 by Profile Ananas
Post:
I'm not sure about all this preference setting where one project affects them all. I understand the idea, but there is NO reason I can think of why a CPU running several projects has to use the SAME local preferences for EACH project. Also, I am not 100% sure which preferences are considered GLOBAL and which LOCAL. ...

http://universeathome.pl/universe/prefs.php?subset=project
The local preferences are those where you decide what share of your total BOINC CPU time this specific project is supposed to get and which subprojects (or applications) you want to crunch for, if the project has several. This is all clearly project-specific and not relevant to any other project.

http://universeathome.pl/universe/prefs.php?subset=global
In the global preferences you decide how much disk space, memory and CPU your BOINC client may use for all projects combined. This setting is cross-project: your host carries it from the project where you made the last changes to all other projects it contacts. When another of your hosts contacts a project that has newer settings than the ones stored on its disk, it takes over the new settings and tells you (in a message) which project you used to make the changes.
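A minimal sketch of the "newest settings win" behaviour described above, assuming each copy of the global preferences carries a modification time and the name of the project where it was edited. The class and function names are hypothetical and this is not the BOINC client's code; max_ncpus_pct is used only as an example setting.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalPrefs:
    source_project: str      # project whose website was used for the edit
    mod_time: float          # when the prefs were last edited (Unix time)
    settings: dict = field(default_factory=dict)


def merge_from_server(stored: GlobalPrefs, received: GlobalPrefs) -> GlobalPrefs:
    """Keep whichever copy of the global preferences was edited most recently."""
    if received.mod_time > stored.mod_time:
        return received
    return stored


on_disk = GlobalPrefs("Project A", 1_000.0, {"max_ncpus_pct": 100})
from_server = GlobalPrefs("Universe@Home", 2_000.0, {"max_ncpus_pct": 50})
current = merge_from_server(on_disk, from_server)
print(current.source_project, current.settings)  # Universe@Home {'max_ncpus_pct': 50}
```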

The following depends on your BOINC version:

In addition, you can create local preferences on your host that override the global settings without affecting the settings of your other hosts.

And you can create some local files that set the number of tasks you want to allow for each project and/or each application, or that replace an application with one you compiled or downloaded elsewhere (optimized project clients or specific clients for exotic hardware).
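The two local mechanisms can be sketched as follows, assuming a local override simply replaces individual global values and that the per-application limits go into an app_config.xml file in the project's directory. The tags max_concurrent and project_max_concurrent are only understood by newer 7.x clients, example_app is a placeholder application name, and both helper functions are hypothetical. The generated file puts every tag on its own line, since the BOINC parser can be picky about that.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def effective_prefs(global_prefs: Dict[str, str], override: Dict[str, str]) -> Dict[str, str]:
    """Local override values win; everything else comes from the global prefs."""
    merged = dict(global_prefs)
    merged.update(override)
    return merged


def build_app_config(limits: Dict[str, int], project_limit: Optional[int] = None) -> str:
    """Build an app_config.xml text limiting concurrent tasks per application."""
    root = ET.Element("app_config")
    for app_name, max_tasks in limits.items():
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = app_name
        ET.SubElement(app, "max_concurrent").text = str(max_tasks)
    if project_limit is not None:
        ET.SubElement(root, "project_max_concurrent").text = str(project_limit)
    ET.indent(root)  # one tag per line (needs Python 3.9+)
    return ET.tostring(root, encoding="unicode")


print(effective_prefs({"max_ncpus_pct": "100", "disk_max_used_gb": "10"},
                      {"max_ncpus_pct": "50"}))
print(build_app_config({"example_app": 2}, project_limit=4))
```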
5) Message boards : Number crunching : No New Work (Message 965)
Posted 22 Dec 2015 by Profile Ananas
Post:
Might be this:

Transitioner backlog (hours) 17.72

Yes, this is a problem. It also causes the validator to ignore WUs with 2 finished results that would be ready for validation.
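A simplified model of why a transitioner backlog also stalls validation, assuming the validator only looks at workunits the transitioner has already flagged as ready. The classes, the per-pass budget and the flag name are illustrative, not BOINC server code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workunit:
    name: str
    min_quorum: int
    successful_results: int = 0
    need_validate: bool = False


def transitioner_pass(backlog: List[Workunit], budget: int) -> None:
    """Handle at most `budget` workunits from the front of the backlog."""
    for wu in backlog[:budget]:
        if wu.successful_results >= wu.min_quorum:
            wu.need_validate = True   # only now can the validator see it
    del backlog[:budget]


def validator_candidates(wus: List[Workunit]) -> List[str]:
    """The validator only picks up workunits flagged by the transitioner."""
    return [wu.name for wu in wus if wu.need_validate]


wus = [Workunit(f"wu_{i}", min_quorum=2, successful_results=2) for i in range(3)]
backlog = list(wus)
transitioner_pass(backlog, budget=1)
print(validator_candidates(wus))  # ['wu_0']
```

All three workunits have two finished results, but only the one the transitioner has reached is visible to the validator.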
6) Message boards : Number crunching : No New Work (Message 937)
Posted 16 Dec 2015 by Profile Ananas
Post:
This is the same problem as the one described here, or at least a related one, I guess.

After receiving the "can't parse" message on the client, check the file sched_reply_universeathome.pl_universe.xml and see whether it looks like an error page rather than a reply from the scheduler.
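A quick way to do that check in Python, assuming a normal reply contains a <scheduler_reply> element while the broken one starts with an HTML document. The helper names are hypothetical, and the file has to be read from the BOINC data directory, whose location depends on the operating system.

```python
from pathlib import Path


def looks_like_error_page(text: str) -> bool:
    """True if a sched_reply file holds an HTML page instead of a scheduler reply."""
    head = text.lstrip()[:200].lower()
    is_html = head.startswith("<!doctype html") or "<html" in head
    return is_html and "<scheduler_reply>" not in text


def check_reply(path: Path) -> str:
    """Classify a saved sched_reply file."""
    text = path.read_text(errors="replace")
    return "error page" if looks_like_error_page(text) else "scheduler reply"
```

Usage, run from inside the BOINC data directory: check_reply(Path("sched_reply_universeathome.pl_universe.xml")).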
7) Message boards : Number crunching : Something is messy on the project prefs page (Message 928)
Posted 16 Dec 2015 by Profile Ananas
Post:
You did it :-)

It accepts the project settings for different venues now. I had set the share to 95 in order to see when the bug was gone, and this morning my client showed 95. Setting the share back to 100 currently doesn't take effect, but that is caused by the HTTP error 500 (internal server error).
8) Message boards : Number crunching : i need help (Message 927)
Posted 16 Dec 2015 by Profile Ananas
Post:
Yes, the problem is back :-(
9) Message boards : Number crunching : Something is messy on the project prefs page (Message 912)
Posted 14 Dec 2015 by Profile Ananas
Post:
This seems to be fixed now :-)

... partially fixed. The page looks good now, but the sched_reply still has the single-quoted venues, which means the settings still have no effect.
10) Message boards : Number crunching : i need help (Message 910)
Posted 13 Dec 2015 by Profile Ananas
Post:
It might be fixed; I haven't had any HTTP errors in the last few replies.
11) Message boards : Number crunching : i need help (Message 909)
Posted 13 Dec 2015 by Profile Ananas
Post:
Definitely not on your side, it is a server side issue.

Reporting finished results does work, by the way; they just do not disappear from your host because it doesn't recognize the confirmation.
12) Message boards : Number crunching : i need help (Message 907)
Posted 12 Dec 2015 by Profile Ananas
Post:
...
14974 Universe@Home 12/12/2015 12:10:51 AM [error] Can't parse file info in scheduler reply: unexpected XML tag or syntax
14975 Universe@Home 12/12/2015 12:10:51 AM [error] No close tag in scheduler reply

Good find. I mentioned in another thread that the BOINC XML parser is very picky about missing line feeds, and this is what such a bad reply looks like:
...
<html><head>
...
<address>Apache/2.2.22 (Debian) Server at debian1.universeathome.pl Port 80</address>
</body></html>

BOINC cannot handle two tags on one line (except for the opening and closing tag of the same element).

But this is not the primary error. It is caused by an HTTP error 500 (internal server error), and the default Apache page for that error is not BOINC-compliant.
As soon as the server error 500 is fixed, that tag thing will disappear on its own.
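The line rule above can be checked mechanically. This is a hypothetical lint that flags lines containing two or more tags unless they form a single <tag>value</tag> pair; it mirrors the rule as stated in the post, not the BOINC parser itself.

```python
import re
from typing import List

# Matches an opening or closing tag and captures ("/" or "", tag name).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*>")


def offending_lines(text: str) -> List[int]:
    """Line numbers (1-based) holding two or more tags, other than <x>value</x>."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        tags = TAG.findall(line)
        if len(tags) <= 1:
            continue
        is_pair = (len(tags) == 2 and tags[0][0] == "" and tags[1][0] == "/"
                   and tags[0][1] == tags[1][1])
        if not is_pair:
            bad.append(lineno)
    return bad


sample = """<html><head>
<title>500 Internal Server Error</title>
</body></html>"""
print(offending_lines(sample))  # [1, 3]
```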
13) Message boards : Number crunching : i need help (Message 904)
Posted 12 Dec 2015 by Profile Ananas
Post:
Here is one with a weird deadline:

http://universeathome.pl/universe/result.php?resultid=7168904

Sent = 12 Dec 2015, 8:00:12 UTC
Deadline = 12 Dec 2015, 8:57:52 UTC

It is a ghost; I don't have it on my host.

The others that I "received" with the same failed scheduler reply have a normal deadline (26 Dec 2015, 8:00:12 UTC).
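The gap between "Sent" and "Deadline" can be computed directly from the values on the result page. This sketch assumes the dates use the format shown above and that month names are English (Python's default C locale); the function name is hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%d %b %Y, %H:%M:%S"  # matches "12 Dec 2015, 8:00:12"


def deadline_window(sent: str, deadline: str) -> timedelta:
    """Time the host was given between 'Sent' and 'Deadline'."""
    return datetime.strptime(deadline, FMT) - datetime.strptime(sent, FMT)


weird = deadline_window("12 Dec 2015, 8:00:12", "12 Dec 2015, 8:57:52")
normal = deadline_window("12 Dec 2015, 8:00:12", "26 Dec 2015, 8:00:12")
print(weird)   # 0:57:40
print(normal)  # 14 days, 0:00:00
```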
14) Message boards : Number crunching : i need help (Message 895)
Posted 9 Dec 2015 by Profile Ananas
Post:
The Apache log must be full of HTTP 500 errors by now. Maybe the log contains more information than the sched_reply file.

I still think that it is a database problem, somehow connected to the problems on http://universeathome.pl/universe/prefs.php?subset=global , maybe a damaged index or so.

It might also be a heap size problem, though, as it seems to be worse when you still have a few workunits on the host and less likely when the host has run empty and the sched_request file is smaller.

p.s.: Some projects keep a detailed server-side log of each host's last scheduler contact. I don't know whether this is a server-side BOINC setting. I haven't seen it for quite a long time; SETI had it, but it is gone there now, so it might not be included in current versions of the BOINC server anymore. Leiden Classical still has it: there, the "Last contact" date in the host list is a hyperlink pointing to a file in the directory PROJECT_ROOT/sched_logs/.
15) Message boards : Number crunching : i need help (Message 876)
Posted 7 Dec 2015 by Profile Ananas
Post:
Tried that and it worked ... for one request.

After that first attempt, where I received a bunch of results, I'm getting "internal server error" and ghost WUs again :-(

Can you check whether the problem is back for you too?
____

One more really weird thing happens now and then. Twice, a result that previously had a normal deadline turned red and jumped to a very short (~3.5 hours) deadline in the server-side result list. But the scheduler doesn't send it out again; it still shows "in progress" instead of a timeout.

There is one setting I know of that might touch the deadline of a result that isn't even mentioned in the scheduler request: the feature Einstein uses to get rid of ghost WUs. It keeps track of all the files the host should have and compares that list with the files the host actually has. But it should resend lost results rather than mess them up.
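The comparison idea can be sketched with plain sets: whatever the server believes is in progress on a host, but the host does not list, is a lost ("ghost") task that could be resent. The task names are made up, and this is not the Einstein or BOINC server implementation.

```python
from typing import Set, Tuple


def find_lost(server_in_progress: Set[str], host_reported: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare what the server thinks the host has with what the host reports.

    Returns (lost, unknown): lost = assigned by the server but missing on the
    host (ghosts); unknown = on the host but not in progress on the server.
    """
    lost = server_in_progress - host_reported
    unknown = host_reported - server_in_progress
    return lost, unknown


server_view = {"wu_a_0", "wu_b_0", "wu_c_1"}
host_view = {"wu_a_0"}
lost, unknown = find_lost(server_view, host_view)
print(sorted(lost))  # ['wu_b_0', 'wu_c_1']
```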
____

This is a copy of such a sched_reply with an error 500 (internal server error) in it:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator,
webmaster@localhost and inform them of the time the error occurred,
and anything you might have done that may have
caused the error.</p>
<p>More information about this error may be available
in the server error log.</p>
<hr>
<address>Apache/2.2.22 (Debian) Server at debian1.universeathome.pl Port 80</address>
</body></html>


(unfortunately neither the code tag nor the pre tag works on this project's forum)
____

I wonder whether it might still be the "venue" thing, as the host tries to update the global venue on each server contact (the project still uses single quotes, like "venue name='school'"). So this might be a side effect of the messed-up venues.
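If the client really only recognises double-quoted attribute values (an assumption based on the symptoms, not something confirmed here), the difference between the two forms looks like this. The strict reader is hypothetical; the normalizer shows the form a fixed server would send.

```python
import re
from typing import Optional

# Hypothetical strict reader that only accepts a double-quoted venue name,
# modelling the picky behaviour suspected above (an assumption, not BOINC code).
STRICT_VENUE = re.compile(r'<venue name="([^"]*)">')


def strict_venue_name(xml: str) -> Optional[str]:
    match = STRICT_VENUE.search(xml)
    return match.group(1) if match else None


def normalize_venue_quotes(xml: str) -> str:
    """Rewrite <venue name='x'> as <venue name="x">."""
    return re.sub(r"<venue name='([^']*)'>", r'<venue name="\1">', xml)


reply = "<venue name='school'>"
print(strict_venue_name(reply))                          # None
print(strict_venue_name(normalize_venue_quotes(reply)))  # school
```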

On the global settings page I get tons of these messages:

Notice: Trying to get property of non-object in /home/boincadm/projects/universe/html/inc/prefs_util.inc on line xxx (several lines mentioned there)
____

The web page for the project venues seems to be fixed, by the way, but the venue handling still suffers from the single quotes.
16) Message boards : Number crunching : i need help (Message 864)
Posted 4 Dec 2015 by Profile Ananas
Post:
I must have done something wrong...

It's not a problem on your side; maybe the project needs a reboot.
17) Message boards : Number crunching : i need help (Message 858)
Posted 4 Dec 2015 by Profile Ananas
Post:
In the past day or two, the web server has had some trouble with scheduler requests.

Even when a request from a host went through successfully, the web server replied with an internal server error. That means:

- Reported results appeared as finished in the server-side result list but still showed "ready to report" in the BOINC manager. On the next contact, the core client tried to report them again and got the message "already reported".
- Results were sent from the server to the core client, but the core client did not recognize that the request for new work had succeeded.

This is how those "ghost WUs" are produced. I have 27 of them too; they will time out and be resent to a different host once the deadline has passed.
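A toy model of how ghost WUs appear: the server commits its side of the request first, and the client only updates its own state when it can parse the reply. All names are hypothetical, and the model ignores everything else the scheduler does.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Server:
    in_progress: Set[str] = field(default_factory=set)
    reported: Set[str] = field(default_factory=set)


@dataclass
class Client:
    tasks: Set[str] = field(default_factory=set)
    ready_to_report: Set[str] = field(default_factory=set)


def scheduler_rpc(server: Server, client: Client, new_work: List[str], reply_ok: bool) -> None:
    """Server commits first; the client updates only if it can parse the reply."""
    server.reported.update(client.ready_to_report)  # a repeat is "already reported"
    server.in_progress.update(new_work)
    if reply_ok:
        client.ready_to_report.clear()
        client.tasks.update(new_work)
    # With an HTTP 500 page as the reply, the client keeps its old state.


server, client = Server(), Client(ready_to_report={"done_1"})
scheduler_rpc(server, client, ["wu_1", "wu_2"], reply_ok=False)
ghosts = server.in_progress - client.tasks
print(sorted(ghosts))          # ['wu_1', 'wu_2']
print(client.ready_to_report)  # {'done_1'}
```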

So the problem is not on your side, it is a project issue.

p.s.: The term "ghost WUs" is not my invention; it is a very old term describing exactly this problem. Do a web search and you will find it only on BOINC-related web pages.
18) Message boards : Number crunching : BHspin v0.05 (Message 840)
Posted 2 Dec 2015 by Profile Ananas
Post:
...
Maybe it's of interest that all of them ran on an AMD CPU.
...

It's not related to the CPU type; I had 5 of those on a Xeon.

Some wingmen had a proper stderr message saying that the application could not be found, so it might have been a configuration problem on the server side. The ones I got later did work.

I had already noticed that my core client, just like yours, doesn't transfer stderr on certain errors. A new "feature", I guess :-(
19) Message boards : Number crunching : Validator gone crazy (Message 826)
Posted 28 Nov 2015 by Profile Ananas
Post:
http://universeathome.pl/universe/workunit.php?wuid=2941786

The broken result received 1333 credits, while the two valid ones got 952.38.
20) Message boards : Number crunching : Something is messy on the project prefs page (Message 791)
Posted 22 Nov 2015 by Profile Ananas
Post:
bump

It is important to fix this, as it affects not only the project preferences but also the computing preferences, which means it affects other projects too.

p.s. @ALL: As long as you do not use this project to change your computing preferences, it will not delete your settings in other projects. Always use projects with working venues to change the project-independent settings.

If you have already changed those settings in this project, it will delete all venue settings except "default" as soon as a host contacts a different project. If this happens, you can often still find a project that has not been messed up yet (usually an inactive project). If you change and save any setting there and make your host contact that project, it will restore the lost venues in the active projects too.






Copyright © 2024 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek