Message boards : Number crunching : i need help

ken losey
Joined: 22 Feb 15
Posts: 11
Credit: 3,136,985
RAC: 0
Message 853 - Posted: 4 Dec 2015, 4:33:43 UTC

I have 9 completed work units and no other work from Universe@Home. My account says there are 28 units in progress; 19 of these I do not have, and they will not update. If I do a reset, will I lose the 9 completed units on this computer?
ken losey
Joined: 22 Feb 15
Posts: 11
Credit: 3,136,985
RAC: 0
Message 854 - Posted: 4 Dec 2015, 4:56:48 UTC

OK, I set "No new tasks" and then clicked "Update", and it worked. So now I will reset Universe@Home, but my account still shows 27 units in progress.
JumpinJohnny
Joined: 1 Dec 15
Posts: 8
Credit: 443,667
RAC: 0
Message 855 - Posted: 4 Dec 2015, 5:13:56 UTC - in response to Message 854.  

I just had the same thing happen. I had to detach and re-attach the project.
The WUs you see in your task list will (eventually) be marked as abandoned by the server.
Once they have been marked as abandoned, you will be able to get more work.
ken losey
Joined: 22 Feb 15
Posts: 11
Credit: 3,136,985
RAC: 0
Message 856 - Posted: 4 Dec 2015, 6:27:44 UTC

What I still can't understand is where 8 of the 9 WUs went. I had 28 WUs in my account. I reported 9 completed units, which should have left 19 abandoned units, not 27. So it seems that 8 units were lost. But thanks for your help, JumpinJohnny.
Ananas
Joined: 26 Mar 15
Posts: 52
Credit: 1,737,270
RAC: 0
Message 858 - Posted: 4 Dec 2015, 9:10:08 UTC
Last modified: 4 Dec 2015, 9:19:46 UTC

In the past day or two, the web server has had some trouble with scheduler requests.

Even when a request from a host went through successfully, the web server replied with an internal server error. This means:

- Reported results appeared as finished in the server-side result list but still showed "ready to report" in BOINC Manager. On the next contact, the core client tried to report them again and got the message "already reported".
- Results were sent out from the server to the core client, but the core client did not recognize that the request for new work had succeeded.

This is how those "ghost WUs" are produced. I have 27 of them too; they will time out and be resent to a different host once the deadline has passed.

So the problem is not on your side; it is a project issue.

P.S.: The term "ghost WUs" is not my invention; it is a very old one describing exactly this problem. Do a web search and you will find it only on BOINC-related web pages.
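To make the failure mode concrete, here is a minimal Python sketch of the sequence described above, assuming a simplified model in which the server commits its database changes before the reply is sent. The `Server` and `Client` classes and the `ghosts` helper are hypothetical illustrations, not BOINC code; they only track task names and the states shown in the web result list and in BOINC Manager.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server-side view: task name -> state in the web result list."""
    tasks: dict = field(default_factory=dict)
    next_id: int = 0

    def handle_request(self, reported: list, n_new: int) -> list:
        # Assumption: the database is updated *before* the reply is sent,
        # so these changes stick even if the HTTP reply later fails.
        for name in reported:
            self.tasks[name] = "reported"
        new = []
        for _ in range(n_new):
            name = f"task_{self.next_id}"
            self.next_id += 1
            self.tasks[name] = "in progress"
            new.append(name)
        return new


@dataclass
class Client:
    """Hypothetical client-side view: task name -> state in BOINC Manager."""
    tasks: dict = field(default_factory=dict)

    def contact(self, server: Server, n_new: int, reply_lost: bool) -> None:
        reported = [n for n, s in self.tasks.items() if s == "ready to report"]
        new = server.handle_request(reported, n_new)
        if reply_lost:
            # HTTP 500 instead of a scheduler reply: the client keeps its
            # results as "ready to report" and never learns about `new`.
            return
        for name in reported:
            del self.tasks[name]
        for name in new:
            self.tasks[name] = "running"


def ghosts(server: Server, client: Client) -> list:
    """Tasks the server shows as in progress that the client never received."""
    return [n for n, s in server.tasks.items()
            if s == "in progress" and n not in client.tasks]


server = Server(tasks={"done_1": "in progress"})
client = Client(tasks={"done_1": "ready to report"})
client.contact(server, n_new=3, reply_lost=True)
print(ghosts(server, client))  # ['task_0', 'task_1', 'task_2']
print(client.tasks)            # {'done_1': 'ready to report'}
```

The whole effect hinges on the ordering assumption: because the server changes its state before the reply fails, the two views diverge, and only the deadline (or some form of lost-result detection) brings them back in line.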
ken losey
Joined: 22 Feb 15
Posts: 11
Credit: 3,136,985
RAC: 0
Message 859 - Posted: 4 Dec 2015, 10:27:30 UTC

Now that the problem has been taken care of, I'm not able to get any work on that computer, but my other one seems to get work just fine.
ken losey
Joined: 22 Feb 15
Posts: 11
Credit: 3,136,985
RAC: 0
Message 862 - Posted: 4 Dec 2015, 15:42:09 UTC

I must have done something wrong; the ghosts are back. My account shows 23 units for this computer, but there are only two units on it.
Ananas
Joined: 26 Mar 15
Posts: 52
Credit: 1,737,270
RAC: 0
Message 864 - Posted: 4 Dec 2015, 19:26:14 UTC - in response to Message 862.  

> I must have done something wrong, ...

Not a problem on your side; maybe the project should reboot.
JumpinJohnny
Joined: 1 Dec 15
Posts: 8
Credit: 443,667
RAC: 0
Message 866 - Posted: 4 Dec 2015, 21:58:23 UTC - in response to Message 864.  

> > I must have done something wrong, ...
>
> Not a problem on your side; maybe the project should reboot.


I'm not sure whether I just got lucky or did something right, but I have been running my Windows machine for a full day now with no "ghosts" or other errors.
What I did:
- I suspended the project in BOINC Manager, then removed the project from the manager.
- I then closed the manager, shut off the machine, and did a cold reboot.
- When I started BOINC Manager again, I re-attached the Universe@Home project AFTER stopping CPU tasks from other projects.
So far all the ghosts have been marked as "Abandoned", I am getting 2 WUs per core, and everything is running smoothly.
Maybe Windows needs to shut down to release any open slots used by U@H, and to remove the associated XML files as well?
I hate to jinx myself by posting this, but whatever I did, it seems to be working.
Ananas
Joined: 26 Mar 15
Posts: 52
Credit: 1,737,270
RAC: 0
Message 876 - Posted: 7 Dec 2015, 7:36:16 UTC - in response to Message 866.  
Last modified: 7 Dec 2015, 8:29:57 UTC

Tried that and it worked ... for one request.

After that first attempt, where I received a bunch of results, I'm getting "internal server error" and ghost WUs again :-(

Can you check whether the problem is back for you too?
____

One more really weird thing happens now and then. Twice, a result that previously had a normal deadline turned red and jumped to a very short deadline (~3.5 hours) in the server-side result list. But the scheduler doesn't send it out again; it still shows "in progress" instead of a timeout.

There is one setting I know of that might touch the deadline of a result that isn't even mentioned in the scheduler request. It is the feature Einstein uses to get rid of ghost WUs: it keeps track of all the files the host should have and compares this list to the files the host actually has. But it should resend lost results rather than mess them up.
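The reconciliation idea can be sketched in a few lines of Python. This is a hypothetical illustration, not Einstein@Home's or BOINC's code, and it tracks task names rather than individual files to keep it short: it compares the set of tasks the server believes are in progress with the set the host says it has, and gives each missing task a fresh deadline for re-sending. The 14-day delay bound is an assumption taken from the normal deadlines reported further down in this thread.

```python
from datetime import datetime, timedelta


def reconcile(server_in_progress: set, host_has: set,
              now: datetime, delay_bound: timedelta) -> dict:
    """Hypothetical resend step: tasks the server believes are in progress
    but the host does not list get a fresh deadline for re-sending, rather
    than being left "in progress" with a shortened one."""
    lost = server_in_progress - host_has
    return {name: now + delay_bound for name in sorted(lost)}


now = datetime(2015, 12, 12, 8, 0, 12)
resend = reconcile({"wu_a", "wu_b", "wu_c"}, {"wu_a"}, now, timedelta(days=14))
for name, deadline in resend.items():
    print(name, deadline)  # wu_b and wu_c, each with 2015-12-26 08:00:12
```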
____

This is a copy of such a sched_reply with an error 500 (internal server error) in it:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator,
webmaster@localhost and inform them of the time the error occurred,
and anything you might have done that may have
caused the error.</p>
<p>More information about this error may be available
in the server error log.</p>
<hr>
<address>Apache/2.2.22 (Debian) Server at debian1.universeathome.pl Port 80</address>
</body></html>


(Unfortunately, neither the code tag nor the pre tag works in this project's forum.)
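Since the client saved an Apache error page as its sched_reply, one quick way to tell such a reply apart from a real one is to look at its content. This Python sketch assumes that a genuine scheduler reply contains a `<scheduler_reply>` element and that Apache's default error page carries the status code in `<title>`, as in the copy above; `classify_sched_reply` is a hypothetical helper.

```python
import re


def classify_sched_reply(text: str) -> str:
    """Rough check of what the client actually received as a scheduler reply."""
    # Assumption: a genuine reply from the BOINC scheduler contains a
    # <scheduler_reply> element.
    if "<scheduler_reply>" in text:
        return "boinc"
    # Apache's default error pages put the status code in the <title>.
    m = re.search(r"<title>(\d{3}) ([^<]*)</title>", text)
    if m:
        return f"http {m.group(1)}: {m.group(2)}"
    return "unknown"


apache_page = """<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
</body></html>"""
print(classify_sched_reply(apache_page))  # http 500: Internal Server Error
```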
____

I wonder if it might still be the "venue" issue, since the host tries to update the global venue on server contact (the project still uses single quotes, as in "venue name='school'"). So this might be a side effect of the messed-up venues.

On the global settings page I get tons of these messages:

Notice: Trying to get property of non-object in /home/boincadm/projects/universe/html/inc/prefs_util.inc on line xxx (several lines mentioned there)
____

The web page for the project venues seems to be fixed, by the way, but the venue handling still suffers from the single quotes.
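To illustrate the single-quote hypothesis: XML accepts both `name="school"` and `name='school'`, but a parser written to expect only double quotes would miss the second form. Both functions below are hypothetical; whether BOINC's own parser behaves like the strict one is not established in this thread.

```python
import re
from typing import Optional


def name_double_quotes_only(tag: str) -> Optional[str]:
    """Hypothetical strict parser: only accepts name="..."."""
    m = re.search(r'name="([^"]*)"', tag)
    return m.group(1) if m else None


def name_either_quote(tag: str) -> Optional[str]:
    """Accepts name="..." or name='...', as XML itself allows."""
    m = re.search(r"""name=(["'])(.*?)\1""", tag)
    return m.group(2) if m else None


tag = "<venue name='school'>"
print(name_double_quotes_only(tag))  # None
print(name_either_quote(tag))        # school
```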
JumpinJohnny
Joined: 1 Dec 15
Posts: 8
Credit: 443,667
RAC: 0
Message 877 - Posted: 7 Dec 2015, 9:43:16 UTC - in response to Message 876.  

> Tried that and it worked ... for one request.
> After that first attempt, where I received a bunch of results, I'm getting "internal server error" and ghost WUs again :-(
> Can you check whether the problem is back for you too?


The issue on my computer has been eliminated. No problems, no errors.

** I notice you are running an older version of BOINC Manager.
Perhaps try BOINC 7.6.9. **
JumpinJohnny
Joined: 1 Dec 15
Posts: 8
Credit: 443,667
RAC: 0
Message 878 - Posted: 7 Dec 2015, 15:20:41 UTC - in response to Message 876.  
Last modified: 7 Dec 2015, 15:30:38 UTC

SORRY, I posted that too quickly.
After looking at the errors other people are having, I see they are NOT related to the BOINC version and also NOT all related to the "_bh_kb_" issues.
Obviously, most of the problems came from v.04 and v.06.
I don't see anything in common among the v.05 or v.07 errors.
It just happens that I have not had any errors for 2 days... my Windows computer seems to be an exception... I don't know why.
ken losey
Joined: 22 Feb 15
Posts: 11
Credit: 3,136,985
RAC: 0
Message 879 - Posted: 7 Dec 2015, 16:31:29 UTC

When the first BH units came out, I had a lot of errors. Then I updated BOINC Manager and they all went away, until the ghosts appeared; now it seems to be OK. So if you update BOINC Manager, maybe you should remove Universe@Home and then add it again.
ken losey
Joined: 22 Feb 15
Posts: 11
Credit: 3,136,985
RAC: 0
Message 892 - Posted: 9 Dec 2015, 13:42:52 UTC

I wonder if they are looking into this ghost problem, because the ghosts are back, this time on my other computer.
Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 685
Credit: 90,701,965
RAC: 29,284
Message 893 - Posted: 9 Dec 2015, 15:35:31 UTC - in response to Message 892.  

> I wonder if they are looking into this ghost problem, because the ghosts are back, this time on my other computer.

Can you send me a link to a "ghost" WU, please?
Krzysztof 'krzyszp' Piszczek

Member of Radioactive@Home project team
My Patreon profile
ken losey
Joined: 22 Feb 15
Posts: 11
Credit: 3,136,985
RAC: 0
Message 894 - Posted: 9 Dec 2015, 19:40:38 UTC

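This is the latest one:
Task 7113681, work unit 3091979 · sent 9 Dec 2015, 13:24:33 UTC · time reported or deadline 9 Dec 2015, 19:33:46 UTC · status Abandoned · run time 0.00 s · CPU time 0.00 s · credit --- · application Universe BHspin v0.03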
Ananas
Joined: 26 Mar 15
Posts: 52
Credit: 1,737,270
RAC: 0
Message 895 - Posted: 9 Dec 2015, 23:05:38 UTC - in response to Message 893.  
Last modified: 9 Dec 2015, 23:23:39 UTC

The Apache log must be full of HTTP 500 errors lately. Maybe the log contains more information than the sched_reply file.

I still think that it is a database problem, somehow connected to the problems on the global preferences page (http://universeathome.pl/universe/prefs.php?subset=global), maybe a damaged index or something similar.

It might also be a heap size problem, though, as it seems to be worse when you still have a few workunits and less likely when the host has run empty and the sched_request file is smaller.

P.S.: Some projects keep a detailed server-side log of each host's last scheduler contact. I don't know whether this is a server-side BOINC setting. I haven't seen it for quite a long time; SETI had it, but it is gone there now, so it might no longer be included in current versions of the BOINC server. Leiden Classical still has it: there, the "Last contact" date in the host list is a hyperlink pointing to a file in the directory PROJECT_ROOT/sched_logs/.
Ananas
Joined: 26 Mar 15
Posts: 52
Credit: 1,737,270
RAC: 0
Message 904 - Posted: 12 Dec 2015, 10:45:34 UTC
Last modified: 12 Dec 2015, 10:48:09 UTC

Here is one with a weird deadline:

http://universeathome.pl/universe/result.php?resultid=7168904

Sent = 12 Dec 2015, 8:00:12 UTC
Deadline = 12 Dec 2015, 8:57:52 UTC

It is a ghost, I don't have it on my host.

The others that I "received" with the same failed scheduler reply have a normal deadline (26 Dec 2015, 8:00:12 UTC).
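The two windows can be compared directly with Python's `datetime`, using the timestamps from the result page above (the format string is an assumption matching how the page prints them):

```python
from datetime import datetime

FMT = "%d %b %Y, %H:%M:%S UTC"


def window(sent: str, deadline: str):
    """Time between 'Sent' and 'Deadline' as shown on the result page."""
    return datetime.strptime(deadline, FMT) - datetime.strptime(sent, FMT)


ghost = window("12 Dec 2015, 8:00:12 UTC", "12 Dec 2015, 8:57:52 UTC")
normal = window("12 Dec 2015, 8:00:12 UTC", "26 Dec 2015, 8:00:12 UTC")
print(ghost)                  # 0:57:40
print(normal)                 # 14 days, 0:00:00
print(round(normal / ghost))  # 350
```

So the ghost's deadline window is roughly 1/350 of the normal 14-day window.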
Tex1954
Joined: 22 Feb 15
Posts: 23
Credit: 36,222,060
RAC: 64,615
Message 906 - Posted: 12 Dec 2015, 15:26:39 UTC
Last modified: 12 Dec 2015, 15:27:43 UTC

I think their server is doing something weird... It sometimes locks up my system with XML errors, and I end up having to detach and re-attach to fix it... I lost credit for 29 WUs earlier this morning because of it... I posted the problem in the thread below...

8-)


14966 Universe@Home 12/11/2015 11:59:40 PM Sending scheduler request: To report completed tasks.
14967 Universe@Home 12/11/2015 11:59:40 PM Reporting 47 completed tasks
14968 Universe@Home 12/11/2015 11:59:40 PM Requesting new tasks for CPU and NVIDIA GPU
14969 Universe@Home 12/11/2015 11:59:46 PM [error] Can't parse file info in scheduler reply: unexpected XML tag or syntax
14970 Universe@Home 12/11/2015 11:59:46 PM [error] No close tag in scheduler reply
14971 Universe@Home 12/12/2015 12:10:45 AM Sending scheduler request: To report completed tasks.
14972 Universe@Home 12/12/2015 12:10:45 AM Reporting 47 completed tasks
14973 Universe@Home 12/12/2015 12:10:45 AM Requesting new tasks for CPU and NVIDIA GPU
14974 Universe@Home 12/12/2015 12:10:51 AM [error] Can't parse file info in scheduler reply: unexpected XML tag or syntax
14975 Universe@Home 12/12/2015 12:10:51 AM [error] No close tag in scheduler reply
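The log shows the same count (47) being reported twice, with each reply failing to parse, which suggests the first report was never acknowledged on the client side. A small Python sketch can pick such requests out of a BOINC event log; the regular expression is an assumption based on the line format shown above, and `failed_reports` is a hypothetical helper.

```python
import re

REPORT = re.compile(r"(\d+/\d+/\d+ \d+:\d+:\d+ [AP]M) Reporting (\d+) completed tasks")


def failed_reports(log_lines):
    """Return (timestamp, n_tasks) for each report whose reply produced an [error]."""
    failed = []
    pending = None  # the most recent "Reporting N completed tasks" request
    for line in log_lines:
        m = REPORT.search(line)
        if m:
            pending = (m.group(1), int(m.group(2)))
        elif pending and "[error]" in line:
            failed.append(pending)
            pending = None  # count each request only once
    return failed


log = """\
14967 Universe@Home 12/11/2015 11:59:40 PM Reporting 47 completed tasks
14969 Universe@Home 12/11/2015 11:59:46 PM [error] No close tag in scheduler reply
14972 Universe@Home 12/12/2015 12:10:45 AM Reporting 47 completed tasks
14975 Universe@Home 12/12/2015 12:10:51 AM [error] No close tag in scheduler reply
""".splitlines()
print(failed_reports(log))
# [('12/11/2015 11:59:40 PM', 47), ('12/12/2015 12:10:45 AM', 47)]
```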
Ananas
Joined: 26 Mar 15
Posts: 52
Credit: 1,737,270
RAC: 0
Message 907 - Posted: 12 Dec 2015, 21:08:02 UTC - in response to Message 906.  
Last modified: 12 Dec 2015, 21:09:19 UTC

> ...
> 14974 Universe@Home 12/12/2015 12:10:51 AM [error] Can't parse file info in scheduler reply: unexpected XML tag or syntax
> 14975 Universe@Home 12/12/2015 12:10:51 AM [error] No close tag in scheduler reply

Good find. I posted in another thread that the BOINC XML parser is very picky about missing linefeeds, and this is what such a bad reply looks like:
...
<html><head>
...
<address>Apache/2.2.22 (Debian) Server at debian1.universeathome.pl Port 80</address>
</body></html>

BOINC cannot handle two tags on one line (except for the open/close pair of the same tag).

But this is not the primary error: it is caused by an HTTP error 500 (internal server error), and the default Apache page for that error is not BOINC-compliant.
As soon as the server error 500 is fixed, that tag thing will disappear on its own.
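The rule of thumb above can be written as a simple per-line check. This Python sketch encodes the rule exactly as stated in the post (at most one tag per line, unless the line is an open/close pair of the same tag); it is not BOINC's actual parser, and `boinc_friendly` is a hypothetical name.

```python
import re

# Opening or closing tag: group 1 is "/" for a close tag, group 2 the name.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*>")


def boinc_friendly(line: str) -> bool:
    """At most one tag per line, unless the line is an open/close pair of
    the same tag, e.g. <name>value</name>."""
    tags = TAG.findall(line)
    if len(tags) <= 1:
        return True
    if len(tags) == 2:
        (close1, name1), (close2, name2) = tags
        return close1 == "" and close2 == "/" and name1 == name2
    return False


for line in ["<html><head>",
             "<title>500 Internal Server Error</title>",
             "</body></html>"]:
    print(boinc_friendly(line), line)
# False <html><head>
# True <title>500 Internal Server Error</title>
# False </body></html>
```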




Copyright © 2020 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek