Message boards : Number crunching : extreme long wu's
Matthias Lehmkuhl

Joined: 23 Feb 15
Posts: 2
Credit: 2,107,545
RAC: 0
Message 2309 - Posted: 24 Jul 2017, 16:45:42 UTC - in response to Message 2279.  

I've got a long-running result and will cancel it now, with a CPU time of more than 11 days.
The last checkpoint was 13.07.2017 and progress shows a fraction_done of 0.450050
http://universeathome.pl/universe/workunit.php?wuid=10535753
wu_name: universe_bh2_160803_154_2_20000_1-999999_470000
result_name: universe_bh2_160803_154_2_20000_1-999999_470000_0
app_file: BHspin2_1_windows_intelx86.exe

error.dat shows:
error: bondi() accreted mass (6.458614) larger than envelope mass (4.354907) (2714882)
error: in Renv_con() unknown Ka type: 1, iidd_old: 2840284
error: in Menv_con() unknown Ka type: 1, iidd_old: 2840284
error.dat2 shows:
error: bondi() accreted mass (5.652698) larger than envelope mass (5.233360) (240724)
error: bondi() accreted mass (7.216964) larger than envelope mass (6.333716) (2569362)
error.dat3 shows:
error: bondi() accreted mass (5.652698) larger than envelope mass (5.233360) (240724)
error: bondi() accreted mass (7.216964) larger than envelope mass (6.333716) (2569362)

log.txt contains:
00:00:00 00:00:00 PROGRAM START: Thu Jul 13 02:29:34 2017
00:00:00 00:00:00 no checkpoint.dat file found
00:00:00 00:00:00 cleaning checkpoints
00:00:00 00:00:00 gw_cpfile: source file "data0.dat2" not present
00:00:00 00:00:00 gw_cpfile: source file "data1.dat2" not present
00:00:00 00:00:00 gw_cpfile: source file "data2.dat2" not present
00:00:00 00:00:00 gw_cpfile: source file "error.dat2" not present
00:00:00 00:00:00 reading checkpoint: istart: -1; pp: 0; n: -1
00:00:00 00:00:00 checkpoint read
00:00:00 00:00:00 default values set
00:00:00 00:00:00 Reading param.in file
00:00:00 00:00:00 PARAMIN: num_tested = 20000
00:00:00 00:00:00 PARAMIN: hub_val = 1000
00:00:00 00:00:00 PARAMIN: idum = -470000
00:00:00 00:00:00 PARAMIN: OUTPUT = 3
00:00:00 00:00:00 PARAMIN: Mmina = 5.0
00:00:00 00:00:00 PARAMIN: Mminb = 3.0
00:00:00 00:00:00 PARAMIN: golambda = 0.1
00:00:00 00:00:00 PARAMIN: Beta = 0.1
00:00:00 00:00:00 PARAMIN: Fa = 1.0
00:00:00 00:00:00 PARAMIN: Sigma3 = 0
00:00:00 00:00:00 PARAMIN: Sal = -2.7
00:00:00 00:00:00 PARAMIN: SS = 0
00:00:00 00:00:00 PARAMIN unknown parameter: name: SS; value: 0
00:00:00 00:00:00 PARAMIN: ZZ = 0.0001
00:00:00 00:00:00 param.in file read
00:00:00 00:00:00 idum: -470000; num_tested: 20000
00:05:24 00:05:24 making checkpoint: j: 1000; iidd: 282852
00:05:24 00:00:00 gw_cpfile: data0.dat appended to data0.dat2
00:05:24 00:00:00 gw_cpfile: data1.dat appended to data1.dat2
00:05:24 00:00:00 gw_cpfile: data2.dat appended to data2.dat2
00:05:24 00:00:00 gw_cpfile: error.dat appended to error.dat2
00:05:24 00:00:00 gw_cpfile: data0.dat appended to data0.dat3
00:05:24 00:00:00 gw_cpfile: data1.dat appended to data1.dat3
00:05:24 00:00:00 gw_cpfile: data2.dat appended to data2.dat3
00:05:24 00:00:00 gw_cpfile: error.dat appended to error.dat3
00:11:01 00:05:37 making checkpoint: j: 2000; iidd: 575529
00:11:01 00:00:00 gw_cpfile: data0.dat appended to data0.dat2
00:11:01 00:00:00 gw_cpfile: data1.dat appended to data1.dat2
00:11:01 00:00:00 gw_cpfile: data2.dat appended to data2.dat2
00:11:01 00:00:00 gw_cpfile: error.dat appended to error.dat2
00:11:01 00:00:00 gw_cpfile: data0.dat appended to data0.dat3
00:11:01 00:00:00 gw_cpfile: data1.dat appended to data1.dat3
00:11:01 00:00:00 gw_cpfile: data2.dat appended to data2.dat3
00:11:01 00:00:00 gw_cpfile: error.dat appended to error.dat3
00:16:28 00:05:27 making checkpoint: j: 3000; iidd: 869551
00:16:28 00:00:00 gw_cpfile: data0.dat appended to data0.dat2
00:16:28 00:00:00 gw_cpfile: data1.dat appended to data1.dat2
00:16:28 00:00:00 gw_cpfile: data2.dat appended to data2.dat2
00:16:28 00:00:00 gw_cpfile: error.dat appended to error.dat2
00:16:28 00:00:00 gw_cpfile: data0.dat appended to data0.dat3
00:16:28 00:00:00 gw_cpfile: data1.dat appended to data1.dat3
00:16:28 00:00:00 gw_cpfile: data2.dat appended to data2.dat3
00:16:28 00:00:00 gw_cpfile: error.dat appended to error.dat3
00:22:36 00:06:08 making checkpoint: j: 4000; iidd: 1164932
00:22:36 00:00:00 gw_cpfile: data0.dat appended to data0.dat2
00:22:36 00:00:00 gw_cpfile: data1.dat appended to data1.dat2
00:22:36 00:00:00 gw_cpfile: data2.dat appended to data2.dat2
00:22:36 00:00:00 gw_cpfile: error.dat appended to error.dat2
00:22:36 00:00:00 gw_cpfile: data0.dat appended to data0.dat3
00:22:36 00:00:00 gw_cpfile: data1.dat appended to data1.dat3
00:22:36 00:00:00 gw_cpfile: data2.dat appended to data2.dat3
00:22:36 00:00:00 gw_cpfile: error.dat appended to error.dat3
00:28:27 00:05:51 making checkpoint: j: 5000; iidd: 1449110
00:28:27 00:00:00 gw_cpfile: data0.dat appended to data0.dat2
00:28:27 00:00:00 gw_cpfile: data1.dat appended to data1.dat2
00:28:27 00:00:00 gw_cpfile: data2.dat appended to data2.dat2
00:28:27 00:00:00 gw_cpfile: error.dat appended to error.dat2
00:28:27 00:00:00 gw_cpfile: data0.dat appended to data0.dat3
00:28:27 00:00:00 gw_cpfile: data1.dat appended to data1.dat3
00:28:27 00:00:00 gw_cpfile: data2.dat appended to data2.dat3
00:28:27 00:00:00 gw_cpfile: error.dat appended to error.dat3
00:34:07 00:05:40 making checkpoint: j: 6000; iidd: 1740336
00:34:07 00:00:00 gw_cpfile: data0.dat appended to data0.dat2
00:34:07 00:00:00 gw_cpfile: data1.dat appended to data1.dat2
00:34:07 00:00:00 gw_cpfile: data2.dat appended to data2.dat2
00:34:07 00:00:00 gw_cpfile: error.dat appended to error.dat2
00:34:07 00:00:00 gw_cpfile: data0.dat appended to data0.dat3
00:34:07 00:00:00 gw_cpfile: data1.dat appended to data1.dat3
00:34:07 00:00:00 gw_cpfile: data2.dat appended to data2.dat3
00:34:07 00:00:00 gw_cpfile: error.dat appended to error.dat3
00:39:57 00:05:50 making checkpoint: j: 7000; iidd: 2037642
00:39:57 00:00:00 gw_cpfile: data0.dat appended to data0.dat2
00:39:57 00:00:00 gw_cpfile: data1.dat appended to data1.dat2
00:39:57 00:00:00 gw_cpfile: data2.dat appended to data2.dat2
00:39:57 00:00:00 gw_cpfile: error.dat appended to error.dat2
00:39:57 00:00:00 gw_cpfile: data0.dat appended to data0.dat3
00:39:57 00:00:00 gw_cpfile: data1.dat appended to data1.dat3
00:39:57 00:00:00 gw_cpfile: data2.dat appended to data2.dat3
00:39:57 00:00:00 gw_cpfile: error.dat appended to error.dat3
00:45:09 00:05:12 making checkpoint: j: 8000; iidd: 2308124
00:45:09 00:00:00 gw_cpfile: data0.dat appended to data0.dat2
00:45:09 00:00:00 gw_cpfile: data1.dat appended to data1.dat2
00:45:09 00:00:00 gw_cpfile: data2.dat appended to data2.dat2
00:45:09 00:00:00 gw_cpfile: error.dat appended to error.dat2
00:45:09 00:00:00 gw_cpfile: data0.dat appended to data0.dat3
00:45:09 00:00:00 gw_cpfile: data1.dat appended to data1.dat3
00:45:09 00:00:00 gw_cpfile: data2.dat appended to data2.dat3
00:45:09 00:00:00 gw_cpfile: error.dat appended to error.dat3
00:50:43 00:05:34 making checkpoint: j: 9000; iidd: 2581356
00:50:43 00:00:00 gw_cpfile: data0.dat appended to data0.dat2
00:50:43 00:00:00 gw_cpfile: data1.dat appended to data1.dat2
00:50:43 00:00:00 gw_cpfile: data2.dat appended to data2.dat2
00:50:43 00:00:00 gw_cpfile: error.dat appended to error.dat2
00:50:43 00:00:00 gw_cpfile: data0.dat appended to data0.dat3
00:50:43 00:00:00 gw_cpfile: data1.dat appended to data1.dat3
00:50:43 00:00:00 gw_cpfile: data2.dat appended to data2.dat3
00:50:43 00:00:00 gw_cpfile: error.dat appended to error.dat3
Matthias

JugNut
Joined: 11 Mar 15
Posts: 37
Credit: 271,242,973
RAC: 0
Message 2310 - Posted: 28 Jul 2017, 13:53:51 UTC - in response to Message 2030.  
Last modified: 28 Jul 2017, 14:12:04 UTC

Hey krzys,
I've just found a bunch of these bad WUs across all my PCs, what a mess.
I found out the hard way that even if I abort them they still don't die. They have to be manually killed from Task Manager. If you don't kill them after aborting them, BOINC thinks they're gone and assigns new work to the already loaded & running slot. Not good!!
Will kill this WU now, this is the fifth in the last few hours :(

The only way I can tell they're locked up & not just long-running is by keeping an eye on the checkpointing. The WU below has been running for 15 hrs 23 mins and hasn't checkpointed for the last 13 hrs. At least now I know what to look for & how to treat them. Aggressively...
As usual the stderr is empty, but if you're interested I kept a copy of the slot directory before I aborted it; if you would like it, just ask.
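
For anyone who wants to automate that stale-checkpoint check, here is a rough Python sketch. It assumes the default Windows BOINC data directory and that the app writes a checkpoint.dat in its slot directory (the BHspin2 log above shows it does); adjust both for your setup.

import os
import time

# Assumed default BOINC data directory on Windows; change to match your install.
SLOTS_DIR = r"C:\ProgramData\BOINC\slots"
STALE_SECS = 60 * 60  # no checkpoint for an hour is suspicious for this app

now = time.time()
for slot in sorted(os.listdir(SLOTS_DIR)):  # may need admin rights to read
    cp = os.path.join(SLOTS_DIR, slot, "checkpoint.dat")
    if not os.path.isfile(cp):
        continue  # empty slot, or an app that checkpoints differently
    age = now - os.path.getmtime(cp)
    if age > STALE_SECS:
        print(f"slot {slot}: checkpoint.dat last written {age/3600:.1f} h ago - possibly stuck")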

http://universeathome.pl/universe/results.php?hostid=1679&offset=0&show_names=0&state=6&appid=

Contents of error.dat3.....
error: bondi() accreted mass (6.024443) larger than envelope mass (5.618883) (60413)
error: bondi() accreted mass (7.907802) larger than envelope mass (7.402827) (144779)
error: bondi() accreted mass (5.415336) larger than envelope mass (2.590284) (146410)
error: bondi() accreted mass (9.456944) larger than envelope mass (9.022832) (258705)
error: bondi() accreted mass (9.386890) larger than envelope mass (5.976198) (356863)
error: bondi() accreted mass (11.491090) larger than envelope mass (7.780758) (361139)
error: bondi() accreted mass (6.318919) larger than envelope mass (5.818937) (438696)
error: bondi() accreted mass (5.645096) larger than envelope mass (5.213384) (445394)
error: bondi() accreted mass (5.773027) larger than envelope mass (5.230975) (671283)
error: bondi() accreted mass (12.410371) larger than envelope mass (8.284976) (693333)
error: bondi() accreted mass (8.904030) larger than envelope mass (6.786716) (702075)
error: bondi() accreted mass (6.480212) larger than envelope mass (6.192082) (750103)
error: bondi() accreted mass (5.496527) larger than envelope mass (4.505465) (818009)

EDIT: Just tried to kill another one, but this time it killed my PC instead (blue screen).

Gibson Praise
Joined: 26 Feb 15
Posts: 3
Credit: 56,424,411
RAC: 0
Message 2322 - Posted: 12 Aug 2017, 2:50:54 UTC - in response to Message 2279.  

This is still an issue! :(

I have to ask -- is this problem being worked on?

The general response seems to be "deal with it": these WUs come in spurts and are a cost of doing business. I can handle that... but it is concerning that such a long-standing problem has still not been successfully addressed and does not seem to be a priority.

JugNut
Joined: 11 Mar 15
Posts: 37
Credit: 271,242,973
RAC: 0
Message 2330 - Posted: 14 Aug 2017, 6:09:46 UTC - in response to Message 2322.  
Last modified: 14 Aug 2017, 6:10:18 UTC

Another two never-ending BHspin tasks :(

So far every wingman involved has had the same problem, although it wouldn't surprise me if they eventually validated.

http://universeathome.pl/universe/result.php?resultid=24645815
http://universeathome.pl/universe/result.php?resultid=24645698

JugNut
Joined: 11 Mar 15
Posts: 37
Credit: 271,242,973
RAC: 0
Message 2331 - Posted: 14 Aug 2017, 9:47:37 UTC
Last modified: 14 Aug 2017, 9:58:32 UTC

Another bad batch. These WUs should have been in the above post, but regardless they all had to be manually aborted. Only my Raspberry Pis do not show this behaviour.
This is just a sample; I have many more like 'em if you're interested.

http://universeathome.pl/universe/workunit.php?wuid=10841172
http://universeathome.pl/universe/workunit.php?wuid=10860295

Conan
Joined: 4 Feb 15
Posts: 48
Credit: 15,956,546
RAC: 54
Message 2338 - Posted: 24 Aug 2017, 23:54:30 UTC
Last modified: 24 Aug 2017, 23:55:16 UTC

I have a BH Spin WU that has already been running for a day and a half at 2%, with over 74 days still to go and counting.

I suspect this is a faulty WU that will never finish and will go over the deadline anyway.

I will probably abort it this afternoon.

Conan

Conan
Joined: 4 Feb 15
Posts: 48
Credit: 15,956,546
RAC: 54
Message 2339 - Posted: 25 Aug 2017, 3:13:56 UTC
Last modified: 25 Aug 2017, 3:17:10 UTC

I have just noticed that the percentage done has not moved for many, many hours, so I am aborting this WU. Time to complete has reached 80 days, percentage 2.029% after 1 day 16 hours.

Conan

Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 841
Credit: 144,180,465
RAC: 2
Message 2340 - Posted: 25 Aug 2017, 14:02:29 UTC - in response to Message 2339.  

If any WU calculates for longer than 6 hours, feel free to abort it.
Or if there is no percentage progress over one hour.
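
That rule can be scripted for unattended machines. A minimal sketch: it shells out to boinccmd, which ships with the BOINC client; the field labels parsed below are what current clients print for --get_tasks and may differ on older versions, and the trailing slash on the project URL must match what your client reports.

import subprocess

PROJECT_URL = "http://universeathome.pl/universe/"
MAX_CPU_SECS = 6 * 3600  # the 6-hour rule from the post above

# List all tasks known to the local client.
out = subprocess.run(["boinccmd", "--get_tasks"],
                     capture_output=True, text=True, check=True).stdout

name, url = None, None
for line in out.splitlines():
    line = line.strip()
    if line.startswith("name:"):
        name = line.split(":", 1)[1].strip()
    elif line.startswith("project URL:"):
        url = line.split(":", 1)[1].strip()
    elif line.startswith("current CPU time:"):
        cpu = float(line.split(":", 1)[1])
        if name and url == PROJECT_URL and cpu > MAX_CPU_SECS:
            print(f"aborting {name} after {cpu/3600:.1f} h of CPU time")
            subprocess.run(["boinccmd", "--task", url, name, "abort"],
                           check=True)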
Krzysztof 'krzyszp' Piszczek

Member of Radioactive@Home team
My Patreon profile
Universe@Home on YT

Jacob Klein
Joined: 21 Feb 15
Posts: 53
Credit: 1,385,888
RAC: 0
Message 2341 - Posted: 25 Aug 2017, 18:59:26 UTC
Last modified: 25 Aug 2017, 19:00:33 UTC

That doesn't work for unattended machines.
Please fix your problem already, so computer resources aren't continually wasted!
You were provided details 6 months ago.

ritterm
Joined: 6 Mar 15
Posts: 28
Credit: 16,721,329
RAC: 0
Message 2346 - Posted: 29 Aug 2017, 16:53:13 UTC - in response to Message 2340.  

If any WU calculates for longer than 6 hours, feel free to abort it...

But if it keeps checkpointing and appearing to advance toward completion, is there any reason to abort?

[AF>Libristes] erik
Joined: 21 Feb 15
Posts: 8
Credit: 364,694,894
RAC: 0
Message 2356 - Posted: 11 Sep 2017, 11:59:25 UTC

But if it keeps checkpointing and appearing to advance toward completion, is there any reason to abort?

No. Everything seems normal now.

Mattmon
Joined: 29 May 17
Posts: 1
Credit: 2,938,600
RAC: 0
Message 2357 - Posted: 12 Sep 2017, 11:23:25 UTC - in response to Message 2340.  

If any WU calculates for longer than 6 hours, feel free to abort it.


Or turn off getting new tasks until this is fixed.
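
That can be done from the command line as well. A one-call sketch: boinccmd's nomorework operation is standard (allowmorework undoes it), and the URL below is the project URL used throughout this thread; adjust it if your client registered a different master URL.

import subprocess

# Tell the local client to stop fetching new Universe@Home tasks.
subprocess.run(["boinccmd", "--project",
                "http://universeathome.pl/universe/", "nomorework"],
               check=True)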

Jacob Klein
Joined: 21 Feb 15
Posts: 53
Credit: 1,385,888
RAC: 0
Message 2359 - Posted: 15 Sep 2017, 12:09:38 UTC - in response to Message 2357.  
Last modified: 15 Sep 2017, 12:10:10 UTC

If any WU calculates for longer than 6 hours, feel free to abort it.


Or turn off getting new tasks until this is fixed.


EXACTLY.

I have several PCs doing BOINC work, and I can't monitor the details of every task that they do. This problem is real, and it wastes resources, making a CPU thread completely useless, as it spins its wheels on a task that won't complete...

The devs here should put more effort into solving this problem, instead of not caring about wasted resources. Hell, for that reason alone I'd set "No New Tasks", but I'll also do it because the tasks here sometimes don't work and waste my CPU.

I still hope for a fix, but in the meantime, you don't deserve my CPU if you're going to abuse it. "No New Tasks" for you.

Jim1348
Joined: 28 Feb 15
Posts: 253
Credit: 200,562,581
RAC: 0
Message 2360 - Posted: 18 Sep 2017, 11:01:41 UTC - in response to Message 2359.  

I may have mentioned this before, but the long runners seem to correlate with running other work units. I have been running Universe/BHspin v2 mostly by itself for a couple of months, and saw no long runners. Recently, I added LHC/SixTrack to this machine, and picked up a couple of long runners today.
http://universeathome.pl/universe/result.php?resultid=26150652
http://universeathome.pl/universe/result.php?resultid=26150707

That is not much proof, and may be hard to fix, but I mention it for what it is worth.

cykodennis
Joined: 4 Feb 15
Posts: 24
Credit: 7,035,527
RAC: 0
Message 2361 - Posted: 19 Sep 2017, 8:03:39 UTC - in response to Message 2360.  

AFAIR, I can confirm this. Things started to get messy on my machine when I ran LHC & Universe together.
Doesn't necessarily have to mean something, however...
"I should bring one important point to the attention of the authors and that is, the world is not the United States..."

Jim1348
Joined: 28 Feb 15
Posts: 253
Credit: 200,562,581
RAC: 0
Message 2362 - Posted: 19 Sep 2017, 13:40:08 UTC - in response to Message 2361.  

I am going to try a little trick, and see how it works.

Normally, LHC/SixTrack has either a lot of work or none at all. So instead of mixing it up with Universe, I have set Universe to 0 resource share. That way, when SixTrack has work, it will run by itself. And then, when SixTrack is out of work, Universe will run by itself. Maybe it will avoid some of the problems.

Luigi R.
Joined: 10 Sep 15
Posts: 12
Credit: 20,067,933
RAC: 0
Message 2363 - Posted: 19 Sep 2017, 16:37:51 UTC
Last modified: 19 Sep 2017, 16:40:41 UTC

I don't run SixTrack, but I've got 4 stuck WUs and 1 suspect one.

They're all named universe_bh2_160803_181_*.

http://universeathome.pl/universe/workunit.php?wuid=11051278
http://universeathome.pl/universe/workunit.php?wuid=11051513
http://universeathome.pl/universe/workunit.php?wuid=11051517
http://universeathome.pl/universe/workunit.php?wuid=11051573
http://universeathome.pl/universe/workunit.php?wuid=11051697

error.dat files
error: function Lzahbf(M,Mc) should not be called for HM stars
error: function Lzahbf(M,Mc) should not be called for HM stars
unexpected remnant case for K=5-6: 288457

error: function Lzahbf(M,Mc) should not be called for HM stars
error: function Lzahbf(M,Mc) should not be called for HM stars
unexpected remnant case for K=5-6: 682603

error: bondi() accreted mass (35.166687) larger than envelope mass (33.435905) (357777)
error: bondi() accreted mass (9.873656) larger than envelope mass (8.763593) (395479)
error: bondi() accreted mass (12.880583) larger than envelope mass (9.454412) (459214)
error: bondi() accreted mass (9.679004) larger than envelope mass (8.249970) (469690)
error: bondi() accreted mass (9.564457) larger than envelope mass (9.345383) (585174)
error: bondi() accreted mass (10.187786) larger than envelope mass (6.455279) (611918)
error: bondi() accreted mass (5.985729) larger than envelope mass (4.112997) (645555)

error: bondi() accreted mass (35.166687) larger than envelope mass (33.435905) (357777)
error: bondi() accreted mass (9.873656) larger than envelope mass (8.763593) (395479)
error: bondi() accreted mass (12.880583) larger than envelope mass (9.454412) (459214)
error: bondi() accreted mass (9.679004) larger than envelope mass (8.249970) (469690)
error: bondi() accreted mass (9.564457) larger than envelope mass (9.345383) (585174)
error: bondi() accreted mass (10.187786) larger than envelope mass (6.455279) (611918)
error: bondi() accreted mass (5.985729) larger than envelope mass (4.112997) (645555)

error: function Lzahbf(M,Mc) should not be called for HM stars
error: function Lzahbf(M,Mc) should not be called for HM stars
unexpected remnant case for K=5-6: 446956

error: bondi() accreted mass (5.599060) larger than envelope mass (4.544127) (28008)
error: bondi() accreted mass (5.330190) larger than envelope mass (4.395586) (105539)
error: bondi() accreted mass (2.216546) larger than envelope mass (1.953147) (135074)
error: bondi() accreted mass (5.860709) larger than envelope mass (3.212747) (195016)
error: bondi() accreted mass (5.714418) larger than envelope mass (5.400754) (218800)
error: bondi() accreted mass (5.962354) larger than envelope mass (5.099944) (257882)
error: bondi() accreted mass (9.910301) larger than envelope mass (8.742725) (301521)
error: bondi() accreted mass (7.603677) larger than envelope mass (6.561131) (313873)
error: bondi() accreted mass (5.856054) larger than envelope mass (5.343091) (321142)
error: bondi() accreted mass (5.580022) larger than envelope mass (4.316905) (340643)

error: bondi() accreted mass (5.599060) larger than envelope mass (4.544127) (28008)
error: bondi() accreted mass (5.330190) larger than envelope mass (4.395586) (105539)
error: bondi() accreted mass (2.216546) larger than envelope mass (1.953147) (135074)
error: bondi() accreted mass (5.860709) larger than envelope mass (3.212747) (195016)
error: bondi() accreted mass (5.714418) larger than envelope mass (5.400754) (218800)
error: bondi() accreted mass (5.962354) larger than envelope mass (5.099944) (257882)
error: bondi() accreted mass (9.910301) larger than envelope mass (8.742725) (301521)
error: bondi() accreted mass (7.603677) larger than envelope mass (6.561131) (313873)
error: bondi() accreted mass (5.856054) larger than envelope mass (5.343091) (321142)
error: bondi() accreted mass (5.580022) larger than envelope mass (4.316905) (340643)

error: bondi() accreted mass (9.476663) larger than envelope mass (8.257562) (3481)
error: function Lzahbf(M,Mc) should not be called for HM stars
error: function Lzahbf(M,Mc) should not be called for HM stars
unexpected remnant case for K=5-6: 21404


What a waste!

Luigi R.
Joined: 10 Sep 15
Posts: 12
Credit: 20,067,933
RAC: 0
Message 2364 - Posted: 19 Sep 2017, 16:40:05 UTC - in response to Message 2363.  
Last modified: 19 Sep 2017, 16:40:25 UTC

Delete this post.
hsdecalc

Joined: 2 Mar 15
Posts: 7
Credit: 4,296,304
RAC: 0
Message 2375 - Posted: 25 Sep 2017, 12:35:37 UTC

I have regular endless tasks too, on a Win 10 PC which runs 24/7. After aborting the last WU at 14 hours, I found the process still running in Task Manager two days later!!! So the process wasn't cancelled, just removed from BOINC. I have 80 hours of wasted time, so I can't run any more WUs because of this bad behavior.
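
A leftover process like that can also be killed by image name instead of hunting through Task Manager. A sketch using the Windows binary name quoted earlier in this thread (taskkill is a stock Windows tool; it exits non-zero when nothing matches, hence check=False):

import subprocess

# Force-kill any leftover BHspin2 worker processes by image name.
# Executable name taken from the WU details posted earlier in this thread.
subprocess.run(["taskkill", "/F", "/IM", "BHspin2_1_windows_intelx86.exe"],
               check=False)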