1) Message boards : Number crunching : extreme long wu's (Message 2577)
Posted 27 Jan 2018 by Tex1954
Post:
Does any admin actually read the message boards????

8-)
2) Message boards : Number crunching : extreme long wu's (Message 2573)
Posted 25 Jan 2018 by Tex1954
Post:
I've been getting a TON of WU's that run for 3 or 4 days, but they do complete.

Problem is, I still get the same 666.67 points for a 3- or 4-day WU as I do for a 2-hour WU...

Umm, maybe something could be done to correct this points problem?

8-)
3) Message boards : Number crunching : Erroneous validations or incorrect CPU time? (Message 1676)
Posted 19 Oct 2016 by Tex1954
Post:
This is weird... I KNOW an i3-2120 is not 14 times faster than an X5680... unless this is an AVX instruction thing....??????

http://universeathome.pl/universe/workunit.php?wuid=7360350

The short task 49522 computer has this in stderr.out:

<core_client_version>7.5.0</core_client_version>
<![CDATA[
<stderr_txt>
04:49:57 (4972): Can't set up shared mem: -1. Will run in standalone mode.
09:19:32 (1637): Can't set up shared mem: -1. Will run in standalone mode.
10:16:30 (7498): Can't set up shared mem: -1. Will run in standalone mode.
10:46:40 (7498): called boinc_finish(0)

</stderr_txt>
]]>

Soo, wondering why that Wu gets same credit... or is the run time wrong due to a restart or something?

I see this in a bunch of WU's with different other systems.... but the X5680 WU has clean stderr file...

8-)
4) Message boards : Number crunching : What do you guys make of this? (Message 1675)
Posted 19 Oct 2016 by Tex1954
Post:
Well, I don't do that and haven't had any problems with latest batch...

8-)

Thank you Krzysztof for your speedy reply & fix . Much appreciated.

All is well in the Universe once more ;)


@ Stiller Cruncher: Good luck in finding a fix.

@Tex1954: What happens if you tick the checkbox in Boinc Manager preferences called "Skip image file verification"? Of course this is usually not a good idea and a proper fix would be best, but in your case it's maybe worth a shot? It's just a last resort if no fix is found. Try it on a few tasks and see what happens.
5) Message boards : Number crunching : What do you guys make of this? (Message 1601)
Posted 30 Sep 2016 by Tex1954
Post:
What about all the download errors?

Lin-1240V2-1

5044 Universe@Home 9/30/2016 8:09:03 AM [error] MD5 check failed for universe_bh2_160803_9_10_20000_1-999999_679000
5045 Universe@Home 9/30/2016 8:09:03 AM [error] expected 61877bac4fe69388c25e46ce9a594f35, got 8e2d98ca66085d450811579c5958518b
5046 Universe@Home 9/30/2016 8:09:03 AM [error] Checksum or signature error for universe_bh2_160803_9_10_20000_1-999999_679000
5047 Universe@Home 9/30/2016 8:09:03 AM [error] MD5 check failed for universe_bh2_160803_9_10_20000_1-999999_684000
5048 Universe@Home 9/30/2016 8:09:03 AM [error] expected 1627d98f2d66bd1a9382c1f2cacb6c53, got 64ae30ef49df63871fca168c107238a4
5049 Universe@Home 9/30/2016 8:09:03 AM [error] Checksum or signature error for universe_bh2_160803_9_10_20000_1-999999_684000

Reboot/Reset doesn't fix the problem... Seems to happen only on Linux setups..
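
For anyone who wants to double-check a downloaded file by hand, here is a minimal Python sketch of the same kind of comparison the log lines above report (expected MD5 vs. the MD5 of what actually landed on disk). The function names are made up for illustration, and this is not how the BOINC client itself does it; it only uses the standard hashlib module.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large input files need not fit in RAM.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare the file's MD5 with the value the server says it expects."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

If the digest differs on every retry, the file is probably being altered or truncated somewhere between the server and the disk, rather than being damaged by the client afterwards.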

8-)
6) Message boards : Number crunching : Computer not receiving Work units (Message 1526)
Posted 6 Sep 2016 by Tex1954
Post:
I can't get WU's either on any machine even if I reset the project...

440 Universe@Home 9/5/2016 11:54:27 PM Resetting project
441 Universe@Home 9/5/2016 11:54:33 PM Master file download succeeded
442 Universe@Home 9/5/2016 11:54:38 PM Sending scheduler request: To fetch work.
443 Universe@Home 9/5/2016 11:54:38 PM Requesting new tasks for CPU
444 Universe@Home 9/5/2016 11:54:40 PM Scheduler request completed: got 0 new tasks
445 Universe@Home 9/5/2016 11:54:40 PM No tasks sent


2059 Universe@Home 9/5/2016 11:50:56 PM Requesting new tasks for CPU
2060 Universe@Home 9/5/2016 11:50:58 PM Scheduler request completed: got 0 new tasks
2061 Universe@Home 9/5/2016 11:50:58 PM No tasks sent
2062 Universe@Home 9/5/2016 11:51:13 PM Resetting project
2063 Universe@Home 9/5/2016 11:51:16 PM Master file download succeeded
2064 Universe@Home 9/5/2016 11:51:22 PM Sending scheduler request: To fetch work.
2065 Universe@Home 9/5/2016 11:51:22 PM Requesting new tasks for CPU
2066 Universe@Home 9/5/2016 11:51:25 PM Scheduler request completed: got 0 new tasks
2067 Universe@Home 9/5/2016 11:51:25 PM No tasks sent

Yikes!

8-)
7) Message boards : Number crunching : Long running work units (Message 1191)
Posted 24 May 2016 by Tex1954
Post:
My 4770k runs task in ~4900s, but today I've got 6 tasks >17000s long. Yes, they are "_10_" tasks.

Cpu usage is ~100%, I'm sure about this.

Wingmen seem to run (those tasks) in their normal time.


Looking at all those, I see a progression of WU's _5_ up to _10_ on all my setups. Currently, my 2P 24T setup has all those _10_ WU's y'all are abandoning I guess. So far, at 8 hours run time and they are about 68% complete.

Should I abandon them? I think not... I'm here to help the project.

However, it was and still is of some concern that the long tasks make the same points as the short tasks and that motivates folks to abandon them. For those only interested in points, I suppose that is to be expected.

I KNOW there is a way to help compensate point-wise for long tasks... one only has to identify them and use a multiplier on the points, even for fixed point setups. Other projects do this... but their LONG tasks are identified ahead of time which is perhaps something that this project is unable to predict.

Anyway, point production = electricity used in many people's minds, and I'm sure it would benefit the project to pick some average time breakpoints on a certain CPU (via FLOPS/Sec or something) and adjust point output using a simple multiple.. like 4 hours on a 3770 = 333, 4-7.9 = 666, 8-11.9 = 999 and so forth.

In fact, one could simply use a Time(seconds)/FLOPS/Sec value like this:

round((Time/FLOPs) /4) * 333

or something simple like that to determine points... Even use a fixed CPU average like a 3750 or 4750 for FLOPS so people could not cheat.. and use it on the fastest (lowest) time of the two, giving the same points to both. (assumes Primary and Wingman)

8-)
8) Message boards : Number crunching : Long running work units (Message 1165)
Posted 2 May 2016 by Tex1954
Post:
This is what BM reports, but I don't really believe the numbers... Just look at the CPU usage...


CPU usage is normal, as in 98% or higher. I see no problems in the longer running WU's, only that they run longer... excepting the ones that never finish! Hope you can figure those out.

8-)
9) Message boards : Number crunching : Long running work units (Message 1163)
Posted 2 May 2016 by Tex1954
Post:
Well, hope it helps. One other thing I notice (like others mentioned) is that when the task properties are viewed, they report only 0.06 GFLOPS/Sec???? If the tasks are not doing much math, that is why Credit New is so flaky.

Computer: Linux-DX5680
Project Universe@Home

Name universe_bh_362_16_20000_1-999999_335600_1

Application Universe BHspin 0.09
Workunit name universe_bh_362_16_20000_1-999999_335600
State Running
Received 5/1/2016 10:50:38 PM
Report deadline 5/15/2016 10:50:23 PM
Estimated app speed 0.06 GFLOPs/sec
Estimated task size 807 GFLOPs
CPU time at last checkpoint 01:36:19
CPU time 01:42:02
Elapsed time 01:42:53
Estimated time remaining 02:20:58
Fraction done 42.160%
Virtual memory size 12.80 MB
Working set size 3.46 MB
Directory slots/1
Process ID 2447



This is on a task near completion. If your WU's are not doing a lot of math, what are they doing? They use less than 4 MB of ram... just curious...
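
To put numbers on the properties above, here is a small Python sketch of two simple ways to turn them into a time estimate: task size divided by app speed, and elapsed time scaled by fraction done. Both are simplified assumptions for illustration, not the BOINC client's actual estimation code, and the function names are made up.

```python
def hms_to_seconds(text):
    """Convert an 'HH:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def estimate_from_size(task_gflops, speed_gflops_per_sec):
    """Total run time implied by task size and app speed."""
    return task_gflops / speed_gflops_per_sec


def remaining_from_progress(elapsed_seconds, fraction_done):
    """Remaining time if progress continues at the same average rate."""
    total = elapsed_seconds / fraction_done
    return total - elapsed_seconds


# Values copied from the task properties listed above.
size_estimate = estimate_from_size(807, 0.06)
remaining = remaining_from_progress(hms_to_seconds("01:42:53"), 0.42160)
print(f"size/speed estimate: {size_estimate / 3600:.2f} h")
print(f"remaining from progress: {remaining / 3600:.2f} h")
```

The size/speed figure works out to roughly 3.7 hours total, and the progress-based figure to roughly 2.35 hours remaining, which is close to the 02:20:58 shown. So the 0.06 GFLOPs/sec looks like a number fitted to past run times rather than a measurement of how much math the app is doing.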

8-)
10) Message boards : Number crunching : Long running work units (Message 1161)
Posted 2 May 2016 by Tex1954
Post:
2p setup just finished a couple dozen 12+ hours tasks like this one... and only 333 points?

http://universeathome.pl/universe/workunit.php?wuid=4907485

I think even Credit New would give more points... but all in the same boat I guess...

8-)
11) Message boards : Number crunching : Long running work units (Message 1159)
Posted 1 May 2016 by Tex1954
Post:
I have now 4 long ones on two computers (total 8) that have run for 6 days on one setup and over 1 day on the other setup. Both are running Linux. I let them run because I was curious... I'll abort them all after I report what is in the slots directories now...

Both setups are server grade, one with an E3-1230V3 CPU and one with an E3-1240V2. Both run Linux Mint 17.3.

In any case, I checked the slot files and nothing is changing except some stuff in the error.dat files...

error.dat =
error: function Lzahbf(M,Mc) should not be called for HM stars
error: function Lzahbf(M,Mc) should not be called for HM stars
unexpected remnant case for K=5-6: 254568

error.dat2 =
error: bondi() accreted mass (2.801104) larger than envelope mass (2.786657) (190181)

error.dat3 =
error: bondi() accreted mass (2.801104) larger than envelope mass (2.786657) (190181)

The boinc_mmap_file has some binary junk in it...

Nothing in the stderr.txt file.

log.txt =
00:00:00 00:00:00 PROGRAM START: Sun Apr 24 23:43:52 2016
00:00:00 00:00:00 no checkpoint.dat file found
00:00:00 00:00:00 cleaning checkpoints
00:00:00 00:00:00 gw_cpfile: source file "data0.dat2" not present
00:00:00 00:00:00 gw_cpfile: source file "data1.dat2" not present
00:00:00 00:00:00 gw_cpfile: source file "data2.dat2" not present
00:00:00 00:00:00 gw_cpfile: source file "error.dat2" not present
00:00:00 00:00:00 reading checkpoint: istart: -1; pp: 0; n: -1
00:00:00 00:00:00 checkpoint read
00:00:00 00:00:00 default values set
00:00:00 00:00:00 Reading param.in file
00:00:00 00:00:00 PARAMIN: num_tested = 20000
00:00:00 00:00:00 PARAMIN: hub_val = 1000
00:00:00 00:00:00 PARAMIN: idum = -943500
00:00:00 00:00:00 PARAMIN: OUTPUT = 3
00:00:00 00:00:00 PARAMIN: Sal = -2.3
00:00:00 00:00:00 PARAMIN: Mmina = 5.0
00:00:00 00:00:00 PARAMIN: Mminb = 3.0
00:00:00 00:00:00 PARAMIN: Fa = 1
00:00:00 00:00:00 PARAMIN: ZZ = 0.0001
00:00:00 00:00:00 param.in file read
00:00:00 00:00:00 idum: -943500; num_tested: 20000
00:03:53 00:03:53 making checkpoint: j: 1000; iidd: 118910
00:03:53 00:00:00 gw_cpfile: data0.dat appended to data0.dat2
00:03:53 00:00:00 gw_cpfile: data1.dat appended to data1.dat2
00:03:53 00:00:00 gw_cpfile: data2.dat appended to data2.dat2
00:03:53 00:00:00 gw_cpfile: error.dat appended to error.dat2
00:03:53 00:00:00 gw_cpfile: data0.dat appended to data0.dat3
00:03:53 00:00:00 gw_cpfile: data1.dat appended to data1.dat3
00:03:53 00:00:00 gw_cpfile: data2.dat appended to data2.dat3
00:03:53 00:00:00 gw_cpfile: error.dat appended to error.dat3
00:07:50 00:03:57 making checkpoint: j: 2000; iidd: 242302
00:07:50 00:00:00 gw_cpfile: data0.dat appended to data0.dat2
00:07:50 00:00:00 gw_cpfile: data1.dat appended to data1.dat2
00:07:50 00:00:00 gw_cpfile: data2.dat appended to data2.dat2
00:07:50 00:00:00 gw_cpfile: error.dat appended to error.dat2
00:07:50 00:00:00 gw_cpfile: data0.dat appended to data0.dat3
00:07:50 00:00:00 gw_cpfile: data1.dat appended to data1.dat3
00:07:50 00:00:00 gw_cpfile: data2.dat appended to data2.dat3
00:07:50 00:00:00 gw_cpfile: error.dat appended to error.dat3
00:00:00 00:00:00 PROGRAM START: Sun May 1 06:16:53 2016
00:00:00 00:00:00 reading checkpoint: istart: 2000; pp: 242302; n: 2
00:00:00 00:00:00 checkpoint read
00:00:00 00:00:00 default values set
00:00:00 00:00:00 Reading param.in file
00:00:00 00:00:00 PARAMIN: num_tested = 20000
00:00:00 00:00:00 PARAMIN: hub_val = 1000
00:00:00 00:00:00 PARAMIN: idum = -943500
00:00:00 00:00:00 PARAMIN: OUTPUT = 3
00:00:00 00:00:00 PARAMIN: Sal = -2.3
00:00:00 00:00:00 PARAMIN: Mmina = 5.0
00:00:00 00:00:00 PARAMIN: Mminb = 3.0
00:00:00 00:00:00 PARAMIN: Fa = 1
00:00:00 00:00:00 PARAMIN: ZZ = 0.0001
00:00:00 00:00:00 param.in file read
00:00:00 00:00:00 idum: -943500; num_tested: 20000
00:00:00 00:00:00 random number generator initialised: 242302
00:00:00 00:00:00 PROGRAM START: Sun May 1 06:28:15 2016
00:00:01 00:00:01 reading checkpoint: istart: 2000; pp: 242302; n: 2
00:00:01 00:00:00 checkpoint read
00:00:01 00:00:00 default values set
00:00:01 00:00:00 Reading param.in file
00:00:01 00:00:00 PARAMIN: num_tested = 20000
00:00:01 00:00:00 PARAMIN: hub_val = 1000
00:00:01 00:00:00 PARAMIN: idum = -943500
00:00:01 00:00:00 PARAMIN: OUTPUT = 3
00:00:01 00:00:00 PARAMIN: Sal = -2.3
00:00:01 00:00:00 PARAMIN: Mmina = 5.0
00:00:01 00:00:00 PARAMIN: Mminb = 3.0
00:00:01 00:00:00 PARAMIN: Fa = 1
00:00:01 00:00:00 PARAMIN: ZZ = 0.0001
00:00:01 00:00:00 param.in file read
00:00:01 00:00:00 idum: -943500; num_tested: 20000
00:00:01 00:00:00 random number generator initialised: 242302
00:00:00 00:00:00 PROGRAM START: Sun May 1 06:57:59 2016
00:00:00 00:00:00 reading checkpoint: istart: 2000; pp: 242302; n: 2
00:00:00 00:00:00 checkpoint read
00:00:00 00:00:00 default values set
00:00:00 00:00:00 Reading param.in file
00:00:00 00:00:00 PARAMIN: num_tested = 20000
00:00:00 00:00:00 PARAMIN: hub_val = 1000
00:00:00 00:00:00 PARAMIN: idum = -943500
00:00:00 00:00:00 PARAMIN: OUTPUT = 3
00:00:00 00:00:00 PARAMIN: Sal = -2.3
00:00:00 00:00:00 PARAMIN: Mmina = 5.0
00:00:00 00:00:00 PARAMIN: Mminb = 3.0
00:00:00 00:00:00 PARAMIN: Fa = 1
00:00:00 00:00:00 PARAMIN: ZZ = 0.0001
00:00:00 00:00:00 param.in file read
00:00:00 00:00:00 idum: -943500; num_tested: 20000
00:00:00 00:00:00 random number generator initialised: 242302

This task is one of four on this setup running over 6 days. I've restarted it a couple times to see if something changes... seems the completion percent is moving up...

8-)

12) Message boards : Number crunching : Long running work units (Message 1150)
Posted 21 Apr 2016 by Tex1954
Post:
I have observed the same thing. It seems AMD runs faster on these "10" tasks, but some i3 and i5 CPU's also seem to go faster. It's very interesting that I can find a minor correlation between run time and the OS (Win vs. Linux). I see the same thing on my E3-1240V2 CPU's as I do on the 2P X5680 setup.

However, they ALL get the same 333 points... how weird...

In many cases, the i3/i5 CPU's that do significantly better are running Windows instead of Linux, so some compiler thing may be going on as well...

Tis for the developers to clear this up I think...

:D
13) Message boards : Number crunching : Excessively Long Estimated Finish Times (Message 997)
Posted 31 Dec 2015 by Tex1954
Post:
Everything was fine on all my setups until a couple days ago... I don't think there is anything I could do on my end...

Well, we will see how things go and I will upgrade to 7.6.22 and see if that does anything...

8-)
14) Message boards : Number crunching : Something is messy on the project prefs page (Message 993)
Posted 30 Dec 2015 by Tex1954
Post:
Well, I guess it's okay then... At least I understand it better now. Seems my setups use that override file anyway. This question came up with another project that could not read the globals or didn't have them setup as well.

Thanks!

8-)
15) Message boards : Number crunching : Excessively Long Estimated Finish Times (Message 992)
Posted 30 Dec 2015 by Tex1954
Post:
Okay, I'm getting a LOT of tasks that have estimated completion times in DAYS rather than hours like these:



This has the effect of telling the BOINC client that a 1-day cache is already FULL and it won't fetch any more work from ANY project if all the cores are loaded up with same. Believe me, I have a 24-core setup that STOPPED fetching ANY new WU's from ANY project because it had 24 of these 4-Day long tasks loaded up.

The REALLY BAD THING is the tasks complete in no more than 4.8 hours!!! I've scanned hundreds of my tasks and the LONGEST I could find took 17334 seconds.

Sooo, for whatever reason, something has changed in the estimated completion times and it isn't on my end. This is totally screwing up my setups, especially those that have trouble getting WU's from other projects that have FEW WU's to go around. As the tasks near completion, naturally the estimated time decreases, but very slowly until about 4 hours done.



Meanwhile, BOINC thinks my 1-Day cache is full... what a lie...
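
Here is a very simplified Python model of why the inflated estimates block work fetch. It assumes the client just compares the estimated queued time per core against the cache setting; the real BOINC work-fetch logic is more involved, and the function names here are made up for illustration.

```python
SECONDS_PER_DAY = 86400.0


def queued_days_per_core(estimated_task_seconds, n_cores):
    """Estimated days of queued work per core, assuming an even spread."""
    return sum(estimated_task_seconds) / n_cores / SECONDS_PER_DAY


def would_fetch_more(estimated_task_seconds, n_cores, cache_days=1.0):
    """True if the queue looks shorter than the configured cache."""
    return queued_days_per_core(estimated_task_seconds, n_cores) < cache_days


# 24 tasks on 24 cores: estimated at 4 days each vs. the real ~4.8 hours.
inflated = [4 * SECONDS_PER_DAY] * 24
realistic = [4.8 * 3600] * 24
print(would_fetch_more(inflated, 24))   # queue looks like 4 days per core
print(would_fetch_more(realistic, 24))  # queue is really 0.2 days per core
```

Under this model the inflated estimates make the queue look four times bigger than the 1-day cache, while the real run times would fill only about a fifth of it.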

Please fix this.

8-)

One setup has 62 of these 4-day long tasks in the cache... guess it won't be running anything else for years... sheesh...
16) Message boards : Number crunching : Something is messy on the project prefs page (Message 988)
Posted 29 Dec 2015 by Tex1954
Post:
I'm not sure about all this preferences setting where one project affects them all. I understand the idea, but there is NO reason I can think of where a CPU running several projects has to use the SAME local preferences for EACH project.. Also, I am not 100% sure which preferences are considered GLOBAL and which LOCAL.

Suffice it to say, I don't think it wise for ANY project to mess with any other project settings under ANY circumstances except the truly global ones such as CPU's and CPU % and time between switching tasks and the disk use parameters. Certainly it should NOT share project preferences at all.

Somewhere along the road this project insisted that my systems were set up to use WORK location preferences (which are project specific) while my actual location was DEFAULT for this project. It is still doing that!!!!

So, to perhaps circumvent the error, I created a WORK preference and now the error messages seem to be stopped..

This really still needs to be fixed.

8-)
17) Message boards : Number crunching : upload problem (Message 987)
Posted 29 Dec 2015 by Tex1954
Post:
Same here... keep getting this when it tries to report the 14 completed tasks:

Win7-5930K

819 Universe@Home 12/28/2015 11:23:12 PM Reporting 14 completed tasks
820 Universe@Home 12/28/2015 11:23:12 PM Requesting new tasks for CPU
821 Universe@Home 12/28/2015 11:23:17 PM [error] Can't parse task in scheduler reply: unexpected XML tag or syntax
822 Universe@Home 12/28/2015 11:23:17 PM [error] No close tag in scheduler reply
825 Universe@Home 12/28/2015 11:24:02 PM Reporting 14 completed tasks
826 Universe@Home 12/28/2015 11:24:02 PM Requesting new tasks for CPU
827 Universe@Home 12/28/2015 11:24:06 PM [error] Can't parse task in scheduler reply: unexpected XML tag or syntax
828 Universe@Home 12/28/2015 11:24:06 PM [error] No close tag in scheduler reply
18) Message boards : Number crunching : No New Work (Message 986)
Posted 28 Dec 2015 by Tex1954
Post:
Same here, get some tasks once in a while, server shows tons ready to send, but I can't get any. Tried Project Resets and everything else I could think of... no joy...

Seems the server status isn't being updated in real time if at all...

8-)

PS: Getting some of this lately... maybe that's why? Sometimes detach/attach cures the problem for a while, then it messes up again...

Win7-5930K

58 Universe@Home 12/28/2015 4:47:41 PM Sending scheduler request: To fetch work.
59 Universe@Home 12/28/2015 4:47:41 PM Requesting new tasks for CPU
60 Universe@Home 12/28/2015 4:47:46 PM [error] Can't parse workunit in scheduler reply: unexpected XML tag or syntax
61 Universe@Home 12/28/2015 4:47:46 PM [error] No close tag in scheduler reply
19) Message boards : Number crunching : i need help (Message 925)
Posted 16 Dec 2015 by Tex1954
Post:
Getting probs again....

Had to re-attach again just to report completed tasks, then got about 7 compute errors, only 2 tasks running now.

Win7-DX5680

209 Universe@Home 12/15/2015 9:29:04 PM Sending scheduler request: To fetch work.
210 Universe@Home 12/15/2015 9:29:04 PM Requesting new tasks for CPU
211 Universe@Home 12/15/2015 9:29:06 PM Scheduler request failed: HTTP internal server error


8-)
20) Message boards : Number crunching : i need help (Message 908)
Posted 13 Dec 2015 by Tex1954
Post:
Getting more problems again on 2P and 1P servers.....

Win7-DX5680

1389 Universe@Home 12/13/2015 2:59:03 AM Sending scheduler request: To fetch work.
1390 Universe@Home 12/13/2015 2:59:03 AM Requesting new tasks for CPU and NVIDIA GPU
1391 Universe@Home 12/13/2015 2:59:08 AM [error] Can't parse file info in scheduler reply: unexpected XML tag or syntax
1392 Universe@Home 12/13/2015 2:59:08 AM [error] No close tag in scheduler reply
1393 Universe@Home 12/13/2015 4:16:02 AM Sending scheduler request: To fetch work.
1394 Universe@Home 12/13/2015 4:16:02 AM Requesting new tasks for CPU and NVIDIA GPU
1395 Universe@Home 12/13/2015 4:16:13 AM [error] Can't parse file info in scheduler reply: unexpected XML tag or syntax
1396 Universe@Home 12/13/2015 4:16:13 AM [error] No close tag in scheduler reply

and this on another 1P server...

Win7-1230V21

1360 Universe@Home 12/13/2015 4:16:01 AM Sending scheduler request: Requested by user.
1361 Universe@Home 12/13/2015 4:16:01 AM Requesting new tasks for CPU
1362 Universe@Home 12/13/2015 4:16:02 AM Scheduler request failed: HTTP internal server error
1363 Universe@Home 12/13/2015 4:19:40 AM Sending scheduler request: To fetch work.
1364 Universe@Home 12/13/2015 4:19:40 AM Requesting new tasks for CPU
1365 Universe@Home 12/13/2015 4:19:49 AM Scheduler request failed: HTTP internal server error


Oh well, seems it isn't "MY" problem... I hope...

8-)

PS:
Seems the 1P has started working again, 2P still hung up...





