Long running work units
Author | Message |
---|---|
Send message Joined: 23 Mar 15 Posts: 3 Credit: 155,811,308 RAC: 0 |
This is what BM reports, but I don't really believe the numbers... Just see CPU usage... Yeah, the never-finishing ones are a pain in the rear. Sad to say, but I abort anything with "0" or "10" in the third field on sight. Some of them do finish but some of them don't, and it is hard to tell which is which. Turning off "keep suspended in memory", then suspending and resuming them, resets them back to the last checkpoint, which is often days earlier. So, rather than wait for a day to find out, I just abort any "0" or "10" sequence work and hope someone with an AMD processor picks them up. |
Send message Joined: 10 Sep 15 Posts: 12 Credit: 20,067,933 RAC: 0 |
My 4770k runs a task in ~4900s, but today I've got 6 tasks >17000s long. Yes, they are "_10_" tasks. http://universeathome.pl/universe/result.php?resultid=11152590 http://universeathome.pl/universe/result.php?resultid=11150897 http://universeathome.pl/universe/result.php?resultid=11255457 http://universeathome.pl/universe/result.php?resultid=11255023 http://universeathome.pl/universe/result.php?resultid=11152562 http://universeathome.pl/universe/result.php?resultid=11150887 CPU usage is ~100%, I'm sure about this. Wingmen seem to run those tasks in their normal time. |
Send message Joined: 21 Feb 15 Posts: 46 Credit: 926,538,317 RAC: 0 |
Some of them do finish but some of them don't, and it is hard to tell which is which. Turning off "keep suspended in memory", then suspending and resuming them, resets them back to the last checkpoint, which is often days earlier. So, rather than wait for a day to find out, I just abort any "0" or "10" sequence work and hope someone with an AMD processor picks them up. "0" WUs crash, "10" WUs run longer. But not every "0" WU crashes on every CPU and OS. Windows machines mostly crunch "0" WUs normally. |
Send message Joined: 22 Feb 15 Posts: 23 Credit: 37,205,060 RAC: 0 |
My 4770k runs a task in ~4900s, but today I've got 6 tasks >17000s long. Yes, they are "_10_" tasks. Looking at all those, I see a progression of WUs from _5_ up to _10_ on all my setups. Currently, my 2P 24T setup has all those _10_ WUs y'all are abandoning, I guess. So far, they are at 8 hours of run time and about 68% complete. Should I abandon them? I think not... I'm here to help the project. However, it was and still is of some concern that the long tasks make the same points as the short tasks, and that motivates folks to abandon them. For those only interested in points, I suppose that is to be expected. I KNOW there is a way to help compensate point-wise for long tasks... one only has to identify them and use a multiplier on the points, even for fixed-point setups. Other projects do this... but their LONG tasks are identified ahead of time, which is perhaps something that this project is unable to predict. Anyway, point production = electricity used in many people's minds, and I'm sure it would benefit the project to pick some average time breakpoints on a certain CPU (via FLOPS or something) and adjust point output using a simple multiple... like under 4 hours on a 3770 = 333, 4-7.9 = 666, 8-11.9 = 999 and so forth. In fact, one could simply use a Time(seconds)/FLOPS value like this: round((Time/FLOPs) /4) * 333 or something simple like that to determine points... Even use a fixed CPU average like 3750 or 4750 for FLOPS so people could not cheat... and use it on the fastest (lowest) time of the two, giving the same points to both. (assumes Primary and Wingman) 8-) |
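The tiered-credit idea in the post above can be sketched roughly as follows. This is only an illustration of the poster's breakpoints (under 4 hours = 333, 4-7.9 = 666, 8-11.9 = 999), not anything the project actually runs. The function names, the choice to put exactly 4 hours into the second tier, and the omission of any FLOPS normalization are all assumptions made for the example.

```python
import math

BASE_CREDIT = 333   # credit per tier, taken from the post's example
TIER_HOURS = 4      # tier width in hours, taken from the post's example


def tiered_credit(runtime_seconds: float) -> int:
    """Credit for one result: 333 for <4 h, 666 for 4 to <8 h, and so on.

    Hypothetical helper; the boundary handling is an assumption.
    """
    if runtime_seconds < 0:
        raise ValueError("runtime must be non-negative")
    hours = runtime_seconds / 3600
    tier = math.floor(hours / TIER_HOURS) + 1
    return tier * BASE_CREDIT


def credit_for_pair(primary_seconds: float, wingman_seconds: float) -> int:
    """Grant both hosts the credit earned by the faster (lower) run time."""
    return tiered_credit(min(primary_seconds, wingman_seconds))


if __name__ == "__main__":
    # Run times mentioned in this thread: ~4900 s normal, >17000 s for "_10_" tasks.
    print(tiered_credit(4900))
    print(tiered_credit(17000))
    print(credit_for_pair(4900, 70000))
```

One consequence of the fastest-of-two rule in this sketch: a task that runs long on only one of the two hosts would still earn the base credit, so it would not compensate the slow host in the Linux/Windows pairings described in this thread.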
Send message Joined: 28 Feb 15 Posts: 253 Credit: 200,562,581 RAC: 0 |
"0" WUs crash, "10" WUs run longer. But not every "0" WU crashes on every CPU and OS. Windows machines mostly crunch "0" WUs normally. I have just set up an Ubuntu 16.04 machine (i7-4790), and have completed 36 of the "0" WUs without problems. Universe@Home 0.09 Universe BHspin universe_bh_366_0_20000_1-999999_929300_0 03:01:50 (02:29:01) 5/23/2016 6:03:08 AM 5/23/2016 6:46:11 AM 81.95 Reported: OK i7-4790-PC (LAN) But I have not gotten any of the "10" WUs yet, so that will tell the story. |
Send message Joined: 4 Feb 15 Posts: 49 Credit: 15,956,546 RAC: 0 |
Over the last month I have found 3 of these "10" type work units on my Linux 64-bit computer that have run from over 17 to over 20 hours (over 65,000 to over 75,000 seconds); I finished one today. Each has been partnered with a Windows 7 computer when this has happened. The Windows computers have all had normal run times. All have been paid the same 333.33 points. I have only had 1 of the "0" type that I can find, and it has had the shortest run time of any of my work units at 5,500 seconds. My normal run times are from 7,500 to 14,500 seconds for all other types. My Windows 32-bit computer has had fairly consistent run times from 13,000 to 24,000 seconds for all types. Other than that I have had no errors running BHSpin work units on either Linux or Windows. Conan |
Send message Joined: 28 Feb 15 Posts: 253 Credit: 200,562,581 RAC: 0 |
Looking through my BoincTask History, I find that I have completed twelve of the "10s" without problem in the past three days. They all ran around 7 hours 40 minutes on this i7-4790 machine. Again, this is with Ubuntu 16.04, so maybe there have been fixes to Linux? I have not used it before. universe_bh_368_10_20000_1-999999_229800_1 |
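For anyone who wants to spot these work units before they start, here is a minimal sketch that pulls the sequence number out of a task name. It assumes the layout seen in the two names quoted in this thread (`universe_bh_<batch>_<sequence>_...`); the function names are made up for the example, and names from other applications such as BHSpin2 may not follow the same pattern.

```python
def sequence_field(task_name: str) -> str:
    """Return the sequence field of a task name such as
    'universe_bh_368_10_20000_1-999999_229800_1' (here: '10').

    Assumes the layout universe_bh_<batch>_<sequence>_..., inferred only
    from the example names posted in this thread.
    """
    parts = task_name.split("_")
    if len(parts) < 4 or parts[0] != "universe":
        raise ValueError(f"unexpected task name: {task_name!r}")
    return parts[3]


def is_suspect(task_name: str, suspect=frozenset({"0", "10"})) -> bool:
    """True if the task belongs to one of the sequences reported as slow or crashing."""
    return sequence_field(task_name) in suspect


if __name__ == "__main__":
    for name in (
        "universe_bh_366_0_20000_1-999999_929300_0",
        "universe_bh_368_10_20000_1-999999_229800_1",
    ):
        print(name, sequence_field(name), is_suspect(name))
```

The comparison is done on the whole field rather than a substring, so a task in sequence "100" or "20" would not be flagged by mistake.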
Send message Joined: 21 Feb 15 Posts: 46 Credit: 926,538,317 RAC: 0 |
As noted above, for example, AMD Bulldozer (Vishera, Steamroller, Excavator) based CPUs are running these in normal time (Linux). |
Send message Joined: 10 Sep 15 Posts: 12 Credit: 20,067,933 RAC: 0 |
Looking through my BoincTask History, I find that I have completed twelve of the "10s" without problem in the past three days. They all ran around 7 hours 40 minutes on this i7-4790 machine. Again, this is with Ubuntu 16.04, so maybe there have been fixes to Linux? I have not used it before. It doesn't look like there is anything wrong with them. They are just longer, without proper credit scaling. They don't go into error. I don't know if there could be a bug that causes useless loops or an operating system inefficiency... the run time is OK (~100% of CPU time). I think it would be wise to abort them before starting to compute, as we know there are other configurations (OS + hardware) that are not affected by this issue. |
Send message Joined: 2 Jun 16 Posts: 169 Credit: 317,253,046 RAC: 0 |
My 2P 2670 got another batch of about 35 tasks with 10 in the name that take 3x longer than normal. |
Send message Joined: 9 Nov 15 Posts: 6 Credit: 193,753,698 RAC: 0 |
My 2P 2670 got another batch of about 35 tasks with 10 in the name that take 3x longer than normal. 3x longer... then you are lucky; I have 1 machine where they take 9-10 times longer! The worst thing is that they are credited the same as short tasks! |
Send message Joined: 2 Jun 16 Posts: 169 Credit: 317,253,046 RAC: 0 |
Looks like BHSpin2 is also affected by these same long WUs. Guess I'll check those as well so I can pre-abort them. |