1) Message boards : Number crunching : Process still present 5 min after writing finish file (Message 5977)
Posted 5 Dec 2022 by Brummig
Post:
I'm sorry, but I simply don't buy the argument that a system with two processor cores idle that can copy the full set of U@H results files in the blink of an eye has a problem with super-overloaded disk I/O that results in massive delays saving half a dozen small files. If things were that bad, I would expect major problems with other tasks, such as logging in and doing the monthly software update (just consider the disk I/O load during that task). But the reality is everything is working normally except U@H, and all three Pies are responsive to logging in and to the command line. However, even the fastest machine on the planet will get stuck for five minutes writing five small files if the threads involved deadlock, and only the (five minute) thread wait timeout breaks that deadlock (probably long after the files were actually written).
2) Message boards : Number crunching : Process still present 5 min after writing finish file (Message 5975)
Posted 3 Dec 2022 by Brummig
Post:
All three Pies are set up with the same file system and SD card type and size. All three have their software upgraded once a month on the same day. All three are running U@H, Einstein, and WCG. Just recently I've let the two Pi 3's loose on Asteroids, but the erroring predates that.

I've tested write speed with the following command:
sudo dd if=/dev/zero of=/tmp/output bs=8k count=10k; sudo rm -f /tmp/output

The Pi 2 achieves around 135 MB/s, whilst the two Pi 3's achieve around 270 MB/s. The person in the linked post has a Pi 4. Obviously I know nothing about its file system, but the assumption would be it's faster still.
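
A variant of the test above, as a minimal sketch assuming GNU dd: without a flush, dd can report the speed of the kernel's page cache rather than of the SD card, so the figures may be optimistic. The function name write_test and its default arguments are hypothetical, and the target path should sit on the card being tested.

```bash
#!/bin/bash
# write_test TARGET [BLOCK_SIZE] [COUNT]
# Writes zeros to TARGET, prints dd's summary line, then removes the file.
write_test() {
  local target=$1 bs=${2:-8k} count=${3:-10k}
  # conv=fdatasync (GNU dd) flushes the file data to storage before dd
  # exits, so the reported rate includes the time taken to reach the card.
  dd if=/dev/zero of="$target" bs="$bs" count="$count" conv=fdatasync 2>&1 | tail -n 1
  rm -f "$target"
}

# Example: write_test /tmp/output 8k 10k
```

Comparing this figure with the unflushed one gives a rough idea of how much of the original result is caching.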

These Pies only upload at night, which has allowed me to copy an entire set of U@H results on one of the Pi 3's to another directory on the same SD card. It happened in the blink of an eye, which is hardly surprising as most of the files are very small, with one "large" file weighing in at 216731 bytes. Taking five minutes to write those files would be absurd.

To summarise:

Pi 2: No problems.
Pi 3: Problems.
Pi 3: Problems.
Pi 4: Problems.

I think the time it takes to write the files is not the issue. When I see something that works just fine on a slower system, and then it starts occasionally locking up on some faster systems, my immediate thought is "thread deadlock".
3) Message boards : Number crunching : Process still present 5 min after writing finish file (Message 5972)
Posted 2 Dec 2022 by Brummig
Post:
So why does the even slower Pi 2, with more cores running BOINC, have no problems writing in a timely manner to the same type of SD card?
4) Message boards : Number crunching : Process still present 5 min after writing finish file (Message 5969)
Posted 1 Dec 2022 by Brummig
Post:
All three Pies use the same spec Sandisk SD card for storage, but only the Pi 3's have a problem (a Pi 4 in the case of that other thread). The error reports state that the "finish file" was written. Does that mean the files U@H uploads, or something else?
5) Message boards : Number crunching : Process still present 5 min after writing finish file (Message 5966)
Posted 30 Nov 2022 by Brummig
Post:
Interesting, but I'm not convinced and it doesn't give me a solution anyway. One of the two Pies affected (and the one erroring most frequently) runs BOINC with one or two cores, so reducing the number of running U@H tasks to two isn't an option (and the problem only occurs with U@H). Also I have a third Pi, and it's a lowly Pi 2 running BOINC on all four cores without any issues. All three have the same file system, and always have. Can the five minute time-out be increased, to see if that makes the problem go away?
6) Message boards : Number crunching : Process still present 5 min after writing finish file (Message 5959)
Posted 29 Nov 2022 by Brummig
Post:
This is still a problem...
7) Message boards : Number crunching : Varying task time (Message 5720)
Posted 25 Jul 2022 by Brummig
Post:
Well personally I would say your times are perfectly reasonable. All my hosts are fairly elderly, most are raspberry flavour, and all have a CPU core temperature limiter running on them (either TThrottle or my own script, https://universeathome.pl/universe/forum_thread.php?id=625). All reliably complete well within the U@H time limit.
8) Message boards : Number crunching : Varying task time (Message 5713)
Posted 21 Jul 2022 by Brummig
Post:
Also the brownie points are fixed at some multiple of the speed of a vinyl LP.
9) Message boards : Number crunching : Tasks Not Uploading (Message 5562)
Posted 19 May 2022 by Brummig
Post:
I hope they step on a LEGO. Repeatedly.

My vote would be for stepping on a British power plug; by comparison a Lego brick is child's play.

I've not been able to download any new tasks today, but on the positive side I only have another twenty bytes to upload on this host. It's been retrying for about four days, of course, but hopefully soon. That just leaves my other hosts; they're not doing so well.
10) Message boards : Number crunching : Project has no tasks available (Message 5556)
Posted 19 May 2022 by Brummig
Post:
The problem with your one Project a day idea is that just about the time you get all 300 of your pc's setup it's time to stop getting new tasks and abort the existing ones so you can get new tasks from the next Project

So penalise people who abort tasks (or simply don't return them), and make part of the game ensuring you have enough tasks, but not too many. That would also address task hoarding.

But feel free to come up with your own suggestions for rules that don't annoy those not involved in the competition.
11) Message boards : Number crunching : Tasks Not Uploading (Message 5555)
Posted 19 May 2022 by Brummig
Post:
@Claus-Heinrich Bruns: There was a big competition run by SETI Deutschland called the "BOINC Pentathlon". It was very annoying.
12) Message boards : Number crunching : Project has no tasks available (Message 5453)
Posted 16 May 2022 by Brummig
Post:
My earlier comment was prompted by the comment above it that said "The only benefit of option 2 is if you are a jerk and want to play the probabilities". It was intended to make people engaged in the competition think about their own actions, the effect they have on others, and what those others may be thinking as a result. I apologise if my analogy was not quite perfect enough for those trying to justify their actions with lengthy posts. As someone not engaged in the competition, I just see a two week long DDOS attack on an innocent internet service, and the whys and wherefores of that attack are irrelevant. Indeed, the people who never return a result are arguably less of a problem because at least they create a lesser overload of the available bandwidth.

Personally I certainly don't have a problem with people running competitions (and I would imagine that applies to other commenters on this thread and others who are fed up with the effects of this competition), but how about playing nicely, and not creating chaos for two weeks? Why not limit the competition to one day per BOINC project, for example?
13) Message boards : Number crunching : Project has no tasks available (Message 5412)
Posted 13 May 2022 by Brummig
Post:
If a football team from half way round the globe with which you had no connection or interest decided to spend the next two weeks playing competition matches 24/7 on your front lawn, preventing you from leaving your house or playing football with your own children, what would you think of them?
14) Message boards : Number crunching : Server Thread (Message 5407)
Posted 13 May 2022 by Brummig
Post:
Zipping the files together would reduce the network overhead, and wouldn't cost a lot of spare cash :). Could that be done with reasonable ease?

I wish the people who want to play would find another game that doesn't cause problems for everyone.
15) Message boards : Number crunching : Pi Autoshifter (Message 5238)
Posted 30 Mar 2022 by Brummig
Post:
As a designer of aerospace electronics, I'm required to derate components against their maximum temperature, which in the case of semiconductors is the maximum junction temperature as specified by the manufacturer. For microprocessors Boeing requires a 20-30 °C derating, depending on how critical the equipment is. Commercial grade microprocessors don't go anywhere near 150 °C maximum junction temperature; the BCM2837 appears to be rated at 85 °C. Modern commercial electronics is typically pushed far closer to the temperature limits than is acceptable in aerospace applications, leading to short lifespans if a component spends most of its life at elevated temperature (of course much commercial electronics spends most of its life idling or completely unpowered). One can exceed the manufacturer's rating without the component immediately failing, but don't expect it to have a very long life. That's probably OK for some kinds of ordnance, but not a good idea if you want something to run 24/7 for many years. Don't forget also that one hot component can cause "hot-boxing", taking other components beyond their rated temperature.

Keeping the temperature down improves lifetime and reliability, and that's important to me as I am not one of those people who treats electronics as disposable. As a species, we have become far too wasteful of precious resources. I would prefer not to run the processor 24/7 at the safety limit, but you're free to do that if you like, or even turn off the limiting altogether if you so desire. I'm just offering a way to get the temperature down without fitting fans or massive heatsinks.

Regarding heat cycling, I did consider that and set the temperature limits in the code to minimise it. I could improve the situation with more sophisticated control, but following monitoring I'm happy with how it's working.

As I made clear in my OP, both Pi 3s have HATs. One has room for a heatsink about 3 mm high, whilst the other has to make do with a custom-made sheet of aluminium about 1 mm thick. Both Pies are in confined spaces. I did run some experiments with and without heatsinks, and with different levels of confinement. The heatsinks did make a difference, but not a big difference. Their performance did improve when the Pi was run out of a case, but that's not a practical solution. In the case of one of the Pies I was able to improve the ventilation. Maybe one day I'll find the time to make a better case. Probably not.

I wanted to automate the work limiting, as the Pi in the garden sees wide temperature swings, even during a single day. Whilst I was able to set the Pi to a frequency and have it stay there, setting the frequency dynamically just didn't work, and trying to make it work was wasting far too much time.
16) Message boards : Number crunching : Free DC (Message 5175)
Posted 16 Mar 2022 by Brummig
Post:
Have you ticked "Do you consent to exporting your data to BOINC statistics aggregation Web sites?" in your Universe@Home preferences?
17) Message boards : Number crunching : Pi Autoshifter (Message 5144)
Posted 6 Mar 2022 by Brummig
Post:
Yes, of course I know there are HATs with fans, but I can't fit them to these Pies due to the way these Pies are used for their primary purposes. Plus I don't want to use fans anyway, on account of the dust, noise, and reliability issues, and the increased power consumption. For me, one of the nice features of Pies is the lack of an obligatory fan.
18) Message boards : Number crunching : Pi Autoshifter (Message 5137)
Posted 5 Mar 2022 by Brummig
Post:
Previously on Universe@Home I wrote about how the USB cable used to feed power to a Raspberry Pi is critical if you want it to run at full speed: https://universeathome.pl/universe/forum_thread.php?id=615. So having replaced all my Pi power cables, was I a happy cruncher? Were my Pi 3's now outrunning my Pi 2? No, as it turns out there is another problem.

A Pi 2 can run four tasks night and day, and with no heatsink the CPU temperature will remain entirely reasonable. If you try to do the same on a Pi 3, even with the standard (small) heatsink that is often used, the CPU temperature will rise and rise until the Pi starts throttling in an attempt to not boil its own brains. However, the temperature at which it does this is rather alarming (over 80 °C). There is a strong relationship between temperature and the reliability and lifetime of electronic components, so this is bad. Of course the throttling also means the processor slows down, and by quite a lot. So much so, that a Pi 2 with all four cores running flat out may well end up outperforming a Pi 3 that is throttling itself.
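
One way to check whether a Pi has actually throttled (or seen under-voltage from a poor power cable) is the firmware command vcgencmd get_throttled, which prints a bitmask such as throttled=0x50005. The sketch below decodes that value; the function name decode_throttled is hypothetical, and the bit meanings are those given in the Raspberry Pi documentation.

```bash
#!/bin/bash
# decode_throttled VALUE
# VALUE is the number printed by "vcgencmd get_throttled", e.g. 0x50005.
# Prints one line for each flag that is set; prints nothing for 0x0.
decode_throttled() {
  local value bit name
  value=$(( $1 ))   # shell arithmetic accepts the 0x... hex form directly
  while read -r bit name; do
    if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
      echo "$name"
    fi
  done <<'EOF'
0 under-voltage now
1 ARM frequency capped now
2 throttled now
3 soft temperature limit now
16 under-voltage has occurred
17 ARM frequency capping has occurred
18 throttling has occurred
19 soft temperature limit has occurred
EOF
}

# Example on a Pi: decode_throttled "$(vcgencmd get_throttled | cut -d= -f2)"
```

The "has occurred" flags persist until reboot, so they show throttling that happened while nobody was watching.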

I really didn't want the CPUs in my Pi 3's running at close to their maximum temperature rating, but adding more capable cooling is not an option as both of them wear HATs. However, one of them does live outside, where in the winter it can get pretty cold. To take advantage of that I needed a means of automatically adjusting the workload, preferably without installing additional software packages in order to do so. I couldn't get the CPU frequency to change reliably, so I gave up on that. There is a BOINC option to change the percentage of time for which each CPU works, but this works rather crudely, just turning the task on and off using a PWM-style approach. That is visually irritating if you use boinctui to monitor BOINC. So I opted to automatically adjust the percentage of cores used according to CPU temperature, taking care to play nicely with the method by which the GUI BOINC manager controls computing options. The script below is my solution, and is run by cron every fifteen minutes (*/15 * * * *) on all three of my BOINC Pies. It has been working nicely for several months now.
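
The stepping rule can be sketched on its own as a small function, which makes it easy to try values without touching BOINC. This is an illustration only: the name next_pct is hypothetical, while the 65 °C and 75 °C thresholds and the 25-point step are the ones used in the script.

```bash
#!/bin/bash
# next_pct CURRENT_PCT TEMP_C
# Prints the max_ncpus_pct value to use next, with six decimal places.
next_pct() {
  awk -v pct="$1" -v temp="$2" 'BEGIN {
    if (temp < 65) {
      # Cool: add a 25-point step, or go to 100 if already above 75.
      pct = (pct <= 75) ? pct + 25 : 100
    } else if (temp > 75) {
      # Hot: remove a 25-point step, but never go below 25.
      pct = (pct > 25) ? pct - 25 : 25
    }
    # Between 65 and 75 the value is left unchanged.
    printf "%.6f\n", pct
  }'
}

# Example: next_pct 50.000000 60
```

The 10 °C gap between the two thresholds acts as a dead band, so the core count does not flip back and forth on every cron run.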

For various reasons my Pies are tucked away in strange places. My Pi 2 lives in the airing cupboard, a few centimetres away from the (heavily insulated) hot water cylinder, so the ambient temperature is often elevated above room temperature. Every time I have checked on it I have found it was running all four cores, and it has an RAC on U@H of 1,602. Its current CPU temperature is 68 °C (four cores running).

One of my Pi 3's lives behind the sofa in the living room. It typically runs two cores, but occasionally it drops back to one core to keep the CPU at a reasonable temperature. It has an RAC on U@H of 1,360. Its current CPU temperature is 73 °C (two cores running).

My Pi 3 that lives outside mostly runs two cores, but when the ambient temperature drops below about 5 °C it makes more and more use of a third core. It has an RAC on U@H of 1,664. Its current CPU temperature is 71 °C (two cores running, 8 °C ambient).

So as you can see, despite the slower cores the Pi 2 is comfortably outrunning the Pi 3 in the living room, and is currently a match for the Pi 3 outside. However, as we head through Spring, I fully expect the Pi 2 to start outrunning the Pi 3 in the garden. Those little heatsinks (that are not even needed on the Pi 2) really only buy you a few minutes in which to work the CPU flat out. Without substantial heatsinking, IMHO it's better to crunch using a Pi 2 than a Pi 3. I don't have a Pi 4 to test, but I would not be surprised to find the same issue.

boinc-autoshifter
#!/bin/bash

if [ "$(id -u)" -ne 0 ]; then
   echo "Must be run as root!"
   exit 1
fi

# Assume Bullseye, where vcgencmd lives in /usr/bin...
VCGENCMD=/usr/bin/vcgencmd
if [ ! -f "$VCGENCMD" ];
then
  # Not found: fall back to the older location under /opt/vc/bin
  echo "Please upgrade me."
  VCGENCMD=/opt/vc/bin/vcgencmd
fi

if [ ! -f "/var/lib/boinc-client/global_prefs_override.xml" ];
then
  # Create overrides file

  echo "Creating computing preferences override of percentage of BOINC CPUs."
  echo -e "<global_preferences>\n   <max_ncpus_pct>50.000000</max_ncpus_pct>\n</global_preferences>" > /var/lib/boinc-client/global_prefs_override.xml
  chown boinc:boinc /var/lib/boinc-client/global_prefs_override.xml
fi

# Set variables to initial values

pcpus=$(grep -oPm1 "(?<=<max_ncpus_pct>)[^<]+" /var/lib/boinc-client/global_prefs_override.xml)
pcpus_new=$pcpus
pcpus_int=$(echo $pcpus | sed 's/\..*$//')
pcpus_trailer=$(echo $pcpus | sed 's/^[^.]*//')
source <($VCGENCMD measure_temp | sed "s/'C//g")

echo "Current temperature is: $temp °C"

# Change the number of cpus?

if awk "BEGIN {exit !($temp < 65)}";
then
  if awk "BEGIN {exit !($pcpus <= 75)}";
  then
    pcpus_new=$(expr $pcpus_int + 25)$pcpus_trailer
  else
    pcpus_new=100.000000
  fi
elif awk "BEGIN {exit !($temp > 75)}";
then
  if awk "BEGIN {exit !($pcpus > 25)}";
  then
    pcpus_new=$(expr $pcpus_int - 25)$pcpus_trailer
  else
    echo "Warning: overtemperature and unable to reduce BOINC load."
    pcpus_new=25.000000
  fi
fi

if awk "BEGIN {exit !($pcpus != $pcpus_new)}";
then
  echo "Switching percentage of BOINC CPUs from $pcpus to $pcpus_new"
  sed -i "s/<max_ncpus_pct>$pcpus<\/max_ncpus_pct>/<max_ncpus_pct>$pcpus_new<\/max_ncpus_pct>/g" /var/lib/boinc-client/global_prefs_override.xml
  boinccmd --read_global_prefs_override
fi
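
As an aside, the temperature can also be read without vcgencmd. This is a minimal sketch, assuming the kernel exposes the SoC temperature in millidegrees Celsius at /sys/class/thermal/thermal_zone0/temp, as Raspberry Pi OS does; the function name read_temp_c is hypothetical and is not part of the script above.

```bash
#!/bin/bash
# read_temp_c [FILE]
# Prints the temperature in degrees Celsius with one decimal place.
# FILE defaults to the usual sysfs thermal zone and holds millidegrees.
read_temp_c() {
  local zone=${1:-/sys/class/thermal/thermal_zone0/temp}
  awk '{ printf "%.1f\n", $1 / 1000 }' "$zone"
}

# Example: temp=$(read_temp_c)
```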
19) Message boards : Number crunching : Test (Message 5133)
Posted 23 Feb 2022 by Brummig
Post:
I'll go for (A)bort, because I just love being on an infinite spin cycle.
20) Message boards : Number crunching : rosetta@home on raspberry pis? (Message 5091)
Posted 16 Feb 2022 by Brummig
Post:
My Pies are currently set up with U@H, Einstein, and WCG, but I'm not getting any tasks from Einstein (apparently they are having a problem), and WCG is taking an extended break. In the past I have run Asteroids, but that has been dead for several months now, though they do have a Pi running as their place-holder web server.


Copyright © 2024 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek