61) Questions and Answers : Getting started : What is happening? (Message 6050)
Posted 18 Mar 2023 by Grant (SSSF)
Post:
Hmm, that's odd, because I had thought my Mac had been returning results for the last several weeks; I may have been misreading it, but there was no problem raised when I added the project (unlike LHC, which requires VirtualBox).
Looking at your systems, your Linux & Windows systems are returning work, but your Darwin system which was attached to Rosetta on Feb 19 hasn't done a single Task.

You can add a Project even if it doesn't have an application for that system's OS; it just won't be able to get any work. Go to the BOINC Manager advanced view and bring up the Event Log (Tools menu, Event Log).
Go to the Projects Tab, select Universe & click on Update. See what messages come up in the Event Log.
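For a system without the Manager's GUI, the same Update can be requested with `boinccmd`, the command-line tool that comes with the BOINC client. A minimal sketch; the Universe@home URL below is an assumption, so check the exact project URL your client lists with `boinccmd --get_project_status`. Depending on the install, `boinccmd` may also need to be run from the BOINC data directory, or given `--passwd`, before the client will accept the request.

```bash
# Ask the local BOINC client to contact a project's scheduler: the
# command-line counterpart of Projects tab > Update in the Manager.
# ASSUMPTION: this is Universe@home's project URL; confirm it with
#   boinccmd --get_project_status
UNIVERSE_URL="https://universeathome.pl/universe/"

update_project() {
    local url="${1:-$UNIVERSE_URL}"
    if ! command -v boinccmd >/dev/null 2>&1; then
        echo "boinccmd not found; is the BOINC client installed?" >&2
        return 1
    fi
    boinccmd --project "$url" update
}
```

Run `update_project`, then check the Event Log for the scheduler's reply.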
62) Message boards : Number crunching : Universe Disappeared (Message 6046)
Posted 18 Mar 2023 by Grant (SSSF)
Post:
You now show two Ryzen 7 5700G systems- one with Windows 10, the other with Windows 11.
Looks like you upgraded the OS on your system, and upgraded the BOINC Manager, and for some reason the system got a new host ID (created 18 Mar 2023, 4:44:57 UTC).


Did you upgrade the OS 3 days ago? That's the last time the Win10 system contacted the Universe server.
63) Message boards : Number crunching : Universe Disappeared (Message 6043)
Posted 17 Mar 2023 by Grant (SSSF)
Post:
Tried to add Universe, but got another message saying that the project could not be added.
That would be because of the old BOINC version you have.
Too late to edit, so an addition: if you have already upgraded BOINC to the current version, then it's an indication of a configuration issue with your computer/modem/ISP (in order of most likely cause) that is blocking access to the Universe servers.
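One quick way to tell whether something on the computer/modem/ISP side is blocking access is to request the project's web address directly from the same machine. A sketch using `curl`, with the same assumed Universe@home URL as the default: an HTTP status code means the server was reached, while `000` means there was no response at all.

```bash
# Check whether the project's web server answers from this machine.
# ASSUMPTION: same Universe@home URL as the client uses.
check_reachable() {
    local url="${1:-https://universeathome.pl/universe/}"
    local code
    # -s: quiet; -o /dev/null: discard the page; -w: print only the HTTP
    # status; --max-time: give up instead of hanging on a blocked route.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 20 "$url")
    if [ -z "$code" ] || [ "$code" = "000" ]; then
        echo "no response from $url (DNS, firewall or network problem?)" >&2
        return 1
    fi
    echo "HTTP $code from $url"
}
```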
64) Message boards : Number crunching : Universe Disappeared (Message 6042)
Posted 17 Mar 2023 by Grant (SSSF)
Post:
Tried to add Universe, but got another message saying that the project could not be added.
That would be because of the old BOINC version you have.
65) Questions and Answers : Getting started : What is happening? (Message 6039)
Posted 17 Mar 2023 by Grant (SSSF)
Post:
Looking at the Applications page, there doesn't appear to be one for Apple OS, just Windows & Linux.
66) Message boards : Number crunching : Server Thread (Message 6033)
Posted 15 Mar 2023 by Grant (SSSF)
Post:
No problems here at all since my earlier post-outage post.


Exit BOINC, wait a while, then restart it.
Check the Event log to see what messages are there.

No recent changes to your AV software? These days they often also take over the firewall settings.
You have an older version of BOINC there. If there's nothing obvious in the Event Log, try upgrading to the current BOINC version.
67) Message boards : Number crunching : Server Thread (Message 6027)
Posted 13 Mar 2023 by Grant (SSSF)
Post:
Thanks for the update.

Things seem to have settled down now; uploads, downloads & Scheduler requests all going through OK without need of manual intervention.
68) Message boards : Number crunching : Server Thread (Message 6025)
Posted 13 Mar 2023 by Grant (SSSF)
Post:
We seem to be back after an extended break.
Forums are sluggish and the web site is extremely slow to respond. Uploads & downloads are taking lots of retries to get through, but they are slowly clearing. Scheduler response is slow, but not too bad compared to everything else.
69) Message boards : Number crunching : Running Universe@Home - progress stuck (Message 6022)
Posted 3 Mar 2023 by Grant (SSSF)
Post:
Likewise, I've never had a Universe Task stall.
In the past on other projects, if I've had a Task stall for longer than its expected runtime, I'd exit BOINC, give it 20-30 seconds & then restart. If the Task got stuck again, then I'd abort it.
Once or twice the Task has completed normally, but most times it's had to be aborted.
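On a Linux system the same routine (exit, wait 20-30 seconds, restart, abort if it sticks again) can be scripted. A sketch, assuming the client runs as the systemd service `boinc-client` (the name used by the Debian/Ubuntu package) and that `boinccmd` can authenticate with the client:

```bash
# Sketch of the stalled-Task routine for a Linux host.
# ASSUMPTIONS: systemd service named "boinc-client" (Debian/Ubuntu
# package), and boinccmd able to authenticate with the client.
restart_boinc() {
    local wait_secs="${1:-30}"
    sudo systemctl stop boinc-client
    sleep "$wait_secs"    # give it 20-30 seconds before restarting
    sudo systemctl start boinc-client
}

# Abort a Task that gets stuck again after the restart.
# $1 = project URL, $2 = task name as listed by `boinccmd --get_tasks`.
abort_task() {
    if [ $# -ne 2 ]; then
        echo "usage: abort_task PROJECT_URL TASK_NAME" >&2
        return 2
    fi
    boinccmd --task "$1" "$2" abort
}
```

`abort_task` takes the project URL and the task name exactly as `boinccmd --get_tasks` reports them.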
70) Message boards : Number crunching : Server Thread (Message 6015)
Posted 10 Feb 2023 by Grant (SSSF)
Post:
Thanks, nice to know what's happening.
71) Message boards : Number crunching : Server Thread (Message 6013)
Posted 10 Feb 2023 by Grant (SSSF)
Post:
But there is also good news: I have delivered another server to CAMK this morning, and the machine will replace our old main server in January.
Wondering if the main server has been upgraded yet or not?
72) Message boards : Number crunching : No new tasks for one of my machines? (Message 6010)
Posted 25 Jan 2023 by Grant (SSSF)
Post:
Glad to hear it was an easy fix.
73) Message boards : Number crunching : No new tasks for one of my machines? (Message 6008)
Posted 25 Jan 2023 by Grant (SSSF)
Post:
Don't know how you would do it on a headless unit, but the BOINC Manager has the option to view the Event Log, which shows the messages generated when contacting the Scheduler (e.g. requesting work, or not requesting work because the cache is full), along with other BOINC Manager activity.
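On a headless unit the same messages can be read with `boinccmd --get_messages`, which prints the client's message log. A sketch that keeps only scheduler and work-request lines; the grep pattern is an assumption based on typical message wording (e.g. "Sending scheduler request", "Not requesting tasks"), so adjust it if your client's messages differ.

```bash
# Print the most recent scheduler / work-request lines from the client's
# message log (the Manager's Event Log) on a headless machine.
# ASSUMPTION: the grep pattern matches your client's message wording.
scheduler_messages() {
    local lines="${1:-40}"
    # --get_messages 0 prints every message after sequence number 0.
    boinccmd --get_messages 0 | grep -iE 'scheduler|tasks' | tail -n "$lines"
}
```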
74) Message boards : News : Server room maintenance break (Message 5996)
Posted 3 Jan 2023 by Grant (SSSF)
Post:
Looks like we're back from the outage, but the feeder isn't running (along with quite a few other processes), so it's not possible to even report completed work at this stage.
Hopefully the rest of the processes will be up & running soon.
75) Message boards : News : Server room maintenance break (Message 5993)
Posted 1 Jan 2023 by Grant (SSSF)
Post:
Thanks for the early notice.
76) Questions and Answers : Windows : Cannot connect to project... (Message 5990)
Posted 30 Dec 2022 by Grant (SSSF)
Post:
From memory, you need the latest (or next-to-latest) BOINC version to attach to the project by selecting it in the Manager; otherwise it's a case of entering the project address manually.
I vaguely remember I had to upgrade my BOINC Manager to the then-current version to be able to select the project and make it stick.
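Where the Manager won't let you pick the project, `boinccmd` can attach by address directly. A minimal sketch; it needs the account key (authenticator), which `boinccmd --lookup_account URL email password` can retrieve. The function name is just for illustration.

```bash
# Attach the local client to a project by URL rather than picking it
# from the Manager's list. $1 = project URL, $2 = account key.
attach_by_url() {
    if [ $# -ne 2 ]; then
        echo "usage: attach_by_url PROJECT_URL ACCOUNT_KEY" >&2
        return 2
    fi
    boinccmd --project_attach "$1" "$2"
}
```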
77) Message boards : Number crunching : Process still present 5 min after writing finish file (Message 5976)
Posted 3 Dec 2022 by Grant (SSSF)
Post:
I've tested write speed with the following command:
sudo dd if=/dev/zero of=/tmp/output bs=8k count=10k; sudo rm -f /tmp/output

The Pi 2 achieves around 135 MB/s, whilst the two Pi 3's achieve around 270 MB/s. The person in the linked post has a Pi 4. Obviously I know nothing about its file system, but the assumption would be it's faster still.
What about random I/O with increasing loads?
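The quoted `dd` test writes 80 MiB (8 KiB × 10,240 blocks), but without a sync option GNU `dd` can finish as soon as the data is in the page cache, so the MB/s figure may reflect RAM speed rather than the SD card; `/tmp` may also be a RAM-backed tmpfs on some distributions. A sketch of a more conservative sequential test, plus a random-write test at increasing job counts using `fio` (assumed to be installed), both meant to be run in a directory on the card being tested:

```bash
# Sequential write of the same 80 MiB as the quoted test, but with
# conv=fdatasync so dd flushes the data to the device before it reports
# the elapsed time and MB/s. Run it in a directory on the SD card.
seq_write_test() {
    local dir="$1"
    if [ -z "$dir" ]; then
        echo "usage: seq_write_test DIR" >&2
        return 2
    fi
    # dd prints its statistics on stderr; keep only the summary line.
    dd if=/dev/zero of="$dir/ddtest" bs=8k count=10k conv=fdatasync 2>&1 | tail -n 1
    rm -f "$dir/ddtest"
}

# Random 4 KiB writes with 1, 2 and 4 concurrent jobs, syncing after
# every write: a rough stand-in for several files being saved at once.
# ASSUMPTION: fio is installed; 32 MiB per job and 30 s per run are
# arbitrary choices.
rand_write_test() {
    local dir="$1" jobs
    if [ -z "$dir" ]; then
        echo "usage: rand_write_test DIR" >&2
        return 2
    fi
    for jobs in 1 2 4; do
        echo "== $jobs job(s) =="
        fio --name=randwrite --directory="$dir" --rw=randwrite --bs=4k \
            --size=32M --numjobs="$jobs" --fsync=1 \
            --runtime=30 --time_based --group_reporting
        rm -f "$dir"/randwrite.*
    done
}
```

Comparing how much the random-write figures drop from 1 to 4 jobs on the good and bad systems says more about this problem than a single sequential number.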


I think the time it takes to write the files is not the issue. When I see something that works just fine on a slower system, and then it starts occasionally locking up on some faster systems, my immediate thought is "thread deadlock".
The issue is that the OS is not completing the writes within 5 minutes and reporting that back to the programme.


What it could come down to is that the CPU just isn't up to the load involved in trying to perform that many writes at once: the faster the storage device is, the greater the load on the CPU to keep the data coming. If the write cache were larger, then while it would briefly increase the CPU load still further, it should allow the writes to complete, and be reported back to the programme as completed, in less than 5 minutes.
And with Universe producing 5 result files for every Task completed, its write load when a Task completes is 5 times higher than that of projects that produce only a single result file. Keep in mind that it's not only the result files that have to be written: the OS also has to write updates to its file system records, and that would be close to 5 times greater than with other projects as well.
But if the systems with the issue have faster CPUs, then there must be some other difference between the good and problem systems that results in the I/O bottleneck.


In the end, it could simply come down to the system not being up to the load produced by Universe when a Task completes, if the system is also doing other (relatively) significant disk I/O at the same time.
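To see how the number of files saved per completed Task affects the time spent waiting on the storage, one can time N small files, each flushed to the device before moving on. A sketch; the 64 KiB size and the `result_N` names are arbitrary assumptions, not Universe's actual result files.

```bash
# Time writing N small files, each forced to disk (file data and its
# metadata) with dd's conv=fsync, and print the elapsed milliseconds.
# ASSUMPTION: 64 KiB per file is an arbitrary size for comparison only.
time_synced_files() {
    local n="$1" dir="$2" i start end
    start=$(date +%s%N)          # GNU date: nanoseconds since the epoch
    for ((i = 1; i <= n; i++)); do
        dd if=/dev/zero of="$dir/result_$i" bs=64k count=1 conv=fsync 2>/dev/null
    done
    end=$(date +%s%N)
    rm -f "$dir"/result_*
    echo $(( (end - start) / 1000000 ))
}
```

Running it with N=1 and then N=5 in the BOINC data directory of a problem system gives a rough feel for how much longer a five-file completion keeps the storage busy.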
78) Message boards : Number crunching : Process still present 5 min after writing finish file (Message 5974)
Posted 2 Dec 2022 by Grant (SSSF)
Post:
Are the same projects running on the Pi2 and Pi3? If they are the same, then it's either a damaged micro SDHC card in the Pi3, or something wrong with its OS that makes it different from the Pi2.
Yep. Even if the hardware is the same, the BIOS is configured the same (e.g. the huge difference between AHCI & IDE modes for SATA drive performance), the OS is the same, the Projects are the same & the Tasks being run are the same, is each system configured in exactly the same way?
Are the file systems the same? Is write caching enabled on the systems that are having issues? What is the size of the read and write caches on the good & bad systems? Are the running processes between both the good & bad systems the same?
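On Linux, write-cache behaviour is governed by the `vm.dirty_*` settings, and `/proc/meminfo` shows how much data is currently waiting to be written (Dirty) or being written (Writeback). A sketch that prints these, plus the file system type of a given directory, so the good and bad systems can be compared side by side:

```bash
# Print the Linux write-cache settings, the current amount of unwritten
# data, and the file system type of the directory given as $1.
show_write_cache() {
    local f
    for f in dirty_ratio dirty_background_ratio dirty_bytes dirty_expire_centisecs; do
        printf '%-28s %s\n' "vm.$f" "$(cat "/proc/sys/vm/$f" 2>/dev/null)"
    done
    # Dirty = waiting to be written; Writeback = being written right now.
    grep -E '^(Dirty|Writeback):' /proc/meminfo
    df -T "${1:-/}" | tail -n 1
}
```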

You could monitor the disk I/O on the good and bad systems and compare the activity. Run some disk benchmarks to see how the different systems perform.
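For the monitoring, `iostat -dx 5` from the sysstat package is the usual tool. Without it, a rough sketch that samples `/proc/diskstats` twice works on any Linux system; the device name (e.g. `mmcblk0` for a Pi's SD card) is an assumption to adjust for each machine.

```bash
# Sample /proc/diskstats twice and print write activity for one device.
# awk columns: $3 = device name, $8 = writes completed,
# $10 = sectors written (512-byte units), $11 = ms spent writing.
disk_write_rate() {
    local dev="$1" secs="${2:-5}" a b
    a=$(awk -v d="$dev" '$3 == d {print $8, $10, $11}' /proc/diskstats)
    sleep "$secs"
    b=$(awk -v d="$dev" '$3 == d {print $8, $10, $11}' /proc/diskstats)
    if [ -z "$a" ] || [ -z "$b" ]; then
        echo "device $dev not found in /proc/diskstats" >&2
        return 1
    fi
    set -- $a $b    # $1..$3 = first sample, $4..$6 = second sample
    echo "$dev: $(( ($4 - $1) / secs )) writes/s," \
         "$(( ($5 - $2) / 2 / secs )) KiB/s," \
         "$(( $6 - $3 )) ms spent writing"
}
```

Running `disk_write_rate mmcblk0 5` on a good and a bad Pi while a Task is finishing shows whether the bad one spends far longer writing the same amount of data.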
79) Message boards : Number crunching : Process still present 5 min after writing finish file (Message 5970)
Posted 2 Dec 2022 by Grant (SSSF)
Post:
All three Pis use the same spec SanDisk SD card for storage, but only the Pi 3s have a problem (a Pi 4 in the case of that other thread). The error reports state that the "finish file" was written. Does that mean the files U@H uploads, or something else?
Yes, the result files that are produced when the Task is finished get returned to the Universe@home project. But to return them, they have to be saved to disk first.

This is the part that's causing the problem-
Process still present 5 min after writing finish file;
The Universe application has saved the result files to disk, i.e. it has asked the Operating System to save those files. Once they have actually been saved, the OS should report back to the programme that the files have been saved, and then the Task and all its associated processes can be finalised & end normally.

But with the storage bottleneck, even though the file might have been written, the Universe application still hasn't received confirmation of that from the OS after 5 minutes, so the BOINC client clobbers it, thinking that there's an issue with that Task.

The problem is all due to the system the work is running on: for whatever reason, there is a significant bottleneck with the file I/O, and that is the cause of the Tasks failing. After 5 minutes, the programme still hasn't received confirmation back from the OS that the files have been saved.
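To make the timeout concrete, here is a hypothetical watchdog in the same spirit: once a finish file appears, the process gets a fixed grace period to exit, and is killed if it is still present afterwards. This is an illustration only, not the BOINC client's actual code; the function name, file path and timeout argument are assumptions (the limit discussed here is 5 minutes, i.e. 300 seconds).

```bash
# Hypothetical "process still present N s after writing finish file"
# check. Returns 0 if the process exits on its own, 1 if it was killed.
watch_finish() {
    local pid="$1" finish_file="$2" timeout="${3:-300}"
    local seen=""
    while kill -0 "$pid" 2>/dev/null; do
        if [ -e "$finish_file" ]; then
            # Start the grace period the first time the file is seen.
            [ -n "$seen" ] || seen=$(date +%s)
            if [ $(( $(date +%s) - seen )) -gt "$timeout" ]; then
                echo "process $pid still present ${timeout}s after finish file; killing" >&2
                kill -KILL "$pid" 2>/dev/null
                wait "$pid" 2>/dev/null   # reap it if it is our child
                return 1
            fi
        fi
        sleep 1
    done
    return 0
}
```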
80) Message boards : Number crunching : Process still present 5 min after writing finish file (Message 5968)
Posted 1 Dec 2022 by Grant (SSSF)
Post:
All three have the same file system, and always have. Can the five minute time-out be increased, to see if that makes the problem go away?
The problem is the storage I/O bottleneck.
A single Universe task produces 5 result files, most other projects just one. So that results in a lot of disk I/O. If it's not completed in 5 minutes, giving it more time really isn't a good idea.

The issue usually only occurred on systems with very high core/thread counts and older HDD storage.
So for it to occur on a system with only a couple of cores indicates an extremely slow storage device. Either a faster storage device (in particular random I/O performance), or more system RAM allocated to disk write caching (if available) may help.


Copyright © 2024 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek