Message boards : Number crunching : Server Thread

Tom M
Joined: 18 Jul 17
Posts: 123
Credit: 1,017,526,950
RAC: 3,617,473
Message 5389 - Posted: 12 May 2022, 5:01:57 UTC

Uploads seem to be running very slowly or backed up. Not seeing any "project backoff".

Tom M
A proud member of the OFA (Old Farts Assoc.)
Grant (SSSF)
Joined: 23 Apr 22
Posts: 133
Credit: 56,533,333
RAC: 104,642
Message 5392 - Posted: 12 May 2022, 5:31:47 UTC

Yep, we're back to uploads and downloads not happening, and to Scheduler requests timing out, failing to receive data from the peer, being unable to connect to the server, etc., etc.
Makes a change from the "Project has no Tasks available" responses, even when it has millions of them.


Looking at the graphs, it's probably the result of all the tasks people have been hoarding now being dumped back onto the server.
Grant
Darwin NT
doug
Joined: 13 Mar 22
Posts: 4
Credit: 585,333
RAC: 0
Message 5394 - Posted: 12 May 2022, 14:52:23 UTC

I have to think that the project admins here had no idea what the actual load on the server would be when they agreed to this Pentathlon. I finally got 1 task (after close to 2 days of trying) in the early hours this morning, and it finished fine. It's been trying to upload, and it can't even upload a 20-byte file!

Oh well, even if a lot of regulars are pissed, at least the project is getting a lot of work done. I hope.
Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 832
Credit: 144,003,798
RAC: 1
Message 5396 - Posted: 12 May 2022, 15:27:36 UTC

In the last 24 hours the server received:
Successfully finished tasks: 1,171,604
Tasks with computation errors (and other errors): 17,335
Tasks whose host didn't contact the server: 2,520
Validation errors: 67
Orphaned tasks: 2,160

I didn't expect these numbers to even be possible on our server.
Now we have some problems with the disk subsystem (even the SSDs are a bit too slow to manage this quantity of files).
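For scale, the 24-hour counts above can be turned into a total and a per-second rate. This is a minimal Python sketch that assumes the five categories do not overlap and together cover every result the server handled that day; the post does not say so explicitly.

```python
# 24-hour counts as posted by the project administrator (Message 5396).
COUNTS = {
    "success": 1_171_604,
    "computation_error": 17_335,
    "no_contact": 2_520,
    "validation_error": 67,
    "orphaned": 2_160,
}

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def summarize(counts: dict[str, int]) -> dict[str, float]:
    """Total results, share that succeeded, and successful tasks per second."""
    total = sum(counts.values())  # assumes the categories do not overlap
    return {
        "total": total,
        "success_share": counts["success"] / total,
        "success_per_second": counts["success"] / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    s = summarize(COUNTS)
    print(f"total results:      {s['total']:,}")
    print(f"success share:      {s['success_share']:.1%}")
    print(f"successful tasks/s: {s['success_per_second']:.1f}")
```

Under that assumption it comes to about 1.19 million results, roughly 98% of them successful, or about 13.6 successfully finished tasks every second, around the clock.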
Krzysztof 'krzyszp' Piszczek

Member of Radioactive@Home team
My Patreon profile
Universe@Home on YT
Keith Myers
Joined: 10 May 20
Posts: 299
Credit: 4,131,262,700
RAC: 4,667,293
Message 5397 - Posted: 12 May 2022, 16:05:44 UTC - in response to Message 5389.  

Tom M wrote:
Uploads seem to be running very slowly or backed up. Not seeing any "project backoff".

I'm seeing "project backoff" on each of many hundreds of uploads.

A proud member of the OFA (Old Farts Association)
Tom M
Joined: 18 Jul 17
Posts: 123
Credit: 1,017,526,950
RAC: 3,617,473
Message 5399 - Posted: 12 May 2022, 18:40:07 UTC - in response to Message 5397.  

Keith Myers wrote:
I'm seeing "project backoff" on each of many hundreds of uploads.


Sigh, I just looked. I'm getting project backoffs on my uploads too.

Hopefully none of the data we have processed gets lost in this.

Tom M
A proud member of the OFA (Old Farts Assoc.)
Kiska
Joined: 17 Jun 16
Posts: 3
Credit: 37,189,367
RAC: 0
Message 5400 - Posted: 12 May 2022, 19:01:52 UTC

You can definitely see the effects of the Pentathlon here:
https://grafana.kiska.pw/d/boinc/boinc?orgId=1&from=now-14d&to=now

Tom M
Joined: 18 Jul 17
Posts: 123
Credit: 1,017,526,950
RAC: 3,617,473
Message 5403 - Posted: 12 May 2022, 23:59:18 UTC

There ought to be a way to throttle the upload/download volume so that when we do connect, there are actually some resources available to upload/download files.

Bet there is something in a router someplace...
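The kind of throttling Tom describes is commonly done with a token bucket: each request spends a token, tokens refill at a fixed rate, so bursts are capped while the average rate stays bounded. The Python sketch below shows only that general technique; it is not how Universe@Home's servers or any particular router are configured, and the rate and capacity values are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start with a full bucket
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last call, up to capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Spend one token if available; otherwise the caller should back off."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    # Hypothetical limit: 100 transfers per second, bursts of up to 200.
    bucket = TokenBucket(rate=100.0, capacity=200.0)
    accepted = sum(bucket.try_acquire() for _ in range(1_000))
    print(f"accepted {accepted} of 1000 back-to-back requests")
```

A server applying something along these lines would turn excess connections away quickly, instead of letting every client hold a connection open until it times out.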

Tom M
A proud member of the OFA (Old Farts Assoc.)
Gish
Joined: 19 Dec 21
Posts: 2
Credit: 135,613,333
RAC: 0
Message 5404 - Posted: 13 May 2022, 0:20:46 UTC

Also seeing my uploads backing off. Hopefully we get unblocked soon :)
Grant (SSSF)
Joined: 23 Apr 22
Posts: 133
Credit: 56,533,333
RAC: 104,642
Message 5406 - Posted: 13 May 2022, 5:55:18 UTC - in response to Message 5396.  
Last modified: 13 May 2022, 5:58:44 UTC

Krzysztof 'krzyszp' Piszczek wrote:
I didn't expect these numbers to even be possible on our server.
Now we have some problems with the disk subsystem (even the SSDs are a bit too slow to manage this quantity of files).
Are they SATA/SCSI or NVMe SSDs?
I'd say it all comes down to the hardware limitations the project is working with.

Most other projects have multiple servers:
One for a separate science database, so the active database can be kept as small as practical (I've still got Valid tasks in my Task list from April 24, so the Scheduler etc. has to deal with a database that still has all those Tasks in it).
More than one download server, to help balance extreme loads.
A separate server for the web pages, forums, accounts etc.
And one heavy-duty server for the hard work: the Scheduler, feeder, transitioner, validator etc.


And what probably has the biggest impact of all here at Universe@Home: a single Task produces 6 Result files (not a single Result file), which means a huge amount of work for the active database when results are returned and when the completed Task is reported.
I'm guessing this was done to reduce the work needed to get the necessary result data out of a single Result file?
But the cost is a sixfold increase in the load on the upload server and on the active database, compared with having a single Result file returned per Task.
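To put the sixfold figure in numbers, the 1,171,604 successful tasks per 24 hours posted by the project administrator in Message 5396 can be combined with six result files per task. This back-of-the-envelope Python sketch assumes every successful task uploads exactly six files and ignores retries and failed tasks.

```python
SUCCESSFUL_TASKS_PER_DAY = 1_171_604  # from Message 5396
FILES_PER_TASK = 6                    # as described in this post
SECONDS_PER_DAY = 86_400


def uploads(tasks_per_day: int, files_per_task: int) -> tuple[int, float]:
    """Return (file uploads per day, average file uploads per second)."""
    per_day = tasks_per_day * files_per_task
    return per_day, per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    six_day, six_s = uploads(SUCCESSFUL_TASKS_PER_DAY, FILES_PER_TASK)
    one_day, one_s = uploads(SUCCESSFUL_TASKS_PER_DAY, 1)
    print(f"6 files/task: {six_day:,} uploads/day, {six_s:.0f}/s")
    print(f"1 file/task:  {one_day:,} uploads/day, {one_s:.0f}/s")
```

That works out to about 7.0 million file uploads a day, or roughly 81 per second on average, compared with about 14 per second if each task returned a single file. Peaks will be higher than these averages.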


Anyone got some spare cash for an extra big & beefy database server and an AFA (All Flash Array) storage server? Or care to donate them?
Grant
Darwin NT
Brummig
Joined: 23 Mar 16
Posts: 95
Credit: 22,515,842
RAC: 6,922
Message 5407 - Posted: 13 May 2022, 8:25:02 UTC - in response to Message 5406.  

Zipping the files together would reduce the network overhead, and wouldn't cost a lot of spare cash :). Could that be done with reasonable ease?
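Brummig's suggestion amounts to packing a task's result files into one archive, so that each task needs a single upload. Below is a minimal Python sketch using the standard-library `zipfile` module. The file names are hypothetical, and none of this is a BOINC client feature. The `unpack_results` step is included because, as the project administrator points out later in the thread, the server would have to expand every archive again.

```python
import tempfile
import zipfile
from pathlib import Path


def pack_results(result_files: list[Path], archive: Path) -> Path:
    """Client side: write all result files of one task into a single compressed archive."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in result_files:
            zf.write(path, arcname=path.name)  # store without the directory prefix
    return archive


def unpack_results(archive: Path, dest: Path) -> list[Path]:
    """Server side: the archive has to be expanded again before processing."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return [dest / name for name in zf.namelist()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        # Six hypothetical result files standing in for one task's output.
        files = []
        for i in range(6):
            p = work / f"result_{i}.dat"
            p.write_text(f"sample output {i}\n")
            files.append(p)
        archive = pack_results(files, work / "task_results.zip")
        restored = unpack_results(archive, work / "unpacked")
        print(f"{len(files)} files -> 1 upload -> {len(restored)} files again")
```

The trade-off shows up directly: one network transfer instead of six, at the price of extra disk writes on the server when every archive is unpacked.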

I wish the people who want to play would find another game that doesn't cause problems for everyone.
Tom M
Joined: 18 Jul 17
Posts: 123
Credit: 1,017,526,950
RAC: 3,617,473
Message 5409 - Posted: 13 May 2022, 9:23:24 UTC

My main U@H cruncher has run out of tasks :(
A proud member of the OFA (Old Farts Assoc.)
Grant (SSSF)
Joined: 23 Apr 22
Posts: 133
Credit: 56,533,333
RAC: 104,642
Message 5411 - Posted: 13 May 2022, 10:03:37 UTC

Several times a day I just keep hitting Retry pending transfers till they all clear, then Update till the Scheduler finally doesn't error out and doesn't give a "Project has no available Tasks" response, then hit Retry pending transfers again till all of the downloads have finally come through.
Then I let it be for 6-10 hours, and then start hitting Retry pending transfers all over again...

I'll be glad when the stupidity is over with (till the next time).
Grant
Darwin NT
Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 832
Credit: 144,003,798
RAC: 1
Message 5413 - Posted: 13 May 2022, 16:00:37 UTC - in response to Message 5406.  

Universe@Home runs on 3 servers:
1. Main server
2. Database server
3. Storage server
On the main server the SSDs are SATA; there are no NVMe ports at all, as this is a roughly 7-year-old Xeon 1230 machine.
The database server is an NVMe machine and it must be fast, as it does a lot more than just serving the project database: it does most of the work with the computation results received from volunteers.
The storage server is used to store all results and also to process them with other software, where the result files have to be sorted, analysed etc.

This is what our budget was enough to buy, and usually it is quite enough to handle all tasks smoothly.

The BOINC engine doesn't itself allow all the files to be zipped together on the client side. Obviously we could send volunteers another executable that packs the result files together, but that doesn't make sense, as the files would have to be unpacked again on the server, which would mean more disk operations and a higher overall server load (individual result files are, however, zipped on the client side).

Anyway, we will probably upgrade our systems with a new grant in about 2 years (obviously only if we get it, but with the current project results I strongly believe we will).
Krzysztof 'krzyszp' Piszczek

Member of Radioactive@Home team
My Patreon profile
Universe@Home on YT
Grant (SSSF)
Joined: 23 Apr 22
Posts: 133
Credit: 56,533,333
RAC: 104,642
Message 5414 - Posted: 13 May 2022, 20:56:05 UTC - in response to Message 5413.  

Krzysztof 'krzyszp' Piszczek wrote:
On the main server the SSDs are SATA; there are no NVMe ports at all, as this is a roughly 7-year-old Xeon 1230 machine.
The database server is an NVMe machine and it must be fast, as it does a lot more than just serving the project database: it does most of the work with the computation results received from volunteers.
I can easily see a 7-year-old Xeon, even with SSDs, struggling with the current workload: a low core/thread count, low clock speeds, and a low maximum RAM capacity for caching (even when maxed out) compared with even a 2-year-old system, let alone what current hardware is capable of.

As there is a separate Science database server, I expect that reducing the time Validated Tasks are kept before being transitioned from the Main server to the Science database would provide a big boost to the Main server's performance.
On my Task list I've presently got 3 weeks' worth there: 1,475 Tasks. Reducing that to 5 days (or even just a week) would bring that number down by two thirds (or more).
Out of all of my Tasks (In Progress, Pending, Inconclusive, Valid, Invalid, Error), that would reduce the total number by 20%. Taking 20% off the Main server database load would have to give a good boost to its performance: as fast as SSDs are compared to HDDs, RAM is much faster again than SSDs. The greater the portion of the database that can be cached in RAM, the better things will perform.
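Grant's estimate can be reproduced with a simple proportional model. This minimal Python sketch assumes Validated tasks accumulate at a roughly constant rate, so the number kept scales with the retention window. The 1,475 tasks over about 3 weeks are his figures; everything else follows from that assumption.

```python
def kept_after_purge(current: int, current_days: float, new_days: float) -> float:
    """Tasks still listed if retention shrinks from current_days to new_days.

    Assumes tasks are validated at a steady rate across the window.
    """
    return current * new_days / current_days


def reduction_share(current_days: float, new_days: float) -> float:
    """Fraction of retained Validated tasks that would disappear."""
    return 1 - new_days / current_days


if __name__ == "__main__":
    VALID_TASKS = 1_475  # Grant's Task list, about 3 weeks' worth
    WINDOW_DAYS = 21
    for days in (7, 5):
        kept = kept_after_purge(VALID_TASKS, WINDOW_DAYS, days)
        share = reduction_share(WINDOW_DAYS, days)
        print(f"keep {days} days: ~{kept:.0f} tasks kept, {share:.0%} fewer")
```

A one-week window removes about two thirds of those tasks (roughly 983 of 1,475), and a five-day window about three quarters, which matches his "two thirds (or more)".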



Krzysztof 'krzyszp' Piszczek wrote:
Anyway, we will probably upgrade our systems with a new grant in about 2 years (obviously only if we get it, but with the current project results I strongly believe we will).
Good luck with that, fingers crossed.
It's the usual chicken & egg problem- you need the results to get the funding for the hardware, but to get the results you need the hardware to produce those results...

Thanks for the info.
Grant
Darwin NT
xii5ku
Joined: 9 Nov 17
Posts: 18
Credit: 554,384,333
RAC: 2,433
Message 5415 - Posted: 13 May 2022, 21:46:42 UTC - in response to Message 5411.  

Grant (SSSF) wrote:
Several times a day i just keep hitting Retry pending transfers till they all clear, then Update till the Scheduler finally doesn't error out & doesn't give a "Project has no available Tasks" response, then hit Retry pending transfers again till all of the downloads have finally come through.
Then i let it be for 6-10 hours, and then start hitting Retry pending transfers all over again...
There are rather obvious alternatives to what you say you are doing.

a) The steps which you describe are simple mechanical steps. You could let a computer do them for you. Computers excel at repetitive trivial tasks.

b) There are so many other DC projects out there waiting for your computer capacity. If you like U@h much more than the other projects, no problem, the duration of the Pentathlon's 'Obstacle Run' at Universe@home is public. Things will be back to normal at U@h one or two days after the competition.
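Suggestion (a) could look something like the Python sketch below, which drives the local BOINC client through its `boinccmd` command-line tool. It assumes `boinccmd` is on the PATH and can reach the client (you may need to run it from the BOINC data directory or pass `--passwd`). `--network_available` (retry deferred network communication) is used as the closest equivalent of "Retry pending transfers", and `--project URL update` stands in for the Update button. The project URL is a hypothetical placeholder. Scripting requests to an overloaded server is not obviously kind to it, so the interval is deliberately long.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical placeholder: use the project URL shown by your own BOINC client.
PROJECT_URL = "https://example.invalid/universe/"

Runner = Callable[[Sequence[str]], str]


def boinccmd(args: Sequence[str]) -> str:
    """Run one boinccmd command against the local client and return its output."""
    result = subprocess.run(["boinccmd", *args], capture_output=True, text=True)
    return result.stdout


def nudge_once(project_url: str, run: Runner = boinccmd) -> list[list[str]]:
    """One pass of the manual routine: retry transfers, ask for work, retry again."""
    steps = [
        ["--network_available"],               # retry deferred network communication
        ["--project", project_url, "update"],  # roughly the Update button
        ["--network_available"],               # pick up any newly assigned downloads
    ]
    for step in steps:
        run(step)
    return steps


def nudge_loop(project_url: str, rounds: int, interval_s: float,
               run: Runner = boinccmd,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Repeat nudge_once `rounds` times, pausing interval_s seconds in between."""
    for i in range(rounds):
        nudge_once(project_url, run)
        if i < rounds - 1:
            sleep(interval_s)
    return rounds


if __name__ == "__main__":
    # Hypothetical schedule: every 30 minutes for 12 hours.
    nudge_loop(PROJECT_URL, rounds=24, interval_s=30 * 60)
```

The `run` and `sleep` parameters are there so the routine can be dry-run without touching a real client.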
Grant (SSSF)
Joined: 23 Apr 22
Posts: 133
Credit: 56,533,333
RAC: 104,642
Message 5416 - Posted: 14 May 2022, 0:13:10 UTC

The In Progress count is rapidly climbing again, and uploads & downloads now work, but the usual Scheduler response to work requests is once again "Project has no Tasks available".
Grant
Darwin NT
Keith Myers
Joined: 10 May 20
Posts: 299
Credit: 4,131,262,700
RAC: 4,667,293
Message 5417 - Posted: 14 May 2022, 1:10:20 UTC

Einstein@Home is unreachable now.

A proud member of the OFA (Old Farts Association)
Grant (SSSF)
Joined: 23 Apr 22
Posts: 133
Credit: 56,533,333
RAC: 104,642
Message 5418 - Posted: 14 May 2022, 5:44:47 UTC - in response to Message 5416.  

Grant (SSSF) wrote:
In progress rapidly climbing again, and so uploads & downloads now work, but the usual Scheduler response to work requests now is "Project has no Tasks available" once again.

I spoke too soon: file transfers are still having issues. Not as much as they were, but still a problem.
Grant
Darwin NT
frankhagen
Joined: 12 Aug 17
Posts: 21
Credit: 58,957,280
RAC: 0
Message 5419 - Posted: 14 May 2022, 6:24:22 UTC - in response to Message 5417.  

Keith Myers wrote:
Einstein unreachable now.


and another one bites the dust...

SG should be banned for creating this havoc each and every year, but that's just my 5 cents.

Copyright © 2023 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek