Server Thread
Author | Message |
---|---|
Joined: 18 Jul 17 Posts: 138 Credit: 1,379,173,617 RAC: 0 |
Uploads seem to be running very slowly or backed up. Not seeing any "project backoff". Tom M A proud member of the OFA (Old Farts Assoc.) |
Joined: 23 Apr 22 Posts: 167 Credit: 69,772,000 RAC: 0 |
Yep, we're back to uploads & downloads not happening and Scheduler requests timing out, failing to receive data from the peer, couldn't connect to server etc, etc responses again. Makes a change from the "Project has no Tasks available" responses, even when it's got millions. Looking at the graphs, it's probably a result of all the tasks that people have been hoarding now being dumped back on to the server. Grant Darwin NT |
Joined: 13 Mar 22 Posts: 4 Credit: 676,000 RAC: 0 |
I have to think that the project admins here had no idea what the actual load on the server would be when they agreed to this pentathlon. I finally got (after close to 2 days of trying) 1 task in the early hours this morning, which finished fine. It's been trying to upload, and it can't even upload a 20 byte file! Oh well, even if a lot of regulars are pissed, at least the project is getting a lot of work done. I hope. |
Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
In the last 24h the server received: successfully finished tasks: 1'171'604; tasks with computation errors (and other errors): 17'335; tasks where the host didn't contact the server: 2'520; validation errors: 67; orphaned tasks: 2'160. I didn't expect that these numbers were even possible on our server. Now we have some problems with the disk subsystem (even the SSDs are a bit too slow to manage this quantity of files). Krzysztof 'krzyszp' Piszczek Member of Radioactive@Home team My Patreon profile Universe@Home on YT |
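For a sense of scale, the counts in the post above can be turned into shares and a per-second rate. This is only a back-of-the-envelope sketch in Python: it assumes the five categories do not overlap and together cover everything the server received in that window, which the post does not actually state, and the dictionary keys are made-up labels.

```python
# Counts quoted in the post above for a 24-hour window.
# Assumption: the categories are disjoint and cover all results.
outcomes = {
    "success": 1_171_604,
    "computation_error": 17_335,
    "no_contact": 2_520,
    "validation_error": 67,
    "orphaned": 2_160,
}

total = sum(outcomes.values())
shares = {name: count / total for name, count in outcomes.items()}
tasks_per_second = total / (24 * 60 * 60)  # averaged over the whole day

for name, share in shares.items():
    print(f"{name:>18}: {share:.2%}")
print(f"total results: {total:,} (~{tasks_per_second:.1f} per second)")
```

The per-second figure is a daily average; the real load during a competition is likely to be much burstier than that.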
Joined: 10 May 20 Posts: 310 Credit: 4,733,484,700 RAC: 0 |
Uploads seem to be running very slowly or backed up. Not seeing any "project backoff". I'm seeing "project backoff" on each of many hundreds of uploads. A proud member of the OFA (Old Farts Association) |
Joined: 18 Jul 17 Posts: 138 Credit: 1,379,173,617 RAC: 0 |
Uploads seem to be running very slowly or backed up. Not seeing any "project backoff". Sigh, just looked. I too am having project backoff of uploads. Hopefully none of the data we have processed gets lost in this. Tom M A proud member of the OFA (Old Farts Assoc.) |
Joined: 17 Jun 16 Posts: 3 Credit: 37,189,367 RAC: 0 |
You can definitely see the effects of the pent: https://grafana.kiska.pw/d/boinc/boinc?orgId=1&from=now-14d&to=now |
Joined: 18 Jul 17 Posts: 138 Credit: 1,379,173,617 RAC: 0 |
There oughta be a way to throttle the upload/download volume so that when we connect we actually have some resources available to upload/download files. Bet there is something in a router someplace... Tom M A proud member of the OFA (Old Farts Assoc.) |
Joined: 19 Dec 21 Posts: 2 Credit: 135,613,333 RAC: 0 |
Also seeing my uploads backing off. Hopefully we get unblocked soon :) |
Joined: 23 Apr 22 Posts: 167 Credit: 69,772,000 RAC: 0 |
"In the last 24h the server received:" Are they SATA/SCSI or NVMe SSDs? I'd say it's all to do with the limitations that the project is working with. Most other projects have multiple servers: one for a separate science database, so the size of the active database can be kept as small as practical (I've still got Valid tasks in my Task list from April 24; the Scheduler etc. has to deal with a database that has all those Tasks still in it); more than one download server to help balance extreme loads; a separate server for the web pages, forums, accounts etc.; and one heavy-duty server for the hard work: the Scheduler, feeder, transitioner, validator etc. And what probably has the biggest impact of all here at Universe@home: a single Task produces 6 Result files (not a single Result file), which means a huge amount of work for the active database when results are returned, and when the completed Task is reported. I'm guessing this was done to reduce the work that would be needed to get the necessary result data out of a single Result file? But the cost is the 6-fold increase in the load on the upload server and on the active database, over having a single Result file returned per Task. Anyone got some spare cash for an extra big & beefy database server and an AFA (All Flash Array) storage server? Or care to donate them? Grant Darwin NT |
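To put a rough number on the 6-fold point, the daily count of successful tasks quoted earlier in the thread can be multiplied out into uploaded files per second. A minimal sketch, assuming uploads were spread evenly over the day (they almost certainly were not) and that every successful task returned exactly six files; the function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
successful_tasks_per_day = 1_171_604  # figure from the admin's post in this thread


def uploads_per_second(tasks_per_day: int, files_per_task: int) -> float:
    """Average number of result files arriving per second."""
    return tasks_per_day * files_per_task / SECONDS_PER_DAY


six_files = uploads_per_second(successful_tasks_per_day, 6)
one_file = uploads_per_second(successful_tasks_per_day, 1)

print(f"6 files per task: ~{six_files:.0f} uploads/s")
print(f"1 file per task:  ~{one_file:.0f} uploads/s")
```

Even as an average, that is the difference between roughly a dozen and roughly eighty file writes every second on the upload path.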
Joined: 23 Mar 16 Posts: 96 Credit: 23,431,842 RAC: 0 |
Zipping the files together would reduce the network overhead, and wouldn't cost a lot of spare cash :). Could that be done with reasonable ease? I wish the people who want to play would find another game that doesn't cause problems for everyone. |
Joined: 18 Jul 17 Posts: 138 Credit: 1,379,173,617 RAC: 0 |
My main U@H cruncher has run out of tasks :( A proud member of the OFA (Old Farts Assoc.) |
Joined: 23 Apr 22 Posts: 167 Credit: 69,772,000 RAC: 0 |
Several times a day I just keep hitting Retry pending transfers till they all clear, then Update till the Scheduler finally doesn't error out & doesn't give a "Project has no available Tasks" response, then hit Retry pending transfers again till all of the downloads have finally come through. Then I let it be for 6-10 hours, and then start hitting Retry pending transfers all over again... I'll be glad when the stupidity is over with (till the next time). Grant Darwin NT |
Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
Universe@Home has 3 servers: 1. Main server 2. Database server 3. Storage server. On the main server the SSDs are SATA; there are no NVMe ports at all, as this is an approximately 7-year-old Xeon 1230 machine. The database server is an NVMe machine and it must be quick, as it does a lot more than just serve the project database: it does most of the work on the computation results received from volunteers. The storage server is already used to store all results and is also used to manipulate results with other software, where result files have to be sorted, analysed etc. This is what our budget was enough to buy, and usually it is quite enough to do all tasks smoothly. The BOINC engine itself does not allow zipping all the files together on the client side. Obviously we could send volunteers another executable file which would pack the result files together, but... this doesn't make any sense, as the files must be unpacked again on the server, which would cause more disk operations and higher overall server load (individual result files are already zipped on the client side, though). Anyway, we will probably upgrade the systems with a new grant in about 2 years (obviously if we get it, but with the current project results I strongly believe that we will). Krzysztof 'krzyszp' Piszczek Member of Radioactive@Home team My Patreon profile Universe@Home on YT |
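The trade-off described above (one upload instead of six, but every file written to disk again when the archive is unpacked) can be illustrated with Python's standard `zipfile` module. This is not how BOINC or the project handles results; the file names, the `bundle`/`unbundle` helpers and the six dummy payloads are all hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle(files, archive):
    """Client side: pack several result files into one archive."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)  # store by bare file name
    return archive


def unbundle(archive, dest):
    """Server side: every file has to be written to disk a second time."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return sorted(zf.namelist())


def demo(n_files=6):
    """Round-trip dummy results; return (archives uploaded, files unpacked, intact)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        results = []
        for i in range(n_files):
            path = root / f"result_{i}.dat"  # hypothetical file names
            path.write_text(f"payload {i}\n")
            results.append(path)
        archive = bundle(results, root / "results.zip")
        unpacked = root / "unpacked"
        unpacked.mkdir()
        names = unbundle(archive, unpacked)
        intact = all(
            (unpacked / p.name).read_text() == p.read_text() for p in results
        )
        return 1, len(names), intact
```

Calling `demo()` shows the shape of the argument: the network sees one transfer, but the server still ends up writing the archive plus all six extracted files.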
Joined: 23 Apr 22 Posts: 167 Credit: 69,772,000 RAC: 0 |
"Universe@Home has 3 servers:" I can easily see a 7-year-old Xeon, even with SSDs, struggling with the current workload: low core/thread count, low clock speeds, and low maximum possible RAM for caching even when maxed out, compared to even a 2-year-old system, let alone what current hardware is capable of. As there is a separate Science database server, reducing the time Validated Tasks are kept before being transitioned from the Main server to the Science database would, I expect, provide a big boost to the Main server's performance. On my Task list, I've presently got 3 weeks' worth there; that's 1,475 Tasks. Reducing that to 5 days (or even just a week) would bring that number down by 2/3rds (or more). Out of all of my Tasks (In Progress, Pending, Inconclusive, Valid, Invalid, Error), that would reduce the total number by 20%. Taking 20% off of the Main server database load would have to result in a good boost in its performance; as fast as SSDs are compared to HDDs, RAM is so much faster again than SSDs. The greater the portion of the database that can be cached in RAM, the better things will perform. "Anyway, we will probably upgrade the systems with a new grant in about 2 years (obviously if we get it, but with the current project results I strongly believe that we will)." Good luck with that, fingers crossed. It's the usual chicken & egg problem: you need the results to get the funding for the hardware, but to get the results you need the hardware to produce those results... Thanks for the info. Grant Darwin NT |
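The "2/3rds (or more)" estimate above can be checked with a couple of lines of arithmetic. A rough sketch, assuming the 1,475 Tasks are the Valid ones and that they were validated at a steady daily rate over the 21 days; the `remaining` helper is just for illustration.

```python
valid_tasks_now = 1_475   # Tasks listed in the post above
retention_days_now = 21   # "3 weeks' worth"


def remaining(valid_tasks: int, days_now: int, days_new: int) -> float:
    """Valid tasks still listed after shortening retention (steady-rate assumption)."""
    return valid_tasks * days_new / days_now


for days in (5, 7):
    left = remaining(valid_tasks_now, retention_days_now, days)
    cut = 1 - left / valid_tasks_now
    print(f"{days} days: ~{left:.0f} Valid tasks kept, {cut:.0%} fewer")
```

Under those assumptions a one-week window removes two thirds of the listed Valid tasks and a 5-day window a little over three quarters, which is consistent with the figures in the post.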
Joined: 9 Nov 17 Posts: 21 Credit: 563,207,000 RAC: 0 |
Grant (SSSF) wrote: Several times a day I just keep hitting Retry pending transfers till they all clear, then Update till the Scheduler finally doesn't error out & doesn't give a "Project has no available Tasks" response, then hit Retry pending transfers again till all of the downloads have finally come through. There are rather obvious alternatives to what you say you are doing. a) The steps which you describe are simple mechanical steps. You could let a computer do them for you. Computers excel at repetitive trivial tasks. b) There are so many other DC projects out there waiting for your computer capacity. If you like U@h much more than the other projects, no problem, the duration of the Pentathlon's 'Obstacle Run' at Universe@home is public. Things will be back to normal at U@h one or two days after the competition. |
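Option a) might look something like the sketch below, which shells out to the `boinccmd` command-line tool that ships with the BOINC client. As far as I know, `--network_available` asks the client to retry deferred network activity (the command-line counterpart of "Retry pending transfers") and `--project URL update` triggers a scheduler request. The project URL and the 30-minute interval are assumptions; use the URL your own client lists for the project, and keep the interval long so the script does not add to the server's load.

```python
import subprocess
import time

# Assumed project URL; check the one shown in your own BOINC client.
PROJECT_URL = "https://universeathome.pl/universe/"


def retry_transfers_cmd():
    """Command asking the client to retry deferred network communication."""
    return ["boinccmd", "--network_available"]


def update_project_cmd(url):
    """Command asking the client to contact the project's scheduler."""
    return ["boinccmd", "--project", url, "update"]


def run_once(url, runner=subprocess.run):
    """Issue both commands once; failures are ignored and retried next round."""
    for cmd in (retry_transfers_cmd(), update_project_cmd(url)):
        runner(cmd, check=False)


def main(interval_s=1800):
    """Repeat forever with a long pause between rounds."""
    while True:
        run_once(PROJECT_URL)
        time.sleep(interval_s)
```

Calling `main()` starts the loop. The `runner` parameter exists only so the command sequence can be inspected without a BOINC client installed.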
Joined: 23 Apr 22 Posts: 167 Credit: 69,772,000 RAC: 0 |
In progress rapidly climbing again, and so uploads & downloads now work, but the usual Scheduler response to work requests now is "Project has no Tasks available" once again. Grant Darwin NT |
Joined: 10 May 20 Posts: 310 Credit: 4,733,484,700 RAC: 0 |
Einstein unreachable now. A proud member of the OFA (Old Farts Association) |
Joined: 23 Apr 22 Posts: 167 Credit: 69,772,000 RAC: 0 |
In progress rapidly climbing again, and so uploads & downloads now work, but the usual Scheduler response to work requests now is "Project has no Tasks available" once again. I spoke too soon- file transfers are still having issues. Not as much as they were, but still a problem. Grant Darwin NT |
Joined: 12 Aug 17 Posts: 21 Credit: 58,957,280 RAC: 0 |
Einstein unreachable now. And another one bites the dust... SG should be banned for creating this havoc each and every year - but that's just my 5 cents. |