Message boards : Number crunching : Upload server is straining under the load

Keith Myers
Joined: 10 May 20
Posts: 141
Credit: 1,679,138,700
RAC: 2,851,044
Message 4343 - Posted: 7 Jun 2020, 3:54:35 UTC

Has anyone else noticed that the upload server is taking a long time to complete uploads? The website isn't all that fast right now either.

I guess it's the effect of the added Seti orphans again.

A proud member of the OFA (Old Farts Association)
Jim1348
Joined: 28 Feb 15
Posts: 236
Credit: 181,887,914
RAC: 197,437
Message 4344 - Posted: 7 Jun 2020, 15:48:36 UTC - in response to Message 4343.  
Last modified: 7 Jun 2020, 15:52:02 UTC

There were fewer than 2,000 users a few months ago (around 1,850 in fact), and now there are well over 3,000 if you add BHspin + ULX.
I have laid off Universe for a while, so that you can do them.
Keith Myers
Joined: 10 May 20
Posts: 141
Credit: 1,679,138,700
RAC: 2,851,044
Message 4345 - Posted: 7 Jun 2020, 18:38:30 UTC

I'm going to stick with the project. It just takes a little bit of extra manual intervention to periodically look for stalled uploads and get them moving. I could implement the same script I was using for Seti if it really starts to be a big problem.
I just like that the 3950X does so well on both applications compared to any other architecture. The apps really like the enhanced FPU of Zen 2. I just don't see similar gains on any of my other CPU projects like Einstein or Rosetta. I guess whatever math functions the Universe apps call just play into Zen 2's wheelhouse, and the other projects' apps don't use the same instructions.
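A minimal sketch of the kind of stalled-upload check mentioned above; it is not the script Keith actually used for Seti. It assumes the transfer list has already been polled twice and reduced to hypothetical snapshots mapping (project URL, file name) to bytes transferred; obtaining those from `boinccmd --get_file_transfers` is left out because that command's output format isn't shown in this thread. The retry itself uses the `boinccmd --file_transfer URL filename retry` operation. The names `find_stalled`, `retry_command` and `retry_stalled` are illustrative only.

```python
import subprocess
from typing import Dict, List, Tuple

# Hypothetical snapshot format: (project_url, file_name) -> bytes transferred so far.
Snapshot = Dict[Tuple[str, str], float]


def find_stalled(previous: Snapshot, current: Snapshot,
                 min_progress_bytes: float = 1.0) -> List[Tuple[str, str]]:
    """Return transfers seen in both snapshots that made (almost) no progress.

    A transfer whose byte count dropped (restarted from 0%) also counts as stalled.
    """
    stalled = []
    for key, now_bytes in current.items():
        before = previous.get(key)
        if before is not None and now_bytes - before < min_progress_bytes:
            stalled.append(key)
    return sorted(stalled)


def retry_command(project_url: str, file_name: str) -> List[str]:
    """Build the boinccmd call that asks the local client to retry one transfer now."""
    return ["boinccmd", "--file_transfer", project_url, file_name, "retry"]


def retry_stalled(previous: Snapshot, current: Snapshot,
                  dry_run: bool = True) -> List[List[str]]:
    """Return (and, unless dry_run, execute) retry commands for stalled transfers."""
    commands = [retry_command(url, name) for url, name in find_stalled(previous, current)]
    if not dry_run:
        for cmd in commands:
            # Needs a running BOINC client that accepts local GUI RPC connections.
            subprocess.run(cmd, check=False)
    return commands
```

Keeping `dry_run=True` as the default lets the check report what it would do before it touches the client; running it from a periodic job with a sleep between the two snapshots would be one way to automate it.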

A proud member of the OFA (Old Farts Association)
Jim1348
Joined: 28 Feb 15
Posts: 236
Credit: 181,887,914
RAC: 197,437
Message 4346 - Posted: 7 Jun 2020, 20:24:48 UTC - in response to Message 4345.  

"I just like that the 3950X does so well on both applications compared to any other architecture. The apps really like the enhanced FPU of Zen 2."

The problem I have is that the 3950X is good at everything, so I reserve it for Folding on the CPU, where I average over 600k PPD. That is unheard of for a CPU, and up in GPU territory.
But it allows me to keep going during the summer on some machines that I might otherwise have to shut down.
Keith Myers
Joined: 10 May 20
Posts: 141
Credit: 1,679,138,700
RAC: 2,851,044
Message 4347 - Posted: 8 Jun 2020, 0:12:39 UTC

I'm still sticking to BOINC white-listed projects because of Gridcoin testing.

A proud member of the OFA (Old Farts Association)
Keith Myers
Joined: 10 May 20
Posts: 141
Credit: 1,679,138,700
RAC: 2,851,044
Message 4349 - Posted: 10 Jun 2020, 4:53:43 UTC

The upload server is definitely having issues. I've noticed a repeating pattern where an upload progresses to 100% then stalls out. Then the upload restarts again from 0% and moves through to 100% again.

I believe what is happening is that the upload makes it all the way to the server but the server is too busy to send the client an ACK that the file was received correctly.

So the client sends the file again, multiple times, until it gets an ACK. If the client does not receive an ACK after three attempts, the upload goes into a backoff timer countdown. Rinse and repeat.
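A small model of the behaviour described above, treated as Keith's hypothesis rather than the BOINC client's documented logic: every attempt pushes the whole file from 0% to 100%, a round ends after three un-ACKed attempts, and the transfer then waits out a backoff. The doubling backoff with a one-hour cap (`base_backoff`, `max_backoff`) is an assumed policy for illustration; the client's real timer values are not given in this thread.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RoundResult:
    attempts: int           # full sends in this round (each goes 0% -> 100%)
    acknowledged: bool      # True once the server confirmed receipt
    bytes_sent: int         # total bytes pushed, including resends
    backoff_seconds: float  # wait before the next round (0.0 if acknowledged)


def upload_round(ack_replies: Iterable[bool], file_bytes: int,
                 max_attempts: int = 3, round_index: int = 0,
                 base_backoff: float = 60.0, max_backoff: float = 3600.0) -> RoundResult:
    """Simulate one round: send, wait for an ACK, resend until ACKed or out of attempts.

    Each item of ack_replies says whether the (possibly overloaded) server
    ACKs the corresponding attempt.
    """
    attempts = 0
    for acked in ack_replies:
        attempts += 1
        if acked:
            return RoundResult(attempts, True, attempts * file_bytes, 0.0)
        if attempts >= max_attempts:
            break
    # No ACK this round: back off, doubling per failed round up to a cap (assumed policy).
    delay = min(base_backoff * 2 ** round_index, max_backoff)
    return RoundResult(attempts, False, attempts * file_bytes, delay)
```

The point the model makes is that a missing ACK is expensive: a round that fails after three attempts has already sent the file three times, which adds to the load on the very server that was too busy to answer.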

A proud member of the OFA (Old Farts Association)
Jim1348
Joined: 28 Feb 15
Posts: 236
Credit: 181,887,914
RAC: 197,437
Message 4350 - Posted: 10 Jun 2020, 7:12:41 UTC - in response to Message 4349.  

OK, I will keep an eye out for it. I decided to put 10 cores of a Ryzen 2700 (Ubuntu 18.04.4) on BHspin v2 and ULX.
The run times I am seeing are not that bad, maybe the usual 30% slower than a Ryzen 3600, for example, though I don't have that many results yet.
But I figured out how to keep it cool in the summer: I placed it out on the porch. It is an infinite heat sink.

I wonder if it is a local effect: you have so many fast cores that you are seeing it more than usual? I would think the server only cares about the overall load, but there may be some other limit in the system.
Keith Myers
Joined: 10 May 20
Posts: 141
Credit: 1,679,138,700
RAC: 2,851,044
Message 4352 - Posted: 10 Jun 2020, 16:17:09 UTC - in response to Message 4350.  

I send just as much work back to other projects on the same systems, and they don't have as many problems. Only GPUGrid has a bit of an issue with uploads, but it has always had issues since the Seti orphan migration. I learned how to mitigate that pretty well with my project cooldown setting in my client.

But when I first started Universe, it did not have the issue; only in the last week or so has the problem become really annoying.

A proud member of the OFA (Old Farts Association)
Keith Myers
Joined: 10 May 20
Posts: 141
Credit: 1,679,138,700
RAC: 2,851,044
Message 4365 - Posted: 11 Jun 2020, 0:12:09 UTC

Krzysztof posted in the "Upload fails" thread. They only have a single gigabit connection that everything travels through. And of course the upload stalls started with the big batch of ULX jobs that cropped up. Those are big uploads, and they are the ones that most often fail several times before succeeding.

And there were no ULX jobs when I first joined, which perfectly explains the issue I am seeing. I also ended up with a big ULX/BHspin mismatch on this daily driver, with ULX taking over the cache, so I had the worst possible jobs to constantly push uphill. I've turned ULX off temporarily to work down that part of the cache, so I get a more even balance of work before turning it back on. My stalls have dropped off and the host is running much better.

I just wish I had a bigger upload pipe: only around 768 kbps on average. Sadly, my internet connection is the best I can get from any provider here.
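Rough arithmetic for the bandwidth figures in this post. The 768 kbps client uplink and the single 1 Gbit/s server link come from the thread; the 10 MB upload size is a hypothetical figure, since the actual size of a ULX result isn't stated, and protocol overhead and all other traffic on the server link are ignored.

```python
def upload_seconds(file_megabytes: float, uplink_kbps: float, attempts: int = 1) -> float:
    """Time to push a file `attempts` times over an uplink (decimal MB, kbit/s, no overhead)."""
    bits = file_megabytes * 8_000_000
    return attempts * bits / (uplink_kbps * 1_000)


def full_speed_streams(server_gbps: float, client_kbps: float) -> int:
    """How many uploads at client_kbps fit in the server link, ignoring all other traffic."""
    return int(server_gbps * 1_000_000 // client_kbps)


print(round(upload_seconds(10, 768), 1))              # 104.2 (one clean upload)
print(round(upload_seconds(10, 768, attempts=3), 1))  # 312.5 (three sends before backoff)
print(full_speed_streams(1, 768))                     # 1302
```

On these assumptions a slow uplink turns each un-ACKed resend into minutes of wasted upload time, while the server link itself only saturates once on the order of a thousand such uploads run at once; the website, downloads and faster clients all share that same link, so the real headroom would be smaller.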

A proud member of the OFA (Old Farts Association)





Copyright © 2021 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek