Message boards : Number crunching : Upload/download problems and SETIBZH/Sprint

marsinph
Joined: 22 Mar 18
Posts: 29
Credit: 24,402,488
RAC: 0
Message 4087 - Posted: 19 Mar 2020, 23:37:47 UTC

Hello, Krzysztof
What happened with uploads/downloads?
Server status seems to be OK. Transfers are only a few bytes each, and some others a few kB.
Don't forget that SETIBZH / FormulaBoinc have chosen your project for the current sprint.
Normally, results/credits are synchronised every hour during a sprint.
It has already been two hours, and nothing.

The worst part: most WUs are cancelled by the server.
Let me be very clear: WUs are returned, wait for validation, then are suddenly cancelled by the server!!!
If they are cancelled after a few seconds of crunching, no problem. But not after some hours!!!

I have returned 110 WUs. Only one validated (a BHSPIN), 109 were cancelled by the server after crunching, and 4 were "invalid" (it can happen). I had about 20 WUs "validation pending". Nice, but after some hours they went to "cancelled"!!!

If there are problems, then solve them and stop the Sprint!
In 2019 there were similar problems. Nothing has changed!
ID: 4087 · Rating: 0
Magiceye04
Joined: 21 Feb 15
Posts: 13
Credit: 147,679,556
RAC: 0
Message 4097 - Posted: 20 Mar 2020, 15:38:26 UTC - in response to Message 4087.  

Some projects support the challenges and other projects actively sabotage the sprinters.
Guess which type you have here...
ID: 4097 · Rating: 0
scole of TSBT
Joined: 22 Feb 15
Posts: 24
Credit: 250,365,396
RAC: 0
Message 4099 - Posted: 20 Mar 2020, 20:23:23 UTC - in response to Message 4097.  

That's not true. Krzysztof is very committed to the projects he supports and very thankful for the support of crunchers. The new ULX WUs create large result files and that's what's impacting the uploads and downloads. Everyone continually updating the project only makes it worse.
ID: 4099 · Rating: 0
mmonnin
Joined: 2 Jun 16
Posts: 169
Credit: 317,253,046
RAC: 0
Message 4101 - Posted: 20 Mar 2020, 21:12:48 UTC
Last modified: 20 Mar 2020, 21:13:13 UTC

ID: 4101 · Rating: 0
VietOZ
Joined: 10 Apr 18
Posts: 9
Credit: 3,112,750,341
RAC: 0
Message 4103 - Posted: 21 Mar 2020, 0:53:01 UTC - in response to Message 4101.  

People hoarding thousands of tasks for weeks increases the load on the servers.
https://stats3.free-dc.org/stats.php?page=user&proj=uni&name=56939
https://stats3.free-dc.org/stats.php?page=user&proj=uni&name=48820
https://stats3.free-dc.org/stats.php?page=user&proj=uni&name=49710


I thought you were more knowledgeable than that. What does that have to do with the server load?
FYI, the bunkers we "hoarded" had been uploaded to the server long before the Sprint started. We just waited for the right time to report them and get credits. There was no upload whatsoever when all this mess happened.
See this thread: https://universeathome.pl/universe/forum_thread.php?id=507
The admin released a new batch of ULX work, it didn't go right, so he decided to cancel it. Somehow, that also cancelled the current batch of BHSpin, which is why we got a lot of invalids ... "Completed, can't validate" ... because those WUs didn't make it to the wingman to run.
Then he put out a new batch of ULX without generating BHSpin. That left the server with only ULX work, so users were forced to download ULX work, which has big result files (5MB - 50MB).
When users started to return those results, that's when the server couldn't cope and became sluggish.

Don't stir up more hate towards people who bunker. Bunkers have been widely used for competitions, you know it ... I know it. Not everyone likes it and I respect that. But to blame this sheit on us is kinda low. There are ways to do it; if you do it right ... the server hardly feels any kind of hit at all.
So we released our bunkers already; why is the server still sluggish? Explain.
ID: 4103 · Rating: 0
PecosRiverM
Joined: 4 May 18
Posts: 15
Credit: 5,647,520,667
RAC: 0
Message 4104 - Posted: 21 Mar 2020, 1:28:46 UTC - in response to Message 4103.  

If I was part of the problem then I must ask:
Why on 11 March when I dropped a much larger bunker there wasn't any problem?

But then I've got Big Shoulders so I'll take the blame.
Everything is BIGGER In Texas you know. ;-)
ID: 4104 · Rating: 0
davidBAM
Joined: 18 Oct 18
Posts: 6
Credit: 1,324,610,333
RAC: 0
Message 4105 - Posted: 21 Mar 2020, 2:18:06 UTC - in response to Message 4101.  

People hoarding thousands of tasks for weeks increases the load on the servers.
https://stats3.free-dc.org/stats.php?page=user&proj=uni&name=56939
https://stats3.free-dc.org/stats.php?page=user&proj=uni&name=48820
https://stats3.free-dc.org/stats.php?page=user&proj=uni&name=49710


Well, I don't know about you, but I tend to report WUs only after a sprint actually STARTS. Do you not think it more likely that hundreds of attaches and thousands of downloads when the sprint was ANNOUNCED were equally likely to deplete resources and saturate bandwidth?

Anyhow, as VietOZ says, most completed WUs are uploaded long in advance of even the announcement.

Oh, and if you got any of the ULX work units, you may have noticed that some of the result files are 5MB, 50MB and larger.

So sir, next time you try to 'name and shame', try to research a bit better.
ID: 4105 · Rating: 0
Magiceye04
Joined: 21 Feb 15
Posts: 13
Credit: 147,679,556
RAC: 0
Message 4107 - Posted: 21 Mar 2020, 12:59:44 UTC

Wouldn't it be a good idea to limit the number of WUs per CPU core and per host?
E.g. SETI@Home only gives out 100/150 WUs per CPU or GPU.
For people running 24/7 this makes no difference, but it could reduce extremely large bunkers.
ID: 4107 · Rating: 0
VietOZ
Joined: 10 Apr 18
Posts: 9
Credit: 3,112,750,341
RAC: 0
Message 4108 - Posted: 21 Mar 2020, 14:05:31 UTC - in response to Message 4107.  

Wouldn't it be a good idea to limit the number of WUs per CPU core and per host?
E.g. SETI@Home only gives out 100/150 WUs per CPU or GPU.
For people running 24/7 this makes no difference, but it could reduce extremely large bunkers.


There are still ways around it. The top SETI guys can get up to 6k WUs per instance. And they all had to do it to avoid the downtime every Tuesday.
Limiting WUs only impacts the regular users, not the bunker guys (note: they also run 24/7). Why?
If the server has a problem or needs downtime for maintenance, these regular users won't have enough WUs, while the bunker guys are still working through their cache. Bunkers always find ways to bunker. A limit only hurts the regular users and makes things more inconvenient.
ID: 4108 · Rating: 0
mmonnin
Joined: 2 Jun 16
Posts: 169
Credit: 317,253,046
RAC: 0
Message 4109 - Posted: 21 Mar 2020, 14:19:26 UTC
Last modified: 21 Mar 2020, 14:27:08 UTC

Do you both think that the data upload is the only processing that happens with tasks? BOINC servers keep track of the tasks, who has them, etc. If you actually watch the attempted uploads, the progress bar completes. Some of the files are only 20 bytes, so it's not the bandwidth that is the issue here but the number of connections and database transactions going on in the background. The website is loading fine, another clue that it's not a bandwidth issue. Database performance seems to be the issue, and you're both just attempting to justify your e-peen measurement contest at the expense of the project and everyone else. Do your own ounce of research.

emoga was temp banned at a project for doing the same thing, so it's not exactly appreciated by admins.

Magiceye04: More clients can always be set up to bypass a tasks-per-core limit.
ID: 4109 · Rating: 0
bluestang
Joined: 10 Oct 19
Posts: 5
Credit: 100,594,667
RAC: 0
Message 4110 - Posted: 21 Mar 2020, 15:09:24 UTC
Last modified: 21 Mar 2020, 15:09:42 UTC

Just made some popcorn for myself. But I have some cheese for you to go with your whine :)
ID: 4110 · Rating: 0
VietOZ
Joined: 10 Apr 18
Posts: 9
Credit: 3,112,750,341
RAC: 0
Message 4111 - Posted: 21 Mar 2020, 15:36:15 UTC - in response to Message 4109.  

Some of the files are only 20 bytes so its not the bandwidth that is the issue here but the # of connections and database transactions going on in the background. The website is loading fine, another clue its not a bandwidth issue.


https://universeathome.pl/universe/server_status.php

Look at that page and look at what you've just said. Tell me what's wrong?
    Download server
    Upload server
    Scheduler
    feeder


Also explain why the server is still slow at this point, when all the "hoarded" tasks had been credited almost 24 hours ago?

ID: 4111 · Rating: 0
davidBAM
Joined: 18 Oct 18
Posts: 6
Credit: 1,324,610,333
RAC: 0
Message 4112 - Posted: 21 Mar 2020, 15:49:23 UTC - in response to Message 4109.  
Last modified: 21 Mar 2020, 15:50:02 UTC

@mmonnin

People who live in glass houses shouldn't throw stones: https://en.wikipedia.org/wiki/Those_who_live_in_glass_houses_should_not_throw_stones

I can see that you had speculatively bunkered on Universe yourself for your own team in last week's Formula Boinc sprint. Uploaded them on 11-Mar, presumably when WCG was announced. Then nothing but pendings for a week until you dropped again on 19-Mar. Funny that, eh?

So, would you have been posting your crap in this thread if your own e-peen had been boosted by bunkers for the correct project last week? I think not.

PS: I am struggling to understand why you say "at the expense of the project". I'm sure any project that doesn't want the huge points boost from the Formula Boinc circus coming to town need only contact Seb.
ID: 4112 · Rating: 0
Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 847
Credit: 144,180,465
RAC: 0
Message 4113 - Posted: 21 Mar 2020, 18:07:26 UTC

First, I apologise for the recent problems. There were two of them:
1. I released a new batch of ULX work units, which generate large result files AND large temporary files. At first I found only the first issue and cancelled the tasks when I noticed what was happening. Then I made changes to the outgoing template, generated a second batch and was happy... Wrong. The second issue left loads of crunchers unable to send back results.
I had been told that this application generates 25MB of result data and had created a result template allowing up to 50MB; unfortunately some of the tasks had 66-70MB of data...
Finally, I cancelled most of the tasks.

2. The second problem was the size of the results relative to bandwidth. We have a 1GB/s symmetric line (approx. 250MB/s) used for the server and for the NFS connection to the storage machine. This gives us about 100MB/s of usable bandwidth for results. After a few simple calculations I realised that this is not enough to serve a large number of connections with this size of data.
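
The kind of simple calculation meant here can be sketched roughly as follows. This is only an illustration: it takes the ~100MB/s usable figure and the 25-70MB result sizes from this post, and assumes that the bandwidth is shared equally between simultaneous uploads with no protocol or disk overhead. The concurrency levels and the function names are hypothetical, chosen for the example and not measured on the server.

```python
# Rough upload-capacity estimate. Assumptions (not project measurements):
# equal sharing of bandwidth between uploads, no protocol or disk overhead.

USABLE_MB_PER_S = 100.0  # usable bandwidth for results, figure from the post


def upload_time_s(result_mb: float, concurrent_uploads: int,
                  usable_mb_per_s: float = USABLE_MB_PER_S) -> float:
    """Seconds for one result to upload while sharing the line equally."""
    return result_mb * concurrent_uploads / usable_mb_per_s


def max_results_per_hour(result_mb: float,
                         usable_mb_per_s: float = USABLE_MB_PER_S) -> float:
    """Upper bound on the number of results the line can accept per hour."""
    return usable_mb_per_s * 3600 / result_mb


if __name__ == "__main__":
    for result_mb in (25, 50, 70):       # result sizes mentioned in the post
        for uploads in (100, 1000):      # hypothetical concurrency levels
            t = upload_time_s(result_mb, uploads)
            print(f"{result_mb} MB x {uploads} uploads: {t:.0f} s each, "
                  f"at most {max_results_per_hour(result_mb):.0f} results/hour")
```

Under these assumptions, 1000 simultaneous uploads of 50MB results would need about 500 seconds each, and the line could accept at most 7200 such results per hour.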

Finally.
Next week I will grant credit to all users who finished tasks but were not granted credit because of the task cancellation or the not-enough-space problem on their disk. I first need to deal with some tasks on the server which are already started.
Krzysztof 'krzyszp' Piszczek

Member of Radioactive@Home team
My Patreon profile
Universe@Home on YT
ID: 4113 · Rating: 0
VietOZ
Joined: 10 Apr 18
Posts: 9
Credit: 3,112,750,341
RAC: 0
Message 4114 - Posted: 21 Mar 2020, 19:05:42 UTC - in response to Message 4113.  

No worries Krzysztof! Sometimes sheit happens.
I appreciate the explanation, and for me that's important. I wasn't gonna post anything and just crunch on. But if someone pins sheit on me and my friends, I ain't gonna stay quiet.
ID: 4114 · Rating: 0
PecosRiverM
Joined: 4 May 18
Posts: 15
Credit: 5,647,520,667
RAC: 0
Message 4115 - Posted: 21 Mar 2020, 20:13:12 UTC - in response to Message 4114.  

No it's all my fault..

I shouldn't have added that 14.4k modem on top of the 28.8k one.
I feel bad that I uploaded too many wu's. but I've still got more wu's to do.

I should fire up the other 10 systems and do some more.
ID: 4115 · Rating: 0
marsinph
Joined: 22 Mar 18
Posts: 29
Credit: 24,402,488
RAC: 0
Message 4117 - Posted: 21 Mar 2020, 23:05:40 UTC - in response to Message 4113.  

Hello
I distinguish between the different problems.
My apologies, but it seems Universe has nothing but problems.

So, in reading order: ULX. Almost nobody takes those WUs.
I do not know why everyone is talking about uploads/downloads of megabytes. For me, all my returned and validated WUs are a few bytes, I repeat, a few bytes. To be very clear: a few hundred bits. The biggest are about one kilobyte.

The server can handle about one giga, so 0.0000001%.
I do not think there are about one million users on this project!

Again, all my WUs (down and up) are a few bytes. Admin, where do you find WU results of 25MB or 50MB???
The biggest transfer was at project initialization (5MB).
After that, WUs are a few bytes, I repeat, a few bytes. Not kB, not MB. Only bytes.
My latest result sent and validated: 134 bytes (yes: 1072 bits, plus TCP/IP protocol overhead).
Your connection is unable to manage a few bits???
You have only 1Gbps; then you should be able to manage about one million connections at the same time!!!
Let us reduce that, because of network problems, to 500k each second!!!


Then, Krzysztof, you explain that it will be fixed next week. OK so far.
But you agreed that your project would participate in SETIBZH.
This competition ends on 22 March, 20:59 UTC.
And you will try to solve it afterwards???
Let me remind you that in 2018 and 2019 it was the same problem.
You produce new WUs (ULX), but your server cancels them.
I am sure you do your best. I am sure your scientists need computing power.
Just look at the number of users, year after year...
Make your project attractive again...




ID: 4117 · Rating: 0
scole of TSBT
Joined: 22 Feb 15
Posts: 24
Credit: 250,365,396
RAC: 0
Message 4118 - Posted: 22 Mar 2020, 0:43:12 UTC

krzyszp, don't worry, nobody else understands what marsinph rants about either. Thank you for your commitment to your projects. If I ever hit the Powerball jackpot, you'll be one of the first beneficiaries.
ID: 4118 · Rating: 0
Magiceye04
Joined: 21 Feb 15
Posts: 13
Credit: 147,679,556
RAC: 0
Message 4121 - Posted: 22 Mar 2020, 10:20:41 UTC

Then I want to apologize for misunderstanding the comments about hamsters.
At the moment a lot of "hamsters" buy masses of food, toilet paper etc. - I thought the hamsters here meant users who grab all the WUs they can get from the servers.
Sorry...
ID: 4121 · Rating: 0
Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 847
Credit: 144,180,465
RAC: 0
Message 4123 - Posted: 22 Mar 2020, 18:34:09 UTC - in response to Message 4121.  

I have added credits for volunteers with problematic WUs.
Krzysztof 'krzyszp' Piszczek

Member of Radioactive@Home team
My Patreon profile
Universe@Home on YT
ID: 4123 · Rating: 0

Copyright © 2024 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek