Message boards : Number crunching : Upload fails

mikey
Joined: 4 Apr 15
Posts: 46
Credit: 43,128,567
RAC: 0
Message 4273 - Posted: 9 May 2020, 2:30:16 UTC - in response to Message 4262.  

2,537.91 2,517.00 100.00 Universe ULX v0.15 x86_64-pc-linux-gnu

4,756.02 4,756.02 pending Universe BHspin v2 v0.19 x86_64-pc-linux-gnu

You are sorta right...my BH tasks DO take longer than my ULX tasks but only twice as long


I'm exactly right for my tasks. We may be looking at different versions of ULX, Krzysztof said he'd made them smaller. Or I'm looking at a different CPU - I have 4 wildly different computers, I just glanced through my completed tasks on the server and took a rough average.

This is using a NVIDIA GeForce GTX 1080 Ti (4095MB) driver: 390.13 OpenCL: 1.2


Using a what?! I thought Universe was CPU only? Just checked my preferences on the site, there is an option for AMD, but I don't think it does anything. I've switched it on just in case! But there's no Nvidia option.


YES YOU ARE RIGHT... I guess I'm having a VERY bad couple of days at posting. I am NOT using my GPUs here, ONLY my CPU cores!!!
ID: 4273 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
mikey
Joined: 4 Apr 15
Posts: 46
Credit: 43,128,567
RAC: 0
Message 4274 - Posted: 9 May 2020, 2:32:29 UTC - in response to Message 4261.  

It's too late to edit this BS I posted earlier, so just ignore it!!! The times are right BUT I am NOT using my GPUs here, I am ONLY using my CPUs!!!



2,537.91 2,517.00 100.00 Universe ULX v0.15 x86_64-pc-linux-gnu

4,756.02 4,756.02 pending Universe BHspin v2 v0.19 x86_64-pc-linux-gnu

You are sorta right...my BH tasks DO take longer than my ULX tasks but only twice as long

This is using a NVIDIA GeForce GTX 1080 Ti (4095MB) driver: 390.13 OpenCL: 1.2
ID: 4274 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
vaughan

Joined: 4 Feb 15
Posts: 7
Credit: 158,219,834
RAC: 0
Message 4275 - Posted: 9 May 2020, 7:18:25 UTC

Is there a way to automate "Retry now" so as to force the uploads?

How about an AutoIt script?
ID: 4275 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Jon Melusky
Joined: 4 Mar 16
Posts: 6
Credit: 8,342,333
RAC: 0
Message 4276 - Posted: 9 May 2020, 7:52:32 UTC

I was able to clear my upload fails. I had two fails in the transfer tab. I clicked one of them and clicked retry now. When that one WU said "Active", I quickly selected the other one and also clicked retry now for that one. Retrying them separately didn't work, but sometimes it does work for some WUs. Some of the WUs that are stuck show as very tiny so perhaps the server doesn't recognize them as WUs? Sometimes the stuck WUs are cleared from the transfer tab, but then they are moved to the tasks tab and they say Ready to Report. Other times, the stuck WUs seem to be cleared directly to the Universe servers and they don't show in the tasks tab. Not all WUs get stuck, just 2 to 4 WUs a day seem to get stuck. None of the WUs are showing as errors thankfully. (^:
ID: 4276 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
mikey
Joined: 4 Apr 15
Posts: 46
Credit: 43,128,567
RAC: 0
Message 4277 - Posted: 9 May 2020, 12:23:17 UTC - in response to Message 4275.  

Is there a way to automate "Retry now" so as to force the uploads?

How about Autoitscript?


The backoff time is designed so that clients don't keep banging on the door of the upload server, so others have a chance to get theirs in too. Years ago SETI did a test and found it took about 20 ms for the server to connect with each PC, then around 20 seconds to transfer the data, and then another 20 ms to close the port and get ready to open it for the next computer. With over half a million PCs per day there just wasn't enough time in the day for everyone to connect as often as we users wanted to, so they designed the backoff, and allowed bigger caches, to slow things down a bit. With today's always-on internet, people have smaller caches, sometimes a zero cache, so it's do a workunit, return it and get another workunit, meaning A LOT more of us trying to connect to the server. Unfortunately there are some things you just can't speed up without MUCH more expensive hardware, and most projects don't have that kind of money.
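
A rough back-of-the-envelope check of those figures, as a sketch only. The half a million hosts, the 20 ms and the 20 s come from the paragraph above. That each host uploads exactly once per day is an assumption made purely for illustration.

```python
# Figures quoted in the post; one upload per host per day is an assumption.
HOSTS_PER_DAY = 500_000
CONNECT_S = 0.020      # ~20 ms to open the connection
TRANSFER_S = 20.0      # ~20 s to transfer the data
CLOSE_S = 0.020        # ~20 ms to close the port

SECONDS_PER_DAY = 24 * 60 * 60

per_upload_s = CONNECT_S + TRANSFER_S + CLOSE_S
total_s = HOSTS_PER_DAY * per_upload_s

# How many uploads the server would have to handle at the same time
# just to get through one upload per host per day.
parallel_needed = total_s / SECONDS_PER_DAY

print(f"time per upload: {per_upload_s:.2f} s")
print(f"total upload time per day: {total_s:,.0f} s")
print(f"simultaneous uploads needed: {parallel_needed:.0f}")
```

Under these assumptions the server would need to sustain roughly 116 uploads in parallel around the clock. Handled one at a time, the same uploads would take about 116 days, which is why spreading connections out with a backoff helps.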

That's the long answer, the short answer is yes but most Projects would prefer you not do it as other people who can't get thru will leave, one person has already done that here.
Your RAC will not suffer very much and you may even get a slight pop once the Pentathlon folks are done sending in all their units.
ID: 4277 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
vaughan

Joined: 4 Feb 15
Posts: 7
Credit: 158,219,834
RAC: 0
Message 4278 - Posted: 9 May 2020, 14:07:29 UTC

Yes Mikey, that is my plan. Return the work and move back to Rosetta until Amicable becomes the active project for the Pentathlon. The server here just cannot cope with a race.
ID: 4278 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Mr P Hucker
Joined: 30 Oct 16
Posts: 183
Credit: 18,395,933
RAC: 11
Message 4279 - Posted: 9 May 2020, 14:50:10 UTC - in response to Message 4273.  
Last modified: 9 May 2020, 14:51:21 UTC


YES YOU ARE RIGHT...I guess I'm having a VERY bad couple of days at posting, I am NOT using my gpu's here ONLY my cpu cores!!!


I'm off work during the virus and seem to be busier than usual!

It's too late to edit this BS I posted earlier so just ignore it!!!


ROTFPMSL! First time I've heard someone refer to what they said themselves as BS :-)
ID: 4279 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Mr P Hucker
Joined: 30 Oct 16
Posts: 183
Credit: 18,395,933
RAC: 11
Message 4280 - Posted: 9 May 2020, 14:54:57 UTC - in response to Message 4278.  

Yes Mikey, that is my plan. Return the work and move back to Rosetta until Amicable becomes the active project for the Pentathlon. The server here just cannot cope with a race.


It copes fine. Does it really matter if you have a queue waiting to send? Just increase your buffer so your computer only has to succeed in connecting to Universe once a day.
ID: 4280 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Mr P Hucker
Joined: 30 Oct 16
Posts: 183
Credit: 18,395,933
RAC: 11
Message 4281 - Posted: 9 May 2020, 14:55:58 UTC - in response to Message 4276.  
Last modified: 9 May 2020, 14:56:52 UTC

I was able to clear my upload fails. I had two fails in the transfer tab. I clicked one of them and clicked retry now. When that one WU said "Active", I quickly selected the other one and also clicked retry now for that one. Retrying them separately didn't work, but sometimes it does work for some WUs. Some of the WUs that are stuck show as very tiny so perhaps the server doesn't recognize them as WUs? Sometimes the stuck WUs are cleared from the transfer tab, but then they are moved to the tasks tab and they say Ready to Report. Other times, the stuck WUs seem to be cleared directly to the Universe servers and they don't show in the tasks tab. Not all WUs get stuck, just 2 to 4 WUs a day seem to get stuck. None of the WUs are showing as errors thankfully. (^:


I just click "retry all". If you can't see that option, it must be a function of BoincTasks, which I use to control my 4 (soon to be 6) computers. Much easier to have everything on one screen.
ID: 4281 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
xii5ku

Joined: 9 Nov 17
Posts: 21
Credit: 563,207,000
RAC: 1
Message 4282 - Posted: 10 May 2020, 9:36:25 UTC - in response to Message 4252.  
Last modified: 10 May 2020, 9:38:39 UTC

mikey wrote:
MOST of the time Projects get notified that they were selected to be a part of the Pentathlon about a week ahead of time. Sometimes Projects say 'no' but it's probably too late and they suffer for it,
From what I remember, the current (and only feasible) process is that the Pentathlon organizers
    – ask project admins whether or not they are OK with taking part,
    – keep only projects for selection whose admins responded positively (IOW, remove projects from their set of candidates if the admins declined or never responded).

This is only from my memory; I don't have a primary source to quote.

--------

Jon Melusky wrote:

I was able to clear my upload fails. I had two fails in the transfer tab. I clicked one of them and clicked retry now. When that one WU said "Active", I quickly selected the other one and also clicked retry now for that one. Retrying them separately didn't work, but sometimes it does work for some WUs. Some of the WUs that are stuck show as very tiny so perhaps the server doesn't recognize them as WUs? Sometimes the stuck WUs are cleared from the transfer tab, but then they are moved to the tasks tab and they say Ready to Report. Other times, the stuck WUs seem to be cleared directly to the Universe servers and they don't show in the tasks tab.
The transfer tab shows files, not tasks.
Each successfully computed BHspin v2 task produces 6 result files.
After all 6 files have been uploaded successfully, the task becomes ready to report.
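
The relationship between files and tasks can be sketched as below. This is only an illustrative model, not BOINC client code. The count of 6 files comes from the lines above, while the `Task` class, the task name and the file names are hypothetical.

```python
from dataclasses import dataclass, field

FILES_PER_TASK = 6  # per the post: each BHspin v2 task produces 6 result files


@dataclass
class Task:
    name: str
    uploaded: set = field(default_factory=set)  # names of files already uploaded

    def mark_uploaded(self, filename: str) -> None:
        self.uploaded.add(filename)

    def ready_to_report(self) -> bool:
        # A task is ready only once every one of its result files is uploaded.
        return len(self.uploaded) == FILES_PER_TASK


task = Task("example_task")  # hypothetical task name
for i in range(5):
    task.mark_uploaded(f"example_task_r0_{i}")  # hypothetical file names
print(task.ready_to_report())  # five of six files uploaded

task.mark_uploaded("example_task_r0_5")
print(task.ready_to_report())  # all six files uploaded
```

This is why one stuck entry in the transfer tab can hold back a whole task: the entry is a single file, and the task only moves on once its last file gets through.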
ID: 4282 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Jon Melusky
Joined: 4 Mar 16
Posts: 6
Credit: 8,342,333
RAC: 0
Message 4288 - Posted: 11 May 2020, 4:16:01 UTC - in response to Message 4281.  

Yes indeed. "retry all" is in BoincTasks. Thank you, this is the first I have heard of it. Configuring it now.
ID: 4288 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Keith Myers
Joined: 10 May 20
Posts: 308
Credit: 4,733,484,700
RAC: 1,384
Message 4289 - Posted: 11 May 2020, 15:55:41 UTC

New to the project. Was not aware of the project being part of the Pentathlon. Is the availability of work because of the contest? Or is the quantity available normal?

Are the BHspin2 tasks a special set for the contest? Or normal? Are the normal tasks all the same in runtimes?

I have two hosts with very large differences in runtimes and I am trying to determine why. Judging by the task names, the sets of tasks given to each host were from different species of work. Is there a post explaining the makeup of task names? There are no parameters visible for each task when hovering over a task, unlike what Milkyway or Einstein shows. Can someone explain the task construction?

Or is one computer fundamentally faster than the other even though both have the same clocks?

A proud member of the OFA (Old Farts Association)
ID: 4289 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Keith Myers
Joined: 10 May 20
Posts: 308
Credit: 4,733,484,700
RAC: 1,384
Message 4291 - Posted: 11 May 2020, 17:27:30 UTC

I found the statistics and top computers and see it dominated by AMD Zen 2 hosts with similar compute times. So the architecture is the difference.
That also showed me that a large number of hosts loaded up work for the City Run part of the Pentathlon which ends in a few hours and they are aborting huge quantities of work. Not good BOINC etiquette in my opinion.

A proud member of the OFA (Old Farts Association)
ID: 4291 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
xii5ku

Joined: 9 Nov 17
Posts: 21
Credit: 563,207,000
RAC: 1
Message 4292 - Posted: 11 May 2020, 18:57:22 UTC - in response to Message 4289.  
Last modified: 11 May 2020, 19:00:46 UTC

Keith Myers wrote:
New to the project. Was not aware of the project being part of the Pentathlon. Is the availability of work because of the contest? Or is the quantity available normal?
It is normal. See the green line (rts = ready to send) in these graphs by @kiska:
https://munin.kiska.pw/munin/Munin-Node/Munin-Node/results_universe.html


Keith Myers wrote:
Are the BHspin2 tasks a special set for the contest? Or normal?
They are normal.


Keith Myers wrote:
Are the normal tasks all the same in runtimes?
On a given hardware and operating system, run times of most BHspin v2 tasks vary, but not by much. The exception is a (usually) small number of tasks, which you get (usually) only occasionally, that take longer, e.g. maybe ~5 times as long as the usual average.


Keith Myers wrote:
I have two hosts with very large differences in runtimes and I am trying to determine why? The set of tasks given to each host were from different species of work based on my assumption in tasknames. Is there a post explaining the makeup of tasknames? There are no parameters visible for each task when hovering over a task unlike what Milkyway or Einstein shows. Can someone explain the task construction?
I don't have an answer for these two.


Keith Myers wrote:
Or is one computer fundamentally faster than the other even though both have the same clocks.
From a very quick look at your two computers, the difference in task run times is indeed larger than expected. The current top 20 valid tasks of each computer were downloaded at two different times on Sunday, May 10. On that day, several batches of tasks with increased run time were released. So a possible explanation is indeed that one of your computers happened to receive many tasks from such a more intensive batch. Check how the two computers fare on other days.


Keith Myers wrote:
That also showed me that a large number of hosts loaded up work for the City Run part of the Pentathlon which ends in a few hours and they are aborting huge quantities of work. Not good BOINC etiquette in my opinion.
I agree with you in principle. But it needs to be pointed out that part of the problem is the increase in average task run times on Sunday, as mentioned. This probably caused many computers to fetch more tasks than their users wanted, because BOINC underestimated the new task run times. That said, I guess you may have seen hosts with a disproportionate number of aborted tasks, which would indeed indicate a lack of planning (or care) by the computer owners.
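
To illustrate the over-fetching effect, here is a deliberately simplified model. It is not BOINC's actual work-fetch logic, and the core count, buffer size and run times are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
def tasks_fetched(buffer_days: float, cores: int, est_runtime_s: float) -> int:
    """Tasks needed to fill the buffer if every task took est_runtime_s."""
    return int(buffer_days * 86_400 * cores / est_runtime_s)


def days_to_finish(n_tasks: int, cores: int, actual_runtime_s: float) -> float:
    """Wall-clock days to work through n_tasks at the real run time."""
    return n_tasks * actual_runtime_s / (cores * 86_400)


# Hypothetical host: 16 cores, 1-day buffer, estimate based on older, shorter tasks.
cores = 16
n = tasks_fetched(buffer_days=1.0, cores=cores, est_runtime_s=4_800)
print(n)                                    # tasks downloaded
print(days_to_finish(n, cores, 4_800))      # if the estimate had been right
print(days_to_finish(n, cores, 9_600))      # if the new batch runs twice as long
```

In this toy case the host downloads 288 tasks expecting one day of work, but ends up holding two days' worth if the tasks turn out to run twice as long as estimated.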
ID: 4292 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
vpf3

Joined: 29 Aug 18
Posts: 30
Credit: 244,745,211
RAC: 0
Message 4293 - Posted: 11 May 2020, 19:36:18 UTC
Last modified: 11 May 2020, 20:11:04 UTC

Hello Keith,

Since U@H is CPU-only, the CPU and the OS are the two biggest factors.
Currently Linux systems are about a factor of 2 faster than Windows. I tested this myself, and it depends on the software.
But since you use Ubuntu Bionic and Focal, there should be only marginal differences.

Technically your Ryzen 9 3950X (the same one I have) is a beast in its own right. By core count it should be at 133%, and by frequency at 155%, compared to your TR 2920X.
But Zen 2, especially on the 3950X, with
- PBO (better heat control via more temperature sensing areas),
- better scaling with the mainboard's power delivery and
- no cap on memory speeds (the main limiting factor on all 1xxx and 2xxx AMDs)

leads to 265% performance in your case. This is more than I expected, but maybe the tasks differ somewhat too.

This sounds amazing, but I noticed the same when I changed from a Ryzen 7 1800X (the fastest CPU when I bought it) to a Ryzen 9 3950X (I got my hands on one of the very first ones).
The tech scene was amazed by the real-life performance of the Ryzen 9 flagship.
Even reviewers comparing it to TR (e.g. Linus Tech Tips) came to the conclusion that the Ryzen 9 somehow killed off the whole TR 1xxx/2xxx lineup.

Hope that explains the part that xii5ku couldn't explain.
Greetings from Germany,
vpf3
ID: 4293 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Keith Myers
Joined: 10 May 20
Posts: 308
Credit: 4,733,484,700
RAC: 1,384
Message 4294 - Posted: 11 May 2020, 20:07:19 UTC - in response to Message 4293.  

Hello vpf3, there is not that much difference between the two hosts other than a slight memory clock disadvantage for the Threadripper.
Both machines are locked at a fixed multiplier, at 4100 MHz. So no PBO or other shenanigans involved.

The TR is only able to run stable memory clocks at 3466 MHz @ CL14 Fast timings. But it does have a 4-channel versus 2-channel bandwidth advantage over the 3950X.
The 3950X runs 3600 MHz @ CL14 Fast timings. So a bit more memory clock speed. The difference shows up in benchmarks, for example in some earlier Geekbench 4 runs. CPU clocks aren't quite identical but close enough. Big difference in benchmark ratings. I think it all comes down to the more advanced microarchitecture of the 3950X.

Threadripper 2920X
Ryzen 9 3950X

A proud member of the OFA (Old Farts Association)
ID: 4294 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
user_212416

Joined: 6 May 20
Posts: 1
Credit: 50,506,667
RAC: 0
Message 4295 - Posted: 11 May 2020, 20:08:21 UTC - in response to Message 4293.  
Last modified: 11 May 2020, 20:12:25 UTC

Right, this is TR 2000/Zen+ versus Ryzen 3000/Zen 2, which makes a difference here.
ID: 4295 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Keith Myers
Joined: 10 May 20
Posts: 308
Credit: 4,733,484,700
RAC: 1,384
Message 4296 - Posted: 11 May 2020, 21:19:55 UTC - in response to Message 4295.  

Right, this is TR 2000/ Zen+ versus Ryzen 3000/Zen2, which makes a difference here.

Yep, very obvious here after looking at the Top Hosts list. Nothing but Threadripper 3000s and Ryzen 3000s. The code path through the FPU must be really optimized for the BHspin2 application.

Does anybody know if it uses advanced SIMD instructions like AVX2 or FMA? Those are the major architectural differences between Zen+ and Zen 2, plus the 256-bit wide AVX registers.

A proud member of the OFA (Old Farts Association)
ID: 4296 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Mr P Hucker
Joined: 30 Oct 16
Posts: 183
Credit: 18,395,933
RAC: 11
Message 4297 - Posted: 11 May 2020, 21:21:24 UTC - in response to Message 4291.  

I found the statistics and top computers and see it dominated by AMD Zen 2 hosts with similar compute times. So the architecture is the difference.


Yes, I've noticed my vastly different CPUs go at different speeds for different projects. I know with my GPUs I can choose single or double precision projects, but I don't know which CPU projects are best for which CPUs. Perhaps there's a list somewhere that's been compiled of which instruction sets work best? I was thinking maybe the SSE2 etc. that BOINC reports at startup might be of use in deciding what project to put on which machine?

That also showed me that a large number of hosts loaded up work for the City Run part of the Pentathlon which ends in a few hours and they are aborting huge quantities of work. Not good BOINC etiquette in my opinion.


Indeed, it's like the selfish people hogging toilet roll and food. My computers have a 3+3 hour buffer - so they're not constantly contacting the server, especially for the likes of Milkyway which has very short tasks. But I see no need to download loads of tasks.
ID: 4297 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Mr P Hucker
Joined: 30 Oct 16
Posts: 183
Credit: 18,395,933
RAC: 11
Message 4298 - Posted: 11 May 2020, 21:29:06 UTC - in response to Message 4296.  

Right, this is TR 2000/ Zen+ versus Ryzen 3000/Zen2, which makes a difference here.

Yep, very obvious here after looking at the Top Hosts list. Nothing but Threadripper 3000's and Ryzen 3000's. The codepath through the FPU must be really optimized for the BHspin2 application.

Does anybody know if it uses advanced SIMD instructions like AVX2 or FMA? Those are the major architectural differences between Zen+ and Zen 2 plus the 256bit wide AVX registers.


Can you not see this in the BOINC startup messages? I have this line:
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx smx tm2 dca pbe
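
A small sketch for checking such a line for particular flags. The feature line is copied from above, while `parse_features` is a hypothetical helper name and the list of flags to look for is just an example. Note that this only tells you what the CPU reports, not which instructions the BHspin application actually uses.

```python
# Feature line copied from the post above.
LOG_LINE = (
    "Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge "
    "mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 "
    "cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx smx tm2 dca pbe"
)


def parse_features(line: str) -> set:
    """Return the set of feature flags listed after 'Processor features:'."""
    _, _, flags = line.partition("Processor features:")
    return set(flags.split())


features = parse_features(LOG_LINE)
for flag in ("sse2", "sse4_2", "avx", "avx2", "fma"):
    print(f"{flag}: {'yes' if flag in features else 'no'}")
```

For the line quoted here, the check reports sse2 and sse4_2 as present and avx, avx2 and fma as absent, so this particular CPU could not run AVX2 or FMA code paths even if the application had them.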
ID: 4298 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote

Copyright © 2024 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek