Message boards : Number crunching : Odd performance on different computers

CallMeFoxie
Joined: 14 Mar 20
Posts: 5
Credit: 46,801,300
RAC: 31,626
Message 4181 - Posted: 4 May 2020, 9:32:53 UTC

Hi, I'm crunching bh2 spin on several different types of computers and I'm seeing some very odd performance :)

i7-3770k 3.7GHz, 4c8t, 32GB RAM (limited to 8GB in LXC, 4 tasks in parallel) crunches a task in less than 2 hours
Ryzen 2700 3.3GHz, 8c16t, 32GB RAM (unlimited memory, 12 tasks in parallel) crunches a task in almost 3.5 hours?!
Pine64+ 1.152GHz (Cortex A53, 4c4t, 2GB RAM, 4 tasks in parallel) takes about 16 hours (I can understand that one, slower cores and very slow clock)

Why is the Ryzen so much slower when it is much newer? Is there some compile-time flag optimized for Intel? I've even tried disabling SMT on the Ryzen machine, but no change there: still over 3 hours per task.

Cheers
Ashley :)

Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 748
Credit: 140,962,465
RAC: 8,986
Message 4182 - Posted: 4 May 2020, 10:00:49 UTC - in response to Message 4181.  

I see only ARM machines on your account. Did you use another account for the other architectures?
Also, are both the Intel and AMD machines Linux based?

Just for info: our tasks do not use much memory, but ULX can use some disk space for temporary files, and slow disks can delay the finishing time (not by much, but always a little).
Krzysztof 'krzyszp' Piszczek

Member of Radioactive@Home team
My Patreon profile
Universe@Home on YT

CallMeFoxie
Joined: 14 Mar 20
Posts: 5
Credit: 46,801,300
RAC: 31,626
Message 4183 - Posted: 4 May 2020, 11:39:35 UTC - in response to Message 4182.  
Last modified: 4 May 2020, 11:40:14 UTC

yeah using a team account atm

The Ryzen is Windows based, would it make *that* much difference?

And the Ryzen has an NVMe SSD, whereas the i7 has slow 7.2k RPM HDDs. Since the app uses very little RAM, I'd expect it to thrash the cache at least, if not something else, but even there the Ryzen wins :)

Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 748
Credit: 140,962,465
RAC: 8,986
Message 4184 - Posted: 4 May 2020, 12:20:06 UTC - in response to Message 4183.  

The Ryzen is Windows based, would it make *that* much difference?

Yes.
Krzysztof 'krzyszp' Piszczek

Member of Radioactive@Home team
My Patreon profile
Universe@Home on YT

CallMeFoxie
Joined: 14 Mar 20
Posts: 5
Credit: 46,801,300
RAC: 31,626
Message 4185 - Posted: 4 May 2020, 12:20:39 UTC - in response to Message 4184.  

Wow. In that case I may as well move BOINC to a virtualized Linux guest and see how that fares! :) Thanks for the info.

koschi
Joined: 25 Mar 16
Posts: 12
Credit: 361,482,367
RAC: 248,035
Message 4187 - Posted: 4 May 2020, 14:35:11 UTC

With the Ryzen using Linux, WU durations should come down to around an hour. Unless the i7 is doing other CPU work as well, running just 4 threads on it also helps its run times, compared with 8 threads sharing SMT cores.

mikey
Joined: 4 Apr 15
Posts: 26
Credit: 28,896,567
RAC: 0
Message 4190 - Posted: 5 May 2020, 15:06:23 UTC - in response to Message 4181.  

On the i7 you have 4c8t and you run 4 tasks at a time... time to crunch: under 2 hours.
On the AMD you have 8c16t and you run 12 tasks at a time... time to crunch: almost 3.5 hours.

Notice anything funny there? You are running too many threads on the AMD and are using the virtual cores for 4 of the tasks, which reduces the efficiency of each task. Cut it back to 8 threads at a time and see if it doesn't speed up a lot.

Jim1348
Joined: 28 Feb 15
Posts: 236
Credit: 181,888,581
RAC: 197,147
Message 4192 - Posted: 5 May 2020, 16:11:32 UTC - in response to Message 4190.  

Notice anything funny there...you are running too many threads on the AMD and are using the virtual cores for 4 of the tasks reducing the efficiency of each task, cut it back to 8 threads at a time and see if it doesn't speed up alot.

(1) Cutting it back to 8 threads (in effect, 8 full cores) increases the speed of each task, but it does not increase the efficiency. It reduces it. The total throughput will be greater if you use the virtual cores (all 16 of them).

(2) Yes, Linux is better than Windows.
Use Windows on WCG/MCM, or Rosetta, or if you want astronomy on MilkyWay n-body or Einstein. They all do well on Windows.

(3) I have crunched a lot of BHspin v2 on both Ryzens and Intel. The speed on a Ryzen 2700 is almost the same as (maybe slightly higher than) on an i7-3770 under comparable conditions, i.e., using the same percentage of the cores.

(4) At the moment, I am using a Ryzen 2600 under Ubuntu 18.04.4, with 11 cores devoted to Universe and 1 core reserved for a GPU, but not in use at the moment. I am averaging 1 hour 33 minutes on BHspin v2 and about 32 minutes 31 seconds (very consistently) for ULX.
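
Point (1) can be sanity-checked with the run times from the first post. A rough sketch in Python, assuming the rounded figures quoted there (about 2 h, 3.5 h and 16 h per task) and a steady state where every slot is always busy; `tasks_per_hour` is just an illustrative helper, not anything from BOINC:

```python
def tasks_per_hour(parallel_tasks: int, hours_per_task: float) -> float:
    """Steady-state throughput when `parallel_tasks` tasks run side by side."""
    return parallel_tasks / hours_per_task


# (tasks in parallel, approximate hours per task), rounded from the first post
hosts = {
    "i7-3770K, Linux": (4, 2.0),
    "Ryzen 2700, Windows": (12, 3.5),
    "Pine64+": (4, 16.0),
}

throughput = {name: tasks_per_hour(n, hours) for name, (n, hours) in hosts.items()}

for name, rate in throughput.items():
    print(f"{name}: {rate:.2f} tasks/hour")
```

With these rounded inputs the Windows Ryzen still completes more tasks per hour (about 3.4) than the i7 (2.0), even though each individual task takes longer. Per-task run time and total throughput are different measurements.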

CallMeFoxie
Joined: 14 Mar 20
Posts: 5
Credit: 46,801,300
RAC: 31,626
Message 4198 - Posted: 7 May 2020, 5:57:43 UTC - in response to Message 4192.  

Yup, drained the Windows host, fired up a Linux VM with 12c/16GB RAM, and the tasks are much faster than on Windows (even 25-30% faster than on the i7-3770) on all of U@H, Rosetta, WCG, ...

And SMT may slow each thread down by about 10-15%, BUT you get double the number of them in parallel. The final number of crunches/second is much higher.
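
As a back-of-the-envelope check of that trade-off, here is a small Python sketch. It assumes the 10-15% figure means each task's run time grows by that fraction when both SMT threads of a core are busy; `relative_throughput` is a made-up helper name.

```python
def relative_throughput(threads_per_core: int, slowdown: float) -> float:
    """Throughput relative to running one task per physical core.

    slowdown is the fractional increase in per-task run time
    (0.15 means each task takes 15% longer).
    """
    return threads_per_core / (1.0 + slowdown)


for slowdown in (0.10, 0.15):
    gain = relative_throughput(2, slowdown)
    print(f"{slowdown:.0%} slower per task -> {gain:.2f}x throughput")
```

Under that assumption, two threads per core give roughly 1.7x to 1.8x the throughput of one task per core, and SMT only stops paying off once the per-task run time doubles.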

Markus Torstensson
Joined: 28 Apr 20
Posts: 2
Credit: 7,430,367
RAC: 0
Message 4203 - Posted: 7 May 2020, 9:20:26 UTC - in response to Message 4181.  

Maybe you have heating issues? The CPU will likely downclock itself to avoid damage.

I checked a few of the new AMD processors, and it seems like all of them are wattage beasts (several hundred watts).

CallMeFoxie
Joined: 14 Mar 20
Posts: 5
Credit: 46,801,300
RAC: 31,626
Message 4212 - Posted: 7 May 2020, 12:24:47 UTC - in response to Message 4203.  

nope it was pure windows vs linux apparently. No idea why but it's just the way it is.

And an R2700 at stock is not hard to cool; I've cooled an overclocked i5-2500K with this same cooler before :P

Keith Myers
Joined: 10 May 20
Posts: 141
Credit: 1,679,181,367
RAC: 2,849,006
Message 4290 - Posted: 11 May 2020, 16:03:31 UTC - in response to Message 4184.  
Last modified: 11 May 2020, 16:07:31 UTC

The Ryzen is Windows based, would it make *that* much difference?

Yes.

Two new hosts working on BHSpin2 tasks. Both Linux based. Both running the same cpu clocks. One host is 3X faster than the other.
AMD Ryzen9 3950X runtimes = 37 minutes
AMD Threadripper 2920X runtimes = 93 minutes

Are there different species of tasks with very large differences in runtimes? Is a "311" task very different from a "308" task?

[Edit] Does the BHspin2 application use any advanced SIMD instructions like SSE3/4 or AVX/2?

[Edit2] Neither host is overcommitted on cpu core usage.
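
The thread never says which instructions the BHspin2 application actually uses, but it is easy to see what a given CPU offers. A minimal Python sketch, assuming an x86 Linux host where /proc/cpuinfo has a `flags` line (ARM kernels use a different field); `simd_support` is an illustrative name, and the result says nothing about what the application was compiled for:

```python
from pathlib import Path

# Flag names as they appear in /proc/cpuinfo on x86 Linux ("pni" is SSE3).
SIMD_FLAGS = ("pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2")


def simd_support(cpuinfo_text: str) -> dict:
    """Map each flag in SIMD_FLAGS to True/False using the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return {flag: flag in present for flag in SIMD_FLAGS}
    # No 'flags' line found (e.g. not an x86 Linux host).
    return {flag: False for flag in SIMD_FLAGS}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for flag, supported in simd_support(cpuinfo.read_text()).items():
            print(f"{flag}: {'yes' if supported else 'no'}")
```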

A proud member of the OFA (Old Farts Association)

Keith Myers
Joined: 10 May 20
Posts: 141
Credit: 1,679,181,367
RAC: 2,849,006
Message 4446 - Posted: 31 Aug 2020, 0:12:55 UTC - in response to Message 4290.  

The main difference between the two hosts was the OS: Ubuntu 20.04 on the 3950X and Ubuntu 18.04 on the 2920X. As soon as I upgraded the 2920X to Ubuntu 20.04, the times on the two hosts pretty much matched. The 3950X does have a 100 MHz core clock advantage, though. Both hosts run a fixed all-core clock and use 90% of all cores, so the throughput of the 32-thread 3950X is of course better.

I am not the only one to notice the almost 2X speed improvement from moving from Ubuntu 18 to Ubuntu 20. Three other team members saw exactly the same improvement, on both Intel and AMD hardware, and with the same 5.4.0-42-generic kernel on both OSes.

A proud member of the OFA (Old Farts Association)

PHILIPPE
Joined: 19 Aug 18
Posts: 11
Credit: 2,255,279
RAC: 0
Message 4460 - Posted: 3 Sep 2020, 20:22:48 UTC - in response to Message 4446.  

I can confirm the same improvement after upgrading the guest OS from Ubuntu 18.04 to 20.04 on my host ID 544662: an AMD A10-7800 Radeon R7, with Windows (version 2004) as the host OS and VirtualBox 5.2.32.
The average CPU time decreased from 151 minutes to 71 minutes (doubled performance).
The measured speed for floating point calculations increased from 2.64 to 3.19 billion operations per second (according to the project estimate).
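
Put as ratios (a quick Python sketch using only the figures quoted above; `speedup` is an illustrative helper):

```python
def speedup(before: float, after: float) -> float:
    """How many times faster 'after' is, given two run times in the same unit."""
    return before / after


runtime_gain = speedup(151, 71)   # average CPU minutes per task, 18.04 vs 20.04
benchmark_gain = 3.19 / 2.64      # measured floating point speed, new vs old

print(f"CPU time per task: {runtime_gain:.2f}x faster")
print(f"Floating point benchmark: {benchmark_gain:.2f}x higher")
```

So CPU time per task improved by roughly 2.1x, while the benchmark figure rose by only about 1.2x.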
To compare the situation with other crunchers, I tried to pick out the fastest computers on this project, looking only at the CPU times of the work units processed.

Top 5:

Host ID 522605 :
Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
Arch Linux [5.7.11-arch1-1|libc 2.31 (GNU libc)]
Host ID 549269 :
AMD Ryzen 9 3950X
Gentoo Base System release 2.7 [5.8.5-gentoo-x86_64|libc 2.32 (Gentoo 2.32-r1 p1)]
Host ID 569321 :
Intel Core Processor (Skylake, IBRS)
Fedora 32 (Workstation Edition) [5.7.12-200.fc32.x86_64|libc 2.31 (GNU libc)]
Host ID 506492 :
AMD Ryzen 9 3950X
Linux Mint 20 [5.4.0-45-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]
Host ID 570669 :
Intel(R) Core(TM) i5-9400 CPU @ 2.90GHz
Ubuntu 20.04.1 LTS [5.4.0-44-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9)]
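
The bracketed part of each entry above is the kernel and libc version reported by the host. To see the same details for your own machine, something like the following Python sketch should do. It only uses the standard `platform` module, `host_summary` is an illustrative name, and the thread does not establish which component is behind the speed difference:

```python
import platform


def host_summary() -> dict:
    """Collect OS name, kernel release and libc version for the current host."""
    libc_name, libc_version = platform.libc_ver()  # empty strings if undetectable
    return {
        "system": platform.system(),
        "kernel": platform.release(),
        "libc": f"{libc_name} {libc_version}".strip(),
    }


if __name__ == "__main__":
    for key, value in host_summary().items():
        print(f"{key}: {value or 'unknown'}")
```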

I don't know if there is overclocking...

So, without trying to say which configuration is best, I would only advise Windows crunchers to at least run this project in a VM with a Linux guest OS if they don't want to waste their resources.

Installing BOINC inside a VM is not so complicated:

Ubuntu
Mint

rsNeutrino
Joined: 1 Nov 17
Posts: 16
Credit: 167,440,267
RAC: 156,330
Message 4461 - Posted: 4 Sep 2020, 2:58:35 UTC - in response to Message 4460.  

I advise against using Mint. I tried both Mint and Ubuntu, and Mint had some tasks running at Windows speed (a runtime of around 4 hours, so no performance increase at all) and some at Ubuntu speed (around 1 hour).
Maybe how often it works also depends on the hardware, with things like CPU cache.
Or Mint got an update in the last 2-3 months introducing the changes Ubuntu 20.04 already got.
But I have yet to encounter an instance where Ubuntu 20.04 DIDN'T work, so I think it would be the safer and easier route to just use that.

Also, did you try setting the thread count for the VM (VirtualBox) near the total threads of the Windows machine?
I had massive lag problems with the newest version when making 13 or more of the 16 threads usable by the VM. Only Hyper-V works smoothly.

PHILIPPE
Joined: 19 Aug 18
Posts: 11
Credit: 2,255,279
RAC: 0
Message 4463 - Posted: 4 Sep 2020, 19:40:43 UTC - in response to Message 4461.  
Last modified: 4 Sep 2020, 19:41:53 UTC

Sorry, rsNeutrino, but I don't have the appropriate hardware to test that.
Maybe another cruncher could answer you...
But I know that VirtualBox performance decreases with the number of VMs running simultaneously. This effect appears mainly on computers with many processors. It's negligible on small computers with, for instance, 4 threads.

Copyright © 2021 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek