extreme long wu's
Author | Message |
---|---|
Send message Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
Are you saying that only about 10% of computers are experiencing issues? That is strange, because performance dropped by 70% during the last month: http://universeathome.pl/universe/history.php Because there have really been no new tasks since then. Krzysztof 'krzyszp' Piszczek Member of Radioactive@Home team My Patreon profile Universe@Home on YT |
Send message Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
Please help me understand this. I have a Universe@Home task with the following properties: App: Universe BHspin v2 0.01 Task: universe_bh2_160803_107_1_20000_1-999999_940000_1 URL: http://universeathome.pl/universe/result.php?resultid=21698177 Elapsed: 11:32:39 CPU Time: 17:42:40 CPU Time since checkpoint: 17:10:42 Estimated time remaining: 1d 00:00:32 Status: Will not suspend correctly, and looks like it'll never complete. Questions: 1) Does this task look like it is a "problem" task that will never complete? 2) What information can I get you to help you solve the problem? Note: I copied the slot and project folder, so I might be able to run it outside of BOINC (if you give me instructions how), if that helps. 3) Is any progress being made to fix this issue? Setting "No New Tasks" again ... :/ |
Send message Joined: 1 Oct 16 Posts: 32 Credit: 268,033 RAC: 0 |
You're still trying, Jacob. Fair enough. I don't crunch here anymore; his attitude is damaging to the entire BOINC environment. It is a real shame. |
Send message Joined: 28 Feb 15 Posts: 253 Credit: 200,562,581 RAC: 0 |
Maybe just lucky here, but no problems since I resumed on 27 April, and almost 300 validated by now. That includes mainly Universe BHspin v2 v0.01 but also some Universe Ultraviolet reionization v0.05. And I just started my first Universe QuarkStars v0.03 for a while, and they may have had problems before but I don't quite recall what, so we will see. http://universeathome.pl/universe/results.php?hostid=68706 But I run Linux on that machine 24/7 and don't reboot, which may help. |
Send message Joined: 30 Apr 17 Posts: 8 Credit: 64,583 RAC: 0 |
If there is anything you'd like me to test or try, tell me what to do and I'll do it. I want it solved, and am willing to try things for you. I have a feeling that defined library calls or something could be the cause of your problem... #define Jippers=Nod24 #call Blender.Jazz ill-defined and so redefined... in code without clear. AMD Platform Optimization (please read, for all developers): https://community.amd.com/thread/213045 http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/03/GDC2017-Optimizing-For-AMD-Ryzen.pdf http://www.agner.org/optimize/ http://www.agner.org For example: Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx sse4a osvw xop wdt fma4 topx page1gb rdtscp bmi1 http://esa-space.blogspot.com/ BOINC optimization > http://esa-space.blogspot.rs/2017/04/boinc.html T/C/RNG/Entropy Drivers and sources > http://esa-space.blogspot.ru/2017/04/rng-and-random-web.html |
Send message Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
Please help me understand this. Among the last 45k BHspin units I found 4 with "EXIT_TIME_LIMIT_EXCEEDED" and about 100 work units with "Unknown error number". To run a task outside the BOINC client, simply start the executable file from the command line. But if a task is not making checkpoints, I strongly suspect that it is frozen... Krzysztof 'krzyszp' Piszczek Member of Radioactive@Home team My Patreon profile Universe@Home on YT |
Send message Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
If I try to run the BHspin2_1_windows_intelx86.exe ... from an Admin Command Prompt, it runs for a couple seconds, then quits, and this is in the stderr.txt: 13:11:18 (12760): BOINC client no longer exists - exiting 13:11:18 (12760): timer handler: client dead, exiting Should I try anything else? More importantly: Is any progress being made to fix this issue? |
Send message Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
Delete all files except param.in and the executable file from the folder, then run it again. Krzysztof 'krzyszp' Piszczek Member of Radioactive@Home team My Patreon profile Universe@Home on YT |
Send message Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
Alright. I now have 2 standalone instances of the exe running: 1) "From beginning": Started with a folder that only had BHspin2_1_windows_intelx86.exe and param.in 2) "From checkpoint": Started with a folder that had everything the BOINC slots folder had, except I removed: boinc_lockfile, boinc_task_state.xml, init_data.xml, job.xml We'll see how long each takes... and see if the "From checkpoint" one gets stuck in an infinite loop. If so, I could send you a .zip of the files, for you to try. I feel like I'm trying to solve this problem alone, sometimes. |
Send message Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
How long is this task supposed to run? So far, it has run for 15.6 hours, even from a fresh standalone folder outside of BOINC, which only had the .exe and the param.in file... and is still running. How long should I let it continue, for it to be useful for us to diagnose the issue? OS: Windows 10 Pro x64, Insider Fast Ring, Build 16184 CPU: Intel i7-5960X Executable: BHspin2_1_windows_intelx86.exe (Executed from an Admin Command Prompt) param.in: BHspin2 v:160803 SET num_tested 20000 SET hub_val 1000 SET idum -940000 SET OUTPUT 3 SET Mmina 5.0 SET Mminb 3.0 SET golambda 1 SET Beta 0.5 SET Fa 1.0 SET Sigma3 265 SET Sal -2.3 SET SS 0 SET ZZ 0.0001 error.dat: error: in Renv_con() unknown Ka type: 1, iidd_old: 757382 error: in Menv_con() unknown Ka type: 1, iidd_old: 757382 checkpoint.dat: 6000 703458 0 1 2 log.txt: 00:00:00 00:00:00 PROGRAM START: Tue May 02 22:52:30 2017 |
Send message Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
@Jacob Klein Please stop it completely and send the zipped folder to me (krzyszp @ interia . pl), because it shows that a particular function has a problem. Also, if you experience a long delay between checkpoints, please send me the whole folder at the email address above before you cancel the task (if you just cancel it, I have no way to read the files). Krzysztof 'krzyszp' Piszczek Member of Radioactive@Home team My Patreon profile Universe@Home on YT |
Send message Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
Email sent. I hope that it helps you! Question: You said "if you experience long delay in checkpoints" ...... but I need more info. For a task that is working correctly: What is the longest amount of expected CPU Time between 2 checkpoints? |
Send message Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
I think that if the time between two checkpoints is above two hours, that is enough... I suspect, because this application relies on some random factors. Krzysztof 'krzyszp' Piszczek Member of Radioactive@Home team My Patreon profile Universe@Home on YT |
Send message Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
Okay. If CPU Time between checkpoints is > 2 hours, it is a problem. Do examples (like the one I emailed to you) help you to solve the problem? |
Send message Joined: 4 Feb 15 Posts: 847 Credit: 144,180,465 RAC: 0 |
Definitely, because I get proper data to find the source of the problem... Krzysztof 'krzyszp' Piszczek Member of Radioactive@Home team My Patreon profile Universe@Home on YT |
Send message Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
Okay. If CPU Time between checkpoints is > 2 hours, it is a problem. Have you learned anything new, using the example that I sent to you? I put forth effort to test it for hours, and then to get it to you, and was hoping you would have replied by now. |
Send message Joined: 13 May 15 Posts: 87 Credit: 4,320,738 RAC: 0 |
Just a note: I had 3 BHSpin tasks with this issue at the same time. I aborted two yesterday and the third one today after letting it run (it was still early in the crunching). I don't remember if there were others, as I only recently realized I hadn't turned Universe back on and didn't really pay attention, other than to note I had Ultraviolet and Quarkstars too (which is good to see :) ). ~Y |
Send message Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
Okay. If CPU Time between checkpoints is > 2 hours, it is a problem. Hello? Progress? |
Send message Joined: 5 Feb 17 Posts: 6 Credit: 2,135,900 RAC: 0 |
|
Send message Joined: 21 Feb 15 Posts: 53 Credit: 1,385,888 RAC: 0 |
This is still an issue! :( Can you PLEASE put more effort to STOP WASTING OUR CPU RESOURCES on this bug?? FIX IT! I'm now PERMANENTLY setting "No New Tasks" for your project, because it ABUSES my resources! I just had another task, where it wasn't responding to suspend, and wasn't checkpointing, despite running for many hours. I confirmed that running it standalone exhibited the same problematic behavior. I aborted it. Details below. OS: Windows 10 Pro x64, Insider Fast Ring, Build 16232 CPU: Intel i7-5960X Executable: BHspin2_1_windows_intelx86.exe (Executed from an Admin Command Prompt) param.in: BHspin2 v:160803 error.dat: error: in Renv_con() unknown Ka type: 1, iidd_old: 687281 error: in Menv_con() unknown Ka type: 1, iidd_old: 687281 error.dat2 warning: derivative from dlnRdt() not accurate, error: 1e+030, K: 8, 291867 error.dat3 warning: derivative from dlnRdt() not accurate, error: 1e+030, K: 8, 291867 checkpoint.dat: 5000 600133 0 1 2 log.txt: 00:00:00 00:00:00 PROGRAM START: Fri Jun 30 09:49:13 2017 |
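
The standalone test and the two-hour checkpoint rule of thumb discussed in this thread could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names `copy_slot_for_standalone` and `checkpoint_is_stale` are made up for illustration and are not part of BOINC or Universe@Home; the list of skipped files is the one removed for the "From checkpoint" run above; and the modification time of checkpoint.dat (wall-clock time) stands in for the CPU time between checkpoints, which it only approximates.

```python
import shutil
import time
from pathlib import Path

# Client state files that were removed from the slot copy in this thread
# before the "From checkpoint" standalone run.
BOINC_STATE_FILES = {
    "boinc_lockfile",
    "boinc_task_state.xml",
    "init_data.xml",
    "job.xml",
}

# Rule of thumb from the thread: more than two hours between checkpoints
# is treated as a problem.
STALE_AFTER_SECONDS = 2 * 60 * 60


def copy_slot_for_standalone(slot_dir, dest_dir):
    """Copy the regular files of a slot folder, skipping BOINC state files.

    Returns the sorted list of file names that were copied.
    """
    src = Path(slot_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file() and item.name not in BOINC_STATE_FILES:
            # copy2 keeps the modification time, so the checkpoint age
            # of the copy matches the original.
            shutil.copy2(item, dest / item.name)
            copied.append(item.name)
    return copied


def checkpoint_is_stale(folder, now=None, limit=STALE_AFTER_SECONDS):
    """Return True if checkpoint.dat was last written more than `limit` seconds ago.

    Raises FileNotFoundError if the folder has no checkpoint.dat yet.
    """
    checkpoint = Path(folder) / "checkpoint.dat"
    current = time.time() if now is None else now
    return (current - checkpoint.stat().st_mtime) > limit
```

If `checkpoint_is_stale` returns True for a task that is still running, that would be the point to zip the whole folder and send it to the developer before aborting, as requested above.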