Message boards : News : Work Units cancelled

Cruncher Pete
Joined: 23 Feb 15
Posts: 11
Credit: 154,346,470
RAC: 15,937
Message 261 - Posted: 5 Apr 2015, 23:06:21 UTC

Krzysztof, I appreciate that you are doing your best and keeping us informed about what is going on. I also appreciate that it is sometimes not possible to judge how long it will take to find the cause of failed workunits, or how long it will take to fix it.

All I am hoping for is that you keep informing us regularly about current progress, as you have done to date. Failing to do that risks losing your support base of volunteers, who will move to other projects. In addition, please remember that in order to get validated recognition from BOINC-Devs it is a requirement that you keep users informed of your status and publish the results of your research.
ID: 261 · Rating: 0
Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 812
Credit: 143,843,798
RAC: 236
Message 262 - Posted: 6 Apr 2015, 7:12:33 UTC - in response to Message 261.  

Thank you.

Yes, I'm trying to keep you up to date with our progress.
At the moment, we have identified two separate problems with our new application.

The first is a strange problem with the start parameters.
Our WUs are based on incrementing one of the variables (called "idum"). As long as idum is under 300,000 everything is OK; above that, the WU fails...

The other problem is in the new application (currently being tested on the test server), where we have slightly changed the checkpoint system and it doesn't work correctly (yet).

As soon as we solve both problems we will move forward on the production server.
Krzysztof 'krzyszp' Piszczek

Member of Radioactive@Home team
My Patreon profile
Universe@Home on YT
ID: 262 · Rating: 0
Michael H.W. Weber
Joined: 5 Mar 15
Posts: 4
Credit: 21,537,127
RAC: 3
Message 271 - Posted: 22 Apr 2015, 8:42:13 UTC

You guys need to CAREFULLY prepare and check your WU packets in the future BEFORE you release them into the wild if you would like people to continue to help you with your project: I just wasted 12 hrs of CPU time PER CORE on multiple quad- and octa-core machines because of your carelessness.

I really don't care about the credits not granted, but I do care very much about the energy bill, which I have to pay and which, under these circumstances, just contributes to global warming.

Michael.
ID: 271 · Rating: 0
[AF>Amis des Lapins] Oncle Bob
Joined: 28 Feb 15
Posts: 2
Credit: 17,364,833
RAC: 29
Message 272 - Posted: 22 Apr 2015, 9:32:26 UTC
Last modified: 22 Apr 2015, 9:33:16 UTC

Hello,

I started some units yesterday on an i7-2600K@4.2 GHz.

The 8 units are still running after more than 12 hours, and are stuck at 99.999%

Those units are universe_xray_12_20000_1_100000_2xx
ID: 272 · Rating: 0
Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 812
Credit: 143,843,798
RAC: 236
Message 273 - Posted: 22 Apr 2015, 9:35:34 UTC - in response to Message 271.  

You are completely right and I apologise for this. I cancelled those WUs as soon as I noticed the bug.

I'm currently still generating a new batch and have checked the WU parameters file twice...
ID: 273 · Rating: 0
Michael H.W. Weber
Joined: 5 Mar 15
Posts: 4
Credit: 21,537,127
RAC: 3
Message 274 - Posted: 22 Apr 2015, 11:05:01 UTC - in response to Message 273.  

Thank you very much for your understanding.

Michael.
ID: 274 · Rating: 0
Dr. NeXuS
Joined: 23 Feb 15
Posts: 1
Credit: 409,978
RAC: 2,790
Message 275 - Posted: 22 Apr 2015, 11:22:18 UTC - in response to Message 273.  

If the erroneous WUs have been cancelled, why are they still active in the BOINC Manager? My computer is still working on them...
ID: 275 · Rating: 0
Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 812
Credit: 143,843,798
RAC: 236
Message 276 - Posted: 22 Apr 2015, 12:10:00 UTC - in response to Message 275.  

Just update the project; BOINC Manager should stop them immediately after the update.
ID: 276 · Rating: 0
AF-Lorraine-bernardP
Joined: 21 Feb 15
Posts: 6
Credit: 1,052,653
RAC: 0
Message 277 - Posted: 22 Apr 2015, 15:47:43 UTC

error

22/04/2015 17:42:30 | Universe@Home | Computation for task universe_xray_13_20000_1-100000_634_1 finished
22/04/2015 17:42:30 | Universe@Home | Output file universe_xray_13_20000_1-100000_634_1_2 for task universe_xray_13_20000_1-100000_634_1 absent
22/04/2015 17:42:30 | Universe@Home | Output file universe_xray_13_20000_1-100000_634_1_3 for task universe_xray_13_20000_1-100000_634_1 absent
22/04/2015 17:42:30 | Universe@Home | Output file universe_xray_13_20000_1-100000_634_1_5 for task universe_xray_13_20000_1-100000_634_1 absent
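For anyone collecting these messages from several hosts, here is a minimal Python sketch that pulls the missing output-file indices out of event-log text. The message wording is taken from the lines above; the function name `missing_outputs` and the assumption that every output file is named `<task>_<index>` are illustrative only, and other client versions may phrase the line differently.

```python
import re
from collections import defaultdict

# Pattern assumed from the log lines quoted above; other BOINC client
# versions may word this message differently.
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def missing_outputs(log_text):
    """Map each task name to the sorted list of missing output-file indices."""
    missing = defaultdict(list)
    for line in log_text.splitlines():
        match = ABSENT_RE.search(line)
        if not match:
            continue
        output_file, task = match.groups()
        # Assumption: the output file is named <task>_<index>.
        # If it is not, keep the whole file name instead of guessing.
        prefix = task + "_"
        if output_file.startswith(prefix):
            missing[task].append(output_file[len(prefix):])
        else:
            missing[task].append(output_file)
    return {task: sorted(indices) for task, indices in missing.items()}
```

Fed the three "absent" lines above, it reports indices 2, 3 and 5 for the single task.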
ID: 277 · Rating: 0
bcavnaugh
Joined: 28 Mar 15
Posts: 8
Credit: 13,516,522
RAC: 1,775
Message 278 - Posted: 22 Apr 2015, 16:15:05 UTC

So far 76 completed and 66 cancelled by the server.
And now back to no work. I hope we get some good working tasks soon.
ID: 278 · Rating: 0
tnt-tiesi
Joined: 24 Mar 15
Posts: 1
Credit: 17,111
RAC: 0
Message 279 - Posted: 22 Apr 2015, 17:13:25 UTC

All WUs end with a calculation error.

Sorry guys, you are doing terrible work at the moment. I am stopping this project now for the next 3-4 months.

Hope you will learn in the meantime.
ID: 279 · Rating: 0
Ananas
Joined: 26 Mar 15
Posts: 52
Credit: 1,737,270
RAC: 0
Message 280 - Posted: 22 Apr 2015, 21:46:41 UTC
Last modified: 22 Apr 2015, 22:04:05 UTC

It always creates _0, _1 and _4 files but never _2, _3 and _5 (same existing / missing numbers for all 16 "xray_13" results I had today).

Files in the slot directory of a result at 97.5%:

boinc_task_state.xml
checkpoint.dat2
checkpoint.dat3 <= same content as checkpoint.dat2
data.dat
error.dat <= empty
error.dat2 <= empty
error.dat3 <= empty
gwMT.dat
gwMT.dat2
gwMT.dat3 <= same content as gwMT.dat2
gwWIND.dat
gwWIND.dat2
gwWIND.dat3 <= same content as gwWIND.dat2
init_data.xml
job.xml
param.in
stderr.txt
tmp5467.dat <= empty

I saved those, so if you need any of them for the analysis, I can put them somewhere for download. I wonder what purpose all the duplicate files might serve :-/

The boinc_lockfile was locked (couldn't copy it) but that never has any contents so it plays no role.
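To check which of the saved files really are copies of each other, something like the following could be run against a copy of the slot directory. It is only a sketch: the function name `find_duplicate_files` is made up for illustration, and it compares files by a SHA-256 hash of their full contents, which is reasonable for small files like these.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_files(directory):
    """Return groups of file names in `directory` whose contents are identical."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
        except OSError:
            # Unreadable files (for example a locked lock file) are skipped.
            continue
        groups[digest].append(path.name)
    # Only groups with more than one member are duplicates.
    return [names for names in groups.values() if len(names) > 1]
```

Note that all empty files end up in one group, since they trivially have identical content, so the empty error and tmp files would be reported together.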

(btw: if I tested my software like this, it would probably not be easy for me to get a job)
ID: 280 · Rating: 0
Krzysztof Piszczek - wspieram ...
Project administrator
Project developer
Project tester
Joined: 4 Feb 15
Posts: 812
Credit: 143,843,798
RAC: 236
Message 282 - Posted: 23 Apr 2015, 8:48:11 UTC - in response to Message 280.  

I think I have sorted the problem out, and I can see successfully finished WUs in my panel right now. Also, my own machine reports finished tasks.
ID: 282 · Rating: 0
Jim1348
Joined: 28 Feb 15
Posts: 253
Credit: 200,562,581
RAC: 14,116
Message 284 - Posted: 23 Apr 2015, 14:26:27 UTC - in response to Message 282.  

I just finished the first one of the new series. It took 2 hours 11 minutes on a Haswell i7-4771. Good work. All new projects have start-up problems (and old projects too).
ID: 284 · Rating: 0
[AF>Quebec]MDodier
Joined: 7 Mar 15
Posts: 6
Credit: 9,006,760
RAC: 25,625
Message 285 - Posted: 23 Apr 2015, 15:43:53 UTC - in response to Message 284.  

I aborted 8 workunits that I had been "crunching" for 24 hours... they were at 100% for hours...

I also aborted the 81 other workunits I had in the cache.

All xray_12 version

I'm mad.

I will give a new version another try.

It had better work properly, or I will move to other projects for a while...

MDodier
ID: 285 · Rating: 0
Jim1348
Joined: 28 Feb 15
Posts: 253
Credit: 200,562,581
RAC: 14,116
Message 286 - Posted: 23 Apr 2015, 16:21:00 UTC - in response to Message 285.  

All xray_12 version

The latest ones are 0.03 Universe X-ray sources v2 universe_xray_15
ID: 286 · Rating: 0
Ioannis
Joined: 25 Apr 15
Posts: 1
Credit: 220,556
RAC: 0
Message 329 - Posted: 25 May 2015, 17:52:58 UTC

I get the message WRONG IN CALCULATIONS on universe x-ray sources 2 0.03, universe_xray_17_20000_150001
and so on.
ID: 329 · Rating: 0

Copyright © 2022 Copernicus Astronomical Centre of the Polish Academy of Sciences
Project server and website managed by Krzysztof 'krzyszp' Piszczek