Multithreading and browsers
Now that Easter's done with, I turn my thoughts back to work. Really helpfully, at 3.40am this morning. We've been having problems with some client-side JavaScript that is not working the way it should on all machines on all networks. This morning I suddenly realized what was going on.
The fault reported was vague, and unreproducible.
Other problems we had had with the setup had masked it under other errors, but now that most of those were receding, this one was becoming more apparent.
What was happening was the most common type of error you see around the web that is not an outright typo: a timing problem.
In HTML you can include script from a .js file, and you can do this many times on a page. To increase performance, the browser starts a separate thread for each script block that is declared. If all the script blocks are on the same page, chances are they will all have loaded in sequence, so the onload event can then be used to call any of the functions with everything initialized.
The problem occurs when the script you wish to call is in another frame. Because of the multithreaded way the scripts are loaded, that code may not have loaded yet.
Of course, maybe it is loaded. And therein lies the problem: it's all too easy to find something working on one particular configuration, and totally miss that this is going on. The common ways for this to manifest itself are "function not found" or "variable not found" errors.
I get this problem surprisingly often - especially when wanting to load data into a frame and then find out when it has loaded. These days I tend to use callbacks at the end of the script to continue the flow. Something like:
var o;
function n()
{
...
}
// At the very end of the frame's script: tell the parent we're ready.
// The guard matters - the parent may not have loaded its script yet.
if (window.parent.Callbackfn)
    window.parent.Callbackfn();
This works quite well as a solution, but what if you do not have control over the script, because it was created by a third party or is already deployed?
To solve that problem you need to be a little more creative: either by detecting the readyState of the script block, by polling with a setTimeout loop, or by checking that a variable has been initialized before attempting to run the script. Even the first mechanism can have issues: if the page declaring the block has not itself been initialized, that too has to be caught and handled.
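The setTimeout loop can be sketched like this (waitForFrame and the parameter names are hypothetical, not from the real script):

```javascript
// Poll until a readiness test passes, then run the dependent code.
// This works even when you cannot modify the third-party script,
// as long as you can observe something it defines - a variable,
// a function, or (in IE) the script block's readyState.
function waitForFrame(isReady, onReady, intervalMs) {
    (function poll() {
        if (isReady()) {
            onReady();
        } else {
            setTimeout(poll, intervalMs);
        }
    })();
}
```

Used something like: waitForFrame(function () { return typeof window.frames["data"].startUp === "function"; }, function () { window.frames["data"].startUp(); }, 50); - where "data" and startUp stand in for whatever the real frame defines.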
Oh, and if it's an ActiveX control, they often seem to load after the body onload event, so don't think a single page makes the problem go away entirely.
All in all it quickly becomes a surprisingly complicated problem, and one that is hard to test. I personally think the whole thing has been made harder than it needs to be. There are insufficient flags in HTML to say "only run this code when everything else is done", and the events raised are not as clear as they should be. A complex client-side application in DHTML can have a ticking time-bomb error that only goes off when it reaches the furthest ends of the net.
Invariably the client finds them though...
Which brings me round to the whole multithreading debate. These new dual-core CPUs that are on the way are obviously going to be at their best when doing two things at once. That means complex code like the kind roughly described above is required to get maximum benefit out of them. It is noticeable in Microsoft applications that multithreaded UIs are avoided everywhere. Browse a network - it dies. Look at a floppy - it dies. I suspect the main reason is the difficulty involved in getting this type of code to work correctly. It worries me that a company with the testing resources of MS cannot make a multithreaded file browser, but I think it highlights the problem quite well:
Multithreading is hard, and should only be attempted in a controlled fashion.
So anyway, hopefully this bug is fixed, and everything will start to work again. The fix involved waiting at a certain point to make sure that the other frame was really ready to do something, and not just pretending to be ready.
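The shape of that fix, roughly - these names are illustrative, not the deployed code: the frame only counts as ready once its own script has set an explicit flag as its very last statement, rather than merely existing in the frameset.

```javascript
// In the child frame, the final line of the final script block would be:
//   window.reallyReady = true;

// In the parent, never trust the frame's mere presence:
function frameIsReallyReady(frame) {
    // The frame object can exist, and even have some functions
    // defined, before all of its scripts have run. Only the flag,
    // set as the last statement of the frame's script, proves that
    // it is done rather than just pretending to be ready.
    return !!(frame && frame.reallyReady);
}
```

Combined with a setTimeout poll, this gives a reasonably robust wait-until-really-ready check.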
I wonder over the next few years how many more complex race conditions we'll force users to find, now that we have multiple browsers, CPUs and operating systems. And how long it'll be before those bugs are found!