The System Time

Virtually all timing inside BeOS is based on one function: system_time(). It returns the current system uptime in microseconds as a bigtime_t (int64) value. At boot, the kernel calculates the CPU frequency and a conversion factor (cv_factor) that tells how many microseconds pass per CPU cycle (this is actually stored as a fraction, an implementation detail you'll see below). When system_time() is called, it reads how many cycles the CPU has executed so far from the tsc, the time stamp counter, which is simply incremented by one with each CPU cycle. We can therefore calculate how much time has passed since the system was started by multiplying the current tsc by the cv_factor.

The calculation also includes another variable, the system_time_base. This is a base time stamp that is saved at boot time. It is effectively subtracted from the tsc before calculating the uptime (it is stored as a negative value and added, as the formula below shows). The reason for this is that we don't want to count the time the CPU was running before BeOS actually booted (i.e. the time spent waiting for the BIOS or the boot manager). This value will come in very handy later on.

The final calculation for system_time() looks as follows:

				(tsc + system_time_base) * cv_factor
				------------------------------------ = system_time()
				            0x100000000

The system_time_base is a negative value, and dividing by 0x100000000 (2^32) turns the cv_factor into a fraction. The actual order in which this is calculated is quite important in order not to lose precision, but for our purposes that does not really matter, so we can stick with this formula.
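To make the formula a bit more tangible, here is a minimal sketch in C. It is not the BeOS implementation: the function name sketch_system_time() is made up for illustration, and I am assuming that cv_factor fits into an unsigned 32bit value (a 32.32 fixed point number of microseconds per cycle). The multiplication is split into a high and a low half so that the intermediate result does not overflow 64 bits, which is one way to handle the ordering issue mentioned above.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t bigtime_t;

/* Hypothetical helper, not a BeOS function.
 * Computes ((tsc + system_time_base) * cv_factor) / 0x100000000.
 * Assumption: cv_factor is a 32bit fixed point fraction. */
static bigtime_t
sketch_system_time(uint64_t tsc, int64_t system_time_base, uint32_t cv_factor)
{
    /* A zero factor would mean the calibration never happened. */
    assert(cv_factor != 0);

    /* system_time_base is negative, so this is really a subtraction. */
    uint64_t cycles = tsc + (uint64_t)system_time_base;

    /* Split the cycle count so that neither product exceeds 64 bits
     * for realistic cycle counts. */
    uint64_t high = (cycles >> 32) * cv_factor;
    uint64_t low = ((cycles & 0xffffffffULL) * cv_factor) >> 32;

    return (bigtime_t)(high + low);
}
```

With a cv_factor of 0x80000000 (0.5 microseconds per cycle) and a base of 0, a tsc of 100000 gives 50000 microseconds.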

2. Problem 1

The actual reason why BeOS has problems with CPUs above 2.1GHz is simple: the CPU frequency is a 64bit value. Reading it is not the problem, but the calculation thereafter is simply wrong, because it converts the value to a signed 32bit integer. This limits the CPU frequency to just under 2^31 Hz (2147483648Hz = 2147.483648MHz). Values at or above this wrap around to the negative side. That's why you get a below-zero CPU frequency in the About BeOS window or in Pulse. The cv_factor calculated from this is obviously wrong too.
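The wrap-around is easy to reproduce. The following sketch uses two made-up helpers: truncate_to_int32() mimics what happens when the 64bit frequency is squeezed into a signed 32bit integer, and cv_factor_from_frequency() shows how a factor could be derived from the full 64bit value. The exact formula the kernel uses is not documented here, so the second function is an assumption that simply follows from the system_time() formula (microseconds per cycle, scaled by 2^32).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the low 32 bits of the frequency and
 * interpret them as a signed value, like the faulty calculation does. */
static int32_t
truncate_to_int32(uint64_t value)
{
    uint32_t low = (uint32_t)value; /* well defined: modulo 2^32 */
    if (low <= INT32_MAX)
        return (int32_t)low;
    /* Upper half of the range maps to negative numbers. */
    return (int32_t)((int64_t)low - 4294967296LL);
}

/* Hypothetical helper: microseconds per cycle as a 32.32 fixed point
 * fraction, calculated from the full 64bit frequency.
 * Assumption: the frequency is above 1MHz so the result fits 32 bits. */
static uint32_t
cv_factor_from_frequency(uint64_t frequency_hz)
{
    assert(frequency_hz != 0);
    return (uint32_t)((1000000ULL << 32) / frequency_hz);
}
```

A 2.4GHz CPU (2400000000Hz) comes out of truncate_to_int32() as a negative number, while a 2.0GHz CPU still fits.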

This can be corrected more or less easily by a driver (executed in kernel space) that calculates a correct cv_factor and overwrites the detection routine with one that handles the 64 bits correctly. That is exactly what cpu_fix does. Patching the function for 64 bits is not as easy as it sounds here, because the assembled code must be no larger (byte-wise) than the original function. But luckily we can use the MMX movq instruction to move the 64 bits at once (while still taking the same number of bytes as a normal move).

For most of us with a standard P4 or corresponding CPU, all problems are solved now. But some closely related problems remain, which I will describe below.

3. Problem 2

Note that this section only applies to BeOS R5 and BONE. Dano based systems use a shared area that exchanges the factors between the kernel and userspace. The cpu_fix driver will reexport the fixed values to this area automatically so that all running applications will use them.

Now the cv_factor is fixed and BeOS actually gets the frequency right; the clock works as expected and sound plays fine. But what happens when the CPU suddenly runs slower or faster than it did when the conversion factor was calculated (when using SpeedStep, for example)? The system_time() is still based on the tsc and cv_factor, but while the cv_factor remains the same, the tsc is now incremented at a different frequency. Calling cpu_fix again (using "rescan cpu_fix") will recalculate the cv_factor for the new CPU frequency and put this value into the kernel. The kernel system_time() is now correct again. But this only covers kernel space. For userspace, where all the applications and servers run, it does not fix anything. The reason is simple: to avoid asking the kernel on every system_time() call from userspace (which would require a context switch to the kernel and back each time), there is a userspace version of system_time() in libroot. The calculation it does is the same. But where does it get the cv_factor and system_time_base from? It makes a syscall (_kget_system_time_parms) as soon as libroot is loaded to request these values from the kernel and saves them in the application's address space. Obviously this means that applications that are already running will not be fixed by a "rescan cpu_fix". Worse yet, the applications and the kernel will get out of sync.
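The desync can be modelled in a few lines of C. Everything here is a stand-in: the time_parms struct, snapshot_parms() and time_from_parms() are hypothetical names, and the 32bit cv_factor is an assumption. The point is only that the userspace copy is taken once and never sees a later change on the kernel side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t bigtime_t;

/* Hypothetical container for the two values discussed in the text. */
typedef struct {
    int64_t  system_time_base;
    uint32_t cv_factor; /* assumed 32.32 fixed point fraction */
} time_parms;

/* Same calculation for kernel and userspace, only the inputs differ. */
static bigtime_t
time_from_parms(const time_parms *parms, uint64_t tsc)
{
    assert(parms != NULL);
    uint64_t cycles = tsc + (uint64_t)parms->system_time_base;
    uint64_t high = (cycles >> 32) * parms->cv_factor;
    uint64_t low = ((cycles & 0xffffffffULL) * parms->cv_factor) >> 32;
    return (bigtime_t)(high + low);
}

/* Models what libroot does at load time: copy the values once. */
static time_parms
snapshot_parms(const time_parms *kernel_parms)
{
    assert(kernel_parms != NULL);
    return *kernel_parms;
}
```

If the kernel copy is changed from 0x80000000 to 0x40000000 after the snapshot was taken, the kernel reports 25000 for a tsc of 100000 while the application still calculates 50000.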

There are some ways to address this issue:

Don't use a tsc based approach for time keeping at all
Notify all running applications of the change and ask them to update their copy of the cv_factor
Reroute the userspace system_time() to use the (fixed) kernel system_time()

Not using tsc-based time keeping would certainly make sense considering the problems that come with dynamic CPU speeds. But finding the right alternative is not that simple. There is the PIT, the RTC, the APIC timer, the ACPI timer and, more recently, the HPET. I will not explain them here as you can find more details on the net. The general problem with these timers is that they are all external to the CPU (and they have to be, in order to be independent of the CPU clock). This means that in most cases they are less easily accessed than the tsc, which has its own assembler instruction (rdtsc) and whose value is written directly into CPU registers. Also, not all of those timers are present in all systems. Other OSes assign a sort of score to each of these timers and then pick the one that fits best. Something like this should, or even has to, be done for Haiku, but let's just say that this is way too complicated and too large (code-wise) to fit into a patch for R5.

Notifying all running applications would be a very nice solution, but is obviously not possible without either compiling it into the applications or having sources for libbe or libroot for example.

That leaves the third option as the one that is actually feasible. It is possible to just change (hack / patch) the userspace system_time() to ask the kernel for the time instead of doing its own calculation. This would make the kernel system_time() the only source of time information, and fixing the variables in kernel space would also fix them for all the running applications. This comes at the price I already mentioned. Since we have to ask the kernel to do the calculation, we always have to context switch to it and back. We will also use a syscall that has to be set up and executed. This is not a nice thing to have, but it is acceptable.

So what do we need to get this working? We will have to alter libroot's system_time(), hijack some syscall for our purpose and adjust the kernel to do what we need when this syscall is triggered. Since we no longer care about the system_time_base and cv_factor in the application's address space, we can use them to transport the values. And since we will not need the _kget_system_time_parms syscall anymore, we also have a free syscall to use. Patching the kernel to just call the kernel's system_time() instead of copying the system_time_base and cv_factor is not very hard to do. We will just push the current system time into the address where system_time_base originally was (both are 64bit), and then read and return this value from system_time() after the syscall is executed. This requires some assembler tinkering, but it is certainly not too complicated.
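The real patches are done in assembler on the binaries, but the idea can be sketched in plain C. All names below are hypothetical stand-ins (I do not know the real signature of _kget_system_time_parms, so patched_get_system_time_parms() only models its role), and the kernel state is simulated with a few static variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t bigtime_t;

/* Hypothetical stand-ins for kernel state (not the real BeOS symbols). */
static uint64_t fake_tsc;
static int64_t  kernel_system_time_base;
static uint32_t kernel_cv_factor = 0x80000000u; /* 0.5 us per cycle */

static bigtime_t
kernel_system_time(void)
{
    uint64_t cycles = fake_tsc + (uint64_t)kernel_system_time_base;
    return (bigtime_t)((cycles >> 32) * kernel_cv_factor
        + (((cycles & 0xffffffffULL) * kernel_cv_factor) >> 32));
}

/* Models the hijacked syscall: instead of copying base and factor, it
 * stores the current kernel time in the slot of system_time_base. */
static void
patched_get_system_time_parms(int64_t *base_slot, uint32_t *factor_slot)
{
    assert(base_slot != NULL);
    (void)factor_slot; /* no longer needed by userspace */
    *base_slot = kernel_system_time();
}

/* The slots that live in the application's address space. */
static int64_t  user_base_slot;
static uint32_t user_factor_slot;

/* Models the patched libroot system_time(): one syscall per call,
 * then return whatever the kernel put into the base slot. */
static bigtime_t
patched_user_system_time(void)
{
    patched_get_system_time_parms(&user_base_slot, &user_factor_slot);
    return user_base_slot;
}
```

Because userspace no longer does any calculation of its own, changing kernel_cv_factor in this model is immediately visible through patched_user_system_time(), at the cost of one syscall per call.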

The one problem with this setup is that the two most central system files have to be edited. If the patches do not work as intended, the system will not be able to boot into a desktop anymore, as the app_server will just hang. Considering that the same is true for the original problem, and for pretty much any other approach too, this should be bearable though.

4. Problem 3

OK, so the clock is fixed and runs fine again, even after reducing or raising the CPU frequency. But when we make the system faster, it still hangs and does not update the screen. Where does this come from? Reconsidering the above formula for calculating the uptime, we will suddenly get very different values after correcting the cv_factor. Imagine a tsc of 100000 and a microseconds per cycle factor of 0.5, leaving out the rest of the formula: system_time() = 100000 * 0.5 = 50000. Now we change the frequency, recalculate the cv_factor and get a factor of 0.75. Suddenly we have: system_time() = 100000 * 0.75 = 75000. This means that all the timeouts or alarms that were set to, say, 50100 would be triggered at once. This is not really that much of a problem. Some timeouts will fire early and some apps may get confused. But imagine the other way around: system_time() = 100000 * 0.25 = 25000. So it is now earlier than it was, and we have to wait even longer until updates or alarms are triggered. On such a small scale this is not really a problem, but in practice you will get numbers that are way larger, and the difference in time can range from some milliseconds to some minutes or hours.

Remember what I said about system_time_base. It is exactly what we need to solve this problem. What we want to achieve is simple: We want to get exactly the same system_time() value with the new cv_factor as we got with the old one. We can extend the above formula to express this:

				(tsc + system_time_base) * cv_factor   (tsc + new_system_time_base) * new_cv_factor
				------------------------------------ = --------------------------------------------
				             0x100000000                                   0x100000000

Removing the division:

				(tsc + system_time_base) * cv_factor = (tsc + new_system_time_base) * new_cv_factor

Dividing by the new cv_factor:

				(tsc + system_time_base) * cv_factor
				------------------------------------ = tsc + new_system_time_base
				           new_cv_factor

And finally subtracting the tsc again:

				(tsc + system_time_base) * cv_factor
				------------------------------------ - tsc = new_system_time_base
				           new_cv_factor

I hope I got this right, so that maths in school was not a complete waste of time. So we can now calculate a system_time_base that will result in the same initial system_time() while still including the fixed cv_factor. This can now be included in the cpu_fix driver right alongside fixing the cv_factor itself. Problem solved: we can now change the CPU frequency at will and "rescan cpu_fix" to fix the timing while keeping the same uptime.
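The last formula translates into C fairly directly. This is a sketch, not the code from cpu_fix: the name rebase_system_time() is made up, and the cv_factors are again assumed to be 32bit values. The division is split into quotient and remainder so that the intermediate product stays within 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the cpu_fix sources.
 * Computes (tsc + old_base) * old_cv / new_cv - tsc, the
 * system_time_base that keeps system_time() continuous when the
 * cv_factor changes from old_cv to new_cv. */
static int64_t
rebase_system_time(uint64_t tsc, int64_t old_base,
    uint32_t old_cv, uint32_t new_cv)
{
    assert(new_cv != 0);

    uint64_t cycles = tsc + (uint64_t)old_base;

    /* cycles * old_cv / new_cv without overflowing 64 bits:
     * handle the quotient and the remainder of cycles / new_cv
     * separately. */
    uint64_t scaled = (cycles / new_cv) * old_cv
        + ((cycles % new_cv) * old_cv) / new_cv;

    return (int64_t)scaled - (int64_t)tsc;
}
```

Taking the numbers from above (tsc of 100000, old factor 0.5, old base 0) and a new factor of 0.25, the new base is 100000, and (100000 + 100000) * 0.25 is 50000 again. Since the division truncates, the new timeline can differ from the old one by a tiny amount when the numbers do not divide evenly.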
