Friday, January 24, 2014

Sampling Interval, Smoothing, and Rate-of-Rise

Let me try to explain my perspective on the processing of the raw data gathered from a temperature probe installed in a coffee roaster. To understand what is going on in roast logging, one has to look at the full chain of data processing, from the probes to the screen. I assume here that all data is transmitted flawlessly from component to component, and I will deliberately avoid mixing this discussion with the spike issue, as that can be treated separately.

The active parts in the chain are:
  • the probes
  • the meter
  • the sampling module (a component of Artisan)
  • the drawing module (another component of Artisan)

Obviously, the chain runs from the probes, where the raw data is gathered, via the connected meter to the sampling and drawing modules of the logging app.


There are at least two possible modes of operation for organizing the information transfer between the meter and the logging app. Either the logging app explicitly requests a reading from the meter, or the meter constantly produces readings and the app takes those it needs. Artisan uses the first, request-reply pattern to communicate with the meter, while RoastLogger takes the second approach.
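To make the two patterns concrete, here is a minimal Python sketch. `FakeMeter`, `poll` and `stream_latest` are hypothetical names invented for illustration; this is not how Artisan or RoastLogger are actually implemented, and a real meter would produce its stream on its own clock rather than inside the app's loop.

```python
import itertools
from collections import deque


class FakeMeter:
    """Hypothetical meter: each new reading is 0.5 degrees above the last."""

    def __init__(self, start=20.0, step=0.5):
        self._count = itertools.count()
        self.start = start
        self.step = step

    def read(self):
        """Produce one fresh reading."""
        return self.start + self.step * next(self._count)


def poll(meter, n):
    """Request-reply: the app explicitly asks for each reading it needs."""
    return [meter.read() for _ in range(n)]


def stream_latest(meter, produced_per_take, n, buffer_size=16):
    """Streaming: the meter keeps producing readings into a buffer and the app
    takes only the newest one whenever it samples. In reality the meter would
    run on its own clock; here its output is simulated inline."""
    buffer = deque(maxlen=buffer_size)
    taken = []
    for _ in range(n):
        for _ in range(produced_per_take):
            buffer.append(meter.read())  # meter pushes readings on its own
        taken.append(buffer[-1])         # app picks only the newest value
    return taken
```

In the streaming variant with `produced_per_take=4`, three out of every four readings are never looked at by the app; in the polling variant every reading the meter produces is one the app asked for.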

Data is also transferred from the probe to the meter. The probe measures the "analog" temperature and, in the case of thermocouples, converts it into a voltage difference produced by the clever use of two different metals in the wires. The meter then has the job of converting this analog data into a digital temperature value. This is done by the sampling loop within the meter, which constantly takes analog voltage deltas, quantizes them into digital values and applies some further processing. This loop runs quite fast (depending on the processing power of the meter) and delivers a large number of values per second. Due to noise induced by the measurement apparatus and by the quantization, the result is a stream of values fluctuating around the real temperature of the physical system (in our case the bean temperature). To be able to show a stable value on the meter's display (for meters that have one), this stream of values is further processed by applying some mathematical smoothing (see below for more on smoothing). Note that some meters apply different smoothing to the values shown on the display and to the values sent over their communication link. The Amprobe is one example: it shows quite stable values on its display, but a lot of "noise" in a connected app. Further note that on some meters you can influence the processing applied on the device. The smoothing done by the aArtisan sketch of the Arduino/TC4 can be configured at compile time. The quantization applied by the Phidgets can be configured via a system control panel.
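The meter-side loop described above can be sketched as follows, assuming Gaussian noise on the analog value, a fixed digital resolution of 0.25 degrees, and plain averaging as the smoothing step. All names and numbers (`meter_sampling_loop`, `noise_sd`, `resolution`) are hypothetical; real meters use their own, often undocumented, filters.

```python
import random
import statistics


def quantize(value, resolution):
    """Round an analog value to the nearest step of the (assumed) resolution."""
    return round(value / resolution) * resolution


def meter_sampling_loop(true_temp, n_samples, noise_sd=0.8, resolution=0.25, rng=None):
    """Hypothetical meter loop: take n_samples noisy, quantized readings of
    true_temp and report their average as one smoothed value."""
    if rng is None:
        rng = random.Random(0)
    raw = [quantize(true_temp + rng.gauss(0.0, noise_sd), resolution)
           for _ in range(n_samples)]
    return raw, statistics.fmean(raw)
```

Calling this repeatedly shows the point of the loop: the individual quantized samples scatter by roughly `noise_sd`, while the reported averages scatter by only a fraction of that.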

However, processing in the app and on the device is not all of the processing that happens in the chain. The probe itself also does some "processing", which depends on its exact physical construction. The more heavily the probe is shielded, the more the outside temperature is "smoothed" by thermal lag before it is communicated to the connected meter. This corresponds to the common wisdom that unshielded probes "react" faster to temperature changes than heavily shielded ones. See also the Probe Guide recently posted by Cropster.

Let's assume the probe we observe is placed in the bean mass of a roaster, measuring (an approximation of) the bean temperature (BT) during a roast. At any time t_0 there is a temperature RBT_0 corresponding to the real bean temperature at that time. The probe picks this up by "measuring" the temperature of the surrounding beans at t_0, reporting BT_0 for time t_0. However, this value BT_0 depends not only on RBT_0, but potentially also on previous temperatures of the probe. So if the bean temperature had just changed to RBT_0 from a much lower temperature RBT_p at the time of measurement, the probe might read a value somewhere between those two temperatures at t_0, depending on its amount of thermal lag. This thermal lag behavior is essentially a physical smoothing process. So let's turn to smoothing now.
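Thermal lag is often approximated as a first-order system: the probe moves toward the bean temperature at a rate proportional to the difference between the two. The sketch below assumes that model, with a hypothetical time constant `tau` standing in for the probe's mass and shielding; it is an illustration, not a measured probe characteristic.

```python
import math


def probe_response(bean_temps, dt, tau, start=None):
    """First-order thermal lag (assumed model): at every step of length dt the
    probe closes a fraction alpha = 1 - exp(-dt/tau) of the gap between its
    own temperature and the real bean temperature RBT."""
    alpha = 1.0 - math.exp(-dt / tau)
    probe = bean_temps[0] if start is None else start
    readings = []
    for rbt in bean_temps:
        probe += alpha * (rbt - probe)
        readings.append(probe)
    return readings
```

Feeding in a jump from RBT_p = 100 to RBT_0 = 150, a "thin" probe (small `tau`) reads a value between the two right after the jump, while a "heavy" probe (large `tau`) reads a value much closer to the old temperature, which is exactly the in-between reading described above.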


You might have observed that temperatures fluctuate somewhat in the roasting app window although the roasting machine seems to hold a constant temperature. This is due to measurement noise, which exists in all measuring systems. The world is not as stable as it pretends to be. So let's assume we have a signal v given as a series of discrete values v = v_0,..,v_n equally spaced in time (corresponding to the measurements recorded by the sampling module of Artisan during a roast). One mathematical method to smooth this signal is convolution with a scaled window, applied to the signal extended at both ends by reflected copies of itself. This processing leads to a smoothed signal s = s_0,..,s_n that fluctuates less (shows a lower standard deviation around the underlying trend). Nice. However, one has to keep in mind that s contains less information than v. This is easy to understand if one looks at the extremes. An extreme form of s would be just a straight line. This line obviously contains less information than v (unless v was already a straight line). Just imagine that v started and ended at a low temperature, but had an intermediate "hill" of some higher temperature. That "hill" gets smaller and smaller the more smoothing we apply. So smoothing results in information loss.
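A static smoother of this kind can be sketched in pure Python: pad both ends of v with mirrored copies, then convolve with a centered window. A flat (moving-average) window is assumed here for simplicity; weighted windows such as Hanning work the same way. The function name `smooth_reflect` is hypothetical and this is not Artisan's code.

```python
def smooth_reflect(v, window_len):
    """Static smoothing: convolve the list v with a flat window after padding
    both ends with reflected copies of the signal. window_len must be odd so
    the window is centered on each sample and introduces no time shift."""
    if window_len < 3:
        return list(v)
    if window_len % 2 == 0:
        raise ValueError("window_len must be odd")
    half = window_len // 2
    if len(v) <= half:
        raise ValueError("signal too short for this window")
    # Mirror the signal at both ends without repeating the edge sample.
    padded = v[half:0:-1] + list(v) + v[-2:-half - 2:-1]
    # Output sample i is the mean of the window centered on v[i].
    return [sum(padded[i:i + window_len]) / window_len for i in range(len(v))]


hill = [0.0] * 10 + [10.0] * 3 + [0.0] * 10
for w in (5, 11):
    print(w, max(smooth_reflect(hill, w)))
```

With the three-sample "hill" of height 10, a 5-sample window reduces the peak to 6.0 and an 11-sample window to about 2.7: the wider the window, the more of the hill is lost.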

Smoothing can have another disadvantage, too. The smoothing process just discussed assumed that the full signal v is available; it applied smoothing in a static way to the full signal. In roast logging, only a prefix of the signal v is available during roasting, up to the last reading just taken. At any time in the roast, we do not know how the signal will evolve in the future. So smoothing can only be based on previous values. A simple smoothing process that can be applied in such "live" situations is a running average. At any point in the roast, we take the last reading r_i as well as some previous readings r_{i-1}, r_{i-2},..,r_{i-n} into account. In the simplest case we just sum these n+1 readings and divide the sum by n+1 to compute s (i.e., s_i = (r_i + r_{i-1} + .. + r_{i-n})/(n+1) at any time t_i). Choosing a higher n results in a more heavily smoothed signal s. Since we apply smoothing only "left-sided", taking the past but not the (right-side) future into account to compute the smoothed value s_i at t_i, the smoothed signal lags behind the raw one, i.e. it is shifted to the right (later) on the time axis. This is the "time lag" introduced by the smoothing. E.g., a constant temperature in the roaster that is abruptly increased at time t will result in a smoothed signal that shows this temperature increase at a time t+tl, where the time lag tl depends on the amount of smoothing applied. Note that smoothing via convolution with a centered window does not produce any time lag (if done correctly), but it cannot be applied live.
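The running-average formula and its time lag can be checked with a minimal sketch. `running_average` and `first_crossing` are hypothetical helpers; near the start of a roast fewer than n+1 readings exist, and the sketch simply averages over what is available (one possible start-up rule among several).

```python
def running_average(readings, n):
    """Live (left-sided) smoothing: s_i = (r_i + r_{i-1} + ... + r_{i-n}) / (n + 1).
    At the start, where fewer than n+1 readings exist, average what is there."""
    smoothed = []
    for i in range(len(readings)):
        window = readings[max(0, i - n): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def first_crossing(signal, threshold):
    """Index of the first sample at or above threshold (None if never)."""
    for i, x in enumerate(signal):
        if x >= threshold:
            return i
    return None


# Constant 100 degrees, abruptly raised to 120 at index 10.
step = [100.0] * 10 + [120.0] * 20
for n in (0, 4, 8):
    print(n, first_crossing(running_average(step, n), 110.0))
```

The raw signal (n=0) crosses the halfway mark of 110 at index 10, the n=4 average at index 12 and the n=8 average at index 14: the time lag tl grows with the amount of smoothing, here by n/2 samples.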

As said before, smoothing is done in all parts of the processing chain (including the probe). The probe, the meter and the sampling module of the app have to apply live smoothing, with the consequence of producing some time lag. This time lag depends on the length of the history taken into account. A probe with less mass reacts faster, i.e. it takes less "history" into account and applies less smoothing to the signal. A meter usually runs a very fast sampling loop, taking a lot of values per unit of time (some per ms). Smoothing over n of those values introduces a lower time lag than smoothing applied by the app over n values, as the app runs at a far slower sampling interval (down to 1s). This indicates that (heavy) live smoothing should be applied as early in the chain as possible to avoid a large time lag in the resulting signal.
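The early-smoothing argument is simple arithmetic. For an (n+1)-point running average, the window's center sits n/2 samples behind the newest reading, so on a steadily rising curve the smoothed value trails by n/2 sampling intervals. The 1 ms meter interval below is an assumption for illustration only, and `running_average_lag_s` and `ramp_lag_samples` are hypothetical names.

```python
def running_average_lag_s(n, interval_s):
    """Approximate delay (in seconds) of an (n+1)-point running average:
    the window's center lies n/2 samples behind the newest reading."""
    return n * interval_s / 2.0


def ramp_lag_samples(n, length=50, slope=1.0):
    """Check the n/2 rule on a linear ramp: once the window is full, the
    running average equals the ramp value n/2 samples earlier."""
    ramp = [slope * i for i in range(length)]
    last = length - 1
    avg = sum(ramp[last - n:last + 1]) / (n + 1)
    return (ramp[last] - avg) / slope


# Same amount of smoothing (n = 10), very different sampling intervals.
for label, interval_s in (("meter", 0.001), ("app", 1.0)):
    lag = running_average_lag_s(10, interval_s)
    print(f"{label}: interval {interval_s} s -> lag of about {lag:.3f} s")
```

The same n=10 history costs about 5 ms inside a fast meter loop but about 5 s in an app sampling once per second.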

The sampling module of Artisan takes readings according to the user-specified sampling interval without applying any smoothing. If oversampling (introduced in Artisan v0.7.4) is active, two readings are taken per interval and their average is recorded as the reading; the two raw readings are dropped. The drawing module applies live smoothing to the values recorded by the sampling module during recording, according to the amount specified by the user, and static smoothing after roasting and for displaying the background graphs. Therefore, a time lag might be observed during roasting that is eliminated afterwards. The smoothed curves are computed only for display purposes in Artisan; Artisan always stores and loads the raw values gathered by the sampling module.
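The oversampling step as described can be sketched like this. `read` is a hypothetical zero-argument callable standing in for a request to the meter, `sample_interval` and `record` are illustrative names, and the sketch ignores how Artisan actually times the two readings within an interval.

```python
def sample_interval(read, oversampling=False):
    """Take the reading for one sampling interval.

    Without oversampling: a single reading. With oversampling: two readings,
    of which only the average is kept (the raw pair is discarded)."""
    if not oversampling:
        return read()
    first = read()
    second = read()
    return (first + second) / 2.0


def record(read, n_intervals, oversampling=False):
    """Hypothetical recording loop: one stored, unsmoothed value per interval.
    Any display smoothing is left to a separate drawing step."""
    return [sample_interval(read, oversampling) for _ in range(n_intervals)]


readings = iter([200.0, 202.0, 204.0, 206.0])
print(record(lambda: next(readings), 2, oversampling=True))  # [201.0, 205.0]
```

Keeping recording and smoothing separate mirrors the design described above: what is stored stays raw, so any amount of smoothing can be reapplied later.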

So we have to deal with a fundamental trade-off: nice smooth curves, potentially inducing a shift in time and some information loss, or very accurate curves showing a lot of noise that are hard for humans to interpret.


The case of the rate-of-rise curves is more demanding than that of the basic temperature curves. The rate-of-rise curve (called DeltaET/DeltaBT in Artisan) of another curve (here ET/BT) is mathematically its first derivative. In a time-discrete setting such as roast logging, the rate-of-rise DeltaBT_0 of a curve BT at t_0 is computed as DeltaBT_0 = (BT_0 - BT_{-1}) / (t_0 - t_{-1}). Again, one can see that this computation takes the past into account, but not the future. In a live situation the corresponding time lag is again unavoidable. In a static computation one could avoid it by also taking BT_1/t_1 into account. Moreover, if the time delta between t_0 and t_{-1} gets too small relative to the noise in the temperature delta, the resulting signal tends to fluctuate a lot, since that noise is divided by a small number. This suggests taking a larger sampling interval to achieve a smoother RoR curve. Further, the smoother the underlying curve (here BT), the smoother the RoR curve will be. Therefore, the RoR is usually computed from a smoothed BT curve, and stronger smoothing of the BT curve in Artisan results in a smoother RoR curve. However, due to the amplifying nature of the derivative, this smoothing is not enough, and one usually has to smooth the raw RoR curve additionally to make the resulting curve useful. This is done in Artisan, and the amount of that additional smoothing can be controlled by the user. Note that all predictions made by Artisan (like the phases LCDs' predictions of the DE or FCs times, as well as the projection lines) are based on the smoothed RoR values.
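The backward difference, its static central-difference counterpart, and the effect of sampling interval and BT smoothing on RoR noise can be explored with the sketch below. The simulated roast (a straight 0.15 degrees/s ramp plus Gaussian noise) and all function names are hypothetical; the RoR values come out in degrees per second.

```python
import random
import statistics


def rate_of_rise(times, temps):
    """Live-capable backward difference:
    DeltaBT_i = (BT_i - BT_{i-1}) / (t_i - t_{i-1}), in degrees per second
    (multiply by 60 for degrees per minute)."""
    return [(temps[i] - temps[i - 1]) / (times[i] - times[i - 1])
            for i in range(1, len(temps))]


def rate_of_rise_central(times, temps):
    """Static central difference that also uses BT_{i+1}; it needs future
    readings, so it only works after the roast."""
    return [(temps[i + 1] - temps[i - 1]) / (times[i + 1] - times[i - 1])
            for i in range(1, len(temps) - 1)]


def running_mean(values, n):
    """Causal running average over the last n+1 values (fewer at the start)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - n): i + 1]
        out.append(sum(window) / len(window))
    return out


def noisy_roast(duration_s, interval_s, slope=0.15, noise_sd=0.3, seed=0):
    """Hypothetical BT curve: a straight ramp (slope in degrees/s) plus noise."""
    rng = random.Random(seed)
    times = [k * interval_s for k in range(int(duration_s / interval_s) + 1)]
    temps = [100.0 + slope * t + rng.gauss(0.0, noise_sd) for t in times]
    return times, temps


for interval_s in (1.0, 5.0):
    times, temps = noisy_roast(600, interval_s)
    raw = statistics.pstdev(rate_of_rise(times, temps))
    pre = statistics.pstdev(rate_of_rise(times, running_mean(temps, 4)))
    print(f"interval {interval_s:.0f} s: RoR spread {raw:.3f} raw BT, "
          f"{pre:.3f} smoothed BT")
```

Both effects from the text show up: a 5s interval gives a much calmer RoR than 1s, and computing the RoR from a smoothed BT calms it further (at the price of the running average's lag).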


Based on the above, my suggestions for minimizing information loss and time shift while still ending up with useful data are to apply smoothing (= noise reduction) as early as possible in the processing chain: use probes that are not too fast (in contrast to the suggestions one reads all over), use a meter that does smoothing well, select a large enough sampling interval (in my experience 5s is more than sufficient not to miss any important developments), apply smoothing to the ET/BT curves in the logging app and, if needed, apply additional smoothing to the RoR curves. As the hardware and software setups used for roast logging are quite diverse, the right settings differ from case to case and have to be adjustable all along the chain. In the end one has to balance accuracy with usefulness. An overly "accurate" reading is useless in a cloud of noise. A nicely smoothed curve is useless if the time lag is too large to successfully drive control decisions (like changing the airflow) that avoid overshooting.