Benchmark

Throughput

We conduct experiments on the throughput of the update and probability operations, which are the key operations of the density estimation algorithm. The results show that, compared with the alternatives, throughput improves significantly for both stationary and non-stationary data streams.

update 

Figure [fig:throughput-update]{reference-type="ref" reference="fig:throughput-update"} plots the throughput of the update operation for the different density estimation algorithms, measured on both stationary and non-stationary normal-distribution data streams. For a stationary data stream, the update throughput of Sketch is 1.1 Mops/sec, which is 16$\times$, 16$\times$ and 1800$\times$ that of [oKDE]{.sans-serif}, [SPDT]{.sans-serif}, and [SPDTw]{.sans-serif}, respectively. For a non-stationary data stream, the throughput of Sketch is 0.37 Mops/sec, which is 5.8$\times$, 5.7$\times$ and 570$\times$ that of [oKDE]{.sans-serif}, [SPDT]{.sans-serif}, and [SPDTw]{.sans-serif}, respectively.

This result demonstrates that the update performance of Sketch improves most when major concept drift rarely occurs, while still showing a noticeable improvement on data streams with frequent concept drift.
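As an illustration, throughput numbers of this kind can be obtained with a harness along the following lines. This is a minimal sketch: `ToyHistogram` and its `update` method are hypothetical stand-ins for the actual estimator API, not the data structure evaluated above.

```python
import bisect
import random
import time

class ToyHistogram:
    """Hypothetical stand-in for a streaming density estimator."""
    def __init__(self, lo=-5.0, hi=5.0, bins=64):
        self.edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
        self.counts = [0] * bins
        self.total = 0

    def update(self, x):
        # Find the bin containing x; clamp out-of-range samples
        # into the boundary bins.
        i = bisect.bisect_right(self.edges, x) - 1
        i = min(max(i, 0), len(self.counts) - 1)
        self.counts[i] += 1
        self.total += 1

def update_throughput(est, stream):
    """Return update operations per second for `est` over `stream`."""
    start = time.perf_counter()
    for x in stream:
        est.update(x)
    elapsed = time.perf_counter() - start
    return len(stream) / elapsed

random.seed(0)
stream = [random.gauss(0, 1) for _ in range(100_000)]
mops = update_throughput(ToyHistogram(), stream) / 1e6
print(f"{mops:.2f} Mops/sec")
```

A non-stationary stream can be simulated in the same harness by shifting the mean of `random.gauss` partway through the stream.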


probability

Figure [fig:throughput-probability]{reference-type="ref" reference="fig:throughput-probability"} plots the throughput of the probability operation for the different density estimation algorithms, measured after performing three hundred update operations with normal-distribution data. The throughput of Sketch is 0.47 Mops/sec, which is 0.13$\times$, 1.0$\times$ and 1.1$\times$ that of [oKDE]{.sans-serif}, [SPDT]{.sans-serif}, and [SPDTw]{.sans-serif}, respectively. To improve query speed, we cache the query result. With caching, the throughput of Sketch rises to 4.1 Mops/sec, which is 1.2$\times$, 9.2$\times$ and 9.4$\times$ that of [oKDE]{.sans-serif}, [SPDT]{.sans-serif}, and [SPDTw]{.sans-serif}, respectively.
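The caching scheme can be sketched as a thin wrapper that memoizes query results and invalidates them on every update. This is illustrative code under the assumption that the estimator exposes `update` and `probability` methods; `CountingEstimator` is a toy member used only to make the example runnable.

```python
class CountingEstimator:
    """Toy estimator: empirical probability of a half-open interval."""
    def __init__(self):
        self.samples = []

    def update(self, x):
        self.samples.append(x)

    def probability(self, lo, hi):
        n = len(self.samples)
        return sum(lo <= s < hi for s in self.samples) / n if n else 0.0

class CachedEstimator:
    """Memoize probability queries; invalidate the cache on update."""
    def __init__(self, estimator):
        self._est = estimator
        self._cache = {}

    def update(self, x):
        self._est.update(x)
        self._cache.clear()  # the density changed, so cached answers are stale

    def probability(self, lo, hi):
        key = (lo, hi)
        if key not in self._cache:
            self._cache[key] = self._est.probability(lo, hi)
        return self._cache[key]

ce = CachedEstimator(CountingEstimator())
for x in (0.1, 0.4, 2.0):
    ce.update(x)
p = ce.probability(0.0, 1.0)  # computed once; repeated queries hit the cache
```

Clearing the whole cache on every update is the simplest correct invalidation policy; it pays off when queries are far more frequent than updates, which is the regime the cached throughput above reflects.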


Memory Usage

Figure [fig:memory]{reference-type="ref" reference="fig:memory"} shows the amount of memory used by the data structures of the different density estimation algorithms after performing the update operation on one thousand data points drawn from the normal distribution $\mathcal{N}(0, 1)$. Sketch consumes 6.2 KB of memory, which is 1.3$\times$, 1.4$\times$ and 0.68$\times$ that of [oKDE]{.sans-serif}, [SPDT]{.sans-serif}, and [SPDTw]{.sans-serif}, respectively.
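As a rough illustration of how such footprints can be compared for in-memory data structures, a recursive `sys.getsizeof` walk is one option. This is a simplification: it covers only built-in containers and ignores allocator overhead, so it is not the measurement methodology used above.

```python
import sys

def deep_sizeof(obj, seen=None):
    """Rough recursive memory footprint in bytes (built-in containers only)."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # don't double-count shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(x, seen) for x in obj)
    return size
```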

The parameters of Sketch are chosen so that it matches the accuracy of the alternatives on a stationary data stream. The result indicates that Sketch requires more memory than [oKDE]{.sans-serif} and [SPDT]{.sans-serif} to achieve the same accuracy. However, Sketch uses less than 3$\times$ the memory of [SPDT]{.sans-serif} even though it maintains an ensemble of three histograms. As mentioned in Section [3]{reference-type="ref" reference="sec:error"}, the Sketch framework successfully corrects the errors of each model, making it acceptable for the individual models to have higher errors.
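The ensemble idea can be sketched as a wrapper that averages the interval estimates of its member models, so that independent per-model errors partially cancel. This is illustrative code with an assumed member API; in the actual framework the three histograms are configured differently, whereas the identical toy members here only demonstrate the interface.

```python
class Empirical:
    """Toy member model: empirical probability of a half-open interval."""
    def __init__(self):
        self.samples = []

    def update(self, x):
        self.samples.append(x)

    def probability(self, lo, hi):
        n = len(self.samples)
        return sum(lo <= s < hi for s in self.samples) / n if n else 0.0

class EnsembleEstimator:
    """Average the probability estimates of several member models."""
    def __init__(self, members):
        self.members = list(members)

    def update(self, x):
        # Every member sees every sample; in practice each member may
        # weight or window the stream differently.
        for m in self.members:
            m.update(x)

    def probability(self, lo, hi):
        return sum(m.probability(lo, hi) for m in self.members) / len(self.members)

ens = EnsembleEstimator([Empirical() for _ in range(3)])
for x in (0.2, 0.7, 1.5):
    ens.update(x)
```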
