Understanding the microstructure of the financial market requires the
processing of a vast amount of data related to individual trades, and
sometimes even multiple levels of quotes. Analyzing such a large
volume of data requires tremendous computing power that is not easily
available to financial academics and regulators. Fortunately, publicly
funded High Performance Computing (HPC) power is widely available at
the National Laboratories in the US. In this paper we demonstrate
that HPC resources and the techniques of data-intensive science
can be used to greatly accelerate the computation of an early warning
indicator called Volume-Synchronized Probability of Informed Trading
(VPIN). The test data used in this study contains five and a half
years' worth of trading data for about 100 of the most liquid futures
contracts, includes about 3 billion trades, and takes 140 GB as text
files. By (1) using a more efficient file format for storing the
trading records, (2) employing more effective data structures and
algorithms, and (3) parallelizing the computations, we are able to explore 16,000
different ways of computing VPIN in less than 20 hours on a 32-core
IBM DataPlex machine. Our test demonstrates that a modest computer is
sufficient to monitor a vast number of trading activities in real time,
an ability that could be valuable to regulators.
Our test results also confirm that VPIN is a strong predictor of
liquidity-induced volatility. With appropriate parameter choices, the
false positive rate, averaged over all the futures contracts in the
test data set, is about 7%. More specifically, when VPIN values
rise above a threshold (CDF > 0.99), the volatility in the
subsequent time windows is higher than average in 93% of the
cases.
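
The false positive rate described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names, the use of an empirical CDF over the observed VPIN values, and the layout in which volatility[i] measures the window following step i are all assumptions made for illustration.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """For each value, return the fraction of samples less than or equal to it."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def false_positive_rate(vpin, volatility, threshold=0.99):
    """Fraction of VPIN events NOT followed by above-average volatility.

    Assumed (hypothetical) layout: vpin[i] is the VPIN value at step i and
    volatility[i] is a volatility measure over the window after step i.
    An "event" is any step whose empirical CDF value exceeds `threshold`.
    """
    if len(vpin) != len(volatility):
        raise ValueError("vpin and volatility must have the same length")
    if not vpin:
        return None

    cdf = empirical_cdf(vpin)
    mean_vol = sum(volatility) / len(volatility)

    # Steps at which VPIN is unusually high relative to its own history.
    events = [i for i, c in enumerate(cdf) if c > threshold]
    if not events:
        return None

    # A false positive is an event whose following window is not
    # more volatile than average.
    misses = sum(1 for i in events if volatility[i] <= mean_vol)
    return misses / len(events)
```

Under these assumptions, a returned value of 0.07 would correspond to the case reported above, in which 93% of the events are followed by higher-than-average volatility.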