Multivariate Anomaly Detection in Asset Returns: A Machine Learning Perspective

In financial markets, critical signals often emerge from subtle deviations hidden within seemingly typical patterns of asset behavior. This article examines machine learning techniques for detecting multivariate anomalies in the joint movements of SPY (equities), TLT (long-term bonds), and GLD (gold). Multivariate anomalies are unusual observations that deviate from expected behavior across several variables at once: for example, the returns of SPY, TLT, and GLD moving together in an unexpected or unexplained manner, potentially signaling market disruption. Detecting such anomalies is informative because joint deviations across these distinct asset classes can reveal shifts in investor sentiment, changes in macroeconomic conditions, emerging risks, or market inefficiencies that are not apparent from single-asset analysis. Identifying multivariate anomalies therefore supports more effective risk management, portfolio diversification, and strategic decision-making by surfacing early signals of potential market instability or regime changes.

For example, an “all up day,” where SPY (equities), TLT (long-term bonds), and GLD (gold) simultaneously increase, may represent an anomaly because it defies typical asset class correlations. Normally, equities and bonds tend to move inversely, reflecting investor preferences either for risk-taking (equities up, bonds down) or risk aversion (equities down, bonds up). Additionally, gold often acts as a safe-haven asset that rises when uncertainty grows. Thus, a scenario in which all three asset classes rise together might signal unusual market conditions, such as widespread uncertainty coupled with strong liquidity injections or a sudden reassessment of macroeconomic expectations, indicating a potential structural shift or regime change in the markets.

Detecting multivariate anomalies is more complex than detecting univariate anomalies because the relationships between variables must be considered holistically rather than individually. This requires understanding correlations or dependencies among variables and identifying outliers that deviate from these relationships. Techniques such as Isolation Forest, One-Class SVM, and Autoencoders are commonly used for this purpose, as they can model interactions between multiple variables and detect instances where normal patterns break down.

Isolation Forest works by building an ensemble of random trees that recursively split the data to isolate individual points. The key idea is that anomalies, being rare and different from the majority of the data, are easier to isolate and therefore require fewer splits on average. This technique is computationally efficient and works well for high-dimensional datasets.

Isolation Forest: An Intuition

Imagine you’re playing a guessing game where you have to find a hidden object in a room. If the object is out in the open (like an anomaly in data), you’ll find it quickly. But if it’s well-hidden among many similar objects (like a normal data point), it will take longer to isolate.
In Isolation Forest, the path length is the number of “yes/no” questions (splits) needed to isolate a data point in a tree.
Anomalies are isolated quickly (short path length) because they are different from most data. Normal points take longer (long path length) because they blend in with the rest.

So, if a point gets “isolated” in very few splits, it’s likely an outlier!
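
As a concrete illustration, here is a minimal sketch using scikit-learn's IsolationForest on a toy DataFrame of daily returns; the 5% contamination level is an assumed value for illustration, not a figure from the analysis.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Toy stand-in for the real data: a DataFrame of daily returns with one
# column per ticker and a business-day index.
rng = np.random.default_rng(0)
returns = pd.DataFrame(
    rng.normal(0.0, 0.01, size=(1000, 3)),
    columns=["SPY", "TLT", "GLD"],
    index=pd.bdate_range("2020-05-13", periods=1000),
)

# contamination is the assumed share of anomalies; 0.05 is illustrative only.
iso = IsolationForest(n_estimators=200, contamination=0.05, random_state=42)
labels = iso.fit_predict(returns)        # -1 = anomaly, 1 = normal
scores = iso.decision_function(returns)  # lower score = easier to isolate

anomaly_dates = returns.index[labels == -1]
print(f"{len(anomaly_dates)} days flagged as anomalous")
```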



One-Class SVM, on the other hand, is a machine learning model that learns a boundary around the normal data. It treats the majority of the data as belonging to a single class and tries to detect anomalies as those points that fall outside this boundary. It’s particularly useful when the data is mostly normal, and anomalies are rare but impactful.
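
A corresponding sketch with scikit-learn's OneClassSVM, continuing with the returns DataFrame from the Isolation Forest example. Standardizing the returns first matters because the RBF kernel is scale-sensitive, and nu (set to 0.05 here purely for illustration) caps the fraction of training points treated as outliers.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# `returns` is the daily-returns DataFrame from the Isolation Forest sketch.
# Scale the features, then learn a boundary around the bulk of the data;
# nu is an upper bound on the fraction of points flagged as outliers.
ocsvm = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
svm_labels = ocsvm.fit_predict(returns)   # -1 = outside the learned boundary
svm_anomaly_dates = returns.index[svm_labels == -1]
```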

Autoencoders, which are a type of artificial neural network, are another powerful approach for anomaly detection. They learn to compress (encode) the input data into a simpler representation and then reconstruct it. When anomalies are encountered, the reconstruction error (the difference between the input and its reconstruction) tends to be higher, making them easy to spot.
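
A minimal autoencoder sketch on the same returns DataFrame, assuming Keras (the article does not say which deep learning library was used): the network compresses the three return series into a two-dimensional code, reconstructs them, and the days with the largest reconstruction error are flagged. The 95th-percentile threshold is an illustrative choice.

```python
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

# `returns` is the daily-returns DataFrame from the earlier sketches.
X = StandardScaler().fit_transform(returns.values)

# A tiny 3 -> 2 -> 3 autoencoder: the bottleneck forces the network to learn
# the typical joint structure of the SPY/TLT/GLD returns.
autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(2, activation="relu"),
    tf.keras.layers.Dense(3, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=50, batch_size=32, verbose=0)

# Reconstruction error per day; the largest errors are the candidate anomalies.
reconstruction = autoencoder.predict(X, verbose=0)
errors = np.mean((X - reconstruction) ** 2, axis=1)
threshold = np.quantile(errors, 0.95)     # illustrative cutoff
ae_anomaly_dates = returns.index[errors > threshold]
```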

The main features of the three techniques can be summarized as follows:

Isolation Forest: isolates observations with random tree splits; anomalies need fewer splits on average; fast and well suited to high-dimensional data.
One-Class SVM: learns a boundary around the normal data and flags points that fall outside it; useful when anomalies are rare but impactful.
Autoencoder: compresses and reconstructs the input; anomalies stand out through high reconstruction error; well suited to nonlinear relationships.

To identify anomalies in the joint behavior of SPY, TLT, and GLD, we used the following Python libraries and classes:
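
The exact code is not reproduced here, but a typical setup might look like the following sketch, assuming yfinance for price data, pandas for returns, scikit-learn for IsolationForest and OneClassSVM, and TensorFlow/Keras for the autoencoder.

```python
import pandas as pd
import yfinance as yf
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
import tensorflow as tf  # used to build the autoencoder

# Download five years of daily prices (the sample period cited later in the
# article) and compute daily percentage returns for the three tickers.
prices = yf.download(
    ["SPY", "TLT", "GLD"],
    start="2020-05-13",
    end="2025-05-13",
    auto_adjust=True,
)["Close"]
returns = prices.pct_change().dropna()  # replaces the toy returns used above
```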



The primary output of the analysis is a set of labeled anomaly dates, which are visualized below. Specifically, we display daily price charts of SPY, TLT, and GLD in a single image, with the anomaly dates highlighted as overlays on the SPY chart, as shown below:

Detected Anomaly Dates – Isolation Forest Method

Detected Anomaly Dates – One-Class SVM Method

Detected Anomaly Dates – Autoencoder Method
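
For reference, an overlay of this kind can be produced with matplotlib along the following lines; this is a sketch rather than the article's plotting code, using the prices and returns built in the earlier sketch and Isolation Forest as the example detector.

```python
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

# Flag anomalous days from the joint returns, then mark them on the SPY panel.
labels = IsolationForest(contamination=0.05, random_state=42).fit_predict(returns)
anomaly_dates = returns.index[labels == -1]

# One panel per asset, sharing the date axis.
fig, axes = plt.subplots(3, 1, figsize=(10, 8), sharex=True)
for ax, ticker in zip(axes, ["SPY", "TLT", "GLD"]):
    ax.plot(prices.index, prices[ticker], label=ticker)
    ax.legend(loc="upper left")

# Overlay the detected anomaly dates on the SPY price series.
axes[0].scatter(anomaly_dates, prices.loc[anomaly_dates, "SPY"],
                color="red", s=25, zorder=3, label="anomaly")
axes[0].legend(loc="upper left")
fig.suptitle("SPY, TLT and GLD with detected anomaly dates")
plt.tight_layout()
plt.show()
```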


Despite their methodological differences, the three approaches produce broadly consistent results, identifying anomalies around major inflection points and periods of elevated market movement. Although the anomalies are visualized as overlays on the SPY price chart, they are derived from the joint behavior of SPY, TLT, and GLD. Interestingly, Isolation Forest and the Autoencoder each detect 63 anomaly dates, while One-Class SVM flags slightly more (67) across the 1,256 observations from May 13, 2020, to May 12, 2025. This suggests that One-Class SVM is somewhat less conservative in this context, which may reflect greater model sensitivity and possibly a higher false positive rate, as is sometimes observed with this method in low-dimensional, nonlinear settings. Although Isolation Forest and the Autoencoder detect the same number of anomalies, the latter appears more responsive to nonlinear transitions near turning points. Together, these methods provide complementary views of abnormal joint movements across the equity, bond, and gold markets.
