Can AI Agents Actually Learn to Trade?

Figure 1: Simudyne’s paradigm for autonomous agentic completion of a complex task.

Agents are quickly becoming one of the most talked-about topics of 2026. Using them for mundane daily tasks is now normal, and it seems everyone is trying to “optimise workflows” and have “the robot” perform some task for them.

We know agents can perform simple tasks to a high standard. The next question is: how complex a task can they handle?

We wanted to develop a process that results in agents autonomously making complex multi-stage decisions. When humans make these sorts of complex decisions, we often call it “instinct.” In reality, instinct is the result of an internal world model built over years of experience. Replicating that in AI is not straightforward.

LLMs show incredible aptitude for writing and debugging code. In addition, because they are trained on the internet, they have a very good idea of what humans find interesting, which makes them strong at critiquing ideas. This means that we can use AI to build and refine models at a rapid rate, compressing years of model evolution into a short time span. We call this process model discovery.

The proposed agentic process is structured as several tasks (shown as the blue bubbles in Figure 1), each composed of multiple specialised agents. We use these agents to build and refine a model of their environment. We refer to this environment as a “world view” (more on that in a second). Then, from this “world view” model, we are able to derive actionable insights.

With this paradigm defined, we could implement it in a concrete environment. We chose financial trading.

To Trade Or Not To Trade?

First, let’s establish what the “world view” is for financial trading. A world view is all of the data required for someone, or something, to perform an action. For example, for a thermostat to perform its action (bringing a room to the correct temperature) it needs to know the current temperature of the room. That is its “world view”.

For trading, we simplified this “world view” to two inputs:

The current price
Company news snippets

To implement this framework, we begin by discovering a model of the “world view”, which in this example is the stock price. Using historical price data, the agent builds and refines a stochastic model that approximates market behaviour.

Because the model is stochastic, we can sample from it to generate hundreds of synthetic price trajectories that preserve its structural characteristics. Analysing these trajectories reduces noise sensitivity and enables the extraction of underlying risk and trend signals.

These signals are then combined with some analysis of recent company news and finally passed to a trading agent, which makes the final decision: Buy, Sell, or Hold. A schematic of this process is shown below.

Figure 2: Simudyne’s Agentic Trading Paradigm

Each of these tasks (shown as the blue bubbles in Figure 2) is performed by a collection of agents. Below we go into more detail about how each task works.

Task 1 — Financial Model Generation

First, we use agents to build a model of the financial market. The agent takes in the critique of the previous models (stored in our agentic memory base) and proposes a stochastic differential equation (SDE) to model the price evolution. This allows us to sample the stochastic component repeatedly and generate multiple plausible price trajectories.

SDEs are foundational in quantitative finance for modelling the evolution of asset prices, interest rates, and volatility, which are inherently uncertain. This is why we selected SDEs as a practical and interpretable model scaffolding.
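As a concrete illustration of the model class, the simplest SDE commonly used for asset prices is geometric Brownian motion (GBM). It is shown here only as a familiar example; this article does not state which SDEs the agents actually proposed.

```latex
dS_t = \mu S_t \, dt + \sigma S_t \, dW_t,
\qquad
S_t = S_0 \exp\!\left( \left( \mu - \tfrac{1}{2}\sigma^2 \right) t + \sigma W_t \right)
```

Here S_t is the price, \mu the drift, \sigma the volatility and W_t a Wiener process. Each fresh sample of W_t yields a different, equally plausible price trajectory, and \mu and \sigma are the kind of parameters that need to be calibrated.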

Task 2 — Calibration and Iterative Critique

These SDEs are written with parameters which need to be calibrated. The agent implements the model in JAX using the Diffrax library, allowing rapid calibration of these parameters.
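To make the idea of calibration concrete, here is a minimal sketch for the GBM special case, where drift and volatility can be estimated in closed form from log returns. This is a simplified stand-in, not the JAX and Diffrax calibration used by the agents; the function name `calibrate_gbm` and the default of 252 trading days per year are assumptions made for illustration.

```python
import math
import statistics


def calibrate_gbm(prices, dt=1.0 / 252.0):
    """Estimate GBM drift (mu) and volatility (sigma) from a price series.

    Under GBM, log returns over a step dt are normally distributed with
    mean (mu - sigma^2 / 2) * dt and variance sigma^2 * dt.
    Hypothetical helper; dt assumes daily prices and 252 trading days a year.
    """
    if len(prices) < 3:
        raise ValueError("need at least three prices to estimate a variance")
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mean_r = statistics.fmean(log_returns)
    var_r = statistics.variance(log_returns)  # unbiased sample variance
    sigma = math.sqrt(var_r / dt)
    mu = mean_r / dt + 0.5 * sigma ** 2  # undo the -sigma^2/2 correction
    return mu, sigma
```

Richer SDEs generally have no closed-form estimator, which is where a differentiable solver and gradient-based fitting become useful.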

After calibration, we evaluate each model’s fit using a range of statistical metrics. The model and its metrics are then stored in memory. A separate agent analyses these outputs and produces a structured critique, which is passed back to the builder agent to design an improved model. The process iterates until convergence, defined as the model fitting the price data to within a predefined threshold.
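The control flow of this build, calibrate and critique loop can be sketched as follows. The three callables stand in for the LLM agents and are hypothetical, as are the `MemoryEntry` and `Memory` types, the single scalar fit score and the `max_iters` safety cap; the sketch only shows how the pieces might fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class MemoryEntry:
    model: str        # description (or source code) of the proposed SDE
    fit_error: float  # goodness-of-fit score; lower is better (assumed)
    critique: str     # critique produced for this model


@dataclass
class Memory:
    entries: List[MemoryEntry] = field(default_factory=list)

    def critiques(self) -> List[str]:
        return [entry.critique for entry in self.entries]


def discover_model(
    propose: Callable[[List[str]], str],          # builder agent (hypothetical)
    calibrate_and_score: Callable[[str], float],  # calibration + metrics
    critique: Callable[[str, float], str],        # critic agent (hypothetical)
    threshold: float,
    max_iters: int = 10,
) -> Tuple[MemoryEntry, Memory]:
    """Iterate until a model fits within `threshold` or `max_iters` is hit."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    memory = Memory()
    best = None
    for _ in range(max_iters):
        model = propose(memory.critiques())  # builder sees all past critiques
        error = calibrate_and_score(model)
        entry = MemoryEntry(model, error, critique(model, error))
        memory.entries.append(entry)
        if best is None or error < best.fit_error:
            best = entry
        if error <= threshold:  # convergence criterion
            break
    return best, memory
```

Returning the best entry rather than the last one is a design choice in this sketch: if the loop stops at `max_iters` without converging, the caller still gets the best-fitting model seen so far.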

Task 3 — Monte Carlo Risk & Trend Extraction

Now that we have a model of our market, we can use it to generate an array of synthetic price paths and obtain a novel, generalised view of the current market state.

These Monte Carlo simulations are then used to generate risk and trend metrics for that particular symbol. This is one of the key insights passed to the trader to make their decision.
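A minimal sketch of this step, assuming a GBM model and three illustrative metrics (expected return, probability of a gain and a simple value-at-risk). The article does not specify which risk and trend metrics were used, so the metric choices, function names and parameters below are assumptions.

```python
import math
import random
import statistics


def simulate_gbm_paths(s0, mu, sigma, horizon_days, n_paths,
                       dt=1.0 / 252.0, seed=0):
    """Sample GBM price paths using the exact log-normal step."""
    rng = random.Random(seed)  # seeded for reproducibility
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    paths = []
    for _ in range(n_paths):
        price, path = s0, [s0]
        for _ in range(horizon_days):
            price *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(price)
        paths.append(path)
    return paths


def risk_and_trend(paths, var_level=0.95):
    """Summarise simulated paths into simple trend and risk metrics."""
    # Return over the whole horizon for each path, sorted worst to best.
    returns = sorted(path[-1] / path[0] - 1.0 for path in paths)
    tail_index = int((1.0 - var_level) * len(returns))
    return {
        "expected_return": statistics.fmean(returns),           # trend
        "prob_up": sum(r > 0 for r in returns) / len(returns),  # trend
        "value_at_risk": -returns[tail_index],  # loss at the tail quantile
    }
```

For example, `risk_and_trend(simulate_gbm_paths(100.0, 0.05, 0.2, 21, 500))` summarises 500 simulated one-month paths into a small dictionary that could be handed to the trader.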

Task 4 — Balanced News Analysis

Two separate agents read news headlines about the company in question: one produces a bullish summary and the other a bearish one. The aim is to give the trader a balanced view of the news cycle, with positive and negative readings weighted equally.

This was the second key insight passed to the trader.

Task 5 — Trading Decision

A trading agent is then given the insight metrics and the news analysis and asked to make a simple decision: Buy, Sell or Hold. It is also asked to explain its reasoning, which aids interpretability. It makes this decision daily.
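One way to picture the trader's interface is as a prompt assembled from the two insights, plus a parser for the reply. The sketch below is hypothetical throughout: the `TraderInput` fields, the prompt wording, the reply format and the fallback to Hold on an unparseable reply are all assumptions, and the call to the LLM itself is deliberately left out.

```python
import re
from dataclasses import dataclass

VALID_ACTIONS = ("BUY", "SELL", "HOLD")


@dataclass
class TraderInput:
    symbol: str
    price: float
    risk_metrics: dict    # numeric metrics from the Monte Carlo step
    bullish_summary: str  # output of the bullish news agent
    bearish_summary: str  # output of the bearish news agent


def build_trader_prompt(x: TraderInput) -> str:
    """Assemble the daily prompt from the insights (illustrative wording)."""
    metrics = "\n".join(
        f"- {name}: {value:.4f}" for name, value in sorted(x.risk_metrics.items())
    )
    return (
        f"Symbol: {x.symbol}\nCurrent price: {x.price:.2f}\n"
        f"Model-derived metrics:\n{metrics}\n"
        f"Bullish case: {x.bullish_summary}\n"
        f"Bearish case: {x.bearish_summary}\n"
        "Reply with 'Decision: BUY', 'Decision: SELL' or 'Decision: HOLD', "
        "followed by 'Reasoning:' and a short explanation."
    )


def parse_decision(reply: str):
    """Extract (action, reasoning); fall back to HOLD if no action is found."""
    pattern = r"Decision:\s*(" + "|".join(VALID_ACTIONS) + r")\b"
    match = re.search(pattern, reply, re.IGNORECASE)
    action = match.group(1).upper() if match else "HOLD"
    reasoning = ""
    if "Reasoning:" in reply:
        reasoning = reply.split("Reasoning:", 1)[1].strip()
    return action, reasoning
```

Constraining the reply to a fixed format keeps the decision machine-readable while preserving the free-text reasoning for later inspection.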

Results

We designed this paper as a benchmarking exercise for current frontier LLMs. We wanted to assess several key skills combined in one task: code writing, model suggestion, novelty, semantic analysis, summarisation and decision making. Our key results were as follows:

Firstly, we found that Anthropic’s Claude 3.7 Sonnet was the best model builder overall, combining diverse suggestions with the ability to actually implement them in code without significant errors.

Secondly, we found that Llama 3.3 gave the most diverse and interesting model suggestions but unfortunately lacked the coding ability to actually build these complicated models so they would run without human input.

Thirdly, we found that the LLMs which built comparatively worse SDE models (quantified using the goodness-of-fit metrics) also performed worse in terms of profit. This suggests that a poorly fitting model leads to poor trading outcomes, so great care should be taken to ensure the best possible model is generated.

We also performed an ablation study in which we assessed the trader agent with and without the SDE-derived insights. On average, the trader agent performed better when it was given these insights. This indicates that the model building and refinement steps contributed to the trader making profitable decisions, and it reinforces the title of our paper: ‘An Agentic Approach to Estimating Market Risk Improves Trading Decisions’.
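To illustrate how such a comparison can be scored, the sketch below replays a sequence of daily decisions against a price series and reports the profit gap between two runs. The all-in, all-out position sizing, the absence of trading costs, the starting cash and the function names are assumptions for illustration; the article does not describe the scoring rules actually used.

```python
def backtest_profit(prices, decisions, initial_cash=1000.0):
    """Replay daily BUY/SELL/HOLD decisions and return the final profit.

    Assumed rules: BUY invests all cash when flat, SELL liquidates the
    whole position, HOLD does nothing, and there are no trading costs.
    """
    if not prices or len(prices) != len(decisions):
        raise ValueError("need one decision per price and at least one day")
    cash, shares = initial_cash, 0.0
    for price, decision in zip(prices, decisions):
        if decision == "BUY" and shares == 0.0:
            shares, cash = cash / price, 0.0
        elif decision == "SELL" and shares > 0.0:
            cash, shares = shares * price, 0.0
    # Mark any open position to the last observed price.
    return cash + shares * prices[-1] - initial_cash


def ablation_gap(prices, decisions_with, decisions_without):
    """Profit with the SDE-derived insights minus profit without them."""
    return (backtest_profit(prices, decisions_with)
            - backtest_profit(prices, decisions_without))
```

A positive gap, averaged over many symbols and periods, is the kind of evidence an ablation like this relies on; a single series is not enough to draw a conclusion.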

The key question: is it profitable? The short answer is… maybe. Although some LLMs were consistently profitable, the test was not conducted at sufficient scale to draw firm conclusions about profitability.

Intelligent Back-Test

A major concern in LLM-based financial research is data leakage. Because we conducted these experiments using historical backtesting rather than live deployment, there is a risk that the models had already seen similar news events and corresponding price movements during training.

For example, events such as the NVIDIA crash following the DeepSeek announcement formed part of our test period and may plausibly exist within model training data.

To mitigate this risk, we constructed a fully synthetic environment. We generated realistic price paths using a development version of our Horizon product and paired them with synthetic news generated by a separate language model.

When evaluated in this synthetic setting, performance declined but did not collapse: the agents remained profitable, with smaller and less consistent returns than in the historical backtest. This suggests that data leakage is likely to have influenced the results of the historical backtest.

Conclusion

The key finding is that an agentic model discovery process can materially improve decision quality. Agents that reason about their environment outperform agents that only react to it. The system demonstrates coordinated multi-agent reasoning across modelling, calibration, critique, and decision stages.

Impact for Simudyne

The architecture described here forms the foundation of Horizon, our platform for autonomous financial intelligence. Horizon orchestrates multi-agent systems (similar to the one proposed) to construct factor models, extract risk signals, and generate structured insights across companies and portfolios.