Project Page

GitHub

Exploring the boundaries of deep reinforcement learning in simulated environments: a study on financial trading and lot-sizing.


Given today's rapidly changing and complex environment, crafting robust methodologies for decision-making is essential. In algorithmic decision-making, the Reinforcement Learning (RL) paradigm has progressively asserted itself as a preeminent methodology, proving especially proficient in environments that are both dynamic and non-deterministic. However, it is essential to analyze the suitability of RL for each problem application. In this thesis, we use a unified mathematical structure based on stochastic control that helps us identify the main characteristics of a problem, allowing the discovery of more effective methods for better convergence in the solution space. With this mathematical framework, we develop and describe the two main contributions of this thesis. First, we propose a classification method named Residual Network Long Short-Term Memory Actor (RSLSTM-A) to solve the Active Single-Asset Trading Problem (ASATP). Our proposed supervised method achieved results superior to state-of-the-art RL methods. Since the ASATP is a type of problem where the transition probability matrix does not depend on the agent's actions, it is reasonable to expect Supervised Learning to achieve better results than RL. Moreover, since this problem instance poses no exploration-exploitation dilemma, even contextual bandit methods offer no advantage, and Supervised Learning establishes itself as the best approach. In the second part of this thesis, we validate the potential of RL techniques on another problem, the Stochastic Discrete Lot-Sizing Problem (SDLSP), by proposing a multi-agent approach that outperforms the leading RL techniques. Furthermore, we apply post-decision states to build an Approximate Dynamic Programming method that outperforms baseline and Deep Reinforcement Learning methods in various SDLSP settings.
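To make the action-independence argument concrete, here is a minimal sketch in standard MDP notation (the symbols are illustrative, not the thesis's exact formulation):

```latex
% In the ASATP, the agent's action does not affect the market dynamics:
P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_t) \qquad \forall a_t \in \mathcal{A}.
% Under this condition the expected return decouples across time steps, so
% maximizing it reduces to the per-step supervised problem
a_t^{*} = \arg\max_{a \in \mathcal{A}} \; \mathbb{E}\!\left[\, r(s_t, a) \mid s_t \,\right].
```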

https://www.teses.usp.br/teses/disponiveis/3/3142/tde-26082024-093343/en.php

GitHub

Reinforcement learning approaches for the stochastic discrete lot-sizing problem on parallel machines


This paper addresses the stochastic discrete lot-sizing problem on parallel machines, a problem that is computationally challenging even for relatively small instances. We propose two heuristics to deal with it by leveraging reinforcement learning: one based on approximate value iteration around post-decision state variables, and one based on multi-agent reinforcement learning. We compare these two approaches with other reinforcement learning methods and with more classical solution techniques, showing their effectiveness on realistically sized instances.


The problem arises when the considered time step does not allow different items to be produced simultaneously on the same machine. This characteristic leads to a difficult integer programming problem that must be solved heuristically. We propose two heuristics: one based on approximate dynamic programming, which leverages a branch-and-bound technique to solve the nonlinear optimization problem of selecting the action (ADP), and one based on multi-agent reinforcement learning applied to an initial action generated by a simple decision rule (LSCMA).
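As a rough illustration of the post-decision-state idea behind the ADP heuristic (a toy single-machine sketch with assumed numbers, not the paper's implementation), the transition is split into a deterministic production step and a stochastic demand step, and the value function is learned over the post-decision inventory:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-machine instance (illustrative numbers, not the paper's).
MAX_INV = 10                    # inventory capacity
ACTIONS = [0, 1, 2, 3]          # producible quantities per period
HOLD, LOST = 1.0, 5.0           # holding and lost-sales unit costs
GAMMA, ALPHA = 0.95, 0.05       # discount factor and learning rate

def post_state(inv, prod):
    """Deterministic effect of the decision, before demand realizes."""
    return min(inv + prod, MAX_INV)

def step_cost(post_inv, demand):
    """Holding cost on leftover stock plus penalty on unmet demand."""
    return HOLD * max(post_inv - demand, 0) + LOST * max(demand - post_inv, 0)

V = np.zeros(MAX_INV + 1)       # value indexed by the post-decision inventory

for _ in range(50_000):
    inv = int(rng.integers(MAX_INV + 1))
    # Greedy action w.r.t. the post-decision value function.
    a = min(ACTIONS, key=lambda p: V[post_state(inv, p)])
    sp = post_state(inv, a)
    demand = int(rng.poisson(2))            # stochastic part of the transition
    next_inv = max(sp - demand, 0)
    # Sampled Bellman target, smoothed into the current estimate.
    target = step_cost(sp, demand) + GAMMA * min(V[post_state(next_inv, p)] for p in ACTIONS)
    V[sp] = (1 - ALPHA) * V[sp] + ALPHA * target
```

In the actual parallel-machine setting the action-selection step becomes a nonlinear integer problem, which is why the paper resorts to branch and bound rather than the enumeration used above.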

Average total, holding, lost-sales, and setup costs as a percentage of the perfect-information agent's costs.

Policy visualization of the approximate value iteration method.

GitHub

Outperforming algorithmic trading reinforcement learning systems: a supervised approach in the cryptocurrency market


The interdisciplinary relationship between machine learning and financial markets has long been a theme of great interest in both research communities. Recently, reinforcement learning and deep learning methods have gained prominence in the active asset trading task, aiming to outperform classical benchmarks such as the Buy and Hold strategy. This paper explores both supervised learning and reinforcement learning approaches applied to active asset trading, drawing attention to the benefits of each. This work extends the comparison between the supervised and reinforcement learning approaches by using state-of-the-art strategies from both. We propose incorporating the ResNet architecture, one of the best-performing deep learning approaches for time-series classification, into the ResNet-LSTM actor (RSLSTM-A). We compare RSLSTM-A against classical and recent reinforcement learning techniques, such as recurrent reinforcement learning, deep Q-network, and advantage actor-critic. To run our tests, we simulate a currency exchange market environment with the price time series of the Bitcoin, Litecoin, Ethereum, Monero, and Dash cryptocurrencies. We show that our approach achieves better overall performance, confirming that supervised learning can outperform reinforcement learning for this trading task. We also present a graphical representation of the features extracted by the ResNet to identify which type of characteristics each residual block generates.
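A minimal PyTorch sketch of the general ResNet-plus-LSTM idea (layer sizes, names, and the three-action head are illustrative assumptions; see the paper and repository for the actual RSLSTM-A):

```python
import torch
import torch.nn as nn

class ResBlock1D(nn.Module):
    """1-D convolutional residual block, as in ResNet-style TS classifiers."""
    def __init__(self, c_in, c_out, k=7):
        super().__init__()
        self.conv1 = nn.Conv1d(c_in, c_out, k, padding=k // 2)
        self.conv2 = nn.Conv1d(c_out, c_out, k, padding=k // 2)
        self.bn1, self.bn2 = nn.BatchNorm1d(c_out), nn.BatchNorm1d(c_out)
        self.skip = nn.Conv1d(c_in, c_out, 1) if c_in != c_out else nn.Identity()
        self.act = nn.ReLU()

    def forward(self, x):
        h = self.act(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return self.act(h + self.skip(x))

class ResNetLSTMActor(nn.Module):
    """ResNet feature extractor followed by an LSTM and a softmax head
    over trading actions (e.g. long / neutral / short)."""
    def __init__(self, n_features, n_actions=3, hidden=64):
        super().__init__()
        self.resnet = nn.Sequential(ResBlock1D(n_features, 64), ResBlock1D(64, hidden))
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, x):             # x: (batch, time, n_features)
        h = self.resnet(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.lstm(h)
        return self.head(out[:, -1])  # action logits at the last time step

# Example: a batch of 32 windows, 50 time steps, 5 price-derived features.
logits = ResNetLSTMActor(n_features=5)(torch.randn(32, 50, 5))
```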

Time-series comparison on the XMR (Monero) cryptocurrency between our method and other RL methods.

Results comparison under zero transaction costs (the paper reports all results, including those with transaction costs).

GitHub

Intelligent Trading Systems: A Sentiment-Aware Reinforcement Learning Approach

The feasibility of making profitable trades on a single asset on stock exchanges based on pattern identification has long attracted researchers. Reinforcement Learning (RL) and Natural Language Processing have gained prominence in these single-asset trading tasks, but only a few works have explored their combination. Moreover, some issues remain unaddressed, such as extracting market sentiment momentum through the explicit capture of sentiment features that reflect the market condition over time, and assessing the consistency and stability of RL results across different situations. Filling this gap, we propose the Sentiment-Aware RL (SentARL) intelligent trading system, which improves profit stability by leveraging market mood through an adaptive amount of past sentiment features drawn from textual news. We evaluated SentARL across twenty assets, two transaction costs, and five different periods and initializations to show its consistent effectiveness against baselines. This thorough assessment also allowed us to identify the boundary, in terms of news coverage and the correlation between market sentiment and the price time series, above which SentARL's effectiveness is outstanding.
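A rough sketch of the state-construction idea (window sizes and feature names below are assumptions for illustration, not SentARL's exact design): an adaptive window of past news-sentiment scores is appended to the usual price-based features before the state is fed to the agent.

```python
import numpy as np

def build_state(prices, sentiments, t, k_price=10, k_sent=5):
    """State at time t: the last k_price log-returns plus the last k_sent
    news-sentiment scores (assumed zero when no news was published)."""
    returns = np.diff(np.log(prices[t - k_price : t + 1]))
    sent_window = sentiments[t - k_sent + 1 : t + 1]
    return np.concatenate([returns, sent_window])

# Example with synthetic data: 100 days of prices, daily sentiment in [-1, 1].
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 100)))
sentiments = rng.uniform(-1, 1, 100)
state = build_state(prices, sentiments, t=50)   # input to the RL policy
```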

The Reinforcement Learning (RL) architecture employed, showing how sentiment signals are incorporated as features for the RL agent.

Comparison of the vanilla RL method against the sentiment-aware one.

GitHub

Perishable Discrete Choice Methods

Optimal inventory control strategies for perishable items are of the utmost importance to reduce the large share of food products that expire before consumption and to achieve responsible food stocking policies. Our study allows for a multi-item setting with substitution between similar goods, deterministic deterioration, delivery lead times, and seasonality. Specifically, we model demand with a linear discrete choice model to represent vertical differentiation between products. The verticality assumption is further applied in a novel way within product categories: the same product typology is vertically decomposed according to the age of each stock-keeping unit, in a quality-based manner. We compare two policies for selecting the daily order size of each product. On the one hand, we apply one of the most classical approaches in inventory management, the Order-Up-To policy, modified to deal with seasonality. On the other hand, we employ a state-of-the-art actor-critic technique, Soft Actor-Critic (SAC). Although similar in performance, the two policies show diverse replenishment patterns, handling products differently.
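A minimal sketch of the seasonality-adjusted Order-Up-To idea (the multiplicative scaling rule and the numbers below are assumptions for illustration; the paper's exact modification may differ):

```python
import numpy as np

def order_up_to(inventory, pipeline, base_level, season_factor):
    """Order enough to raise the inventory position (on-hand + in-transit)
    to a target level rescaled by the day's seasonality factor."""
    target = base_level * season_factor
    position = inventory + pipeline
    return max(target - position, 0.0)

# Example: weekly seasonality, with higher targets before the weekend peak.
season = np.array([0.9, 0.9, 1.0, 1.1, 1.3, 1.4, 1.0])  # Mon..Sun multipliers
for day in range(7):
    q = order_up_to(inventory=12.0, pipeline=3.0, base_level=20.0,
                    season_factor=season[day])
    print(f"day {day}: order {q:.1f} units")
```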

Environment simulation logic. This figure shows how we simulate the perishable inventories that are then optimized using metaheuristics.

Results comparison between the surrogate optimization and the SAC (Soft Actor-Critic) RL approach.

Comparative study of Bitcoin price prediction using WaveNets, recurrent neural networks and other machine learning methods

Forecasting time series data is an important subject in economics, business, and finance. Traditionally, several techniques, such as the univariate Autoregressive (AR) model, the univariate Moving Average (MA) model, Simple Exponential Smoothing (SES), and most notably the Autoregressive Integrated Moving Average (ARIMA) model with its many variations, have been used to produce effective forecasts. However, with the recent advances in computational capacity and, more importantly, the development of more sophisticated machine learning algorithms and approaches such as deep learning, new methods have emerged for forecasting time series data. This article compares methodologies such as ARIMA, Random Forest (RF), Support Vector Machine (SVM), Long Short-Term Memory (LSTM), and WaveNets for estimating the future price of Bitcoin.
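A minimal sketch of the rolling-window evaluation used to compare such forecasters (the ARIMA order and window length are illustrative assumptions): the model is refit on each trailing window and scored on the next step only.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def rolling_forecast(series, window=200, order=(1, 1, 1)):
    """Refit on each trailing window and predict one step ahead,
    mimicking how a trader would forecast the next day's price."""
    preds, actuals = [], []
    for t in range(window, len(series)):
        model = ARIMA(series[t - window : t], order=order).fit()
        preds.append(model.forecast(steps=1)[0])   # day-t prediction
        actuals.append(series[t])                  # realized day-t price
    preds, actuals = np.array(preds), np.array(actuals)
    rmse = np.sqrt(np.mean((preds - actuals) ** 2))
    return preds, rmse
```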


Also check my Medium post discussing the comparison of the forecasting models.

Bitcoin price time series, with the rolling-window evaluation method explained.

Metric comparison between the methods after a hyperparameter search: for the one-day prediction horizon, ARIMA and SVR had the best results in the univariate case.