
Conversation

bergio13
Contributor

@bergio13 bergio13 commented Jul 10, 2025

Fixes #648

Initial draft for the integration of reinforcement learning methods within Agents.jl. It is a working sketch, still to be polished, refined, and improved. The file `rl_interface` contains the code that allows training the agents of an ABM using reinforcement learning. Examples of how to use this interface are provided in `rl_interface_examples`. These examples can be compared with the ones implemented without the interface (see `boltzmann_local` and `wolfsheep`) to see how much the interface simplifies the code the user needs to write.

@codecov-commenter

codecov-commenter commented Jul 10, 2025

Codecov Report

❌ Patch coverage is 20.15707% with 305 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.30%. Comparing base (8b5b456) to head (4eb1985).
⚠️ Report is 195 commits behind head on main.

Files with missing lines                            Patch %   Lines
ext/AgentsRL/src/rl_training_functions.jl            0.00%    132 Missing ⚠️
ext/AgentsRL/src/rl_utils.jl                         0.00%    116 Missing ⚠️
ext/AgentsRL/src/step_reinforcement_learning.jl      0.00%     37 Missing ⚠️
src/reinforcement_learning.jl                       79.31%     18 Missing ⚠️
src/Agents.jl                                       77.77%      2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1170      +/-   ##
==========================================
+ Coverage   70.12%   77.30%   +7.18%     
==========================================
  Files          42       42              
  Lines        2718     2992     +274     
==========================================
+ Hits         1906     2313     +407     
+ Misses        812      679     -133     


@bergio13
Contributor Author

bergio13 commented Jul 17, 2025

I am trying to implement the new model type in the following way:

```julia
struct ReinforcementLearningABM{
    S<:SpaceType,
    A<:AbstractAgent,
    C<:Union{AbstractDict{Int,A},AbstractVector{A}},
    T,G,K,F,P,R<:AbstractRNG} <: AgentBasedModel{S}
    # Standard ABM components
    agents::C
    agent_step::G
    model_step::K
    space::S
    scheduler::F
    properties::P
    rng::R
    agents_types::T
    agents_first::Bool
    maxid::Base.RefValue{Int64}
    time::Base.RefValue{Int64}

    # RL-specific components
    rl_config::Base.RefValue{Any}
    trained_policies::Dict{Type,Any}
    training_history::Dict{Type,Vector{Float64}}
    is_training::Base.RefValue{Bool}
end

# Extend mandatory internal API for AgentBasedModel
containertype(::ReinforcementLearningABM{S,A,C}) where {S,A,C} = C
agenttype(::ReinforcementLearningABM{S,A}) where {S,A} = A
discretimeabm(::ReinforcementLearningABM) = true
```

```julia
function ReinforcementLearningABM(
    A::Type,
    space::S=nothing,
    rl_config=nothing;
    agent_step!::G=dummystep,
    model_step!::K=dummystep,
    container::Type=Dict,
    scheduler::F=Schedulers.Randomly(),
    properties::P=nothing,
    rng::R=Random.default_rng(),
    agents_first::Bool=true,
    warn=true,
    kwargs...
) where {S<:SpaceType,G,K,F,P,R<:AbstractRNG}

    # Initialize agent container using proper construction
    agents = construct_agent_container(container, A)
    agents_types = union_types(A)
    T = typeof(agents_types)
    C = typeof(agents)

    model = ReinforcementLearningABM{S,A,C,T,G,K,F,P,R}(
        agents,
        agent_step!,
        model_step!,
        space,
        scheduler,
        properties,
        rng,
        agents_types,
        agents_first,
        Ref(0),
        Ref(0),
        Ref{Any}(rl_config),
        Dict{Type,Any}(),
        Dict{Type,Vector{Float64}}(),
        Ref(false)
    )

    return model
end
```
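
For context, a hypothetical usage sketch of this constructor follows; the agent type, space dimensions, and seed are placeholders for illustration, not part of the PR:

```julia
using Agents, Random

# Placeholder agent type for illustration only.
@agent struct WealthAgent(GridAgent{2})
    wealth::Int
end

# Construct the RL-enabled model much like a StandardABM. rl_config can be
# left as `nothing` and assigned later, since it is stored in a Ref{Any}.
space = GridSpace((10, 10))
model = ReinforcementLearningABM(WealthAgent, space;
    agent_step! = dummystep,
    rng = Xoshiro(42),
)
```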

I am able to create the model, yet when I try to use the `add_agent` function I get a not-implemented error. Does this mean I have to redefine the `add_agent`, `remove_agent`, etc. functions, or extend the model-accessing API functions?

What I mean is the following: is it better to do something like

```julia
const DictRLABM = ReinforcementLearningABM{S,A,<:AbstractDict{<:Integer,A}} where {S,A}
```

or should I try to extend the existing functions like this:

```julia
const DictABM = Union{
    StandardABM{S,A,<:AbstractDict{<:Integer,A}} where {S,A},
    EventQueueABM{S,A,<:AbstractDict{<:Integer,A}} where {S,A},
    ReinforcementLearningABM{S,A,<:AbstractDict{<:Integer,A}} where {S,A},
}
```

@Datseris
Member

You need to make the model a subtype of `AgentBasedModel`.

@bergio13
Contributor Author

bergio13 commented Jul 17, 2025

I thought I was already doing that here:

```julia
struct ReinforcementLearningABM{S<:SpaceType, A<:AbstractAgent, C<:Union{AbstractDict{Int,A},AbstractVector{A}}, T,G,K,F,P,R<:AbstractRNG} <: AgentBasedModel{S}
```

So this is not sufficient?

@Datseris
Member

Ah, sorry, I didn't read this correctly. I am a bit short on time right now, but I can spend some time to help here next weekend, the 26th-27th of July. @Tortar, perhaps you can give some advice before then?

@bergio13
Contributor Author

No worries, I think I figured it out. I went with this solution:

```julia
const DictABM = Union{
    StandardABM{S,A,<:AbstractDict{<:Integer,A}} where {S,A},
    EventQueueABM{S,A,<:AbstractDict{<:Integer,A}} where {S,A},
    ReinforcementLearningABM{S,A,<:AbstractDict{<:Integer,A}} where {S,A},
}
```

But we can discuss this during the next meeting, and if you don't like it I can change it.

@Datseris
Member

Yes, this DictABM design is not optimal, because it is extended by type while it should be extended by functions. Instead of using multiple dispatch to dispatch on models that are Dict-based, we should have a central function with a bunch of if statements that calls `agent_container(model)`, examines its type, and then acts accordingly. Let's go over this in our next meeting. I am up for weekly meetings on Thursdays.
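
The suggested design could look roughly like this (a sketch only: `nextid` is just one example of such a function, `agent_container` is the internal accessor mentioned above, and the branch bodies are illustrative rather than the actual internals):

```julia
# One central function that branches on the container type, instead of
# dispatching on a DictABM/VectorABM union of model types.
function nextid(model::AgentBasedModel)
    container = agent_container(model)
    if container isa AbstractDict
        # Dict-based models track the largest id ever used.
        return getfield(model, :maxid)[] + 1
    elseif container isa AbstractVector
        # Vector-based models use contiguous ids.
        return length(container) + 1
    else
        error("unsupported agent container type: $(typeof(container))")
    end
end
```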

@bergio13
Contributor Author

We have our next meeting scheduled for 24/07 at 11. By the way, is there any chance we could move it later in the day? I might have trouble connecting at that time. Furthermore, there are also some design choices regarding the implementation of the `POMDPs.initialstate()` function that this refactoring from `GeneralRLEnvironment` to `ReinforcementLearningABM` needs to take into account, in my opinion.

@Datseris
Member

> at 11

Which timezone? We can move; give me a time window (and zone) of preference and I'll put a date there, assuming Adriano is also available.

@Tortar
Member

Tortar commented Jul 18, 2025

> Yes, this DictABM design is not optimal, because it is extended by type while it should be extended by functions. Instead of using multiple dispatch to dispatch on models that are Dict-based, we should have a central function with a bunch of if statements that calls `agent_container(model)`, examines its type, and then acts accordingly.

Actually, the best way to solve this is to make `DictABM = AgentBasedModel{S,A,<:AbstractDict{<:Integer,A}} where {S,A}`. For now `AgentBasedModel` doesn't have all those type parameters, but they can be added.
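
A sketch of what that would enable (assuming `AgentBasedModel` gains the agent and container type parameters as proposed; `hasid` is a hypothetical example method, not part of the API):

```julia
# A single alias then covers every Dict-based model at once, whether it is
# a StandardABM, an EventQueueABM, or a ReinforcementLearningABM:
const DictABM = AgentBasedModel{S,A,<:AbstractDict{<:Integer,A}} where {S,A}

# Methods written against the alias apply to all of them, e.g.:
hasid(model::DictABM, id::Integer) = haskey(agent_container(model), id)
```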

@Tortar
Member

Tortar commented Jul 18, 2025

Any hour in the afternoon is okay for me, by the way, so I'll let you decide the hour of our meeting.

@bergio13
Contributor Author

We could make it after lunch; would 2.30pm (CEST) work?

@Datseris
Member

Yes, I'll update the invite.

bergio13 added 3 commits July 22, 2025 14:50
…erface for RL

- Reorganized old interface examples
- Added ReinforcementLearningABM and RLEnvironmentWrapper to enable compatibility with POMDPs-based RL algorithms provided by Crux.
- Implemented necessary POMDPs functions: actions, observations, observation, initialstate, initialobs, gen, isterminal, discount, and state_space.
- Added step_rl! and rl_agent_step! functions for RL agent behavior.
- Added examples to show how the new model type works
@Datseris
Member

Hello, I am sorry but I am on sick leave for two weeks and I cannot meet next week. I am available again on the 11th of August and would be happy to meet that week!

@Datseris
Member

Datseris commented Aug 8, 2025

Hello, I am returning from my sick leave next week. Would you like to have a videocall on Thursday?

@bergio13
Contributor Author

bergio13 commented Aug 11, 2025

> Hello, I am returning from my sick leave next week. Would you like to have a videocall on Thursday?

At what time would you like to meet?

@Datseris
Member

I can do 2pm UK time if that works.

@bergio13
Contributor Author

> I can do 2pm UK time if that works.

That works for me.

@Datseris Datseris marked this pull request as ready for review August 22, 2025 13:35
```julia
target_agent = model[agent_id]
agent_pos = target_agent.pos
width, height = getfield(model, :space).extent
observation_radius = model.rl_config[][:observation_radius]
```
Member

This could be a model property instead of being passed in `rl_config`.
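
A sketch of the suggested change (`MyAgent` and the radius value are placeholders for illustration):

```julia
# Instead of reading the radius out of rl_config:
#   observation_radius = model.rl_config[][:observation_radius]
# store it as a regular model property at construction time...
properties = Dict{Symbol,Any}(:observation_radius => 3)
model = ReinforcementLearningABM(MyAgent, space; properties)

# ...and access it like any other property:
observation_radius = model.observation_radius
```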

```julia
# decreases in the Gini coefficient. This creates an incentive for agents to learn
# movement patterns that promote wealth redistribution.

function boltzmann_calculate_reward(env, agent, action, initial_model, final_model)
```
Member

Probably `env` can be spared, and something like `boltzmann_calculate_reward(agent, action, previous_model, current_model)` could be better.
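
Given the comment in the snippet above (reward tied to decreases in the Gini coefficient), the leaner signature could look like this; a sketch only, with the `gini_coefficient` property name taken from the `properties` dict shown elsewhere in this review:

```julia
# Sketch of the suggested signature: reward the decrease in the Gini
# coefficient between the model state before and after the action.
function boltzmann_calculate_reward(agent, action, previous_model, current_model)
    return previous_model.gini_coefficient - current_model.gini_coefficient
end
```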

Comment on lines +219 to +222
```julia
properties = Dict{Symbol,Any}(
    :gini_coefficient => 0.0,
    :step_count => 0
)
```
Member

These are probably not needed.

@Tortar
Copy link
Member

Tortar commented Aug 22, 2025

I added review comments about what we discussed in the meeting, @bergio13 @Datseris, as well as some other example-code simplifications.

@Tortar
Member

Tortar commented Aug 23, 2025

I have taken care of most of these review items myself. I've also generated these videos showing how random agents differ from RL agents in our example:

boltzmann.mp4
rl_boltzmann.mp4

RL agents are clearly smarter :-)

Great work Giorgio!

Member

@Tortar Tortar left a comment

I verified locally that the documentation works; only the videos are missing at the moment, because they need to be included with HTML code after being produced. Apart from that, I think the PR is in pretty good shape, so I will approve it. Before merging, though, it would probably be useful if @Datseris had a final look at it (and the few open review comments were tackled).

Co-authored-by: George Datseris <datseris.george@gmail.com>
@Datseris
Member

Thanks a lot for your work, both @bergio13 and @Tortar. I agree that this looks great, and it is very close to finished! However, I would really like to have a final in-depth look before that. The only problem is that I am currently under a lot of pressure from my main job and therefore lack time. I will try to work on this over the coming weekend, if that's okay. Here and there during the evenings I will be adding comments to the review (you won't see them until I submit).

The PR is approved and as far as I can tell @bergio13 had a great GSOC project!

@bergio13
Contributor Author

> I have taken care of most of these review items myself. I've also generated these videos showing how random agents differ from RL agents in our example.
>
> RL agents are clearly smarter :-)
>
> Great work Giorgio!

> Thanks a lot for your work, both @bergio13 and @Tortar. I agree that this looks great, and it is very close to finished! [...] The PR is approved and as far as I can tell @bergio13 had a great GSOC project!

Thanks to you and to @Tortar for your help throughout this GSoC!


Successfully merging this pull request may close these issues.

Multi-Agent RL