Environment
The environment is implemented as a Gymnasium environment. For an in-depth tutorial on how to create environment instances, see here.
We chose a task-based manager architecture, in which managers calculate rewards, observations and terminations. Under the hood, the main task of the environment is to send actions to the TMInteraction-Process and gather raw observations (images and game states). Each manager, i.e. the Reward-, Observation- and Termination-Manager, has access to these raw observations.
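As a rough illustration of this architecture, the sketch below gives three managers the same raw observation and lets each derive its own output. All names here (RawObservation, the compute method, the speed and finished fields) are hypothetical and chosen for the example; they are not taken from the project code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RawObservation:
    """Hypothetical container for what the environment gathers in one step."""
    image: List[List[int]]  # stand-in for a captured frame
    game_state: Dict[str, Any] = field(default_factory=dict)


class RewardManager:
    def compute(self, raw: RawObservation) -> float:
        # Illustrative reward only: the current speed.
        return float(raw.game_state.get("speed", 0.0))


class ObservationManager:
    def compute(self, raw: RawObservation) -> Dict[str, Any]:
        # Illustrative observation: the frame plus one scalar from the game state.
        return {"image": raw.image, "speed": raw.game_state.get("speed", 0.0)}


class TerminationManager:
    def compute(self, raw: RawObservation) -> bool:
        # Illustrative termination: the game state reports a finished run.
        return bool(raw.game_state.get("finished", False))


# Every manager receives the same raw observation.
raw = RawObservation(image=[[0, 0], [0, 0]],
                     game_state={"speed": 12.5, "finished": False})
reward = RewardManager().compute(raw)
observation = ObservationManager().compute(raw)
terminated = TerminationManager().compute(raw)
```

Keeping the three calculations in separate objects means a reward or observation variant can be swapped without touching the code that talks to the game.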
This project implements several versions of the environment. The main one discussed in this article is trackmania_env.envs.single_agent_env2.TMNF_Single_Agent_Env. The other environments build on it. They include:
- envs.sec_env.CrashProofEnvironment is just a wrapper for the original environment. We sometimes had problems with the game not resetting after the agent reached the finish line, which led to crashes during training. This wrapper environment catches these crashes and restarts the whole pipeline. We mainly use this environment.
- vectorized.VectorizedTMEnvironment is a vectorized version of the environment. It stores the environments and exposes a batched interface.
- vectorized.SB3Vectorized is a wrapper environment for VectorizedTMEnvironment that is compatible with stable-baselines3. It is used in scripts/sb3_train_vectorized.py.
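The idea behind a crash-catching wrapper can be sketched as follows. This is not the code of CrashProofEnvironment; the class CrashRecoveringEnv, the factory argument make_env and the toy FlakyEnv are assumptions made for the example. Only the five-element return value of step follows the Gymnasium convention.

```python
from typing import Any, Callable


class CrashRecoveringEnv:
    """Hypothetical wrapper that rebuilds the wrapped environment when it raises."""

    def __init__(self, make_env: Callable[[], Any]):
        self._make_env = make_env
        self.env = make_env()
        self.restarts = 0

    def _restart(self):
        # Throw away the broken instance and start from a fresh one.
        self.restarts += 1
        self.env = self._make_env()
        return self.env.reset()

    def reset(self):
        try:
            return self.env.reset()
        except Exception:
            return self._restart()

    def step(self, action):
        try:
            return self.env.step(action)
        except Exception:
            obs, info = self._restart()
            # Report the interrupted episode as truncated so training can go on.
            return obs, 0.0, False, True, dict(info, restarted=True)


class FlakyEnv:
    """Toy environment that crashes once, on the second step of an episode."""
    crashes_left = 1  # class-level so the count survives a rebuild

    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0, {}

    def step(self, action):
        self.steps += 1
        if self.steps == 2 and FlakyEnv.crashes_left > 0:
            FlakyEnv.crashes_left -= 1
            raise RuntimeError("game did not reset")
        return self.steps, 1.0, False, False, {}


env = CrashRecoveringEnv(FlakyEnv)
env.reset()
first = env.step(0)
second = env.step(0)  # raises inside FlakyEnv; the wrapper restarts
third = env.step(0)   # served by the rebuilt environment
```

Passing a factory instead of an instance is what makes a full restart possible: the wrapper can always build a clean environment without knowing how it was configured.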
Order of Operations
The step method of the environment performs the following operations in order:
- Send step-command to the Process-Wrapper
- Advance the reference line
- Calculate Terminations
- Calculate Observations
- Calculate Rewards
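The order listed above can be sketched as a step method that records each operation as it runs. The helper names and the contents of the raw game state are hypothetical; only the sequence of the five operations and the Gymnasium-style return tuple come from this page and from Gymnasium itself.

```python
from typing import Any, Dict, List


class StepOrderSketch:
    """Toy environment that logs the order of operations of one step."""

    def __init__(self):
        self.calls: List[str] = []

    # The helpers below are hypothetical stand-ins for the real components.
    def _send_step_command(self, action: int) -> Dict[str, Any]:
        self.calls.append("send_step_command")
        return {"speed": 10.0 * action, "finished": False}

    def _advance_reference_line(self, raw: Dict[str, Any]) -> None:
        self.calls.append("advance_reference_line")

    def _terminations(self, raw: Dict[str, Any]) -> bool:
        self.calls.append("terminations")
        return raw["finished"]

    def _observations(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append("observations")
        return {"speed": raw["speed"]}

    def _rewards(self, raw: Dict[str, Any]) -> float:
        self.calls.append("rewards")
        return raw["speed"]

    def step(self, action: int):
        raw = self._send_step_command(action)      # 1. step the game
        self._advance_reference_line(raw)          # 2. advance the reference line
        terminated = self._terminations(raw)       # 3. terminations
        observation = self._observations(raw)      # 4. observations
        reward = self._rewards(raw)                # 5. rewards
        # Gymnasium order: observation, reward, terminated, truncated, info
        return observation, reward, terminated, False, {}


env = StepOrderSketch()
result = env.step(1)
```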
Vectorized Environment
We also implemented a vectorized version of the environment, which allows you to collect rollouts from multiple game instances at the same time and thereby reduces training time. Training with a vectorized environment and stable-baselines3 is implemented in
python scripts/sb3_train_vectorized.py
A script for testing and manual inputs to the environment is implemented in
python scripts/tests/step_vectorized.py
Spaces
The vectorized environment expects actions of shape (N,) and returns rewards and terminations in the same shape, where N is the number of environments. The info returned by the vectorized environment is a list with N entries, one for each environment. The shapes of vectorized observations are documented with the observation manager.
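The batched contract described above can be illustrated with a small stand-in. ToyEnv and step_batch are hypothetical and not part of the project; plain lists of length N stand in for arrays of shape (N,) so that the sketch needs no extra packages.

```python
from typing import Any, Dict, List, Sequence, Tuple


class ToyEnv:
    """Hypothetical single environment with a Gymnasium-style step."""

    def __init__(self, index: int):
        self.index = index

    def step(self, action: int):
        # The toy episode ends when action 0 is sent.
        return [float(action)], float(action), action == 0, False, {"env_index": self.index}


def step_batch(envs: Sequence[ToyEnv], actions: Sequence[int]
               ) -> Tuple[List[Any], List[float], List[bool], List[Dict[str, Any]]]:
    """Step every environment once and collect the results per field."""
    if len(actions) != len(envs):
        raise ValueError("expected exactly one action per environment")
    results = [env.step(a) for env, a in zip(envs, actions)]
    observations = [r[0] for r in results]
    rewards = [r[1] for r in results]       # length N
    terminations = [r[2] for r in results]  # length N
    infos = [r[4] for r in results]         # list with N entries
    return observations, rewards, terminations, infos


N = 3
envs = [ToyEnv(i) for i in range(N)]
observations, rewards, terminations, infos = step_batch(envs, [2, 0, 1])
```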
Resources
For each environment, a new game instance is spawned. You therefore have to test how many environments your system can handle. On a system with 32 GB of RAM and a 5070 Ti (16 GB VRAM), 5 environments seem to work fine.
Multiple tracks
The vectorized environment also supports stepping on multiple tracks, i.e. you can pass 10 tracks but only spawn 5 environments, in order to diversify your domain.
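How tracks are distributed over the environments is not described on this page. The sketch below assumes a simple round-robin scheme, in which an environment picks up the next track from the pool whenever its episode ends. The track names and the next_track helper are hypothetical.

```python
from itertools import cycle

# Assumption: 10 tracks are passed, but only 5 environments are spawned.
tracks = [f"track_{i:02d}" for i in range(10)]
num_envs = 5

# Round-robin over the track pool (an assumed policy, not the project's).
track_cycle = cycle(tracks)


def next_track() -> str:
    return next(track_cycle)


# Initial assignment: one track per environment.
assignment = [next_track() for _ in range(num_envs)]

# When environment 2 finishes its episode, it moves on to the next track.
assignment[2] = next_track()
```

With more tracks than environments, every environment eventually sees tracks it did not start on, which is the diversification effect mentioned above.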
Continuous Actions
We have also begun implementing a continuous action space. As of now, it can be used via scripts/tests/step_continuous.py.
Note
This is still in early development. Once we confirm that it works as intended, its workings will be described here. Until then, the code is your friend.
2D Continuous
In this mode, the action space is two-dimensional, i.e. \(a \in [-1,1]^2\). The first index is the steering, the second one the acceleration. For steering, values less than 0 steer left and values >= 0 steer right. For acceleration, values greater than 0 are gas and values smaller than 0 are braking.
The specifics of the continuization will be documented after more testing.
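The sign convention of the 2D mode can be written down as a small decoding function. This is only a sketch of the convention stated in this section: decode_2d_action and the key names are hypothetical, and how the magnitude of each component is used (the continuization) is deliberately not modeled.

```python
from typing import Dict, Sequence


def decode_2d_action(action: Sequence[float]) -> Dict[str, bool]:
    """Map a = (steering, acceleration) in [-1, 1]^2 to directions by sign."""
    if len(action) != 2:
        raise ValueError("expected a two-dimensional action")
    steer, accel = float(action[0]), float(action[1])
    if not (-1.0 <= steer <= 1.0 and -1.0 <= accel <= 1.0):
        raise ValueError("action components must lie in [-1, 1]")
    return {
        "left": steer < 0,    # steering below zero is left
        "right": steer >= 0,  # zero and above is right
        "gas": accel > 0,     # positive acceleration is gas
        "brake": accel < 0,   # negative acceleration is braking
    }


left_and_gas = decode_2d_action([-0.5, 0.3])
neutral = decode_2d_action([0.0, 0.0])
```

Note that with this convention a steering value of exactly 0 counts as right, while an acceleration of exactly 0 triggers neither gas nor brake.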
4D Continuous
Not implemented yet.