A reinforcement learning agent that evolves from random movement to a precision shooter using only raw visual input.
| Member | Role & Contributions |
|---|---|
| Gamze Çetin (2102967) | Algorithm & Architecture: Implemented the PPO algorithm and the CNN Feature Extractor backbone. |
| Fazıl Eren Çiftdemir (2103573) | Environment Dynamics: Designed the custom Reward Function and configured VizDoom scenario parameters. |
| Melis Bahar Kurşun (2101834) | Training & Analysis: Managed the training pipeline, hyperparameter optimization, and Tensorboard visualization. |
evolution_side_by_side.gif: The agent's journey from chaos to mastery.
To encourage survival and effective combat, we implemented a custom Shaped Reward Function.
| Component | Weight | Purpose |
|---|---|---|
| Living Bonus | +0.05 | Incentivizes maximizing episode duration. |
| Health Delta | +0.10 | Penalizes damage heavily; encourages dodging. |
| Ammo Penalty | -0.03 | Discourages "spray and pray"; forces precision. |
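Below is a minimal sketch of how this shaping could be applied on top of the scenario's base reward. The function and variable names (`shape_reward`, `prev_health`, etc.) are illustrative rather than the project's actual identifiers; only the weights come from the table above.

```python
LIVING_BONUS = 0.05   # granted every step the agent stays alive
HEALTH_WEIGHT = 0.10  # scales the change in health between steps
AMMO_PENALTY = 0.03   # charged per round of ammo spent

def shape_reward(base_reward, prev_health, curr_health, prev_ammo, curr_ammo):
    """Combine the scenario's base reward with the shaping terms above."""
    reward = base_reward + LIVING_BONUS                      # reward staying alive
    reward += HEALTH_WEIGHT * (curr_health - prev_health)    # losing health is costly
    reward -= AMMO_PENALTY * max(0, prev_ammo - curr_ammo)   # wasted shots are penalized
    return reward
```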
We utilized Proximal Policy Optimization (PPO) due to its proven stability in continuous and discrete control tasks from visual inputs.
- Input: `(100, 160, 1)` Grayscale Tensor (Raw Pixels)
- Backbone: CNN Feature Extractor (Conv2d Layers + ReLU)
- Action Space: `Discrete(3)` (Turn Left, Turn Right, Attack)
- Hyperparameters (wired into the training sketch after this list):
  - Learning Rate: `1e-4` (tuned for stability)
  - Batch Size: `256`
  - Gamma: `0.99`
  - Steps: `2M`
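A rough sketch of how these settings plug into Stable-Baselines3 is shown below. The `DoomLikeEnv` class is only a stand-in that exposes the same observation and action spaces; in the real project it is replaced by the VizDoom scenario wrapper, and the log directory and save path are assumptions.

```python
import numpy as np
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

class DoomLikeEnv(gym.Env):
    """Stand-in with the project's spaces; the real env wraps a VizDoom scenario."""
    def __init__(self):
        super().__init__()
        self.observation_space = gym.spaces.Box(0, 255, (100, 160, 1), np.uint8)
        self.action_space = gym.spaces.Discrete(3)  # turn left, turn right, attack

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, False, {}

env = VecFrameStack(DummyVecEnv([DoomLikeEnv]), n_stack=4)  # 4-frame stacking

model = PPO(
    "CnnPolicy",               # CNN feature extractor (Conv2d + ReLU)
    env,
    learning_rate=1e-4,        # tuned for stability
    batch_size=256,
    gamma=0.99,
    tensorboard_log="./runs",  # log directory name is an assumption
    verbose=1,
)
model.learn(total_timesteps=2_000_000)  # 2M steps
model.save("ppo_doom")                  # save path is an assumption
```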
The following graphs demonstrate the agent's learning progress over 2 Million Timesteps.
Analysis: This graph illustrates the agent's overall performance. The blue line shows the raw reward per episode, which is highly volatile due to the random spawning of enemies. The orange line (Moving Average) reveals the true trend:
- 0 - 1M Steps: The agent is in the "Exploration" phase, struggling to find a winning strategy. Rewards are low.
- 1M - 2M Steps: A sharp increase indicates the "Exploitation" phase. The agent has learned that Aligning + Shooting yields positive reinforcement.
- Significance: The steady climb proves the PPO algorithm successfully optimized the policy against the custom reward function.
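For reference, a curve like the orange moving average can be produced from SB3's `Monitor` logs roughly as follows; the CSV path and the 100-episode window are assumptions, not necessarily the exact values behind Figure 1.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumes episodes were recorded with SB3's Monitor wrapper; the first line
# of monitor.csv is a JSON header, hence skiprows=1.
df = pd.read_csv("monitor.csv", skiprows=1)

raw = df["r"]                              # reward per episode
smoothed = raw.rolling(window=100).mean()  # 100-episode moving average

plt.plot(raw, alpha=0.3, label="Raw episode reward")
plt.plot(smoothed, label="Moving average")
plt.xlabel("Episode")
plt.ylabel("Reward")
plt.legend()
plt.show()
```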
Analysis: This metric tracks how many frames the agent survived before dying or winning.
- Correlation: Notice how this graph mirrors Figure 1. As the agent gets better at killing enemies (higher reward), it also lives longer.
- Validation: This confirms the agent isn't "gaming" the system by finding a quick-suicide loop to avoid penalties. It is genuinely surviving the onslaught.
Analysis: This is the raw internal metric (`rollout/ep_rew_mean`) logged directly by Stable-Baselines3 during training.
- Purpose: It serves as a verification of the custom plots.
- Insight: The curve is smoother here because SB3 applies internal smoothing. It clearly documents the final convergence at a reward of approximately 30, matching our custom analysis.
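If needed, the same `rollout/ep_rew_mean` series can be read back out of the TensorBoard event files, for example with the snippet below; the run directory name is an assumption.

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Load the event file written by Stable-Baselines3 (run directory is an assumption).
acc = EventAccumulator("./runs/PPO_1")
acc.Reload()

for event in acc.Scalars("rollout/ep_rew_mean"):
    print(event.step, event.value)
```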
The Problem: Initially, the agent would spin continuously to locate enemies but refused to fire. It settled on a strategy of optimizing the "Living Bonus" by passively navigating, rather than risking engagement.
The Solution: We reshaped the reward structure to make passive survival impossible:
- Increased Health Weight (`0.10`): Taking damage became too expensive to ignore.
- Result: The agent realized that the only way to preserve health was to eliminate the threat (the enemies) before they could fire, forcing it to transition from passive spinning to aggressive shooting.
The Problem: The agent struggled to detect and hit enemies at a distance. Due to the low resolution (100x160), distant monsters appeared as barely distinguishable clusters of pixels against the background, causing the agent to miss frequently.
The Solution:
- Frame Stacking (Motion Perception): We implemented `VecFrameStack` (stacking 4 sequential frames); see the conceptual sketch after this list.
- Impact: Instead of relying on a single static blurry image, the agent perceives motion. This allows it to track the trajectory of distant, moving enemies effectively, even when they are just a few pixels large.
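Conceptually, the stacking can be pictured as below: keep the last four grayscale frames and concatenate them along the channel axis. This is a toy illustration with blank placeholder frames, not the project's wrapper code.

```python
import numpy as np
from collections import deque

# Keep the 4 most recent (100, 160, 1) grayscale frames.
frames = deque(maxlen=4)
for _ in range(4):
    frames.append(np.zeros((100, 160, 1), dtype=np.uint8))  # placeholder frames

# Concatenate along the channel axis: the CNN sees a (100, 160, 4) observation,
# so a distant enemy's change in position between frames becomes visible.
stacked = np.concatenate(frames, axis=-1)
print(stacked.shape)  # (100, 160, 4)
```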



