8 changes: 4 additions & 4 deletions examples/rl/actor_critic_cartpole.py
@@ -45,7 +45,7 @@
 import os

 os.environ["KERAS_BACKEND"] = "tensorflow"
-import gym
+import gymnasium as gym
 import numpy as np
 import keras
 from keras import ops
@@ -98,13 +98,13 @@
 episode_count = 0

 while True:  # Run until solved
-    state = env.reset()[0]
+    obs, _ = env.reset()
Contributor


critical

The env.reset() call now correctly unpacks two values, but it assigns the observation to a new variable obs. The following code on line 106 expects the variable state, which is now undefined in this scope. This will lead to a NameError. To fix this, you should assign the observation to state.

Suggested change
-obs, _ = env.reset()
+state, _ = env.reset()
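The underlying API change is that gymnasium's `Env.reset()` returns an `(observation, info)` pair, where classic gym returned the observation alone. A minimal sketch of the contract, using a hypothetical stub environment (not gymnasium itself) so it runs standalone:

```python
class StubCartPoleEnv:
    """Hypothetical stand-in mimicking gymnasium's reset() signature."""

    def reset(self, seed=None):
        # gymnasium's Env.reset() returns an (observation, info) pair,
        # unlike classic gym, which returned just the observation.
        observation = [0.0, 0.0, 0.0, 0.0]  # CartPole has 4 state dims
        info = {}
        return observation, info


env = StubCartPoleEnv()

# Buggy unpacking from the diff: downstream code that reads `state`
# would raise NameError, since only `obs` is bound here.
obs, _ = env.reset()

# Reviewer's fix: bind the observation to the name the loop actually uses.
state, _ = env.reset()
assert len(state) == 4
```

This is why the reviewer flags it as critical: the diff compiles but fails at runtime on the first reference to `state`.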

     episode_reward = 0
     with tf.GradientTape() as tape:
         for timestep in range(1, max_steps_per_episode):

-            state = ops.convert_to_tensor(state)
-            state = ops.expand_dims(state, 0)
+            state = tf.convert_to_tensor(state)
+            state = tf.expand_dims(state, 0)
Comment on lines +106 to +107
Contributor


medium

These lines have been changed to use tf directly, instead of the backend-agnostic keras.ops. The rest of the file uses keras.ops (e.g., on lines 116 and 160), so this change introduces an inconsistency. For better code style and to keep the example aligned with Keras best practices, it's recommended to use keras.ops here as well.

Suggested change
-state = tf.convert_to_tensor(state)
-state = tf.expand_dims(state, 0)
+state = ops.convert_to_tensor(state)
+state = ops.expand_dims(state, 0)


# Predict action probabilities and estimated future rewards
# from environment state