docs: improve polyak description (#417)
This commit improves the polyak parameter description to prevent
confusion with papers that use the soft replacement factor.
rickstaa authored Feb 24, 2024
1 parent e8012bc commit 2404a4e
Showing 4 changed files with 28 additions and 16 deletions.
11 changes: 7 additions & 4 deletions stable_learning_control/algos/pytorch/lac/lac.py
@@ -188,8 +188,10 @@ def __init__(
.. math:: \\theta_{\\text{targ}} \\leftarrow
\\rho \\theta_{\\text{targ}} + (1-\\rho) \\theta
-where :math:`\\rho` is polyak. (Always between 0 and 1, usually
-close to 1.). Defaults to ``0.995``.
+where :math:`\\rho` is polyak (Always between 0 and 1, usually close to
+1.). In some papers :math:`\\rho` is defined as (1 - :math:`\\tau`)
+where :math:`\\tau` is the soft replacement factor. Defaults to
+``0.995``.
target_entropy (float, optional): Initial target entropy used while learning
the entropy temperature (alpha). Defaults to the
maximum information (bits) contained in action space. This can be
@@ -991,8 +993,9 @@ def lac(
.. math:: \\theta_{\\text{targ}} \\leftarrow
\\rho \\theta_{\\text{targ}} + (1-\\rho) \\theta
-where :math:`\\rho` is polyak. (Always between 0 and 1, usually
-close to 1.). Defaults to ``0.995``.
+where :math:`\\rho` is polyak (Always between 0 and 1, usually close to 1.).
+In some papers :math:`\\rho` is defined as (1 - :math:`\\tau`) where
+:math:`\\tau` is the soft replacement factor. Defaults to ``0.995``.
target_entropy (float, optional): Initial target entropy used while learning
the entropy temperature (alpha). Defaults to the
maximum information (bits) contained in action space. This can be
11 changes: 7 additions & 4 deletions stable_learning_control/algos/pytorch/sac/sac.py
@@ -172,8 +172,10 @@ def __init__(
.. math:: \\theta_{\\text{targ}} \\leftarrow
\\rho \\theta_{\\text{targ}} + (1-\\rho) \\theta
-where :math:`\\rho` is polyak. (Always between 0 and 1, usually
-close to 1.). Defaults to ``0.995``.
+where :math:`\\rho` is polyak (Always between 0 and 1, usually close to
+1.). In some papers :math:`\\rho` is defined as (1 - :math:`\\tau`)
+where :math:`\\tau` is the soft replacement factor. Defaults to
+``0.995``.
target_entropy (float, optional): Initial target entropy used while learning
the entropy temperature (alpha). Defaults to the
maximum information (bits) contained in action space. This can be
@@ -856,8 +858,9 @@ def sac(
.. math:: \\theta_{\\text{targ}} \\leftarrow
\\rho \\theta_{\\text{targ}} + (1-\\rho) \\theta
-where :math:`\\rho` is polyak. (Always between 0 and 1, usually
-close to 1.). Defaults to ``0.995``.
+where :math:`\\rho` is polyak (Always between 0 and 1, usually close to 1.).
+In some papers :math:`\\rho` is defined as (1 - :math:`\\tau`) where
+:math:`\\tau` is the soft replacement factor. Defaults to ``0.995``.
target_entropy (float, optional): Initial target entropy used while learning
the entropy temperature (alpha). Defaults to the
maximum information (bits) contained in action space. This can be
11 changes: 7 additions & 4 deletions stable_learning_control/algos/tf2/lac/lac.py
@@ -185,8 +185,10 @@ def __init__(
.. math:: \\theta_{\\text{targ}} \\leftarrow
\\rho \\theta_{\\text{targ}} + (1-\\rho) \\theta
-where :math:`\\rho` is polyak. (Always between 0 and 1, usually
-close to 1.). Defaults to ``0.995``.
+where :math:`\\rho` is polyak (Always between 0 and 1, usually close to
+1.). In some papers :math:`\\rho` is defined as (1 - :math:`\\tau`)
+where :math:`\\tau` is the soft replacement factor. Defaults to
+``0.995``.
target_entropy (float, optional): Initial target entropy used while learning
the entropy temperature (alpha). Defaults to the
maximum information (bits) contained in action space. This can be
@@ -922,8 +924,9 @@ def lac(
.. math:: \\theta_{\\text{targ}} \\leftarrow
\\rho \\theta_{\\text{targ}} + (1-\\rho) \\theta
-where :math:`\\rho` is polyak. (Always between 0 and 1, usually
-close to 1.). Defaults to ``0.995``.
+where :math:`\\rho` is polyak (Always between 0 and 1, usually close to 1.).
+In some papers :math:`\\rho` is defined as (1 - :math:`\\tau`) where
+:math:`\\tau` is the soft replacement factor. Defaults to ``0.995``.
target_entropy (float, optional): Initial target entropy used while learning
the entropy temperature (alpha). Defaults to the
maximum information (bits) contained in action space. This can be
11 changes: 7 additions & 4 deletions stable_learning_control/algos/tf2/sac/sac.py
@@ -165,8 +165,10 @@ def __init__(
.. math:: \\theta_{\\text{targ}} \\leftarrow
\\rho \\theta_{\\text{targ}} + (1-\\rho) \\theta
-where :math:`\\rho` is polyak. (Always between 0 and 1, usually
-close to 1.). Defaults to ``0.995``.
+where :math:`\\rho` is polyak (Always between 0 and 1, usually close to
+1.). In some papers :math:`\\rho` is defined as (1 - :math:`\\tau`)
+where :math:`\\tau` is the soft replacement factor. Defaults to
+``0.995``.
target_entropy (float, optional): Initial target entropy used while learning
the entropy temperature (alpha). Defaults to the
maximum information (bits) contained in action space. This can be
@@ -787,8 +789,9 @@ def sac(
.. math:: \\theta_{\\text{targ}} \\leftarrow
\\rho \\theta_{\\text{targ}} + (1-\\rho) \\theta
-where :math:`\\rho` is polyak. (Always between 0 and 1, usually
-close to 1.). Defaults to ``0.995``.
+where :math:`\\rho` is polyak (Always between 0 and 1, usually close to 1.).
+In some papers :math:`\\rho` is defined as (1 - :math:`\\tau`) where
+:math:`\\tau` is the soft replacement factor. Defaults to ``0.995``.
target_entropy (float, optional): Initial target entropy used while learning
the entropy temperature (alpha). Defaults to the
maximum information (bits) contained in action space. This can be
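
Illustration (not part of the commit): the docstrings changed above describe the target update as theta_targ <- rho * theta_targ + (1 - rho) * theta, and note that some papers write rho as (1 - tau). The sketch below checks that the two conventions give the same update. It assumes parameters are plain lists of floats, and the function names `polyak_update`, `soft_update_tau` and `polyak_from_tau` are hypothetical helpers, not functions from stable_learning_control.

```python
def polyak_update(target_params, params, polyak=0.995):
    """Polyak convention: theta_targ <- polyak * theta_targ + (1 - polyak) * theta."""
    if not 0.0 <= polyak <= 1.0:
        raise ValueError("polyak must lie in [0, 1]")
    return [polyak * t + (1.0 - polyak) * p for t, p in zip(target_params, params)]


def soft_update_tau(target_params, params, tau):
    """Tau convention: theta_targ <- (1 - tau) * theta_targ + tau * theta."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return [(1.0 - tau) * t + tau * p for t, p in zip(target_params, params)]


def polyak_from_tau(tau):
    """Convert a soft replacement factor tau into the polyak coefficient rho = 1 - tau."""
    return 1.0 - tau


# Hypothetical parameter values, chosen only for illustration.
target = [1.0, -2.0]
online = [0.0, 4.0]

# A paper quoting tau = 0.005 corresponds to polyak = 0.995 here.
rho_update = polyak_update(target, online, polyak=polyak_from_tau(0.005))
tau_update = soft_update_tau(target, online, tau=0.005)
```

Under this reading, a value such as tau = 0.005 quoted from a paper would be passed as polyak = 0.995. Passing 0.005 directly as polyak would instead overwrite almost all of the target network on every update.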
