Package evaluation of DeepQLearning on Julia 1.10.8 (92f03a4775*) started at 2025-02-25T14:52:19.029 ################################################################################ # Set-up # Installing PkgEval dependencies (TestEnv)... Set-up completed after 4.92s ################################################################################ # Installation # Installing DeepQLearning... Resolving package versions... Updating `~/.julia/environments/v1.10/Project.toml` [de0a67f4] + DeepQLearning v0.7.2 Updating `~/.julia/environments/v1.10/Manifest.toml` [621f4979] + AbstractFFTs v1.5.0 [7d9f7c33] + Accessors v0.1.41 [79e6a3ab] + Adapt v4.2.0 [66dad0bd] + AliasTables v1.1.3 [dce04be8] + ArgCheck v2.4.0 [ec485272] + ArnoldiMethod v0.4.0 [4fba245c] + ArrayInterface v7.18.0 [a9b6321e] + Atomix v1.1.0 [fbb218c0] + BSON v0.3.9 [198e06fe] + BangBang v0.4.3 [9718e550] + Baselet v0.1.1 [e1450e63] + BufferedStreams v1.2.2 [fa961155] + CEnum v0.5.0 [082447d4] + ChainRules v1.72.2 [d360d2e6] + ChainRulesCore v1.25.1 [35d6a980] + ColorSchemes v3.29.0 [3da002f7] + ColorTypes v0.12.0 [c3611d14] + ColorVectorSpace v0.11.0 [5ae59095] + Colors v0.13.0 [d842c3ba] + CommonRLInterface v0.3.3 [bbf7d656] + CommonSubexpressions v0.3.1 [f70d9fcc] + CommonWorldInvalidations v1.0.0 [34da2185] + Compat v4.16.0 [a33af91c] + CompositionsBase v0.1.2 [187b0558] + ConstructionBase v1.5.8 [6add18c4] + ContextVariablesX v0.1.3 [d38c429a] + Contour v0.6.3 [a8cc5b0e] + Crayons v4.1.1 [9a962f9c] + DataAPI v1.16.0 [a93c6f00] + DataFrames v1.7.0 [864edb3b] + DataStructures v0.18.20 [e2d170a0] + DataValueInterfaces v1.0.0 [de0a67f4] + DeepQLearning v0.7.2 [244e2a9f] + DefineSingletons v0.1.2 [8bb1440f] + DelimitedFiles v1.9.1 [163ba53b] + DiffResults v1.1.0 [b552c78f] + DiffRules v1.15.1 [31c24e10] + Distributions v0.25.117 [ffbed154] + DocStringExtensions v0.9.3 [da5c29d0] + EllipsisNotation v1.8.0 [4e289a0a] + EnumX v1.0.4 [cc61a311] + FLoops v0.2.2 [b9860ae5] + FLoopsBase v0.1.1 [5789e2e9] + FileIO v1.16.6 [1a297f60] + FillArrays v1.13.0 [53c48c17] + FixedPointNumbers v0.8.5 ⌅ [587475ba] + Flux v0.14.25 [f6369f11] + ForwardDiff v0.10.38 ⌅ [d9f16b24] + Functors v0.4.12 [0c68f7d7] + GPUArrays v11.2.2 [46192b85] + GPUArraysCore v0.2.0 [86223c79] + Graphs v1.12.0 [076d061b] + HashArrayMappedTries v0.2.0 [34004b35] + HypergeometricFunctions v0.3.27 [7869d1d1] + IRTools v0.4.14 [615f187c] + IfElse v0.1.1 [a09fc81d] + ImageCore v0.10.5 [d25df0c9] + Inflate v0.1.5 [22cec73e] + InitialValues v0.3.1 [842dd82b] + InlineStrings v1.4.3 [3587e190] + InverseFunctions v0.1.17 [41ab1584] + InvertedIndices v1.3.1 [92d709cd] + IrrationalConstants v0.2.4 [82899510] + IteratorInterfaceExtensions v1.0.0 [692b3bcd] + JLLWrappers v1.7.0 [b14d175d] + JuliaVariables v0.2.4 [63c18a36] + KernelAbstractions v0.9.34 [929cbde3] + LLVM v9.2.0 [b964fa9f] + LaTeXStrings v1.4.0 [2ab3a3ac] + LogExpFunctions v0.3.29 [c2834f40] + MLCore v1.0.0 ⌃ [7e8f7934] + MLDataDevices v1.5.3 [d8e11817] + MLStyle v0.4.17 [f1d291b0] + MLUtils v0.4.7 [1914dd2f] + MacroTools v0.5.15 [dbb5928d] + MappedArrays v0.4.2 [299715c1] + MarchingCubes v0.1.11 [128add7d] + MicroCollections v0.2.0 [e1d29d7a] + Missings v1.2.0 [e94cdb99] + MosaicViews v0.3.4 [872c559c] + NNlib v0.9.27 [77ba4419] + NaNMath v1.1.2 [71a1bf82] + NameResolution v0.1.5 [d9ec5142] + NamedTupleTools v0.14.3 [6fe1bfb0] + OffsetArrays v1.15.0 [0b1bfda6] + OneHotArrays v0.2.6 ⌅ [3bd65402] + Optimisers v0.3.4 [bac558e1] + OrderedCollections v1.8.0 [90014a1f] + PDMats v0.11.32 [f3bd98c0] + POMDPLinter v0.1.2 [7588e00f] + POMDPTools v1.1.0 [a93abf59] + POMDPs v1.0.0 [5432bcbf] + PaddedViews v0.5.12 [d96e819e] + Parameters v0.12.3 [2dfb63ee] + PooledArrays v1.4.3 [aea7be01] + PrecompileTools v1.2.1 [21216c6a] + Preferences v1.4.3 [8162dcfd] + PrettyPrint v0.2.0 [08abe8d2] + PrettyTables v2.4.0 [33c8b6b6] + ProgressLogging v0.1.4 [92933f4c] + ProgressMeter v1.10.2 [3349acd9] + ProtoBuf v1.0.16 [43287f4e] + PtrArrays v1.3.0 [1fd47b50] + QuadGK v2.11.2 [c1ae055f] + RealDot v0.1.0 [189a3867] + Reexport v1.2.2 [ae029012] + Requires v1.3.0 [79098fc4] + Rmath v0.8.0 [7e506255] + ScopedValues v1.3.0 [91c51154] + SentinelArrays v1.4.8 [efcf1570] + Setfield v1.1.1 [605ecd9f] + ShowCases v0.1.0 [699a6c99] + SimpleTraits v0.9.4 [a2af1166] + SortingAlgorithms v1.2.1 [dc90abb0] + SparseInverseSubset v0.1.2 [276daf66] + SpecialFunctions v2.5.0 [171d559e] + SplittablesBase v0.1.15 [cae243ae] + StackViews v0.1.1 [aedffcd0] + Static v1.1.1 [0d7ed370] + StaticArrayInterface v1.8.0 [90137ffa] + StaticArrays v1.9.12 [1e83bf80] + StaticArraysCore v1.4.3 [82ae8749] + StatsAPI v1.7.0 [2913bbd2] + StatsBase v0.34.4 [4c63d2b9] + StatsFuns v1.3.2 [892a3eda] + StringManipulation v0.4.1 ⌃ [09ab397b] + StructArrays v0.6.21 [3783bdb8] + TableTraits v1.0.1 [bd369af6] + Tables v1.12.0 [899adc3e] + TensorBoardLogger v0.1.25 [62fd8b95] + TensorCore v0.1.1 [28d57a85] + Transducers v0.4.84 [410a4b4d] + Tricks v0.1.10 [3a884ed6] + UnPack v1.0.2 [b8865327] + UnicodePlots v3.7.2 [013be700] + UnsafeAtomics v0.3.0 ⌅ [e88e6eb3] + Zygote v0.6.75 [700de1a5] + ZygoteRules v0.2.7 [dad2f222] + LLVMExtra_jll v0.0.35+0 [efe28fd5] + OpenSpecFun_jll v0.5.6+0 [f50d1b31] + Rmath_jll v0.5.1+0 [0dad84c5] + ArgTools v1.1.1 [56f22d72] + Artifacts [2a0f44e3] + Base64 [8bf52ea8] + CRC32c [ade2ca70] + Dates [8ba89e20] + Distributed [f43a241f] + Downloads v1.6.0 [7b1f6079] + FileWatching [9fa8497b] + Future [b77e0a4c] + InteractiveUtils [4af54fe1] + LazyArtifacts [b27032c2] + LibCURL v0.6.4 [76f85450] + LibGit2 [8f399da3] + Libdl [37e2e46d] + LinearAlgebra [56ddb016] + Logging [d6f4376e] + Markdown [a63ad114] + Mmap [ca575930] + NetworkOptions v1.2.0 [44cfe95a] + Pkg v1.10.0 [de0858da] + Printf [3fa0cd96] + REPL [9a3f8284] + Random [ea8e919c] + SHA v0.7.0 [9e88b42a] + Serialization [1a1011a3] + SharedArrays [6462fe0b] + Sockets [2f01184e] + SparseArrays v1.10.0 [10745b16] + Statistics v1.10.0 [4607b0f0] + SuiteSparse [fa267f1f] + TOML v1.0.3 [a4e569a6] + Tar v1.10.0 [8dfed614] + Test [cf7118a7] + UUIDs [4ec0a83e] + Unicode [e66e0078] + CompilerSupportLibraries_jll v1.1.1+0 [deac9b47] + LibCURL_jll v8.4.0+0 [e37daf67] + LibGit2_jll v1.6.4+0 [29816b5a] + LibSSH2_jll v1.11.0+1 [c8ffd9c3] + MbedTLS_jll v2.28.2+1 [14a3606d] + MozillaCACerts_jll v2023.1.10 [4536629a] + OpenBLAS_jll v0.3.23+4 [05823500] + OpenLibm_jll v0.8.1+4 [bea87d4a] + SuiteSparse_jll v7.2.1+1 [83775a58] + Zlib_jll v1.2.13+1 [8e850b90] + libblastrampoline_jll v5.11.0+0 [8e850ede] + nghttp2_jll v1.52.0+1 [3f19e933] + p7zip_jll v17.4.0+2 Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated -m` Installation completed after 10.9s ################################################################################ # Precompilation # Precompiling PkgEval dependencies... ┌ Warning: Could not use exact versions of packages in manifest, re-resolving └ @ TestEnv ~/.julia/packages/TestEnv/tgnBf/src/julia-1.9/activate_set.jl:63 Precompiling package dependencies... Precompilation completed after 392.03s ################################################################################ # Testing # Testing DeepQLearning ┌ Warning: Could not use exact versions of packages in manifest, re-resolving └ @ Pkg.Operations /opt/julia/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1829 Status `/tmp/jl_M9kKzF/Project.toml` [fbb218c0] BSON v0.3.9 [d842c3ba] CommonRLInterface v0.3.3 [de0a67f4] DeepQLearning v0.7.2 [da5c29d0] EllipsisNotation v1.8.0 ⌅ [587475ba] Flux v0.14.25 [f3bd98c0] POMDPLinter v0.1.2 [355abbd5] POMDPModels v0.4.21 [7588e00f] POMDPTools v1.1.0 [a93abf59] POMDPs v1.0.0 [d96e819e] Parameters v0.12.3 [90137ffa] StaticArrays v1.9.12 [2913bbd2] StatsBase v0.34.4 [899adc3e] TensorBoardLogger v0.1.25 [37e2e46d] LinearAlgebra [de0858da] Printf [9a3f8284] Random [8dfed614] Test Status `/tmp/jl_M9kKzF/Manifest.toml` [621f4979] AbstractFFTs v1.5.0 [7d9f7c33] Accessors v0.1.41 [79e6a3ab] Adapt v4.2.0 [66dad0bd] AliasTables v1.1.3 [dce04be8] ArgCheck v2.4.0 [ec485272] ArnoldiMethod v0.4.0 [4fba245c] ArrayInterface v7.18.0 [a9b6321e] Atomix v1.1.0 [fbb218c0] BSON v0.3.9 [198e06fe] BangBang v0.4.3 [9718e550] Baselet v0.1.1 [e1450e63] BufferedStreams v1.2.2 [fa961155] CEnum v0.5.0 [082447d4] ChainRules v1.72.2 [d360d2e6] ChainRulesCore v1.25.1 [35d6a980] ColorSchemes v3.29.0 ⌅ [3da002f7] ColorTypes v0.11.5 ⌃ [c3611d14] ColorVectorSpace v0.10.0 ⌅ [5ae59095] Colors v0.12.11 [d842c3ba] CommonRLInterface v0.3.3 [bbf7d656] CommonSubexpressions v0.3.1 [f70d9fcc] CommonWorldInvalidations v1.0.0 [34da2185] Compat v4.16.0 [a81c6b42] Compose v0.9.5 [a33af91c] CompositionsBase v0.1.2 [187b0558] ConstructionBase v1.5.8 [6add18c4] ContextVariablesX v0.1.3 [d38c429a] Contour v0.6.3 [a8cc5b0e] Crayons v4.1.1 [9a962f9c] DataAPI v1.16.0 [a93c6f00] DataFrames v1.7.0 [864edb3b] DataStructures v0.18.20 [e2d170a0] DataValueInterfaces v1.0.0 [de0a67f4] DeepQLearning v0.7.2 [244e2a9f] DefineSingletons v0.1.2 [8bb1440f] DelimitedFiles v1.9.1 [163ba53b] DiffResults v1.1.0 [b552c78f] DiffRules v1.15.1 [31c24e10] Distributions v0.25.117 [ffbed154] DocStringExtensions v0.9.3 [da5c29d0] EllipsisNotation v1.8.0 [4e289a0a] EnumX v1.0.4 [cc61a311] FLoops v0.2.2 [b9860ae5] FLoopsBase v0.1.1 [5789e2e9] FileIO v1.16.6 [1a297f60] FillArrays v1.13.0 [53c48c17] FixedPointNumbers v0.8.5 ⌅ [587475ba] Flux v0.14.25 [f6369f11] ForwardDiff v0.10.38 ⌅ [d9f16b24] Functors v0.4.12 [0c68f7d7] GPUArrays v11.2.2 [46192b85] GPUArraysCore v0.2.0 [86223c79] Graphs v1.12.0 [076d061b] HashArrayMappedTries v0.2.0 [34004b35] HypergeometricFunctions v0.3.27 [7869d1d1] IRTools v0.4.14 [615f187c] IfElse v0.1.1 [a09fc81d] ImageCore v0.10.5 [d25df0c9] Inflate v0.1.5 [22cec73e] InitialValues v0.3.1 [842dd82b] InlineStrings v1.4.3 [3587e190] InverseFunctions v0.1.17 [41ab1584] InvertedIndices v1.3.1 [92d709cd] IrrationalConstants v0.2.4 [c8e1da08] IterTools v1.10.0 [82899510] IteratorInterfaceExtensions v1.0.0 [692b3bcd] JLLWrappers v1.7.0 [682c06a0] JSON v0.21.4 [b14d175d] JuliaVariables v0.2.4 [63c18a36] KernelAbstractions v0.9.34 [929cbde3] LLVM v9.2.0 [b964fa9f] LaTeXStrings v1.4.0 [2ab3a3ac] LogExpFunctions v0.3.29 [c2834f40] MLCore v1.0.0 ⌃ [7e8f7934] MLDataDevices v1.5.3 [d8e11817] MLStyle v0.4.17 [f1d291b0] MLUtils v0.4.7 [1914dd2f] MacroTools v0.5.15 [dbb5928d] MappedArrays v0.4.2 [299715c1] MarchingCubes v0.1.11 [442fdcdd] Measures v0.3.2 [128add7d] MicroCollections v0.2.0 [e1d29d7a] Missings v1.2.0 [e94cdb99] MosaicViews v0.3.4 [872c559c] NNlib v0.9.27 [77ba4419] NaNMath v1.1.2 [71a1bf82] NameResolution v0.1.5 [d9ec5142] NamedTupleTools v0.14.3 [6fe1bfb0] OffsetArrays v1.15.0 [0b1bfda6] OneHotArrays v0.2.6 ⌅ [3bd65402] Optimisers v0.3.4 [bac558e1] OrderedCollections v1.8.0 [90014a1f] PDMats v0.11.32 [f3bd98c0] POMDPLinter v0.1.2 [355abbd5] POMDPModels v0.4.21 [7588e00f] POMDPTools v1.1.0 [a93abf59] POMDPs v1.0.0 [5432bcbf] PaddedViews v0.5.12 [d96e819e] Parameters v0.12.3 [69de0a69] Parsers v2.8.1 [2dfb63ee] PooledArrays v1.4.3 [aea7be01] PrecompileTools v1.2.1 [21216c6a] Preferences v1.4.3 [8162dcfd] PrettyPrint v0.2.0 [08abe8d2] PrettyTables v2.4.0 [33c8b6b6] ProgressLogging v0.1.4 [92933f4c] ProgressMeter v1.10.2 [3349acd9] ProtoBuf v1.0.16 [43287f4e] PtrArrays v1.3.0 [1fd47b50] QuadGK v2.11.2 [c1ae055f] RealDot v0.1.0 [189a3867] Reexport v1.2.2 [ae029012] Requires v1.3.0 [79098fc4] Rmath v0.8.0 [7e506255] ScopedValues v1.3.0 [91c51154] SentinelArrays v1.4.8 [efcf1570] Setfield v1.1.1 [605ecd9f] ShowCases v0.1.0 [699a6c99] SimpleTraits v0.9.4 [a2af1166] SortingAlgorithms v1.2.1 [dc90abb0] SparseInverseSubset v0.1.2 [276daf66] SpecialFunctions v2.5.0 [171d559e] SplittablesBase v0.1.15 [cae243ae] StackViews v0.1.1 [aedffcd0] Static v1.1.1 [0d7ed370] StaticArrayInterface v1.8.0 [90137ffa] StaticArrays v1.9.12 [1e83bf80] StaticArraysCore v1.4.3 [82ae8749] StatsAPI v1.7.0 [2913bbd2] StatsBase v0.34.4 [4c63d2b9] StatsFuns v1.3.2 [892a3eda] StringManipulation v0.4.1 ⌃ [09ab397b] StructArrays v0.6.21 [3783bdb8] TableTraits v1.0.1 [bd369af6] Tables v1.12.0 [899adc3e] TensorBoardLogger v0.1.25 [62fd8b95] TensorCore v0.1.1 [28d57a85] Transducers v0.4.84 [410a4b4d] Tricks v0.1.10 [3a884ed6] UnPack v1.0.2 [b8865327] UnicodePlots v3.7.2 [013be700] UnsafeAtomics v0.3.0 ⌅ [e88e6eb3] Zygote v0.6.75 [700de1a5] ZygoteRules v0.2.7 [dad2f222] LLVMExtra_jll v0.0.35+0 [efe28fd5] OpenSpecFun_jll v0.5.6+0 [f50d1b31] Rmath_jll v0.5.1+0 [0dad84c5] ArgTools v1.1.1 [56f22d72] Artifacts [2a0f44e3] Base64 [8bf52ea8] CRC32c [ade2ca70] Dates [8ba89e20] Distributed [f43a241f] Downloads v1.6.0 [7b1f6079] FileWatching [9fa8497b] Future [b77e0a4c] InteractiveUtils [4af54fe1] LazyArtifacts [b27032c2] LibCURL v0.6.4 [76f85450] LibGit2 [8f399da3] Libdl [37e2e46d] LinearAlgebra [56ddb016] Logging [d6f4376e] Markdown [a63ad114] Mmap [ca575930] NetworkOptions v1.2.0 [44cfe95a] Pkg v1.10.0 [de0858da] Printf [3fa0cd96] REPL [9a3f8284] Random [ea8e919c] SHA v0.7.0 [9e88b42a] Serialization [1a1011a3] SharedArrays [6462fe0b] Sockets [2f01184e] SparseArrays v1.10.0 [10745b16] Statistics v1.10.0 [4607b0f0] SuiteSparse [fa267f1f] TOML v1.0.3 [a4e569a6] Tar v1.10.0 [8dfed614] Test [cf7118a7] UUIDs [4ec0a83e] Unicode [e66e0078] CompilerSupportLibraries_jll v1.1.1+0 [deac9b47] LibCURL_jll v8.4.0+0 [e37daf67] LibGit2_jll v1.6.4+0 [29816b5a] LibSSH2_jll v1.11.0+1 [c8ffd9c3] MbedTLS_jll v2.28.2+1 [14a3606d] MozillaCACerts_jll v2023.1.10 [4536629a] OpenBLAS_jll v0.3.23+4 [05823500] OpenLibm_jll v0.8.1+4 [bea87d4a] SuiteSparse_jll v7.2.1+1 [83775a58] Zlib_jll v1.2.13+1 [8e850b90] libblastrampoline_jll v5.11.0+0 [8e850ede] nghttp2_jll v1.52.0+1 [3f19e933] p7zip_jll v17.4.0+2 Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. Testing Running tests... 500 / 10000 eps 0.901 | avgR 0.060 | Loss 8.983e-02 | Grad 8.332e-02 | EvalR -Inf 1000 / 10000 eps 0.802 | avgR 0.093 | Loss 1.270e-01 | Grad 7.790e-02 | EvalR -Inf 1500 / 10000 eps 0.703 | avgR 0.183 | Loss 2.951e-02 | Grad 5.044e-02 | EvalR -Inf 2000 / 10000 eps 0.604 | avgR 0.138 | Loss 3.293e-02 | Grad 7.990e-02 | EvalR -Inf Evaluation ... Avg Reward 0.50 | Avg Step 5.00 2500 / 10000 eps 0.505 | avgR 0.680 | Loss 2.252e-02 | Grad 3.773e-02 | EvalR 0.500 3000 / 10000 eps 0.406 | avgR 0.976 | Loss 2.839e-02 | Grad 5.273e-02 | EvalR 0.500 3500 / 10000 eps 0.307 | avgR 1.312 | Loss 7.485e-02 | Grad 6.857e-02 | EvalR 0.500 4000 / 10000 eps 0.208 | avgR 1.511 | Loss 3.037e-02 | Grad 2.293e-02 | EvalR 0.500 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 4500 / 10000 eps 0.109 | avgR 1.831 | Loss 2.080e-03 | Grad 2.427e-02 | EvalR 2.100 5000 / 10000 eps 0.010 | avgR 1.712 | Loss 3.493e-03 | Grad 1.619e-02 | EvalR 2.100 5500 / 10000 eps 0.010 | avgR 2.071 | Loss 8.385e-04 | Grad 2.264e-02 | EvalR 2.100 6000 / 10000 eps 0.010 | avgR 2.075 | Loss 8.742e-05 | Grad 9.398e-03 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 6500 / 10000 eps 0.010 | avgR 2.069 | Loss 1.078e-04 | Grad 6.722e-03 | EvalR 2.100 7000 / 10000 eps 0.010 | avgR 2.059 | Loss 1.554e-05 | Grad 2.244e-03 | EvalR 2.100 7500 / 10000 eps 0.010 | avgR 2.077 | Loss 5.779e-06 | Grad 1.974e-03 | EvalR 2.100 8000 / 10000 eps 0.010 | avgR 2.027 | Loss 2.645e-04 | Grad 9.478e-03 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 8500 / 10000 eps 0.010 | avgR 2.075 | Loss 6.036e-04 | Grad 1.045e-02 | EvalR 2.100 9000 / 10000 eps 0.010 | avgR 2.079 | Loss 1.584e-05 | Grad 3.153e-03 | EvalR 2.100 9500 / 10000 eps 0.010 | avgR 2.078 | Loss 5.876e-05 | Grad 6.030e-03 | EvalR 2.100 10000 / 10000 eps 0.010 | avgR 2.076 | Loss 3.092e-06 | Grad 1.183e-03 | EvalR 2.100 Restore model with eval reward 2.100 Test Summary: | Pass Total Time vanilla DQN | 2 2 1m46.9s 500 / 10000 eps 0.901 | avgR 0.016 | Loss 1.799e-01 | Grad 1.210e-01 | EvalR -Inf 1000 / 10000 eps 0.802 | avgR 0.202 | Loss 4.552e-02 | Grad 2.823e-02 | EvalR -Inf 1500 / 10000 eps 0.703 | avgR 0.312 | Loss 2.686e-02 | Grad 2.667e-02 | EvalR -Inf 2000 / 10000 eps 0.604 | avgR 0.460 | Loss 1.133e-02 | Grad 1.889e-02 | EvalR -Inf Evaluation ... Avg Reward 2.00 | Avg Step 5.00 2500 / 10000 eps 0.505 | avgR 0.945 | Loss 3.055e-02 | Grad 6.853e-02 | EvalR 2.000 3000 / 10000 eps 0.406 | avgR 1.316 | Loss 2.220e-02 | Grad 4.416e-02 | EvalR 2.000 3500 / 10000 eps 0.307 | avgR 1.262 | Loss 4.717e-03 | Grad 1.700e-02 | EvalR 2.000 4000 / 10000 eps 0.208 | avgR 1.566 | Loss 1.362e-02 | Grad 5.354e-02 | EvalR 2.000 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 4500 / 10000 eps 0.109 | avgR 1.527 | Loss 6.164e-03 | Grad 4.548e-02 | EvalR 2.100 5000 / 10000 eps 0.010 | avgR 1.977 | Loss 1.338e-03 | Grad 1.478e-02 | EvalR 2.100 5500 / 10000 eps 0.010 | avgR 2.007 | Loss 3.005e-04 | Grad 9.331e-03 | EvalR 2.100 6000 / 10000 eps 0.010 | avgR 2.053 | Loss 3.108e-05 | Grad 6.268e-03 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 6500 / 10000 eps 0.010 | avgR 2.060 | Loss 1.056e-04 | Grad 2.645e-03 | EvalR 2.100 7000 / 10000 eps 0.010 | avgR 2.075 | Loss 6.503e-06 | Grad 2.834e-03 | EvalR 2.100 7500 / 10000 eps 0.010 | avgR 2.059 | Loss 7.630e-05 | Grad 4.529e-03 | EvalR 2.100 8000 / 10000 eps 0.010 | avgR 2.047 | Loss 4.517e-04 | Grad 1.448e-02 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 8500 / 10000 eps 0.010 | avgR 2.072 | Loss 6.573e-05 | Grad 5.537e-03 | EvalR 2.100 9000 / 10000 eps 0.010 | avgR 2.026 | Loss 9.171e-05 | Grad 5.089e-03 | EvalR 2.100 9500 / 10000 eps 0.010 | avgR 2.077 | Loss 2.357e-04 | Grad 7.728e-03 | EvalR 2.100 10000 / 10000 eps 0.010 | avgR 2.064 | Loss 9.482e-06 | Grad 2.591e-03 | EvalR 2.100 Restore model with eval reward 2.100 Test Summary: | Pass Total Time double Q DQN | 1 1 4.2s 500 / 10000 eps 0.901 | avgR 0.022 | Loss 1.579e-01 | Grad 1.098e-01 | EvalR -Inf 1000 / 10000 eps 0.802 | avgR 0.126 | Loss 6.146e-02 | Grad 1.398e-01 | EvalR -Inf 1500 / 10000 eps 0.703 | avgR 0.204 | Loss 2.478e-02 | Grad 4.144e-02 | EvalR -Inf 2000 / 10000 eps 0.604 | avgR 0.558 | Loss 1.452e-02 | Grad 3.113e-02 | EvalR -Inf Evaluation ... Avg Reward 1.90 | Avg Step 5.00 2500 / 10000 eps 0.505 | avgR 0.685 | Loss 1.745e-02 | Grad 2.675e-02 | EvalR 1.900 3000 / 10000 eps 0.406 | avgR 0.545 | Loss 5.050e-02 | Grad 1.302e-02 | EvalR 1.900 3500 / 10000 eps 0.307 | avgR 1.300 | Loss 4.863e-02 | Grad 6.351e-02 | EvalR 1.900 4000 / 10000 eps 0.208 | avgR 1.601 | Loss 2.512e-02 | Grad 1.820e-02 | EvalR 1.900 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 4500 / 10000 eps 0.109 | avgR 1.772 | Loss 2.537e-03 | Grad 2.057e-02 | EvalR 2.100 5000 / 10000 eps 0.010 | avgR 1.017 | Loss 1.091e-01 | Grad 4.979e-02 | EvalR 2.100 5500 / 10000 eps 0.010 | avgR 2.025 | Loss 2.230e-02 | Grad 7.461e-02 | EvalR 2.100 6000 / 10000 eps 0.010 | avgR 2.031 | Loss 4.917e-04 | Grad 2.762e-02 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 6500 / 10000 eps 0.010 | avgR 2.020 | Loss 9.007e-05 | Grad 5.960e-03 | EvalR 2.100 7000 / 10000 eps 0.010 | avgR 2.069 | Loss 1.972e-04 | Grad 2.752e-03 | EvalR 2.100 7500 / 10000 eps 0.010 | avgR 2.025 | Loss 1.470e-05 | Grad 2.778e-03 | EvalR 2.100 8000 / 10000 eps 0.010 | avgR 2.044 | Loss 1.267e-04 | Grad 4.041e-03 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 8500 / 10000 eps 0.010 | avgR 2.076 | Loss 2.042e-06 | Grad 8.946e-04 | EvalR 2.100 9000 / 10000 eps 0.010 | avgR 2.035 | Loss 4.819e-06 | Grad 1.994e-03 | EvalR 2.100 9500 / 10000 eps 0.010 | avgR 2.062 | Loss 1.043e-04 | Grad 7.553e-03 | EvalR 2.100 10000 / 10000 eps 0.010 | avgR 2.066 | Loss 7.276e-04 | Grad 6.518e-03 | EvalR 2.100 Restore model with eval reward 2.100 Test Summary: | Pass Total Time dueling DQN | 1 1 17.0s 500 / 10000 eps 0.901 | avgR -0.052 | Loss 1.466e-01 | Grad 3.202e-01 | EvalR -Inf 1000 / 10000 eps 0.802 | avgR -0.048 | Loss 1.222e-01 | Grad 7.484e-02 | EvalR -Inf 1500 / 10000 eps 0.703 | avgR 0.102 | Loss 4.429e-01 | Grad 6.757e-01 | EvalR -Inf 2000 / 10000 eps 0.604 | avgR 0.399 | Loss 1.952e-02 | Grad 5.952e-02 | EvalR -Inf Evaluation ... Avg Reward 1.90 | Avg Step 5.00 2500 / 10000 eps 0.505 | avgR 0.891 | Loss 6.463e-03 | Grad 1.277e-02 | EvalR 1.900 3000 / 10000 eps 0.406 | avgR 1.047 | Loss 1.611e-02 | Grad 2.480e-02 | EvalR 1.900 3500 / 10000 eps 0.307 | avgR 1.130 | Loss 2.892e-02 | Grad 8.497e-02 | EvalR 1.900 4000 / 10000 eps 0.208 | avgR 1.648 | Loss 5.029e-02 | Grad 6.314e-02 | EvalR 1.900 Evaluation ... Avg Reward 2.00 | Avg Step 5.00 Saving new model with eval reward 2.000 4500 / 10000 eps 0.109 | avgR 1.660 | Loss 5.424e-03 | Grad 1.342e-02 | EvalR 2.000 5000 / 10000 eps 0.010 | avgR 1.839 | Loss 7.941e-03 | Grad 4.210e-02 | EvalR 2.000 5500 / 10000 eps 0.010 | avgR 1.875 | Loss 2.863e-03 | Grad 1.683e-02 | EvalR 2.000 6000 / 10000 eps 0.010 | avgR 2.060 | Loss 2.043e-04 | Grad 4.577e-03 | EvalR 2.000 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 6500 / 10000 eps 0.010 | avgR 2.056 | Loss 1.205e-04 | Grad 4.706e-03 | EvalR 2.100 7000 / 10000 eps 0.010 | avgR 2.070 | Loss 2.188e-06 | Grad 1.306e-03 | EvalR 2.100 7500 / 10000 eps 0.010 | avgR 2.070 | Loss 4.278e-05 | Grad 5.403e-04 | EvalR 2.100 8000 / 10000 eps 0.010 | avgR 2.009 | Loss 1.051e-05 | Grad 3.259e-04 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 8500 / 10000 eps 0.010 | avgR 2.055 | Loss 2.004e-04 | Grad 2.083e-03 | EvalR 2.100 9000 / 10000 eps 0.010 | avgR 2.075 | Loss 4.374e-05 | Grad 2.273e-03 | EvalR 2.100 9500 / 10000 eps 0.010 | avgR 2.068 | Loss 5.125e-06 | Grad 4.667e-04 | EvalR 2.100 10000 / 10000 eps 0.010 | avgR 2.079 | Loss 2.737e-07 | Grad 3.526e-04 | EvalR 2.100 Restore model with eval reward 2.100 Test Summary: | Pass Total Time Prioritized DDQN | 1 1 5.2s 500 / 10000 eps 0.901 | avgR 0.013 | Loss 1.327e-03 | Grad 5.254e-04 | EvalR -Inf 1000 / 10000 eps 0.802 | avgR 0.234 | Loss 4.059e-04 | Grad 1.038e-03 | EvalR -Inf 1500 / 10000 eps 0.703 | avgR 0.455 | Loss 6.279e-04 | Grad 2.058e-03 | EvalR -Inf 2000 / 10000 eps 0.604 | avgR 0.545 | Loss 3.746e-04 | Grad 5.091e-04 | EvalR -Inf Evaluation ... Avg Reward 2.10 | Avg Step 5.00 2500 / 10000 eps 0.505 | avgR 0.430 | Loss 3.222e-04 | Grad 7.060e-04 | EvalR 2.100 3000 / 10000 eps 0.406 | avgR 0.816 | Loss 3.062e-04 | Grad 9.304e-04 | EvalR 2.100 3500 / 10000 eps 0.307 | avgR 1.129 | Loss 4.255e-04 | Grad 1.044e-03 | EvalR 2.100 4000 / 10000 eps 0.208 | avgR 1.442 | Loss 2.954e-04 | Grad 4.788e-04 | EvalR 2.100 Evaluation ... Avg Reward 2.00 | Avg Step 5.00 Saving new model with eval reward 2.000 4500 / 10000 eps 0.109 | avgR 1.826 | Loss 2.094e-04 | Grad 1.208e-03 | EvalR 2.000 5000 / 10000 eps 0.010 | avgR 1.989 | Loss 1.231e-04 | Grad 5.995e-04 | EvalR 2.000 5500 / 10000 eps 0.010 | avgR 2.075 | Loss 5.726e-05 | Grad 3.946e-04 | EvalR 2.000 6000 / 10000 eps 0.010 | avgR 2.070 | Loss 1.883e-04 | Grad 7.924e-04 | EvalR 2.000 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 6500 / 10000 eps 0.010 | avgR 1.662 | Loss 6.492e-05 | Grad 2.869e-04 | EvalR 2.100 7000 / 10000 eps 0.010 | avgR 1.993 | Loss 2.349e-05 | Grad 3.586e-04 | EvalR 2.100 7500 / 10000 eps 0.010 | avgR 0.866 | Loss 3.731e-04 | Grad 9.799e-04 | EvalR 2.100 8000 / 10000 eps 0.010 | avgR 0.924 | Loss 2.208e-05 | Grad 4.151e-04 | EvalR 2.100 Evaluation ... Avg Reward 0.70 | Avg Step 5.00 8500 / 10000 eps 0.010 | avgR 1.944 | Loss 1.187e-05 | Grad 4.310e-04 | EvalR 0.700 9000 / 10000 eps 0.010 | avgR 2.026 | Loss 4.070e-07 | Grad 1.144e-04 | EvalR 0.700 9500 / 10000 eps 0.010 | avgR 2.054 | Loss 1.116e-06 | Grad 1.072e-04 | EvalR 0.700 10000 / 10000 eps 0.010 | avgR 2.065 | Loss 1.548e-06 | Grad 1.725e-04 | EvalR 0.700 Restore model with eval reward 2.100 Test Summary: | Pass Total Time TestMDP DRQN | 1 1 1m20.5s 500 / 10000 eps 0.901 | avgR -4.632 | Loss 6.367e-02 | Grad 5.812e-02 | EvalR -Inf Evaluation ... Avg Reward 2.03 | Avg Step 57.06 1000 / 10000 eps 0.802 | avgR -4.000 | Loss 8.373e-02 | Grad 8.560e-03 | EvalR 2.030 Evaluation ... Avg Reward 4.18 | Avg Step 37.64 1500 / 10000 eps 0.703 | avgR -0.458 | Loss 1.499e-02 | Grad 3.197e-03 | EvalR 4.180 Evaluation ... Avg Reward 1.33 | Avg Step 49.06 2000 / 10000 eps 0.604 | avgR 0.286 | Loss 5.161e-02 | Grad 4.960e-03 | EvalR 1.330 Evaluation ... Avg Reward 1.60 | Avg Step 50.97 2500 / 10000 eps 0.505 | avgR 0.103 | Loss 3.062e-02 | Grad 1.008e-02 | EvalR 1.600 Evaluation ... Avg Reward 2.27 | Avg Step 53.79 3000 / 10000 eps 0.406 | avgR 0.326 | Loss 3.706e-02 | Grad 8.228e-03 | EvalR 2.270 Evaluation ... Avg Reward 4.25 | Avg Step 33.17 Saving new model with eval reward 4.250 3500 / 10000 eps 0.307 | avgR 0.804 | Loss 1.214e-02 | Grad 5.378e-03 | EvalR 4.250 Evaluation ... Avg Reward 2.05 | Avg Step 52.54 4000 / 10000 eps 0.208 | avgR 1.275 | Loss 6.529e-02 | Grad 9.636e-03 | EvalR 2.050 Evaluation ... Avg Reward 3.23 | Avg Step 44.38 4500 / 10000 eps 0.109 | avgR 2.108 | Loss 3.367e-02 | Grad 1.560e-02 | EvalR 3.230 Evaluation ... Avg Reward 3.00 | Avg Step 52.62 5000 / 10000 eps 0.010 | avgR 2.059 | Loss 9.830e-02 | Grad 1.678e-02 | EvalR 3.000 Evaluation ... Avg Reward 4.03 | Avg Step 49.42 5500 / 10000 eps 0.010 | avgR 1.882 | Loss 3.033e-02 | Grad 9.217e-03 | EvalR 4.030 Evaluation ... Avg Reward -1.30 | Avg Step 53.57 6000 / 10000 eps 0.010 | avgR 2.176 | Loss 1.682e-02 | Grad 5.025e-03 | EvalR -1.300 Evaluation ... Avg Reward 3.23 | Avg Step 49.85 6500 / 10000 eps 0.010 | avgR 2.196 | Loss 7.109e-02 | Grad 1.213e-02 | EvalR 3.230 Evaluation ... Avg Reward 1.80 | Avg Step 47.21 7000 / 10000 eps 0.010 | avgR 2.794 | Loss 6.279e-02 | Grad 9.106e-03 | EvalR 1.800 Evaluation ... Avg Reward 2.24 | Avg Step 57.21 7500 / 10000 eps 0.010 | avgR 2.647 | Loss 7.553e-02 | Grad 1.191e-02 | EvalR 2.240 Evaluation ... Avg Reward 1.79 | Avg Step 43.06 8000 / 10000 eps 0.010 | avgR 2.931 | Loss 9.441e-04 | Grad 7.008e-03 | EvalR 1.790 Evaluation ... Avg Reward 1.29 | Avg Step 41.10 8500 / 10000 eps 0.010 | avgR 2.784 | Loss 1.968e-02 | Grad 5.099e-03 | EvalR 1.290 Evaluation ... Avg Reward 4.20 | Avg Step 47.77 9000 / 10000 eps 0.010 | avgR 3.157 | Loss 7.836e-02 | Grad 1.187e-02 | EvalR 4.200 Evaluation ... Avg Reward 3.81 | Avg Step 34.95 9500 / 10000 eps 0.010 | avgR 3.324 | Loss 4.519e-02 | Grad 1.004e-02 | EvalR 3.810 Evaluation ... Avg Reward 3.12 | Avg Step 47.82 10000 / 10000 eps 0.010 | avgR 3.412 | Loss 4.582e-02 | Grad 2.400e-02 | EvalR 3.120 Restore model with eval reward 4.250 Test Summary: | Pass Total Time GridWorld DDRQN | 1 1 41.8s 500 / 10000 eps 0.901 | avgR -25.705 | Loss 6.795e-03 | Grad 9.077e-03 | EvalR -Inf Evaluation ... Avg Reward 1.01 | Avg Step 101.00 1000 / 10000 eps 0.802 | avgR -25.310 | Loss 1.164e-02 | Grad 1.487e-02 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 1500 / 10000 eps 0.703 | avgR -25.411 | Loss 5.232e-03 | Grad 2.526e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 2000 / 10000 eps 0.604 | avgR -23.046 | Loss 4.594e-03 | Grad 6.015e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 2500 / 10000 eps 0.505 | avgR -22.020 | Loss 7.293e-03 | Grad 4.971e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 3000 / 10000 eps 0.406 | avgR -20.467 | Loss 6.843e-03 | Grad 4.465e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 Saving new model with eval reward 1.010 3500 / 10000 eps 0.307 | avgR -19.177 | Loss 7.364e-03 | Grad 3.389e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 4000 / 10000 eps 0.208 | avgR -17.733 | Loss 6.681e-03 | Grad 3.894e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 4500 / 10000 eps 0.109 | avgR -16.384 | Loss 3.760e-03 | Grad 2.973e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 5000 / 10000 eps 0.010 | avgR -14.810 | Loss 1.052e-02 | Grad 4.111e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 5500 / 10000 eps 0.010 | avgR -13.433 | Loss 4.597e-03 | Grad 9.595e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 6000 / 10000 eps 0.010 | avgR -12.264 | Loss 0.000e+00 | Grad 0.000e+00 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 Saving new model with eval reward 1.010 6500 / 10000 eps 0.010 | avgR -11.288 | Loss 2.494e-03 | Grad 5.838e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 7000 / 10000 eps 0.010 | avgR -10.436 | Loss 2.689e-03 | Grad 1.229e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 7500 / 10000 eps 0.010 | avgR -9.745 | Loss 1.248e-02 | Grad 5.993e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 8000 / 10000 eps 0.010 | avgR -9.082 | Loss 5.788e-04 | Grad 2.805e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 8500 / 10000 eps 0.010 | avgR -8.505 | Loss 2.393e-03 | Grad 9.529e-04 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 9000 / 10000 eps 0.010 | avgR -7.981 | Loss 5.087e-03 | Grad 4.824e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 Saving new model with eval reward 1.010 9500 / 10000 eps 0.010 | avgR -7.511 | Loss 1.449e-03 | Grad 2.450e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 10000 / 10000 eps 0.010 | avgR -7.089 | Loss 8.018e-03 | Grad 1.084e-02 | EvalR 1.010 Restore model with eval reward 1.010 Test Summary: | Pass Total Time TigerPOMDP DDRQN | 1 1 18.4s Test Summary: | Pass Total Time Static Array Env | 1 1 4.7s here Test Summary: | Pass Total Time Common RL Env | 1 1 2.7s 500 / 10000 eps 0.901 | avgR -2.750 | Loss 1.846e+00 | Grad 4.444e-01 | EvalR -Inf Evaluation ... Avg Reward -0.75 | Avg Step 73.95 1000 / 10000 eps 0.802 | avgR -1.097 | Loss 9.863e-01 | Grad 7.946e-01 | EvalR -0.750 Evaluation ... Avg Reward 1.51 | Avg Step 67.46 1500 / 10000 eps 0.703 | avgR -1.000 | Loss 5.639e-01 | Grad 9.882e-01 | EvalR 1.510 Evaluation ... Avg Reward 3.23 | Avg Step 53.66 2000 / 10000 eps 0.604 | avgR -1.000 | Loss 1.945e-01 | Grad 5.643e-01 | EvalR 3.230 Evaluation ... Avg Reward -6.46 | Avg Step 36.21 2500 / 10000 eps 0.505 | avgR -1.597 | Loss 3.884e-01 | Grad 2.384e-01 | EvalR -6.460 Evaluation ... Avg Reward -3.15 | Avg Step 55.46 3000 / 10000 eps 0.406 | avgR -1.685 | Loss 5.885e-01 | Grad 5.739e-01 | EvalR -3.150 Evaluation ... Avg Reward -1.17 | Avg Step 64.03 Saving new model with eval reward -1.170 3500 / 10000 eps 0.307 | avgR -1.500 | Loss 5.532e-01 | Grad 9.914e-01 | EvalR -1.170 Evaluation ... Avg Reward 0.30 | Avg Step 70.63 4000 / 10000 eps 0.208 | avgR -1.402 | Loss 4.396e-01 | Grad 2.155e-01 | EvalR 0.300 Evaluation ... Avg Reward -1.61 | Avg Step 72.15 4500 / 10000 eps 0.109 | avgR -1.168 | Loss 1.427e-01 | Grad 1.961e-01 | EvalR -1.610 Evaluation ... Avg Reward -3.74 | Avg Step 51.49 5000 / 10000 eps 0.010 | avgR -0.265 | Loss 2.963e-01 | Grad 7.521e-02 | EvalR -3.740 Evaluation ... Avg Reward -0.42 | Avg Step 69.38 5500 / 10000 eps 0.010 | avgR -0.049 | Loss 1.012e+00 | Grad 7.705e-01 | EvalR -0.420 Evaluation ... Avg Reward 0.73 | Avg Step 64.38 6000 / 10000 eps 0.010 | avgR -0.059 | Loss 6.648e-01 | Grad 3.683e-01 | EvalR 0.730 Evaluation ... Avg Reward 0.56 | Avg Step 71.17 Saving new model with eval reward 0.560 6500 / 10000 eps 0.010 | avgR 0.402 | Loss 8.137e-01 | Grad 1.913e-01 | EvalR 0.560 Evaluation ... Avg Reward 5.36 | Avg Step 41.73 7000 / 10000 eps 0.010 | avgR 1.245 | Loss 7.592e-01 | Grad 9.858e-02 | EvalR 5.360 Evaluation ... Avg Reward 0.88 | Avg Step 72.76 7500 / 10000 eps 0.010 | avgR 2.127 | Loss 7.492e-01 | Grad 1.808e-01 | EvalR 0.880 Evaluation ... Avg Reward 1.26 | Avg Step 29.69 8000 / 10000 eps 0.010 | avgR 3.294 | Loss 1.024e+00 | Grad 2.205e-01 | EvalR 1.260 Evaluation ... Avg Reward 2.56 | Avg Step 53.71 8500 / 10000 eps 0.010 | avgR 3.598 | Loss 5.535e-01 | Grad 2.380e-01 | EvalR 2.560 Evaluation ... Avg Reward 4.23 | Avg Step 14.64 9000 / 10000 eps 0.010 | avgR 3.461 | Loss 2.985e-01 | Grad 1.470e-01 | EvalR 4.230 Evaluation ... Avg Reward -0.24 | Avg Step 67.66 9500 / 10000 eps 0.010 | avgR 2.990 | Loss 2.295e-01 | Grad 3.904e-01 | EvalR -0.240 Evaluation ... Avg Reward 0.04 | Avg Step 52.92 10000 / 10000 eps 0.010 | avgR 3.265 | Loss 6.528e-01 | Grad 2.161e-01 | EvalR 0.040 Restore model with eval reward 0.560 Total discounted reward for 1 simulation: 0.0 500 / 10000 eps 0.901 | avgR -2.750 | Loss 1.846e+00 | Grad 4.444e-01 | EvalR -Inf Evaluation ... Avg Reward -0.75 | Avg Step 73.95 1000 / 10000 eps 0.802 | avgR -1.097 | Loss 9.863e-01 | Grad 7.946e-01 | EvalR -0.750 Evaluation ... Avg Reward 1.51 | Avg Step 67.46 1500 / 10000 eps 0.703 | avgR -1.000 | Loss 5.639e-01 | Grad 9.882e-01 | EvalR 1.510 Evaluation ... Avg Reward 3.23 | Avg Step 53.66 2000 / 10000 eps 0.604 | avgR -1.000 | Loss 1.945e-01 | Grad 5.643e-01 | EvalR 3.230 Evaluation ... Avg Reward -6.46 | Avg Step 36.21 2500 / 10000 eps 0.505 | avgR -1.597 | Loss 3.884e-01 | Grad 2.384e-01 | EvalR -6.460 Evaluation ... Avg Reward -3.15 | Avg Step 55.46 3000 / 10000 eps 0.406 | avgR -1.685 | Loss 5.885e-01 | Grad 5.739e-01 | EvalR -3.150 Evaluation ... Avg Reward -1.17 | Avg Step 64.03 Saving new model with eval reward -1.170 3500 / 10000 eps 0.307 | avgR -1.500 | Loss 5.532e-01 | Grad 9.914e-01 | EvalR -1.170 Evaluation ... Avg Reward 0.30 | Avg Step 70.63 4000 / 10000 eps 0.208 | avgR -1.402 | Loss 4.396e-01 | Grad 2.155e-01 | EvalR 0.300 Evaluation ... Avg Reward -1.61 | Avg Step 72.15 4500 / 10000 eps 0.109 | avgR -1.168 | Loss 1.427e-01 | Grad 1.961e-01 | EvalR -1.610 Evaluation ... Avg Reward -3.74 | Avg Step 51.49 5000 / 10000 eps 0.010 | avgR -0.265 | Loss 2.963e-01 | Grad 7.521e-02 | EvalR -3.740 Evaluation ... Avg Reward -0.42 | Avg Step 69.38 5500 / 10000 eps 0.010 | avgR -0.049 | Loss 1.012e+00 | Grad 7.705e-01 | EvalR -0.420 Evaluation ... Avg Reward 0.73 | Avg Step 64.38 6000 / 10000 eps 0.010 | avgR -0.059 | Loss 6.648e-01 | Grad 3.683e-01 | EvalR 0.730 Evaluation ... Avg Reward 0.56 | Avg Step 71.17 Saving new model with eval reward 0.560 6500 / 10000 eps 0.010 | avgR 0.402 | Loss 8.137e-01 | Grad 1.913e-01 | EvalR 0.560 Evaluation ... Avg Reward 5.36 | Avg Step 41.73 7000 / 10000 eps 0.010 | avgR 1.245 | Loss 7.592e-01 | Grad 9.858e-02 | EvalR 5.360 Evaluation ... Avg Reward 0.88 | Avg Step 72.76 7500 / 10000 eps 0.010 | avgR 2.127 | Loss 7.492e-01 | Grad 1.808e-01 | EvalR 0.880 Evaluation ... Avg Reward 1.26 | Avg Step 29.69 8000 / 10000 eps 0.010 | avgR 3.294 | Loss 1.024e+00 | Grad 2.205e-01 | EvalR 1.260 Evaluation ... Avg Reward 2.56 | Avg Step 53.71 8500 / 10000 eps 0.010 | avgR 3.598 | Loss 5.535e-01 | Grad 2.380e-01 | EvalR 2.560 Evaluation ... Avg Reward 4.23 | Avg Step 14.64 9000 / 10000 eps 0.010 | avgR 3.461 | Loss 2.985e-01 | Grad 1.470e-01 | EvalR 4.230 Evaluation ... Avg Reward -0.24 | Avg Step 67.66 9500 / 10000 eps 0.010 | avgR 2.990 | Loss 2.295e-01 | Grad 3.904e-01 | EvalR -0.240 Evaluation ... Avg Reward 0.04 | Avg Step 52.92 10000 / 10000 eps 0.010 | avgR 3.265 | Loss 6.528e-01 | Grad 2.161e-01 | EvalR 0.040 Restore model with eval reward 0.560 Test Summary: |Time README Examples | None 8.5s Testing DeepQLearning tests passed Testing completed after 336.92s PkgEval succeeded after 811.6s