Package evaluation of DeepQLearning on Julia 1.11.4 (a71dd056e0*) started at 2025-04-08T20:57:23.618 ################################################################################ # Set-up # Installing PkgEval dependencies (TestEnv)... Set-up completed after 8.5s ################################################################################ # Installation # Installing DeepQLearning... Resolving package versions... Updating `~/.julia/environments/v1.11/Project.toml` [de0a67f4] + DeepQLearning v0.7.2 Updating `~/.julia/environments/v1.11/Manifest.toml` [621f4979] + AbstractFFTs v1.5.0 [7d9f7c33] + Accessors v0.1.42 [79e6a3ab] + Adapt v4.3.0 [66dad0bd] + AliasTables v1.1.3 [dce04be8] + ArgCheck v2.5.0 [ec485272] + ArnoldiMethod v0.4.0 [4fba245c] + ArrayInterface v7.18.0 [a9b6321e] + Atomix v1.1.1 [fbb218c0] + BSON v0.3.9 [198e06fe] + BangBang v0.4.4 [9718e550] + Baselet v0.1.1 [e1450e63] + BufferedStreams v1.2.2 [fa961155] + CEnum v0.5.0 [082447d4] + ChainRules v1.72.3 [d360d2e6] + ChainRulesCore v1.25.1 [35d6a980] + ColorSchemes v3.29.0 [3da002f7] + ColorTypes v0.12.1 [c3611d14] + ColorVectorSpace v0.11.0 [5ae59095] + Colors v0.13.0 [d842c3ba] + CommonRLInterface v0.3.3 [bbf7d656] + CommonSubexpressions v0.3.1 [f70d9fcc] + CommonWorldInvalidations v1.0.0 [34da2185] + Compat v4.16.0 [a33af91c] + CompositionsBase v0.1.2 [187b0558] + ConstructionBase v1.5.8 [6add18c4] + ContextVariablesX v0.1.3 [d38c429a] + Contour v0.6.3 [a8cc5b0e] + Crayons v4.1.1 [9a962f9c] + DataAPI v1.16.0 [a93c6f00] + DataFrames v1.7.0 [864edb3b] + DataStructures v0.18.22 [e2d170a0] + DataValueInterfaces v1.0.0 [de0a67f4] + DeepQLearning v0.7.2 [244e2a9f] + DefineSingletons v0.1.2 [8bb1440f] + DelimitedFiles v1.9.1 [163ba53b] + DiffResults v1.1.0 [b552c78f] + DiffRules v1.15.1 [31c24e10] + Distributions v0.25.118 [ffbed154] + DocStringExtensions v0.9.4 [da5c29d0] + EllipsisNotation v1.8.0 [4e289a0a] + EnumX v1.0.5 [cc61a311] + FLoops v0.2.2 [b9860ae5] + FLoopsBase v0.1.1 [5789e2e9] + FileIO 
v1.17.0 [1a297f60] + FillArrays v1.13.0 [53c48c17] + FixedPointNumbers v0.8.5 ⌅ [587475ba] + Flux v0.14.25 [f6369f11] + ForwardDiff v1.0.1 ⌅ [d9f16b24] + Functors v0.4.12 [0c68f7d7] + GPUArrays v11.2.2 [46192b85] + GPUArraysCore v0.2.0 [86223c79] + Graphs v1.12.1 [076d061b] + HashArrayMappedTries v0.2.0 [34004b35] + HypergeometricFunctions v0.3.28 [7869d1d1] + IRTools v0.4.14 [615f187c] + IfElse v0.1.1 [a09fc81d] + ImageCore v0.10.5 [d25df0c9] + Inflate v0.1.5 [22cec73e] + InitialValues v0.3.1 [842dd82b] + InlineStrings v1.4.3 [3587e190] + InverseFunctions v0.1.17 [41ab1584] + InvertedIndices v1.3.1 [92d709cd] + IrrationalConstants v0.2.4 [82899510] + IteratorInterfaceExtensions v1.0.0 [692b3bcd] + JLLWrappers v1.7.0 [b14d175d] + JuliaVariables v0.2.4 [63c18a36] + KernelAbstractions v0.9.34 [929cbde3] + LLVM v9.2.0 [b964fa9f] + LaTeXStrings v1.4.0 [2ab3a3ac] + LogExpFunctions v0.3.29 [c2834f40] + MLCore v1.0.0 ⌃ [7e8f7934] + MLDataDevices v1.5.3 [d8e11817] + MLStyle v0.4.17 [f1d291b0] + MLUtils v0.4.8 [1914dd2f] + MacroTools v0.5.15 [dbb5928d] + MappedArrays v0.4.2 [299715c1] + MarchingCubes v0.1.11 [128add7d] + MicroCollections v0.2.0 [e1d29d7a] + Missings v1.2.0 [e94cdb99] + MosaicViews v0.3.4 [872c559c] + NNlib v0.9.30 [77ba4419] + NaNMath v1.1.3 [71a1bf82] + NameResolution v0.1.5 [d9ec5142] + NamedTupleTools v0.14.3 [6fe1bfb0] + OffsetArrays v1.16.0 [0b1bfda6] + OneHotArrays v0.2.7 ⌅ [3bd65402] + Optimisers v0.3.4 [bac558e1] + OrderedCollections v1.8.0 [90014a1f] + PDMats v0.11.33 [f3bd98c0] + POMDPLinter v0.1.2 [7588e00f] + POMDPTools v1.1.0 [a93abf59] + POMDPs v1.0.0 [5432bcbf] + PaddedViews v0.5.12 [d96e819e] + Parameters v0.12.3 [2dfb63ee] + PooledArrays v1.4.3 ⌅ [aea7be01] + PrecompileTools v1.2.1 [21216c6a] + Preferences v1.4.3 [8162dcfd] + PrettyPrint v0.2.0 [08abe8d2] + PrettyTables v2.4.0 [33c8b6b6] + ProgressLogging v0.1.4 [92933f4c] + ProgressMeter v1.10.4 [3349acd9] + ProtoBuf v1.1.1 [43287f4e] + PtrArrays v1.3.0 [1fd47b50] + QuadGK v2.11.2 
[c1ae055f] + RealDot v0.1.0 [189a3867] + Reexport v1.2.2 [ae029012] + Requires v1.3.1 [79098fc4] + Rmath v0.8.0 [7e506255] + ScopedValues v1.3.0 [91c51154] + SentinelArrays v1.4.8 [efcf1570] + Setfield v1.1.2 [605ecd9f] + ShowCases v0.1.0 [699a6c99] + SimpleTraits v0.9.4 [a2af1166] + SortingAlgorithms v1.2.1 [dc90abb0] + SparseInverseSubset v0.1.2 [276daf66] + SpecialFunctions v2.5.0 [171d559e] + SplittablesBase v0.1.15 [cae243ae] + StackViews v0.1.1 [aedffcd0] + Static v1.2.0 [0d7ed370] + StaticArrayInterface v1.8.0 [90137ffa] + StaticArrays v1.9.13 [1e83bf80] + StaticArraysCore v1.4.3 [10745b16] + Statistics v1.11.1 [82ae8749] + StatsAPI v1.7.0 [2913bbd2] + StatsBase v0.34.4 [4c63d2b9] + StatsFuns v1.4.0 [892a3eda] + StringManipulation v0.4.1 [09ab397b] + StructArrays v0.7.1 [3783bdb8] + TableTraits v1.0.1 [bd369af6] + Tables v1.12.0 [899adc3e] + TensorBoardLogger v0.1.25 [62fd8b95] + TensorCore v0.1.1 [28d57a85] + Transducers v0.4.84 [410a4b4d] + Tricks v0.1.10 [3a884ed6] + UnPack v1.0.2 [b8865327] + UnicodePlots v3.7.2 [013be700] + UnsafeAtomics v0.3.0 ⌅ [e88e6eb3] + Zygote v0.6.76 [700de1a5] + ZygoteRules v0.2.7 [dad2f222] + LLVMExtra_jll v0.0.35+0 [efe28fd5] + OpenSpecFun_jll v0.5.6+0 [f50d1b31] + Rmath_jll v0.5.1+0 [0dad84c5] + ArgTools v1.1.2 [56f22d72] + Artifacts v1.11.0 [2a0f44e3] + Base64 v1.11.0 [8bf52ea8] + CRC32c v1.11.0 [ade2ca70] + Dates v1.11.0 [8ba89e20] + Distributed v1.11.0 [f43a241f] + Downloads v1.6.0 [7b1f6079] + FileWatching v1.11.0 [9fa8497b] + Future v1.11.0 [b77e0a4c] + InteractiveUtils v1.11.0 [4af54fe1] + LazyArtifacts v1.11.0 [b27032c2] + LibCURL v0.6.4 [76f85450] + LibGit2 v1.11.0 [8f399da3] + Libdl v1.11.0 [37e2e46d] + LinearAlgebra v1.11.0 [56ddb016] + Logging v1.11.0 [d6f4376e] + Markdown v1.11.0 [a63ad114] + Mmap v1.11.0 [ca575930] + NetworkOptions v1.2.0 [44cfe95a] + Pkg v1.11.0 [de0858da] + Printf v1.11.0 [9a3f8284] + Random v1.11.0 [ea8e919c] + SHA v0.7.0 [9e88b42a] + Serialization v1.11.0 [1a1011a3] + SharedArrays v1.11.0 
[6462fe0b] + Sockets v1.11.0
[2f01184e] + SparseArrays v1.11.0
[4607b0f0] + SuiteSparse
[fa267f1f] + TOML v1.0.3
[a4e569a6] + Tar v1.10.0
[8dfed614] + Test v1.11.0
[cf7118a7] + UUIDs v1.11.0
[4ec0a83e] + Unicode v1.11.0
[e66e0078] + CompilerSupportLibraries_jll v1.1.1+0
[deac9b47] + LibCURL_jll v8.6.0+0
[e37daf67] + LibGit2_jll v1.7.2+0
[29816b5a] + LibSSH2_jll v1.11.0+1
[c8ffd9c3] + MbedTLS_jll v2.28.6+0
[14a3606d] + MozillaCACerts_jll v2023.12.12
[4536629a] + OpenBLAS_jll v0.3.27+1
[05823500] + OpenLibm_jll v0.8.5+0
[bea87d4a] + SuiteSparse_jll v7.7.0+0
[83775a58] + Zlib_jll v1.2.13+1
[8e850b90] + libblastrampoline_jll v5.11.0+0
[8e850ede] + nghttp2_jll v1.59.0+0
[3f19e933] + p7zip_jll v17.4.0+2
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated -m`
Installation completed after 5.53s

################################################################################
# Precompilation
#
Precompiling PkgEval dependencies...
┌ Warning: Could not use exact versions of packages in manifest, re-resolving
└ @ TestEnv ~/.julia/packages/TestEnv/tgnBf/src/julia-1.11/activate_set.jl:63
Precompiling package dependencies...
Precompilation completed after 574.29s ################################################################################ # Testing # Testing DeepQLearning ┌ Warning: Could not use exact versions of packages in manifest, re-resolving └ @ Pkg.Operations /opt/julia/share/julia/stdlib/v1.11/Pkg/src/Operations.jl:1920 Status `/tmp/jl_KItaAP/Project.toml` [fbb218c0] BSON v0.3.9 [d842c3ba] CommonRLInterface v0.3.3 [de0a67f4] DeepQLearning v0.7.2 [da5c29d0] EllipsisNotation v1.8.0 ⌅ [587475ba] Flux v0.14.25 [f3bd98c0] POMDPLinter v0.1.2 [355abbd5] POMDPModels v0.4.21 [7588e00f] POMDPTools v1.1.0 [a93abf59] POMDPs v1.0.0 [d96e819e] Parameters v0.12.3 [90137ffa] StaticArrays v1.9.13 [2913bbd2] StatsBase v0.34.4 [899adc3e] TensorBoardLogger v0.1.25 [37e2e46d] LinearAlgebra v1.11.0 [de0858da] Printf v1.11.0 [9a3f8284] Random v1.11.0 [8dfed614] Test v1.11.0 Status `/tmp/jl_KItaAP/Manifest.toml` [621f4979] AbstractFFTs v1.5.0 [7d9f7c33] Accessors v0.1.42 [79e6a3ab] Adapt v4.3.0 [66dad0bd] AliasTables v1.1.3 [dce04be8] ArgCheck v2.5.0 [ec485272] ArnoldiMethod v0.4.0 [4fba245c] ArrayInterface v7.18.0 [a9b6321e] Atomix v1.1.1 [fbb218c0] BSON v0.3.9 [198e06fe] BangBang v0.4.4 [9718e550] Baselet v0.1.1 [e1450e63] BufferedStreams v1.2.2 [fa961155] CEnum v0.5.0 [082447d4] ChainRules v1.72.3 [d360d2e6] ChainRulesCore v1.25.1 [35d6a980] ColorSchemes v3.29.0 ⌅ [3da002f7] ColorTypes v0.11.5 ⌃ [c3611d14] ColorVectorSpace v0.10.0 ⌅ [5ae59095] Colors v0.12.11 [d842c3ba] CommonRLInterface v0.3.3 [bbf7d656] CommonSubexpressions v0.3.1 [f70d9fcc] CommonWorldInvalidations v1.0.0 [34da2185] Compat v4.16.0 [a81c6b42] Compose v0.9.5 [a33af91c] CompositionsBase v0.1.2 [187b0558] ConstructionBase v1.5.8 [6add18c4] ContextVariablesX v0.1.3 [d38c429a] Contour v0.6.3 [a8cc5b0e] Crayons v4.1.1 [9a962f9c] DataAPI v1.16.0 [a93c6f00] DataFrames v1.7.0 [864edb3b] DataStructures v0.18.22 [e2d170a0] DataValueInterfaces v1.0.0 [de0a67f4] DeepQLearning v0.7.2 [244e2a9f] DefineSingletons v0.1.2 [8bb1440f] 
DelimitedFiles v1.9.1 [163ba53b] DiffResults v1.1.0 [b552c78f] DiffRules v1.15.1 [31c24e10] Distributions v0.25.118 [ffbed154] DocStringExtensions v0.9.4 [da5c29d0] EllipsisNotation v1.8.0 [4e289a0a] EnumX v1.0.5 [cc61a311] FLoops v0.2.2 [b9860ae5] FLoopsBase v0.1.1 [5789e2e9] FileIO v1.17.0 [1a297f60] FillArrays v1.13.0 [53c48c17] FixedPointNumbers v0.8.5 ⌅ [587475ba] Flux v0.14.25 [f6369f11] ForwardDiff v1.0.1 ⌅ [d9f16b24] Functors v0.4.12 [0c68f7d7] GPUArrays v11.2.2 [46192b85] GPUArraysCore v0.2.0 [86223c79] Graphs v1.12.1 [076d061b] HashArrayMappedTries v0.2.0 [34004b35] HypergeometricFunctions v0.3.28 [7869d1d1] IRTools v0.4.14 [615f187c] IfElse v0.1.1 [a09fc81d] ImageCore v0.10.5 [d25df0c9] Inflate v0.1.5 [22cec73e] InitialValues v0.3.1 [842dd82b] InlineStrings v1.4.3 [3587e190] InverseFunctions v0.1.17 [41ab1584] InvertedIndices v1.3.1 [92d709cd] IrrationalConstants v0.2.4 [c8e1da08] IterTools v1.10.0 [82899510] IteratorInterfaceExtensions v1.0.0 [692b3bcd] JLLWrappers v1.7.0 [682c06a0] JSON v0.21.4 [b14d175d] JuliaVariables v0.2.4 [63c18a36] KernelAbstractions v0.9.34 [929cbde3] LLVM v9.2.0 [b964fa9f] LaTeXStrings v1.4.0 [2ab3a3ac] LogExpFunctions v0.3.29 [c2834f40] MLCore v1.0.0 ⌃ [7e8f7934] MLDataDevices v1.5.3 [d8e11817] MLStyle v0.4.17 [f1d291b0] MLUtils v0.4.8 [1914dd2f] MacroTools v0.5.15 [dbb5928d] MappedArrays v0.4.2 [299715c1] MarchingCubes v0.1.11 [442fdcdd] Measures v0.3.2 [128add7d] MicroCollections v0.2.0 [e1d29d7a] Missings v1.2.0 [e94cdb99] MosaicViews v0.3.4 [872c559c] NNlib v0.9.30 [77ba4419] NaNMath v1.1.3 [71a1bf82] NameResolution v0.1.5 [d9ec5142] NamedTupleTools v0.14.3 [6fe1bfb0] OffsetArrays v1.16.0 [0b1bfda6] OneHotArrays v0.2.7 ⌅ [3bd65402] Optimisers v0.3.4 [bac558e1] OrderedCollections v1.8.0 [90014a1f] PDMats v0.11.33 [f3bd98c0] POMDPLinter v0.1.2 [355abbd5] POMDPModels v0.4.21 [7588e00f] POMDPTools v1.1.0 [a93abf59] POMDPs v1.0.0 [5432bcbf] PaddedViews v0.5.12 [d96e819e] Parameters v0.12.3 [69de0a69] Parsers v2.8.1 [2dfb63ee] 
PooledArrays v1.4.3 ⌅ [aea7be01] PrecompileTools v1.2.1 [21216c6a] Preferences v1.4.3 [8162dcfd] PrettyPrint v0.2.0 [08abe8d2] PrettyTables v2.4.0 [33c8b6b6] ProgressLogging v0.1.4 [92933f4c] ProgressMeter v1.10.4 [3349acd9] ProtoBuf v1.1.1 [43287f4e] PtrArrays v1.3.0 [1fd47b50] QuadGK v2.11.2 [c1ae055f] RealDot v0.1.0 [189a3867] Reexport v1.2.2 [ae029012] Requires v1.3.1 [79098fc4] Rmath v0.8.0 [7e506255] ScopedValues v1.3.0 [91c51154] SentinelArrays v1.4.8 [efcf1570] Setfield v1.1.2 [605ecd9f] ShowCases v0.1.0 [699a6c99] SimpleTraits v0.9.4 [a2af1166] SortingAlgorithms v1.2.1 [dc90abb0] SparseInverseSubset v0.1.2 [276daf66] SpecialFunctions v2.5.0 [171d559e] SplittablesBase v0.1.15 [cae243ae] StackViews v0.1.1 [aedffcd0] Static v1.2.0 [0d7ed370] StaticArrayInterface v1.8.0 [90137ffa] StaticArrays v1.9.13 [1e83bf80] StaticArraysCore v1.4.3 [10745b16] Statistics v1.11.1 [82ae8749] StatsAPI v1.7.0 [2913bbd2] StatsBase v0.34.4 [4c63d2b9] StatsFuns v1.4.0 [892a3eda] StringManipulation v0.4.1 [09ab397b] StructArrays v0.7.1 [3783bdb8] TableTraits v1.0.1 [bd369af6] Tables v1.12.0 [899adc3e] TensorBoardLogger v0.1.25 [62fd8b95] TensorCore v0.1.1 [28d57a85] Transducers v0.4.84 [410a4b4d] Tricks v0.1.10 [3a884ed6] UnPack v1.0.2 [b8865327] UnicodePlots v3.7.2 [013be700] UnsafeAtomics v0.3.0 ⌅ [e88e6eb3] Zygote v0.6.76 [700de1a5] ZygoteRules v0.2.7 [dad2f222] LLVMExtra_jll v0.0.35+0 [efe28fd5] OpenSpecFun_jll v0.5.6+0 [f50d1b31] Rmath_jll v0.5.1+0 [0dad84c5] ArgTools v1.1.2 [56f22d72] Artifacts v1.11.0 [2a0f44e3] Base64 v1.11.0 [8bf52ea8] CRC32c v1.11.0 [ade2ca70] Dates v1.11.0 [8ba89e20] Distributed v1.11.0 [f43a241f] Downloads v1.6.0 [7b1f6079] FileWatching v1.11.0 [9fa8497b] Future v1.11.0 [b77e0a4c] InteractiveUtils v1.11.0 [4af54fe1] LazyArtifacts v1.11.0 [b27032c2] LibCURL v0.6.4 [76f85450] LibGit2 v1.11.0 [8f399da3] Libdl v1.11.0 [37e2e46d] LinearAlgebra v1.11.0 [56ddb016] Logging v1.11.0 [d6f4376e] Markdown v1.11.0 [a63ad114] Mmap v1.11.0 [ca575930] NetworkOptions 
v1.2.0 [44cfe95a] Pkg v1.11.0 [de0858da] Printf v1.11.0 [9a3f8284] Random v1.11.0 [ea8e919c] SHA v0.7.0 [9e88b42a] Serialization v1.11.0 [1a1011a3] SharedArrays v1.11.0 [6462fe0b] Sockets v1.11.0 [2f01184e] SparseArrays v1.11.0 [4607b0f0] SuiteSparse [fa267f1f] TOML v1.0.3 [a4e569a6] Tar v1.10.0 [8dfed614] Test v1.11.0 [cf7118a7] UUIDs v1.11.0 [4ec0a83e] Unicode v1.11.0 [e66e0078] CompilerSupportLibraries_jll v1.1.1+0 [deac9b47] LibCURL_jll v8.6.0+0 [e37daf67] LibGit2_jll v1.7.2+0 [29816b5a] LibSSH2_jll v1.11.0+1 [c8ffd9c3] MbedTLS_jll v2.28.6+0 [14a3606d] MozillaCACerts_jll v2023.12.12 [4536629a] OpenBLAS_jll v0.3.27+1 [05823500] OpenLibm_jll v0.8.5+0 [bea87d4a] SuiteSparse_jll v7.7.0+0 [83775a58] Zlib_jll v1.2.13+1 [8e850b90] libblastrampoline_jll v5.11.0+0 [8e850ede] nghttp2_jll v1.59.0+0 [3f19e933] p7zip_jll v17.4.0+2 Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. Testing Running tests... Precompiling POMDPModels... 6402.1 ms ✓ Compose 20829.9 ms ✓ POMDPModels 2 dependencies successfully precompiled in 33 seconds. 114 already precompiled. 500 / 10000 eps 0.901 | avgR -0.001 | Loss 7.418e-02 | Grad 9.045e-02 | EvalR -Inf 1000 / 10000 eps 0.802 | avgR 0.177 | Loss 8.516e-02 | Grad 5.236e-02 | EvalR -Inf 1500 / 10000 eps 0.703 | avgR 0.241 | Loss 3.363e-02 | Grad 4.933e-02 | EvalR -Inf 2000 / 10000 eps 0.604 | avgR 0.553 | Loss 2.811e-02 | Grad 3.963e-02 | EvalR -Inf Evaluation ... Avg Reward 2.00 | Avg Step 5.00 2500 / 10000 eps 0.505 | avgR 0.505 | Loss 2.485e-02 | Grad 4.884e-02 | EvalR 2.000 3000 / 10000 eps 0.406 | avgR 0.985 | Loss 1.784e-02 | Grad 5.140e-02 | EvalR 2.000 3500 / 10000 eps 0.307 | avgR 1.193 | Loss 1.305e-02 | Grad 1.445e-02 | EvalR 2.000 4000 / 10000 eps 0.208 | avgR 1.472 | Loss 3.622e-02 | Grad 6.899e-02 | EvalR 2.000 Evaluation ... 
Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 4500 / 10000 eps 0.109 | avgR 1.377 | Loss 6.950e-03 | Grad 3.648e-02 | EvalR 2.100 5000 / 10000 eps 0.010 | avgR 1.961 | Loss 4.442e-02 | Grad 2.751e-02 | EvalR 2.100 5500 / 10000 eps 0.010 | avgR 2.048 | Loss 6.467e-04 | Grad 9.072e-03 | EvalR 2.100 6000 / 10000 eps 0.010 | avgR 2.078 | Loss 7.904e-06 | Grad 2.474e-03 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 6500 / 10000 eps 0.010 | avgR 1.845 | Loss 3.623e-04 | Grad 1.265e-02 | EvalR 2.100 7000 / 10000 eps 0.010 | avgR 2.031 | Loss 1.488e-03 | Grad 1.572e-02 | EvalR 2.100 7500 / 10000 eps 0.010 | avgR 2.075 | Loss 7.097e-04 | Grad 6.582e-03 | EvalR 2.100 8000 / 10000 eps 0.010 | avgR 2.054 | Loss 2.055e-06 | Grad 5.702e-04 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 8500 / 10000 eps 0.010 | avgR 2.047 | Loss 1.494e-04 | Grad 2.688e-03 | EvalR 2.100 9000 / 10000 eps 0.010 | avgR 2.077 | Loss 8.946e-05 | Grad 2.248e-03 | EvalR 2.100 9500 / 10000 eps 0.010 | avgR 2.076 | Loss 2.562e-06 | Grad 6.147e-04 | EvalR 2.100 10000 / 10000 eps 0.010 | avgR 2.076 | Loss 2.140e-06 | Grad 9.866e-04 | EvalR 2.100 Restore model with eval reward 2.100 Test Summary: | Pass Total Time vanilla DQN | 2 2 2m51.2s 500 / 10000 eps 0.901 | avgR -0.029 | Loss 3.960e-02 | Grad 7.405e-02 | EvalR -Inf 1000 / 10000 eps 0.802 | avgR 0.102 | Loss 3.929e-02 | Grad 2.780e-02 | EvalR -Inf 1500 / 10000 eps 0.703 | avgR 0.470 | Loss 2.473e-02 | Grad 3.728e-02 | EvalR -Inf 2000 / 10000 eps 0.604 | avgR 0.588 | Loss 2.197e-02 | Grad 4.640e-02 | EvalR -Inf Evaluation ... 
Avg Reward 2.00 | Avg Step 5.00 2500 / 10000 eps 0.505 | avgR 0.854 | Loss 5.706e-02 | Grad 1.212e-01 | EvalR 2.000 3000 / 10000 eps 0.406 | avgR 0.985 | Loss 1.326e-02 | Grad 3.517e-02 | EvalR 2.000 3500 / 10000 eps 0.307 | avgR 1.249 | Loss 3.265e-03 | Grad 2.011e-02 | EvalR 2.000 4000 / 10000 eps 0.208 | avgR 1.394 | Loss 1.118e-02 | Grad 7.234e-02 | EvalR 2.000 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 4500 / 10000 eps 0.109 | avgR 0.575 | Loss 5.980e-02 | Grad 5.101e-02 | EvalR 2.100 5000 / 10000 eps 0.010 | avgR 1.790 | Loss 6.331e-02 | Grad 3.773e-02 | EvalR 2.100 5500 / 10000 eps 0.010 | avgR 2.047 | Loss 2.518e-03 | Grad 1.522e-02 | EvalR 2.100 6000 / 10000 eps 0.010 | avgR 2.058 | Loss 9.965e-04 | Grad 2.212e-02 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 6500 / 10000 eps 0.010 | avgR 2.046 | Loss 4.916e-06 | Grad 9.225e-04 | EvalR 2.100 7000 / 10000 eps 0.010 | avgR 2.050 | Loss 1.268e-03 | Grad 1.982e-02 | EvalR 2.100 7500 / 10000 eps 0.010 | avgR 2.068 | Loss 1.600e-04 | Grad 8.429e-03 | EvalR 2.100 8000 / 10000 eps 0.010 | avgR 2.079 | Loss 2.913e-05 | Grad 1.524e-03 | EvalR 2.100 Evaluation ... 
Avg Reward 2.10 | Avg Step 5.00 8500 / 10000 eps 0.010 | avgR 2.057 | Loss 3.345e-06 | Grad 1.367e-03 | EvalR 2.100 9000 / 10000 eps 0.010 | avgR 2.058 | Loss 4.569e-05 | Grad 4.637e-03 | EvalR 2.100 9500 / 10000 eps 0.010 | avgR 2.055 | Loss 3.858e-06 | Grad 1.578e-03 | EvalR 2.100 10000 / 10000 eps 0.010 | avgR 2.055 | Loss 7.535e-05 | Grad 2.232e-03 | EvalR 2.100 Restore model with eval reward 2.100 Test Summary: | Pass Total Time double Q DQN | 1 1 6.6s 500 / 10000 eps 0.901 | avgR -0.135 | Loss 9.973e-02 | Grad 7.339e-02 | EvalR -Inf 1000 / 10000 eps 0.802 | avgR 0.072 | Loss 1.066e-01 | Grad 9.531e-02 | EvalR -Inf 1500 / 10000 eps 0.703 | avgR 0.354 | Loss 2.052e-02 | Grad 1.346e-01 | EvalR -Inf 2000 / 10000 eps 0.604 | avgR 0.689 | Loss 2.085e-02 | Grad 2.960e-02 | EvalR -Inf Evaluation ... Avg Reward 2.00 | Avg Step 5.00 2500 / 10000 eps 0.505 | avgR 0.262 | Loss 2.363e-02 | Grad 1.648e-02 | EvalR 2.000 3000 / 10000 eps 0.406 | avgR 1.017 | Loss 6.526e-02 | Grad 4.486e-02 | EvalR 2.000 3500 / 10000 eps 0.307 | avgR 0.661 | Loss 2.148e-02 | Grad 1.425e-01 | EvalR 2.000 4000 / 10000 eps 0.208 | avgR 1.588 | Loss 9.481e-02 | Grad 6.224e-02 | EvalR 2.000 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 4500 / 10000 eps 0.109 | avgR 1.813 | Loss 1.909e-02 | Grad 2.989e-02 | EvalR 2.100 5000 / 10000 eps 0.010 | avgR 2.005 | Loss 2.848e-04 | Grad 8.020e-03 | EvalR 2.100 5500 / 10000 eps 0.010 | avgR 2.074 | Loss 3.339e-05 | Grad 3.163e-03 | EvalR 2.100 6000 / 10000 eps 0.010 | avgR 2.058 | Loss 3.957e-05 | Grad 4.274e-03 | EvalR 2.100 Evaluation ... 
Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 6500 / 10000 eps 0.010 | avgR 2.077 | Loss 2.744e-06 | Grad 1.633e-03 | EvalR 2.100 7000 / 10000 eps 0.010 | avgR 2.068 | Loss 5.152e-05 | Grad 6.219e-03 | EvalR 2.100 7500 / 10000 eps 0.010 | avgR 2.036 | Loss 5.032e-05 | Grad 3.266e-03 | EvalR 2.100 8000 / 10000 eps 0.010 | avgR 2.044 | Loss 1.809e-04 | Grad 1.197e-02 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 8500 / 10000 eps 0.010 | avgR 2.076 | Loss 1.384e-04 | Grad 5.471e-03 | EvalR 2.100 9000 / 10000 eps 0.010 | avgR 2.060 | Loss 3.189e-06 | Grad 2.263e-03 | EvalR 2.100 9500 / 10000 eps 0.010 | avgR 2.029 | Loss 9.793e-06 | Grad 2.356e-03 | EvalR 2.100 10000 / 10000 eps 0.010 | avgR 2.070 | Loss 4.190e-05 | Grad 5.168e-03 | EvalR 2.100 Restore model with eval reward 2.100 Test Summary: | Pass Total Time dueling DQN | 1 1 46.3s 500 / 10000 eps 0.901 | avgR 0.009 | Loss 1.334e-01 | Grad 1.650e-01 | EvalR -Inf 1000 / 10000 eps 0.802 | avgR 0.105 | Loss 3.002e-02 | Grad 1.995e-02 | EvalR -Inf 1500 / 10000 eps 0.703 | avgR 0.435 | Loss 2.069e-02 | Grad 4.219e-02 | EvalR -Inf 2000 / 10000 eps 0.604 | avgR 0.564 | Loss 1.464e-02 | Grad 1.283e-02 | EvalR -Inf Evaluation ... Avg Reward 1.90 | Avg Step 5.00 2500 / 10000 eps 0.505 | avgR 0.727 | Loss 3.614e-02 | Grad 6.802e-02 | EvalR 1.900 3000 / 10000 eps 0.406 | avgR 0.975 | Loss 4.116e-02 | Grad 1.370e-01 | EvalR 1.900 3500 / 10000 eps 0.307 | avgR 1.199 | Loss 9.364e-03 | Grad 2.574e-02 | EvalR 1.900 4000 / 10000 eps 0.208 | avgR 1.338 | Loss 1.907e-02 | Grad 1.809e-02 | EvalR 1.900 Evaluation ... 
Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 4500 / 10000 eps 0.109 | avgR 1.808 | Loss 5.820e-03 | Grad 2.542e-02 | EvalR 2.100 5000 / 10000 eps 0.010 | avgR 1.951 | Loss 4.262e-03 | Grad 1.617e-02 | EvalR 2.100 5500 / 10000 eps 0.010 | avgR 2.072 | Loss 4.240e-04 | Grad 1.827e-02 | EvalR 2.100 6000 / 10000 eps 0.010 | avgR 1.877 | Loss 6.181e-03 | Grad 1.030e-02 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 6500 / 10000 eps 0.010 | avgR 2.051 | Loss 1.008e-02 | Grad 9.425e-03 | EvalR 2.100 7000 / 10000 eps 0.010 | avgR 2.010 | Loss 3.438e-04 | Grad 1.301e-02 | EvalR 2.100 7500 / 10000 eps 0.010 | avgR 2.070 | Loss 5.226e-05 | Grad 1.905e-03 | EvalR 2.100 8000 / 10000 eps 0.010 | avgR 2.028 | Loss 2.939e-04 | Grad 4.258e-03 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 8500 / 10000 eps 0.010 | avgR 2.049 | Loss 9.601e-05 | Grad 4.591e-03 | EvalR 2.100 9000 / 10000 eps 0.010 | avgR 2.075 | Loss 9.902e-05 | Grad 3.615e-03 | EvalR 2.100 9500 / 10000 eps 0.010 | avgR 2.046 | Loss 1.777e-04 | Grad 9.077e-03 | EvalR 2.100 10000 / 10000 eps 0.010 | avgR 2.077 | Loss 9.857e-04 | Grad 1.261e-03 | EvalR 2.100 Restore model with eval reward 2.100 Test Summary: | Pass Total Time Prioritized DDQN | 1 1 5.9s 500 / 10000 eps 0.901 | avgR 0.092 | Loss 1.474e-03 | Grad 1.169e-03 | EvalR -Inf 1000 / 10000 eps 0.802 | avgR 0.130 | Loss 6.859e-04 | Grad 1.462e-03 | EvalR -Inf 1500 / 10000 eps 0.703 | avgR 0.334 | Loss 6.310e-04 | Grad 7.830e-04 | EvalR -Inf 2000 / 10000 eps 0.604 | avgR 0.406 | Loss 8.951e-04 | Grad 1.855e-03 | EvalR -Inf Evaluation ... 
Avg Reward 2.10 | Avg Step 5.00 2500 / 10000 eps 0.505 | avgR 0.733 | Loss 1.719e-04 | Grad 2.936e-04 | EvalR 2.100 3000 / 10000 eps 0.406 | avgR 0.896 | Loss 1.049e-03 | Grad 1.755e-03 | EvalR 2.100 3500 / 10000 eps 0.307 | avgR 1.104 | Loss 4.911e-04 | Grad 8.762e-04 | EvalR 2.100 4000 / 10000 eps 0.208 | avgR 1.170 | Loss 2.652e-04 | Grad 1.226e-03 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 4500 / 10000 eps 0.109 | avgR 1.637 | Loss 5.639e-04 | Grad 8.385e-04 | EvalR 2.100 5000 / 10000 eps 0.010 | avgR 1.971 | Loss 2.598e-04 | Grad 3.134e-04 | EvalR 2.100 5500 / 10000 eps 0.010 | avgR 2.048 | Loss 1.248e-04 | Grad 3.742e-04 | EvalR 2.100 6000 / 10000 eps 0.010 | avgR 2.073 | Loss 2.100e-04 | Grad 7.388e-04 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 Saving new model with eval reward 2.100 6500 / 10000 eps 0.010 | avgR 2.049 | Loss 1.675e-04 | Grad 7.924e-04 | EvalR 2.100 7000 / 10000 eps 0.010 | avgR 2.050 | Loss 1.796e-04 | Grad 1.203e-03 | EvalR 2.100 7500 / 10000 eps 0.010 | avgR 2.037 | Loss 6.206e-05 | Grad 3.970e-04 | EvalR 2.100 8000 / 10000 eps 0.010 | avgR 2.079 | Loss 2.688e-05 | Grad 2.486e-04 | EvalR 2.100 Evaluation ... Avg Reward 2.10 | Avg Step 5.00 8500 / 10000 eps 0.010 | avgR 2.051 | Loss 2.111e-06 | Grad 1.664e-04 | EvalR 2.100 9000 / 10000 eps 0.010 | avgR 2.071 | Loss 4.464e-07 | Grad 1.482e-04 | EvalR 2.100 9500 / 10000 eps 0.010 | avgR 2.075 | Loss 1.080e-06 | Grad 7.032e-05 | EvalR 2.100 10000 / 10000 eps 0.010 | avgR 2.055 | Loss 1.266e-06 | Grad 6.352e-05 | EvalR 2.100 Restore model with eval reward 2.100 Test Summary: | Pass Total Time TestMDP DRQN | 1 1 2m37.6s 500 / 10000 eps 0.901 | avgR 0.067 | Loss 6.218e-02 | Grad 4.526e-02 | EvalR -Inf Evaluation ... Avg Reward 3.47 | Avg Step 61.09 1000 / 10000 eps 0.802 | avgR -0.333 | Loss 9.368e-02 | Grad 7.466e-03 | EvalR 3.470 Evaluation ... 
Avg Reward -2.47 | Avg Step 40.09 1500 / 10000 eps 0.703 | avgR 0.410 | Loss 2.766e-02 | Grad 4.668e-03 | EvalR -2.470 Evaluation ... Avg Reward 2.57 | Avg Step 54.35 2000 / 10000 eps 0.604 | avgR 0.611 | Loss 6.230e-02 | Grad 6.823e-03 | EvalR 2.570 Evaluation ... Avg Reward 3.81 | Avg Step 36.02 2500 / 10000 eps 0.505 | avgR 1.527 | Loss 2.172e-02 | Grad 6.205e-03 | EvalR 3.810 Evaluation ... Avg Reward 4.06 | Avg Step 44.72 3000 / 10000 eps 0.406 | avgR 1.432 | Loss 5.789e-02 | Grad 3.071e-03 | EvalR 4.060 Evaluation ... Avg Reward 1.28 | Avg Step 56.99 Saving new model with eval reward 1.280 3500 / 10000 eps 0.307 | avgR 1.816 | Loss 9.177e-02 | Grad 9.783e-03 | EvalR 1.280 Evaluation ... Avg Reward 4.29 | Avg Step 43.90 4000 / 10000 eps 0.208 | avgR 2.157 | Loss 5.079e-02 | Grad 6.382e-03 | EvalR 4.290 Evaluation ... Avg Reward 3.90 | Avg Step 43.25 4500 / 10000 eps 0.109 | avgR 2.275 | Loss 3.248e-02 | Grad 7.243e-03 | EvalR 3.900 Evaluation ... Avg Reward 3.22 | Avg Step 46.90 5000 / 10000 eps 0.010 | avgR 2.441 | Loss 1.089e-01 | Grad 5.396e-02 | EvalR 3.220 Evaluation ... Avg Reward 4.59 | Avg Step 49.25 5500 / 10000 eps 0.010 | avgR 2.618 | Loss 8.642e-02 | Grad 1.164e-02 | EvalR 4.590 Evaluation ... Avg Reward 3.59 | Avg Step 40.85 6000 / 10000 eps 0.010 | avgR 2.490 | Loss 2.606e-02 | Grad 1.686e-02 | EvalR 3.590 Evaluation ... Avg Reward 3.59 | Avg Step 55.80 Saving new model with eval reward 3.590 6500 / 10000 eps 0.010 | avgR 2.755 | Loss 7.095e-02 | Grad 1.323e-02 | EvalR 3.590 Evaluation ... Avg Reward 3.95 | Avg Step 47.49 7000 / 10000 eps 0.010 | avgR 2.265 | Loss 1.680e-01 | Grad 1.338e-02 | EvalR 3.950 Evaluation ... Avg Reward 3.66 | Avg Step 53.83 7500 / 10000 eps 0.010 | avgR 2.549 | Loss 2.872e-02 | Grad 4.002e-02 | EvalR 3.660 Evaluation ... Avg Reward 4.52 | Avg Step 47.85 8000 / 10000 eps 0.010 | avgR 2.863 | Loss 5.626e-02 | Grad 6.115e-03 | EvalR 4.520 Evaluation ... 
Avg Reward 6.06 | Avg Step 41.42 8500 / 10000 eps 0.010 | avgR 2.716 | Loss 8.934e-02 | Grad 1.018e-02 | EvalR 6.060 Evaluation ... Avg Reward 5.59 | Avg Step 53.52 9000 / 10000 eps 0.010 | avgR 3.343 | Loss 2.660e-02 | Grad 1.469e-02 | EvalR 5.590 Evaluation ... Avg Reward 6.55 | Avg Step 34.18 Saving new model with eval reward 6.550 9500 / 10000 eps 0.010 | avgR 3.412 | Loss 9.244e-03 | Grad 2.433e-02 | EvalR 6.550 Evaluation ... Avg Reward 5.30 | Avg Step 47.58 10000 / 10000 eps 0.010 | avgR 3.725 | Loss 5.098e-02 | Grad 6.172e-03 | EvalR 5.300 Restore model with eval reward 6.550 Test Summary: | Pass Total Time GridWorld DDRQN | 1 1 1m05.1s 500 / 10000 eps 0.901 | avgR -24.785 | Loss 8.155e-03 | Grad 1.009e-02 | EvalR -Inf Evaluation ... Avg Reward 1.01 | Avg Step 101.00 1000 / 10000 eps 0.802 | avgR -25.159 | Loss 1.681e-02 | Grad 1.288e-02 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 1500 / 10000 eps 0.703 | avgR -23.507 | Loss 2.821e-03 | Grad 1.128e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 2000 / 10000 eps 0.604 | avgR -22.784 | Loss 1.088e-02 | Grad 2.641e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 2500 / 10000 eps 0.505 | avgR -21.343 | Loss 4.022e-03 | Grad 4.390e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 3000 / 10000 eps 0.406 | avgR -19.978 | Loss 1.000e-02 | Grad 1.656e-02 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 Saving new model with eval reward 1.010 3500 / 10000 eps 0.307 | avgR -18.654 | Loss 1.140e-02 | Grad 4.508e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 4000 / 10000 eps 0.208 | avgR -17.365 | Loss 3.913e-03 | Grad 5.815e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 4500 / 10000 eps 0.109 | avgR -15.899 | Loss 4.547e-03 | Grad 2.708e-03 | EvalR 1.010 Evaluation ... 
Avg Reward 1.01 | Avg Step 101.00 5000 / 10000 eps 0.010 | avgR -14.406 | Loss 8.237e-03 | Grad 9.793e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 5500 / 10000 eps 0.010 | avgR -13.027 | Loss 1.487e-02 | Grad 7.135e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 6000 / 10000 eps 0.010 | avgR -11.909 | Loss 5.312e-03 | Grad 3.140e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 Saving new model with eval reward 1.010 6500 / 10000 eps 0.010 | avgR -10.930 | Loss 3.772e-03 | Grad 9.426e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 7000 / 10000 eps 0.010 | avgR -10.101 | Loss 8.016e-03 | Grad 5.970e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 7500 / 10000 eps 0.010 | avgR -9.409 | Loss 6.667e-03 | Grad 5.933e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 8000 / 10000 eps 0.010 | avgR -8.777 | Loss 6.796e-03 | Grad 1.313e-02 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 8500 / 10000 eps 0.010 | avgR -8.229 | Loss 5.708e-03 | Grad 1.064e-02 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 9000 / 10000 eps 0.010 | avgR -7.733 | Loss 4.396e-03 | Grad 5.161e-03 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 Saving new model with eval reward 1.010 9500 / 10000 eps 0.010 | avgR -7.299 | Loss 5.992e-03 | Grad 1.106e-02 | EvalR 1.010 Evaluation ... Avg Reward 1.01 | Avg Step 101.00 10000 / 10000 eps 0.010 | avgR -6.908 | Loss 5.871e-03 | Grad 5.920e-03 | EvalR 1.010 Restore model with eval reward 1.010 Test Summary: | Pass Total Time TigerPOMDP DDRQN | 1 1 28.6s Test Summary: | Pass Total Time Static Array Env | 1 1 6.1s here Test Summary: | Pass Total Time Common RL Env | 1 1 4.1s 500 / 10000 eps 0.901 | avgR -2.714 | Loss 1.351e+00 | Grad 9.667e-01 | EvalR -Inf Evaluation ... Avg Reward -0.37 | Avg Step 67.43 1000 / 10000 eps 0.802 | avgR -0.778 | Loss 6.873e-01 | Grad 5.626e-01 | EvalR -0.370 Evaluation ... 
Avg Reward -2.46 | Avg Step 54.69 1500 / 10000 eps 0.703 | avgR -1.048 | Loss 8.471e-01 | Grad 7.711e-01 | EvalR -2.460 Evaluation ... Avg Reward -0.05 | Avg Step 78.02 2000 / 10000 eps 0.604 | avgR -0.039 | Loss 5.714e-01 | Grad 1.072e+00 | EvalR -0.050 Evaluation ... Avg Reward 0.20 | Avg Step 85.88 2500 / 10000 eps 0.505 | avgR 0.328 | Loss 5.354e-01 | Grad 5.854e-01 | EvalR 0.200 Evaluation ... Avg Reward -0.40 | Avg Step 65.73 3000 / 10000 eps 0.406 | avgR 0.423 | Loss 7.217e-01 | Grad 2.365e-01 | EvalR -0.400 Evaluation ... Avg Reward 1.30 | Avg Step 74.52 Saving new model with eval reward 1.300 3500 / 10000 eps 0.307 | avgR 0.423 | Loss 6.663e-01 | Grad 3.810e-01 | EvalR 1.300 Evaluation ... Avg Reward 3.47 | Avg Step 53.62 4000 / 10000 eps 0.208 | avgR 0.451 | Loss 7.987e-01 | Grad 1.826e-01 | EvalR 3.470 Evaluation ... Avg Reward 1.07 | Avg Step 59.74 4500 / 10000 eps 0.109 | avgR 0.735 | Loss 6.002e-01 | Grad 3.061e-01 | EvalR 1.070 Evaluation ... Avg Reward 4.20 | Avg Step 48.96 5000 / 10000 eps 0.010 | avgR 1.275 | Loss 3.696e-01 | Grad 7.500e-01 | EvalR 4.200 Evaluation ... Avg Reward 0.10 | Avg Step 58.45 5500 / 10000 eps 0.010 | avgR 1.637 | Loss 4.279e-01 | Grad 4.859e-01 | EvalR 0.100 Evaluation ... Avg Reward 0.41 | Avg Step 59.57 6000 / 10000 eps 0.010 | avgR 1.539 | Loss 3.400e-01 | Grad 6.315e-01 | EvalR 0.410 Evaluation ... Avg Reward 0.63 | Avg Step 72.49 6500 / 10000 eps 0.010 | avgR 1.725 | Loss 1.918e-01 | Grad 2.185e-01 | EvalR 0.630 Evaluation ... Avg Reward 5.86 | Avg Step 31.20 7000 / 10000 eps 0.010 | avgR 1.510 | Loss 5.539e-01 | Grad 1.613e-01 | EvalR 5.860 Evaluation ... Avg Reward 2.54 | Avg Step 68.09 7500 / 10000 eps 0.010 | avgR 1.784 | Loss 8.510e-01 | Grad 7.759e-02 | EvalR 2.540 Evaluation ... Avg Reward 0.41 | Avg Step 74.58 8000 / 10000 eps 0.010 | avgR 1.520 | Loss 6.058e-01 | Grad 3.073e-01 | EvalR 0.410 Evaluation ... 
Avg Reward 5.27 | Avg Step 19.24
8500 / 10000 eps 0.010 | avgR 1.980 | Loss 8.568e-01 | Grad 6.558e-02 | EvalR 5.270
Evaluation ... Avg Reward 3.03 | Avg Step 49.34
9000 / 10000 eps 0.010 | avgR 2.480 | Loss 5.606e-01 | Grad 3.068e-01 | EvalR 3.030
Evaluation ... Avg Reward 2.06 | Avg Step 70.66
Saving new model with eval reward 2.060
9500 / 10000 eps 0.010 | avgR 1.990 | Loss 4.283e-01 | Grad 1.182e-01 | EvalR 2.060
Evaluation ... Avg Reward -1.28 | Avg Step 62.57
10000 / 10000 eps 0.010 | avgR 2.069 | Loss 4.761e-01 | Grad 4.951e-01 | EvalR -1.280
Restore model with eval reward 2.060
Total discounted reward for 1 simulation: 0.0
500 / 10000 eps 0.901 | avgR -2.714 | Loss 1.351e+00 | Grad 9.667e-01 | EvalR -Inf
Evaluation ... Avg Reward -0.37 | Avg Step 67.43
1000 / 10000 eps 0.802 | avgR -0.778 | Loss 6.873e-01 | Grad 5.626e-01 | EvalR -0.370
Evaluation ... Avg Reward -2.46 | Avg Step 54.69
1500 / 10000 eps 0.703 | avgR -1.048 | Loss 8.471e-01 | Grad 7.711e-01 | EvalR -2.460
Evaluation ... Avg Reward -0.05 | Avg Step 78.02
2000 / 10000 eps 0.604 | avgR -0.039 | Loss 5.714e-01 | Grad 1.072e+00 | EvalR -0.050
Evaluation ... Avg Reward 0.20 | Avg Step 85.88
2500 / 10000 eps 0.505 | avgR 0.328 | Loss 5.354e-01 | Grad 5.854e-01 | EvalR 0.200
Evaluation ... Avg Reward -0.40 | Avg Step 65.73
3000 / 10000 eps 0.406 | avgR 0.423 | Loss 7.217e-01 | Grad 2.365e-01 | EvalR -0.400
Evaluation ... Avg Reward 1.30 | Avg Step 74.52
Saving new model with eval reward 1.300
3500 / 10000 eps 0.307 | avgR 0.423 | Loss 6.663e-01 | Grad 3.810e-01 | EvalR 1.300
Evaluation ... Avg Reward 3.47 | Avg Step 53.62
4000 / 10000 eps 0.208 | avgR 0.451 | Loss 7.987e-01 | Grad 1.826e-01 | EvalR 3.470
Evaluation ... Avg Reward 1.07 | Avg Step 59.74
4500 / 10000 eps 0.109 | avgR 0.735 | Loss 6.002e-01 | Grad 3.061e-01 | EvalR 1.070
Evaluation ... Avg Reward 4.20 | Avg Step 48.96
5000 / 10000 eps 0.010 | avgR 1.275 | Loss 3.696e-01 | Grad 7.500e-01 | EvalR 4.200
Evaluation ...
Avg Reward 0.10 | Avg Step 58.45
5500 / 10000 eps 0.010 | avgR 1.637 | Loss 4.279e-01 | Grad 4.859e-01 | EvalR 0.100
Evaluation ... Avg Reward 0.41 | Avg Step 59.57
6000 / 10000 eps 0.010 | avgR 1.539 | Loss 3.400e-01 | Grad 6.315e-01 | EvalR 0.410
Evaluation ... Avg Reward 0.63 | Avg Step 72.49
6500 / 10000 eps 0.010 | avgR 1.725 | Loss 1.918e-01 | Grad 2.185e-01 | EvalR 0.630
Evaluation ... Avg Reward 5.86 | Avg Step 31.20
7000 / 10000 eps 0.010 | avgR 1.510 | Loss 5.539e-01 | Grad 1.613e-01 | EvalR 5.860
Evaluation ... Avg Reward 2.54 | Avg Step 68.09
7500 / 10000 eps 0.010 | avgR 1.784 | Loss 8.510e-01 | Grad 7.759e-02 | EvalR 2.540
Evaluation ... Avg Reward 0.41 | Avg Step 74.58
8000 / 10000 eps 0.010 | avgR 1.520 | Loss 6.058e-01 | Grad 3.073e-01 | EvalR 0.410
Evaluation ... Avg Reward 5.27 | Avg Step 19.24
8500 / 10000 eps 0.010 | avgR 1.980 | Loss 8.568e-01 | Grad 6.558e-02 | EvalR 5.270
Evaluation ... Avg Reward 3.03 | Avg Step 49.34
9000 / 10000 eps 0.010 | avgR 2.480 | Loss 5.606e-01 | Grad 3.068e-01 | EvalR 3.030
Evaluation ... Avg Reward 2.06 | Avg Step 70.66
Saving new model with eval reward 2.060
9500 / 10000 eps 0.010 | avgR 1.990 | Loss 4.283e-01 | Grad 1.182e-01 | EvalR 2.060
Evaluation ... Avg Reward -1.28 | Avg Step 62.57
10000 / 10000 eps 0.010 | avgR 2.069 | Loss 4.761e-01 | Grad 4.951e-01 | EvalR -1.280
Restore model with eval reward 2.060
Test Summary:   | Total   Time
README Examples |     0  10.5s
Testing DeepQLearning tests passed
Testing completed after 580.01s
PkgEval succeeded after 1179.64s