Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
2025-07-11 11:32:33

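This is a work that is easy to overlook. Had Deng Chao and Dong Jie not starred in the TV drama adapted from it, 《相爱十年》, it would probably have sunk, like most web fiction, into the vast sea of online writing.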

鐜嬪浜哄湪涓囧鏉戞槸灝忛棬灝忔埛锛屾病鏈変翰鎴氭湅鍙嬫拺鑵幫紝鎯寵鍦ㄤ竾瀹舵潙鐢熷瓨锛屽彧鑳借濂戒竾鍠勫爞鍜屼竾浼犲锛岃鐜嬪弸寰風戶緇湁媧誨共锛屽コ鍎胯兘緇х畫鍦ㄤ竾瀹墮泦鍥㈠伐浣滐紝涓や釜鍎垮瓙浠ュ悗鐨勫伐浣滃彲鑳戒篃閮借浠頒粭涓囧銆?鏋楁鏋濆湪濠氱ぜ涓婏紝闈炶璿峰埌涓囧杽鍫傚埌鍦烘墠寮€濮嬩華寮忥紝闈炶璁╀竴瀵規柊浜哄彥鎷滀竾鍠勫爞锛岄兘鏄負浜嗙粰鍦ㄥ満鐨勪竾瀹舵潙鐨勬潙姘戠湅鐨勶紝鏈変簡涓囧杽鍫傜殑璁ゅ彲锛屼粬浠墠鑳芥洿濂藉湪榪欓噷鐢熸椿锛屼笉琚帓鎸ゃ€?


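After the wedding-hazing incident, Lin Guizhi hurried over with gifts to plead for forgiveness and sent her daughter out to hand around more wedding candy. The neighbors would not even accept the candy without first asking where Secretary Wan stood and whether he had forgiven the family. One had already taken a bag, but on overhearing He Xingfu and Wang Qinglai arguing, immediately understood that things had soured again and hastily put the candy down for fear of being implicated. That is pragmatism for you. Clearly, in Wanjia Village, Wan Shantang is the weathervane for everything. He is the village's god, and everyone must pay him homage. He Xingfu had originally wanted to settle the matter quietly. She had hit Wan Chuanjia and was willing to apologize, but Wan Chuanjia had done wrong first, so he too ought to apologize to her younger sister, He Xingyun.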


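He Xingfu's demand is reasonable, but it presupposes that the two sides stand on equal footing. When the other side is the local power, the local god, someone whom others can only revere and flatter, then merely showing up counts as "giving face," and even joining in the wedding hazing counts as "giving face." In their own minds, they stopped being ordinary people long ago. Wan Shantang does not want to be an official who refuses to listen to reason, nor does he want to rule by personal decree, yet amid everyone's praise and flattery he has become exactly that kind of person without noticing.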


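Wan Shantang stopped his son Wan Chuanjia from taking malicious revenge and criticized him for having done wrong first. But when He Xingfu came to demand an apology, he still refused. In his eyes, saving face ultimately matters more than justice and fairness.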

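He Xingfu was denied fair treatment and was pressured by her mother-in-law into apologizing. Her mother-in-law produced ?000 yuan to smooth things over, only for it to become a joke.

... The scenes of her breakups with her ex-boyfriends kept surfacing in Wang Jia's mind, a memory she could not shake.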

鐜嬩匠鑷鑷崇粓閮借寰楄嚜宸辨病浠€涔堝ぇ闂锛岃闀跨溮涔熺畻涓笂姘村鉤锛岃宸ヤ綔涔熻繕綆椾綋闈紝瀹跺涵鏉′歡涔熶笉閿欍€?濂逛笉鏄庣櫧涓轟粈涔堣嚜宸辯殑鍓嶇敺鍙嬩滑鎺ヤ簩榪炰笁鍦拌窡鑷繁鎻愬垎鎵嬶紝涔熶笉鏄庣櫧涓轟綍鍦ㄥ彴涓婂媷鏁㈢ず鐖卞嵈灞″薄琚嫆銆?

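She analyzed many possible reasons but overlooked the one that mattered: her own personality. Deep down, Wang Jia seemed to think that love and marriage require nothing more than well-matched family backgrounds. It never occurred to her that two people can go the distance only if their personalities are compatible.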

(Author: 古交外圍)