Criminally underfollowed poster given the amount of quality and/or thought-provoking takes he posts
mike64_t
mike64_t · Oct 18, 12:33
I think the observation that LLMs are "bad tutors" in that they cannot precisely probe understanding is accurate. The fact that "upweighting the entire rollout" is stupid is also true. However, it's not obvious to me that the remedy for that is LLM-reflection as to "what went well". I think this runs into very similar issues of collapse-risk or misallocation of supervision. Because while we might be sucking supervision through a straw, the only thing that's even worse is sucking tainted supervision through a straw.
It's not like Mike is a niche poster or anything, but I'm just thinking about how many garbage accounts have 2-10x