Deep Qwen's DPO Execution Flaw: What Went Wrong?
I ran the code lab provided for the lecture “DPO in Practice”. The end result after DPO is not what was demonstrated in the lecture. After DPO, the model is expected to identify itself as Deep Qwen instead of Qwen, but instead it responds with a different identity for each prompt.






