This is the fourth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, covering interpretability and adjacent areas. The third post can be found here. Since SFT is the cause of many safety-relevant properties, a natural strategy is to filter out rollout