Investigating prompt engineering and calibration on smaller language models reveals mixed results, with joint effects generally negative across multiple commonsense reasoning tasks.
Prompt engineering and calibration make large language models excel at reasoning tasks, including multiple-choice commonsense reasoning. From a practical perspective, we investigate and evaluate these strategies on smaller language models. Through experiments on five commonsense reasoning benchmarks, we find that each strategy tends to favor certain models, but their joint effects are mostly negative.
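The abstract does not spell out what "calibration" means here. As an illustration only (not necessarily the paper's exact method), one common approach for multiple-choice scoring is contextual calibration: divide each option's probability by the model's prior for that option, estimated from a content-free prompt, then renormalize. A minimal sketch with hypothetical numbers:

```python
# Hedged sketch of contextual-calibration-style rescoring for
# multiple-choice answers. The probabilities below are illustrative
# placeholders, not results from the paper.

def calibrate(option_probs, baseline_probs):
    """Divide each option's raw probability by the model's content-free
    prior for that option, then renormalize to a distribution."""
    scores = [p / b for p, b in zip(option_probs, baseline_probs)]
    total = sum(scores)
    return [s / total for s in scores]

# Raw LM probabilities for options A/B/C on one question (hypothetical):
raw = [0.5, 0.3, 0.2]
# Probabilities for the same options given a content-free input
# such as "N/A" (hypothetical):
prior = [0.6, 0.2, 0.2]

calibrated = calibrate(raw, prior)
print(calibrated)  # -> [0.25, 0.45, 0.30]: the prior-favored option A is down-weighted
```

Without calibration the model would pick option A purely because its prior favors it; after dividing out the prior, option B wins.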
arXiv: 2304.06962