Evaluating Large Language Models Using “Counterfactual Tasks”
#Performance #Reasoning #Large Language Models
Blog · aiguide.substack.com · May 14, 2024