
Most reasoning benchmarks for LLMs emphasize factual accuracy or step-by-step logic. In finance, however, professionals must not only converge on optimal decisions but also generate creative, plausible futures under uncertainty. We introduce ConDiFi, a benchmark that jointly evaluates divergent and convergent thinking in LLMs for financial tasks.
ConDiFi features 607 macro-financial prompts for divergent reasoning and 990 multi-hop adversarial MCQs for convergent reasoning. Using this benchmark, we evaluated 14 leading models and uncovered striking differences. Despite high fluency, GPT-4o underperforms on Novelty and Actionability. In contrast, models like DeepSeek-R1 and Cohere Command R+ rank among the top for generating actionable insights suitable for investment decisions. ConDiFi provides a new perspective for assessing reasoning capabilities essential to the safe and strategic deployment of LLMs in finance.
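The convergent-reasoning half of such an evaluation reduces to scoring model answers against an MCQ answer key. A minimal sketch of that scoring loop follows; the `MCQ` type, `score_convergent` function, and toy data are hypothetical illustrations, not ConDiFi's actual API or data.

```python
from dataclasses import dataclass

@dataclass
class MCQ:
    """One multiple-choice item: a question, labeled choices, and the key."""
    question: str
    choices: dict[str, str]  # label -> choice text, e.g. {"A": "...", "B": "..."}
    answer: str              # correct label, e.g. "B"

def score_convergent(items: list[MCQ], predict) -> float:
    """Accuracy: fraction of items where the model's predicted label
    matches the answer key. `predict` maps an MCQ to a label string."""
    if not items:
        return 0.0
    correct = sum(1 for q in items if predict(q) == q.answer)
    return correct / len(items)

# Toy usage with a trivial predictor that always answers "B".
items = [
    MCQ("2 + 2 = ?", {"A": "3", "B": "4"}, "B"),
    MCQ("Capital of France?", {"A": "Paris", "B": "Rome"}, "A"),
]
print(score_convergent(items, lambda q: "B"))  # 0.5
```

Scoring the divergent half (Novelty, Actionability) is the harder part and cannot be reduced to key matching; it requires graded judgments over free-form generations.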
Presented at KDD2025: Workshop on Evaluation and Trustworthiness of Agentic and Generative AI Models, Oral Track
(Joint work with the Government Technology Agency of Singapore)
Copyright © 2025 Deep Insight Labs: Collaborative AI Agents for Investment Research and Analysis - All Rights Reserved.