Reflection on KPIs, Categories, and Prompt Design

1. Questionable Categories

* Shell Gas is labeled as Utilities, but it fits a Transportation/Fuel category better; Utilities should cover bills such as internet or phone.
* Uber is labeled as Other, though it clearly represents Transportation.
* Spotify is also in Other, but it behaves like a Subscription/Entertainment expense.

Improvement rules:

* If the merchant name contains keywords like Gas, Shell, BP, Chevron, classify as Transportation/Fuel.
* If the merchant is Uber, Lyft, or similar, classify as Transportation.
* If the merchant matches known digital services (Spotify, Netflix, Hulu, etc.), classify as Subscriptions/Entertainment rather than Other.
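
The rules above can be sketched as a small keyword lookup. This is a minimal illustration, not the categorizer used in the original run: the rule table `CATEGORY_RULES`, the function name `categorize_merchant`, and the choice of whole-word, first-match-wins matching are all assumptions made for the example.

```python
import re

# Hypothetical rule table: (keywords, category). Rules are checked in order,
# so the first matching rule wins.
CATEGORY_RULES = [
    (("shell", "bp", "chevron", "gas"), "Transportation/Fuel"),
    (("uber", "lyft"), "Transportation"),
    (("spotify", "netflix", "hulu"), "Subscriptions/Entertainment"),
]


def categorize_merchant(merchant: str, default: str = "Other") -> str:
    """Return the category of the first rule with a keyword in the merchant name."""
    # Split the name into lowercase alphanumeric words so that short keywords
    # such as "bp" only match whole words, not fragments of longer names.
    words = re.findall(r"[a-z0-9]+", merchant.lower())
    for keywords, category in CATEGORY_RULES:
        if any(keyword in words for keyword in keywords):
            return category
    # Fall back to "Other" only when no rule applies.
    return default


for name in ("Shell Gas", "Uber", "Spotify", "Whole Foods"):
    print(name, "->", categorize_merchant(name))
```

Whole-word matching is a deliberate choice here: plain substring matching would be simpler, but short keywords could then fire inside unrelated merchant names.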

2. KPI Errors / Inconsistencies

Re-summing all positive (non-income) transactions gives a total spend of $731.13, which matches total_spend. Dividing by the 13 expense transactions gives an average of about $56.24, so the numeric KPIs are internally consistent.

However, there is a conceptual inconsistency in top_merchants. Based on total spend, the highest merchants are:

* Costco – $185.75
* Apple Store – $120.00
* AT&T – $95.00
* Whole Foods – $84.60

But top_merchants lists Costco, Apple Store, and Whole Foods, omitting AT&T, even though AT&T has a higher spend than Whole Foods ($95.00 vs. $84.60). This suggests either that the definition of “top merchants” is unclear (for example, only Shopping/Grocery merchants are allowed) or that the list was created without sorting by spend.
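
The ranking can be checked directly from the four totals listed above. The snippet below is only a quick sanity check; the variable names are illustrative, and the dictionary is deliberately given out of order to show that sorting by spend determines the result.

```python
# Per-merchant totals taken from the list above, in arbitrary order.
merchant_totals = {
    "Whole Foods": 84.60,
    "AT&T": 95.00,
    "Costco": 185.75,
    "Apple Store": 120.00,
}

# Sort by total spend, highest first, and keep the top three names.
ranked = sorted(merchant_totals.items(), key=lambda item: item[1], reverse=True)
top_3 = [name for name, _ in ranked[:3]]

print(top_3)  # ['Costco', 'Apple Store', 'AT&T']
```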

Improvement rules for KPIs:

* Define top_merchants as “top N merchants by total spend across all expense categories,” and always sort by total spend.
* Return a structure like { "merchant": "...", "total_spend": ... } so the logic can be audited.
* Explicitly define average_expense as total_spend / number_of_expense_transactions and consistently round to two decimals.
* Consider adding KPIs like savings rate ((total_income - total_spend) / total_income) and category_totals for better insight.
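
A minimal sketch of these KPI definitions follows. It assumes each transaction is a dict with `merchant`, `category`, and `amount` keys, with expenses stored as positive amounts and income as negative amounts; the function name `compute_kpis`, the sample categories, and the payroll row are hypothetical and not taken from the original data. `Decimal` is used so that rounding to two decimals is exact.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def compute_kpis(transactions, top_n=3):
    """Compute KPIs; assumes expenses are positive and income is negative."""
    expenses = [t for t in transactions if t["amount"] > 0]
    income = [t for t in transactions if t["amount"] < 0]

    total_spend = sum((t["amount"] for t in expenses), Decimal("0"))
    total_income = sum((-t["amount"] for t in income), Decimal("0"))

    # Aggregate spend per merchant and per category across all expenses.
    by_merchant = defaultdict(Decimal)
    by_category = defaultdict(Decimal)
    for t in expenses:
        by_merchant[t["merchant"]] += t["amount"]
        by_category[t["category"]] += t["amount"]

    # Top N merchants by total spend, highest first, with auditable amounts.
    ranked = sorted(by_merchant.items(), key=lambda kv: kv[1], reverse=True)
    top_merchants = [{"merchant": m, "total_spend": s} for m, s in ranked[:top_n]]

    # average_expense = total_spend / number_of_expense_transactions.
    average_expense = None
    if expenses:
        average_expense = (total_spend / len(expenses)).quantize(
            CENT, rounding=ROUND_HALF_UP
        )

    # savings rate = (total_income - total_spend) / total_income.
    savings_rate = None
    if total_income:
        savings_rate = (total_income - total_spend) / total_income

    return {
        "total_spend": total_spend,
        "total_income": total_income,
        "expense_count": len(expenses),
        "average_expense": average_expense,
        "top_merchants": top_merchants,
        "category_totals": dict(by_category),
        "savings_rate": savings_rate,
    }


# Hypothetical sample: categories and the payroll row are made up for the demo.
sample = [
    {"merchant": "Costco", "category": "Groceries", "amount": Decimal("185.75")},
    {"merchant": "Apple Store", "category": "Shopping", "amount": Decimal("120.00")},
    {"merchant": "AT&T", "category": "Utilities", "amount": Decimal("95.00")},
    {"merchant": "Whole Foods", "category": "Groceries", "amount": Decimal("84.60")},
    {"merchant": "Payroll", "category": "Income", "amount": Decimal("-1000.00")},
]
kpis = compute_kpis(sample)
print(kpis["top_merchants"])
print(kpis["average_expense"], kpis["savings_rate"])
```

Returning `expense_count` alongside `average_expense` makes it possible to re-derive the average by hand when auditing the output.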

3. Prompt Modifications for Better KPI Extraction

To make KPI extraction more robust, the prompt should:

* Instruct the model to:

  * Return total_spend as the sum of all positive, non-income amounts.
  * Return total_income as the sum of all income transactions (even if they appear as negative numbers in the raw data).
  * Compute average_expense using only expense transactions, and state how many transactions were included.
  * Return top_merchants as the top 3 merchants by total spend, sorted from highest to lowest, and include the amount for each.
* Add a validation step in the prompt:

  * “Verify that total_spend equals the sum of all expense amounts to within $0.01. If it does not, recompute total_spend from the transactions and flag the inconsistency.”
* Clarify sign conventions:

  * “Income may appear as negative amounts in the raw data; normalize it to positive in the KPIs and exclude it from expense averages.”
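
The validation and sign-convention steps above can also be run outside the model as a post-check on its output. The sketch below is an assumption-laden illustration: the helper names `to_cents` and `check_total_spend`, the one-cent tolerance, and the sample amounts are hypothetical. Amounts are compared in whole cents to avoid floating-point noise.

```python
def to_cents(amount: float) -> int:
    """Convert a dollar amount to whole cents."""
    return round(amount * 100)


def check_total_spend(amounts, reported_total_spend, tolerance_cents=1):
    """Recompute totals from raw amounts and compare with the reported KPI.

    Assumes expenses are positive and income is negative in the raw data.
    """
    expense_cents = sum(to_cents(a) for a in amounts if a > 0)
    # Income is normalized to a positive figure and kept out of expenses.
    income_cents = sum(to_cents(-a) for a in amounts if a < 0)
    difference = abs(expense_cents - to_cents(reported_total_spend))
    return {
        "total_spend": expense_cents / 100,
        "total_income": income_cents / 100,
        "consistent": difference <= tolerance_cents,
    }


# Hypothetical raw amounts: four expenses and one income entry.
raw_amounts = [185.75, 120.00, 95.00, 84.60, -1000.00]
print(check_total_spend(raw_amounts, reported_total_spend=485.35))
print(check_total_spend(raw_amounts, reported_total_spend=500.00))
```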

4. Prompt Modifications for Better Categorization Logic

For categorization, the prompt should add clear rule-style instructions:

* Use merchant-based rules where possible. For example: gas stations → Transportation/Fuel; ride-sharing apps → Transportation; streaming/music services → Subscriptions/Entertainment.
* Only use the category “Other” when no reasonable mapping is available.
* If a merchant matches multiple possible categories, choose the one that best reflects the purpose of the expense (for example, Shell Gas is fuel, not a utility bill).

5. Model-Driven Reasoning Summary

By comparing the KPIs back to the raw transactions, we confirm the arithmetic (total spend and average expense) but detect conceptual issues in categorization and in the definition of top_merchants. Introducing clearer, rule-based mappings for merchants and more explicit instructions for computing and validating KPIs should reduce inconsistencies and yield more reliable financial summaries in future runs.
