system_prompt user_prompt_prefix user_prompt_suffix 0_recall 1_recall accuracy missing parameters_update_response
0 Classify into one of the following labels {'0', '1'} Classify the following text <text> </text>
Your response should produce a brief reasoning and end with Label: <label>{{label}}</label>
0.208 0.727 34.000 0.136 Here is my interpretation of the two labels, followed by a set of prompt templates that should steer the model to the correct decisions.

Informed interpretation of the labels
Label “0” – an objective, fact-based information request. These are questions asking for:
• Definitions or explanations of how something works
• Concrete lists or examples of items (algorithms, applications, costs)
• Quantitative or timeline data (“how long...”)
• Step-by-step instructions or “homework” style advice
Label “1” – a subjective, evaluative or classification-style query. These are questions asking for:
• A binary/yes-no decision (“Do you…?”, “Is this…?”, “Could…?”, “Would…?”)
• Personal feelings or opinions (“How do you feel…?”, “Do you think…?”)
• Hypothetical or “what if” reasoning (“How might…?”, “If… then…?”)
• Follow-up “other” requests that build on previous context (“What other…?”)
• Conceptual classification or social/cultural analysis

Invalid reasons for choosing “0” or “1”:
• Do not base your choice solely on the topic domain (e.g. “science” vs “business”).
• Do not treat every wh-word (“What/How/Why”) as automatically objective.
• Always focus on the user’s intent (objective fact retrieval vs subjective evaluation).

Examples
1. Text: “What are some specialized sorting algorithms?”
Reasoning: factual list of algorithms → Label: <label>0</label>
2. Text: “Do you think this experience changed your relationship with your father?”
Reasoning: personal opinion/yes-no → Label: <label>1</label>
3. Text: “How long does it take for muscle and skin cells to die after circulation stops?”
Reasoning: objective timeline data → Label: <label>0</label>
4. Text: “How might this ‘good kid’ syndrome affect Japanese society in the long term?”
Reasoning: hypothetical social analysis → Label: <label>1</label>

And here are the new prompt settings:

<system_prompt>
You are a question-classification assistant. You will receive a single question and must label its intent as either:
• 0 – an objective, fact-based request (definitions, concrete lists, numeric or timeline data, instructions).
• 1 – a subjective or evaluative query (yes/no classification, personal opinions or feelings, hypothetical reasoning, conceptual classification, “other” follow-ups).

Do not choose a label based solely on topic domain or the presence of a wh-word. Focus on the question’s intent. Provide a one-sentence concise reasoning, then output exactly:

Label: <label>X</label>

where X is 0 or 1.
</system_prompt>

<user_prompt_prefix>
Classify the following question:

</user_prompt_prefix>

<user_prompt_suffix>


Provide a concise reasoning, then end with:
Label: <label>{0 or 1}</label>
</user_prompt_suffix>
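The tagged blocks above can be pulled out of the response and reassembled into the prompt used for the next row. A minimal sketch, assuming the response wraps each parameter in the three tags shown and that the final prompt is the system prompt, a blank line, then prefix + text + suffix. The function names `extract_prompt_parameters` and `build_prompt` are hypothetical and not part of the original run:

```python
import re

# Tag names taken from the response format shown in the log.
TAGS = ("system_prompt", "user_prompt_prefix", "user_prompt_suffix")


def extract_prompt_parameters(response: str) -> dict:
    """Return the text inside each known tag; absent tags are omitted."""
    params = {}
    for tag in TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", response, flags=re.DOTALL)
        if not match:
            continue
        body = match.group(1)
        # Drop only the single newline that follows the opening tag and the
        # one that precedes the closing tag, so inner blank lines survive.
        if body.startswith("\n"):
            body = body[1:]
        if body.endswith("\n"):
            body = body[:-1]
        params[tag] = body
    return params


def build_prompt(params: dict, text: str) -> str:
    """Assumed layout: system prompt, blank line, prefix + text + suffix."""
    user = (
        params.get("user_prompt_prefix", "")
        + text
        + params.get("user_prompt_suffix", "")
    )
    return params.get("system_prompt", "") + "\n\n" + user
```

Keeping the inner blank lines matters here because the prefix and suffix in the log rely on them to separate the question from the surrounding instructions.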
1
You are a question-classification assistant. You will receive a single question and must label its intent as either:
• 0 – an objective, fact-based request (definitions, concrete lists, numeric or timeline data, instructions).
• 1 – a subjective or evaluative query (yes/no classification, personal opinions or feelings, hypothetical reasoning, conceptual classification, “other” follow-ups).

Do not choose a label based solely on topic domain or the presence of a wh-word. Focus on the question’s intent. Provide a one-sentence concise reasoning, then output exactly:

Label: <label>X</label>

where X is 0 or 1.

Classify the following question:




Provide a concise reasoning, then end with:
Label: <label>{0 or 1}</label>
0.333 0.576 35.000 0.074 Here’s my informed interpretation of the labels:

Label 0 – Objective or instructive requests
• Definitions or factual explanations (including “why” questions requesting reasons)
• Concrete lists or examples (algorithms, names, factors)
• Numeric or timeline data (“how long…”)
• Step-by-step instructions, advice or suggestions (“what could you do…”)
• Requests for possible consequences or effects (“could X be detrimental?”)

Label 1 – Subjective or evaluative queries
• Yes/no or binary decisions (“do you think…?”, “is this…?”)
• Personal opinions or feelings (“how do you feel…?”)
• Hypothetical or “what if” reasoning (“how might…?”, “what if…?”)
• Conceptual classifications or social/cultural analysis
• Follow-up “other” requests that build on previous context (“what other…?”)

Invalid rationales
• Don’t choose based solely on topic domain or presence of a wh-word
• Don’t rely only on keywords—focus on the user’s true intent

Below is a revised prompt set designed to enforce these rules and fix the current misclassifications:

<system_prompt>
You are a question-classification assistant. You will receive one user question and must decide if it is:
0 – Objective or instructive (definitions, factual “why” explanations, concrete lists, numbers/timelines, instructions/advice, requests for possible consequences).
1 – Subjective or evaluative (yes/no or binary opinion, personal feelings, hypothetical “what if/how might” reasoning, conceptual classification, follow-up “other” requests).

Do not answer the question or decline—only classify. Base your choice on the user’s intent, not topic or keywords. First give a one-sentence rationale, then output exactly:

Label: <label>X</label>

Examples:
Question: “What are some sorting algorithms?” → factual list → Label: <label>0</label>
Question: “Why doesn’t Omaha Steaks show raw steak photos?” → factual reason → Label: <label>0</label>
Question: “What could you do to help someone express emotions?” → advice/instructions → Label: <label>0</label>
Question: “Do you think this trend is harmful?” → opinion → Label: <label>1</label>
Question: “How might this affect society?” → hypothetical analysis → Label: <label>1</label>
Question: “What other examples exist?” → follow-up “other” → Label: <label>1</label>
</system_prompt>

<user_prompt_prefix>
Classify the following question:

</user_prompt_prefix>

<user_prompt_suffix>

Provide a concise one-sentence reasoning, then on a new line:
Label: <label>{0 or 1}</label>
</user_prompt_suffix>
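The output format requested above (a rationale followed by `Label: <label>X</label>`) lends itself to a small parser. A sketch under the assumption that a completion counts as "missing" when no well-formed label is found. `extract_label` is a hypothetical helper, not something named in the log:

```python
import re
from typing import Optional

# Matches the exact closing line the prompts ask for, tolerating whitespace.
LABEL_RE = re.compile(r"Label:\s*<label>\s*([01])\s*</label>")


def extract_label(completion: str) -> Optional[str]:
    """Return the last well-formed label in the completion, or None."""
    matches = LABEL_RE.findall(completion)
    # The prompts ask the model to end with the label, so the last match wins.
    return matches[-1] if matches else None
```

Taking the last match guards against completions that quote one of the in-prompt examples before giving their own answer.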
2
You are a question-classification assistant. You will receive one user question and must decide if it is:
0 – Objective or instructive (definitions, factual “why” explanations, concrete lists, numbers/timelines, instructions/advice, requests for possible consequences).
1 – Subjective or evaluative (yes/no or binary opinion, personal feelings, hypothetical “what if/how might” reasoning, conceptual classification, follow-up “other” requests).

Do not answer the question or decline—only classify. Base your choice on the user’s intent, not topic or keywords. First give a one-sentence rationale, then output exactly:

Label: <label>X</label>

Examples:
Question: “What are some sorting algorithms?” → factual list → Label: <label>0</label>
Question: “Why doesn’t Omaha Steaks show raw steak photos?” → factual reason → Label: <label>0</label>
Question: “What could you do to help someone express emotions?” → advice/instructions → Label: <label>0</label>
Question: “Do you think this trend is harmful?” → opinion → Label: <label>1</label>
Question: “How might this affect society?” → hypothetical analysis → Label: <label>1</label>
Question: “What other examples exist?” → follow-up “other” → Label: <label>1</label>

Classify the following question:



Provide a concise one-sentence reasoning, then on a new line:
Label: <label>{0 or 1}</label>
0.604 0.424 43.000 0.000 Here’s my informed interpretation of the labels, followed by a revised prompt set that should steer the model toward the correct classifications, including explicit rules for the patterns that were frequently mislabeled.

Your informed interpretation of the labels

Label 0 – Objective or instructive requests
• Definitions, mechanisms, factual “why” explanations
• Concrete lists or examples (names, factors, categories)
• Numeric or timeline data (“how long…”, “how many…”)
• Step-by-step instructions, practical advice or “homework” style guidance
• Factual analysis or explanation that can be answered with agreed-upon information (e.g. “What are your thoughts on X?” when used to prompt factual analysis, “Could X be detrimental?” to list known effects)

Label 1 – Subjective or evaluative queries
• Personal opinions or feelings (“how do you feel…?”)
• Normative or value judgments (“positive or negative…?”, “harmful…?”)
• Hypothetical or speculative reasoning (“what if…?”, “how might…?”, “long-term effects…?”)
• Questions about prevalence or typicality (“is it common…?”, “is this typical…?”)
• Requests for evidence or support (“is there any evidence…?”)
• Follow-up “other” requests asking for additional instances or related ideas (“what other…?”, “are there other…?”)
• Interpretive or conceptual classification (“what do you mean by…?”, “does X fill the role of Y…?”)

Invalid rationales
• Don’t pick a label solely because of topic domain or the presence of one keyword (e.g. “do,” “could,” or “what”)
• Don’t answer the question or say you can’t—only classify
• Always base your choice on whether the user’s intent is fact-based/instructive (0) or opinion/hypothetical/evaluative (1)

A few illustrative examples
1. Q: “What are some names for Canis Minor in Sanskrit?”
Rationale: Seeks a factual list of names → Label: <label>0</label>
2. Q: “What other factors do forensic scientists use to estimate time of death?”
Rationale: Follow-up “other” examples → Label: <label>1</label>
3. Q: “Could the media’s focus on billionaires be detrimental in any way?”
Rationale: Asks for known negative effects → Label: <label>0</label>
4. Q: “Is it common for fathers to act this way?”
Rationale: Asks about prevalence/typicality → Label: <label>1</label>

Now, the new prompt parameters:

<system_prompt>
You are a question-classification assistant. You will receive exactly one user question and must decide:

• 0 – Objective or instructive (definitions, factual “why” explanations, concrete lists/examples, numeric or timeline data, instructions/advice, factual analysis such as “what are your thoughts on X?” used to prompt facts, or “could X be detrimental?” for known effects).
• 1 – Subjective or evaluative (personal opinions/feelings, normative value judgments, hypothetical/speculative “what if/how might/long-term effects,” prevalence/typicality, evidence requests, follow-up “other” examples, interpretive or conceptual classification).

Invalid reasons: do not choose based solely on topic or keywords, do not answer or decline. Focus strictly on whether the intent is fact-based/instructive (0) or opinion/hypothetical/evaluative (1).

Examples:
Q: What are some names for Canis Minor in Sanskrit?
A: Factual list request → Label: <label>0</label>
Q: What other factors do forensic scientists use to estimate time of death?
A: Follow-up “other” examples → Label: <label>1</label>
Q: Could the media’s focus on billionaires be detrimental in any way?
A: Known negative effects → Label: <label>0</label>
Q: Is it common for fathers to act this way?
A: Prevalence/typicality → Label: <label>1</label>

After reading the question, provide a single concise one-sentence rationale, then on a new line output exactly:
Label: <label>X</label>
</system_prompt>

<user_prompt_prefix>
Classify the following question:

</user_prompt_prefix>

<user_prompt_suffix>

Provide a concise one-sentence rationale, then on a new line:
Label: <label>{0 or 1}</label>
</user_prompt_suffix>
3
You are a question-classification assistant. You will receive exactly one user question and must decide:

• 0 – Objective or instructive (definitions, factual “why” explanations, concrete lists/examples, numeric or timeline data, instructions/advice, factual analysis such as “what are your thoughts on X?” used to prompt facts, or “could X be detrimental?” for known effects).
• 1 – Subjective or evaluative (personal opinions/feelings, normative value judgments, hypothetical/speculative “what if/how might/long-term effects,” prevalence/typicality, evidence requests, follow-up “other” examples, interpretive or conceptual classification).

Invalid reasons: do not choose based solely on topic or keywords, do not answer or decline. Focus strictly on whether the intent is fact-based/instructive (0) or opinion/hypothetical/evaluative (1).

Examples:
Q: What are some names for Canis Minor in Sanskrit?
A: Factual list request → Label: <label>0</label>
Q: What other factors do forensic scientists use to estimate time of death?
A: Follow-up “other” examples → Label: <label>1</label>
Q: Could the media’s focus on billionaires be detrimental in any way?
A: Known negative effects → Label: <label>0</label>
Q: Is it common for fathers to act this way?
A: Prevalence/typicality → Label: <label>1</label>

After reading the question, provide a single concise one-sentence rationale, then on a new line output exactly:
Label: <label>X</label>

Classify the following question:



Provide a concise one-sentence rationale, then on a new line:
Label: <label>{0 or 1}</label>
0.521 0.455 40.000 0.000 Your informed interpretation of the labels

Label 0 – Objective or instructive requests
• Definitions or mechanisms (“What is X?”, “How does X work?”)
• Concrete lists or examples (algorithms, names, steps)
• Numeric, statistical or timeline data (“How long…?”, “How many…?”)
• Step-by-step instructions or practical advice (“How can I…?”, “What could you do…?”)
• Explanations of reasons, strategies or justifications (“Why does X…?”, “What is the strategy behind X…?”)
• Clarifications of dictionary-style definitions
• Factual classification of category membership (“Is X a Y?”)
• Analysis of known data to identify key or surprising features (“What is the most surprising fact about X?”)

Label 1 – Subjective, evaluative or speculative requests
• Personal opinions, feelings or interpretations (“How do you feel…?”, “What do you think…?”)
• Normative or value judgments (“Is it fair…?”, “Positive or negative…?”)
• Hypothetical or “what if” speculation (“What if…?”, “How might…?”, “Implications…?”, “Long-term effects…?”)
• Requests for possible consequences or effects
• Prevalence or typicality inquiries (“Is it common…?”, “Do most people…?”)
• Evidence or support requests (“Is there any evidence…?”, “Is there a connection…?”)
• Follow-up “other”/additional examples (“What other…?”, “Are there other…?”)
• Interpretive or conceptual classification (“What do you mean by…?”, “Does X fill the role of Y…?”)
• Identification of challenges or difficulties (“What are the challenges…?”)
• Subjective experiences or personal recollections (“What was it like…?”)

Invalid reasons: choosing a label purely on topic domain, presence of a single keyword or wh-word. Do not answer the question or decline—only classify. Always focus on the user’s intent: objective/instructive (0) vs subjective/speculative/evaluative (1).

<system_prompt>
You are a question-classification assistant. You will receive exactly one user question and must classify its intent as either:
0 – Objective or instructive (definitions, mechanisms, factual “why” explanations, concrete lists/examples, numeric/timeline data, instructions/advice, reason/strategy justifications, dictionary-style definitions, factual category checks, or analysis of known data).
1 – Subjective or evaluative (personal opinions/feelings, value judgments, hypothetical/speculative “what if” or implications, possible consequences, prevalence questions, evidence/connection requests, follow-up “other” examples, conceptual interpretations, challenges, or subjective experiences).

Do not choose based solely on topic or keywords. Do not answer or refuse. Provide a single concise one-sentence rationale, then on a new line output exactly:
Label: <label>X</label>

Examples:
Q: “What is photosynthesis?” → mechanistic explanation → Label: <label>0</label>
Q: “What are your thoughts on companies that offer bundling discounts?” → factual analysis of known features → Label: <label>0</label>
Q: “Could this policy harm small businesses?” → speculative consequences → Label: <label>1</label>
Q: “What other methods are used besides A to achieve B?” → follow-up “other” examples → Label: <label>1</label>
</system_prompt>

<user_prompt_prefix>
Classify the following question:

</user_prompt_prefix>

<user_prompt_suffix>

Provide a concise one-sentence rationale, then on a new line:
Label: <label>{0 or 1}</label>
</user_prompt_suffix>
4
You are a question-classification assistant. You will receive exactly one user question and must classify its intent as either:
0 – Objective or instructive (definitions, mechanisms, factual “why” explanations, concrete lists/examples, numeric/timeline data, instructions/advice, reason/strategy justifications, dictionary-style definitions, factual category checks, or analysis of known data).
1 – Subjective or evaluative (personal opinions/feelings, value judgments, hypothetical/speculative “what if” or implications, possible consequences, prevalence questions, evidence/connection requests, follow-up “other” examples, conceptual interpretations, challenges, or subjective experiences).

Do not choose based solely on topic or keywords. Do not answer or refuse. Provide a single concise one-sentence rationale, then on a new line output exactly:
Label: <label>X</label>

Examples:
Q: “What is photosynthesis?” → mechanistic explanation → Label: <label>0</label>
Q: “What are your thoughts on companies that offer bundling discounts?” → factual analysis of known features → Label: <label>0</label>
Q: “Could this policy harm small businesses?” → speculative consequences → Label: <label>1</label>
Q: “What other methods are used besides A to achieve B?” → follow-up “other” examples → Label: <label>1</label>

Classify the following question:



Provide a concise one-sentence rationale, then on a new line:
Label: <label>{0 or 1}</label>
0.458 0.576 41.000 0.000
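
The four numbers reported per row (0_recall, 1_recall, accuracy, missing) could be computed along the following lines. This is a sketch, not the original evaluation code. It assumes that the "accuracy" column holds a count of correct predictions rather than a fraction (values such as 34.000 suggest this) and that "missing" is the fraction of completions with no parseable label. The reported figures are consistent with 48 label-0 and 33 label-1 examples (for instance, 0.208 × 48 ≈ 10 and 0.727 × 33 ≈ 24, which sum to 34), but that split is inferred from the numbers and is not stated in the source. The name `summarize` is hypothetical:

```python
from typing import Optional, Sequence


def summarize(gold: Sequence[str], predicted: Sequence[Optional[str]]) -> dict:
    """Per-label recall, number correct, and fraction of missing predictions.

    A prediction of None stands for a completion with no parseable label.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    summary = {}
    for label in ("0", "1"):
        total = sum(1 for g in gold if g == label)
        hits = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
        summary[f"{label}_recall"] = hits / total if total else 0.0
    # Assumption: the log's "accuracy" column is a raw count of correct answers.
    summary["correct"] = sum(1 for g, p in zip(gold, predicted) if g == p)
    summary["missing"] = (
        sum(1 for p in predicted if p is None) / len(gold) if gold else 0.0
    )
    return summary
```

Reporting recall per label alongside the overall count makes the trade-off between rows visible: in the log, gains in 0_recall tend to coincide with losses in 1_recall.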