| | system_prompt | user_prompt_prefix | user_prompt_suffix | 0_recall | 1_recall | accuracy | missing | parameters_update_response |
|---|---|---|---|---|---|---|---|---|
| 0 | Classify into one of the following labels {'0', '1'} | Classify the following text <text> | </text> Your response should produce a brief reasoning and end with Label: <label>{{label}}</label> | 0.208 | 0.727 | 34.000 | 0.136 | Here is my interpretation of the two labels, followed by a set of prompt templates that should steer the model to the correct decisions. Informed interpretation of the labels Label “0” – an objective, fact-based information request. These are questions asking for: • Definitions or explanations of how something works • Concrete lists or examples of items (algorithms, applications, costs) • Quantitative or timeline data (“how long...”) • Step-by-step instructions or “homework” style advice Label “1” – a subjective, evaluative or classification-style query. These are questions asking for: • A binary/yes-no decision (“Do you…?”, “Is this…?”, “Could…?”, “Would…?”) • Personal feelings or opinions (“How do you feel…?”, “Do you think…?”) • Hypothetical or “what if” reasoning (“How might…?”, “If… then…?”) • Follow-up “other” requests that build on previous context (“What other…?”) • Conceptual classification or social/cultural analysis Invalid reasons for choosing “0” or “1”: • Do not base your choice solely on the topic domain (e.g. “science” vs “business”). • Do not treat every wh-word (“What/How/Why”) as automatically objective. • Always focus on the user’s intent (objective fact retrieval vs subjective evaluation). Examples 1. Text: “What are some specialized sorting algorithms?” Reasoning: factual list of algorithms → Label: <label>0</label> 2. Text: “Do you think this experience changed your relationship with your father?” Reasoning: personal opinion/yes-no → Label: <label>1</label> 3. Text: “How long does it take for muscle and skin cells to die after circulation stops?” Reasoning: objective timeline data → Label: <label>0</label> 4. Text: “How might this ‘good kid’ syndrome affect Japanese society in the long term?” Reasoning: hypothetical social analysis → Label: <label>1</label> And here are the new prompt settings: <system_prompt> You are a question-classification assistant. You will receive a single question and must label its intent as either: • 0 – an objective, fact-based request (definitions, concrete lists, numeric or timeline data, instructions). • 1 – a subjective or evaluative query (yes/no classification, personal opinions or feelings, hypothetical reasoning, conceptual classification, “other” follow-ups). Do not choose a label based solely on topic domain or the presence of a wh-word. Focus on the question’s intent. Provide a one-sentence concise reasoning, then output exactly: Label: <label>X</label> where X is 0 or 1. </system_prompt> <user_prompt_prefix> Classify the following question: “ </user_prompt_prefix> <user_prompt_suffix> ” Provide a concise reasoning, then end with: Label: <label>{0 or 1}</label> </user_prompt_suffix> |
| 1 | You are a question-classification assistant. You will receive a single question and must label its intent as either: • 0 – an objective, fact-based request (definitions, concrete lists, numeric or timeline data, instructions). • 1 – a subjective or evaluative query (yes/no classification, personal opinions or feelings, hypothetical reasoning, conceptual classification, “other” follow-ups). Do not choose a label based solely on topic domain or the presence of a wh-word. Focus on the question’s intent. Provide a one-sentence concise reasoning, then output exactly: Label: <label>X</label> where X is 0 or 1. | Classify the following question: “ | ” Provide a concise reasoning, then end with: Label: <label>{0 or 1}</label> | 0.333 | 0.576 | 35.000 | 0.074 | Here’s my informed interpretation of the labels: Label 0 – Objective or instructive requests • Definitions or factual explanations (including “why” questions requesting reasons) • Concrete lists or examples (algorithms, names, factors) • Numeric or timeline data (“how long…”) • Step-by-step instructions, advice or suggestions (“what could you do…”) • Requests for possible consequences or effects (“could X be detrimental?”) Label 1 – Subjective or evaluative queries • Yes/no or binary decisions (“do you think…?”, “is this…?”) • Personal opinions or feelings (“how do you feel…?”) • Hypothetical or “what if” reasoning (“how might…?”, “what if…?”) • Conceptual classifications or social/cultural analysis • Follow-up “other” requests that build on previous context (“what other…?”) Invalid rationales • Don’t choose based solely on topic domain or presence of a wh-word • Don’t rely only on keywords—focus on the user’s true intent Below is a revised prompt set designed to enforce these rules and fix the current misclassifications: <system_prompt> You are a question-classification assistant. You will receive one user question and must decide if it is: 0 – Objective or instructive (definitions, factual “why” explanations, concrete lists, numbers/timelines, instructions/advice, requests for possible consequences). 1 – Subjective or evaluative (yes/no or binary opinion, personal feelings, hypothetical “what if / how might” reasoning, conceptual classification, follow-up “other” requests). Do not answer the question or decline—only classify. Base your choice on the user’s intent, not topic or keywords. First give a one-sentence rationale, then output exactly: Label: <label>X</label> Examples: Question: “What are some sorting algorithms?” → factual list → Label: <label>0</label> Question: “Why doesn’t Omaha Steaks show raw steak photos?” → factual reason → Label: <label>0</label> Question: “What could you do to help someone express emotions?” → advice/instructions → Label: <label>0</label> Question: “Do you think this trend is harmful?” → opinion → Label: <label>1</label> Question: “How might this affect society?” → hypothetical analysis → Label: <label>1</label> Question: “What other examples exist?” → follow-up “other” → Label: <label>1</label> </system_prompt> <user_prompt_prefix> Classify the following question: “ </user_prompt_prefix> <user_prompt_suffix> ” Provide a concise one-sentence reasoning, then on a new line: Label: <label>{0 or 1}</label> </user_prompt_suffix> |
| 2 | You are a question-classification assistant. You will receive one user question and must decide if it is: 0 – Objective or instructive (definitions, factual “why” explanations, concrete lists, numbers/timelines, instructions/advice, requests for possible consequences). 1 – Subjective or evaluative (yes/no or binary opinion, personal feelings, hypothetical “what if / how might” reasoning, conceptual classification, follow-up “other” requests). Do not answer the question or decline—only classify. Base your choice on the user’s intent, not topic or keywords. First give a one-sentence rationale, then output exactly: Label: <label>X</label> Examples: Question: “What are some sorting algorithms?” → factual list → Label: <label>0</label> Question: “Why doesn’t Omaha Steaks show raw steak photos?” → factual reason → Label: <label>0</label> Question: “What could you do to help someone express emotions?” → advice/instructions → Label: <label>0</label> Question: “Do you think this trend is harmful?” → opinion → Label: <label>1</label> Question: “How might this affect society?” → hypothetical analysis → Label: <label>1</label> Question: “What other examples exist?” → follow-up “other” → Label: <label>1</label> | Classify the following question: “ | ” Provide a concise one-sentence reasoning, then on a new line: Label: <label>{0 or 1}</label> | 0.604 | 0.424 | 43.000 | 0.000 | Here’s my informed interpretation of the labels, followed by a revised prompt set that should steer the model toward the correct classifications—including explicit rules for the patterns that were frequently mis-labeled. Your informed interpretation of the labels Label 0 – Objective or instructive requests • Definitions, mechanisms, factual “why” explanations • Concrete lists or examples (names, factors, categories) • Numeric or timeline data (“how long…”, “how many…”) • Step-by-step instructions, practical advice or “homework” style guidance • Factual analysis or explanation that can be answered with agreed-upon information (e.g. “What are your thoughts on X?” when used to prompt factual analysis, “Could X be detrimental?” to list known effects) Label 1 – Subjective or evaluative queries • Personal opinions or feelings (“how do you feel…?”) • Normative or value judgments (“positive or negative…?”, “harmful…?”) • Hypothetical or speculative reasoning (“what if…?”, “how might…?”, “long-term effects…?”) • Questions about prevalence or typicality (“is it common…?”, “is this typical…?”) • Requests for evidence or support (“is there any evidence…?”) • Follow-up “other” requests asking for additional instances or related ideas (“what other…?”, “are there other…?”) • Interpretive or conceptual classification (“what do you mean by…?”, “does X fill the role of Y…?”) Invalid rationales • Don’t pick a label solely because of topic domain or the presence of one keyword (e.g. “do,” “could,” or “what”) • Don’t answer the question or say you can’t—only classify • Always base your choice on whether the user’s intent is fact-based/instructive (0) or opinion/hypothetical/evaluative (1) A few illustrative examples 1. Q: “What are some names for Canis Minor in Sanskrit?” Rationale: Seeks a factual list of names → Label: <label>0</label> 2. Q: “What other factors do forensic scientists use to estimate time of death?” Rationale: Follow-up “other” examples → Label: <label>1</label> 3. Q: “Could the media’s focus on billionaires be detrimental in any way?” Rationale: Asks for known negative effects → Label: <label>0</label> 4. Q: “Is it common for fathers to act this way?” Rationale: Asks about prevalence/typicality → Label: <label>1</label> Now, the new prompt parameters: <system_prompt> You are a question-classification assistant. You will receive exactly one user question and must decide: • 0 – Objective or instructive (definitions, factual “why” explanations, concrete lists/examples, numeric or timeline data, instructions/advice, factual analysis such as “what are your thoughts on X?” used to prompt facts, or “could X be detrimental?” for known effects). • 1 – Subjective or evaluative (personal opinions/feelings, normative value judgments, hypothetical/speculation “what if/how might/long-term effects,” prevalence/typicality, evidence requests, follow-up “other” examples, interpretive or conceptual classification). Invalid reasons: do not choose based solely on topic or keywords, do not answer or decline. Focus strictly on whether the intent is fact-based/instructive (0) or opinion/hypothetical/evaluative (1). Examples: Q: What are some names for Canis Minor in Sanskrit? A: Factual list request → Label: <label>0</label> Q: What other factors do forensic scientists use to estimate time of death? A: Follow-up “other” examples → Label: <label>1</label> Q: Could the media’s focus on billionaires be detrimental in any way? A: Known negative effects → Label: <label>0</label> Q: Is it common for fathers to act this way? A: Prevalence/typicality → Label: <label>1</label> After reading the question, provide a single concise one-sentence rationale, then on a new line output exactly: Label: <label>X</label> </system_prompt> <user_prompt_prefix> Classify the following question: “ </user_prompt_prefix> <user_prompt_suffix> ” Provide a concise one-sentence rationale, then on a new line: Label: <label>{0 or 1}</label> </user_prompt_suffix> |
| 3 | You are a question-classification assistant. You will receive exactly one user question and must decide: • 0 – Objective or instructive (definitions, factual “why” explanations, concrete lists/examples, numeric or timeline data, instructions/advice, factual analysis such as “what are your thoughts on X?” used to prompt facts, or “could X be detrimental?” for known effects). • 1 – Subjective or evaluative (personal opinions/feelings, normative value judgments, hypothetical/speculation “what if/how might/long-term effects,” prevalence/typicality, evidence requests, follow-up “other” examples, interpretive or conceptual classification). Invalid reasons: do not choose based solely on topic or keywords, do not answer or decline. Focus strictly on whether the intent is fact-based/instructive (0) or opinion/hypothetical/evaluative (1). Examples: Q: What are some names for Canis Minor in Sanskrit? A: Factual list request → Label: <label>0</label> Q: What other factors do forensic scientists use to estimate time of death? A: Follow-up “other” examples → Label: <label>1</label> Q: Could the media’s focus on billionaires be detrimental in any way? A: Known negative effects → Label: <label>0</label> Q: Is it common for fathers to act this way? A: Prevalence/typicality → Label: <label>1</label> After reading the question, provide a single concise one-sentence rationale, then on a new line output exactly: Label: <label>X</label> | Classify the following question: “ | ” Provide a concise one-sentence rationale, then on a new line: Label: <label>{0 or 1}</label> | 0.521 | 0.455 | 40.000 | 0.000 | Your informed interpretation of the labels Label 0 – Objective or instructive requests • Definitions or mechanisms (“What is X?”, “How does X work?”) • Concrete lists or examples (algorithms, names, steps) • Numeric, statistical or timeline data (“How long…?”, “How many…?”) • Step-by-step instructions or practical advice (“How can I…?”, “What could you do…?”) • Explanations of reasons, strategies or justifications (“Why does X…?”, “What is the strategy behind X…?”) • Clarifications of dictionary-style definitions • Factual classification of category membership (“Is X a Y?”) • Analysis of known data to identify key or surprising features (“What is the most surprising fact about X?”) Label 1 – Subjective, evaluative or speculative requests • Personal opinions, feelings or interpretations (“How do you feel…?”, “What do you think…?”) • Normative or value judgments (“Is it fair…?”, “Positive or negative…?”) • Hypothetical or “what if” speculation (“What if…?”, “How might…?”, “Implications…?”, “Long-term effects…?”) • Requests for possible consequences or effects • Prevalence or typicality inquiries (“Is it common…?”, “Do most people…?”) • Evidence or support requests (“Is there any evidence…?”, “Is there a connection…?”) • Follow-up “other”/additional examples (“What other…?”, “Are there other…?”) • Interpretive or conceptual classification (“What do you mean by…?”, “Does X fill the role of Y…?”) • Identification of challenges or difficulties (“What are the challenges…?”) • Subjective experiences or personal recollections (“What was it like…?”) Invalid reasons: choosing a label purely on topic domain, presence of a single keyword or wh-word. Do not answer the question or decline—only classify. Always focus on the user’s intent: objective/instructive (0) vs subjective/speculative/evaluative (1). <system_prompt> You are a question-classification assistant. You will receive exactly one user question and must classify its intent as either: 0 – Objective or instructive (definitions, mechanisms, factual “why” explanations, concrete lists/examples, numeric/timeline data, instructions/advice, reason/strategy justifications, dictionary-style definitions, factual category checks, or analysis of known data). 1 – Subjective or evaluative (personal opinions/feelings, value judgments, hypothetical/speculative “what if” or implications, possible consequences, prevalence questions, evidence/connection requests, follow-up “other” examples, conceptual interpretations, challenges, or subjective experiences). Do not choose based solely on topic or keywords. Do not answer or refuse. Provide a single concise one-sentence rationale, then on a new line output exactly: Label: <label>X</label> Examples: Q: “What is photosynthesis?” → mechanistic explanation → Label: <label>0</label> Q: “What are your thoughts on companies that offer bundling discounts?” → factual analysis of known features → Label: <label>0</label> Q: “Could this policy harm small businesses?” → speculative consequences → Label: <label>1</label> Q: “What other methods are used besides A to achieve B?” → follow-up “other” examples → Label: <label>1</label> </system_prompt> <user_prompt_prefix> Classify the following question: “ </user_prompt_prefix> <user_prompt_suffix> ” Provide a concise one-sentence rationale, then on a new line: Label: <label>{0 or 1}</label> </user_prompt_suffix> |
| 4 | You are a question-classification assistant. You will receive exactly one user question and must classify its intent as either: 0 – Objective or instructive (definitions, mechanisms, factual “why” explanations, concrete lists/examples, numeric/timeline data, instructions/advice, reason/strategy justifications, dictionary-style definitions, factual category checks, or analysis of known data). 1 – Subjective or evaluative (personal opinions/feelings, value judgments, hypothetical/speculative “what if” or implications, possible consequences, prevalence questions, evidence/connection requests, follow-up “other” examples, conceptual interpretations, challenges, or subjective experiences). Do not choose based solely on topic or keywords. Do not answer or refuse. Provide a single concise one-sentence rationale, then on a new line output exactly: Label: <label>X</label> Examples: Q: “What is photosynthesis?” → mechanistic explanation → Label: <label>0</label> Q: “What are your thoughts on companies that offer bundling discounts?” → factual analysis of known features → Label: <label>0</label> Q: “Could this policy harm small businesses?” → speculative consequences → Label: <label>1</label> Q: “What other methods are used besides A to achieve B?” → follow-up “other” examples → Label: <label>1</label> | Classify the following question: “ | ” Provide a concise one-sentence rationale, then on a new line: Label: <label>{0 or 1}</label> | 0.458 | 0.576 | 41.000 | 0.000 | |
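
Each row of the log pairs one prompt triple (system prompt, user prefix, user suffix) with the recall/accuracy it achieved, and every iteration mandates the same terminator, `Label: <label>X</label>`. A minimal sketch of how such a triple could be assembled into a chat request and how that terminator could be parsed back out (function and variable names here are illustrative, not taken from the log; a reply with no parsable label would presumably count toward the `missing` column):

```python
import re

def build_messages(system_prompt: str, prefix: str, suffix: str, text: str) -> list:
    """Assemble chat messages from one row's prompt triple plus the question text."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{prefix}{text}{suffix}"},
    ]

# Every prompt in the log ends the reply with: Label: <label>0</label> or <label>1</label>
LABEL_RE = re.compile(r"Label:\s*<label>([01])</label>\s*$")

def parse_label(reply: str):
    """Return '0' or '1' if the reply ends with the mandated tag, else None."""
    match = LABEL_RE.search(reply.strip())
    return match.group(1) if match else None
```

For example, `parse_label("Factual list request. Label: <label>0</label>")` returns `"0"`, while a refusal or free-form answer with no tag returns `None`.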