Quality Understanding in Online Question Answer Sites
The internet is full of questions, but not all questions are created equal. For our research developing a question-answering system for Earth and ocean sciences, it is important to distinguish between well-formed scientific inquiries that can be addressed with an analysis or a reference, and questions that require further clarification before a meaningful response can be given. This poster explores how statistical language models can be applied to predict the quality of a scientific question as part of this larger research endeavor.
This work extends research on question quality within specific online communities, notably Yahoo! Answers and StackExchange, examining whether the same part-of-speech parsing and hierarchical topic modeling approaches can yield generalizable models. We develop a cross-platform metric of question quality and use it to evaluate these statistical models on Yahoo! Answers, StackExchange, and Reddit AskScience corpora.
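To make the feature pipeline concrete, the sketch below combines part-of-speech counts with topic proportions as inputs to a question-quality classifier. The toy questions, binary labels, and the flat LDA model (standing in for the hierarchical topic model named above) are illustrative assumptions, not the poster's actual implementation.

```python
# A minimal sketch, assuming a small labeled corpus of questions:
# part-of-speech counts plus topic proportions feed a quality classifier.
from collections import Counter

import nltk
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

# Hypothetical examples: 1 = answerable as posed, 0 = needs clarification.
questions = [
    "What drives the thermohaline circulation in the North Atlantic?",
    "why is the ocean",
    "How do subduction zones generate deep-focus earthquakes?",
    "water hot or cold",
]
labels = np.array([1, 0, 1, 0])

def pos_counts(text, prefixes=("NN", "VB", "JJ", "W")):
    """Count nouns, verbs, adjectives, and wh-words as syntactic features."""
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text))]
    return [sum(tag.startswith(p) for tag in tags) for p in prefixes]

# Topic proportions from a flat LDA model; the poster's hierarchical
# topic model would replace this step.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(questions)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_props = lda.fit_transform(dtm)

# Concatenate syntactic and topical features, then fit the classifier.
features = np.hstack([np.array([pos_counts(q) for q in questions]), topic_props])
clf = LogisticRegression().fit(features, labels)
print(clf.predict(features))
```

In this framing, the classifier's predicted probability can serve as the cross-platform quality score, since the features are defined identically for questions from any of the three corpora.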
We believe this work will have applications beyond its initial motivation that make it relevant to the broader statistics community. A better approach to discriminating question quality will enable us to build stronger generative models, discover and respond to scientific questions online, and probe the capacity of statistical models to understand quality in human language.
Authors: Gabriel Montague (Harvard), Jorge Ortiz (IBM Research), Catherine Crawford (IBM Research)