Stack Overflow: answer sources by the numbers

I spent the past few weeks taking an in-depth look at how our users find questions to answer, with a keen eye on Stack Overflow. I measured user behavior and click streams, and discussed with the other Stack Overflow developers and community managers how they meant the navigation and question list pages to work when they implemented them.

Last week I blogged about the answer sources. This post deals with the measured user behavior. Where do our users actually find stuff to answer? What do the numbers say?

I think I found some interesting answers.

Answer Sources

It's not trivial to understand how our users find stuff to answer because the site is so varied and functionality-rich. Since question sources lead to questions, and question pages contain the form used to answer, we've been injecting hidden <input> fields containing the source into the form, using a mix of the referrer and server-side logic. We then store the events in PRIZM, our internal A/B testing and site measurement über-tool.
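The referrer-based part of that classification might look something like this sketch. The labels and path prefixes below are hypothetical stand-ins, not the actual server-side logic:

```python
from urllib.parse import urlparse

# Hypothetical mapping from referrer path prefixes to answer-source labels.
# The real logic mixes the referrer with server-side state; this is a sketch.
SOURCE_PREFIXES = [
    ("/questions/tagged/", "Questions List by Tag"),
    ("/unanswered", "Unanswered"),
    ("/questions", "Newest Questions"),
    ("/", "Home Page"),
]

def classify_source(referrer: str) -> str:
    """Return the answer-source label to embed in a hidden <input> field."""
    path = urlparse(referrer).path or "/"
    for prefix, label in SOURCE_PREFIXES:
        if path == prefix or (prefix != "/" and path.startswith(prefix)):
            return label
    return "Unknown"
```

External referrers (search engines, other network sites) simply fall through to "Unknown" here; in practice they would get their own buckets.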

I then proceeded to apply some convoluted SQL to investigate the question "where do Stack Exchange answers come from?"

The bare sources

The first thing we looked at is the breakdown by page.

Stack Overflow answer sources by page

This breakdown shows that most answers come from either the home page or the "Questions by Tag" page. This is hardly surprising, as they are among the most easily reachable pages (by clicking our logo or clicking on a tag).

Another interesting breakdown is answers by tab.

Stack Overflow answer sources by tab

The "Newest" tabs are by far the most used sources -- people monitor tags by sitting on the tabs displaying incoming questions, so they can answer as fast as possible. The "Interesting" tab on the home page also seems to be a good source of answers.

Which tabs earn our users the most reputation?

The next thing we examined is which sources lead to the most reputation gain. For this we use a fake "rep" value, calculated as score × 10, plus 15 if the answer is accepted; it doesn't account for down votes or bounties. Joined with the number of answers given, this shows that, by far, the tabs with the most velocity are favored.

Stack Overflow answer sources by total "Rep"
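The fake "rep" value described above is simple enough to write down directly. A minimal sketch:

```python
def fake_rep(score: int, accepted: bool) -> int:
    """Approximate reputation earned by an answer: 10 points per net
    up vote, plus the 15-point acceptance bonus. As in the post, this
    deliberately ignores down-vote penalties and bounties."""
    return score * 10 + (15 if accepted else 0)
```

For example, an accepted answer with a score of 3 counts as 45 "rep".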

A doubt: do views affect score?

A common doubt is that the number of views affects score: the views that surface lists of high-traffic questions would then generate more rep because of that traffic, rather than because of the quality of their contents. I verified that there is no major change in question voting patterns up to 250 views, so all the statistics you will find here, including past ones, have been corrected to consider only questions with at most 250 views.

There's not much difference in the distribution of scores below 300 views

Instead, consider the questions with 700-800 views: more than 60% of them have scores below -1 or over 7! While quality drives views, there's no doubt that views drive voting. By considering only questions with relatively few views, we remove outliers from our data and increase its quality. How many answers are we disregarding this way? In percentage terms, few. See how answers are distributed across the buckets.

Basically all answers are in the first 2 buckets anyway
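The bucketing itself is straightforward. A minimal sketch, assuming 100-view-wide buckets like those in the chart above:

```python
def view_bucket(views: int, width: int = 100) -> str:
    """Label a question's view count with its bucket, e.g. 250 -> '200-299'."""
    lo = (views // width) * width
    return f"{lo}-{lo + width - 1}"

def bucket_counts(view_counts):
    """Count questions per view bucket; in our data, almost all answered
    questions land in the first two buckets."""
    counts = {}
    for v in view_counts:
        b = view_bucket(v)
        counts[b] = counts.get(b, 0) + 1
    return counts
```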

Another doubt: does reputation affect score?

Another doubt is that users with higher reputation gain more votes than users with lower reputation. This is a reasonable concern because the more experienced a user is, the more effective they become at answering -- they've had time to learn. To remove any doubt, we've been considering only answers to recent questions (answered within 12 hours), and only recent activity.

Reputation vs. Average "Rep"

Surprisingly, this scatter diagram follows a log law with a decent fit:

average "Rep" ≈ a · ln(reputation) + b

This shows that different reputation groups see different reputation gains from answers -- because of experience, because of self-selection, because people vote more for higher-rep users, and so on -- and that these gains are predictable. We can use the fit to create reputation "boundaries" that cluster users into sensible groups. We can then use those groups to see if there are differences in behavior.
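A log law like this can be fit with an ordinary least-squares line on log-transformed reputation. The sketch below uses toy data generated from a known log law as a sanity check; the real scatter comes from per-user averages, not shown here:

```python
import numpy as np

def fit_log_law(reputation, avg_rep):
    """Least-squares fit of avg_rep ≈ a * ln(reputation) + b.
    Returns the coefficients (a, b)."""
    a, b = np.polyfit(np.log(reputation), avg_rep, 1)
    return a, b

# Toy data from a known log law (a=2.0, b=1.5) as a sanity check.
x = np.array([1, 10, 100, 1000, 10000])
y = 2.0 * np.log(x) + 1.5
```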

Group     Total Rep     Average "Rep"    # Users
Minimum   1             1.7053390662     416047
Low       2-200         7.9440031413     57252
Medium    201-2500      12.3344421118    6998
High      2501-30000    17.7609411130    219
AAA+++    30001+        25.3312332717    18

As you can see, users in different groups gain different average "Rep" per answer (so, likely, they behave differently). As the whole community breaks down into these categories, the user counts predictably go down as the rep goes up, but each category still contains enough users to be useful.

Users by group
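Mapping a user's total reputation onto these groups is then a simple lookup against the boundaries from the table. A minimal sketch:

```python
# Upper reputation bound for each group, taken from the table above;
# anything over the last bound falls into "AAA+++".
GROUP_BOUNDS = [
    (1, "Minimum"),
    (200, "Low"),
    (2500, "Medium"),
    (30000, "High"),
]

def rep_group(reputation: int) -> str:
    """Map a user's total reputation to its group label."""
    for upper, label in GROUP_BOUNDS:
        if reputation <= upper:
            return label
    return "AAA+++"
```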

Let us circle back: how does answering behavior correlate with these classes?

Answer sources by group

There are some evident patterns in this graph:

  • Higher-rep users answer more from the Questions List by Tag and the home page and, funnily enough, from question pages themselves (Questions Show).
  • Lower-rep users answer more from Newest Questions, from other sources (e.g. Google, other network sites), and from unknown sources.

Conclusions

These are the key takeaway points:

  • There are a few pages our users rely on most to find stuff to answer: the home page and the lists of newest questions by tag. Other sources are used more by inexperienced users, and progressively less as users learn how to use the site.
  • Users can be effectively categorized based on reputation, and our chosen boundaries give us 5 classes of users: minimal rep, low rep, medium rep, high rep and AAA+++ rep. These users behave in a quantitatively different manner and get different results on the site.
  • The site navigation gives great prominence to the "Unanswered" and "Questions" top navs, but these do not correspond to actual user behavior. If we restructured navigation so that the most useful lists were more easily accessible, it would help lower-rep users adopt the most effective navigation style more quickly.



Hi, I'm Marco Cecconi. I am the founder of Intelligent Hack, developer, hacker, blogger, conference lecturer. Bio: ex Stack Overflow core team, ex Toptal EM.
