Why AI is a problem on Stack Overflow

There's a scene in Fellini's 8½ where the director character is reading a negative review of the movie he is working on (which is also, somehow, the movie we are watching). The critic speaks the review in voice-over, and the voice-over cuts off abruptly when the director tosses the paper away.


This is a companion discussion topic for the original entry at https://jlericson.com/2023/06/26/problem_ai.html

Like others, I’m trying to understand why the company is having a hard time getting community consistently right.

It occurs to me that, at its core, the problem is one of signals. I believe it’s the same fundamental problem faced by social media companies. The end users’ behavior and communication is a signal. Another set of signals is from shareholders and advertisers. Company decisions are profit-driven, meaning they will tend to align with the signals from shareholders/advertisers, and not tend to align (notice I did not say “tend not to align” - a crucial difference) with signals from end users (in the case of SO, this is askers, answerers, and moderators: the “community”).

I have no doubt that the company - and every company with a similar business model, where the product is seen by users as the free thing-of-value-to-them, and by the company and those who pay the bills as a system providing access to those users for a modest fee - understands very well that keeping the users happy and engaged is foundational to the success of the business. But the financial signals are immediate and clear, while the community signals are indirect, delayed, and often messy. The human mind, no matter how brilliant, just can’t compete with such a setup, try as it might. And I don’t doubt the leadership of such companies does try. It’s just that the business model is really great for initial growth, and really great at having that momentum pull the company past the initial golden age and on into a vicious cycle of decline.

If that sounds overly pessimistic, I am hopeful that there’s another business-model innovation yet to blossom: one beyond simple pay-for-value, beyond the misaligned-signals mess we have now. Will we find it? Is it a pipe dream? Time will tell.


There are multiple instances of “ChatGTP” in the post.


This is exactly right in my experience. It certainly doesn’t help that there’s always complaining about something. If you don’t have the right filters, it’s easy to imagine that nothing will make the community happy and so you might as well pursue business goals instead.

The other issue is that financial problems represent an existential crisis for the company, but the community doesn’t feel the effect immediately. So it’s hard to tell the difference between a company greedy for growth and a company desperate to pay the bills. It really screws things up if the company is constitutionally unable to be honest with its community.

The idea of using drafts to try to estimate the volume of potentially AI-generated posts is an interesting one, even though it has problems, and I wonder how much better the company’s response to ChatGPT could have been if they had been proactively adding similar indirect metrics to measure behaviors that might indicate changes in sites’ health. Of course then the company would have to define “healthy” and that may be difficult because it does not necessarily overlap with “generates measurable revenue growth”.

It makes me think of agile’s “Definition of Done”:

Definition of done is a simple list of activities (writing code, coding comments, unit testing, integration testing, release notes, design documents, etc.) that add verifiable/demonstrable value to the product.

Keeping an auditable list of what you mean when you say a task is “done” (or in this instance when you say a site is “healthy”) helps get people on the same page when measuring progress. Is a site improving, stagnating, or declining? Stack Exchange already has some metrics for getting a site through public beta on Area 51, but for some reason those metrics lose visibility once a site graduates.

Is it more important to welcome new users than to discourage the posting of AI-generated junk? How can you have that conversation if everyone in the room has a different idea of what a healthy site looks like? Also, keeping metrics in front of a community might have the benefit of engaged users working to improve those statistics.

We used to do regular site evaluations that served some of the same goals as the Area 51 stats. Unfortunately the process was labor intensive and didn’t seem to have much of an impact so I wouldn’t argue for going back to it. (It’s also hard to imagine the company concerning itself with the idea.) But I do think there’s a need for some form of quality evaluation.

(I’m on vacation at the moment, but I’m thinking this might be something I can start for my new community at College Confidential. I wonder if Codidact has considered that?)


I would rather see ongoing metrics presented in an easy-to-understand way that a good portion of the community (not just 20k+ users) could view whenever they got curious, instead of a formal check-in or review. Maybe some of the most popular SEDE queries would be a good place to start looking for inspiration. (I’m writing this from an SE perspective, but I think some of it translates to other communities and networks.)

I’d like to see some friendly rivalry among sites around keeping metrics in the positive zone. Leaderboards are very motivating for some folks. If the metrics are designed well, I should be able to compare ELL to Stack Overflow. For example, what percentage of the “good” questions asked this fortnight have more than one upvoted answer? If “good” is defined in a way that normalizes out activity differences, it shouldn’t matter too much how big a site is in terms of active users and traffic. “What percentage of the active community with access to certain privileges (close votes, for example) use them with some amount of frequency?” might be another metric, where “active” is measured against the median activity for a site rather than some fixed “visits per week” threshold. (I don’t know if those are good metrics for community health or not; they’re just off the top of my head.)
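As a very rough sketch, the first of those metrics could be prototyped as a SEDE query along these lines. The Score >= 3 cutoff for “good” and the Score > 0 test for “upvoted” are placeholder assumptions on my part (the Posts table only exposes net score; true upvote counts would need the Votes table), so the thresholds would need tuning per site:

```sql
-- Sketch: share of recent "good" questions with more than one upvoted answer.
-- Thresholds are placeholders; "upvoted" is approximated as net Score > 0.
WITH RecentGoodQuestions AS (
  SELECT Id
  FROM Posts
  WHERE PostTypeId = 1                                -- questions
    AND CreationDate >= DATEADD(DAY, -14, GETDATE())  -- the last fortnight
    AND Score >= 3                                    -- assumed "good" cutoff
),
AnswerCounts AS (
  SELECT q.Id, COUNT(a.Id) AS UpvotedAnswers
  FROM RecentGoodQuestions q
  LEFT JOIN Posts a
    ON a.ParentId = q.Id
   AND a.PostTypeId = 2  -- answers to those questions
   AND a.Score > 0       -- net-positive score as a proxy for "upvoted"
  GROUP BY q.Id
)
SELECT
  COUNT(*)                                            AS GoodQuestions,
  SUM(CASE WHEN UpvotedAnswers > 1 THEN 1 ELSE 0 END) AS WithMultipleUpvotedAnswers,
  100.0 * SUM(CASE WHEN UpvotedAnswers > 1 THEN 1 ELSE 0 END)
        / NULLIF(COUNT(*), 0)                         AS PercentWithMultipleUpvotedAnswers
FROM AnswerCounts;
```

Because the output is a percentage rather than a raw count, it should be at least roughly comparable between a small site and Stack Overflow.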

There could be a bunch of metrics with more being added all the time. Maybe metrics get a test run to see how meaningful they are and get decommissioned after a while if they don’t contribute much value. Each individual metric doesn’t have to be a KPI (key performance indicator) to have some value when looking at how different aspects of the community are changing over time.