From Wikipedia, the free encyclopedia
Latest revision as of 00:03, 25 October 2025
- What is the purpose of this guideline?
- To establish a ground rule against using AI tools to create articles from scratch.
- This guideline covers so little! What’s the point?
- The point is to have something. Instead of trying to get consensus for the perfect guideline on AI, which doesn’t exist, we have been in practice pursuing a piecemeal approach: restrictions on AI images, AI-generated comments, etc. This is the next step. Eventually, we might merge them all into one to create one guideline on AI use. This proposal is designed to be as simple and straightforward as possible so as to easily gain consensus.
- Why doesn’t this guideline explain or justify itself?
- Guidelines aren’t information pages. We have plenty of information already about why using LLMs is usually a bad idea at WP:LLM.
Cremastra (talk · contribs) 20:43, 24 October 2025 (UTC)
Should this proposal be accepted as a guideline? (Please consider reading the FAQ above before commenting.)
Is this going to solve everything about AI? Of course not. But the current state of affairs of no real formal P&Gs on LLMs in articles – just WP:LLM, which is an essay – is a huge weakness. We need guidelines to point to, not essays. Let’s write out what our practical ground rules are: for starters, that using a chatbot to write a Wikipedia article for you is not a good plan.
We could spend ages writing up one page that reflects all our views on AI. We tried that, and it failed. So I’m trying this instead.
This proposal doesn’t say a lot of things. It doesn’t say anything about using LLMs to find sources or rephrase paragraphs or even write a new section for you. It’s just about writing a new article from scratch with a chatbot, which I think there’s consensus against. Thank you. Cremastra (talk · contribs) 20:56, 24 October 2025 (UTC)
- Yes as proposer. It’s about time we articulated this: LLMs may have their uses, and there are plenty of edge cases, but using ChatGPT to generate your article for you is not one of those. We have CSD G15, but that’s a “practical” thing. We need the theory, the guidelines, to back it up. Statements of principle. Something very simple we can point new users to, that anyone can understand. This is that. Cremastra (talk · contribs) 20:56, 24 October 2025 (UTC)
- Where’s the WP:RFCBEFORE discussion? voorts (talk/contributions) 21:38, 24 October 2025 (UTC)
- Not ready for RFC, workshop first Can we like, workshop this before going to an RFC? I think the community would consider a total ban on LLMs to generate article content, per both Kowal2701 and Chaotic Enby below (note: added) at this point – your proposal is just a ban on LLMs to create new articles, which leaves a gap. There are also implications around enforcement that I was hoping to have basic guidelines for before launching into this. NicheSports (talk) 21:44, 24 October 2025 (UTC)
- Yes, it leaves a gap. That’s the point.
- We keep trying to build a perfect house, but it’s hard to gain consensus for that here. All I’m trying to do is build the foundations, so we have something. Is that really that hard? Cremastra (talk · contribs) 22:51, 24 October 2025 (UTC)
- If you want to use the RfC process, yes, it is that hard. RFCBEFORE is very clear that you shouldn’t bring something to the community for an RfC until it’s been discussed. voorts (talk/contributions) 23:01, 24 October 2025 (UTC)
- For a proposal to be accepted as a guideline, it is a clear status quo that it must be done via an RfC. Not all proposals, especially simple ones, need to be “workshopped” in advance. If you want to remove the RfC tag, go ahead. I do understand the RFCBEFORE argument. But a discussion is needed that simply asks whether we accept this as a statement of principle. Cremastra (talk · contribs) 23:05, 24 October 2025 (UTC)
- There is no simple LLM policy because they can be used in many ways and identifying their use is often subjective. Let’s discuss this. It doesn’t have to take forever. I think we could have a robust policy proposal ready within weeks that would have a decent chance of community approval NicheSports (talk) 23:12, 24 October 2025 (UTC)
- That’s why this isn’t trying to be a catch-all. It’s making a narrow statement of principle to back up current practices. Cremastra (talk · contribs) 23:13, 24 October 2025 (UTC)
- There’s no rush. This is an issue that the community has been grappling with for several years. The way we make reasoned decisions is by discussing things, not trying to blast our individual proposals through consensus-building processes. voorts (talk/contributions) 23:13, 24 October 2025 (UTC)
- I’m not trying to rush this. I’m just proposing a narrow statement of principle to back up current practices. This doesn’t need extensive discussion prior to the main discussion. We’re discussing it right now and will probably do so for at least a month. If I was rushing I’d be proposing some all-encompassing guideline that tries to cover everything AI, which is not this. This is only a proposal, and now the community can decide whether or not it’s a good one. What I object to is needlessly slowing things down with pre-discussions instead of just going ahead and having the main discussion. A narrow, two-sentence proposal doesn’t have much scope for “workshopping” anyway. Cremastra (talk · contribs) 23:14, 24 October 2025 (UTC)
- No. I would explain the reasons, but I would use more time on that than what has been used to write this “proposal”. Really, just 5 lines? Even an AI-generated guideline against AIs would have done it better. Cambalachero (talk) 21:46, 24 October 2025 (UTC)
- So the primary characteristic you look for in guidelines is length, rather than concision, efficacy, or accuracy? Good to know. I cordially direct you to WP:KISS. Cremastra (talk · contribs) 23:29, 24 October 2025 (UTC)
- Let’s back out of this and workshop it first – This topic needs guidance, but the current proposal feels half-baked, incomplete, and arbitrarily narrow. Tazerdadog (talk) 21:50, 24 October 2025 (UTC)
- @Tazerdadog Did you read the FAQ?
- This really isn’t very hard. We want to build a house. I’m trying to build a foundation. We really don’t need to waste breath “workshopping” every damn thing. For once, let’s do something simple.
- Yes, it is incomplete!!! Again, did you read the FAQ? See my comment here. Let’s actually start building the house instead of just talking about it!! Cremastra (talk · contribs) 22:54, 24 October 2025 (UTC)
- @Cremastra: are you amenable to workshopping this? voorts (talk/contributions) 21:51, 24 October 2025 (UTC)
- Plug to do this at WT:AIC? I have been mulling ideas around for a while, sounds like now may be the time
- Policy options: total ban or LLM use restricted to an autopatrolled-type user right
- Enforcement: need to address valid concerns about having AI “snipe hunts”. Criteria for determining AI use should be as objective as possible, i.e. G15 criteria, repeated content verification failures, or unambiguous evidence that a user is not writing in their own voice. To what extent do we want to include guidelines like this in a proposed policy? What about blocking policy? 1LLM/2LLM? First block duration? Lots to consider.
- Supporting data: do we want to present any data along with a policy proposal? Ideas here are mostly statistics from AfC and NPP declines, that will be the easiest to quantify
- NicheSports (talk) 21:59, 24 October 2025 (UTC)
- The one data point I have in mind is that 10% of AfC declines (3,026 out of 30,409) invoke the LLM decline criterion. This gives us a lower bound on the proportion of drafts with AI issues, since articles can have less clear-cut issues besides the one or two stated in the decline reason, and drafts deleted through G15 do not show up in the tracking categories. Chaotic Enby (talk · contribs) 23:33, 24 October 2025 (UTC)
- Yep, makes sense. I would also like to chart total AfC submissions by outcome over time. If we see a bump in AfC submissions and/or decline rate post-2022 while the active user base hasn’t really changed, then we can probably conclude something about LLM usage beyond just the # of AfC drafts declined while invoking the LLM criteria. I unfortunately don’t know how to find this data NicheSports (talk) 23:40, 24 October 2025 (UTC)
- Since drafts that last got edited more than six months ago don’t show up on Category:Declined AfC submissions, it gives us what is effectively a six-month rolling window of declined drafts. While categories don’t keep a history of their page count, I can check if earlier versions have been archived on Wayback Machine to give us some past data. Chaotic Enby (talk · contribs) 23:59, 24 October 2025 (UTC)
- As for G15s, wouldn’t admins have access to that data through the deletion log? NicheSports (talk) 23:44, 24 October 2025 (UTC)
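(The Wayback Machine check above could be scripted against the public CDX API. A rough sketch, assuming the documented CDX JSON output where the first row names the fields; extracting the category’s page count from each archived snapshot would still need HTML scraping matched to the page’s markup:)

```python
from urllib.parse import urlencode

CDX = "https://web.archive.org/cdx/search/cdx"
CATEGORY = "https://en.wikipedia.org/wiki/Category:Declined_AfC_submissions"

def snapshot_query(page_url: str, from_ts: str = "2022", to_ts: str = "2025") -> str:
    """Build a CDX query listing archived snapshots of a page in a date range."""
    params = {
        "url": page_url,
        "from": from_ts,
        "to": to_ts,
        "filter": "statuscode:200",   # successful captures only
        "fl": "timestamp,original",
        "output": "json",
    }
    return f"{CDX}?{urlencode(params)}"

def snapshot_timestamps(rows: list[list[str]]) -> list[str]:
    """CDX JSON output is a list of rows; the first row is the header."""
    if not rows:
        return []
    header, *data = rows
    i = header.index("timestamp")
    return [row[i] for row in data]

# Example on a hand-made response; a real one comes from fetching the URL:
sample = [
    ["timestamp", "original"],
    ["20230115120000", CATEGORY],
    ["20240601080000", CATEGORY],
]
print(snapshot_timestamps(sample))  # ['20230115120000', '20240601080000']
```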
- Voorts: What needs workshopping? It’s a very simple proposal that states a principle that I think many Wikipedians will agree with.
- Our collective addiction to “workshopping” and “discussing” everything to death has paralyzed our decision-making skills.
- I have presented a proposal for a guideline. It’s up to the community to say “yes, this represents our principles” or “no, this doesn’t represent our principles”. Cremastra (talk · contribs) 22:53, 24 October 2025 (UTC)
- It’s an extreme proposal that might be susceptible to many exceptions. Does your proposal ban using AI spell correction? Does it ban turning a list of well-researched, bulleted notes into an article using an LLM and then checking the LLM output to ensure everything is correct? If so, why? Also, what does this proposal accomplish? We already delete articles that have objective indicia that they were generated with an LLM and not checked and we already block editors that indiscriminately rely on LLMs. voorts (talk/contributions) 22:58, 24 October 2025 (UTC)
- “Does your proposal ban using AI spell correction? Does it ban turning a list of well-researched, bulleted notes into an article using an LLM and then checking the LLM output to ensure everything is correct? If so, why?” As stated in the FAQ, it does not. It is in fact very narrow, focusing only on articles wholly generated by AI from scratch. Being against that is not an extreme position.
- “Also, what does this proposal accomplish? We already delete articles that have objective indicia that they were generated with an LLM and not checked and we already block editors that indiscriminately rely on LLMs.” Because we should have a statement of position as well. We do these things, but we don’t base them on any underlying guideline. In contrast, we do plenty of practical things that all rely fundamentally on, say, WP:NPOV or WP:V. One might say: “why do we need WP:NPOV? We’re already committed to the neutral point of view in our articles; we have cleanup tags and noticeboards for it.” Well, those practical things have to rest on some kind of foundation. Cremastra (talk · contribs) 23:04, 24 October 2025 (UTC)
- Also RE your comment here, I just restarted WP:PROJPOL. I’m not opposed to short PAGs. I’m opposed to jumping the gun. voorts (talk/contributions) 23:04, 24 October 2025 (UTC)
- Yes in principle and support the urgency, but I’d like to see it workshopped a little more. I love concise guidelines; this only needs a couple more sentences and to point to WP:LLM#Usage. I know this is a big issue over at AFC, but what’s the substantial difference between banning using AI to generate new articles from scratch but not content from scratch in general? Imo what we need is a guideline banning uncritically using LLMs to generate content from scratch, and have an essay which gives editors tips on how they can use AI to help them (i.e. personally understand the topic, find sources, spell/grammar check, etc.). Kowal2701 (talk) 22:27, 24 October 2025 (UTC)
- Yes, although I agree with Kowal2701 that this can also be extended to generating new content from scratch in general. This guideline doesn’t specify any detection mechanisms, but that makes sense, as that isn’t the purpose of a guideline. In the same way as we can agree to ban undisclosed paid editing or sockpuppetry without having to make an exhaustive list of all the ways they could be spotted. These will naturally be developed and evolve alongside AI technologies themselves, and we obviously shouldn’t accuse people based on “tells” without any concrete evidence. Chaotic Enby (talk · contribs) 23:17, 24 October 2025 (UTC)
- “This guideline doesn’t specify any detection mechanisms, but that makes sense, as that isn’t the purpose of a guideline. In the same way as we can agree to ban undisclosed paid editing or sockpuppetry without having to make an exhaustive list of all the ways they could be spotted.” Yes, this, exactly. Thank you. Cremastra (talk · contribs) 23:18, 24 October 2025 (UTC)
- To note: this is my current position based on the signal-to-noise ratio of existing AI models. If, in 2 or 5 years, new models were capable of generating Wikipedia articles of acceptable quality with no significant issues, I would be open to revising the guideline to something like “Large language models should not be used to generate new Wikipedia articles without human review.” Chaotic Enby (talk · contribs) 23:28, 24 October 2025 (UTC)
- Yes, why not. I like this unique approach of building a policy/guideline page from the ground up and expanding it further based on talk page discussions. The time and effort that will be used to workshop this RfC are better spent discussing potential additions to the page instead once it becomes a guideline. Some1 (talk) 23:27, 24 October 2025 (UTC)
- Yes in principle. I suggest changing the page title to Wikipedia:Creating articles with large language models. My main worry is that if this WP:PROPOSAL succeeds, this appropriately narrow page may both grow significantly over time and also get worse with every expansion. “Don’t do the whole article in LLM” is good. “Here are 67 different details that might help you spot a violation” is not good. WhatamIdoing (talk) 00:03, 25 October 2025 (UTC)


