Two weeks ago, a UK software developer spotted fuel stations plotted in the Indian Ocean and an electric vehicle count that had apparently fallen by a factor of roughly 800 in a single year. One dataset was published by the UK government, the other by a major motoring organisation. Both had glaring errors that five minutes of validation would have caught. Neither was caught before publication, and the government file was still uncorrected a week after the errors were reported.

Dispatch

LONDON, 29 MARCH 2026 — The problem surfaces cleanest through a developer's eye. On Successful Software, a platform focused on data wrangling tools, the author (a practitioner, not a polemicist) documented two institutional failures that reveal a systemic collapse in data governance:

A quick plot of the latitude and longitude shows some clear outliers… some of these UK fuel stations are apparently located in the Indian and South Atlantic oceans. In at least one case, it looks like they got the latitude and longitude the wrong way around.

The same dataset showed fuel price ratios of 1,538:1 between the most expensive and cheapest litre — a number that defies physical reality and basic sanity-checking.
Successful Software, 29 March 2026 ^[1]


The UK government's fuel finder data — a downloadable CSV file intended as a public resource during Middle East fuel supply tensions — had been live for weeks. The developer reported the errors on 22 March. The government acknowledged receipt on 24 March. By 29 March, the same corrupted file remained published without correction. ^[1]
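
Both fuel-data errors are the kind a few lines of code can catch. What follows is a minimal Python sketch, assuming a rough UK bounding box and hypothetical station rows. The box limits, function names and sample values are illustrative assumptions and are not taken from the government file.

```python
# Rough bounding box for the UK. These limits are approximate assumptions.
UK_LAT = (49.0, 61.0)
UK_LON = (-9.0, 2.5)


def in_box(lat, lon):
    """True if the point falls inside the assumed UK bounding box."""
    return UK_LAT[0] <= lat <= UK_LAT[1] and UK_LON[0] <= lon <= UK_LON[1]


def check_location(lat, lon):
    """Classify one station as 'ok', 'swapped' or 'out_of_range'."""
    if in_box(lat, lon):
        return "ok"
    if in_box(lon, lat):
        # Only plausible once the two values are exchanged.
        return "swapped"
    return "out_of_range"


def price_ratio(prices):
    """Ratio between the most and least expensive positive price."""
    positive = [p for p in prices if p > 0]
    return max(positive) / min(positive)


# Hypothetical rows: (station, latitude, longitude, pence per litre).
stations = [
    ("A", 51.5, -0.12, 152.9),
    ("B", -1.9, 52.4, 154.9),   # latitude and longitude the wrong way round
    ("C", -30.0, 60.0, 0.1),    # nowhere near the UK, implausible price
]

for name, lat, lon, _ in stations:
    print(name, check_location(lat, lon))
print("max/min price ratio:", round(price_ratio([row[3] for row in stations])))
```

A row that passes the box test only after its coordinates are exchanged is a strong hint that latitude and longitude were entered the wrong way round. A price ratio in the thousands suggests a unit or decimal-place error somewhere in the column.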

The second case compounds the indictment. The RAC (Royal Automobile Club), a major UK motoring organisation, published a report on electric vehicles. Its opening graph showed battery electric vehicles on UK roads plummeting from 1.4 million in 2024 to 0.0017 million in 2025 — a collapse of 99.9 per cent. ^[1]

Did the number of Battery Electric Vehicles on the UK's roads suddenly drop from ~1.4 million in 2024 to ~0.0017 million in 2025? What happened to those ~1.4 million vehicles? I'm guessing that someone got their thousands and millions mixed up.
Successful Software, 29 March 2026 ^[1]

The error is elementary. Thousands and millions are not interchangeable. The actual EV fleet has grown, not evaporated. Yet the report circulated with this error embedded in its headline visual.
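
The arithmetic is quick to check: 1.4 / 0.0017 ≈ 820, a change of almost three orders of magnitude in one year. A year-on-year plausibility check along the following lines would have flagged it. The function name and the threefold threshold are assumptions chosen for illustration, not an established standard.

```python
def flag_jumps(series, max_factor=3.0):
    """Return (year, previous, current) for each implausible year-on-year change.

    `series` maps year -> value. A change is flagged when the larger value is
    more than `max_factor` times the smaller one, or when either value is not
    positive.
    """
    years = sorted(series)
    flagged = []
    for prev_year, year in zip(years, years[1:]):
        prev, cur = series[prev_year], series[year]
        if prev <= 0 or cur <= 0:
            flagged.append((year, prev, cur))
            continue
        factor = max(prev, cur) / min(prev, cur)
        if factor > max_factor:
            flagged.append((year, prev, cur))
    return flagged


# Values in millions, as described for the report's opening graph.
bev_millions = {2024: 1.4, 2025: 0.0017}
print(flag_jumps(bev_millions))
```

The check says nothing about which figure is wrong. It only forces a human to look before the graph goes out.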

A parallel concern emerges from Guardian reporting on data integrity in polling. In March 2026, researchers discovered that fraudulent church attendance data — generated by automated tools and paid survey participants — had contaminated polling datasets. The contamination was systematic enough to create false narratives about religious revival in Britain. ^[2]

Experts say paid participants are using automated tools to generate unreliable survey responses at scale.
The Guardian, 28 March 2026 ^[2]

The difference is instructive: the church data fraud was caught because it produced implausible results. EV numbers collapsing by 99.9 per cent should trigger the same alarm. It did not, because no one was checking.

What's Really Happening

Confirmed: UK government and RAC data both contained elementary errors that would fail any basic validation check (range tests, outlier detection, unit consistency). ^[1]

Confirmed: The government acknowledged the fuel data error but made no public correction within one week. ^[1]

Structural cause: Data submission is decentralised (fuel stations self-report; survey participants self-report), but validation is either absent or performed by people without mathematical literacy or domain knowledge. ^[1]

Confirmed: This pattern mirrors a broader trend in which AI-generated or AI-contaminated datasets circulate in training pipelines without human verification, creating feedback loops where errors compound. ^[2]

One thing missed: Institutions are publishing data without asking a single question: "Does this pass a sanity check?" No range validation. No outlier flagging. No peer review before release.
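
Outlier flagging, for instance, needs nothing beyond the standard library. The sketch below uses the modified z-score, which is based on the median and the median absolute deviation, with the commonly cited cutoff of 3.5. The sample prices are invented for illustration, and the cutoff should be treated as a starting assumption rather than a rule.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; flag anything that differs.
        return [v for v in values if v != med]
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical prices in pence per litre, with two obviously bad entries.
prices = [151.9, 152.9, 153.9, 154.9, 155.9, 0.1, 1549.0]
print(mad_outliers(prices))
```

Median-based statistics are used here because a single extreme value can drag the mean and standard deviation far enough to hide itself.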


The Real Stakes

The erosion of institutional data credibility has three cascading consequences.

First, decision-makers make bad calls. During an energy crisis, fuel price data that is off by orders of magnitude feeds into policy models. A government energy advisor using the RAC EV dataset would conclude that the electric vehicle transition has reversed catastrophically — prompting entirely wrong resource allocation. Confirmed: the fuel dataset was published during the current conflict in the Middle East when energy data directly shapes emergency planning. ^[1]

Second, trust in institutions evaporates. When citizens spot obvious errors in government data and nothing happens for a week, they conclude the institution is either incompetent or indifferent. Neither conclusion is recoverable. A fund manager reviewing UK energy infrastructure data will now discount government sources and rely instead on private vendors — increasing information asymmetry and cost. A policy advisor will hedge their recommendations because the underlying data is suspect.

Third, and most urgent: the AI feedback loop. The Guardian investigation documented how fraudulent survey data, once published, gets ingested into training datasets for language models. Those models then serve the fraudulent patterns back to new users, who treat the AI-generated output as validation of the original error. ^[2] The developer on Successful Software named this explicitly: "I fear we are heading for a future where LLMs generate data, which people don't bother to properly check. This data is then used [to] train LLMs. The error is then much harder to spot once it is served back without the original source by LLMs. A slop-apocalypse." ^[1]

This is not hypothetical. It is already happening. The contamination of church attendance data through AI-generated survey responses is the prototype. ^[2]

Industry Context

The root cause is not malice. It is institutional collapse of responsibility.

Data submission is decentralised: fuel stations enter their own prices; survey participants answer their own questions. This is unavoidable. But validation — the gate between raw input and publication — has been either eliminated or delegated to junior staff with no domain expertise or statistical training.
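
Such a gate need not be elaborate. As a rough sketch, assuming hypothetical field names (`lat`, `price_pence`) and illustrative plausibility limits, a release step could run every named check and refuse to publish if any row fails:

```python
def release_gate(rows, checks):
    """Run every named check over all rows.

    Returns (ok, failures), where failures lists (check name, failing row count).
    Nothing is published here; the caller decides what to do with the result.
    """
    failures = []
    for name, check in checks.items():
        bad = [row for row in rows if not check(row)]
        if bad:
            failures.append((name, len(bad)))
    return (not failures, failures)


# Illustrative checks. The limits are assumptions, not official thresholds.
checks = {
    "price_in_plausible_range": lambda r: 50.0 <= r["price_pence"] <= 500.0,
    "latitude_is_valid": lambda r: -90.0 <= r["lat"] <= 90.0,
}

rows = [
    {"lat": 51.5, "price_pence": 152.9},
    {"lat": 52.4, "price_pence": 0.1},   # implausibly cheap
]

ok, failures = release_gate(rows, checks)
if not ok:
    print("Release blocked:", failures)
```

Naming each check means a blocked release comes with a reason attached, which gives whoever owns the dataset something specific to fix.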

The RAC error (thousands vs. millions) suggests the graph was generated by someone who never looked at the output. The government's week-long non-response suggests no one owns the fuel dataset — it exists in limbo between departments, acknowledged but not claimed.

This is a staffing and accountability problem, not a technical one. Validation is not hard. It is boring. It is not rewarded. It is not celebrated. A data engineer who catches an error before publication gets no credit. A data engineer whose error reaches the public gets fired. The incentive structure is perverse.

Meanwhile, the pressure to publish at scale and speed is relentless. Government open data initiatives are measured by volume: how many datasets published, how many downloads. Quality is invisible in the metrics. RAC reports are measured by engagement: how many clicks, how many shares. A graph with an obvious error that triggers outrage performs better than a corrected one that nobody notices.

Impact Radar

Economic Impact: 7/10 — Energy and transport policy decisions rest on this data. Misallocation of resources during a supply crisis has direct cost. ^[1]

Geopolitical Impact: 3/10 — No cross-border implications in the immediate sources, though data integrity failures in energy reporting could affect international energy markets indirectly.

Technology Impact: 8/10 — The AI feedback loop means corrupted data now poisons training datasets at scale. ^[2] This is a systemic threat to model reliability.

Social Impact: 6/10 — Trust in institutions erodes when citizens spot obvious errors and nothing happens. ^[1]

Policy Impact: 7/10 — Policy-makers downstream of corrupted data will make worse decisions. ^[1]

Watch For

1. Does the UK government publish a corrected fuel dataset within 30 days, with a public statement on validation procedures? If not, it signals that data governance remains a non-priority even after public embarrassment. ^[1]

2. Do major news outlets (BBC, Financial Times, The Guardian) begin systematically fact-checking institutional datasets before citing them? If this becomes standard practice, it will force institutions to validate before publishing. If it does not, corrupted data will continue circulating unchecked.

3. Does any major AI lab (OpenAI, Anthropic, DeepMind) publish a framework for detecting and filtering contaminated training data? This is now urgent. The church data case proves the problem is live. ^[2]

Bottom Line

Institutions are publishing data without the most basic validation checks, then ignoring correction requests for weeks. This is not a technical problem — it is a governance failure. And because AI models now ingest this corrupted data at scale, the errors compound and become harder to trace. The fix is simple: hire people whose job is to ask "Does this make sense?" before anything goes live. The fact that this is not already standard practice is the real scandal.

---

References

^[1] Successful Software, "Stop Publishing Garbage Data, It's Embarrassing" (29 March 2026). URL: https://successfulsoftware.net/2026/03/29/stop-publishing-garbage-data-its-embarrassing

^[2] The Guardian, "'Our assumptions are broken': how fraudulent church data revealed AI's threat to polling" (28 March 2026). URL: https://www.theguardian.com/technology/2026/mar/28/how-fraudulent-church-data-revealed-ais-threat-to-polling
