In principle, an LLM like ChatGPT could be used to validate proofs. But for proof checking,
there's a huge difference between a system that works most of the time and one that is
reliable all of the time (and perhaps even provably reliable)!
LLMs are designed to give answers that sound plausible, and “Yes, that sounds right
to me” is a depressingly plausible answer to a long, complicated, and incorrect
proof.
More generally, even if you asked ChatGPT to translate the English-language proof into a
formal system like Coq or Lean and then had a computer verify the result, there's always
the question of hallucination: does the translation always preserve the meaning of the
input proof, or might the response to an incorrect input proof be a more plausible (i.e.,
corrected) Coq or Lean proof, or a correct proof for a different theorem?
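To make the contrast concrete, here is a minimal (hypothetical) illustration in Lean of why machine verification has no "sounds plausible" middle ground: the kernel either accepts a proof of the stated theorem or rejects it.

```lean
-- A correct statement with a correct proof: Lean's kernel accepts it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- If a faulty translation changed the statement, e.g.
--   theorem bad (a b : Nat) : a + b = b + b := Nat.add_comm a b
-- the proof would no longer type-check and Lean would report an error.
```

The danger described above is precisely that a hallucinating translator could produce the first, valid proof even when the English-language input argued for something like the second, false statement; the checker would then happily verify the wrong theorem.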
In contrast, a theory-driven system for checking natural language proofs is likely to be
less flexible, but also much more predictable, with explainable output.