Some DeepSeek-R1 and o3 (high) "Failures" on the ARC eval test