Project AIxandria

28 Feb 2025

"There is no political power without power over the archive"

I recently went to a physics teachers’ conference and found out that Colin Hopkins, a retired but still prominent and very active member of the community, has been working on something that I very much care about and want to see succeed. It is not only a fabulous idea that should have been done a long time ago, but also something that I would like to see implemented in every other subject.

For some background, Colin has been collecting past exam papers for longer than the education department has access to, and he has been sorting all of the questions from each of those papers out based on concepts/skills. For example with Newton’s Laws, he has further divided the subtopics down specifically to F=ma with constant acceleration, and coding each question for the question type (whether it is multiple choice, a calculation or an explanation question).

Colin is now at the stage where all the questions and answers are ready, and the next step he is working on is trying to find an easy way to let teachers convert their class results into a Guttman chart, analyze it for areas of weakness, and then link students to the relevant similar questions in his database for additional practice materials. I want to help him find an AI-assisted solution that can potentially automate part of this process.

Nice looking table that is factually incorrect

Output from ChatGPT after analysing a table of results involving 7 students and 25 questions.

I started by simply following his original idea, which was to give all of the data and instructions to ChatGPT to see what happens. Despite how many ideas I thought I had about what could work better, of course it was not so easy. Sometimes questions were missing and some students were not marked, or questions were marked but done incorrectly. By the end it even marked and provided results for students that didn't exist at all. Rubbish in, rubbish out.

The next step was to explore a few more advanced options, such as creating my own 'GPT' or using the API 'Assistants' in the hope of being able to create consistently useful outputs. I watched this video to get myself familiar with the general idea of what to do:

The idea was simple enough, and I had everything I needed to get started. I'll spare you the details, but in the end my long winding adventure into API keys, system instructions and rendering markdown formatting did not result in a useful solution. However, I did discover through the process that LLMs in general were still not reliable enough to handle large amounts of data accurately... and that it is very cheap to use if you know what to ask for. With the right setup, the current (2025) pricing of GPT-4o mini output is USD$0.60, measured by the millions of tokens, which will last for quite some time for most purposes. For comparison, the latest GPT-4.5 costs USD$150 per 1 million output tokens (250 times the price).

ChatGPT output of a badly formatted table

The output from ChatGPT is not necessarily wrong, but it might not be how one would expect a table to look like. Apparently this is perfectly fine Markdown formatting.

Eventually I decided to go back to basics and pursue the path of good old excel. With the help of AI again, I discovered that there was a way to excel formula everything I needed into the sheet, such that you only need to provide student responses and answers and it would take care of the rest. Neat!

Interestingly, the hurdle that took me the longest time to get over was the fact that Google Sheets does not allow for horizontal sorting with a formula. The Microsoft version of Excel in Office is slightly different and it has that function, so everything worked well after the migration. With the handy sort function on hand, the final step was to simply write out the formulas, format them nicely and they were good to go.

Seemingly well sorted Guttman chart with minimal effort!

Or so I thought! Life would be boring if things were so easy.

Turns out that as soon as Microsoft (or Google) forms receives a new response, the excel includes the new data by inserting a new row which shifts all other cells (including references) down. As long as something is outside of the automatically created OfficeForms.Table , even absolute references will get shifted down! I have never see this happen anywhere else, so there was a lot of testing and experimenting involved to find out what was happening.

The references to the rows get shifted down regardless of absolute referencing ($). In this case, row 6 and 7 should both show the number 5 (like in the E column) because that is what I actually have in the form responses, but it skipped my new response and continued down the number list (5, 6, 7...) like nothing happened.

Maintaining the references intact would normally be a good feature because everything stays the same as before and nothing gets messed up, except now I actually want the new responses to show up in my formulas. The only solution I could think of was to use a dynamic type of referencing, which is anything that does not involve naming the specific cells that you want. By using something like =OfficeForms.Table[Id] which will return the entire array from the Id column of the OfficeForms.Table table, any new form responses that get added to the table will show up as well. As a bonus side note: =OfficeForms.Table[#Data] is a special case that will just return only all rows and all columns in the table body, but it does not include the column headers.

Page updated

Google Sites

Report abuse

Project AIxandria

© Philomaths blog 2025