Weekly Update: ML and Data Processing

December 21st: Recap

This is a deeper dive into what we have been up to since Milestone 3.
For the full list of changes, feel free to visit our GitHub, though note that much of the back-end work is local and research/testing based.


Key Outline:

  • Fleshed out Query2Latex (Q2L) for future use in generating physical files (Q/A sheets and practice material as PDFs)
  • Experimented with local inference to accumulate a synthetic database from an initial RAG database (see the sketch after this list)
  • Technical research and outreach on the best approach to ‘tuning’ for our use case
  • Set up a more flexible endpoint for testing purposes, with future scalability in mind
  • Implemented an early-stage on-site ‘IDE’ for student programming
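
As a rough illustration of the synthetic-database idea (a sketch only; the local_generate() helper and the JSONL layout are placeholders rather than our actual setup): the chunks of an existing RAG store are fed to a locally hosted model, which writes one question/answer pair per chunk.

```python
import json

def local_generate(prompt: str) -> str:
    """Placeholder for a call to the locally hosted model; wire this to your
    own inference server."""
    raise NotImplementedError

def build_synthetic_db(chunks, out_path="synthetic_qa.jsonl"):
    """Turn each chunk of the existing RAG store into one generated Q/A record."""
    with open(out_path, "w") as f:
        for chunk in chunks:
            prompt = (
                "Write one exam-style question and a worked answer based only "
                "on the following course material:\n\n" + chunk
            )
            record = {"source": chunk, "qa": local_generate(prompt)}
            f.write(json.dumps(record) + "\n")
```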

Detailed Highlights:

Try KORA:

  • IDE: Kahu wired the new endpoints and model into his front-end, for a large improvement in overall capability
    • Better accuracy and better handling of generated test cases for the submitted code
    • Working towards supporting more languages (Python, Java, JavaScript, C++ for now); a rough sketch of a run endpoint follows this list
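
To give a sense of what the execution side of such an IDE could look like, here is a minimal sketch of a run endpoint that dispatches submitted code by language. This is illustrative only: the /run route, the RUNNERS table, the 10-second limit and the Flask setup are assumptions, not Kahu's actual endpoints.

```python
import os
import subprocess
import tempfile

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical mapping of supported languages to a file suffix and run command.
RUNNERS = {
    "py": (".py", ["python3"]),
    "js": (".js", ["node"]),
}

@app.post("/run")
def run_code():
    payload = request.get_json(force=True)
    lang = payload.get("lang", "py")
    source = payload.get("source", "")
    if lang not in RUNNERS:
        return jsonify({"error": f"unsupported language: {lang}"}), 400
    suffix, cmd = RUNNERS[lang]
    # Write the submission to a temp file so the interpreter can run it.
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as f:
        f.write(source)
        path = f.name
    try:
        # Time-boxed execution so runaway student code cannot hang the endpoint.
        result = subprocess.run(cmd + [path], capture_output=True,
                                text=True, timeout=10)
        return jsonify({"stdout": result.stdout,
                        "stderr": result.stderr,
                        "exit_code": result.returncode})
    except subprocess.TimeoutExpired:
        return jsonify({"error": "execution timed out"}), 408
    finally:
        os.remove(path)
```

Extending language support from here would mostly be a matter of adding entries for Java and C++, which additionally need a compile step before execution.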

ML Research and Data Processing:

  • Conducted ongoing testing of technical approaches for fitting a model to a given dataset (especially small ones)
    • On the surface, fine-tuning on a small dataset behaves similarly to a large RAG approach; this is being tested
  • Improved the Q2L pipeline to run near fully automatically, ready for later integration
    • Merged the cleaning and compilation steps; now leveraging a document-familiar model to handle parsing, checks, etc.; a rough sketch of the template-and-compile step follows this list
  • Localising LaTeX templates to specific universities is planned, with VUW as a possible first example
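
For a rough picture of the compile end of Q2L (a sketch under assumptions: the template, the qa_pairs structure and the output paths are placeholders, not the real pipeline), generated question/answer pairs are dropped into a LaTeX template and built into a PDF with pdflatex.

```python
import subprocess
from pathlib import Path

# Placeholder article template; a university-specific version would swap this out.
TEMPLATE = r"""\documentclass{article}
\usepackage{amsmath}
\begin{document}
\section*{Practice Questions}
%(body)s
\end{document}
"""

def build_practice_pdf(qa_pairs, out_dir="q2l_out"):
    """qa_pairs: list of (question, answer) strings already cleaned upstream."""
    body = "\n".join(
        r"\subsection*{Question %d}" % (i + 1) + "\n" + q +
        "\n\n" + r"\textbf{Answer:} " + a + "\n"
        for i, (q, a) in enumerate(qa_pairs)
    )
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    (out / "practice.tex").write_text(TEMPLATE % {"body": body})
    # Compile; requires a local pdflatex install. A second pass (for references)
    # is omitted here for brevity.
    subprocess.run(["pdflatex", "-interaction=nonstopmode", "practice.tex"],
                   cwd=out, check=True)
    return out / "practice.pdf"
```

Localising to a specific university would then mostly mean swapping in that institution's template (title page, styling, course codes) rather than changing the pipeline itself.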

This update is a reminder of the continued progress happening behind the scenes as we work toward Milestone 4.

Stay tuned!