Weekly Update: ML and Data Processing
December 21st: Recap
This is a deeper dive into what we have been up to since Milestone 3.
For specific changes, feel free to visit our GitHub, though note that much of the back-end work is local and research/testing based.
Key Outline:
- Fleshed out Query2Latex (Q2L) for future use in physical files (Q/A sheets, practice PDFs)
- Experimenting with local inference to accumulate a synthetic database from an initial RAG database (see the sketch after this list)
- Technical research and outreach on the best approach to ‘tuning’ for our purposes
- Set up a more flexible endpoint for testing purposes, with future scalability plans in mind
- Successfully implemented an early-stage on-site ‘IDE’ for student programming purposes
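As a side note on the synthetic-data point above, here is a minimal sketch of the general idea: export chunks from the initial RAG database, then have a locally hosted model write grounded question/answer pairs over each chunk. The chunk file layout, prompt wording, model name, and Ollama-style endpoint are assumptions for illustration, not our exact pipeline.

```python
# Rough sketch: accumulate a synthetic Q/A dataset from an existing RAG store
# using a locally hosted model. File layout, prompts and the local endpoint
# are placeholders, not the pipeline we actually run.
import json
from pathlib import Path

import requests

LOCAL_URL = "http://localhost:11434/api/generate"  # assumed Ollama-style server
MODEL = "llama3"                                    # placeholder model name

def local_generate(prompt: str) -> str:
    """Send one prompt to the local inference server and return its text."""
    resp = requests.post(
        LOCAL_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()

def load_chunks(db_dir: str):
    """Assume the initial RAG db can be exported as plain-text chunk files."""
    return [p.read_text() for p in sorted(Path(db_dir).glob("*.txt"))]

def build_synthetic_set(db_dir: str, out_path: str, per_chunk: int = 3) -> None:
    """Ask the model for a question, then an answer, per chunk, so every
    synthetic pair stays grounded in source text from the original db."""
    records = []
    for chunk in load_chunks(db_dir):
        for _ in range(per_chunk):
            question = local_generate(
                "Write one exam-style question answerable from this text:\n" + chunk
            )
            answer = local_generate(
                "Answer the question using only this text.\n"
                f"Text:\n{chunk}\nQuestion: {question}"
            )
            records.append({"context": chunk, "question": question, "answer": answer})
    Path(out_path).write_text(json.dumps(records, indent=2))

if __name__ == "__main__":
    build_synthetic_set("rag_chunks/", "synthetic_qa.json")
```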
Detailed Highlights:
Try KORA:
- IDE: Kahu integrated the new endpoints and model with his front-end, giving a large improvement in overall capabilities (a rough sketch of such an endpoint follows after this list)
- Improved accuracy and better handling of generated test cases for submitted code
- Working towards supporting more languages (Python, Java, JavaScript, C++ for now)
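For those curious, here is a minimal sketch of what a flexible multi-language "run code" endpoint could look like, assuming a Flask back-end and a subprocess runner with a timeout. The `/run` route, request fields and `RUNNERS` map are illustrative rather than KORA's actual implementation, and compiled languages like Java and C++ would need an extra build step.

```python
# Minimal sketch of a multi-language code-execution endpoint (Flask assumed).
import os
import subprocess
import tempfile

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical mapping from language key to (file extension, run command).
RUNNERS = {
    "py": (".py", ["python3"]),
    "js": (".js", ["node"]),
}

@app.route("/run", methods=["POST"])
def run_code():
    payload = request.get_json(force=True)
    lang = payload.get("lang", "py")
    source = payload.get("source", "")
    stdin_data = payload.get("stdin", "")

    if lang not in RUNNERS:
        return jsonify({"error": f"unsupported language: {lang}"}), 400

    ext, cmd = RUNNERS[lang]
    # Write the submission to a temp file and execute it with a timeout,
    # so a looping submission cannot hang the endpoint.
    with tempfile.NamedTemporaryFile(mode="w", suffix=ext, delete=False) as f:
        f.write(source)
        path = f.name
    try:
        proc = subprocess.run(
            cmd + [path],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=10,
        )
        return jsonify({
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "exit_code": proc.returncode,
        })
    except subprocess.TimeoutExpired:
        return jsonify({"error": "execution timed out"}), 408
    finally:
        os.unlink(path)
```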
ML Research and Data Processing:
- Conducted ongoing testing of technical approaches for fitting a model to a given dataset (especially when the dataset is small)
- Fine-tuning on a small given dataset appears, on the surface, to behave similarly to a large RAG approach (this is being tested)
- Improved the Q2L pipeline to run near fully automatically, ahead of later implementation
- Merging the cleaning and compilation processes; now leveraging a document-familiar model to handle parsing, checks, etc.
- Localisation of LaTeX templates for specific universities is planned (a VUW example is one possibility); a rough sketch of the template idea follows below
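To make the Q2L / template-localisation point concrete, here is a minimal sketch of the output step: rendering question/answer records into a LaTeX practice sheet, picking a preamble per university. The template strings, record shape and "vuw" key are placeholders for illustration, not the actual pipeline.

```python
# Minimal sketch: render Q/A records into a LaTeX practice sheet with a
# per-university preamble slot. Templates and keys are illustrative only.
TEMPLATES = {
    "default": r"\documentclass{article}" "\n" r"\usepackage{amsmath}",
    # Hypothetical localisation hook, e.g. VUW-specific headers or branding.
    "vuw": r"\documentclass{article}" "\n" r"\usepackage{amsmath}" "\n" r"\usepackage{fancyhdr}",
}

def render_sheet(records, uni="default"):
    """records: list of {"question": str, "answer": str} dicts."""
    body = []
    for i, rec in enumerate(records, start=1):
        body.append(f"\\section*{{Question {i}}}\n{rec['question']}\n")
        body.append(f"\\paragraph{{Answer}} {rec['answer']}\n")
    return (
        TEMPLATES.get(uni, TEMPLATES["default"])
        + "\n\\begin{document}\n"
        + "\n".join(body)
        + "\n\\end{document}\n"
    )

if __name__ == "__main__":
    sample = [{
        "question": "State the chain rule.",
        "answer": r"$(f\circ g)'(x) = f'(g(x))\,g'(x)$.",
    }]
    print(render_sheet(sample, uni="vuw"))
```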
This update is a reminder of the continued progress driving us behind the scenes as we work toward Milestone 4.
Stay tuned!