Fine-tune an LLM for triplet extraction
Do you want to fine-tune an LLM for triplet extraction?
These findings from a recently published paper (link in the first comment) could save you a lot of time.
✅ Does the choice of code vs natural-language prompts significantly impact performance? When fine-tuning these small, open-weight LLMs, the choice between code and natural-language prompts has a limited impact on performance (the two styles are contrasted in the first sketch after this list).
✅ Does training fine-tuned models to include chain-of-thought (rationale) sections in their outputs improve KG construction (KGC) performance? It is ineffective at best and highly detrimental at worst for fine-tuned models. This performance decrease is observed regardless of the number of in-context learning (ICL) examples provided. Attention analysis suggests this might be because the model's attention is dispersed across redundant information when rationale is used. Without rationale lists occupying prompt space, the model's attention can focus directly on the ICL examples while extracting relations. (A rationale variant of the prompt appears in the first sketch below.)
✅ How do the fine-tuned smaller, open-weight LLMs compare to the CodeKGC baseline, which uses larger, closed-source models (GPT-3.5)? The selected lightweight LLMs significantly outperform the much larger CodeKGC baseline after fine-tuning. The best fine-tuned models improve on CodeKGC by as much as 15–20 absolute F1 points across datasets (a sketch of triple-level F1 scoring follows this list).
✅ Does model size matter for KGC performance when fine-tuning with a small amount of training data? Yes, but not in a straightforward way. The 70B-parameter versions yielded worse results than the 1B, 3B, and 8B models after the same small amount of training, implying that for KGC with limited fine-tuning, smaller models can outperform much larger ones.
✅ For instruction-tuned models without fine-tuning, does prompt language or rationale help? Without fine-tuning, code prompts generally yield the best results for both the code LLMs and Mistral, a natural-language model. In addition, rationale generally seems to help these models, with most of the best results obtained when rationale lists are included in the prompt.
✅ What do the errors made by the models suggest about the difficulty of the KGC task? They point to difficulty in predicting relations, entities, and their order, especially with specialized terminology or domain-specific knowledge, which remains challenging even after fine-tuning. Typical errors include adding superfluous adjectives to entities or mistaking entity instances for class names.
✅ What is the impact of the number of in-context learning (ICL) examples during fine-tuning? The greatest performance benefit comes from moving from 0 to 3 ICL examples. Additional examples beyond 3 do not yield a significant performance delta and can even hurt results. This further indicates that fine-tuning itself is the primary driver of performance gains, allowing the model to learn the task from the input text and target output. (An ICL-assembly sketch follows below.)
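To make the prompt-format distinction concrete, here is a minimal Python sketch of the two styles, plus the rationale variant. The schema, relation names, and example sentence are hypothetical, not taken from the paper; the code style only loosely follows the CodeKGC idea of expressing the schema as classes.

```python
# Hypothetical one-shot prompts for triplet extraction.
# Schema, relation names, and sentences are illustrative only.

SENTENCE = "Marie Curie was born in Warsaw."

# Code-style prompt (loosely CodeKGC-like): the schema is given as a
# Python class and the target output is a list of constructor calls.
code_prompt = f'''from dataclasses import dataclass

@dataclass
class Triple:
    head: str
    relation: str
    tail: str

# Extract all relation triples from the sentence.
# sentence: "{SENTENCE}"
triples = [
    Triple("Marie Curie", "born_in", "Warsaw"),
]
'''

# Natural-language prompt: the same task stated in plain English,
# with the target output as (head, relation, tail) lines.
nl_prompt = f'''Extract all relation triples from the sentence.
Sentence: "{SENTENCE}"
Triples:
(Marie Curie, born_in, Warsaw)
'''

# Rationale variant: a chain-of-thought list precedes the triples. Per the
# findings above, this helps instruction-tuned models but tends to hurt
# fine-tuned ones.
nl_prompt_with_rationale = f'''Extract all relation triples from the sentence.
Sentence: "{SENTENCE}"
Rationale:
- "Marie Curie" is a person and "Warsaw" is a city.
- "was born in" expresses a birthplace relation.
Triples:
(Marie Curie, born_in, Warsaw)
'''
```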
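On scoring: F1 comparisons like the one above are typically computed at the triple level, counting a prediction as correct only on an exact (head, relation, tail) match. A minimal sketch under that strict-matching assumption:

```python
# Strict triple-level F1: a predicted triple counts as correct only on an
# exact (head, relation, tail) match against the gold set.

def triple_f1(predicted, gold):
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)                       # true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [("Marie Curie", "born_in", "Warsaw")]
pred = [("Marie Curie", "born_in", "Warsaw"),
        ("Marie Curie", "born_in", "Poland")]  # spurious triple
print(triple_f1(pred, gold))                   # 0.666...
```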
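Finally, here is a minimal sketch of how the number of ICL demonstrations k can be varied when building fine-tuning samples. The function name, prompt layout, and demonstrations are hypothetical, not the paper's exact format.

```python
# Build one fine-tuning sample with k in-context demonstrations prepended.

def build_sample(demos, sentence, target, k=3):
    """Prepend k demonstrations, then the sentence to annotate."""
    parts = []
    for demo_sentence, demo_triples in demos[:k]:
        parts.append(f'Sentence: "{demo_sentence}"')
        parts.append(f"Triples: {demo_triples}")
    parts.append(f'Sentence: "{sentence}"')
    parts.append("Triples:")
    return {"prompt": "\n".join(parts), "completion": f" {target}"}

demos = [
    ("Ada Lovelace worked with Charles Babbage.",
     "(Ada Lovelace, collaborated_with, Charles Babbage)"),
    ("The Seine flows through Paris.",
     "(Seine, flows_through, Paris)"),
    ("Kafka wrote The Trial.",
     "(Kafka, author_of, The Trial)"),
]

# k=3 matches the sweet spot reported above; larger k adds prompt length
# without a reliable gain.
sample = build_sample(demos, "Turing was born in London.",
                      "(Turing, born_in, London)", k=3)
print(sample["prompt"])
```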