Introduction
Not too long ago I’ve been engaged on the domain-specific fine-tuning of a number of LLMs. The primary and perhaps an important a part of this job is to gather, scrape, and clear textual information to feed the LLM. I seen that my code was turning into messy with many repetitions, as a result of for each recognized supply I used to be writing a script from scratch which had loads of issues in frequent with different scripts in my codebase. I used to be not following the “Don’t repeat your self” (DRY) precept in any respect. Because of this I made a decision to implement the Template Design Sample and make my code base extra elegant and environment friendly.
The Template Design Sample
I gained’t repeat right here what a design sample is and the way we classify design patterns primarily based on their functionalities, since I’ve written many articles on the topic. In case you are involved in studying my earlier articles on this matter I’ll go away some references on the finish.
On this article, I’ll present you an instance associated to information processing. Let’s say that in our challenge we’ve got to take care of completely different sorts of information that we need to analyze. A few of these information are…