The NLP Journey: Chapter 2
In our last discussion, we covered the essential preprocessing steps required before passing data to a model. These foundational steps help clean and standardize text data. But what if you want to take your model to the next level? Today, we’re diving into some advanced NLP preprocessing techniques that can give your model an edge over others!
Imagine we’re working on a task to determine whether two given questions are duplicates. Simple preprocessing techniques might not be enough; we need advanced methods to capture nuanced patterns. Let’s explore how we can improve our text preprocessing with smarter feature engineering.
Expanding Contractions
Often, people use contractions in writing, like isn’t instead of is not, or won’t instead of will not. A good preprocessing step is to expand these words to their full form to standardize the text. For example:
- “Aren’t they coming?” → “Are not they coming?”
By replacing such words with their full form, we ensure better consistency across our dataset.
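One way to sketch this step is a small lookup table plus a regular expression. The mapping below is deliberately tiny and only illustrative, and the names `CONTRACTIONS` and `expand_contractions` are made up for this example; a real project would likely need a much fuller list and some care with ambiguous forms.

```python
import re

# Illustrative mapping only (an assumption for this sketch, not a complete list).
CONTRACTIONS = {
    "isn't": "is not",
    "aren't": "are not",
    "won't": "will not",
    "can't": "cannot",
    "don't": "do not",
}

# One pattern that matches any key as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace known contractions in `text` with their full form."""
    # Normalize the curly apostrophe so "isn’t" and "isn't" are treated alike.
    text = text.replace("\u2019", "'")

    def _replace(match: re.Match) -> str:
        word = match.group(0)
        expanded = CONTRACTIONS[word.lower()]
        # Keep a leading capital letter if the original word had one.
        if word[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(_replace, text)


print(expand_contractions("Aren’t they coming?"))  # Are not they coming?
```

Because this is a plain word-for-word replacement, the result is not always the most natural phrasing, but it is consistent, which is what matters when comparing two questions.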