GPT-style language models operate on tokens rather than plain text. A token is an integer that represents a small piece of text.
Tokens are on average 4-5 characters long, and many common words are a single token. Some characters, such as emoji and characters from non-English scripts, may be made up of multiple tokens.
Different models use different tokenizers. A list of which tokenizer each model uses can be seen