Paste your text, pick a voice, and press Speak. Microsoft and system voices drive the lip-sync from real word-boundary events when the browser provides them.
Google/Chrome voices may not send those events. If none arrive, the mouth still moves, using an estimated timing model as a fallback.
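
To illustrate what an estimated timing model could look like, here is a minimal sketch. It is not the code this page actually runs: the function name `estimateWordTimings`, the `WordTiming` shape, and the default constants (60 ms per character, 80 ms pause between words) are all assumptions chosen for illustration. The idea is to give each word a duration proportional to its length and scale everything by the speech rate.

```typescript
// Hypothetical shape for one estimated word slot (not from the original app).
interface WordTiming {
  word: string;
  startMs: number;    // offset from the start of speech
  durationMs: number; // how long the mouth stays "open" for this word
}

// Estimate when each word is spoken when no real boundary events arrive.
// msPerChar and pauseMs are assumed tuning constants, not measured values.
function estimateWordTimings(
  text: string,
  rate: number = 1,
  msPerChar: number = 60,
  pauseMs: number = 80
): WordTiming[] {
  // Guard against a zero or negative rate so we never divide by zero.
  const safeRate = rate > 0 ? rate : 1;
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);

  const timings: WordTiming[] = [];
  let cursor = 0;
  for (const word of words) {
    // Longer words take proportionally longer; a faster rate shortens them.
    const durationMs = (word.length * msPerChar) / safeRate;
    timings.push({ word, startMs: cursor, durationMs });
    cursor += durationMs + pauseMs / safeRate;
  }
  return timings;
}
```

A caller could schedule one mouth movement per entry, for example with `setTimeout` at each `startMs`, and drop the estimate as soon as a real boundary event shows up.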