The recent Roguelike Celebration game design conference was organized as a text-based MUD. Participants attended in their browser by walking around a roguelike map. When they entered a room with an ongoing talk, they would see an embedded livestream of the speakers.
I really like this idea, but it seems like it could be better if they didn't limit the livestreams to particular speakers. You could keep the livestreams in the conference rooms but let everyone livestream by proximity when outside the designated rooms, like a virtual proximity-based chat roulette. Being able to walk up to two people already in conversation, see them and hear what they're saying, and have the option to join in seems to be within the bounds of the tools we have today.
We already have something like this in modern video games (see, for example, Phasmophobia), but MMOs always limit the number of participants on a single server because they have to render nice graphics and keep everyone in sync. Our videoconferencing tools seem able to handle 40+ participants in a virtual conference without issue. So the problem comes down to checking proximity, making connections, and deciding when to enable audio. It's probably not an easy problem, but it's worth investigating, and I wish I had more time to learn about this technology because low-hanging fruit abounds.
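To make those three pieces a bit more concrete, here is a rough Python sketch of how they might fit together. Everything in it is my own assumption rather than how the conference or any real videoconferencing tool works: the `Participant` type, the function names, the Euclidean distance on map tiles, and the linear audio falloff are all hypothetical, and actually opening a media connection is left out entirely.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Participant:
    # Hypothetical model: a name plus a tile position on the map.
    name: str
    x: int
    y: int


def nearby_pairs(participants, radius):
    """Check proximity: return every pair of names within `radius` tiles."""
    pairs = set()
    # Brute-force comparison of all pairs, O(n^2) in the number of people.
    for a, b in combinations(participants, 2):
        if math.hypot(a.x - b.x, a.y - b.y) <= radius:
            pairs.add(frozenset((a.name, b.name)))
    return pairs


def diff_connections(current, desired):
    """Make connections: work out which links to open and which to close."""
    to_open = desired - current
    to_close = current - desired
    return to_open, to_close


def audio_gain(distance, radius):
    """Decide on audio: assumed linear falloff, silent at or beyond `radius`."""
    if radius <= 0:
        return 0.0
    return max(0.0, 1.0 - distance / radius)


if __name__ == "__main__":
    people = [
        Participant("alice", 0, 0),
        Participant("bob", 1, 1),
        Participant("carol", 10, 10),
    ]
    desired = nearby_pairs(people, radius=3)
    to_open, to_close = diff_connections(set(), desired)
    print("open:", [sorted(pair) for pair in to_open])
    print("close:", [sorted(pair) for pair in to_close])
```

Comparing every pair is the simplest thing that could work and is probably fine at the scale of a few dozen people. A bigger map would likely want some kind of spatial grid so that each person is only compared with their neighbours, but that is a guess I haven't tested.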