You are right: Flash Player does not support mixing incoming audio streams, and it does not support modifying the outgoing RTMFP stream either.
Streams can be mixed by your sound card, so if you want to mix them as you mentioned, you would have to play the streams to both the loudspeakers and the microphone input. However, this may produce acoustic echo, so it needs testing.
The best way to build a conference is to use an Asterisk conference application such as Confbridge or MeetMe, or other software that mixes call audio and video into one conference room.
Such a conference can be managed via an API, so you will be able to implement any logic your conference application needs.
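To give an idea of what managing the room via an API could look like, here is a rough sketch that only builds the text of Asterisk Manager Interface (AMI) actions for Confbridge. It assumes you manage the room over AMI; the room number "1000" and the channel name are made-up examples, and the helper ami_action is my own name, not part of any library. A real session would first have to log in to AMI and then send these strings over the manager socket.

```python
def ami_action(name, **fields):
    """Format one AMI action as a text block terminated by a blank line."""
    lines = [f"Action: {name}"]
    lines += [f"{key}: {value}" for key, value in fields.items()]
    # AMI separates headers with CRLF and ends a message with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical room number and channel name, for illustration only.
list_room = ami_action("ConfbridgeList", Conference="1000")
kick_user = ami_action(
    "ConfbridgeKick", Conference="1000", Channel="SIP/alice-00000001"
)

print(list_room)
print(kick_user)
```

With something like this in place, your application logic (who may join, who gets kicked, and so on) lives in your own code and only sends commands to the conference server.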
Moreover, the conference room mixes several streams into one audio/video stream, which significantly decreases bandwidth usage.
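A quick back-of-the-envelope sketch of the bandwidth saving, under my own assumptions: without a mixer every client would receive one stream from each other participant, with a mixer it receives a single mixed stream, and each audio stream costs about 64 kbps. The function name and the 64 kbps figure are illustrative, not measured values.

```python
def incoming_kbps(participants, mixed, kbps_per_stream=64):
    """Estimate the downstream bandwidth one client needs in a conference."""
    if participants < 2:
        raise ValueError("a conference needs at least two participants")
    # Mixed: one combined stream. Unmixed: one stream per other participant.
    streams = 1 if mixed else participants - 1
    return streams * kbps_per_stream


unmixed = incoming_kbps(5, mixed=False)
mixed = incoming_kbps(5, mixed=True)
print(unmixed, mixed)
```

The unmixed figure grows with every participant who joins, while the mixed figure stays the same regardless of room size.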
So the working scheme is: FlashphonerClient – FlashphonerServer – Asterisk(Confbridge).