HOW AUDIO EMULATION WORKS IN QEMU:
==================================
Things are a bit tricky, but here's a rough description:

QEMUSoundCard: models a given emulated sound card.
SWVoiceOut:    models an audio output from a QEMUSoundCard.
SWVoiceIn:     models an audio input from a QEMUSoundCard.
HWVoiceOut:    models an audio output (backend) on the host.
HWVoiceIn:     models an audio input (backend) on the host.

Each voice can have its own settings for sample size, endianness, rate, etc.
The emulation of a given sound card typically does the following:

1/ Create a QEMUSoundCard object and register it with AUD_register_card().
2/ For each emulated output, call AUD_open_out() to create a SWVoiceOut object.
3/ For each emulated input, call AUD_open_in() to create a SWVoiceIn object.

Note that you must pass a callback function to AUD_open_out() and AUD_open_in();
more on this later.

Each SWVoiceOut is associated with a single HWVoiceOut, and each SWVoiceIn with
a single HWVoiceIn. However, several SWVoiceOuts can be associated with the same
HWVoiceOut (and likewise for SWVoiceIn/HWVoiceIn).
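
The many-to-one relationship between software and hardware voices can be
pictured with a toy model. This is only a sketch: the names ToyHWVoiceOut,
ToySWVoiceOut and toy_attach are hypothetical and are not the real QEMU
definitions or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real QEMU structs. */
typedef struct ToyHWVoiceOut {
    int freq;          /* host-side sample rate */
    int nb_sw_voices;  /* how many software voices feed this backend voice */
} ToyHWVoiceOut;

typedef struct ToySWVoiceOut {
    int freq;           /* emulated-side sample rate, may differ from hw */
    ToyHWVoiceOut *hw;  /* exactly one hardware voice per software voice */
} ToySWVoiceOut;

/* Attach a software voice to a hardware voice (many-to-one). */
static void toy_attach(ToySWVoiceOut *sw, ToyHWVoiceOut *hw)
{
    assert(sw->hw == NULL);  /* a software voice has a single target */
    sw->hw = hw;
    hw->nb_sw_voices++;
}
```

Two software voices with different rates can then share one hardware voice by
calling toy_attach() once for each of them with the same ToyHWVoiceOut.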
SOUND PLAYBACK DETAILS:
=======================
Each HWVoiceOut has the following:

- A fixed-size circular buffer of stereo samples, whose format is either float
  or int64_t per sample (depending on build configuration).

- A 'samples' field giving the (constant) number of sample pairs in the stereo
  buffer.

- A target conversion function, called 'clip()', that is used to read from the
  stereo buffer and write into platform-specific sound buffers (e.g.
  WinWave-managed buffers on Windows).

- A 'rpos' offset into the circular buffer, which tells where to read the next
  samples from the stereo buffer for the next conversion through 'clip'.

|<----------------- samples ----------------------->|
|                                                   |
|       rpos                                        |
        |
|_______v___________________________________________|
|       |                                           |
|       |                                           |
|_______|___________________________________________|

- A 'run_out' method that is called on each audio timer tick to tell the output
  backend to send samples from the stereo buffer to the host sound card/server.
  This method must also update 'rpos' and return the number of samples 'played'.
  A more detailed description of this process appears below.

- A 'write' method used to write a buffer of emulated sound samples from a
  SWVoiceOut into the stereo buffer. Currently all backends simply call the
  generic function audio_pcm_sw_write() to implement this. According to malc,
  the audio sub-system's original author, the method exists so that a backend
  can use a platform-specific function to do the same thing if one is
  available. (Similarly, all input backends have a 'read' method which simply
  calls audio_pcm_sw_read().)
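
To make the roles of 'samples', 'rpos', 'clip' and 'run_out' more concrete,
here is a minimal sketch of a backend draining the circular buffer. It assumes
int64_t samples and a 16-bit interleaved destination. All names (ToyHW,
toy_clip_one, toy_run_out) are hypothetical, and the saturation rule is an
illustrative choice, not necessarily what a real backend does.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SAMPLES 8  /* capacity of the circular buffer, in sample pairs */

typedef struct { int64_t l, r; } ToyStSample;  /* one stereo sample pair */

typedef struct {
    ToyStSample buf[TOY_SAMPLES];  /* circular stereo buffer */
    int samples;                   /* constant capacity of 'buf' */
    int rpos;                      /* next position to read from */
} ToyHW;

/* Illustrative 'clip' for one channel: saturate to the int16_t range. */
static int16_t toy_clip_one(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Illustrative 'run_out': convert 'live' sample pairs starting at 'rpos'
 * into 'dst' (room for 2 * live values), advance 'rpos' with wrap-around,
 * and return the number of sample pairs played. */
static int toy_run_out(ToyHW *hw, int live, int16_t *dst)
{
    assert(live >= 0 && live <= hw->samples);
    for (int i = 0; i < live; i++) {
        const ToyStSample *s = &hw->buf[hw->rpos];
        dst[2 * i]     = toy_clip_one(s->l);
        dst[2 * i + 1] = toy_clip_one(s->r);
        hw->rpos = (hw->rpos + 1) % hw->samples;  /* circular wrap */
    }
    return live;
}
```

In this sketch the backend always plays everything it is given. A real backend
may play fewer samples than requested, which is why 'run_out' reports the
number actually played.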
Each SWVoiceOut has the following:

- A 'conv()' function used to read sound samples from the emulated sound card
  and copy/mix them into the corresponding HWVoiceOut's stereo buffer.

- A 'total_hw_samples_mixed' field, which is the number of samples that have
  already been mixed into the target HWVoiceOut stereo buffer (starting from
  the HWVoiceOut's 'rpos' offset). NOTE: this is a count of samples in the
  HWVoiceOut stereo buffer, not of emulated hardware sound samples, which can
  have different properties (frequency, size, endianness).

                        ______________
                       |              |
                       |  SWVoiceOut2 |
                       |______________|
         ______________       |
        |              |      |
        |  SWVoiceOut1 |      |     thsm<N> := total_hw_samples_mixed
        |______________|      |                for SWVoiceOut<N>
               |              |
               |              |
        |<-----|------------thsm2-->|
        |      |                    |
        |<---thsm1-------->|        |
 _______|__________________v________|_______________
|       |111111111111111111|        v               |
|       |222222222222222222222222222|               |
|_______|___________________________________________|
        ^
        |         HWVoiceOut stereo buffer
       rpos

- A 'ratio' value, which is the ratio of the target HWVoiceOut's frequency to
  the SWVoiceOut's frequency, multiplied by (1 << 32), as a 64-bit integer.
  So, if the HWVoiceOut has a frequency of 44kHz and the SWVoiceOut has a
  frequency of 11kHz, then ratio will be (44/11*(1 << 32)) = 0x4_0000_0000.

- A callback provided by the emulated hardware when the SWVoiceOut is created.
  This function is used to mix the SWVoiceOut's samples into the target
  HWVoiceOut stereo buffer (it must also perform frequency interpolation,
  volume adjustment, etc.). The callback normally calls a helper function in
  the audio subsystem, AUD_write(), to do the mixing/volume adjustment from
  emulated hardware sample buffers.
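
The 'ratio' arithmetic can be checked with a few lines of C. This is a sketch
that assumes the ratio is obtained by shifting the hardware frequency left by
32 bits and dividing by the software frequency, which matches the worked
example above; the exact expression used in the audio code may differ. The
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 32.32 fixed-point ratio of the hardware rate to the software rate. */
static int64_t toy_ratio(int hw_freq, int sw_freq)
{
    assert(hw_freq > 0 && sw_freq > 0);
    return ((int64_t)hw_freq << 32) / sw_freq;
}

/* Number of hardware-rate samples corresponding to 'sw_samples' emulated
 * samples. Assumes counts small enough that the product fits in 64 bits. */
static int64_t toy_sw_to_hw_samples(int64_t sw_samples, int64_t ratio)
{
    assert(sw_samples >= 0 && ratio > 0);
    return (sw_samples * ratio) >> 32;
}
```

With these definitions, toy_ratio(44000, 11000) yields 0x400000000, and 100
emulated samples at 11kHz correspond to 400 samples in a 44kHz stereo buffer.
This is why 'total_hw_samples_mixed' and the emulated sample count must not be
confused.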
Here is a small diagram that explains it better:

    SWVoiceOut: emulated hardware sound buffers
          |
          |   (mixed through AUD_write(), called from the user-provided
          |    callback, which is itself called on each audio timer
          |    tick)
          v
    HWVoiceOut: stereo sample circular buffer
          |
          |   (sent through HWVoiceOut's 'clip' function, which is
          |    invoked from the 'run_out' method, also called on each
          |    audio timer tick)
          v
    backend-specific sound buffers

The function audio_timer() in audio/audio.c is called periodically and acts as
a pulse that drives sound buffer transfers and mixing. More specifically, for
audio output voices:

- For each HWVoiceOut, find the number of active SWVoiceOuts and the minimum of
  their 'total_hw_samples_mixed' values (samples already written to the
  buffer). We will call this minimum the number of 'live' samples in the stereo
  buffer.

- If 'live' is 0, call the callback of each active SWVoiceOut to fill the
  stereo buffer, if needed, then exit.

- Otherwise, call the 'run_out' method of the HWVoiceOut object. This will
  change the value of 'rpos' and return the number of samples played. Then the
  'total_hw_samples_mixed' field of every active SWVoiceOut is decremented by
  'played', and the callbacks are called to refill the stereo buffer.

It's important to note the following about the SWVoiceOut callback:

- It takes a 'free' parameter, which is the number of stereo sound samples that
  can be sent to the hardware stereo buffer (before rate adjustment, i.e. not
  the number of sound samples in the SWVoiceOut emulated hardware sound
  buffer).

- It must call AUD_write(sw, buff, count), where 'buff' points to emulated
  sound samples and 'count' is their count, which must be <= the 'free'
  parameter.

- The implementation of AUD_write() calls the 'write' method of the target
  HWVoiceOut, which in turn calls audio_pcm_sw_write(). That function performs
  standard rate/volume adjustment before mixing the converted samples into the
  target stereo buffer. It also increases the 'total_hw_samples_mixed' value of
  the SWVoiceOut.

- audio_pcm_sw_write() returns the number of sound sample *bytes* that have
  been mixed into the stereo buffer, and so does AUD_write().

So, in the end, we have the following pseudo-code:

    on every sound timer tick:
        for hw in list_HWVoiceOut:
            live = MIN([sw.total_hw_samples_mixed for sw in hw.list_SWVoiceOut])
            if live > 0:
                played = hw.run_out(live)
                for sw in hw.list_SWVoiceOut:
                    sw.total_hw_samples_mixed -= played

            for sw in hw.list_SWVoiceOut:
                free = hw.samples - sw.total_hw_samples_mixed
                if free > 0:
                    sw.callback(sw, free)
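
The pseudo-code above can be turned into a small simulation in C to watch the
counters evolve. This is a sketch under simplifying assumptions: only sample
counts are tracked (no audio data, no 'rpos'), the software and hardware
voices run at the same rate, and the backend plays at most a fixed number of
samples per tick. All Sim*/sim_* names are hypothetical. In particular,
sim_write() only stands in for the bookkeeping side effect of AUD_write().

```c
#include <assert.h>

#define SIM_MAX_SW 4

struct SimSW;
typedef void (*sim_callback_fn)(struct SimSW *sw, int free_samples);

typedef struct SimSW {
    int total_hw_samples_mixed;  /* mixed into the hw buffer, not yet played */
    sim_callback_fn callback;    /* provided by the emulated sound card */
} SimSW;

typedef struct {
    int samples;                 /* capacity of the stereo buffer */
    int nb_sw;                   /* number of attached software voices */
    SimSW *sw[SIM_MAX_SW];
    int max_per_tick;            /* what the fake backend can play per tick */
} SimHW;

/* Stand-in for the counter update performed on behalf of AUD_write(). */
static int sim_write(SimSW *sw, int count)
{
    sw->total_hw_samples_mixed += count;
    return count;
}

/* Example callback: the emulated card has up to 10 samples ready per tick,
 * but never writes more than 'free_samples'. */
static void sim_card_callback(SimSW *sw, int free_samples)
{
    int count = free_samples < 10 ? free_samples : 10;
    sim_write(sw, count);
}

/* Fake 'run_out': plays at most max_per_tick of the live samples. */
static int sim_run_out(const SimHW *hw, int live)
{
    return live < hw->max_per_tick ? live : hw->max_per_tick;
}

/* One audio timer tick for a single hardware voice. */
static void sim_timer_tick(SimHW *hw)
{
    assert(hw->nb_sw > 0 && hw->nb_sw <= SIM_MAX_SW);

    /* 'live' is the minimum mixed count over all software voices. */
    int live = hw->sw[0]->total_hw_samples_mixed;
    for (int i = 1; i < hw->nb_sw; i++) {
        if (hw->sw[i]->total_hw_samples_mixed < live)
            live = hw->sw[i]->total_hw_samples_mixed;
    }

    if (live > 0) {
        int played = sim_run_out(hw, live);
        for (int i = 0; i < hw->nb_sw; i++)
            hw->sw[i]->total_hw_samples_mixed -= played;
    }

    /* Let every voice refill whatever room it has left. */
    for (int i = 0; i < hw->nb_sw; i++) {
        SimSW *sw = hw->sw[i];
        int free_samples = hw->samples - sw->total_hw_samples_mixed;
        if (free_samples > 0)
            sw->callback(sw, free_samples);
    }
}
```

Calling sim_timer_tick() repeatedly on a SimHW with 'samples' set to 16 and
'max_per_tick' set to 4 shows the buffer filling up on the first ticks, after
which each voice can only write as many samples as the backend played.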
SOUND RECORDING DETAILS:
========================
Things are similar but in reverse order: the HWVoiceIn acquires sound samples
into its stereo sound buffer, and the SWVoiceIn objects must consume them as
soon as they can.