TL;DR

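output.bin is a large raw file of little-endian int16 samples. It splits evenly into blocks of 200000 bytes (100000 samples each). Each block is a tone whose frequency encodes a 3-bit value a2 taken from a xorshift128 stream and whose phase encodes a bit a3 = mask_bit XOR plain_bit.

The a2 values leak enough output bits to recover the initial xorshift state by linear algebra over GF(2). After recovering it, simulate the generator to get mask_bit, decode plain_bit = a3 XOR mask_bit, and pack the bits into bytes to obtain the flag.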

Analysis

Output format (tone blocks)

The file is little-endian int16 samples split into fixed-size blocks:

1 block = 200000 bytes = 100000 samples

From the block generator (sub_1B0D0), each block is essentially:

$$
x[t] = \mathrm{env}(t)\cdot 0.125 \cos\!\left(\phi + 2\pi (a_2+1)\frac{t}{20}\right) + \mathrm{noise}(t)
$$

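where $t=0..99999$ and $\phi = 0$ if $a_3=0$, else $\phi=\pi$. The key detail is the denominator: the tone frequency is $(a_2+1)/20$ cycles per sample, so every possible tone is exactly periodic with period 20, and a block holds a whole number of periods because $100000 = 5000\cdot 20$.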

1. Extracting $(a_2, a_3)$ from one block

Let $x[t]$ be one block, and for $f\in\{1..8\}$ define:

$$
c_f = \sum_{t=0}^{99999} x[t] \cdot e^{-j 2\pi f t/20}
$$

Since the exponential depends only on $t \bmod 20$, fold the block into 20 sums first:

- $s[r] = \sum_{t\equiv r\ (\mathrm{mod}\ 20)} x[t]$ for $r=0..19$
- $c_f = \sum_{r=0}^{19} s[r]\cdot e^{-j 2\pi f r/20}$
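
The folding is exact, not an approximation. Writing $t = r + 20k$ with $r \in \{0..19\}$ and $k \in \{0..4999\}$ (the index $k$ is introduced here only for this derivation), the factor $e^{-j 2\pi f k}$ equals 1 for integer $f$, so:

$$
c_f = \sum_{r=0}^{19}\sum_{k=0}^{4999} x[r+20k]\, e^{-j 2\pi f (r+20k)/20}
    = \sum_{r=0}^{19} e^{-j 2\pi f r/20} \sum_{k=0}^{4999} x[r+20k]
    = \sum_{r=0}^{19} s[r]\cdot e^{-j 2\pi f r/20}
$$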

Then:

- $a_2 = \arg\max_{f\in\{1..8\}} |c_f| - 1$
- For the chosen $f=a_2+1$, the phase is only $0$ or $\pi$, so it shows up as a sign flip: $a_3 = 1$ iff $\Re(c_f) < 0$
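
As a minimal sketch of this extraction on a single block, the snippet below folds a block to 20 sums, correlates against $f=1..8$, and reads off (a2, a3). The helper `synth_block` is a hypothetical stand-in for the real generator: it assumes a constant envelope and Gaussian noise, which is not something the binary is known to do, and it exists only to give the extractor something to run on.

```python
import numpy as np

def extract_from_block(block: np.ndarray) -> tuple[int, int]:
    """Return (a2, a3) for one 100000-sample block."""
    # Fold to s[r], r = 0..19 (sum of all samples with t % 20 == r)
    s = block.astype(np.int64).reshape(-1, 20).sum(axis=0)
    r = np.arange(20)
    f = np.arange(1, 9)
    # c[k] is c_f for f = k + 1
    c = s @ np.exp(-2j * np.pi * np.outer(r, f) / 20)
    k = int(np.argmax(np.abs(c)))      # strongest tone, k = f - 1 = a2
    return k, int(c[k].real < 0)       # negative real part means phase pi

def synth_block(a2: int, a3: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical test signal: constant envelope plus Gaussian noise (assumption)."""
    t = np.arange(100_000)
    phi = np.pi if a3 else 0.0
    x = 0.125 * np.cos(phi + 2 * np.pi * (a2 + 1) * t / 20)
    x = x + 0.2 * rng.standard_normal(t.size)
    return np.clip(np.round(x * 32767), -32768, 32767).astype("<i2")

rng = np.random.default_rng(0)
print(extract_from_block(synth_block(5, 1, rng)))  # (5, 1)
```

Because the correlation runs over 5000 full periods, the tone term grows linearly with the block length while the noise term grows only with its square root, which is why the decision stays reliable even when the noise is larger than the tone.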

2. PRNG embedding

The PRNG is the standard xorshift128 (Marsaglia) with shift constants 11, 19 and 8:

t = x ^ (x<<11);
(x,y,z,w) = (y,z,w, w ^ (w>>19) ^ t ^ (t>>8));
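
For reference, a Python sketch of the same step with explicit 32-bit masking (Python integers do not wrap on their own). The seed values are the ones from Marsaglia's paper and serve only as a known-answer check; they are not the challenge's state.

```python
M32 = 0xFFFFFFFF

def xorshift128_step(x: int, y: int, z: int, w: int) -> tuple[int, int, int, int]:
    """One xorshift128 step on four 32-bit words; the new w is the output."""
    t = (x ^ (x << 11)) & M32
    return y, z, w, (w ^ (w >> 19) ^ t ^ (t >> 8)) & M32

# Seeds from Marsaglia's paper, used here only as a known-answer check
state = (123456789, 362436069, 521288629, 88675123)
state = xorshift128_step(*state)
print(state[3])  # 3701687786
```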

Per block, the program consumes the xorshift stream as follows:

- Run xorshift 3 times, take w&1 each time, and assemble a2 from the 3 bits (MSB $\rightarrow$ LSB).
- Run xorshift 1 more time and take mask_bit = w&1.
- Emit the phase bit a3 = mask_bit XOR plain_bit.

The audio noise generator uses a different RNG state, so it does not advance the xorshift stream used for a2/mask_bit.
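
The consumption order can be sketched as a small encoder/decoder pair. This is an illustration of the schedule described above, not the decompiled routine; the names `block_params`, `encode` and `decode` are made up for this example, and the seed is arbitrary.

```python
M32 = 0xFFFFFFFF

def step(s):
    """One xorshift128 step on a state tuple (x, y, z, w)."""
    x, y, z, w = s
    t = (x ^ (x << 11)) & M32
    return (y, z, w, (w ^ (w >> 19) ^ t ^ (t >> 8)) & M32)

def block_params(s):
    """Consume 4 outputs for one block: return (new_state, a2, mask_bit)."""
    a2 = 0
    for _ in range(3):                 # 3 outputs, MSB first
        s = step(s)
        a2 = (a2 << 1) | (s[3] & 1)
    s = step(s)                        # 4th output masks the plaintext bit
    return s, a2, s[3] & 1

def encode(s, plain_bits):
    """Return one (a2, a3) pair per plaintext bit."""
    blocks = []
    for p in plain_bits:
        s, a2, mask = block_params(s)
        blocks.append((a2, mask ^ p))
    return blocks

def decode(s, blocks):
    """Invert encode() given the same initial state."""
    bits = []
    for a2, a3 in blocks:
        s, pred_a2, mask = block_params(s)
        if pred_a2 != a2:
            raise ValueError("a2 mismatch: wrong state or wrong schedule")
        bits.append(a3 ^ mask)
    return bits

seed = (1, 2, 3, 4)                    # arbitrary example state
plain = [0, 1, 0, 1, 0, 0, 1, 1]
assert decode(seed, encode(seed, plain)) == plain
```

The a2 check inside `decode` is the same sanity check used later in the exploit: if the recovered state or the step schedule is off by even one step, it fails almost immediately.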

3. Recovering the 128-bit initial state (GF(2))

xorshift128 is linear over GF(2): every step only XORs shifted copies of the state words. Therefore every observed output bit w&1, after any number of steps, is an XOR of a fixed subset of the 128 initial state bits.

From each block we learn 3 output bits via a2, so at least 43 blocks are needed ($3\cdot 43 = 129 \ge 128$). With enough blocks we collect 128 independent equations and solve for the initial state with Gaussian elimination over GF(2).
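
To make the linearity concrete, the sketch below builds the equation row for "output bit after k steps" by brute force: coefficient j is simply that output bit when only state bit j is set. It assumes the state is packed as one 128-bit integer with x in bits 0..31 and w in bits 96..127; the helper names are illustrative.

```python
M32 = 0xFFFFFFFF

def step(s: int) -> int:
    """xorshift128 step on a packed 128-bit state (x = bits 0..31, w = bits 96..127)."""
    x, y, z, w = s & M32, (s >> 32) & M32, (s >> 64) & M32, (s >> 96) & M32
    t = (x ^ (x << 11)) & M32
    new_w = (w ^ (w >> 19) ^ t ^ (t >> 8)) & M32
    return y | (z << 32) | (w << 64) | (new_w << 96)

def out_bit(s: int, k: int) -> int:
    """LSB of w after k steps from state s."""
    for _ in range(k):
        s = step(s)
    return (s >> 96) & 1

def equation_row(k: int) -> int:
    """128-bit mask of the initial state bits that XOR into output k."""
    row = 0
    for j in range(128):
        row |= out_bit(1 << j, k) << j
    return row

def parity(v: int) -> int:
    return bin(v).count("1") & 1

# The row predicts the output bit of an arbitrary state
s0 = 0x0123456789ABCDEFFEDCBA98765432100F1E2D3C4B5A69788796A5B4C3D2E1F0
for k in range(1, 9):
    assert parity(equation_row(k) & s0) == out_bit(s0, k)
```

Rebuilding each row from scratch costs 128·k steps, which is fine for a demonstration; pushing a selector row through precomputed matrix powers gives the same rows far more cheaply.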

In practice:

- Build the 128 $\times$ 128 transition matrix of one xorshift step over GF(2) by stepping each basis state (a single set bit) once.
- For every extracted a2 bit, add an equation row · state0 = bit, where row is the w&1 selector pushed through the matrix power for that step.
- Solve the resulting system in 128 unknowns.
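
The elimination itself is compact when each equation is stored as an integer bitmask. Below is a toy-sized sketch of one way to do it (8 unknowns instead of 128), keeping one pivot row per leading bit; `solve_gf2` and the example system are illustrative and not taken from the challenge.

```python
def parity(v: int) -> int:
    return bin(v).count("1") & 1

def solve_gf2(equations: list[tuple[int, int]], nbits: int) -> int:
    """Solve row . x = rhs over GF(2); rows are bitmasks over nbits unknowns."""
    piv: list[tuple[int, int] | None] = [None] * nbits
    for row, rhs in equations:
        while row:
            i = row.bit_length() - 1           # leading bit of the current row
            if piv[i] is None:
                piv[i] = (row, rhs)            # new pivot for bit i
                break
            row ^= piv[i][0]                   # cancel the leading bit
            rhs ^= piv[i][1]
        else:
            if rhs:                            # reduced to 0 = 1
                raise ValueError("inconsistent system")
    if any(p is None for p in piv):
        raise ValueError("rank deficient")
    sol = 0
    for i in range(nbits):                     # pivot i only involves bits <= i
        row, rhs = piv[i]
        if rhs ^ parity(row & sol):
            sol |= 1 << i
    return sol

secret = 0b10110010
rows = [(0xFF >> i) << i for i in range(8)] + [0b10101010]  # last row is redundant
eqs = [(r, parity(r & secret)) for r in rows]
print(bin(solve_gf2(eqs, 8)))  # 0b10110010
```

Redundant equations are harmless: they reduce to zero and only act as a consistency check, which is useful here because one misread a2 value would otherwise silently corrupt the solution.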

Exploit

- Read output.bin as int16 and split into 100000-sample blocks.
- For each block, compute the 20-sample folded sums $s[r]$ and extract (a2, a3) by checking $f=1..8$.
- Build the GF(2) linear system from all a2 bits and recover the initial xorshift128 state.
- Simulate xorshift128:
  - verify a2 matches for every block (sanity check)
  - compute plain_bit = a3 XOR mask_bit
- Pack the plain bits into bytes to obtain the flag.

from dataclasses import dataclass

import numpy as np

MASK128 = (1 << 128) - 1

# State packing used throughout: bits 0..31 = x, 32..63 = y, 64..95 = z, 96..127 = w.


def _xorshift128_step_words(x: int, y: int, z: int, w: int) -> tuple[int, int, int, int]:
    # One xorshift128 step on four 32-bit words; the new w is the output word.
    t = (x ^ ((x << 11) & 0xFFFFFFFF)) & 0xFFFFFFFF
    x, y, z = y, z, w
    w = (w ^ (w >> 19) ^ t ^ (t >> 8)) & 0xFFFFFFFF
    return x, y, z, w


@dataclass(frozen=True)
class XorShift128Matrix:
    rows: list[int]  # 128 rows, each 128-bit int
    cols: list[int]  # 128 cols, each 128-bit int


def build_xorshift128_matrix() -> XorShift128Matrix:
    # Column j is the image of basis state e_j under one step.
    cols: list[int] = []
    for bit in range(128):
        x = y = z = w = 0
        if bit < 32:
            x = 1 << bit
        elif bit < 64:
            y = 1 << (bit - 32)
        elif bit < 96:
            z = 1 << (bit - 64)
        else:
            w = 1 << (bit - 96)

        nx, ny, nz, nw = _xorshift128_step_words(x, y, z, w)
        col = (
            (nx & 0xFFFFFFFF)
            | ((ny & 0xFFFFFFFF) << 32)
            | ((nz & 0xFFFFFFFF) << 64)
            | ((nw & 0xFFFFFFFF) << 96)
        )
        cols.append(col)

    # Transpose the column representation into rows.
    rows = [0] * 128
    for j, col in enumerate(cols):
        tmp = col
        while tmp:
            lsb = tmp & -tmp
            i = lsb.bit_length() - 1
            rows[i] |= 1 << j
            tmp ^= lsb

    return XorShift128Matrix(rows=rows, cols=cols)


def row_mul(row: int, mat_rows: list[int]) -> int:
    # Row vector times matrix: XOR the matrix rows selected by the set bits of `row`.
    out = 0
    tmp = row
    while tmp:
        lsb = tmp & -tmp
        idx = lsb.bit_length() - 1
        out ^= mat_rows[idx]
        tmp ^= lsb
    return out


def vec_mul(mat_cols: list[int], vec: int) -> int:
    # Matrix times column vector: XOR the matrix columns selected by the set bits of `vec`.
    out = 0
    tmp = vec
    while tmp:
        lsb = tmp & -tmp
        idx = lsb.bit_length() - 1
        out ^= mat_cols[idx]
        tmp ^= lsb
    return out


def mat_square_rows(mat_rows: list[int]) -> list[int]:
    return [row_mul(r, mat_rows) for r in mat_rows]


def mat_square_cols(mat_cols: list[int]) -> list[int]:
    return [vec_mul(mat_cols, c) for c in mat_cols]


def precompute_pows_rows(base_rows: list[int], max_delta: int) -> list[list[int]]:
    # pows[k] holds M^(2^k) in row form, enough to cover any delta <= max_delta.
    bits = max(1, max_delta.bit_length())
    pows = [base_rows]
    for _ in range(1, bits):
        pows.append(mat_square_rows(pows[-1]))
    return pows


def precompute_pows_cols(base_cols: list[int], max_delta: int) -> list[list[int]]:
    # Same as precompute_pows_rows, but in column form.
    bits = max(1, max_delta.bit_length())
    pows = [base_cols]
    for _ in range(1, bits):
        pows.append(mat_square_cols(pows[-1]))
    return pows


def apply_pows_row(row: int, delta: int, pows_rows: list[list[int]]) -> int:
    # Compute row * M^delta by binary decomposition of delta.
    bit = 0
    tmp = delta
    out = row
    while tmp:
        if tmp & 1:
            out = row_mul(out, pows_rows[bit])
        tmp >>= 1
        bit += 1
    return out


def apply_pows_vec(vec: int, delta: int, pows_cols: list[list[int]]) -> int:
    # Compute M^delta * vec by binary decomposition of delta.
    bit = 0
    tmp = delta
    out = vec
    while tmp:
        if tmp & 1:
            out = vec_mul(pows_cols[bit], out)
        tmp >>= 1
        bit += 1
    return out


def extract_a2_a3(path: str) -> tuple[np.ndarray, np.ndarray]:
    seg_samples = 100_000
    samples = np.fromfile(path, dtype="<i2")
    if samples.size % seg_samples != 0:
        raise ValueError(f"unexpected file size: {samples.size} samples not divisible by {seg_samples}")
    num_segments = samples.size // seg_samples

    segs = samples.reshape(num_segments, seg_samples)
    # Fold each block to 20 sums: sums20[n, r] = sum of samples with t % 20 == r.
    sums20 = segs.reshape(num_segments, seg_samples // 20, 20).sum(axis=1, dtype=np.int64)

    r = np.arange(20, dtype=np.float64)[:, None]
    f = np.arange(1, 9, dtype=np.float64)[None, :]
    weights = np.exp(-1j * 2 * np.pi * r * f / 20.0)  # (20, 8)

    coefs = sums20 @ weights  # (segments, 8)
    mags = np.abs(coefs)
    idx = np.argmax(mags, axis=1).astype(np.int64)  # 0..7 => a2
    chosen = coefs[np.arange(num_segments), idx]
    a3 = (chosen.real < 0).astype(np.int8)  # negative real part => phase pi
    return idx, a3


def build_a2_observations(
    a2: np.ndarray,
    *,
    preamble_segments: int = 8,  # not used in this excerpt
    noise_steps: int = 0,  # 0 because the noise uses a separate RNG state
) -> list[tuple[int, int]]:
    # Returns (step index, observed bit) pairs; step counts xorshift calls from state0.
    obs: list[tuple[int, int]] = []
    step = 0
    for v in a2.tolist():
        bits = [(v >> 2) & 1, (v >> 1) & 1, v & 1]  # MSB -> LSB
        for b in bits:
            step += 1
            obs.append((step, int(b)))
        step += 1  # the mask_bit step is consumed but not observed
        step += noise_steps
    return obs


def solve_state_from_observations(
    obs: list[tuple[int, int]],
    *,
    matrix: XorShift128Matrix,
    preamble_segments: int = 8,  # not used in this excerpt
) -> int:
    if not obs:
        raise ValueError("no observations")

    # Largest gap between consecutive observed steps.
    max_delta = obs[0][0]
    for (s0, _), (s1, _) in zip(obs, obs[1:]):
        max_delta = max(max_delta, s1 - s0)

    pows_rows = precompute_pows_rows(matrix.rows, max_delta)

    equations: list[tuple[int, int]] = []
    current_step = 0
    row = 1 << 96  # output is LSB of w after each step
    for step, bit in obs:
        delta = step - current_step
        row = apply_pows_row(row, delta, pows_rows)
        current_step = step
        equations.append((row, bit))

    piv = [0] * 128  # store augmented rows (row | rhs<<128) keyed by pivot bit index
    for row, rhs in equations:
        r = row & MASK128
        b = rhs & 1
        while r:
            i = r.bit_length() - 1
            if piv[i]:
                r ^= piv[i] & MASK128
                b ^= (piv[i] >> 128) & 1
            else:
                piv[i] = r | (b << 128)
                break
        else:
            # Row reduced to zero: the right-hand side must be zero as well.
            if b:
                raise ValueError("inconsistent system (bad extraction or wrong step schedule)")

    if any(p == 0 for p in piv):
        missing = sum(1 for p in piv if p == 0)
        raise ValueError(f"rank deficient: missing {missing} pivots")

    # Back-substitution from bit 0 upward: pivot i only involves bits <= i.
    sol = 0
    for i in range(128):
        row_aug = piv[i]
        row = row_aug & MASK128
        rhs = (row_aug >> 128) & 1
        parity = (row & sol).bit_count() & 1  # int.bit_count() needs Python 3.10+
        val = rhs ^ parity
        if val:
            sol |= 1 << i

    return sol


def decode_plain_bits(
    a2: np.ndarray,
    a3: np.ndarray,
    *,
    state0: int,
    matrix: XorShift128Matrix,
    preamble_segments: int = 8,  # not used in this excerpt
    noise_steps: int = 0,
) -> list[int]:
    num_segments = a2.size
    if a3.size != num_segments:
        raise ValueError("a2/a3 length mismatch")
    max_delta = int(noise_steps)
    pows_cols = precompute_pows_cols(matrix.cols, max_delta)

    state = state0 & MASK128
    out_bits: list[int] = []
    for seg_idx in range(num_segments):
        # Three steps reproduce a2 (sanity check against the extracted value).
        bits = []
        for _ in range(3):
            state = vec_mul(matrix.cols, state)
            bits.append((state >> 96) & 1)
        pred_a2 = (bits[0] << 2) | (bits[1] << 1) | bits[2]
        if pred_a2 != int(a2[seg_idx]):
            raise ValueError(f"a2 mismatch at segment {seg_idx}: got {pred_a2}, expected {int(a2[seg_idx])}")

        # Fourth step yields mask_bit; plain_bit = a3 XOR mask_bit.
        state = vec_mul(matrix.cols, state)
        mask_bit = (state >> 96) & 1
        out_bits.append(int(mask_bit ^ int(a3[seg_idx])))

        state = apply_pows_vec(state, int(noise_steps), pows_cols)

    return out_bits


def bits_to_bytes(bits: list[int], *, msb_first: bool) -> bytes:
    if len(bits) % 8 != 0:
        raise ValueError("bit length not divisible by 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i : i + 8]
        v = 0
        if msb_first:
            for b in chunk:
                v = (v << 1) | (b & 1)
        else:
            for j, b in enumerate(chunk):
                v |= (b & 1) << j
        out.append(v)
    return bytes(out)

[SECCON CTF 14] gyokuto writeup