It sounds like you’re asking for uncompressed audio? That meets all of your list...

dale_glass · on Oct 1, 2022

We support that already, yup. But it never hurts to see if there's something better than that out there.

orlp · on Oct 4, 2022

You can bootleg your own fast lossless codec by delta-encoding the raw PCM to get a lot of zeros and then feeding it through an off-the-shelf fast compressor like snappy/lz4/zstandard/etc. It won't get remotely close to the dedicated audio algorithms, but I wouldn't be surprised if you cut your data size by a factor of 2-4 at essentially no CPU cost compared to raw uncompressed audio.

muziq · on Oct 5, 2022

You’ve not done this before, have you?

orlp · on Oct 13, 2022

I haven't, but now I have. I took https://opus-codec.org/static/examples/samples/music_orig.wa... from https://opus-codec.org/examples/. Then I wrote the following snippet of Python code:

    from scipy.io import wavfile
    import numpy as np
    import zstd

    sampling_rate, samples = wavfile.read(r'data/bootleg-compress/music_orig.wav')
    orig = samples.tobytes()

    naive_compressed = zstd.ZSTD_compress(orig)
    deltas = np.diff(samples, prepend=samples.dtype.type(0), axis=0) # Per-channel deltas.
    compressed_deltas = zstd.ZSTD_compress(deltas.ravel()) # Interleave channels and compress.

    decompressed_deltas = np.frombuffer(zstd.ZSTD_uncompress(compressed_deltas), dtype=samples.dtype)
    decompressed = np.cumsum(decompressed_deltas.reshape(deltas.shape), axis=0, dtype=samples.dtype)
    assert np.array_equal(samples, decompressed)

    print(len(orig))
    print(len(naive_compressed))
    print(len(compressed_deltas))

giving:

    17432876
    15518973
    12817602

Looks like my initial estimation of 2-4 was way off (when FLAC achieves ~2 this should've been a red flag), but you do get a ~1.36x reduction in space at basically memory read speed.
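As a quick sanity check on those ratios, here is a small sketch that uses only the three sizes printed above. The dictionary keys and the `ratio` helper are names made up for this illustration.

```python
# Sizes in bytes, copied from the output above.
sizes = {
    "raw": 17432876,         # len(orig)
    "zstd": 15518973,        # len(naive_compressed)
    "delta+zstd": 12817602,  # len(compressed_deltas)
}


def ratio(name):
    """Compression ratio relative to the raw PCM size (hypothetical helper)."""
    return sizes["raw"] / sizes[name]


for name in ("zstd", "delta+zstd"):
    print(f"{name}: {ratio(name):.2f}x")
```

So plain zstd on the raw samples only manages about 1.12x, and the delta step is what lifts it to about 1.36x.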

Using an encoding for second-order differences that stores -127 <= d <= 127 in 1 byte and everything else in 3 bytes (a 255 escape byte followed by the 16-bit value, for 16-bit input audio), I got a ratio of ~1.50 for something that can still operate entirely at RAM speed:

    orig = samples.tobytes()
    deltas = np.diff(samples, prepend=samples.dtype.type(0), axis=0)      # Per-channel deltas.
    delta_deltas = np.diff(deltas, prepend=samples.dtype.type(0), axis=0) # Per-channel second-order differences.

    # Many small differences, encode almost all 1-byte differences using 1 byte,
    # using 3 bytes for larger differences. Interleave channels and encode.
    small = np.sum(np.abs(delta_deltas.ravel()) <= 127)
    bootleg = np.zeros(small + (len(delta_deltas.ravel()) - small) * 3, dtype=np.uint8)
    i = 0
    for dda in delta_deltas.ravel().tolist():  # Python ints, so dda + 2**15 cannot overflow int16.
        if -127 <= dda <= 127:
            bootleg[i] = dda + 127
            i += 1
        else:
            bootleg[i] = 255
            bootleg[i + 1] = (dda + 2**15) % 256
            bootleg[i + 2] = (dda + 2**15) // 256
            i += 3

    compressed_bootleg = zstd.ZSTD_compress(bootleg)
    print(len(compressed_bootleg))

    decompressed_bootleg = zstd.ZSTD_uncompress(compressed_bootleg)
    result = []

    i = 0
    while i < len(decompressed_bootleg):  # Decode from the decompressed bytes only.
        if decompressed_bootleg[i] < 255:
            result.append(decompressed_bootleg[i] - 127)
            i += 1
        else:
            lo = decompressed_bootleg[i + 1]
            hi = decompressed_bootleg[i + 2]
            result.append(256*hi + lo - 2**15)
            i += 3

    decompressed_delta_deltas = np.array(result, dtype=samples.dtype).reshape(delta_deltas.shape)
    decompressed_deltas = np.cumsum(decompressed_delta_deltas, axis=0, dtype=samples.dtype)
    decompressed = np.cumsum(decompressed_deltas, axis=0, dtype=samples.dtype)
    assert np.array_equal(samples, decompressed)

Prints 11593846.
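The Python loop above is only a demonstration and is itself far from RAM speed. Below is a minimal sketch of the same byte layout (one byte for small values, a 255 escape plus two bytes otherwise) with the encoder vectorized in NumPy. The names `encode_escape` and `decode_escape` are hypothetical and not part of the original snippet, and the decoder is kept as a plain loop because each symbol's position depends on the ones before it.

```python
import numpy as np


def encode_escape(dd):
    """Encode int16 values: |d| <= 127 -> 1 byte, else 255 + lo + hi (3 bytes)."""
    dd = np.asarray(dd, dtype=np.int16).ravel().astype(np.int32)
    small = np.abs(dd) <= 127
    sizes = np.where(small, 1, 3)
    offsets = np.cumsum(sizes) - sizes        # Start offset of each symbol.
    out = np.empty(int(sizes.sum()), dtype=np.uint8)

    # Small values are stored with a +127 bias, giving bytes 0..254.
    out[offsets[small]] = (dd[small] + 127).astype(np.uint8)

    # Large values: escape byte 255, then the value biased by 2**15, little-endian.
    big = ~small
    biased = dd[big] + 2**15
    out[offsets[big]] = 255
    out[offsets[big] + 1] = (biased % 256).astype(np.uint8)
    out[offsets[big] + 2] = (biased // 256).astype(np.uint8)
    return out


def decode_escape(buf):
    """Inverse of encode_escape; sequential because symbol widths vary."""
    data = np.asarray(buf, dtype=np.uint8).tobytes()
    out = []
    i = 0
    while i < len(data):
        if data[i] < 255:
            out.append(data[i] - 127)
            i += 1
        else:
            out.append(data[i + 1] + 256 * data[i + 2] - 2**15)
            i += 3
    return np.array(out, dtype=np.int16)


example = np.array([0, 5, -127, 127, 128, -32768], dtype=np.int16)
encoded = encode_escape(example)
assert np.array_equal(decode_escape(encoded), example)
```

The output of `encode_escape(delta_deltas)` could then be passed to the compressor in place of `bootleg`; the 1-byte and 3-byte layout is unchanged, so the compressed size should match the loop version.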

zinekeller · on Oct 1, 2022

While I also want a low-computation codec that can save space, the historical use cases unfortunately assume spending a lot more CPU power to save a lot of bandwidth, so there's little research in this area. There's also no real incentive to make an audio equivalent of ProRes or DNxHD: if you are editing audio, SSDs are so fast that you'll run into CPU problems first.

userbinator · on Oct 1, 2022

Either that or G.711.

viraptor · on Oct 1, 2022

G.711 is neither high bitrate nor usable for music.

simfree · on Oct 1, 2022

Then use G.722, it works fine for music.

viraptor · on Oct 1, 2022

No, G.722 is still a wideband speech codec. Its usable bandwidth tops out at about 7 kHz, while the uncompressed audio this thread began with goes up to 22 kHz. With G.722 you lose most overtones, and for the top notes of a piano all of them. Please don't use G.722 for music apart from on-hold muzak.
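The piano claim can be checked with a little arithmetic. This is a rough sketch under stated assumptions: equal temperament with A4 = 440 Hz, an 88-key piano whose top key is C8, a hard cutoff at 7 kHz for the speech codec, and 22.05 kHz (half of a 44.1 kHz sampling rate) for the uncompressed audio. The helper name `surviving_overtones` is made up for this example.

```python
# Top key of an 88-key piano (C8) is 39 semitones above A4 = 440 Hz (assumed tuning).
C8_HZ = 440.0 * 2 ** (39 / 12)   # roughly 4186 Hz


def surviving_overtones(fundamental_hz, cutoff_hz):
    """Harmonic numbers >= 2 whose frequency is at or below the cutoff."""
    highest = int(cutoff_hz // fundamental_hz)
    return list(range(2, highest + 1))


print(surviving_overtones(C8_HZ, 7_000))    # 7 kHz speech-codec bandwidth
print(surviving_overtones(C8_HZ, 22_050))   # Nyquist limit at 44.1 kHz (assumed)
```

Under these assumptions the first overtone of C8 already sits near 8.4 kHz, above the 7 kHz limit, so only the fundamental gets through.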