Hacker News new | past | comments | ask | show | jobs | submit login

It sounds like you’re asking for uncompressed audio? That meets all of your listed requirements. 48kHz * 16bit, single channel = 768kbit/s



We support that already, yup. But it never hurts to see if there's something better than that out there.


You can bootleg your own fast lossless codec by doing delta-encoding on the raw PCM to get a lot of zeros and then feed it through an off-the-shelf fast compressor like snappy/lz4/zstandard/etc. It won't get remotely close to the dedicated audio algorithms, but I wouldn't be surprised if you cut your data size by a factor 2-4 and essentially no CPU cost compared to raw uncompressed audio.


You’ve not done this before have you ?


I haven't, but now I have. I took https://opus-codec.org/static/examples/samples/music_orig.wa... from https://opus-codec.org/examples/. Then I wrote the following snippet of Python code:

    from scipy.io import wavfile
    import numpy as np
    import zstd

    sampling_rate, samples = wavfile.read(r'data/bootleg-compress/music_orig.wav')
    orig = samples.tobytes()

    naive_compressed = zstd.ZSTD_compress(orig)
    deltas = np.diff(samples, prepend=samples.dtype.type(0), axis=0) # Per-channel deltas.
    compressed_deltas = zstd.ZSTD_compress(deltas.ravel()) # Interleave channels and compress.

    decompressed_deltas = np.frombuffer(zstd.ZSTD_uncompress(compressed_deltas), dtype=samples.dtype)
    decompressed = np.cumsum(decompressed_deltas.reshape(deltas.shape), axis=0, dtype=samples.dtype)
    assert np.array_equal(samples, decompressed)

    print(len(orig))
    print(len(naive_compressed))
    print(len(compressed_deltas))
giving:

    17432876
    15518973
    12817602
Looks like my initial estimation of 2-4 was way off (when FLAC achieves ~2 this should've been a red flag), but you do get a ~1.36x reduction in space at basically memory read speed.

Using an encoding for second order differences with storing -127 <= d <= 127 using 1 byte and the others 2 bytes (for an input of 16-bit audio) I got a ratio of ~1.50 for something that can still operate entirely at RAM speed:

    orig = samples.tobytes()
    deltas = np.diff(samples, prepend=samples.dtype.type(0), axis=0)      # Per-channel deltas.
    delta_deltas = np.diff(deltas, prepend=samples.dtype.type(0), axis=0) # Per-channel second-order differences.

    # Many small differences, encode almost all 1-byte differences using 1 byte,
    # using 3 bytes for larger differences. Interleave channels and encode.
    small = np.sum(np.abs(delta_deltas.ravel()) <= 127)
    bootleg = np.zeros(small + (len(delta_deltas.ravel()) - small) * 3, dtype=np.uint8)
    i = 0
    for dda in delta_deltas.flatten():
        if -127 <= dda <= 127:
            bootleg[i] = dda + 127
            i += 1
        else:
            bootleg[i] = 255
            bootleg[i + 1] = (dda + 2**15) % 256
            bootleg[i + 2] = (dda + 2**15) // 256
            i += 3

    compressed_bootleg = zstd.ZSTD_compress(bootleg)
    print(len(compressed_bootleg))

    decompressed_bootleg = zstd.ZSTD_uncompress(compressed_bootleg)
    result = []

    i = 0
    while i < len(bootleg):
        if bootleg[i] < 255:
            result.append(decompressed_bootleg[i] - 127)
            i += 1
        else:
            lo = decompressed_bootleg[i + 1]
            hi = decompressed_bootleg[i + 2]
            result.append(256*hi + lo - 2**15)
            i += 3

    decompressed_delta_deltas = np.array(result, dtype=samples.dtype).reshape(delta_deltas.shape)
    decompressed_deltas = np.cumsum(decompressed_delta_deltas, axis=0, dtype=samples.dtype)
    decompressed = np.cumsum(decompressed_deltas, axis=0, dtype=samples.dtype)
    assert np.array_equal(samples, decompressed)
Prints 11593846.


While I also want a low-computation codec that can save space, the historical use cases unfortunately assumes a lot more CPU power to be compensated for a lot less bandwidth, so there's little research in this area, and there's no real incentive to make something like ProRes and DNxHD as if you are editing audio the SSD speeds has been so fast that you'll run into CPU problems first.


Either that or G.711.


G711 is neither high bitrate nor usable for music.


Then use G.722, it works fine for music.


No, g722 is still a wideband speech codec. Its available frequency goes up to 7 kHz. The uncompressed audio this thread began with goes up to 22 kHz. With g722 you're losing most overtones, or even all overtones from the top of a piano. Please don't use g722 for music apart from on-hold muzak.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: