misterdata's comments | Hacker News

Not a single mention of the word ‘Apple’ in the original post; instead it's ‘the manufacturer’ and ‘the big corporation’. Curious whether that is deliberate and, if so, what the reasoning is (legal?)


Presumably because this isn't about Apple. They don't care about Apple, merely about getting the M1/M2 architecture to run Linux properly. It could have been Microsoft, Amazon, or Google; the treatment would have been the same.


But they did refer to them; they just used strange phrasing to do it.


As they accept a JSON schema for the function calls, it is likely they are using token biasing based on the schema (some kind of state machine that follows along with the generated tokens and only allows the next token to be one that is valid given the grammar/schema). I have successfully implemented this for a limited subset of JSON Schema on top of llama.cpp. See also e.g. this implementation: https://github.com/1rgs/jsonformer
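
The gist of the approach, as a simplified Python sketch (not the actual llama.cpp or jsonformer code; the vocabulary and the is_valid_prefix check are placeholders you would derive from your tokenizer and schema):

  # Toy schema-constrained decoding: at every step, mask out any token whose
  # text would make the output an invalid prefix under the grammar/schema.
  import math

  def mask_logits(logits, generated_text, vocab, is_valid_prefix):
      """logits: one float per token id; vocab: token id -> token text.
      is_valid_prefix(s) is the state machine derived from the JSON schema:
      it returns True iff s can still be completed into a valid document."""
      masked = list(logits)
      for token_id, token_text in enumerate(vocab):
          if not is_valid_prefix(generated_text + token_text):
              masked[token_id] = -math.inf  # can never be sampled
      return masked

In practice you would not re-check the whole prefix at every step but advance the state machine incrementally per accepted token; the effect is the same: the model can only ever emit output that matches the schema.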


As someone also building constrained decoders against JSON [1], I was hopeful to see the same, but I note the following from their documentation [2]:

  The model can choose to call a function; if so, the content will be a stringified JSON object adhering to your custom schema (note: the model may generate invalid JSON or hallucinate parameters).
So sadly, it is just fine-tuning; there's no hard biasing applied, which means callers still have to validate the returned JSON themselves (see the sketch below the links) :(. So close, yet so far, OpenAI!

[1] https://github.com/newhouseb/clownfish

[2] https://platform.openai.com/docs/guides/gpt/function-calling
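
Since the docs explicitly warn that the model may generate invalid JSON or hallucinate parameters, the caller has to check the output anyway. A minimal sketch of that client-side validation (using the jsonschema package; how you retry or re-prompt on failure is up to you):

  # Validate the function-call arguments returned by the API before using them.
  import json
  import jsonschema  # pip install jsonschema

  def parse_function_args(arguments_str, schema):
      """Return the parsed arguments, or None if the model produced invalid
      JSON or JSON that does not match the schema (then retry or re-prompt)."""
      try:
          args = json.loads(arguments_str)
          jsonschema.validate(instance=args, schema=schema)
          return args
      except (json.JSONDecodeError, jsonschema.ValidationError):
          return None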


They may have just fine-tuned 3.5 to respond with valid JSON more often than not.

Building magic functions [0], I ran into many examples where a JSON Schema broke for gpt-3.5-turbo but worked well for gpt-4.

[0] https://github.com/jumploops/magic


Or there’s a trade-off between more complex schemas and the logit bias going off the rails, since there’s probably little to no backtracking.


Good point. Backtracking is certainly possible but it is probably tricky to parallelize at scale if you're trying to coalesce and slam through a bunch of concurrent (unrelated) requests with minimal pre-emption.
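
For illustration, greedy decoding with backtracking on top of the token-masking idea could look roughly like this (a toy sketch, not anyone's production code; step_logits, vocab, is_valid_prefix and is_complete are placeholders for the model call and the schema's state machine):

  # Greedy constrained decoding with backtracking: if the grammar leaves no
  # valid next token, undo the previous choice and ban it instead of stalling.
  def decode_with_backtracking(step_logits, vocab, is_valid_prefix, is_complete,
                               max_len=256):
      text, banned, history = "", set(), []  # history: (text, banned) to restore
      while not is_complete(text) and len(history) < max_len:
          logits = step_logits(text)  # one logit per token id, from the model
          best, best_score = None, float("-inf")
          for tid, tok in enumerate(vocab):
              if tid in banned or not is_valid_prefix(text + tok):
                  continue
              if logits[tid] > best_score:
                  best, best_score = tid, logits[tid]
          if best is None:  # dead end: undo the last choice and ban it
              if not history:
                  raise RuntimeError("no valid completion exists")
              text, banned = history.pop()
              continue
          history.append((text, banned | {best}))
          text, banned = text + vocab[best], set()
      return text

The awkward part at scale is exactly the pre-emption mentioned above: a backtrack means re-running the model over a prefix you already processed, which does not batch nicely with unrelated concurrent requests.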


Also, WONNX can be used both from native apps (through wgpu it uses Vulkan, DirectX or Metal) as well as on the web (using WebGPU, with WONNX compiled to WebAssembly).


Looking forward to your WebGPU ML runtime! Also, why not contribute back to WONNX? (https://github.com/webonnx/wonnx)


Hi Tommy!

I sent you an email a few weeks back - would be great to chat!

WONNX is a seriously impressive project. There are a few reasons I didn't just contribute back to WONNX:

1. WONNX does not parse the ONNX model into an IR, which I think is essential to have the freedom to transform the model as required.

2. When I started, WONNX didn't seem focused on symbolic dimensions (but I've seen you shipping the shape inference recently!).

3. The code quality has to be much higher when it's open source! I wanted to hack on this without anyone to please but myself.


I'm presently working on enhancing Burn's (https://burn-rs.github.io/) capabilities by implementing ONNX model import (https://github.com/burn-rs/burn/issues/204). This will enable users to generate model source code at build time and load weights at runtime.

In my opinion, ONNX is more complex than necessary. Therefore, I opted to convert it to an intermediate representation (IR) first, which is then used to generate source code. A key advantage of this approach is the ease of merging nodes into corresponding operations, since ONNX and Burn don't share the same set of operators.
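
Burn's importer is written in Rust, but as a language-agnostic illustration of "parse ONNX into a simple IR first", here is a rough Python sketch using the official onnx package (the IrNode shape is made up for the example; a real importer also tracks shapes, initializers and graph inputs/outputs):

  # Sketch: load an ONNX model and lower its graph to a small, easy-to-transform IR.
  from dataclasses import dataclass, field
  import onnx  # pip install onnx

  @dataclass
  class IrNode:
      op: str                                  # e.g. "Conv", "Relu", "Gemm"
      inputs: list = field(default_factory=list)
      outputs: list = field(default_factory=list)
      attrs: dict = field(default_factory=dict)

  def onnx_to_ir(path):
      model = onnx.load(path)
      nodes = []
      for n in model.graph.node:
          attrs = {a.name: onnx.helper.get_attribute_value(a) for a in n.attribute}
          nodes.append(IrNode(n.op_type, list(n.input), list(n.output), attrs))
      return nodes  # fusion passes and code generation then operate on this list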


Actually, WONNX also transforms to an IR first (early versions did not, and simply translated the graph 1:1 to GPU shader invocations in topologically sorted order). In WONNX the IR nodes are (initially) simply copy-on-write references to the ONNX nodes. This IR is then optimized in various ways, including the fusion of ONNX ops (e.g. Conv+ReLU -> ConvReLU). The newly inserted node still embeds an ONNX node structure to describe it, but uses an internal operator.
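
For readers unfamiliar with this kind of pass, Conv+ReLU fusion is essentially a pattern match over adjacent nodes. A toy Python sketch (not WONNX's actual Rust implementation; nodes here are plain dicts like {"op": ..., "inputs": [...], "outputs": [...]}, assumed to be in topological order):

  # Fuse a Conv immediately followed by a ReLU into a single ConvRelu node.
  def fuse_conv_relu(nodes):
      consumers = {}  # tensor name -> indices of nodes consuming it
      for i, n in enumerate(nodes):
          for t in n["inputs"]:
              consumers.setdefault(t, []).append(i)
      fused, skip = [], set()
      for i, n in enumerate(nodes):
          if i in skip:
              continue
          if n["op"] == "Conv":
              users = consumers.get(n["outputs"][0], [])
              # Fuse only if the Conv output feeds exactly one node and it is a Relu.
              if len(users) == 1 and nodes[users[0]]["op"] == "Relu":
                  relu = nodes[users[0]]
                  fused.append({"op": "ConvRelu", "inputs": n["inputs"],
                                "outputs": relu["outputs"]})
                  skip.add(users[0])
                  continue
          fused.append(n)
      return fused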


Looks great!

ONNX is 100% more complex than necessary. Another format of interest is NNEF: https://www.khronos.org/nnef


Also see the recently introduced StableHLO and its serialization format: https://github.com/openxla/stablehlo/blob/main/docs/bytecode...


This makes running larger machine learning models in the browser feasible - see e.g. https://github.com/webonnx/wonnx (I believe Microsoft's ONNXRuntime.js will also soon gain a WebGPU back-end).


You can indeed perform inference using WebGPU (see e.g. [1] for GPU-accelerated inference of ONNX models on WebGPU; I am one of the authors).

The point made above is that WebGPU can only be used for GPUs and not really for other types of 'neural accelerators' (e.g. the ANE on Apple devices).

[1] https://github.com/webonnx/wonnx


The ANE is only accessible via CoreML and internal Apple frameworks, so I would assume it won't be using the ANE, but maybe some neural accelerators in Intel/AMD/Nvidia processors and GPUs.

Accelerators inside the GPU (like Tensor Cores) seem like a much better deal, as you can easily utilize them without four abstraction layers that support only some set of operations unknown to us mortals. (And my god, I hope Apple will make the ANE programmable, or at least put this API inside the Metal framework, because right now working with CoreML for anything new is a nightmare, and even some old models are broken on new versions of coremltools.)


A replacement for the ONNX IR perhaps, but as far as I can see there is not (yet?) a file format for StableHLO (ONNX has a standardized on-disk format specified in Protobuf).


StableHLO has a serialization format which is based on MLIR bytecode. https://github.com/openxla/stablehlo/blob/main/docs/bytecode... goes into details of reading/writing portable artifacts for StableHLO programs and associated compatibility guarantees.

I'd also like to comment on our (StableHLO's) relationship with related work. StableHLO was a natural choice for the OpenXLA project, because a very similar operation set called HLO powers many of its key components. However, I would also like to give a shout out to related opsets in the ML community, including MIL, ONNX, TFLite, TOSA and WebNN.

Bootstrapping from HLO made a lot of sense to get things going, but that's just a starting point. There are many great ideas out there, and we're looking to evolve StableHLO beyond its roots. For example, we want to provide functionality to represent dynamism, quantization and sparsity, and there's so much to learn from related work.

We'd love to collaborate, and from the StableHLO side we can offer production-grade lowerings from TensorFlow, JAX and PyTorch, as well as compatibility with OpenXLA. Some of these connections in the ML ecosystem have already started growing organically, and we're super excited about that.


+1 to what Eugene said and to the evolutionary aspects. The proposals for stability of the format as well as the opset can be followed on the respective project forums (Discourse & GitHub issues/RFCs) as these are discussed and refined to meet community needs.


Also, Vega [0] and Vega Lite [1]

[0] https://vega.github.io/vega/ [1] https://vega.github.io/vega-lite/


Not sure about Wi-Fi, but the idea exists in the 5G standard as ‘coordinated multipoint’ (CoMP).


Some APs (e.g. Ubiquiti) can actually steer clients from one band to the other based on minimum RSSI and other parameters (including device compatibility; you can also exclude or force a band for individual devices), which prevents this from happening.

