
It's a great start but there's a little more work to do for full OpenAI API compatibility, namely streaming support and the tool_choice parameter. Making it fully compatible would allow it to be swapped directly into frameworks like langchain and magentic [1] (which I am building).

[1] https://github.com/jackmpcollins/magentic/issues/207
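
To illustrate, here's a rough sketch of what drop-in usage could look like once those two gaps are closed. The base_url, model name, and tool definition below are placeholders I've made up, not anything from the project:

  from openai import OpenAI

  # Point the official OpenAI client at the compatible server (placeholder URL)
  client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

  # Streaming: iterate over chunks as they arrive
  stream = client.chat.completions.create(
      model="local-model",
      messages=[{"role": "user", "content": "Say hello"}],
      stream=True,
  )
  for chunk in stream:
      print(chunk.choices[0].delta.content or "", end="")

  # tool_choice: force the model to call a specific function
  response = client.chat.completions.create(
      model="local-model",
      messages=[{"role": "user", "content": "What's the weather in Dublin?"}],
      tools=[
          {
              "type": "function",
              "function": {
                  "name": "get_weather",
                  "description": "Get the weather for a city",
                  "parameters": {
                      "type": "object",
                      "properties": {"city": {"type": "string"}},
                      "required": ["city"],
                  },
              },
          }
      ],
      tool_choice={"type": "function", "function": {"name": "get_weather"}},
  )
  print(response.choices[0].message.tool_calls)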


Looks quite similar to magentic [0], which I've been building, though broader in scope? I'm (clearly) a huge advocate of pydantic, structured outputs, and keeping control flow in Python code (rather than inside abstractions / "chains") as much as possible, so it's great to see those values here too! I'd be interested to hear what you consider in vs out of scope for mirascope and the long-term vision. It would also be cool for one of us to do a mirascope vs magentic comparison blog post.

[0] https://github.com/jackmpcollins/magentic


Hadn't seen magentic before this comment, super cool! Would love to do a comparison blog post -- I'll reach out about this separately :)

re: scope:

I think anything one layer above the base provider API is definitely in scope. A lot of these APIs are autogenerated (e.g. with Stainless), so I see a lot of value in building language-specific tools (starting with Python, likely JS/TS next) to improve the developer experience at this layer one step up (e.g. tools, structured outputs, state across calls, streaming, etc.).

I care deeply about cutting the amount of time people have to spend on things that ultimately aren't the value-adding aspects of their work (like maintaining an internal tool), and I think the (coming soon) future of AI-native applications will need this more than ever. Essentially anything that could be considered boilerplate or unnecessarily repeated code is something I want to improve without removing/reducing control and transparency -- I want to build useful abstractions that aren't obstructions.

I also think it's important to do this through a provider-agnostic interface given how frequently a new "king" pops up.

re: long-term vision:

more support for various providers/models + tools and agentic flows are definitely top of mind here, but to be honest I'm not fully certain what this AI-native future looks like, so I'm keeping an open mind about where to direct efforts. I'm hopeful usage and feedback from building in the open will help further guide this process.


Is AdalFlow also focused on automated prompt optimization or is it broader in scope? It looks like there are also some features around evaluation. I'd be really interested to see a comparison between AdalFlow, DSPy [0], LangChain [1] and magentic [2] (package I've created, narrower in scope).

[0] https://github.com/stanfordnlp/dspy

[1] https://github.com/langchain-ai/langchain

[2] https://github.com/jackmpcollins/magentic


We are broader. We have essential building blocks for RAG and agents, but we also make whatever you build auto-optimizable. You can think of us as the library for in-context learning, just as PyTorch is the library for model training.

Our benchmarks compare against DSPy and TextGrad (https://github.com/zou-group/textgrad).

We achieve better accuracy, greater token efficiency, and faster convergence. We are publishing three research papers to explain this in more detail to researchers.

https://adalflow.sylph.ai/use_cases/question_answering.html

We will compare against these optimization libraries but won't compare against libraries like LangChain or LlamaIndex, as they simply don't have optimization and it is a pain to build on them.

Hope this makes sense.


Thanks for the explanation! Do you see auto-optimization as something that is useful for every use case or just some? And what determines when this is useful vs not?


I would say it's useful for all production-grade applications.

Trainer.diagnose helps you get a final eval score across the different dataset splits (train, val, test), and it logs all errors, including format errors, so that you can diagnose issues manually and decide whether the score is low enough to warrant further text-grad optimization.

If there is still a big gap between your optimized prompt and the performance of a more advanced model given the same prompt (say GPT-4o), then you can use our "Learn-to-reason few-shot" feature to create demonstrations from the advanced model to further close the gap. We have use cases where this optimized performance all the way from 60% to 94% on GPT-3.5, while GPT-4o reaches 98%.

We will give users some general guidelines.

We are the only library that provides "diagnose" and "debug" features and a clear optimization goal.


I've built a lightweight package that provides a standard interface to the LLM providers, as well as taking care of boilerplate around structured outputs, function calling, and opentelemetry/tracing. It's hopefully a good compromise between ease-of-use and complexity.

https://github.com/jackmpcollins/magentic
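
For example, function calling looks roughly like this (hedging a little on the exact details -- see the README for the canonical usage; get_weather here is just a toy function I made up):

  from magentic import prompt, FunctionCall

  def get_weather(city: str) -> str:
      """Toy function for the example."""
      return f"Sunny in {city}"

  @prompt(
      "Answer the user's question: {question}",
      functions=[get_weather],
  )
  def answer(question: str) -> FunctionCall[str] | str: ...

  output = answer("What's the weather in Dublin?")
  if isinstance(output, FunctionCall):
      output = output()  # executes get_weather with the LLM-chosen arguments
  print(output)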


I haven't used LangGraph myself, but the latest magentic release is compatible with it if you'd like to check out the examples here https://github.com/jackmpcollins/magentic/issues/287


Please try out https://magentic.dev/ ! It is a light wrapper that is standard across LLM providers and handles the boilerplate code related to structured outputs and function calling. It doesn't include data/vector stores or integrations so it's a little lower-level than langchain, but for a lot of use cases it gives you the flexibility needed.


I completely agree, and built magentic [0] to cover the common needs (structured output, common abstraction across LLM providers, LLM-assisted retries) while leaving all the prompts up to the package user.

[0] https://github.com/jackmpcollins/magentic
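
Swapping providers is then just a matter of changing the model argument, something along these lines (import paths and model names may differ slightly from the current docs):

  from magentic import prompt, OpenaiChatModel
  from magentic.chat_model.anthropic_chat_model import AnthropicChatModel

  @prompt("Summarize in one sentence: {text}", model=OpenaiChatModel("gpt-4o-mini"))
  def summarize(text: str) -> str: ...

  @prompt("Summarize in one sentence: {text}", model=AnthropicChatModel("claude-3-5-sonnet-latest"))
  def summarize_claude(text: str) -> str: ...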


I'm building magentic https://github.com/jackmpcollins/magentic which has basically this syntax, though it queries the LLM to generate the answer rather than writing + running code.

  from magentic import prompt
  from pydantic import BaseModel
  
  class Superhero(BaseModel):
      name: str
      age: int
      power: str
      enemies: list[str]
  
  @prompt("Create a Superhero named {name}.")
  def create_superhero(name: str) -> Superhero: ...

I do have plans to also solve the case you're talking about of generating code once and executing that each time.
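
Not a magentic feature yet, but the general shape of that would be something like the following sketch (function names and prompt wording are hypothetical; exec'ing generated code obviously needs sandboxing in practice):

  from openai import OpenAI

  client = OpenAI()
  _cache: dict[str, object] = {}

  def codegen_function(name: str, description: str):
      """Ask the LLM to write the function once, then reuse the compiled result."""
      if name not in _cache:
          source = client.chat.completions.create(
              model="gpt-4o-mini",
              messages=[{
                  "role": "user",
                  "content": f"Write only the Python source code for a function named {name} that {description}.",
              }],
          ).choices[0].message.content
          # Strip a markdown code fence if the model added one
          if source.strip().startswith("```"):
              source = source.strip().split("\n", 1)[1].rsplit("```", 1)[0]
          namespace: dict[str, object] = {}
          exec(source, namespace)  # trusted-input assumption; sandbox this for real use
          _cache[name] = namespace[name]
      return _cache[name]

  create_superhero = codegen_function(
      "create_superhero", "takes a name string and returns a superhero dict"
  )
  print(create_superhero("Garden Man"))  # no further LLM calls after the first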


Does the dashboard/UI support traces? I would love a tool in which to view opentelemetry traces, that can neatly display full prompt and response for the spans that represent LLM queries. I'm planning to add opentelemetry instrumentation to magentic [1] and looking for a UI that is easy to run locally that makes it easy to see what an agent is doing (via OTEL traces). I have more of my thoughts on the github issue: https://github.com/jackmpcollins/magentic/issues/136

[1] https://github.com/jackmpcollins/magentic


Shameless plug. We are building exactly this: fully open source, OpenTelemetry-standard tracing with a visualization client that's optimized for LLM application observability. Check it out here:

- https://github.com/Scale3-Labs/langtrace

- https://langtrace.ai/


I appreciate you sharing your project with me. It's great to see others working on solutions in this space. While our offerings may have some similarities, I'm sure there are unique aspects to what you've built that could be valuable to users. I encourage you to continue innovating and pushing the boundaries of what's possible. Healthy competition ultimately benefits the entire community, as it drives us all to create better products. I wish you the best of luck with your project.


Yup, the dashboard is entirely powered by OTEL traces, and yes, you can see full prompts and responses, including cost and request metadata.

The UI is pretty easy to run; it's a single image.

Let me know if you need any help with instrumentation. If you have a PR, let me know and I'll try to assist.


You can add whatever span attributes you like to otel traces, and then show those attributes in whatever UI you have (I use Grafana).
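
For example, with the OpenTelemetry Python SDK (assuming a TracerProvider/exporter is already configured; the attribute keys here are just ones I've picked for illustration, the gen_ai.* semantic conventions are another option):

  from opentelemetry import trace

  tracer = trace.get_tracer(__name__)

  with tracer.start_as_current_span("llm.chat_completion") as span:
      span.set_attribute("llm.model", "gpt-4o-mini")
      span.set_attribute("llm.prompt", "Create a Superhero named Garden Man.")
      # Stand-in values for the real LLM call and its result
      response_text, total_tokens = "Garden Man can grow plants instantly.", 42
      span.set_attribute("llm.response", response_text)
      span.set_attribute("llm.usage.total_tokens", total_tokens)

Any OTEL-compatible UI (Grafana, Jaeger, etc.) can then display those attributes on the span.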


Ellipsis (`...`) is actually used in quite a few places. See the answers and comments on this Stack Overflow post[0]. The usage most similar to what I have in the magentic examples is with the `@overload` decorator in the typing module[1].

With that said, you are free to put any code in the function body, including `pass`, just a docstring, or even `raise NotImplementedError` - it will not be executed. Using Ellipsis satisfies VSCode/pyright type checking and seemed neatest to me for the examples and docs. I have some additional notes on this in the README[2].

[0] https://stackoverflow.com/q/772124/9995080

[1] https://docs.python.org/3/library/typing.html#typing.overloa...

[2] https://github.com/jackmpcollins/magentic#type-checking
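
For reference, the `@overload` pattern looks like this - the stub bodies are never executed, so `...` is just the conventional placeholder:

  from typing import overload

  @overload
  def double(x: int) -> int: ...
  @overload
  def double(x: str) -> str: ...

  def double(x):  # the real implementation; only this body ever runs
      return x * 2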

