
It's a great start but there's a little more work to do for full OpenAI API compatibility, namely streaming support and the tool_choice parameter. Making it fully compatible would allow it to be swapped directly into frameworks like langchain and magentic [1] (which I am building).

[1] https://github.com/jackmpcollins/magentic/issues/207
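
To illustrate, here's a rough sketch of what drop-in usage could look like once those two gaps are closed. The base_url, model name, and tool definition below are placeholders I've made up, not anything from the project:

  from openai import OpenAI

  # Point the official OpenAI client at the compatible server (placeholder URL)
  client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

  # Streaming: iterate over chunks as they arrive
  stream = client.chat.completions.create(
      model="local-model",
      messages=[{"role": "user", "content": "Say hello"}],
      stream=True,
  )
  for chunk in stream:
      print(chunk.choices[0].delta.content or "", end="")

  # tool_choice: force the model to call a specific function
  response = client.chat.completions.create(
      model="local-model",
      messages=[{"role": "user", "content": "What's the weather in Dublin?"}],
      tools=[
          {
              "type": "function",
              "function": {
                  "name": "get_weather",
                  "description": "Get the weather for a city",
                  "parameters": {
                      "type": "object",
                      "properties": {"city": {"type": "string"}},
                      "required": ["city"],
                  },
              },
          }
      ],
      tool_choice={"type": "function", "function": {"name": "get_weather"}},
  )
  print(response.choices[0].message.tool_calls)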


Looks quite similar to magentic [0], which I've been building, though broader in scope? I'm (clearly) a huge advocate of pydantic, structured outputs, and keeping control flow in Python code (rather than inside abstractions / "chains") as much as possible, so it's great to see those values here too! I'd be interested to hear what you consider in vs out of scope for mirascope and the long-term vision. It would also be cool for one of us to do a mirascope vs magentic comparison blog post.

[0] https://github.com/jackmpcollins/magentic


Hadn't seen magentic before this comment, super cool! Would love to do a comparison blog post -- I'll reach out about this separately :)

re: scope:

I think anything one layer above the base provider API is definitely in scope. A lot of these APIs are autogenerated (e.g. with Stainless), so I see a lot of value in building language-specific tools (starting with Python, likely JS/TS next) to improve the developer experience at this layer one step up (e.g. tools, structured outputs, state across calls, streaming, etc.).

I care deeply about cutting the amount of time people have to spend on things that ultimately aren't the value-adding aspects of their work (like maintaining an internal tool), and I think the (coming soon) future of AI-native applications will need this more than ever. Essentially anything that could be considered boilerplate or unnecessarily repeated code is something I want to improve without removing/reducing control and transparency -- I want to build useful abstractions that aren't obstructions.

I also think it's important to do this through a provider-agnostic interface given how frequently a new "king" pops up.

re: long-term vision:

more support for various providers/models + tools and agentic flows are definitely top of mind here, but to be honest I'm not fully certain what this AI-native future looks like, so I'm keeping an open mind about where to direct efforts. I'm hopeful usage and feedback from building in the open will help further guide this process.


Is AdalFlow also focused on automated prompt optimization or is it broader in scope? It looks like there are also some features around evaluation. I'd be really interested to see a comparison between AdalFlow, DSPy [0], LangChain [1] and magentic [2] (package I've created, narrower in scope).

[0] https://github.com/stanfordnlp/dspy

[1] https://github.com/langchain-ai/langchain

[2] https://github.com/jackmpcollins/magentic


We are broader. We have essential building blocks for RAG and agents, but we also make whatever you build auto-optimizable. You can think of us as the library for in-context learning, just as PyTorch is the library for model training.

Our benchmarks compare against DSPy and TextGrad (https://github.com/zou-group/textgrad).

We achieve better accuracy, greater token efficiency, and faster convergence. We are publishing three research papers to explain this in more detail to researchers.

https://adalflow.sylph.ai/use_cases/question_answering.html

We will compare against these optimization libraries but won't compare against libraries like LangChain or LlamaIndex, as they simply don't have optimization and it is a pain to build on them.

Hope this makes sense.


Thanks for the explanation! Do you see auto-optimization as something that is useful for every use case or just some? And what determines when this is useful vs not?


I would say it's useful for all production-grade applications.

Trainer.diagnose helps you get a final eval score across the different dataset splits (train, val, test), and it logs all errors, including format errors, so that you can diagnose issues manually and decide whether the score is low enough to warrant further text-grad optimization.

If there is still a big gap between your optimized prompt and the performance of a more advanced model given the same prompt (say GPT-4o), then you can use our "Learn-to-reason few-shot" feature to create demonstrations from the advanced model to further close the gap. We have use cases where this optimized performance all the way from 60% to 94% on GPT-3.5, while GPT-4o reaches 98%.

We will give users some general guidelines.

We are the only library that provides "diagnose" and "debug" features and a clear optimization goal.


I've built a lightweight package that provides a standard interface to the LLM providers, as well as taking care of boilerplate around structured outputs, function calling, and opentelemetry/tracing. It's hopefully a good compromise between ease-of-use and complexity.

https://github.com/jackmpcollins/magentic
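
For example, function calling looks roughly like this (hedging a little on the exact details -- see the README for the canonical usage; get_weather here is just a toy function I made up):

  from magentic import prompt, FunctionCall

  def get_weather(city: str) -> str:
      """Toy function for the example."""
      return f"Sunny in {city}"

  @prompt(
      "Answer the user's question: {question}",
      functions=[get_weather],
  )
  def answer(question: str) -> FunctionCall[str] | str: ...

  output = answer("What's the weather in Dublin?")
  if isinstance(output, FunctionCall):
      output = output()  # executes get_weather with the LLM-chosen arguments
  print(output)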


I haven't used LangGraph myself, but the latest magentic release is compatible with it if you'd like to check out the examples here https://github.com/jackmpcollins/magentic/issues/287


Please try out https://magentic.dev/ ! It is a light wrapper that is standard across LLM providers and handles the boilerplate code related to structured outputs and function calling. It doesn't include data/vector stores or integrations so it's a little lower-level than langchain, but for a lot of use cases it gives you the flexibility needed.


I completely agree, and built magentic [0] to cover the common needs (structured output, common abstraction across LLM providers, LLM-assisted retries) while leaving all the prompts up to the package user.

[0] https://github.com/jackmpcollins/magentic
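
Swapping providers is then just a matter of changing the model argument, something along these lines (import paths and model names may differ slightly from the current docs):

  from magentic import prompt, OpenaiChatModel
  from magentic.chat_model.anthropic_chat_model import AnthropicChatModel

  @prompt("Summarize in one sentence: {text}", model=OpenaiChatModel("gpt-4o-mini"))
  def summarize(text: str) -> str: ...

  @prompt("Summarize in one sentence: {text}", model=AnthropicChatModel("claude-3-5-sonnet-latest"))
  def summarize_claude(text: str) -> str: ...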


I'm building magentic https://github.com/jackmpcollins/magentic which has basically this syntax, though it queries the LLM to generate the answer rather than writing + running code.

  from magentic import prompt
  from pydantic import BaseModel
  
  class Superhero(BaseModel):
      name: str
      age: int
      power: str
      enemies: list[str]
  
  @prompt("Create a Superhero named {name}.")
  def create_superhero(name: str) -> Superhero: ...

I do have plans to also solve the case you're talking about of generating code once and executing that each time.
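
Not a magentic feature yet, but the general shape of that would be something like the following sketch (function names and prompt wording are hypothetical; exec'ing generated code obviously needs sandboxing in practice):

  from openai import OpenAI

  client = OpenAI()
  _cache: dict[str, object] = {}

  def codegen_function(name: str, description: str):
      """Ask the LLM to write the function once, then reuse the compiled result."""
      if name not in _cache:
          source = client.chat.completions.create(
              model="gpt-4o-mini",
              messages=[{
                  "role": "user",
                  "content": f"Write only the Python source code for a function named {name} that {description}.",
              }],
          ).choices[0].message.content
          # Strip a markdown code fence if the model added one
          if source.strip().startswith("```"):
              source = source.strip().split("\n", 1)[1].rsplit("```", 1)[0]
          namespace: dict[str, object] = {}
          exec(source, namespace)  # trusted-input assumption; sandbox this for real use
          _cache[name] = namespace[name]
      return _cache[name]

  create_superhero = codegen_function(
      "create_superhero", "takes a name string and returns a superhero dict"
  )
  print(create_superhero("Garden Man"))  # no further LLM calls after the first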


Does the dashboard/UI support traces? I would love a tool in which to view opentelemetry traces, that can neatly display full prompt and response for the spans that represent LLM queries. I'm planning to add opentelemetry instrumentation to magentic [1] and looking for a UI that is easy to run locally that makes it easy to see what an agent is doing (via OTEL traces). I have more of my thoughts on the github issue: https://github.com/jackmpcollins/magentic/issues/136

[1] https://github.com/jackmpcollins/magentic


Shameless plug. We are building exactly this: fully open source, OpenTelemetry-standard tracing with a visualization client that's optimized for LLM application observability. Check it out here:

- https://github.com/Scale3-Labs/langtrace

- https://langtrace.ai/


I appreciate you sharing your project with me. It's great to see others working on solutions in this space. While our offerings may have some similarities, I'm sure there are unique aspects to what you've built that could be valuable to users. I encourage you to continue innovating and pushing the boundaries of what's possible. Healthy competition ultimately benefits the entire community, as it drives us all to create better products. I wish you the best of luck with your project.


Yup, the dashboard is entirely powered by OTEL traces, and yes, you can see full prompts and responses, including cost and request metadata.

The UI is pretty easy to run; it's a single image.

Let me know if you need any help with instrumentation. If you have a PR, let me know and I'll try to assist.


You can add whatever span attributes you like to otel traces, and then show those attributes in whatever UI you have (I use Grafana).
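
For example, with the OpenTelemetry Python SDK (assuming a TracerProvider/exporter is already configured; the attribute keys here are just ones I've picked for illustration, the gen_ai.* semantic conventions are another option):

  from opentelemetry import trace

  tracer = trace.get_tracer(__name__)

  with tracer.start_as_current_span("llm.chat_completion") as span:
      span.set_attribute("llm.model", "gpt-4o-mini")
      span.set_attribute("llm.prompt", "Create a Superhero named Garden Man.")
      # Stand-in values for the real LLM call and its result
      response_text, total_tokens = "Garden Man can grow plants instantly.", 42
      span.set_attribute("llm.response", response_text)
      span.set_attribute("llm.usage.total_tokens", total_tokens)

Any OTEL-compatible UI (Grafana, Jaeger, etc.) can then display those attributes on the span.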


Ellipsis (`...`) is actually used in quite a few places. See the answers and comments on this Stack Overflow post[0]. The usage most similar to what I have in the magentic examples is with the `@overload` decorator in the typing module[1].

With that said, you are free to put any code in the function body, including `pass`, just a docstring, or even `raise NotImplementedError` - it will not be executed. Using Ellipsis satisfies VSCode/pyright type checking and seemed neatest to me for the examples and docs. I have some additional notes on this in the README[2].

[0] https://stackoverflow.com/q/772124/9995080

[1] https://docs.python.org/3/library/typing.html#typing.overloa...

[2] https://github.com/jackmpcollins/magentic#type-checking
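
For reference, the `@overload` pattern looks like this - the stub bodies are never executed, so `...` is just the conventional placeholder:

  from typing import overload

  @overload
  def double(x: int) -> int: ...
  @overload
  def double(x: str) -> str: ...

  def double(x):  # the real implementation; only this body ever runs
      return x * 2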

