Hacker News | pierrebrunelle's comments

Building an infinite-memory, context-aware, multimodal chatbot is as easy as 600 lines of Pixeltable code.

Why? Because Pixeltable is the only multimodal data infrastructure that unifies storage and orchestration for your AI workloads.


Agree...as simple as:

```python
@pxt.query
def search_documents(query_text: str, user_id: str):
    sim = chunks.text.similarity(query_text)
    return (
        chunks.where(
            (chunks.user_id == user_id)        # Metadata filtering
            & (sim > 0.5)                      # Filter by similarity threshold
            & (pxt_str.len(chunks.text) > 30)  # Additional filter/transformation
        )
        .order_by(sim, asc=False)
        .select(
            chunks.text,
            source_doc=chunks.document,  # Ref to the original document
            sim=sim,
            title=chunks.title,
            heading=chunks.heading,
            page_number=chunks.page,
        )
        .limit(20)
    )
```

For instance, in https://github.com/pixeltable/pixeltable


I fully agree, and this is why context engineering matters: unifying storage and orchestration, treating agents as just another function call, and getting full visibility into the pipeline so you can easily iterate on the I/O. This is a good sample implementation of that: https://github.com/pixeltable/pixelbot


I like PocketFlow. You beat me on the # of lines of code! But does it provide parallelization, caching, orchestration, versioning, observability, lineage, multi-modal support?

As you just showed, building an agent SDK is easy, so what's interesting to me is tackling:

- Infrastructure Sprawl: Juggling separate systems for vector search, state tracking, multimodal data handling, and monitoring leads to fragmented workflows and high operational costs.

- State Management Nightmares: Reliably tracking agent memory, tool calls, and intermediate states across potentially long-running, asynchronous tasks is incredibly difficult.

- Multimodal Integration Pain: Integrating and processing images, audio, video, and documents alongside text requires specialized, often disparate, tooling.

- Observability Gaps: Understanding why an agent made a decision or failed requires visibility into its state and data lineage, which is often lacking.

And doing all of that while finding the right abstraction layer to leave all the application and business logic to the dev/users so they don't feel limited. It's difficult!

Besides, I don't know where you see a commercial offering? Everything is Apache 2.0/Open Source from A to Z.


PocketFlow is not from me, but just my current favorite ;)

I just got the feeling that the lib is tied to pixeltable, but maybe I misunderstood? Maybe that's why this is dead? pocketflow is completely standalone and the main thing is that you vibe code what you need (works awesome so far!).

I don't want to sideline the discussion about pixelagent, but here's some more about pf:

- https://the-pocket.github.io/PocketFlow/design_pattern/multi... (multi-agent, queue)

- https://github.com/The-Pocket/PocketFlow/tree/main/cookbook/ (more advanced examples; pretty easy to follow imho)

PS: re the observability, yesterday I coded tracing for pocketflow, just need to put it up on github haha
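Tracing like that can start as a thin wrapper. A minimal sketch in plain Python (not PocketFlow's actual API; all names here are hypothetical) that records the name, duration, and success of each node call:

```python
import functools
import time

TRACE = []  # collected spans: (name, duration_s, ok)

def traced(fn):
    """Record name, duration, and success/failure of each call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            TRACE.append((fn.__name__, time.perf_counter() - start, True))
            return result
        except Exception:
            TRACE.append((fn.__name__, time.perf_counter() - start, False))
            raise
    return wrapper

@traced
def summarize(text: str) -> str:
    # Stub node: a real one would call an LLM here.
    return text[:10]

summarize("hello world, this is a trace test")
print(TRACE[0][0])  # summarize
```

A real version would attach a run id and write spans to storage instead of a module-level list, but the decorator shape is the whole trick.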


Congrats! Take whatever you want :)

Yes: memory, observability, versioning, and lineage come built-in as a direct consequence of unifying orchestration and storage: https://docs.pixeltable.com/docs/datastore/computed-columns
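The unification point is that a computed column is just a function of other columns that the system runs and records for you. A toy illustration in plain Python (not Pixeltable's API) of why lineage falls out of that for free:

```python
# Toy model of computed columns: each derived field is a function of
# other fields, and its declared inputs ARE its lineage.
computed = {}

def add_computed_column(name, fn, *inputs):
    computed[name] = (fn, inputs)

def materialize(row):
    """Fill in every computed column for one row."""
    out = dict(row)
    for name, (fn, inputs) in computed.items():
        out[name] = fn(*(out[i] for i in inputs))
    return out

add_computed_column("word_count", lambda text: len(text.split()), "text")

row = materialize({"text": "unify storage and orchestration"})
print(row["word_count"])  # 4
```

Because every derived value is declared this way, the system always knows which inputs produced which outputs, which is exactly what observability and versioning need.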


Thanks!

See https://github.com/pixeltable/pixelagent/tree/main/examples/... for multiple tools/agents. A tool is just a UDF; you can have as many as you want.
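The "a tool is just a UDF" idea, sketched in plain Python (hypothetical names, not the Pixelagent API): a tool registry is a dict of callables, and dispatch is a lookup.

```python
TOOLS = {}

def tool(fn):
    """Register any plain function as an agent tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def stock_price(ticker: str) -> float:
    # Stub: a real tool would call an external API here.
    return {"AAPL": 190.0}.get(ticker, 0.0)

def call_tool(name: str, **kwargs):
    return TOOLS[name](**kwargs)

print(call_tool("stock_price", ticker="AAPL"))  # 190.0
```

Adding a tool is just defining another decorated function; nothing about the agent loop has to change.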

The goal of this reference agent SDK is to showcase the flexibility of Pixeltable (the underlying unified storage and orchestration system), which is open source and Apache 2.0 licensed.

This is where storage is defined: https://github.com/pixeltable/pixelagent/blob/main/pixelagen.... These are very simple examples.

Here's how an Agentic Reddit bot would work for instance: https://github.com/pixeltable/pixeltable/tree/main/docs/samp...


You can indeed turn anything that you want into an MCP server, e.g. https://github.com/pixeltable/pixeltable-mcp-server.

Pixelagent is a reference implementation for a multimodal agent framework to show that an agent class is easy to build and users should be empowered to build their own from scratch for their use cases.

Regarding memory: to me it's just data storage, indexing, orchestration, and retrieval, and I don't know why we should abstract it away from users. Memory means so many different things across use cases.

Let's say you want:

- Working memory: Holds current context and immediate interaction history within the agent's context window -> this is just about passing Q&A pairs, along with their roles, to maintain context.

- Episodic memory: Stores specific past experiences and interactions -> this is just about indexing past exchanges and running semantic search over them.

- Semantic memory: Organizes specific knowledge in structured formats -> this is just about building custom logic (a UDF) to decide what to extract insights from and how, and then retrieving them.

I've implemented them all in this example: https://github.com/pixeltable/pixelbot
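A minimal plain-Python sketch of the first two (working memory as a rolling role/content window, episodic memory as naive word-overlap retrieval over the full history; the pixelbot repo does the real indexed, embedding-based version):

```python
from collections import deque

class Memory:
    def __init__(self, window: int = 6):
        self.working = deque(maxlen=window)  # recent role/content pairs
        self.episodic = []                   # full history, searchable

    def add(self, role: str, content: str):
        msg = {"role": role, "content": content}
        self.working.append(msg)   # old messages fall off the window
        self.episodic.append(msg)  # but everything stays retrievable

    def recall(self, query: str, k: int = 3):
        """Naive retrieval: rank past messages by word overlap."""
        q = set(query.lower().split())
        scored = sorted(
            self.episodic,
            key=lambda m: len(q & set(m["content"].lower().split())),
            reverse=True,
        )
        return scored[:k]

m = Memory(window=2)
m.add("user", "what is pixeltable")
m.add("assistant", "a multimodal data infrastructure")
m.add("user", "does pixeltable do video")
print(len(m.working))                           # 2 (window evicted the oldest)
print(m.recall("pixeltable video")[0]["content"])  # does pixeltable do video
```

Swap the word-overlap scorer for an embedding index and you have the episodic memory described above; the working-memory window is just what you pass back to the model each turn.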


We got sick of fighting with YOLOX's dependencies every time we needed to update the integration for a decent object-detector demo, so we forked it and fixed the annoying issues.

pip install pixeltable-yolox

Key stuff fixed:

- Works with Python 3.9+

- Compatible with current PyTorch

- No more version mismatches

- Actually runs in Colab without hacks

Will keep the Apache 2.0 license. No architectural changes to the model, just focused on making it usable in production.


ChatGPT directly in Jupyter notebooks with NotebookGPT by Noteable, as a plugin: http://notebookgpt.com/


Also a different one: https://github.com/noteable-io/genai with additional magics.

