
I've been looking at ways to use transformer-based models on tabular data. The hope is that these models have a much better contextual understanding of words, so their embeddings should be of better quality than word2vec ones.



Same here. Find any good resources? I've been leaning on autoencoders to get better encodings than word2vec and its ilk.
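For reference, a minimal sketch of the kind of tabular autoencoder I mean, in PyTorch (the layer sizes, embedding dimension, and input width are just placeholders):

    import torch
    import torch.nn as nn

    class TabularAutoencoder(nn.Module):
        def __init__(self, n_features, embedding_dim=16):
            super().__init__()
            # Encoder compresses a preprocessed row into a dense embedding
            self.encoder = nn.Sequential(
                nn.Linear(n_features, 64), nn.ReLU(),
                nn.Linear(64, embedding_dim),
            )
            # Decoder tries to reconstruct the original features
            self.decoder = nn.Sequential(
                nn.Linear(embedding_dim, 64), nn.ReLU(),
                nn.Linear(64, n_features),
            )

        def forward(self, x):
            z = self.encoder(x)
            return self.decoder(z), z

    # Train on reconstruction loss, then use the encoder output as features
    model = TabularAutoencoder(n_features=30)
    x = torch.randn(8, 30)  # stand-in for a batch of preprocessed rows
    recon, embedding = model(x)
    loss = nn.functional.mse_loss(recon, x)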


Network node embeddings are the best for tabular data. I maintain a library for them here, but there are plenty of good alternatives:

https://github.com/VHRanger/nodevectors
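From memory, basic usage looks something like this (double-check the README for the exact API):

    import networkx as nx
    from nodevectors import Node2Vec

    # Any networkx graph (or a scipy sparse adjacency matrix) works
    G = nx.fast_gnp_random_graph(1000, 0.01)

    g2v = Node2Vec(n_components=32)  # embedding dimensionality
    g2v.fit(G)

    vector = g2v.predict(42)  # embedding vector for node 42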


My idea is to turn a table row into a textual description, feed it into a transformer, and effectively get a sentence embedding out. This acts as a query embedding. Then make a value embedding for each candidate target you are trying to predict, use cosine similarity to score the query against each value embedding, and feed those scores to the ML model as part of the feature set. It works if the categorical values in your table are entities the model might have learned about.

I tried this approach and it did improve overall performance. The next step would be fine-tuning the transformer model; I want to see if I can do it without disturbing the existing weights too much. Here's the library I used to get the embeddings:

https://www.sbert.net/
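Roughly, in code (the model name and the row-to-text template here are just examples, not necessarily what I used):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Turn a table row into a textual description (the "query")
    row = {"company": "Toyota", "country": "Japan", "sector": "Automotive"}
    query = ", ".join(f"{k}: {v}" for k, v in row.items())

    # One "value" embedding per candidate target label
    labels = ["high revenue", "low revenue"]

    query_emb = model.encode(query, convert_to_tensor=True)
    label_embs = model.encode(labels, convert_to_tensor=True)

    # Cosine similarity between the row and each label; feed these
    # scores to the downstream ML model as extra features
    scores = util.cos_sim(query_emb, label_embs)
    print(dict(zip(labels, scores[0].tolist())))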


For sparser data, you should just use normal network node embeddings.

Look into node2vec libraries, for instance; a sketch follows below.
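One way to get a graph out of a table is to make it bipartite: one node per row, one node per categorical value, with edges between them. A sketch with the node2vec package (the graph construction and node naming are just an illustration):

    import networkx as nx
    from node2vec import Node2Vec

    rows = [
        {"country": "Japan", "sector": "Automotive"},
        {"country": "Japan", "sector": "Electronics"},
        {"country": "Germany", "sector": "Automotive"},
    ]

    # Bipartite graph: rows on one side, categorical values on the other
    G = nx.Graph()
    for i, row in enumerate(rows):
        for col, val in row.items():
            G.add_edge(f"row:{i}", f"{col}={val}")

    # Random walks + skip-gram over the graph
    n2v = Node2Vec(G, dimensions=32, walk_length=10, num_walks=50)
    model = n2v.fit(window=5, min_count=1)

    vec = model.wv["row:0"]  # embedding for the first table row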



