Ive been looking at a way to use transformer based models on tabular data. The h...

kevin948 · on March 27, 2022

Same here. Find any good resources? I've been leaning on auto-encoders to encode better than word-2-vec and its ilk.

VHRanger · on March 27, 2022

Network node embeddings are the best for tabular data. I maintain a library on it here, but there's plenty of good alternatives:

https://github.com/VHRanger/nodevectors

rdedev · on March 28, 2022

My idea is to use make a table row into a textual description and feed it into a transformer and get a get effectively a sentence embedding. This is effectively a query embedding. Then make a couple of value embeddings for the target you are trying to predict and use cosine similarity to predict the right value embedding and feed that to the ml model as part of the feature set. It works if the categorical values in your table are entities that the model might have learned.

I tried this approach and it did improve the overall performance. The next step would be fine tuning the transformer model. I want to see if I could do it without disturbing the existing weights too much. Here's the library I used to get get the embeddings

https://www.sbert.net/

VHRanger · on March 27, 2022

For sparser data, you should just do normal network node embeddings.

Look into node2vec libraries for instance