Major Fail at XMLTeam Tonight

thwarted · on Oct 3, 2009

My issues with XMLTeam have been that their salesmen don't seem very technology oriented, and that the database their interface/API accesses holds only the most recent info, not everything they actually have available. There was some random cut-off date in the middle of last season before which stats disappeared without indication. If you want anything historical, it's not available via the charge-per-document API but only via the salesman-runs-the-query-and-emails-you-a-massive-zip-file route, which was error prone: files were missing even from that, or they "forgot" to include one of the document types. They were generally nice and took care of problems like missing files, but when you're on a time budget to do a proof of concept, this is very frustrating (which is why I preferred to just query for the data I wanted automatically, but even that required numerous back and forths).

Additionally, their data looks like it's going to be really clean, but it's not (like, what are you paying for otherwise?). A significant portion of the data loading code I wrote did cleanup and tried to detect variations that a human needed to look into. The data itself often isn't normalized. Positions in basketball were sometimes single characters and other times strings ("F" vs "forward", or "forward center", "FC" or "CF", but never "center forward"). Dates had differing formats, even in the same field. Durations were sometimes specified in seconds, other times in minutes:seconds, and sometimes in just minutes (with no label, so I had to use heuristics to determine if a number represented minutes or seconds and multiply accordingly). Some documents had a different XML element nesting structure at the top level, almost like the XML was generated by hand with someone typing it into an XML tool. That's odd, because they have this whole database schema they talk about that is supposed to be able to handle all the sports they support.
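To give a rough idea of what that cleanup code looked like, here is a minimal sketch in Python. It is not my actual loader: the function names, the alias table, the date formats, and the 60-minute threshold are all assumptions made up for illustration, and treating "CF" as the same thing as "FC" is a guess too. Anything the code doesn't recognize comes back as None so a human can look at it.

```python
import re
from datetime import datetime

# Hypothetical alias table built from the variants mentioned above;
# a real feed would need more entries (guards, centers, and so on).
POSITION_ALIASES = {
    "f": "F",
    "forward": "F",
    "fc": "FC",
    "cf": "FC",  # assumption: "CF" means the same as "FC"
    "forward center": "FC",
}

# Assumed date formats; the real feed's formats are not listed here.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def normalize_position(raw):
    """Map a raw position string to a short code, or None if unknown."""
    key = " ".join(raw.strip().lower().split())
    return POSITION_ALIASES.get(key)


def parse_date(raw):
    """Try each assumed format in turn; return a date or None."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def parse_duration_seconds(raw, max_minutes=60):
    """Convert "mm:ss", bare minutes, or bare seconds to seconds.

    Heuristic (an assumption): an unlabeled integer no larger than
    max_minutes is read as minutes, anything larger as seconds.
    """
    text = raw.strip()
    match = re.fullmatch(r"(\d+):([0-5]\d)", text)
    if match:
        return int(match.group(1)) * 60 + int(match.group(2))
    if re.fullmatch(r"\d+", text):
        value = int(text)
        return value * 60 if value <= max_minutes else value
    return None
```

The duration heuristic is exactly as fragile as it looks: somebody who really did play 45 seconds gets read as 45 minutes, which is why the unlabeled values were such a pain.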

We also spent a lot of time hand comparing to other sources (like stats.com data available on Yahoo Sports) because we found a lot of odd outliers that didn't make any sense. Stat names diverged between seasons, and they provide a lot of data that is literally derivable from other fields, which in a few cases was wrong (the free throw percentage should be the free throws made over the free throws attempted, but was often different). The schema is really weird. They know who played which position for every game, but rosters are only available on a per-season basis. The mobility of players between teams (and even between sports or leagues) isn't acknowledged well enough in the data (the same physical person having different primary keys, for example).
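The derivable-stat problem is easy to check mechanically. A minimal sketch, assuming the reported percentage is a fraction between 0 and 1 rounded to about three decimals (the function names and the tolerance are made up for illustration, not anything from their schema):

```python
def derived_pct(made, attempted):
    """Free throw percentage as made / attempted, or None with no attempts."""
    return made / attempted if attempted else None


def pct_mismatch(made, attempted, reported, tolerance=0.001):
    """Return True when the reported percentage disagrees with made/attempted.

    Assumes `reported` is a fraction (0.0 to 1.0). Rows with no attempts
    or no reported value are skipped rather than flagged.
    """
    expected = derived_pct(made, attempted)
    if expected is None or reported is None:
        return False
    return abs(expected - reported) > tolerance
```

Running something like pct_mismatch(7, 10, 0.75) over every row gives a list of records to compare by hand against another source.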

They seem to have all documents pre-generated, and then the API just selects which documents to send you. That should be a good optimization, but it was slow unless the result set was only a handful of documents (significantly fewer than a season's worth), and the API/interface implied that data was available when it wasn't.

They seem to have a good product if your goal is only to show some sports stats on your website in an embedded frame. That's ephemeral by nature, so if something is goofed up, it'll be fixed in the data for the next game. But for any kind of bulk analysis or browsable database built from their data, there are many deficiencies. There were also some odd licensing limits we couldn't get straight answers about from them (like "must state you got the data from XMLTeam on your website", but our use of the data didn't have a website in the common case and we would have been strictly violating the licensing terms). I was often like "Dude, I'm throwing you money to make this work for me" and didn't get a lot of satisfaction out of it.

They could be a serious contender against the (significantly) more expensive stats.com service (which I don't have any direct experience with because they are so expensive), and I hope they improve, but it seems like one of those companies/services that, at least to a tech person like me, make money in spite of themselves.