Major Fail at XMLTeam Tonight
5 points by vdibart on Oct 3, 2009 | 1 comment
To date I've had nothing but good things to say about XMLTeam (http://xmlteam.com/), an alternative to high-priced/low-value professional sports stat providers like Stats Inc. I'm the last person who wants to see them go out of business, but it sounds like today was a bad day to stop drinking if you're an employee over there.

This evening I got an email telling me that my password was reset due to security concerns, and any applications I have deployed have to use the updated password before October 5th.

There was only 1 rather large problem - the email contained the username and password for someone else's account. And I couldn't log into my own account. (No, I didn't try to log in with the other guy's credentials).

So, to summarize, someone at XMLTeam realized there was a security hole in their software (problem #1) and decided to reset everyone's passwords with barely any notice (problem #2). Then they sent the usernames and passwords in clear text (problem #3) via email (problem #4) to the wrong email addresses (problem #5).

In his defense, the CEO responded quickly to my email and assured me that I would not be charged for any requests that are submitted to my account while they get things worked out (I'm on a pay-per-request plan, which is honestly one of the best deals out there for this kind of thing). He sounded as rattled as you might expect.

Look, I sympathize to some degree, but this is a colossal fail. Get it together guys! Small companies like mine depend on you.




My issue with XMLTeam has been that their salesmen don't seem very technology-oriented, and the database their interface/API accesses holds only the most recent info, not everything they actually have available. There was some random cut-off date in the middle of last season where stats disappeared without indication. If you want anything historical, it's not available via the charge-per-document API but only via the salesman-runs-the-query-and-emails-you-a-massive-zip-file process, which was error-prone: files were missing even from that, or they "forgot" to include one of the document types. They were generally nice and took care of problems like missing files, but when you're on a time budget to do a proof of concept, this is very frustrating (which is why I preferred to just query for the data I wanted automatically, but even that required numerous back-and-forths).

Additionally, it looks like their data is going to be really clean, but it's not (like, what are you paying for otherwise?). A significant portion of the data loading code I wrote did cleanup and tried to detect variations that a human needed to look into. The data itself often isn't normalized. Positions in basketball were sometimes single characters and other times strings ("F" vs "forward" or "forward center", "FC" or "CF" (but never "center forward")). Dates had differing formats, even in the same field. Durations were sometimes specified in seconds, other times in minutes:seconds, and sometimes in just minutes (with no label, so I had to use heuristics to determine whether a number represented minutes or seconds and multiply accordingly). Some documents had a different XML element nesting structure at the top level, almost as if the XML had been generated by hand with someone typing it into an XML tool. Which is odd, because they have this whole database schema they talk about that is supposed to be able to handle all the sports they support.
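To make that concrete, here is a rough sketch of the kind of cleanup described above. It is not the actual loader. The function names, the position table, and the 60-minute cutoff are all assumptions made up for illustration, and the minutes-vs-seconds guess is only a heuristic (a bare "45" that really meant 45 seconds would be misread as 45 minutes).

```python
import re

# Hypothetical mapping from spelled-out positions to single-letter codes.
POSITION_CODES = {"guard": "G", "forward": "F", "center": "C"}


def normalize_position(raw):
    """Map 'forward center', 'FC', 'f', etc. to a compact code.

    Word/letter order is preserved ('CF' stays 'CF'), since it is an
    assumption whether 'FC' and 'CF' mean the same thing.
    Returns None when the value needs a human to look at it.
    """
    text = raw.strip().lower()
    words = text.split()
    if words and all(w in POSITION_CODES for w in words):
        return "".join(POSITION_CODES[w] for w in words)
    code = text.upper()
    if code and all(ch in "GFC" for ch in code):
        return code
    return None


def playing_time_seconds(raw, max_minutes=60):
    """Convert 'MM:SS', bare minutes, or bare seconds to seconds.

    Heuristic (an assumption): an unlabeled number no larger than
    max_minutes is treated as minutes, anything larger as seconds.
    Returns None for anything unrecognized.
    """
    text = raw.strip()
    match = re.fullmatch(r"(\d+):([0-5]\d)", text)
    if match:
        return int(match.group(1)) * 60 + int(match.group(2))
    if re.fullmatch(r"\d+", text):
        value = int(text)
        return value * 60 if value <= max_minutes else value
    return None
```

Returning None instead of guessing is the point: those are the rows that get queued up for a person to check.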

We also spent a lot of time hand-comparing to other sources (like stats.com data available on Yahoo Sports) because we found a lot of odd outliers that didn't make any sense. Stat names diverged between seasons, and they provide a lot of data that is literally derivable, which in a few cases was wrong (the free throw percentage should be the free throws made over the free throws attempted, but was often different). The schema is really weird. They know who played which position for every game, but rosters are only available on a per-season basis. The mobility of players between teams (and even between sports or leagues) isn't acknowledged well enough in the data (the same physical person having different primary keys, for example).
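The derivable-stat check is easy to automate. A minimal sketch, assuming the reported percentage is on a 0-100 scale and that a small rounding tolerance is acceptable (the function name, the scale, and the tolerance are my assumptions, not anything from their schema):

```python
def free_throw_pct_mismatch(made, attempted, reported_pct, tolerance=0.1):
    """Return True when the reported percentage disagrees with made/attempted.

    Assumes reported_pct is on a 0-100 scale. With zero attempts the
    percentage is undefined, so there is nothing to check.
    """
    if attempted == 0:
        return False
    expected = 100.0 * made / attempted
    return abs(expected - reported_pct) > tolerance
```

Anything that comes back True goes on the list to compare by hand against another source.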

They seem to have all documents pre-generated, and the API just selects which documents to send you. This would be a good optimization, but it was slow unless the result set was only a handful of documents (significantly fewer than a season's worth), and the API/interface implied that data was available when it wasn't.

They seem to have a good product if your goal is only to show some sports stats on your website in an embedded frame. That's ephemeral by nature, so if something is goofed up, it'll be fixed in the data for the next game. But for any kind of bulk analysis or a browsable database built from their data, there are many deficiencies. There were also some odd licensing limits that we couldn't get straight answers about (like "must state you got the data from XMLTeam on your website", but our use of the data didn't have a website in the common case and we would have been strictly violating the licensing terms). I was often like "Dude, I'm throwing you money to make this work for me" and didn't get a lot of satisfaction out of it.

They could be a serious contender against the (significantly) more expensive stats.com service (which I don't have any direct experience with because it is so expensive), and I hope they improve, but it seems like one of those companies/services that, at least to a tech person like me, makes money in spite of itself.



