
"Microsoft.... built a custom XML tool into its word processor in 2007... this was a tool for power users, and was only used by a small percentage of its user base."

I'm definitely confused by that statement and its link, because it implies the relevant tool is the disk format for every Office file, which has been described by an Excel program manager as "complicated enough to reduce a grown programmer to tears." https://www.joelonsoftware.com/2008/02/19/why-are-the-micros...



It's not referring to the XML formats. It was a feature of Word specifically that allowed you to embed a user-defined XML schema in your Word document and use XML data that fits the schema in your document.

See https://www.zdnet.com/article/custom-xml-the-key-to-patent-s...

(edit: grammar)


Ah, thanks for explaining.


> it implies the relevant tool is the disk format for every Office file

Does it imply that?

Another commenter has already pointed out why it's likely not the case.

But also, I don't think the article is well written. Partly because it doesn't clearly explain what the infringing tool was, or did, or how it operated. Also I'm pretty sure there's a typo in "ex part" instead of "ex parte". But another major issue is the following:

> $40 million of that judgment [against Microsoft] was imposed by the court as punishment for continually arguing that i4i was a patent troll even though it had an operating business in a manner that was “persistent, legally improper, and in direct violation of the Court's instructions.”

What?

Why would i4i operating in a manner that was persistent, improper and in violation of the court's instructions preclude it from also being a patent troll? It could do both?

Or is the "persistent..." descriptor meant to apply to Microsoft? That might make more sense, but the "even though" seems to be a comparison between two types of activity by one entity - namely i4i.

But then again, I might be reading "it had an operating business in a manner" wrong, because it feels ungrammatical to me. I might not be putting the emphasis in the right place, and that's what's causing me to misread the sentence?

The whole thing just feels confusing.


Thanks for reading. Sorry if this was confusing! Microsoft said that i4i was a patent troll despite the court repeatedly telling Microsoft to not do that. The judge referred to Microsoft's repeated ignoring of its instructions as "persistent" etc. i4i had an operating business; it wasn't a patent troll. That operating business is niche and small, but it is real. I have updated that sentence to make it clearer. Thanks for your feedback!


Depends on one's definition. I don't think "not having a real product/service" is the defining characteristic of "patent troll". Here's what Wikipedia says.

> attempts to enforce patent rights against accused infringers far beyond the patent's actual value or contribution to the prior art

> often do not manufacture products or supply services based upon the patents in question


No problem.

Looking back at it again now, I can see the intent of the original sentence where "it had an operating business" refers to i4i, but "in a manner that was..." refers to Microsoft. I didn't get the change of subject at that point.

Maybe an additional comma would have been all that I needed to figure it out: "even though it had an operating business, in a manner that was..."


The article says the feature has been removed; if it was the disk format:

1) it has never been removed, afaik Word still uses OOXML, so Word would keep being infringing

2) LibreOffice would probably be infringing too, as ODF is also XML based

So... it has to be some other form of XML tool and not the file format.

As for Joel's comment, IIRC he was an Excel PM before OOXML; in any case his blog post refers to the binary format that precedes OOXML. I'm pretty sure OOXML is equally if not even more complicated, as the products themselves are way more complicated than they appear, but the fact is that he was talking about a different thing.

Edit: as many users pointed out, it's not the file format itself, but the ability to add arbitrary attributes/elements to the file format XML as additional data.


Nitpick: Joel is referring to the old BIFF-style format (from 2003 and before) in that quote. The new "Office Open XML" formats are not mentioned in that post at all. However, one of the many criticisms of the Office Open XML formats is that they are, in some areas, nothing more than an XML serialization of the BIFF records.


This isn't what Joel is talking about here.

On the backend, all .docx files use XML. Joel is saying the root XML format was difficult to work with.

What my article is about is this: Microsoft used to allow users to write their own custom XML rules on top of Word. (This was mostly app developers using XML for macros rather than end users, and overall it was very rare.) This is the feature that was at issue with the patent.

Sorry if this was not clear!


> Joel is saying the root XML format was difficult to work with.

Joel wasn't writing about the XML version of MS Office documents, he was writing about the binary versions.


Thanks for clarifying!


Looking at the patent application, it doesn't appear to mention XML at all (it does talk about SGML, though), and the application appears to claim any mapping of a symbolic name to style properties (think Word styles or CSS classes); in other words, technical trivialities, reflecting poorly on US lawyers and their patent law.


It's not about storing XML, it's (as far as I understand the patent) about a specific representation of XML that can be more efficient to read.

The patent is about representing documents with markup (XML or otherwise) not by embedding them in the text, but rather having them stripped and maintained as a separate list of (tag, position) pairs, with the document only containing the raw text.

I'm only surprised that Microsoft couldn't find prior art, because having a (content-type, address) index at the beginning of a file is not exactly an unusual representation. It also reminds me that the USPTO's idiosyncratic usage of non-obviousness doesn't really match my intuition.


This is a huge issue with the patent world in general. There's just so much prior art out there, and you have to be really clear about showing that it applies. This isn't a patent case, but I have a great Google Maps case involving Wi-Fi where a judge completely borked it. As for this particular patent, I'm not enough of an XML expert to say whether the court got it right here. But it is worth noting that Microsoft tried to invalidate the patent several times with the USPTO and failed to do so there as well. So perhaps there's something more to the patent than meets the eye, or it was novel at the time even if it would not be against modern XML. Remember, the actual i4i patent at issue was filed in 1994, and it only matters if there was prior art from before 1994. It might have been novel at the time.


> Remember, the actual i4i patent at issue was filed in 1994, and it only matters if there was prior art from before 1994. It might have been novel at the time.

I am aware of the date of the "invention". I was programming on 8- and 16-bit computers in the 1980s and I was using this and similar kinds of formats for non-textual data, simply because it was easier to do this in assembler than writing a parser, paired with the difficulty of finding unused special bytes in binary data to separate meta-information from the data proper.

And I was also talking about non-obviousness, not novelty.


Fair enough. I haven’t seen the invalidation proceedings and am clearly less of an expert than you. So don’t know whether they got it right. Non-obviousness is, erm, non-obvious.


Am I right to understand that it would be the equivalent of Visual Studio's WPF designer [1], where you have the WYSIWYG editor side by side with an XML editor, and a change made in either one is translated into the other?

If it is, it would have been really really cool.

[1] https://i.stack.imgur.com/8pJnn.png


No. It's more like what the following piece of code produces:

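  import re

  def convert(xml):
      # Split on tags; the capturing group keeps each tag in the result,
      # so text sits at even indices and tags at odd indices.
      parsed = re.split(r"(<.+?>)", xml)
      output = parsed[0]
      tags_with_pos = []
      for i in range(1, len(parsed), 2):
          # Record the tag with its offset into the tag-free text.
          tags_with_pos.append((parsed[i], len(output)))
          output += parsed[i + 1]
      return tags_with_pos, output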
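
To make the round trip concrete, here is a minimal sketch that repeats convert() so it runs on its own and adds an inverse. The name restore() and the sample string are made up for illustration; nothing here is taken from the patent text or from Word, it only shows that the (tag, position) list plus the raw text is enough to rebuild the markup.

```python
import re

def convert(xml):
    """Split markup into a list of (tag, position) pairs plus the raw text."""
    parsed = re.split(r"(<.+?>)", xml)
    output = parsed[0]
    tags_with_pos = []
    for i in range(1, len(parsed), 2):
        tags_with_pos.append((parsed[i], len(output)))
        output += parsed[i + 1]
    return tags_with_pos, output

def restore(tags_with_pos, text):
    """Hypothetical inverse: re-insert each tag at its recorded offset."""
    pieces = []
    cursor = 0
    for tag, pos in tags_with_pos:
        pieces.append(text[cursor:pos])  # raw text up to the tag's offset
        pieces.append(tag)
        cursor = pos
    pieces.append(text[cursor:])  # whatever text follows the last tag
    return "".join(pieces)

sample = "<p>Hello <b>world</b>!</p>"
tag_map, raw_text = convert(sample)
print(tag_map)   # tags with offsets into the tag-free text
print(raw_text)  # the text with all markup stripped
print(restore(tag_map, raw_text) == sample)
```

The regex is a toy: it does not handle comments, CDATA, or a ">" inside an attribute value, so treat it as an illustration of the storage layout rather than a parser.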


> the USPTO's idiosyncratic usage of non-obviousness doesn't really match my intuition

Remember that USPTO gets paid for each patent application, and not penalised when it's later falsified.


Well, it was apparently upheld twice on reexamination, where they could have fixed that. The problem is more that the bar for non-obviousness is so low, it's basically on the floor. Paired with a discipline (software development), where independent reinvention is common, this is just a recipe for disaster.


Everyone knows 1+1=2, so why did Russell spend many many pages/hours on a proof, surely if people know it then it's easy to demonstrate? /s

Programmers are notoriously good at documenting everything after all. /s

It's easy to give documentary evidence for things someone found self-evident and so only wrote a scribbled note about in a workbook 40 years ago. /s

FWIW patent law obviousness is not the same thing as ordinary notions of obviousness either.

All my personal opinion, ofc.


I believe I ran into this issue a few years ago and discovered the patent case when trying to work around it. The XML file format allowed for arbitrary properties to be added (as XML does), and we were trying to embed metadata in Word files. But when MS Word opened a file with anything extra in it, it gave a warning like "this file has extra stuff in it" and automatically removed anything that wasn't explicitly expected.


Not sure why this is downvoted, it’s absolutely correct. I tried this myself; it would have -greatly- simplified scraping Word docs because the custom tags would have been available for XPath querying. Alas, Word strips it all on open.
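
For anyone wondering what that querying would have looked like, here is a rough sketch using Python's standard library. The WordprocessingML namespace is the real one, but the "meta" namespace, the <meta:field> element, and the sample document are all invented for illustration; this is not the markup Word actually used for custom XML, and as noted above Word would strip tags like these on open.

```python
import xml.etree.ElementTree as ET

# The "w" namespace is the real WordprocessingML one; the "meta" namespace
# and its <meta:field> element are hypothetical, made up for this sketch.
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "meta": "urn:example:metadata",
}

DOCUMENT_XML = """\
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
            xmlns:meta="urn:example:metadata">
  <w:body>
    <w:p>
      <meta:field name="invoice-id">
        <w:r><w:t>INV-0042</w:t></w:r>
      </meta:field>
    </w:p>
    <w:p>
      <w:r><w:t>Plain paragraph</w:t></w:r>
    </w:p>
  </w:body>
</w:document>
"""

def extract_fields(document_xml):
    """Map each custom field's name to the text of the runs it wraps."""
    root = ET.fromstring(document_xml)
    fields = {}
    # ElementTree supports a limited XPath subset; ".//prefix:tag" is part of it.
    for field in root.findall(".//meta:field", NS):
        text = "".join(t.text or "" for t in field.findall(".//w:t", NS))
        fields[field.get("name")] = text
    return fields

fields = extract_fields(DOCUMENT_XML)
print(fields)
```

Without the wrapper elements you are left matching on paragraph text or styles, which is why losing them made scraping so much more painful.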


Yep, we had a similar use-case. I remember the error message pointed to a help page which pointed to an article about this patent.



