Friday, 7 December 2007

Long Essay - The Open Document Format

1 – An introduction to the OpenDocument format

The OpenDocument format (ODF) is an alternative document file format to the one predominantly used in Microsoft Office. Unlike Microsoft’s proprietary document file format, ODF is an open standard. This effectively means that the specification of the format is published and freely available, so anyone may implement it in their own software without depending on a single vendor. Microsoft’s proprietary document file format does not allow users and other developers the same freedoms that ODF does.

ODF was developed as a standard by the Organisation for the Advancement of Structured Information Standards (OASIS). The organisation represents companies from around the globe that have a desire to develop and implement standards within the IT world. The format itself is based on XML (a markup language whose primary purpose is to make it easier to share data across different applications) and as such is promoted as an application-independent file format. This means that users are able to view and edit files saved in ODF across different applications and on different operating systems should they desire to do so.
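
As a rough illustration of what an XML-based format makes possible, the sketch below builds a small ODF-style package in memory and reads its paragraphs back using only the Python standard library. It relies on the fact that ODF files are ZIP packages whose text is stored in a content.xml file. The helper names (build_sample_package, extract_paragraphs) and the sample text are made up for this example, and the package is deliberately simplified (it omits the manifest and styles a real office suite would write), so treat it as a sketch rather than a complete ODF implementation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by OpenDocument content files.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# A stripped-down content.xml holding two paragraphs (sample text is invented).
CONTENT_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <office:body>
    <office:text>
      <text:p>Hello from one application.</text:p>
      <text:p>Read back by another.</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""


def build_sample_package() -> bytes:
    """Hypothetical helper: write a simplified ODF-style ZIP package in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", CONTENT_XML)
    return buffer.getvalue()


def extract_paragraphs(package_bytes: bytes) -> list[str]:
    """Hypothetical helper: return the text of every text:p element in content.xml."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


if __name__ == "__main__":
    for paragraph in extract_paragraphs(build_sample_package()):
        print(paragraph)
```

Nothing here depends on the application that wrote the file: any program with a ZIP reader and an XML parser can get at the text, which is the practical meaning of an independent file format.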

The initial version of the file format – OpenDocument 1.0 (Second Edition) – was approved by the International Organisation for Standardisation (ISO) and the International Electrotechnical Commission (IEC) on 4 May 2006 (as ISO/IEC 26300:2006). As a result of this achievement and through the continued ownership of the file format by OASIS it truly can be said to be an open standard. This means that anyone can read the specification and build software that implements it without being restricted by a single vendor.

As it stands this is the only office document file format that has been approved as an ISO standard, and as such it has been implemented by various vendors (mainly through OpenOffice) and is slowly, but surely, beginning to find its way into the public sector. Governments (such as those of France and Germany) are either evaluating ODF or have already implemented it to some degree, with more likely to follow. ODF is highly desirable for governments as it helps to ensure that data saved today will not become inaccessible in the future because of changing technological specifications. Microsoft clearly sees ODF as a threat to its dominance of electronic office applications and as such is currently trying to push through its own XML-based document file format, which it also presents as open, called Office Open XML (OOXML).

However, a key obstacle for ODF is that the vast majority of users know how to use, and are comfortable with, Microsoft Office and as such will be unwilling to switch to a different office suite. Microsoft Office suites up to and including Office 2007 (version 12) do not include native support for ODF, but plug-ins have been developed with the permission of Microsoft. One such plug-in has already been released by the OpenDocument Foundation.

2 – The standardisation process of ODF

Whilst ODF was submitted to the ISO in September 2005 and approved as a standard in May 2006, the whole process started several years earlier, in 1999. In that year StarDivision decided to begin the development of an XML default file format. This decision was primarily based on the limitations of the older file formats. StarDivision’s ambition was to create an interoperable file format that would not be limited to one application or operating system and could therefore be used by other software companies. After Sun acquired StarDivision the project was expanded (in October 2000) to include the open source community, primarily through the website OpenOffice.org. The idea behind this was for the open source community to be involved in defining a specification for the open file format.

In 2002 the OASIS Open Office Technical Committee had its first conference on the open file format’s specification, and that year also saw two office applications (OpenOffice.org 1.0 and StarOffice 6) using the open file format as the default file format for the first time. From 2003 to 2004 the open file format was adapted to reflect recent technical developments in XML and office applications. As a result of these adaptations the open file format became more refined and consistent. By the end of 2004 the open file format was first called the OpenDocument Format (ODF).

2005 saw further refinements of ODF as well as a public review. In May 2005 ODF was finally approved as an OASIS standard, which allowed it to be submitted to the ISO for approval as an international standard.

3 – Application support for ODF


Whilst there is no native support (as of yet) in any version of Microsoft Office, there are a number of applications that use ODF as the default file format, and there are plug-ins for Microsoft Office (available from sources like the OpenDocument Foundation or Sun) that enable it to support ODF. There are currently eight applications (such as OpenOffice.org and StarOffice) that use ODF as the default file format. In addition, there are numerous applications that can convert to and from ODF, as well as online office applications (like Google Docs) that include support for ODF.

Many of these applications work in a very similar manner to Microsoft Office (up to 2003 (version 11)) and are either free (such as OpenOffice.org) or relatively cheap to buy (such as StarOffice). However, a huge problem remains in that Microsoft Office in effect has a near monopoly over the office application market. If ODF is to truly take off then it will need to be supported in Microsoft Office itself. There are plug-ins (as mentioned previously) that add support for ODF to Microsoft Office, but these are only the next best thing to Microsoft actually including ODF as one of the file formats in the program. As governments begin to see the advantages of ODF they will apply pressure on Microsoft to include support for the format in Microsoft Office. Microsoft announced in July 2006 that it would include support for open formats like ODF in its latest office suite (Office 2007 (version 12)), though as of December 2007, nearly a year after Office 2007's release, Microsoft has yet to include native support for ODF in the application.

4 – Adoption of ODF

2007 has seen many governments begin to evaluate the adoption of ODF. As ODF guarantees long term access to data without technical or legal barriers, many governments now view ODF as a public policy issue. ODF would allow governments to access documents created today well into the future, without having to change the file format as technologies and restrictions evolve. This is because ODF is a published standard that does not depend on any one application, giving users better control over their documents. This advantage over traditional proprietary formats has resulted in many governments, including those of France, Belgium and Germany, adopting ODF in some form.

The aforementioned governments are starting to convert documents from their current file formats to ODF and are recommending that any future documentation be created using open standards (i.e. ODF). With these leading governments realising the benefits of ODF it is only a matter of time before other governments follow suit. It is likely that in the near future ODF will be the main format in the public sector, and with this position of power it could start to make an impact on the private sector.

Even now businesses are responding to the needs of governments and their customers by implementing ODF in their products. In the market today there are multiple applications (both open source and commercial) that support ODF. With increasing use by governments, the pressure on companies to implement ODF in their products is only going to grow.

5 – Benefits and criticisms of ODF


There are many key benefits that come from using ODF over closed formats. The users of ODF have improved ownership of, and access to, the data they save. With closed formats users may not have full control over their own documents, making them dependent on the technology vendors they purchase their products from. This is not the case with an open format, as users cannot be technologically locked out of ODF files. The open nature of ODF allows for long term access to files without the worry of changing technologies.

As ODF aims to separate the document (information) from the application that created it, users have a greater choice of software. This is because the document can be used by other applications without restriction or interference from proprietary code. This interoperability is a big advantage of ODF over its closed rivals. The standardised base for interoperable document formats also encourages innovation: any company can design and distribute new applications and services, and access to existing documents will remain because every application builds on the same standard.

Many open source enthusiasts would argue that criticisms of ODF are only arising because Microsoft is backing the people making the complaints. However, there are some legitimate criticisms of ODF.

One criticism is the choice of MathML as the standard for representing mathematical formulae in OpenDocument. Many mathematicians do not use or like this standard and would much prefer the older TeX format for typesetting complex mathematical formulae. TeX is the de facto standard used by mathematicians and is thought by its supporters to be far more precise than MathML. It is therefore unclear to these critics why MathML was chosen as the standard for mathematical content.
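
To make the comparison concrete, the sketch below holds one small formula as Presentation MathML and converts it to the equivalent TeX source. The to_tex function is a hypothetical helper written for this example; it handles only a handful of MathML elements and is not part of ODF or of any library.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# The fraction (x^2 + 1) / y written as Presentation MathML.
SAMPLE_MATHML = f"""
<math xmlns="{MATHML_NS}">
  <mfrac>
    <mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></mrow>
    <mi>y</mi>
  </mfrac>
</math>
"""


def to_tex(node: ET.Element) -> str:
    """Hypothetical helper: convert a tiny subset of MathML to TeX source."""
    tag = node.tag.split("}")[-1]  # drop the "{namespace}" prefix
    children = [to_tex(child) for child in node]
    if tag in ("mi", "mn", "mo"):  # identifiers, numbers, operators
        return (node.text or "").strip()
    if tag == "mfrac":  # numerator and denominator
        return f"\\frac{{{children[0]}}}{{{children[1]}}}"
    if tag == "msup":  # base and superscript
        return f"{children[0]}^{{{children[1]}}}"
    if tag in ("math", "mrow"):  # plain grouping
        return " ".join(children)
    raise ValueError(f"element not covered by this sketch: {tag}")


if __name__ == "__main__":
    print(to_tex(ET.fromstring(SAMPLE_MATHML)))
```

The MathML version is far longer than the TeX a person would type, but every part of the formula is an explicit element that software can inspect, which is why the two notations tend to suit different jobs.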

The open nature of ODF is its main benefit but it has also caused a few problems. ODF does not contain a defined formula language, which means that spreadsheet files written by one application are not always handled correctly by another. OASIS is currently creating a standard formula language (OpenFormula), but until it is complete some inaccuracies may occur. Likewise, applications using ODF as their standard document format do not all provide scripting capabilities in the same way. A standard would again be required here to produce some level of consistency between the different applications using ODF.

The functionality of ODF has also come under attack. ODF still has a few missing features, such as the ability to include tables in presentations. Microsoft also claims that ODF handles new extensions in a more complex manner than its own format, OOXML.

6 – Microsoft’s rival Office Open XML

Due to the threat posed by ODF to its proprietary office suite and the increased pressure from consumers to provide an open format, Microsoft has created its own rival to ODF called Office Open XML (OOXML).

Prior to the 2007 edition of Microsoft Office, applications like Word and Excel used binary file formats for storing data. This restrictive manner of storing data has come under fire recently with the development of open alternatives like ODF. The benefits of open file formats have attracted the interest of many governments and businesses, and in response Microsoft began to develop OOXML.

So far OOXML has not managed to gain approval as an ISO standard. This is because there are concerns about the limits to OOXML’s openness and the complexity of its specification. The ISO does not want to create an incompatible rival to ODF that will allow the non-interoperable environment to continue. If OOXML and ODF were compatible then OOXML might find it easier to earn standard status. As it stands ODF meets all the definitions of an open standard and is already in use worldwide, so the ISO sees OOXML as unnecessary.

With the might of Microsoft behind it, the adoption of OOXML could be quicker and more extensive than that of ODF. Many large companies have already backed OOXML and have started incorporating OOXML translators into their applications. However, OOXML is still an entirely new format, so existing Microsoft Office files will have to be translated anyway. Given this, many companies may choose the ISO-standard ODF instead.

7 – Conclusion

The use of ODF is only going to increase in the next couple of years, as organisations now understand the benefits of adopting open formats. The greater flexibility and choice that come from using ODF will result in more governments and companies choosing ODF as their main file format. Competition from Microsoft’s OOXML will increase, but as long as OOXML fails to gain ISO approval ODF will remain well placed to compete with it in the market.

8 – References

ODF Alliance – ODF Resources
http://www.odfalliance.org/resources.php

Wikipedia – The OpenDocument Format
http://en.wikipedia.org/wiki/OpenDocument

Free/Libre/Open Source Software: Policy Support
An Economic Basis for Open Standards
http://pascal.case.unibz.it/retrieve/3327/FLOSSPOLS-D04-openstandards-v6.pdf

Google’s Position on OOXML as a Proposed ISO Standard
http://www.odfalliance.org/resources/Google%20XML%20Q%20%20A%20(2).pdf

ISO welcomes Open Document Format
http://www.pcpro.co.uk/news/86931/iso-welcomes-open-document-format.html

History of OpenDocument
http://opendocument.xml.org/milestones

“Why ODF?” – The Importance of OpenDocument Format for Governments
http://www.odfalliance.org/resources/whyODF.pdf

ODF Alliance Hails Record Growth in Application Support for ODF
http://www.odfalliance.org/press/Release20071026.pdf

Application support for the OpenDocument format
http://opendocumentfellowship.com/applications

Microsoft Expands Document Interoperability
http://www.microsoft.com/presspass/press/2006/jul06/07-06OpenSourceProjectPR.mspx

Open or Closed? The ODF debate spills in LinuxWorld
http://searchenterpriselinux.techtarget.com/originalContent/0,289142,sid39_gci1179223,00.html

This project was a collaboration between Rory Linwood and Stuart Nibloe

1 comment:

PaulTopping said...

Regarding the choice of MathML, there are many reasons why MathML is superior to TeX for representing math: it is XML; it represents the structure of math in an unambiguous, self-contained way; if Content MathML is used, it represents the meaning of the mathematics; and it is accessible, as it can be spoken and converted to braille.

MathML and TeX are really not in the same category. The TeX language is designed as a user interface for entering math while MathML is a computer representation and not designed for humans to type equations.

Finally, while it is true that many research mathematicians use TeX, they are a very small minority of people who do math. Talk to a high school math teacher -- most have never even heard of TeX (or MathML either, but it isn't for humans to use). Even if we limit the scope to researchers, TeX is just not that popular. Around 90% of research papers submitted to journals for publication are MS Word documents with Equation Editor or MathType equations.

Paul Topping
Design Science