Open Source for Open Government – Can it be done?

The use of Open Source software for public administrations has recently been given official backing by the EU. The “e-Europe 2002 Impact and Priorities” communication from the Commission to the Council and the European Parliament (COM(2001) 140 final), prepared for the Spring European Council in Stockholm in March 2001, made the following recommendations on the subject of e-Government. Public administrations should:

  • Develop internet-based services to improve access of citizens and businesses to public information and services,
  • Use the Internet to improve the transparency of the public administration and to involve citizens and business in decision making in an interactive fashion. Public sector information resources should be made more easily available, both for citizens and for commercial use,
  • Ensure that digital technologies are fully exploited within administrations, including the use of open source software and electronic signatures.

One could summarize these recommendations as “Open Source for Open Government”, but can it be done? Could one run a parliament using just Open Source software?

If the proposed legislation before the Congress in Argentina to make the use of Open Source compulsory in government is passed, their parliament could be the first to have to do so.

This paper looks at some of the freely-available software that could be used to implement a digital parliament. Not all of the software discussed is “Open Source” in the true sense, i.e. released under the GPL, but for the purposes of this paper the software is included if it is (a) free for non-commercial use and (b) the source code is available. It should also be borne in mind that software sometimes starts life as commercial and then becomes Open Source and vice versa. Other products may have basic or “lite” versions released as Open Source with the full-feature versions sold commercially.

Some of the programs mentioned are stable and usable now; others are “ones to watch”. Most of the software mentioned in the paper does not require source code compilation, except where indicated, either because pre-compiled versions are available (e.g. in Linux distributions) or because they are interpreted scripts, usually Perl.

Also, inclusion of a program in this paper does not necessarily constitute a recommendation. It is more a survey of some of the many freely-available programs that could be used to set up a digital parliament.

The Foundations

Linux

Linux would, of course, be central to any Open Source project. Linux has moved from being a hacker’s playground to a mainstream corporate product, supported by major industry players such as IBM, SGI and, to some extent, Sun. The many hackers are still, thankfully, contributing (for example, see Sourceforge) but their efforts have been made more disciplined by the selection activities of the end-user Linux packagers, such as Red Hat, SUSE and Mandrake.

Microsoft hates Linux and Open Source software in general, with senior executives recently describing it as “a cancer” and “un-American”, yet they are working with Ximian to develop an open-source version of the .NET infrastructure, the Mono project. That, and SOAP, seem to indicate that Microsoft now supports open protocols but not open code. Sun has a foot in both camps at the moment. They bought Cobalt, whose boxes run Red Hat Linux. The ONE Open Network Environment they acquired with Netscape is Open Source but Solaris is not, although Sun keep hinting that it might become so.

It must be remembered that Linus Torvalds actually wrote only the Linux kernel; most of the extensive functionality in the hundreds of programs in Linux packages came from the work of the many thousands of programmers and contributors to the GNU (GNU’s Not Unix) project, which continues to improve existing programs and introduce new ones. In many cases, the programs are on a par with or superior to equivalent commercial programs, GCC and GIMP being good examples.

Linux provides an operating system that can be used for both server and client systems and has the Unix file and print sharing protocols NFS and LPR/LPD that allow clients, servers and printers to communicate. Unix-based products have always been synonymous with the Internet, as most of the early Internet functionality was developed on Unix systems, and Linux includes all of the industry standard Internet protocols, both the early ones, such as Telnet, FTP, NNTP and SMTP, and the newer ones, such as HTTP and LDAP.

Although the stateless protocol used by NFS has resiliency advantages, especially between servers, NFS has drawbacks as a file sharing protocol, because of the inherent difficulty of securing UDP-based services. Samba can be used to achieve file sharing in an MS Windows environment. However, for file retrieval and storage, Web protocols can now be used instead and the file browsing can be made more user-friendly than just directory lists or a sea of file and folder icons. WebDAV also provides distributed authoring and versioning capabilities.

The recent addition of the Reiser journaling file system to the SUSE version of Linux is an important integrity feature both for mission critical servers and those wanting to run Linux on a laptop.

The Linux packagers are addressing the need to simplify the configuration and maintenance of Linux in an enterprise environment. Unix’s greatest strength, its incredible flexibility, is also its greatest weakness; there are too many commands to learn and it’s too easy to do the wrong thing. Sensible default installations and Web-based remote management of deployed installations will help to increase the penetration of Linux. Most of today’s generation of system administrators are not impressed by telnet and a command line prompt. The flexibility of the Unix command line, with lots of small programs being linked via pipes and backticks to produce “instant programming”, is legendary but strictly for power users.

Enterprise Services

Web Servers

Implementing Web servers is one area where Open Source has always been very strong. One of the earliest Web servers, developed at NCSA, has been enhanced in the Open Source environment to produce the world’s number one Web server, Apache, included in all Linux packages. It’s the most popular web server, with over 18 million web sites (even though many of those run on Microsoft-based servers), the fastest, the most flexible and still evolving. The new version includes features such as multithreading, I/O filtering, WebDAV support, multi-character and alphabet handling, IPv6 support, an updated API and improved operation under NT.

Web Proxy caching is another area where Open Source is very strong with the Squid cache software having very high performance and market penetration, often embedded in other products, especially firewalls. Squid can also cache DNS lookups.

Scripting

Where there’s a Web server, Perl is never far away. Larry Wall’s superb text manipulation language (and that is a recommendation!) has evolved to become the Web scripting language of choice and, via mod_perl, is built directly into the Apache Web server. So dominant is its position in CGI scripting, with thousands of scripts available (CGI Resource Index now lists over 2500), that Microsoft felt it necessary to contract an Open Source Perl developer, ActiveState, to improve the performance of Perl with IIS running on Windows servers. As a result, ActiveState now produce the definitive version of Perl for Win32 systems, as well as Solaris and Linux versions, but all Linux systems come already equipped with Perl.

Perl is capable of far more than just CGI scripting and, thanks to the many hundreds of modules contributed by the Open Source community (see CPAN), it can be applied to a very wide range of information handling applications and system administration tasks. Of particular interest are the large number of modules (over 75) for handling XML and a smaller number for handling SOAP.

Search engines

The HtDig search engine is very popular but is a C++ program that requires compilation. It offers fuzzy searching, subsection and limited depth search, notification of document expiry and full ISO Latin-1 support.

The Fluid Dynamics search engine is a Perl-CGI suite that, unlike many other similar programs, does not require the index program to be run via a command shell on the server; it is done as a series of META refreshes from a web client. The program also supports attribute indexing, relevance listing, optional use of mySQL, keyword triggered banners (intended for advertisements, but could be used for non-commercial things) and comes with Dutch, French, German, Italian, Portuguese, Romanian, Spanish and Swedish language modules.

KSearch is another Perl-based tool that features Boolean search, file and directory exclusion, templates, score weighting and the ability to search within previous results.

Those interested specifically in search tools should check the Searchtools site, which also contains information and reviews on other search tools. Another good site for search engine information is Search Engine Watch.

Those of you who need to index MS Word documents might be interested in the Antiword program, which converts Word files to plain text. It does require compilation but, as it is a command line utility, it could be embedded in a script to do mass conversion of Word files to create text files that could then be indexed. Interestingly, it can also convert Word files into Postscript and handles mapping from Unicode to ISO 8859 local single byte character sets.
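The mass conversion described above could be scripted along the following lines. This is a minimal sketch in Python rather than Perl or shell; it assumes only that antiword, when given a Word file name, writes the extracted text to standard output. The function name convert_word_files and its command parameter are illustrative and not part of Antiword.

```python
import subprocess
from pathlib import Path


def convert_word_files(root, command=("antiword",)):
    """Write a .txt file next to every .doc file found under root.

    `command` is the converter to run; it is assumed to take a file
    name as its last argument and print the plain text to stdout.
    """
    converted = []
    for doc in sorted(Path(root).rglob("*.doc")):
        result = subprocess.run([*command, str(doc)], capture_output=True)
        if result.returncode != 0:
            # Leave files the converter rejects for manual inspection.
            continue
        target = doc.with_suffix(".txt")
        target.write_bytes(result.stdout)
        converted.append(target)
    return converted
```

The returned list of text files could then be handed to whichever indexer is in use. Keeping the converter command as a parameter means the same loop works with any other command line tool that prints text to standard output.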

News Page Management

There are several good News Page managers available (as opposed to “What’s New?” new file finders). Typically, these use SSI to include the news material into your web pages. NewsPro is a very popular one with lots of support forums and addons available, always a good sign for an Open Source program. Features include automatic deletion or archiving of old news, email notification of new news items and a personal “new since last visit” web presentation. The user interface is available in French, Swedish, Spanish, German, Dutch, Russian, Portuguese, Polish, Finnish, Italian, Hungarian and Norwegian. Some of the addons allow automatic addition of a news item as a new topic in a discussion forum, where people can then comment about it. NewsPro is soon to be known as Coranto, perhaps indicating a shift to commercial operation.

Security

There are Open Source security offerings, such as e-Smith, Mandrake Firewall and others, but the temptation to outsource the problems to a commercial product or third party, such as the Internet Service Provider, is strong and understandable in the high-profile environment of a parliament.

However, for those that want to do it themselves, e-Smith is a combined server and gateway firewall, offering mail and file and print sharing. Mandrake Firewall is a dedicated proxy firewall and cache with URL filtering, intrusion detection and IP spoofing attack protection.

The use of SSL (Secure Socket Layer) to encrypt Web connections is well known. Less well known is SSH (Secure Shell), which allows you to log into an SSH server, transfer files, run X11 and encapsulate any TCP/IP session via an encrypted socket connection. SSH started as Open Source, then went commercial, but is now available in a separately developed Open Source version, OpenSSH.

In Germany, for the Aegypten project, the Federal IT Security agency (BSI) has hired commercial companies to develop Open Source software for their Sphinx email security standard. Essentially, Sphinx consists of S/MIME, a PKI compatible X.509 certificate profile and certificate revocation lists based on LDAP. Sphinx support is being added specifically to the email clients Kmail and Mutt.

Email processing

Most of the vast quantity of mail moved around the Internet is carried by one of the most famous, and infamous, Unix programs of all, sendmail. Love it or hate it, sendmail is still the king of Internet mail transport and ships with all Linux packages.

Some new Open Source programs have emerged recently to challenge its dominance, such as the very popular qmail and the very good Postfix from IBM, which has better performance, flexibility and security design. Both have lots of add-ons available. Some Linux packagers (such as Mandrake) make Postfix the default, but it has not yet displaced sendmail as the dominant mail transport agent on the Internet.

For email delivery, procmail is very good, very reliable and very fault tolerant. It is infinitely preferable to the traditional Unix offerings and makes an excellent partner to Postfix. It can also be used to implement sophisticated mail filtering. Parliamentarians are becoming increasingly concerned at the amount of email they receive from pressure groups, either directly or indirectly, where the group provides a way of helping individuals to send emails. The recent Congress Online project at the US Senate highlights some of the problems and makes useful reading for anyone involved in providing email for elected representatives.
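To illustrate the kind of filtering involved, the sketch below flags probable campaign mail by counting messages in a batch that share a subject line. It is written in Python with the standard email module rather than as a procmail recipe, and the helper names and the threshold of three are assumptions chosen for illustration; a real deployment would need a more robust similarity test.

```python
from collections import Counter
from email import message_from_string


def normalise_subject(subject):
    """Lower-case a subject and strip reply/forward prefixes."""
    text = (subject or "").strip().lower()
    while text.startswith(("re:", "fw:", "fwd:")):
        text = text.split(":", 1)[1].strip()
    return text


def flag_campaign_mail(raw_messages, threshold=3):
    """Return (sender, is_campaign) pairs for a batch of raw messages.

    A message is flagged when at least `threshold` messages in the
    batch share its normalised subject line.
    """
    parsed = [message_from_string(raw) for raw in raw_messages]
    subjects = [normalise_subject(msg["Subject"]) for msg in parsed]
    counts = Counter(subjects)
    return [
        (msg["From"], counts[subject] >= threshold)
        for msg, subject in zip(parsed, subjects)
    ]
```

Flagged messages could be filed separately and answered in bulk, leaving individual correspondence in the representative's normal inbox.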

Directory services

One of the foundation directory services of the Internet, DNS, has long been closely associated with Unix. Linux offers DNS server capabilities (but not usually in “personal” versions). Newer directory service protocols, like LDAP, are also supported. Directory services have yet to take off in the way that they could and should; interoperability is the issue. DNS servers return a number, and that is easy; returning richer information is more complex and requires both the protocol and the content representation to be standardized. XML to the rescue, perhaps?
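The “number” that a DNS lookup returns can be seen with a few lines of Python. This is only a sketch using the standard socket module; the function name resolve is an illustrative choice.

```python
import socket


def resolve(name):
    """Return the IPv4 address for a host name as a dotted-quad string."""
    return socket.gethostbyname(name)
```

Calling resolve with a host name such as "www.example.org" yields a single address string and nothing more. Anything richer, such as a person's role or telephone number, needs a directory protocol like LDAP and an agreed representation for the content.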

DHCP is another, specialised kind of directory service, for the dynamic allocation of IP addresses. It is also widely supported by Linux “professional” distributions, i.e. those intended for servers rather than clients.

Server Applications

Zope is an application server, written in Python (another scripting language, like Perl, but more object-oriented), which ships with many Linux packages. It has many interesting features that make it a good choice for advanced Web-based services. Conceptually it is much harder to understand than a conventional Web server (think of it as publishing object methods instead of documents) but does have the merit of good documentation.

Enhydra is a Java/XML application server with a servlet-based application framework, an XML compiler, embedded Java for Dynamic HTML, graphical object/relational database mapping and reliability features such as clustering and failover.

Extropia Webware is a suite of 25 open source, web-based modules, many of which could be very useful to parliaments, including calendaring, discussion forums, real-time chat, task lists, mailing lists, web surveys, search, e-shopping, inventory and client management, contact books, document manager, project tracker, staff recruitment and web news broadcasting.

Databases

MySQL is the most widely used open source database. It is fast, stable and supports transaction or transaction-less control. It ships in most Linux distributions but is also available for Solaris and Win32 platforms.

dbXML is interesting because it is neither a database nor an application server, but a database application server. The core edition is a data management system for large XML document collections. With the difficulties and performance problems of mapping XML documents that are not data-centric onto a conventional RDBMS, this may prove an interesting alternative.

Fans of Borland’s Delphi can now use Borland’s Open Edition Kylix development system on Linux to develop license free, Delphi-compatible, native Linux applications.

For those applications that don’t justify a full RDBMS and SQL, there are several Perl CGI “flat” databases that may meet your needs. EZDB supports multiple user and group accounts, with permissions granularity down to the record level, and provides simple linking between databases, a search engine, rollback and email responders and notifiers. French, German, Danish, Spanish and Dutch language packs are available.

Workflow systems

Workflow is an area where the Open Source community lags behind the commercial vendors. Vivtek claim that their workflow toolkit wftk is the only open source workflow system. It does require compilation, to create either an ANSI C library or a DLL for Windows, but looks very promising. At present the system can use Oracle as the database backend but adaptors for Postgres, mySQL and Sybase are coming, along with integration with Perl, in the form of a workflow module, and with Zope. Definitely not ready yet, but perhaps one to watch.

The Desktop

The Linux desktop situation has improved dramatically over the past year, with the Gnome and KDE projects now starting to provide real alternatives to MS Windows and Office. Both usually ship with Linux distributions, with one or the other set as the default.

The KDE desktop suite is now available in 34 languages and contains over 100 applications, including a GUI frontend for mySQL, a UML diagram tool, a 3D modeller and an XML document editor.

Koffice now includes:

  • Kword – frame-based word processor
  • Kspread – spreadsheet
  • Kpresenter – presentation graphics
  • Kivio – a Visio-like flow charter
  • Kontour – an illustrator-like vector drawer
  • Krayon – pixel-based image manipulator
  • Kugar – business report generator
  • Kchart – graph and chart drawer
  • Konqueror – web browser, probably the best available for Linux

The Gnome office suite takes a library and component approach (the Bonobo architecture) and includes even more applications. In addition to the usual office tools one would expect to find, programs of particular note are:

  • GIMP – the leading Linux image editing program (a rival to Photoshop)
  • Two project managers – Toutdoux and Mr Project (promising but missing some key features)
  • Evolution – an integrated calendaring, email and PIM (which looks very similar to Outlook)
  • AbiWord – a word processor which also works under Windows and supports import of Word 2000, XHTML, Palm and RTF documents and export to XHTML and LaTeX typesetting formats.

The Gnome project has also produced an interesting email client, gmail, which is a mail frontend built using mySQL as the backend message store, allowing it to support very large volumes of mail (ideal for those users who never delete anything!). The mail “folders” (up to 255) are actually database “views”, based on SQL query filtering on a mail store of 20,000 messages or more.
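The idea of folders as database views can be sketched as follows. The example uses Python's built-in sqlite3 module in place of mySQL so that it is self-contained, and the table, column and view names are invented for illustration; they are not taken from gmail's actual schema.

```python
import sqlite3


def build_mail_store():
    """Create an in-memory message store with two view-based folders."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY,
            sender TEXT,
            subject TEXT
        );
        -- Each "folder" is only a stored query over the one table.
        CREATE VIEW folder_committee AS
            SELECT * FROM messages WHERE subject LIKE '%committee%';
        CREATE VIEW folder_constituents AS
            SELECT * FROM messages WHERE sender LIKE '%@example.org';
        """
    )
    return db


def folder_subjects(db, folder):
    """List the subjects visible in a folder (view), oldest first."""
    # The folder name is interpolated, so only pass trusted view names.
    rows = db.execute(f"SELECT subject FROM {folder} ORDER BY id")
    return [subject for (subject,) in rows]
```

Because a folder is just a query, a message is stored once but can appear in several folders, and adding a folder never moves or copies any mail.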

Another highly-regarded mail client, although not part of the Gnome project, is Sylpheed. Looking a lot like Netscape Messenger, it supports multiple accounts, threading and image viewing. It is regarded as both fast and reliable.

Sun is integrating its Open Office suite with Gnome and it will soon become part of Gnome Office. Open Office is Star Office without the browser and the third party proprietary bits, such as the spell checker, the Adabas database component or the clip art gallery, but with improved Microsoft file filters and XML file formats. One drawback for non-English language users is that other language versions require a recompilation of the source code.

Whilst it might be argued that the open source office tools cannot yet compete with the “feature-rich” or “over-complex” capabilities of Microsoft Office, depending on your point of view, are those capabilities really required in a parliament? Most parliamentary documents are still all text and have relatively simple layouts. Legislation is the main exception, with its complex indenting and, often, the need for line numbering.

What is important here is the separation between content and appearance. XML now effectively forces content management and publication to be considered and handled separately, which is a good thing, strategically and practically. Paper and Web publication are two very different things and both are very different from the management of reusable information.
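A small sketch may make the separation concrete. The same XML content is rendered twice below: once as line-numbered plain text, as legislation often requires, and once as HTML for the Web. It uses Python's standard xml.etree.ElementTree module, and the element names bill, title and clause are invented for the example rather than taken from any parliamentary schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# Content only: nothing here says how the bill should look.
BILL = (
    "<bill><title>Example Bill</title>"
    "<clause>First clause.</clause>"
    "<clause>Second clause.</clause></bill>"
)


def to_numbered_text(xml_source):
    """Render the bill as plain text with a line number on every line."""
    root = ET.fromstring(xml_source)
    lines = [root.findtext("title")]
    lines += [clause.text for clause in root.findall("clause")]
    return "\n".join(f"{n:>3}  {line}" for n, line in enumerate(lines, start=1))


def to_html(xml_source):
    """Render the same bill as an HTML heading and ordered list."""
    root = ET.fromstring(xml_source)
    items = "".join(
        f"<li>{escape(clause.text)}</li>" for clause in root.findall("clause")
    )
    return f"<h1>{escape(root.findtext('title'))}</h1><ol>{items}</ol>"
```

Neither renderer changes the stored content, so a new publication format only needs a new function, not a new copy of the document.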

Citizen Interaction

Static information published on a Web site may satisfy the legal requirement of public availability but, if it is not easy to find, is that really “transparency”? Also, it is certainly not going to meet the EU recommendation to “involve citizens and business in decision making in an interactive fashion”. Closing the loop between citizens and elected representatives is seen as vital in “re-connecting the European institutions with the citizens of Europe”, one of the themes of the debate on the future of the Union, and “reversing the democratic deficit”, a key theme of the Third Global Summit in Naples, last March.

Discussion Groups

NNTP Newsgroups (“Usenet News”) have existed since the early days of the Internet. Linux contains both NNTP servers and several threaded newsgroup readers but NNTP servers are seldom deployed in a private, i.e. non-Internet, context.

Discus is a widely used free and for-money discussion board system, written in Perl as a series of CGIs. The basic version is free but the “professional” model costs money. It is very easy to set up but harder to customise.

IkonBoard is more flexible and has more features than Discus. In particular, it supports automatic user registration with the password emailed to them, online user facilities, interface skins, email support and extensive forum options. Other languages can be used but not readily. A new version is coming that promises to make this easier.

YABB is another very popular system, also Perl/CGI based, with even more features, including membership management, instant messaging, news, personal new message alerts, excellent navigation and CSS-based page customisation. Language files are available for Dutch, German, Italian, Norwegian, Spanish, French, Finnish, Hungarian, Catalan, Danish, Russian, Turkish and Swedish.

Peer-to-peer networking

The peer-to-peer file sharing concept, dramatically popularised by Napster for music files, has many other file storage, retrieval and sharing applications, including distributed, resilient file management that scales to the Internet. Open Source packages such as Gnutella and Freenet provide a more generalised peer-to-peer system than Napster. Freenet may have interesting possibilities in the area of democratic and free speech systems, as it supports the publication, replication and retrieval of information whilst protecting the anonymity of both authors and readers.

Filetopia is a similar system, which combines instant messaging, chat, email, file sharing, search engine and message boards, all protected with strong ciphers and PKI encryption.

Conclusions

So, could it be done? Could a digital parliament be implemented using only free software? Yes, it is possible now but probably not desirable. Linux, as an all-round, off-the-shelf, large-scale, enterprise solution is not yet ready. Smaller parliaments would be in a better position to implement it throughout. Larger parliaments would probably find the deployment and management issues a problem.

But, the situation is changing fast. In a year’s time it may be possible to say that it can be done. The shortcomings are not on the server side; there, Linux equals or surpasses most of Microsoft’s products, is fully Internet compatible and does not use proprietary protocols. It is enterprise configuration management and the client side where the gaps are apparent. But, much of the enterprise management software that administrators rely on for Windows is actually from third parties and when Linux becomes popular enough, they will want to do Linux versions. Some are already available, but nearly all commercial. Open Source enterprise management software is a wide-open area.

On the client side, it is not the OS that is the issue, but the office applications. Microsoft is still ahead. But, that offers the opportunity for a hybrid installation: Windows clients and Linux servers. That is certainly possible now and could be easier to deploy. Some Microsoft proprietary enterprise features, such as Exchange server, would not be available, of course. When the Linux client side matures to the extent deemed necessary or desirable, or the issue is forced by yet another unfavourable change in Microsoft’s enterprise licensing arrangements, the clients could be migrated to Linux also.

(Presentation to ECPRD WPICT Seminar, Dublin, 25 & 26 October 2001.)