On Semantic vs Visual

by: Allen Firstenberg

This is an open letter to the lily and Flow communities. In a way it is an apology for my behavior, although many people will see it as a justification for that behavior instead. It probably is. It also addresses several other rants and extreme positions I've taken over the years on these forums. They have ranged from my anti-PDF stance to my most recent outburst about message formatting. In every case, I have taken an extreme position - usually in favor of semantic content formatting as opposed to visual formatting. I would like to explain my underlying thought process for all of these and why I consider it so important.

Before I begin, however, I should be very clear. I have understood every argument against my position, and for the most part I appreciate the comments of the people who make them. Their points make a great deal of sense in the context of their requirements, and in many cases, they are attempting to address their issues in the best way possible. In short - they have customers who have needs, and their solutions solve those needs in the best way.

Where I tend to have issues is that their solution often disregards other customers or potential customers. There are other needs which they may not understand, appreciate, or think are relevant. If they do, they may feel that there are better solutions for their specific requirements. They may be right. All I ask, however, is that due consideration be given to my thoughts on the issue. I have not tried to find specific answers to questions; I have sought general ones. The solutions I find are, I suspect, not perfect either. In most cases, they are not implementable yet. That does not mean they should be disregarded. In other cases, I admit that I do not see a good solution - but this does not mean that a bad solution should be the one we accept.

In this message, I already have made some sweeping generalizations. I will likely make more. Those who know me know how much I detest this, and yet, I hope these generalizations will be able to illustrate some of the issues I raise.

Let me make my first obvious generalization about the assumed target audience most of us are working with. We are working with ourselves, largely, or people very similar to us. Most of us are able to read printed texts and we use computers and browsers that, for the most part, can handle graphics. Where they cannot, we often have tools at our disposal to get the graphics for us to see. Our systems can handle a variety of fonts. We understand the semantic value of many of these as well. We have the cognitive ability to read things in columns and understand what they mean based on their row and column labels, and can usually detect that rows and columns are labeled. We process whitespace as meaningful. We can distinguish between most major colors on the color wheel and those colors often have cultural context to us. We can distinguish between different audio signals.

There are classes of customers who do not have all of these abilities. Some of them are prevalent in our society, and are quite probably using lily and Flow today. Others are only potential customers, who could benefit from these systems if we choose to support them. Let us think about a couple of them and how our system impacts them.

Can a blind person use lily today? I don't know of any who do, but the concept is reasonable in theory if we think about lily delivering semantic messages. Events can easily be given an audio cue instead of the *** prefix they have today. Replies to commands, similarly, can get a different cue or be said using a different voice. Messages, of course, can be read as identified from a person. In the future, we could easily visualize (so to speak) various voices being set for different members in the discussion.

But what about two messages from the same person in a row? We should have something to identify this break as well - it is certainly a piece of information that is given to those of us who can see the messages. The break in messages is semantically important and should be represented.

Let us think about some proposed changes. What if we allowed newlines in messages? It is a reasonable request (even if I have behaved unreasonably against it). What semantic meaning does it give and how can we represent this? The answer is that it depends. There may be cases where we are starting a new row of information. In other cases we are sending a poem and the poet, due to reasons of meter or other artistic concerns, breaks at this point. There are probably other reasons, but each one represents a semantically different meaning and, if we read it aloud, we will need to read them differently. How do we distinguish these?

If we think about XML, the way would be to define multiple tags. Visually, the client can display them identically. To someone listening, they can be rendered completely differently because the renderer now has an understanding about what is being transmitted. There is a downside to the XML route, however. It means that we have a large pile of tags that must represent every possible semantic meaning, and we must have a set of definitions for how we can render them. And we must have systems to do so. This can be daunting.
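To make the idea concrete, here is a minimal sketch in Python. The tag names `row-break` and `verse-break`, the sample message, and the spoken cues are all invented for this illustration; neither lily nor Flow defines anything like them. It only shows how two breaks that look identical on a screen could be rendered differently for a listener.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tags are assumptions made for this example.
MESSAGE = (
    "<msg>Roses are red<verse-break/>Violets are blue"
    "<row-break/>AAPL 21.50</msg>"
)

# Assumed cues a speech renderer might use for each kind of break.
SPOKEN_CUES = {
    "verse-break": " [short pause] ",
    "row-break": " [next row] ",
}


def render_visual(xml_text):
    """Visual rendering: every kind of break becomes a plain newline."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        parts.append("\n")
        parts.append(child.tail or "")
    return "".join(parts)


def render_spoken(xml_text):
    """Spoken rendering: each kind of break gets its own cue."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        # Unknown tags fall back to a plain space rather than failing.
        parts.append(SPOKEN_CUES.get(child.tag, " "))
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    print(render_visual(MESSAGE))
    print(render_spoken(MESSAGE))
```

The daunting part mentioned above shows up even here: every new kind of break needs a new tag and a new entry in each renderer's table.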

But what is the counter-argument to semantic formatting? It is the argument from layout specialists that they, the experts, know what is proper and best formatting. Many times they are right. Newspaper columns are the width they are (at least historically) to minimize eye movement and speed reading, for example. But what happens when the newspaper now decides to publish a "large text" edition? Many things change, and sometimes the layout has to change with them. If things were stored semantically, with various guidelines on how to render them visually, those who fall outside those guidelines can follow along.

It poses an interesting challenge to artists, and I am curious to see where this will evolve. Can an artist create a work of art that can function for people who use their senses differently? Can this, in fact, be the purpose of a piece of art? Imagine, if you will, a symphony that is specifically designed such that you will capture the full spirit of the work if you were wearing protective earwear. Similarly, imagine a painting whose colors are chosen such that if you were wearing different shading eyeglasses, you would experience different messages - that it would be impossible for a person to "understand" the work if they did not do this. I am not an artist, but it seems to me that this is equivalent to what we must accomplish for semantic markup to work.

Finally, let us think about another target audience who is very different than those mentioned. This target does not understand many of the visual cues that we do, and must be carefully instructed every time what those cues are. Visually, columns are very difficult for this client, and spacing may cause all sorts of confusion. Simply, this audience is another automated program.

You might ask why this is relevant in the context of lily and Flow. After all, most of the clients are humans and not automated systems. The few automated systems that are currently out there do simple tasks that either respond to a simple query or just log their messages. Yet it is not too hard to envision using lily or Flow as a conduit between two automated systems. We already have a bot that takes data (stock quotes) from one system that was intended for humans and massages it into something printable. Why is it unreasonable to assume there will be another bot that collects this and does something else with it? And yet, the authors of these applications seek to be able to simply represent tabular format to make it easy for humans to understand the layout. What will happen when there are three dimensions of data to represent? Our screens have only two.

The semantic markup solves both these problems. For automated systems, it grossly simplifies the receiving program and reduces it to understanding (possibly nested) objects, attributes, and values. For humans, it lets them feed the data into a local program to do all sorts of simple or complex visualization. Here is an illustration where the medium restricts the message (the program writer can only present 2D worth of data) and if we change the medium to use semantic markup, we can create a richer message for humans and programs.
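As a rough sketch of what this could look like, assume a stock-quote bot that sends nested elements instead of a printed table. The element names, attribute names, symbols, and numbers below are all made up for illustration. The receiving program indexes the data along three dimensions (symbol, day, field) without knowing anything about layout, and a human-facing view is just one 2D slice chosen locally.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup and data, invented for this example.
QUOTES = """
<quotes>
  <quote symbol="AAA" day="mon"><open>10.0</open><close>11.5</close></quote>
  <quote symbol="AAA" day="tue"><open>11.5</open><close>11.0</close></quote>
  <quote symbol="BBB" day="mon"><open>40.0</open><close>42.0</close></quote>
</quotes>
"""


def load_quotes(xml_text):
    """Return {(symbol, day, field): value}: three dimensions, no layout."""
    data = {}
    for quote in ET.fromstring(xml_text):
        for field in quote:
            key = (quote.get("symbol"), quote.get("day"), field.tag)
            data[key] = float(field.text)
    return data


def as_table(data, field):
    """One possible 2D view for a human: symbols as rows, days as columns."""
    symbols = sorted({s for s, _, f in data if f == field})
    days = sorted({d for _, d, f in data if f == field})
    lines = ["\t".join(["symbol"] + days)]
    for s in symbols:
        # Missing combinations become empty cells instead of shifting columns.
        cells = [str(data.get((s, d, field), "")) for d in days]
        lines.append("\t".join([s] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    quotes = load_quotes(QUOTES)
    print(quotes[("AAA", "tue", "close")])
    print(as_table(quotes, "close"))
```

A different client could slice the same data by day or by field, or hand it to a plotting tool, without the sender changing anything.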

It is, perhaps, important to understand the background for all these feelings. I do not choose them arbitrarily, for the most part, and most of you know that I am in reasonable command of the senses that most humans work with. I argue for all these things because I have encountered them in past jobs - and people's lack of attention to them then has cost me hours of time, some of it quite pointlessly.

The first professional task I ever received out of college was to take a program my team was maintaining and "stop making it look like a Christmas tree". It turns out that the program used red and green to distinguish between valid and invalid fields, and one of our major customers was red-green color blind. "Simple," I thought, "All I need to do is change the color configuration table." There was no such table. In fact, every color was individually set in every location in the program. I spent two weeks locating all these color settings and making a color configuration table and utility. Two weeks, because somebody felt that using red and green would be "right" for all customers.
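The kind of table I ended up building can be sketched in a few lines of Python. This is not the original program; the state names, palette names, and colors are placeholders chosen for the example. The point is only that the rest of the code names the meaning of a field (valid or invalid) and a single table decides what that looks like.

```python
# Semantic states used throughout a hypothetical program.
VALID = "valid"
INVALID = "invalid"

# One table per palette. Names and colors are illustrative assumptions.
PALETTES = {
    "default": {VALID: "green", INVALID: "red"},
    "red_green_safe": {VALID: "blue", INVALID: "orange"},
}


def color_for(state, palette="default"):
    """Look up the display color for a semantic field state."""
    return PALETTES[palette][state]


def render_field(name, state, palette="default"):
    """Callers pass the state, never a color."""
    return f"{name} [{color_for(state, palette)}]"


if __name__ == "__main__":
    print(render_field("price", INVALID))
    print(render_field("price", INVALID, palette="red_green_safe"))
```

With this arrangement, supporting a red-green color blind customer is a one-line configuration change rather than a two-week search.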

A few years later I was responsible for creating a database of financial information. We received this data in a number of ways - some automated and some manual. The automated system displayed in "pages" that were generally (but not universally) formatted as rows and columns on an 80x24 screen. Color changes could represent if the value had just increased or decreased or even just that a change was made - it was not consistent. The vendor provided an application that could turn screen regions into "data streams" - all you needed to do was indicate the coordinates of the value you were capturing. This worked well until one number overflowed its field and the entire page format changed to handle it. I was not intimately involved with this problem, but I understand that several hundred person-hours went into changing the settings for the pages.
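The failure is easy to reproduce in miniature. The sketch below is not the vendor's tool; the `capture` function, the page contents, and the record fields are assumptions made up to show the shape of the problem. Values are captured by screen coordinates, one number grows by two digits, and the same coordinates quietly return wrong values.

```python
def capture(screen, row, col, width):
    """Coordinate-based capture of a screen region (hypothetical)."""
    return screen[row][col:col + width].strip()


# The same invented page before and after one price outgrew its field.
PAGE_BEFORE = [
    "SYM   PRICE  CHG",
    "AAA   99.50  +0.25",
]
PAGE_AFTER = [
    "SYM   PRICE    CHG",
    "AAA   1000.50  +0.25",
]

# Coordinates configured against the original layout: (row, col, width).
PRICE_AT = (1, 6, 5)
CHANGE_AT = (1, 13, 5)

# A semantic record carries names, so widths and positions do not matter.
RECORD_AFTER = {"symbol": "AAA", "price": 1000.50, "change": 0.25}

if __name__ == "__main__":
    print(capture(PAGE_BEFORE, *PRICE_AT), capture(PAGE_BEFORE, *CHANGE_AT))
    # No error is raised here; the captured values are simply truncated.
    print(capture(PAGE_AFTER, *PRICE_AT), capture(PAGE_AFTER, *CHANGE_AT))
    print(RECORD_AFTER["price"], RECORD_AFTER["change"])
```

The worrying part is that nothing fails loudly: the capture still returns something that looks like a number, which is how a layout change turns into bad data downstream.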

The printed data was an even bigger problem. It was a printed publication that was faxed to us daily that contained pricing information that was crucial to the business. Our process at the time was for a person to receive the fax at 5am and manually enter about 100 numbers into a database. A second person would review the entries about an hour later to verify there were no problems. While most of the numbers were in a table, several were not. There were frequent data entry errors. We met with the data vendor, and discussed the problem. They claimed to understand, and promised us that a digital version of the publication would be coming out shortly. When we finally received the new online publication, to our dismay we discovered that it was in PDF format and rendered exactly the same as the printed version. There was no semantic content we could use to locate the numbers we needed among a pile of other data, and it was not possible to use a PDF-to-text converter because PDF did not even understand columns. We contracted out the data entry. The company never really understood that we did not want their newsletter to read - we wanted their newsletter for the numbers that we entered into a database. They thought only humans read their paper.

There are many more examples, and I'm sure you can think of cases of your own. Even in the current debate about message formatting, there was the request that we have a way for the client to distinguish URLs so we can do semantic processing of the contents instead of specifically creating a visual display of them. I am sure that if you think of your own work, you will see many cases where you or your clients want, need, or have a way to semantically understand a chunk of data, yet think and talk about it in terms of the visual presentation of that chunk instead of the logical representation of it. We are visual creatures - so in many cases this makes sense. Many of the requirements specifically talk about the visual needs of the customers - and this also makes sense. I urge you to think about the semantics of this data, however, and provide a translation from this semantic level to the visual level.

If you have read this far, thank you. I hope I have at least caused you to think about these issues, and perhaps to understand why I am so passionate about them. I am willing to discuss and clarify any point that you feel is unclear. Please let me know if you have read this far, as I am anxious to know if I'm just shouting in the wind again.

$Id: rant-visual.html,v 1.2 2001/08/01 13:31:52 prisoner Exp $