Archive for the ‘HTML’ Category
Bring back STX and ETX
Saturday, December 30th, 2006Some people think that XML is the greatest invention since sliced bread. (I won’t name names, to protect the non-existent.) In contrast, I think that it’s just a symptom of a disease, a terminal illness infecting the entire computing world. Programmers are supposed to be smart, but we’re actually the ones who are responsible for the spread of the disease. We started and promoted the practice of using printable text to delimit printable text. In my haughty opinion, this was one of the worst ideas in the history of computers (surpassed only by SMTP and MacAppADay). It was doomed to fail from the beginning, kind of like the paradox of the liar. The use of printable text to delimit printable text has been the cause of countless bugs — let’s say 500 billion — and indeed, countless security vulnerabilities. It continues to plague us today.
Having worked on a feed reader, I do know a thing or two about this issue. I can tell you that it’s a major pain to parse XML feeds. Parsing HTML is even worse, but thankfully we can leave the majority of that to WebKit. It’s hard enough when everything is perfect, but we inevitably run into issues where the text is improperly escaped or not properly escaped. This is no fun for anyone.
Since the beginning, ASCII contained a number of non-printing control characters, but for some reason they have fallen out of favor. Among the control characters are STX (0x2) and ETX (0x3). Their position in the list of character codes indicates their importance: they were used to delimit text. With character codes such as these, parsing data into strings becomes trivial:
- Start parsing a string when you see a STX code.
- Continue until you see a ETX code, you see a non-character code, you reach a preset maximum length, or you run out of data.
- If the last code was ETX, you’ve got a good string. Otherwise, you’ve encountered an error, and you can do whatever error handling you like.
- There are no more steps. The characters in the string are all literal, no unescaping necessary.
Unicode has added some similar codes such as SOS and ST. I’d like to see even more control codes, to allow for fine-grained specification of the structure of the text. For example, we could have control codes to delimit words, sentences, paragraphs, etc. This would be similar to tags in HTML but without the use of printable characters to represent the tags.
Why don’t we do this now? One objection is that files containing control characters are not human readable
. I think that this is a lame excuse, because no computer file is human readable. Although my hard drive is enclosed, preventing me from examining the files on there, I have burned text files to DVD, and no matter how long I stare and squint at the shiny bottom, all I can see is my own reflection. Anyway, a lot of markup is human readable only in the sense that Derrida is human readable: there is a series of legible text characters, but do you really want to wade through all the crap to make sense of it?
Perhaps the real point underlying this objection is that control-character delimited text would not be readable by simple (i.e., dumb) text editors. This is true, but why should we be ruled by the lowest common denominator? Many modern text editors are quite intelligent and could handle the new format easily. They can already parse various forms of syntax and highlight them for the user. Let’s not let backward compatibility hold us back. That’s certainly not the Apple Way. It’s not entirely the Microsoft Way either; after all, the Word file format makes no concession to simple text editors. Neither does the cross-platform Adobe PDF.
The most powerful objection to using control characters as text delimiters is that we shouldn’t force users to learn how to input control characters along with text. I agree, which is why I think the burden should be placed on computer programs — the text editors and command line interpreters — rather than on users. When taking text input from users, an app should do the following:
- Use the context to guess the user’s intention.
- Give a visual indication of the guess to the user. Syntax coloring is one example, but the possibilities are endless. Be creative.
- Make it easy for the user to correct bad guesses.
In command line interpreters, by the way, there’s no good reason why the space key needs to separate arguments, as opposed to a key for a non-printable character such as escape. It’s the 21st century, by Jove, and we should be finally be able to use any printable character in a file name, including colons, quotes, slashes, and spaces, without having to do voodoo on the command line just to refer to it! (I won’t even mention hierarchical file systems, which are themselves a bad idea. Oops, I just did. Since I mentioned it, the ideal behavior when a user enters a file name is to quickly find the named file or files, which any decent file system should be able to do, and show a visual preview so that the user can verify or choose the correct file, if necessary.)
This rant has been brought to you by BBEdit. The makers of BBEdit, I assume, take no responsibility or credit for the content here, nor do they endorse the opinions I’ve expressed. (Or do they?)
(No, not as far as I know, which is nothing. In any case, I do endorse BBEdit.)
CSS tips for HTML
Thursday, November 9th, 2006I apologize for the absence of posts recently. I’m not dead! I just thought I’d go for a walk. I was feeling the pressure to be funny constantly, so I’ve contracted with Bobcat Goldthwait to ghostwrite my blog posts for me. Umm, I-I-Arrgh, I want a yacht!
During the hiatus, I’ve also been redesigning my web site. Superficially it looks similar, but under the hood it’s now almost entirely hand-coded (with Smultron). The only iWeb-generated code I kept was the JavaScript photo slide show. The benefits of the redesign are that the pages should load faster, and it’ll be easier for me to update the content, as I just did by adding a link to Mac OS Forge, which has finally emerged from its beauty sleep.
In honor of our new site styles, the subject of this post will be CSS. Sorry, folks, one cannot live on Cocoa alone! One also requires marshmallows, and Fritos. You may have heard of collapsing margins, which occurs in a number of contexts, for example, on election day. It also occurs when the bottom margin of a block-level element on a web page adjoins the top margin of another block-level element. The two margins collapse, and the length of the displayed margin becomes the length of the larger of the two adjoining margins. This can be useful, because typically you want a minimum margin around a paragraph of text, and you want a different minimum margin around a heading, but you don’t want a huge gap between the heading and the first paragraph. (Unless you’re trying to turn 8 pages of actual writing into a 15-page term paper. I know who you are, slackers! I was a teacher. Your grade will reflect your ef-fort.)
What you may not realize is that collapsing margins also occurs when the top margin of a block adjoins the top margin of its parent block. If you load the following HTML in a browser, such as Safari or Camino, you’ll see a purple box at the top of the window.
<html lang="en-US">
<head>
<title>Collapsing Margins</title>
<style type="text/css">
body {
background-color: #000000;
margin: 0;
color: #cccccc;
}
#all_content {
background-color: #590079;
margin-top: 0;
margin-bottom: 0;
margin-left: auto;
margin-right: auto;
width: 40em;
height: 40em;
}
p {
margin-top: 1em;
margin-bottom: 1em;
}
</style>
</head>
<body>
<div id="all_content">
</div>
</body>
</html>
This is to be expected, because body and all_content both have top margins of 0. However, if you add a paragraph, <p>This is a paragraph.<p>, inside the element all_content, something unexpected happens. Instead of the purple box at the top of the window with the paragraph spaced from the top of the box, you see the purple box spaced from the top of the window with the paragraph at the very top of the box.
The margins of the paragraph are supposed to be calculated relative to its containing block, which in this case is the element all_content, so you would expect that the margin should appear between the paragraph and the box rather than between the box and the top of the window. The reason the opposite occurs is that even though the box has a top margin of 0, the top margin of the box and the top margin of the paragraph are technically adjoining, which means the margins collapse, and the one true margin becomes the larger of the two top margins. Since margins are transparent, the black background shows through.
The CSS specifications for collapsing margins are, to put it frankly, idiotic. Or to put it more tactfully, they’re insane. Who in the world wide web would want the top margins of parent and child blocks to collapse? I gave the paragraph a non-zero top margin because, well, I wanted it to have a non-zero top margin! I don’t blame the browser developers for complying to the standards; web designers demand and expect the browsers to be standards-compliant. Rather, I blame the standards developers for developing substandard standards. I can’t imagine what they were thinking. Anyway, the workaround for this bug, as I shall call it, is to give the parent block, in this case all_content, a non-zero border-top or padding-top. If the parent has a border or padding, then the top margin of the child does not adjoin the top margin of the parent, the margins do not collapse, and all is right with the world again.
Speaking of standards-compliance, I used to consider Mozilla’s Gecko, the rendering engine behind Camino and Firefox, the standards king. Perhaps it was, in the early days of Safari. I believe that Gecko has lagged behind while WebKit, the rendering engine underlying Safari and Vienna, among others, has caught up and surpassed it. While testing my web site in Camino, I ran into a couple of problems. First, Gecko doesn’t handle the inline-block display property. This bug has been open since 1999. I would advise web designers not to use inline-block if they want their pages to display properly in Gecko browsers.
Second, there is a bug, open since 2004, in how Gecko calculates percentage margins for a float with width: auto. You can see this by comparing how my résumé looks in Safari and Camino, under the “Degrees” heading, for example. (Is this a ploy to get you to read my résumé? No, just a happy coincidence.) Percentage margins are supposed to be calculated relative to the size of the containing block, but Gecko calculates them relative to the size of the float, so that the width of the margin is a percentage of the sum of the widths of the float and margin. This makes the margin far too small. My workaround, as you can see in the HTML source, was to increase the margin’s percentage size somewhat. It still looks too small in Camino, but I don’t want to make it too large in Safari. Although I could have used a table in this situation rather than floats, it made much more sense semantically to arrange the content by columns rather than by rows.
I hope that you enjoyed my rants, err, tips. I promise to talk about Cocoa programming in the next post. I also promise to lower taxes, raise spending, balance the budget, ask not what my country can do for me, fear nothing but fear itself (that is, the form of fear, not those shadowy particular fears), put a chicken in every pot, a car in every garage, and a kitten in every lap. Of course, this all depends on what the meaning of the word “I” is. I want a yacht. Thank you very much. Argh.