Sat, 26 Jul 2008

Dzongkha Linux

There's a Bhutanese version of Linux available, Dzongkha Linux. If you head over there, you're greeted by a picture of Tux in a maroon monk's robe. Dzongkha is the Bhutanese name for their language. It looks like the Dzongkha word for Linux is Li.nag.so. At least, that's what's written above Tux's head. Dzongkha uses the Tibetan script. (Otherwise I would not be able to read it.) The word dzong means fortress in Tibetan and kha means language, so Dzong.kha is the language of the fortress. There's more about the language in the Wikipedia article. The article also mentions the controversy about the Dzongkha version of Windows:

In October 2005, an internal Microsoft proposal blocked the term "Dzongkha" from all company software and promotional material, substituting the term "Tibetan - Bhutan" instead. This was done at the request of the mainland Chinese government, who insisted the name "Dzongkha" implied an affiliation with the Dalai Lama, and hence, with Tibetan independentism.

I think the controversy stemmed from a confusion between the word Dzongkha and the name of Tsong.Kha.pa, the founder of the Gelug school of Tibetan Buddhism that the Dalai Lama belongs to.

/code/ | permanent link

Sun, 06 Jul 2008

Repair Guy

I hope everyone had a good holiday. At least, those of you here in the United States, where it was a holiday. I spent my weekend making fixes to my web sites. Yes, sad, but true. Having a web site is like having a dog. It must be fed, walked, and occasionally de-wormed. I only mention it because I fixed a long-standing bug with search and permalinks on this site. So get ready for my geeky explanation.

All my code is written in Perl and it interfaces to the web using CGI.pm, the standard module for doing this stuff in Perl. One thing my scripts need to do is get their own URL (the address you type into a browser window). The function that gets this information is broken on my web hosting service for some unknown reason. So I had to write my own. It fetches some of the environment variables that are passed to the script and parses the information it needs out of them. I assume CGI.pm does the same thing, but the environment variables it uses are configured in some non-standard way on my system. So everything should be working now.
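For the curious, here is a minimal sketch of the idea in Python rather than Perl. It is not the code running on this site, only an illustration of rebuilding a script's URL from the CGI environment. SERVER_NAME, SERVER_PORT and SCRIPT_NAME are standard CGI variables. HTTP_HOST and HTTPS are common server conventions and are assumed here, as is the function name script_url.

```python
def script_url(env):
    """Rebuild the URL of the running CGI script from its environment."""
    # Assumption: the server sets HTTPS=on for encrypted requests.
    scheme = "https" if env.get("HTTPS", "").lower() == "on" else "http"
    # Prefer the Host header the browser sent; fall back to the server name.
    host = env.get("HTTP_HOST") or env.get("SERVER_NAME", "localhost")
    port = env.get("SERVER_PORT", "")
    default_port = "443" if scheme == "https" else "80"
    # Add the port only when it is non-default and not already part of the host.
    if port and port != default_port and ":" not in host:
        host = f"{host}:{port}"
    return f"{scheme}://{host}{env.get('SCRIPT_NAME', '')}"
```

In a real CGI script the argument would be os.environ. Taking the environment as a parameter makes it easy to try the function against the odd values a particular host supplies.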

/code/ | permanent link

Thu, 15 Mar 2007

A Visit from Google

Two guys from Google stopped by the Space Telescope today. Part of the reason was to grab a copy of the Hubble Archive. They built a neat computer to do this, a little aluminum box with three terabytes of disk drives. They snarf the data onto the computer, put it in a padded suitcase and FedEx it to their lab. This is to support their work on Google Sky, the counterpart of Google Earth. One of the guys gave a short talk on this "sneaker net" technology and the other gave a short talk on Google's commitment to open source. I hear that the Space Telescope is going to be one of the mentoring institutions for Google's Summer of Code.

As far as my own code goes, I've finished testing the second version of Yeti, my small HTML templating library. It's really tiny, less than a hundred lines of code, and I think it's one of the cleverest things I've written. The only thing left to do is to update the documentation.

I think Heinlein said that a technology is only perfected when it becomes obsolete. One example is the self-wicking candle, which was invented around the same time as the light bulb. So I have html templating code for Perl CGI scripts, hardly the cutting edge of Web technology.

/code/ | permanent link

Wed, 10 Jan 2007

Lola Lives

It's been a while since I wrote about the weblogging code I'm writing called Lola. Lola has been running on my weblog at work since the middle of December, but you can't see it, because my boss made me block access to people outside the Space Telescope Institute. Actually, if you saw it you wouldn't notice any difference from Blosxom, as the Perl code under the hood is invisible to the user. It now has through-the-web editing of posts in addition to FTP upload of the sort that Blosxom supports. It still needs a fair amount of work before I switch over this site. I need to clean up the code a bit, add support for plugins, and add user comments. When I do get comments working I definitely will switch, as that's my main motivation for writing it. Once I switch over to Lola, all the permalinks will break, as it uses a different naming scheme for posts.

As everyone knows, yesterday Apple introduced a $500 nerd magnet called the iPhone. One feature of the iPhone that bears mentioning is that it runs the same widgets that the Mac runs. Widgets are small web browser applications that run in Safari. Widgets are a nice feature on the Mac, but on the iPhone they're much more important, as they make it possible to reformat information from the Internet in an attractive, readable format on the iPhone's small screen. Apple is bundling a development environment called Dashcode with the next release of OS X that will make it very easy to develop and test new widgets on the Mac. These widgets can then be downloaded to the iPhone. I think Dashcode will be the "killer app" for the iPhone and user-developed widgets will sell a lot of iPhones.

/code/ | permanent link

Sat, 11 Nov 2006

Lola Revisited

This weblog is done in Blosxom, but the version I use has been orphaned. For some time I've been working on my own weblogging code called Lola, but it's grown and grown with no end in sight. So I decided to take another look at the problem and come up with something simpler and smaller.

Blosxom is a hack by Rael Dornfest, who has since moved on to other things, namely yet another Web 2.0 startup. The hack in Blosxom is that the Unix filesystem is the database. This is a fruitful idea that saves a lot of code and effort, but as with everything there are trade-offs. The trade-off in Blosxom is that the metadata for each entry is limited to what the file system will store. My first attempt at Lola moved the metadata into a separate file, but this adds complexity to the code. Coordinating separate files is more trouble than it's worth, as I should have learned from my work at the Space Telescope on different image formats. So the metadata will go back into the file with the post, called out by "#meta" lines. This idea comes from one of the many extensions to Blosxom. The trick is to use the same format (and code) for all the files used by Lola, to save time and effort, and yet have a system flexible enough that you're not painting yourself into a corner.

Conceptually, displaying a weblog is simple. You parse the path info passed in the URL, fetch the configuration information, fetch the entry or entries based on the path info, pour the data into the HTML template, and print it with the proper headers. I hope to pare this process down to a thousand lines or so of Perl code. Anyway, those are my thoughts this weekend.
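To make the "#meta" idea concrete, here is a rough sketch in Python of pulling metadata lines out of a post file. The exact layout of a "#meta" line is not spelled out above, so the form "#meta name value" is an assumption, as are the function name parse_post and the field names in the usage note.

```python
def parse_post(text):
    """Split a post into a metadata dict and the remaining body text."""
    meta = {}
    body_lines = []
    for line in text.splitlines():
        if line.startswith("#meta "):
            # Assumed layout: "#meta <name> <value...>"
            parts = line.split(None, 2)
            if len(parts) == 3:
                meta[parts[1]] = parts[2].strip()
            # A "#meta" line without a value is dropped rather than shown.
            continue
        body_lines.append(line)
    return meta, "\n".join(body_lines).strip()
```

Given a file that starts with "#meta title Lola Revisited" and "#meta date 2006-11-11", the function returns those two fields in the dictionary and everything else as the body. Because the metadata travels inside the post, there is no second file to keep in step.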

/code/ | permanent link

Mon, 07 Aug 2006

New KTD Site

The new KTD web site is in the process of coming online. Naomi and Jack have been working on it for months. As far as the look goes, it's an improvement. But when you look at the HTML, you see it's an old-fashioned design, using tables and spacer GIFs for layout and JavaScript image rollovers for navigation. This makes for fat web pages, though in the days of DSL I suppose many won't notice. The Space Telescope site has many of the same problems, but it was designed five years ago when Netscape Four users were still a significant fraction of our user base. We're trying to move to a CSS-based layout and I wish KTD had used this redesign to do the same. A site redesign of this scale is a lot of work to be handled by so few people and will no doubt be in place for many years. It's going to look dated as 2010 approaches.

/code/ | permanent link

Fri, 21 Jul 2006

Unpersuaded

I've been on the Internet for a long time and have been in my fair share of arguments. For all I know some of the people I've criticized and insulted are reading this weblog. I've come to some conclusions in that time and one of them is this: I don't take your arguments seriously. That's not because I don't like you or I think you're stupid. I don't take anyone's arguments seriously, including my own.

Here's why. Most people use arguments to support a conclusion that they've already reached. In a way, such arguments are like computer programs, where you're trying to generate a desired conclusion. However, as any programmer can tell you, most programs are buggy. It's very rare to write a program and have it work the first time. The human mind is just not capable of a sustained thread of reasoning without making at least one mistake. Hence, there are bugs in computer programs and errors in arguments. You can debug programs, but it's pretty unusual for someone to go back over their argument and try to pick it apart. And programs are handed to someone other than their author to test, because the author typically won't test them thoroughly enough. So for all these reasons, no matter what your argument is, I think it's probably wrong.

/code/ | permanent link

Sun, 16 Jul 2006

PHP Security

While I was at KTD, Jack showed me the upcoming KTD website. Because they're going to be taking reservations through the web, I offered to point him to some articles on PHP security. Since what I found is of general interest, I thought I'd post it to the web.

Here's a short introduction to web application security, with a focus on PHP. I have to warn you that I'm not that familiar with PHP. The examples are untested and based on what I read in the PHP manual. But the principles are the same as in any other programming language.

The idea is that you have a web application that has permission to do the work it needs to run the application. It uses the server's operating system, reads and writes files, reads and writes to the database, and builds web pages to display to the user. The person attacking the security of your site tries to get your application to do what you didn't expect or want it to by supplying input to your application that you never expected. The solution to the problem is to restrict the input that your application gets by carefully checking it and rejecting or converting input that causes trouble.

There are several standard routes of attack that are used to subvert web applications. Your application can be tricked into running a command you didn't want it to. It can be tricked into writing (or overwriting) a file you did not expect. It can be tricked into executing an SQL command you don't want. (This is called an SQL injection attack.) Or the web page your application builds can contain JavaScript code that steals information from your interaction with the person running it. (The usual target is the session id cookie. This is known as a cross-site scripting attack.)

This article gives a good introduction to common PHP security mistakes. Many are obvious, but it's surprising how often these problems are overlooked. This weblog post covers some of the same ground, but is short and makes some different points.

I'm going to focus on SQL injection attacks. The two articles above both cover SQL injection attacks. But to get more familiar with the technique, it would be good to read SQL Injection Attacks by Example, which works out an example attack on a web application.

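To protect yourself, take the input fields that are going to be used in your SQL command and divide them into fields that should hold numeric values and fields that should hold strings. Use is_numeric() or is_int() to check the numeric values and reject them if they fail the test. Keep in mind that is_int() tests the type of a variable rather than its contents, and form input arrives as strings, so check the raw input with is_numeric() before converting it. Here are the PHP manual pages for these two functions.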
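The same check can be sketched outside PHP. The following Python function is an illustration only, not a translation of is_numeric(): it is deliberately stricter, accepting nothing but ASCII digits and then enforcing a range. The name require_int and the range limits are made up for the example.

```python
import re

# ASCII digits only; \d would also accept digits from other scripts.
_WHOLE_NUMBER = re.compile(r"[0-9]+")

def require_int(raw, low, high):
    """Return raw as an int, or raise ValueError if it is not a whole number in [low, high]."""
    if not _WHOLE_NUMBER.fullmatch(raw):
        raise ValueError(f"not a whole number: {raw!r}")
    value = int(raw)
    if not low <= value <= high:
        raise ValueError(f"out of range: {value}")
    return value
```

A field such as the number of guests on a reservation would go through require_int(raw, 1, 20) before it gets anywhere near a query. Input like "1 OR 1=1" is rejected outright instead of being cleaned up.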

To protect the string fields, use the function mysql_real_escape_string(). This function escapes characters that could be used to subvert your application by preceding them with backslashes. Here is the manual page describing this function.

Using this function is a little tricky, because the PHP "magic quotes" feature also adds backslashes that escape some, but not all, of the characters that mysql_real_escape_string() escapes. If you use this function and magic quotes are turned on, you'll wind up with doubled backslashes in your input, where the same character has been escaped twice. You may or may not be able to turn off magic quotes, depending on your ISP. But what you can do is check to see if the feature is turned on and undo the damage it causes by stripping the backslashes that it adds. That's what the smart_quotes() function in Example 3 on the manual page does. The function looks fine to me, but I wouldn't rely on it to detect the difference between string and numeric fields. Some string fields, like zip codes, look like numbers, and you shouldn't rely on a function to guess whether a field is a string or a number based on the user's input. The comments at the end of the manual page raise this point. Test numeric fields separately, as I suggested.
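A different technique from the escaping described above, and one that sidesteps the double-escaping problem, is to pass values to the database through query placeholders. The sketch below uses Python's sqlite3 module instead of PHP and MySQL. The reservations table, its columns, and the sample rows are invented for the example.

```python
import sqlite3

def demo_connection():
    """Build a throwaway in-memory database with a hypothetical reservations table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reservations (name TEXT, zip TEXT)")
    conn.executemany(
        "INSERT INTO reservations VALUES (?, ?)",
        [("Ani", "12498"), ("Bob", "21218")],
    )
    return conn

def find_guests(conn, zip_code):
    """Look up guest names by zip code, treating the zip code as a string."""
    # The ? placeholder sends zip_code as data; it is never spliced into the SQL text.
    cur = conn.execute(
        "SELECT name FROM reservations WHERE zip = ? ORDER BY name",
        (zip_code,),
    )
    return [row[0] for row in cur.fetchall()]
```

With this in place, find_guests(conn, "12498") finds the matching guest, while the classic attack string ' OR '1'='1 is simply compared against the zip column and matches nothing. The zip code stays a string throughout, so nothing has to guess whether it is a number.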

/code/ | permanent link

Mon, 19 Jun 2006

Unit Tests

Sorry, no dharma tonight. I was writing unit tests for Lola, more specifically for the flat file database, which I call Metabase. So far new and create pass their tests. I'm still working on add. I'd be more enthusiastic about testing if there weren't more lines in the test file than the code tested. I'm going to wind up writing 5000 lines of tests for a 2000 line file. Still, writing tests is addictive, like eating potato chips. Just one more test, get it to pass. Then the cycle starts again. Sixteen tests so far, fourteen pass. Which is why there's no dharma post tonight.

/code/ | permanent link

Sun, 18 Jun 2006

Lola

I haven't been posting much because I've been working on my new weblogging software, which I call Lola. While I was having trouble connecting, I finished all the remaining TODOs in the main package, Lola.pm. The next step is to start writing unit tests. The goals of Lola are similar to those of Blosxom: to make a simple-to-install weblogging package that places as few requirements on the host environment as possible. Where Lola differs from Blosxom is that metadata is stored in a flat file database separate from the posts. Much flows from this difference. First, you have to create a way to add posts through the web, so that you can collect the metadata along with the posts. That requires user authentication, so that only authorized persons can add posts. After attending the Python conference in February I decided to rework the code as another CRUD framework (create, read, update, delete), like Ruby on Rails. The difference between my code and Rails is that it uses a flat file instead of a relational database, and software is written by subclassing a base class instead of using a code generator. The code which actually implements the weblog hasn't been written yet; Lola contains the base module from which it inherits. The point is that the weblog itself is just a small amount of code that defines the metadata fields, sets their values, and reads and writes the post to a file.
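The shape of that design can be sketched in a few lines. This is an illustration in Python, not Lola itself: the class names FlatFileStore and Weblog, the choice of a JSON file as the flat file, and the field names are all assumptions made for the example.

```python
import json
import os

class FlatFileStore:
    """Base class: create, read, update and delete records kept in one JSON file."""

    fields = ()  # subclasses list the metadata fields they allow

    def __init__(self, path):
        self.path = path

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as fh:
            return json.load(fh)

    def _save(self, records):
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(records, fh, indent=1)

    def _check(self, values):
        # Reject any field the subclass did not declare.
        unknown = set(values) - set(self.fields)
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        return dict(values)

    def create(self, key, **values):
        records = self._load()
        if key in records:
            raise KeyError(f"{key} already exists")
        records[key] = self._check(values)
        self._save(records)

    def read(self, key):
        return self._load()[key]

    def update(self, key, **values):
        records = self._load()
        records[key].update(self._check(values))
        self._save(records)

    def delete(self, key):
        records = self._load()
        del records[key]
        self._save(records)


class Weblog(FlatFileStore):
    """The application is little more than a declaration of its fields."""

    fields = ("title", "author", "date")
```

The subclass at the bottom is the whole point of the approach: all the file handling lives in the base class, and a new kind of record is defined by inheriting from it and naming its fields, with no generated code involved.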

/code/ | permanent link

Sun, 11 Jun 2006

TextWrangler Scripts

TextWrangler is a free text editor for the Macintosh. It's what I use to write for this weblog. It has one feature I haven't exploited until now: you can write scripts to modify text in the editor. You simply copy the script to the proper directory, mark the text you want changed, and select the script from the menu. The script reads the text from standard input and writes the result to standard output. This is the standard Unix way of doing things, so it's easy to use any of the standard scripting languages. Already I've dusted off my script to convert my own markup to HTML. I'm thinking about a Wylie to Tibetan Unicode script. More about that sometime later.
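A filter of this kind is short in any scripting language. Here is a sketch in Python that turns blank-line-separated paragraphs into HTML paragraphs. It is a stand-in, not the markup converter mentioned above, and the function names are made up for the example.

```python
import html
import re
import sys

def paragraphs_to_html(text):
    """Wrap each blank-line-separated paragraph in <p> tags, escaping &, < and >."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return "\n".join(
        f"<p>{html.escape(block.strip(), quote=False)}</p>"
        for block in blocks
        if block.strip()
    )

def main(infile=sys.stdin, outfile=sys.stdout):
    """Act as a Unix filter: read all the input, write the converted text."""
    outfile.write(paragraphs_to_html(infile.read()) + "\n")
```

Saved as a script with a call to main() on its last line, this reads the selected text on standard input and hands the converted text back on standard output, which is all the editor asks of it.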

/code/ | permanent link

Tue, 06 Jun 2006

The Tibetan Web

I can be quite a nitpicker sometimes and my neurosis was fed by a mistake I found on the Karmapa's link site. They had typed the Karmê in Karmê Chöling as Karme'. So I sent KTD an email with a brief note on HTML character entities and they fixed the problem. That got me thinking again about putting Sanskrit and Tibetan on the web. The transliteration of Sanskrit into the Latin alphabet requires a bunch of accented characters. All are available in Unicode, but some are rather obscure. This page has the fullest explanation of generating accented characters that I've found on the web.
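As a small illustration of character entities, the Python sketch below converts accented characters to numeric character references and decodes named entities back again. The function name to_entities is made up for the example; the calls themselves are from the Python standard library.

```python
import html

def to_entities(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Named entities such as &ecirc; and &ouml; decode to the same characters.
decoded = html.unescape("Karm&ecirc; Ch&ouml;ling")
```

Here to_entities("Karmê Chöling") gives Karm&#234; Ch&#246;ling, which is safe to put in a page whatever its declared encoding, and decoded holds the accented spelling again.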

The transliteration of Tibetan into the Latin alphabet poses no special problems. But Tibetan script does have its problems. The following is the Tibetan script for sbyin.pa, which is the Tibetan word for generosity. It was generated by this neat JavaScript program. Unfortunately it will render poorly or not at all in all browsers I know about, because Tibetan is a "stacked" script. This means syllables grow vertically as well as horizontally.

སབྱིནཔ་
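To show what stacking means at the level of code points, here is a sketch in Python of one way a stack can be spelled in Unicode: the top letter keeps its ordinary code point and the letters beneath it use the "subjoined" forms, which for the letters used here sit 0x50 above the ordinary ones. The helper name stack is made up, and this is not necessarily what the JavaScript program mentioned above emits.

```python
# Subjoined forms of the letters used below are 0x50 above the base letters.
SUBJOINED_OFFSET = 0x50

def stack(letters):
    """Spell a vertical stack: the first letter stays as is, the rest become subjoined."""
    head, rest = letters[0], letters[1:]
    return head + "".join(chr(ord(ch) + SUBJOINED_OFFSET) for ch in rest)

# Base letters and marks from the Tibetan block.
SA, BA, YA, NA, PA = "\u0f66", "\u0f56", "\u0f61", "\u0f53", "\u0f54"
VOWEL_I, TSHEG = "\u0f72", "\u0f0b"

# s-b-y stacked, vowel i, final n, the syllable dot, then pa.
sbyin_pa = stack(SA + BA + YA) + VOWEL_I + NA + TSHEG + PA
code_points = [f"U+{ord(ch):04X}" for ch in sbyin_pa]
```

The list code_points spells out the sequence a renderer receives. Whether the first three letters actually appear as one stack on screen depends on the font and the browser, which is exactly the rendering problem described above.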

Let me close by recommending Tibetan on the Mac, a new weblog that discusses creating Tibetan texts on the Macintosh.

/code/ | permanent link

Tue, 23 May 2006

The Logical Camel

There was an interesting program posted which implements logic programming in Scheme. I've been reading MJD's very good book Higher-Order Perl and it would be an interesting exercise to use the code in the Streams chapter to implement lazily evaluated logic programming in Perl. Interesting, but I don't know when I'd ever find the time to do it.

/code/ | permanent link

Thu, 23 Mar 2006

Less Is More

Mark Jason Dominus (MJD) now has a weblog and he's using Blosxom to produce it. He says that when he says Blosxom sucks he's really complimenting Blosxom and its less-is-more philosophy. I, on the other hand, am moving away from Blosxom and writing my own weblogging software. I'm having fun, but so far there's no light at the end of the tunnel. MJD says:

When I went looking for blog software back in January, I was conscious that I was looking for the Worse-is-Better software to an even greater degree than usual. In addition to all the reasons I have given above, I was acutely conscious of the fact that I didn't really know what I wanted the software to do. And if you don't know what you want, the Right Thing is the Wrong Thing, because you are not going to understand why it is the Right Thing. You need some experience to see the point of all the complexity and subtlety of the Right Thing, and that was experience I knew I did not have. If you are as ignorant as I was, your best bet is to get some experience with the simplest possible thing, and re-evaluate your requirements later on.

When I found Blosxom, I was delighted, because it seemed clear to me that it was Worse-is-Better through and through. And my experience has confirmed that. Blosxom is a triumph of Worse-is-Better. I think it could serve as a textbook example.

I'm not sure what the Right Thing is either, which is why it's taking me so long to write my code.

/code/ | permanent link
