Archive for the ‘Uncategorized’ Category
Building a blog atop a version control platform
Most blogging platforms are database-backed: they store all your posts in a relational database, and query that database whenever they need to present a post on a web page. You create posts by making INSERT commands, and edit with UPDATEs.
This model has some great aspects: it’s pretty fast (though not the fastest), it allows many users to run blogs on the one server (by adding a foreign key named blog_id to the posts table, for example), and it’s easy to query and search, just by using SQL (the web app you’re using to run your blog generally takes care of that for you). You can blog from anywhere, as long as you have a web browser.
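To make that concrete, here is a minimal sketch of the database-backed model using Python's built-in sqlite3 module. The table and column names (other than blog_id and posts, which come from the paragraph above) and the sample data are my own assumptions, not any particular platform's schema.

```python
import sqlite3

# Hypothetical schema: everything except "posts" and "blog_id" is made up
# for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blogs (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        blog_id INTEGER NOT NULL REFERENCES blogs(id),
        title   TEXT NOT NULL,
        body    TEXT NOT NULL
    );
""")

# Two blogs sharing one server and one posts table.
conn.executemany(
    "INSERT INTO blogs (id, title) VALUES (?, ?)",
    [(1, "First blog"), (2, "Second blog")],
)

# Creating a post is an INSERT...
conn.execute(
    "INSERT INTO posts (blog_id, title, body) VALUES (?, ?, ?)",
    (1, "Hello", "First draft"),
)
# ...and editing it is an UPDATE, which overwrites the old text in place.
conn.execute(
    "UPDATE posts SET body = ? WHERE blog_id = ? AND title = ?",
    ("Second draft", 1, "Hello"),
)


def posts_for_blog(blog_id):
    """Return (title, body) pairs for one blog, oldest first."""
    rows = conn.execute(
        "SELECT title, body FROM posts WHERE blog_id = ? ORDER BY id",
        (blog_id,),
    )
    return rows.fetchall()
```

Note that after the UPDATE, the first draft is simply gone: the table only ever holds the latest version of each post.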
But there are some weaknesses to this setup as well. For example, it’s hard to track changes (resulting in paragraphs at the end of posts starting with UPDATE: or EDIT:). And unless you work directly on the server (which is risky), you’ll have to use that web app to write your posts. Not a huge burden, unless you happen to be on dialup, or spotty 3G. It’s also hard to back up.
So how do we get around these problems, without getting rid of too many of the advantages that a web app-based blog gives us?
Of course, I wouldn’t be writing this if I didn’t think I had an answer.
A Version Control System (VCS) is a tool often used by programmers to keep track of the changes they make to their code. If I introduce a bug to my code (it happens), then I can use my VCS to revert to the version of the code that didn’t contain the bug, and give it another shot. Or if somebody else introduces a bug, I can use the VCS we both use to find out who they are, and … umm… provide them with constructive feedback.
Although VCSs are used mainly for software development, that’s not their only use.
If we make a blog that uses a VCS for storage instead of a relational DB, and treat each post as a document, we immediately get some awesome benefits:
- You can track your edits. Writing an UPDATE: paragraph is no longer the only way to indicate that you’ve made a change since your original post. You might still choose to do it if your readers are in the habit of reading your blog that way, but it’s not the only way. For example, you could point to a list of edits on a site like Github.
- You can serve up posts blazingly fast. Because most VCSs track files living in folders on your hard drive, the most convenient way to use a VCS to track your posts is to store your posts as files. It just so happens that the fastest way to serve information over the internet is from files stored in exactly that form. And what that means for you is that you can use a less powerful, less expensive server to host your blog.
- Uploading images and videos is taken care of automatically. When you save your files to a VCS (a step known in the software business as “committing your changes”), everything you’ve saved in the VCS gets uploaded automatically. All you have to do is save it into your version-controlled folder, then link to it from your post. Simple.
- You don’t need specialised publishing software. If you’re hardcore, you can create your posts with just a text editor and a VCS tool. If you’re less hardcore, then I’m planning to make the software that will do this for you. It’s a “someday” goal at the moment though, so no promises.
Now, this hypothetical software isn’t going to work miracles. There are some things you might miss. For example, the “post from anywhere” concept becomes a bit more work for your friendly software developer, so that might not be there from the word “go”. I’m envisioning a desktop app as the first editor for this guy, with your VCS of choice running in the background. I really haven’t thought yet about the whole “theme” thing either. It’s a work in progress, people.
*lightbulb*
Although I guess you could do something like a build process, the step in programming where code is compiled and tested, ready for deployment. I guess that’s the step where you apply your layouts, assemble your RSS document, and whatever else you need to do to make your blog functional and amazing.
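Here's a rough sketch of what that build step might look like, assuming each post is a plain .txt file whose first line is the title. The layout string, the file extension and the function names are all hypothetical; a real version would also assemble the RSS document and apply a proper theme.

```python
from html import escape
from pathlib import Path

# Hypothetical layout; a real theme would be much richer than this.
LAYOUT = (
    "<html><head><title>{title}</title></head>"
    "<body><h1>{title}</h1>\n{body}\n</body></html>"
)


def render_post(text):
    """Treat the first line as the title and blank-line-separated chunks as paragraphs."""
    title, _, rest = text.partition("\n")
    paragraphs = [p.strip() for p in rest.split("\n\n") if p.strip()]
    body = "\n".join("<p>{}</p>".format(escape(p)) for p in paragraphs)
    return LAYOUT.format(title=escape(title.strip()), body=body)


def build(posts_dir, output_dir):
    """Render every .txt post in posts_dir to an .html file in output_dir."""
    posts_dir, output_dir = Path(posts_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    built = []
    for source in sorted(posts_dir.glob("*.txt")):
        target = output_dir / (source.stem + ".html")
        html = render_post(source.read_text(encoding="utf-8"))
        target.write_text(html, encoding="utf-8")
        built.append(target.name)
    return built
```

Running build("posts", "site") after each commit would leave a folder of static HTML files that any web server can hand out as-is.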
Mobile accessibility might also be a bit tricky – as far as I know, nobody’s found a good enough reason to write a mobile VCS client yet (please let me know if I’m wrong!), and that’s kind of a necessary component. Again, there’d be ways of getting around that limitation, but they’re not simple, and they’re not necessarily easy.
Anyway, that’s the idea. Feel free to take it and run with it – or point out who has done it already.
Art, sharing and copyright
There are some things that Western society usually encourages. Not that other societies don’t, but the West is the one I know. Art is held up as a sign of civilised life. Sharing is considered one of the most basic social skills. The rule of law is something we have set up enormous governing entities to try to preserve. Yet we are at a point in our development as a culture where a sharp conflict exists between these three principles: create art and cultural works, share what you have and obey the law.
The conflict arises out of a set of laws that were intended to give artists protection against other people passing off the artist’s work as their own. I am referring, of course, to copyright law. Copyright law allows the creator of a work to control who copies it, and under what conditions. The law says that we are forbidden from copying a work unless the creator gives us permission.
This rule gave artists an easy way to make money: I, the artist, make many copies of my work, and sell them to you. That works as long as I have control over the creation and distribution of copies. Which, in turn, works as long as copies are hard to make.
The problem that has arisen lately is that many works of art and cultural works (e.g. music, movies and TV shows) have been getting easier and easier to copy, now that we use computers so much. Copyright law has become unenforceable, because copying is so deeply embedded in the way we get data from one place to another. Copying had to become cheap, because we rely on it so much that we had to make it cheap. And now that it’s cheap, anybody can do it.
That means that the creator of any work that can be converted to digital form is no longer in control of its distribution. Well, legally speaking they are, but in practical terms that control is so easily broken that it’s easier to say it doesn’t exist.
Not only that, but as the number of copies of a work explodes, the value of each copy collapses. When record labels only distributed copies of songs on tapes, making illegal copies was a lot harder. The only people who did a lot of it had special equipment that cost a lot of money. Those people generally considered that cost an investment; they were selling their illegal copies for a profit. We bought them because we couldn’t afford to make them ourselves. Now that we can, the “bootleg” recording trade is dead and gone. We swap songs with each other for free, and because everyone can do it, there’s no profit to be made from illegal copies.
So is there any profit to be made from authentic copies?
As with so many seemingly straightforward questions, the answer is "well, it depends." If authentic copies of your work are easily distinguished from unauthorised copies, then yes. You still have something scarce enough that you might be able to make money by selling copies. Paintings, sculptures or films on celluloid might qualify here. But if your work is easily copied with little to no loss (say, a movie on DVD), then you’re going to have to find a different way to make money from it. Public performances (like at the cinema) and merchandise are popular options, and convenience is a good one too (think of the iTunes store). But as long as you’re selling something that can be reduced to bits, you’d better be selling something other than just the bits themselves.
Some copyright owners choose to use the legal system as a way to make money from their works. Mostly these copyright owners are not the creators of works, but instead have made an agreement with the creator that allows them to make copies of the work, sell them, and sue others who do the same without permission. The most common practice is to use a court order to get the contact details of someone who you suspect has been making illegal copies of your work, then get in contact with that person and threaten to sue them, unless they pay you an amount of money that is just a bit less than the cost of going to court. Because going to court would cost you more money, and there’s a chance you might lose.
What this means is that you’re using the court to get money out of someone, and not to start a court case. And the legality of such a move is … blurry. (I am not a lawyer. This is not legal advice.) It’s pretty hard to prove that someone is threatening to litigate without intending to go to court. But the strategy has been compared to extortion; it certainly has a “pay up or else” ring to it.
Anyway, the practical upshot of everything I’ve been saying is that we have some strong community values that are in conflict, and the balance between them won’t be restored easily. In the meantime, if you are creating works that can be made digital (and the number of things that can’t is going down steadily), you might have to be a bit more clever in order to make money. Not because you don’t have the right to make money from your work. You do. It’s just not going to be as easy as it was. You’ll need a model that’s more than just “copies for cash.” Because that’s going away. Piracy, sadly, is not.
A word on website hosting
I’ve had a few people and groups ask me about setting up their own website. After having this conversation a few times, I thought perhaps I should distill the common elements here. So here goes.
When choosing website hosting, most of us have 3 basic options:
A static website (i.e. no content management tools) is a simple way to host unchanging information. This solution costs the least money. The trade-off is that the content takes some effort and savvy (e.g. knowledge of HTML, CSS and FTP) to set up and to change. Problems such as “dead links” (links that go nowhere) can be a real pain point for readers and for the site maintainer, especially if the structure of the site changes during its life. You won’t need to worry about the inner workings of this website, because there really won’t be any.
A more complex and powerful option is a hosted content management system (CMS), e.g. Tumblr, Blogger or WordPress. The most important difference from the first option is that this solution allows you to change things easily. These solutions usually have a “freemium” model, in which the basic offering has no cost, but extra options (such as using your own template or your own domain name) attract a modest fee. Such solutions are aimed to meet the needs of those whose website is a tool to raise awareness of their offering, rather than being the offering itself. Hosted systems usually include basic support to help you along the way, meaning that while you need a little bit of tech savvy, you don’t need to be a guru to manage it. The basics are easily learnt by anyone who can use the web, and the inner workings are taken care of by the people you’re paying.
A self-hosted system (e.g. a virtual private server (VPS) or dedicated server) is best for those with complex needs, or those who want to host a web app that they have developed themselves. This solution will give you all the complexity and all the power. But be aware that when things break, there’s nobody to help you. These solutions also have a higher financial cost than the other solutions mentioned here, as well as needing more time to monitor, maintain and support. If you’re going with this, you definitely want to be a geek, or at the very least have a geek on hand who owes you several large favours.
I hope that this is useful to you. If questions arise, please comment.
URL Shortening Sucks When Bandwidth Is Scarce
There’s plenty of people on various blogs telling us that URL shorteners like bit.ly or tinyurl break the web, because they create an arbitrary dependency on themselves. For example, if you hit a URL on the bit.ly domain and bit.ly is down, you’re out of luck. But not only do URL shorteners break the web when they go down, they make it slower even when they are working as designed.
Here’s why: when you click on a link from bit.ly (or ow.ly, or ur1.ca, or arseh.at) your browser has to go and make a request to the server that runs the URL shortener before it can do anything else – before it can even think about loading what you actually want to see. So every short link you click is slower than the unshortened link would be.
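Here's a toy model of that extra hop. Nothing in it touches the real network: the URLs and the 300 ms round-trip figure are made up to stand in for a slow mobile connection, and the function simply counts how many requests it takes to reach the page you actually wanted.

```python
# Hypothetical data: each URL maps to (round-trip time in ms, redirect target or None).
NETWORK = {
    "http://short.example/abc": (300, "http://blog.example/a-long-descriptive-post-title"),
    "http://blog.example/a-long-descriptive-post-title": (300, None),
}


def fetch(url, network=NETWORK, max_hops=5):
    """Follow redirects from url; return (final_url, request_count, total_ms)."""
    requests, total_ms = 0, 0
    for _ in range(max_hops):
        latency_ms, redirect_to = network[url]
        requests += 1
        total_ms += latency_ms
        if redirect_to is None:
            # No redirect: this is the page the reader wanted all along.
            return url, requests, total_ms
        url = redirect_to
    raise RuntimeError("too many redirects")
```

In this model the short link costs two requests and 600 ms where the long link costs one request and 300 ms. A real browser would usually also need a DNS lookup for the shortener's domain, which this sketch leaves out.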
And that’s just for starters. URL shorteners also mess up most of the fancy things that browsers – especially mobile browsers – do to make pages load faster. I’m talking about techniques like DNS prefetching and HTML prefetching and… well, mostly different kinds of prefetching. But they rely on knowing which URL you’re likely to look at next, and if someone has put a mask over the URL you’re heading to (say, by using a URL shortening service), those optimisations can’t do you any good.
“Sure, that’s fine,” you might well be thinking, “but why mention it now?” Well, sir or madam or neuter, I’m glad you asked. The reason I’ve embarked on this particular rant at this particular point in time is that mobile web usage is only going up. That goes double if you include the use of apps that link, like Twitter, Facebook, Google+ and all the rest. The mobile web is becoming so prevalent that it’s almost difficult to get a new mobile phone that doesn’t have a built-in browser. So ubiquitous, in fact, that I am starting to use it rather a lot. And, as a consequence, get mightily annoyed at how bloody slow it is.
We don’t need more things that slow the web down. Please keep those links long. Because let’s face it: it’s not like we’re doing anyone but Twitter any favours.
Where baby HTTP 404s come from
DISCLAIMER: I maintain the code that I refer to in this post, but I didn’t write the code.
I’ve been hunting HTTP 404 errors (that’s the “File Not Found” variety), and today I came across a bit of a puzzler. The non-existent image files were being referred to by a stylesheet as part of a CSS background: instruction, but the rule that contained the instruction was never invoked. That is, the rule was for a class name that was never used.
How the hell was I getting these image requests if the code that started the requests was never being used?
Then I noticed that the errors were coming from just a few different user agents. And they all had something about them…
- LG-GW305/V100 Obigo/WAP2.0 Profile/MIDP-2.1 Configuration/CLDC-1.1 UNTRUSTED/1.0 lg-gw305
- Nokia2730c-1/2.0 (10.47) Profile/MIDP-2.1 Configuration/CLDC-1.1 nokia2730c-1/UC Browser7.7.1.88/70/352
- Nokia5130c-2/2.0 (07.95) Profile/MIDP-2.1 Configuration/CLDC-1.1 nokia5130c-2/UC Browser7.6.1.82/70/352 UNTRUSTED/1.0
- Nokia5330-1d/5.0 (06.85) Profile/MIDP-2.1 Configuration/CLDC-1.1 nokia5330-1d
- Nokia5530/UC Browser7.6.1.82/50/352
- Nokia5800 XpressMusic/UC Browser7.7.1.88/50/352
- NokiaC3-00/5.0 (04.45) Profile/MIDP-2.1 Configuration/CLDC-1.1 nokiac3-00/UC Browser7.7.1.88/69/352 UNTRUSTED/1.0
- NokiaC3-00/5.0 (04.45) Profile/MIDP-2.1 Configuration/CLDC-1.1 Opera/9.60 (J2ME/MIDP;Opera Mini/4.2.13337Mod.by.CHIZZY/503; U; en)Presto/2.2.0 UNTRUSTED/1.0
- NokiaC3-00/5.0 (04.60) Profile/MIDP-2.1 Configuration/CLDC-1.1 nokiac3-00 UNTRUSTED/1.0
- NokiaC3-00/5.0 (07.20) Profile/MIDP-2.1 Configuration/CLDC-1.1 nokiac3-00 UNTRUSTED/1.0
See it? That’s right, they all come from mobile browsers. This puzzled me a bit, until I realised that it might be something you’d do to speed up the user experience if you knew they’d be on a slow connection (and you didn’t care if they downloaded heaps of data they didn’t need).
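If you want to go hunting the same way, here's a sketch of how the 404s might be grouped by user agent. It assumes the server writes the common Apache/nginx "combined" log format (I haven't said which server or format I was actually using), and the list of mobile hints is a guess based on the user agents listed above.

```python
import re
from collections import Counter

# Assumes the "combined" access log format: request line, status, size,
# referer and user agent, with the last two in double quotes.
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumption: substrings that mark a user agent as a mobile browser.
MOBILE_HINTS = ("MIDP", "UC Browser", "Opera Mini")


def not_found_by_agent(lines):
    """Count HTTP 404 responses per user agent string."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("agent")] += 1
    return counts


def looks_mobile(agent):
    """Return True if the user agent contains any of the mobile hints."""
    return any(hint in agent for hint in MOBILE_HINTS)
```

Feeding it the lines of an access log and sorting the result with counts.most_common() puts the noisiest user agents at the top.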
So here’s the piece of knowledge I want to add to the internet (he said, arrogantly assuming it wasn’t there already): Mobile browsers prefetch images from stylesheets as an optimisation in the face of low speed connections.
You’re welcome.
Passion vs detachment
I’m confused about something, namely: how you can be both passionate and detached.
Example: I was working on a problem the other day, something about grade point averages for a graduate employment campaign. This problem and I have a history. I was determined to solve it, if not once and for all, then as close as practical. Then I ran into a roadblock – something or other to do with writing results back to a database. The details aren’t important. I asked my 1up for help, and he obliged by telling me that I should change my approach and avoid the roadblock completely.
I hated that idea. Admittedly I was already a bit cranky because I was hungry (cf. my blood sugar issues), but I was also really attached to my way of solving the problem. The idea that we should take a few extra steps for caution’s sake (like not writing to the DB before we check the results) was repugnant. Surely we should just get it done, right?
The problem here is that I was attached to my solution, not to solving the problem. So how do we avoid that? How can you be both passionate about solving a problem, and detached from your solution?
Protip: Keyboard-activated, cross-browser bookmarks
I’m a web developer. That means that I use at least 3 different browsers every day. And maintaining a set of useful bookmarks across all those browsers is a pain in the arse. Also, I tend to prefer typing over mousing. It’s generally quicker. Wouldn’t it be nice to have a set of bookmarks that I can use in any browser, and access from the keyboard?
Here’s what I did to get this working. I installed Texter (made by Adam Pash of Lifehacker). Texter is a little background app that does text substitution. You type foo and hit Tab, and it gets replaced with bar. Or whatever you like.
After you’ve installed Texter, you set up your bookmarks. You make a hotstring (Texter’s word for a thing to replace) for each of the URLs you want to bookmark. You might, for example, have one for your production environment, one for dev, and one for staging. And while you’re at it, one for Gmail, one for your blog and one for Twitter. Then you just type in your shortcut and hit Tab, and the URL is there.
Nice work. Well done. I’m proud of you.
There’s a lot more that you can do with Texter. This is just one example. Explore and have fun.
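For the curious, the hotstring idea boils down to a lookup table. This is only a sketch of the concept in Python, not how Texter itself is implemented, and the shortcuts and URLs are placeholders.

```python
# Hypothetical hotstrings and URLs, purely for illustration.
HOTSTRINGS = {
    "prod": "https://www.example.com/",
    "dev": "https://dev.example.com/",
    "stage": "https://staging.example.com/",
}


def expand(typed, hotstrings=HOTSTRINGS):
    """Replace the last word of typed with its URL if it is a known hotstring."""
    head, sep, last = typed.rpartition(" ")
    # Unknown words are left exactly as typed.
    return head + sep + hotstrings.get(last, last)
```

Here expand("dev") stands in for typing dev and hitting Tab.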
Personal Context – a ramble
Personal Context is a concept I (or anyone) might use to help a machine figure out what is interesting to me (them). It uses data sources like:
- what I read, watch and listen to
- where I go, online and off
- who I spend time with or talk to
- things sent to me, and who sent them
- what I write and talk about
- entities whose output I read (people, companies, machines, etc)
- my calendar
It figures out what I’m likely to be interested in, based on metadata like:
- recency
- number of links from important (to me) sources
- what I do at certain times of day or week
- terms I search for often
It displays things in real time, with the idea that if something is still important, it’s still being linked to. It relies on Jay Rosen’s “back story button” to fill us in. Hence old stuff is culled ruthlessly.
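A minimal sketch of that ranking-and-culling idea, using just two of the signals above: recency, and links from sources that matter to me. The source names, the weights, the 24-hour half-life and the threshold are all invented for illustration; working out the real numbers would be most of the job.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    age_hours: float   # time since the item was last linked to
    linked_by: tuple   # names of the sources that linked to it


# Hypothetical weights: how much each source matters to me.
SOURCE_WEIGHT = {"close friend": 3.0, "favourite blog": 2.0, "newswire": 0.5}
# Assumption: interest halves for every day that passes without a new link.
HALF_LIFE_HOURS = 24.0


def score(item):
    """Weight of the linking sources, decayed by how stale the item is."""
    link_weight = sum(SOURCE_WEIGHT.get(source, 0.0) for source in item.linked_by)
    recency = 0.5 ** (item.age_hours / HALF_LIFE_HOURS)
    return link_weight * recency


def current_feed(items, threshold=1.0):
    """Rank items by score and cull anything that has dropped below threshold."""
    kept = [item for item in items if score(item) >= threshold]
    return sorted(kept, key=score, reverse=True)
```

Because the score decays with age, an item only stays in the feed while people I care about keep linking to it.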
Obviously this requires access to a hell of a lot of personal, maybe sensitive data. That would be a problem for many people. It might be for me – I don’t know yet. But it would be amazing, assuming you could trust it.
Hang on – isn’t this the kind of thing Google has been working on for a decade and more anyway? On the other hand, do we trust Google enough to give them all the info I mentioned before? At this moment the answer seems to be no. So who would we trust? I can’t think of anybody, and neither, I suspect, could most of us.
So how does it change if we don’t have to trust anybody with our data, except perhaps in aggregate? Your own machine does all the crunching, you can access it remotely if you want, and you only download rules for processing, which are the same for everyone, or which you can add noise to (a false trail of data, if you will). Then would you trust the software?
I’ve only just realised that I’d assumed this thing would be open source, and hence open to scrutiny by anybody with enough patience. You can usually assume that by the time an open source product is remotely popular, somebody reasonably smart, cynical and suspicious has given the code a good look over. And if they’re not making noise, you’re probably safe.
Even given all that, it’s still a game of chance to some degree. No system is totally secure, but we get as close as we can.
Hopefully this ramble has proved interesting. If not, better luck on your next reading list item.
Testing out Dave Winer’s podcast device idea
Turns out it’s not so easy to make a podcast on an Android device, at least not from the word “go.” I used the voice recorder on my phone to make a recording as proof of concept, then tried to upload it using WordPress.com. Couldn’t do it. It just wouldn’t work for some reason.
After that I tried installing the WordPress app, in which I’m typing this post. The app, it appears, doesn’t support audio uploads. Pics yes, audio no.
Obviously I’ll have to do a bit more work on this, and maybe if I still feel like a Winer fanboy in a couple of days I’ll do a “for poets” howto on the subject.
Edit: I totally didn’t read that last sentence before I posted. The Android keyboard is OK, but its predictive fu is not perfect.