Monday, January 04, 2010

Your own search engine and RSS feed

Written for Tav of the WFA, but the basics apply to all.

If you're using one of the main pieces of blogging software, they pretty much all come with an RSS feed and an internal search tool built in. But what if you aren't? Say you want more flexibility in your blog posts, don't like the licence terms, or have simply been doing this since before most of the tools became reliable. How do you join in the fun?



I'll start with the RSS feed as that's the simplest - it's an XML file. If you're already using your own bit of software, chances are you're already writing HTML, so this shouldn't be a huge step.

At its most basic, an RSS feed is a list of items, each comprising a title, a description and a link, with all the items enclosed in the standard RSS-specific channel tag, which acts much like HTML's body tag.

So if I were to create an RSS feed of the latest items for Tav, it would look like this:

<?xml version="1.0" ?>
<rss version="2.0">
<channel>

<title>The Wyre Forest Agenda</title>
<description>Weblog of Tavis Pitt on Wyre Forest</description>
<link>http://www.wfa.org.uk</link>

<item>
<title>Monday 4 January 2010: School's nostalgic exhibition</title>
<description>From Stourport News and Kidderminster Times, Friday 3 June 1977...</description>
<link>http://www.wfa.org.uk/?/cal/20100104</link>
</item>

</channel>
</rss>

Save it as an XML file, then add this to the head of your main page:
<link rel="alternate" type="application/rss+xml" 
title="WFA RSS Feed" href="http://www.wfa.org.uk/feed.xml" />

New items go above the old ones; once you hit 15, take the last one off the list. That's the downside: blog software manages this for you, whereas here you have to add each item manually unless you can auto-generate the feed from tags in your entries.
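If you'd rather not edit the file by hand, the newest-first, cap-at-15 routine could be scripted. What follows is only a rough sketch, assuming you keep your entries as a plain list of title, description and link; every function name here (escapeXml, addItem, renderItem, renderFeed) is made up for illustration and not part of any blogging tool.

```javascript
// Hypothetical helpers; none of these names come from existing software.
const MAX_ITEMS = 15; // the cap suggested above

// Escape the characters that are special inside XML text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Return a new list with the new item first, trimmed to the cap.
function addItem(items, newItem, maxItems = MAX_ITEMS) {
  return [newItem, ...items].slice(0, maxItems);
}

// Render one <item> with the three elements used in the feed above.
function renderItem(item) {
  return [
    "<item>",
    "<title>" + escapeXml(item.title) + "</title>",
    "<description>" + escapeXml(item.description) + "</description>",
    "<link>" + escapeXml(item.link) + "</link>",
    "</item>",
  ].join("\n");
}

// Render the whole feed document as a single string.
function renderFeed(channel, items) {
  return [
    '<?xml version="1.0" ?>',
    '<rss version="2.0">',
    "<channel>",
    "<title>" + escapeXml(channel.title) + "</title>",
    "<description>" + escapeXml(channel.description) + "</description>",
    "<link>" + escapeXml(channel.link) + "</link>",
    items.map(renderItem).join("\n"),
    "</channel>",
    "</rss>",
  ].join("\n");
}
```

You would call addItem each time you post, then write the string from renderFeed out as the XML file. The escaping matters because a stray ampersand in a headline is enough to make the whole feed invalid.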

Okay, onto the search engine, and this is even easier. Rather than write one yourself, use Google. You need a Google account, but that shouldn't be a hardship as they don't ask for much information. Just visit here and follow the steps.


So is that it? Well, not quite for Tav. See, he writes his entries in their own separate htm files, so clicking on a link from either the feed or the search results will leave a rather blank page with no real way to navigate. In this instance he's spotted the flaw and has a small snippet of VBScript that checks to see if the page is in a frame and, if not, puts one around it. Sadly, VBScript is one of those delights from Microsoft that tend only to work in their own creations.

So as Tav loves his JavaScript, here's an alternative specific to his needs to try.

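<script type="text/javascript">
  // If this page is the top-level window (i.e. not inside the site's frame),
  // reload it through the root page, which wraps it in the frame.
  if (window.top === window.self) {
    // Root-relative URL, e.g. /?/cal/20100104.htm
    window.location.replace("/?" + window.location.pathname);
  }
</script>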

Not tested, but should work for every browser.

That should be that.

2 comments:

Tavis Pitt said...

Thanks for this FlipC.

I guess I should explain some history of the 'agenda. It started in 2004 and at that time I had at least six years' experience of web development and even longer in database development. For work I would follow the same routine: atomise the data into the database (build the database); storyboard the website; build server-side script (VBScript, ASP); add the cosmetics (graphics), and then add content (usually the responsibility of the user, but not always). The server-side scripting was really big back then because of the total lack of functionality from Crapscape Crapivator, which pulled down client-side scripting. When it came to the 'agenda I decided to turn this procedure on its head and I created the content first, in the understanding that I would build up the functionality as I went along.

To be honest this really worked, and in 2004/2005 there were very few weblogs around for me to pick up an idea of functionality. However, about two years ago Don requested a search facility and I also wanted to try out RSS. I did look at Google for searching, but it didn't work how I wanted it. For instance, go to Google SiteSearch and enter 'www.wfa.org.uk' in the URL textbox and 'Stourport' in the Query textbox. The first result, 'Stourport is a canal town', directs you to the root index (no good); the second result directs you to a page successfully, but the title of the result is 'http://www.wfa.org.uk/cal/20070130.htm Tuesday 30 January 2007 ...' whereas I would have preferred something like: 'Tuesday 30 January 2007: Meet Tesco if your name's down'. I have looked into it (back then and just now) and I cannot customise it to how I want. So back then (early 2006) I started to put all my blogs into an Access database (and then SQL Server) manually. It took ages: about two weeks of blogs took about 3 months, and that was also eating up my 'agenda blogging time. I even tried to write an application to automate the process, but it never got off the ground.

I thought that since I'm writing a database for searching, I could use it to write the 'agenda (bye bye iframes); the comment functionality; and also easily (at the touch of a button) create RSS. That was (and still is) the plan.

Also, I did originally write my out-of-iframe script in VBScript, but soon found Firefox users couldn't read my blog, so I changed it to ECMAScript (JavaScript). See FrameControl.js. Since I was writing a htm page for each day (usually one blog), I didn't have the time or resources to retrospectively update the pages. So very old pages have no frame control, old ones have VBScript and new ones have JavaScript. Recently I changed my comments provider and I had to remove the script because it was interfering with the comment functionality.

Anyway, I have already started to build some of my RSS feed (even before I started to read this blog!). See: RSS.

FlipC said...

No disparagement meant of your skills and I hope none taken.

It's a shame that Firefox et al. won't process VBScript at all; as VBScript can call JavaScript functions, it would otherwise have meant simply altering the old script.

As for Google, I'm sure you know that that's down to the lack of a title element in the cal pages, so it just reads off the first bits of text it can find. Depending on how you have them stored, processing the 'title' from the colon onwards and turning it into a title element shouldn't be that difficult.
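As a very rough sketch of that idea, assuming each page's heading text takes the form 'date: headline' as in the feed item above: splitHeading, escapeHtml and titleElement are made-up names, and how you'd pull the heading text out of each stored page is left open as it depends on how the pages are kept.

```javascript
// Hypothetical helpers; the names are illustrative only.

// Escape the characters that are special inside HTML text.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Split "date: headline" at the first colon.
// If there is no colon, the whole text is treated as the headline.
function splitHeading(heading) {
  const idx = heading.indexOf(":");
  if (idx === -1) {
    return { date: "", headline: heading.trim() };
  }
  return {
    date: heading.slice(0, idx).trim(),
    headline: heading.slice(idx + 1).trim(),
  };
}

// Build a title element holding the date and headline together.
function titleElement(heading) {
  const parts = splitHeading(heading);
  const text = parts.date ? parts.date + ": " + parts.headline : parts.headline;
  return "<title>" + escapeHtml(text) + "</title>";
}
```

Keeping the split separate from the rendering means you could just as easily use only the headline part for the title if the date turns out to be clutter in search results.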

Yes, Google does like to return the main page, and it does it for mine too; in my case that's because it's reading the entries listed down the side for each month.

It would be nice to get rid of the iframes; how about using PHP to generate the equivalent structure? It should tie in nicely with the SQL database.