Building a (fast) Wikipedia offline reader

http://www.softlab.ntua.gr/~ttsiod/buildWikipediaOffline.html

Wikipedia needs no introduction: it is one of the best encyclopedias - if not the best - and it's freely available to everyone.

"Everyone" is a relative term, however: it implies the availability of an Internet connection. This is not always the case; for example, many people would love to have Wikipedia on their laptop, since this would allow them to instantly look things up regardless of their location (business trips, hotels, firewalled meeting rooms, etc). Others simply don't have an Internet connection - or they don't want to dial up every time they need to look something up.

Up to now, installing a local copy of Wikipedia has not been for the faint of heart: it requires a LAMP or WAMP installation (Linux/Windows, Apache, MySQL, PHP), and it also requires a tedious - and VERY LENGTHY - procedure that transforms the "pages/articles" Wikipedia dump file into the tables of a MySQL database. When I say *lengthy*, I mean it: last time I did this, it took my Pentium4 3GHz machine more than a day to import Wikipedia's XML dump into MySQL. 36 hours, to be precise.

The result of the import process was also not exactly what I wanted: I could search for an article if I knew its exact name, but I couldn't search using only part of the name; if you don't use the exact title, you get nothing. To allow these "free-style" searches to work, one must create the search tables - which, I'm told, takes days to build. DAYS!

Wouldn't it be perfect if we could use the Wikipedia "dump" data JUST as it arrives after the download? Without creating a much larger (space-wise) MySQL database? And also be able to search for parts of titles and get back lists of titles with "similarity percentages"?

Follow me...

Identifying the tools

First, let's try to keep this as simple as possible. We'll try to avoid using large, complex tools.
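
To give a feel for how far simple tools can go, here is a minimal sketch (not necessarily the approach the rest of this article takes) that reads titles straight out of the compressed dump and ranks them against a partial query with a "similarity percentage". It assumes Python 3 and nothing beyond the standard library, and it assumes each `<title>` element sits on its own line, as it usually does in the "pages/articles" dumps. The names `iter_titles` and `rank_titles` are made up for this illustration.

```python
import bz2
import re
from difflib import SequenceMatcher

# Assumption: one <title>...</title> element per line of the dump.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def iter_titles(dump_path):
    """Yield article titles directly from the .bz2 dump, with no import step."""
    with bz2.open(dump_path, "rt", encoding="utf-8") as dump:
        for line in dump:
            match = TITLE_RE.search(line)
            if match:
                yield match.group(1)


def rank_titles(query, titles, limit=5):
    """Return up to `limit` (title, percent) pairs, best match first."""
    query = query.lower()
    scored = []
    for title in titles:
        # ratio() is a float in [0, 1]; turn it into a whole percentage.
        ratio = SequenceMatcher(None, query, title.lower()).ratio()
        scored.append((title, round(ratio * 100)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

Calling `rank_titles("einstein", iter_titles(path))`, where `path` points at a downloaded dump, would return the closest titles together with their percentages. This is a linear scan that decompresses the whole file on every query, so it only demonstrates the idea; answering quickly would need some kind of index on top of it.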
