Today's technical issues.

I have a bit more work to do on AHIRC (a project I'm doing with CivicActions and LINC), and for that I'll need to make Semantic Search fast enough to handle imports of more than 2000 nodes. AHIRC has around 6000 nodes, and longer searches can take 10-12 seconds. Why?

  • MySQL 4 had a bug* in the statistics phase of SQL optimization when a query had many inner joins on one large table (this actually caused queries to hang).
  • MySQL 5 fixed this bug, but the queries are still many inner joins on one large table, so they remain slow. This isn't specific to ARC's use of MySQL; it also shows up in many other RDF store implementations built on MySQL. AHIRC has 200,000 triples.
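
To illustrate why a longer search turns into many inner joins on one table, here is a minimal sketch in Python. It assumes a hypothetical single `triple` table with columns `s`, `p`, `o` (this is not ARC's actual schema), uses SQLite in place of MySQL so it runs anywhere, and `build_query` is a made-up helper. Each extra search condition adds one more self-join on the same table.

```python
import sqlite3


def build_query(patterns):
    """Build one SELECT that self-joins `triple` once per (predicate, object) pattern.

    All patterns must match the same subject. The table and column names are
    assumptions for illustration only.
    """
    if not patterns:
        raise ValueError("at least one pattern is required")
    joins, where, params = [], [], []
    for i, (pred, obj) in enumerate(patterns):
        alias = f"t{i}"
        if i > 0:
            # Every pattern after the first costs one more inner join on the same table.
            joins.append(f"INNER JOIN triple {alias} ON {alias}.s = t0.s")
        where.append(f"{alias}.p = ? AND {alias}.o = ?")
        params.extend([pred, obj])
    sql = (
        "SELECT DISTINCT t0.s FROM triple t0 "
        + " ".join(joins)
        + " WHERE "
        + " AND ".join(where)
    )
    return sql, params


# Tiny in-memory data set with made-up triples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("node/1", "rdf:type", "Article"),
        ("node/1", "dc:subject", "housing"),
        ("node/1", "dc:language", "en"),
        ("node/2", "rdf:type", "Article"),
        ("node/2", "dc:subject", "health"),
    ],
)

sql, params = build_query(
    [("rdf:type", "Article"), ("dc:subject", "housing"), ("dc:language", "en")]
)
rows = conn.execute(sql, params).fetchall()
print(sql.count("INNER JOIN"), rows)
```

Three conditions already produce two self-joins; with 200,000 rows in that one table, the optimizer has to plan and execute every one of them against the same large index.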

What the cause is not:

  • PHP itself. If the code were written in C, some parsing would be faster, but regular Drupal page loads already take 1-4 seconds. Even if a C rewrite shaved that down to nothing, the query would still take 10 seconds.

I use the triples both as a search index and as the source for generating the results, so all content is duplicated in this index. This may not be the best approach for speed, but it does convert Drupal content and taxonomy to RDF.
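
As a rough sketch of what that duplication looks like, the Python snippet below flattens one piece of content into triples. The node fields (`nid`, `title`, `body`, `terms`), the predicate names, and the `node_to_triples` helper are all hypothetical, chosen only for illustration; they are not the module's actual mapping.

```python
def node_to_triples(node):
    """Flatten a node (a plain dict here) into (subject, predicate, object) triples."""
    subject = f"node/{node['nid']}"
    triples = [
        (subject, "dc:title", node["title"]),
        # The full body text is copied into the index, which is the duplication.
        (subject, "content:body", node["body"]),
    ]
    # One triple per taxonomy term attached to the node.
    for term in node.get("terms", []):
        triples.append((subject, "dc:subject", term))
    return triples


example = {
    "nid": 42,
    "title": "Tenant rights",
    "body": "Full text of the page...",
    "terms": ["housing", "law"],
}
triples = node_to_triples(example)
for triple in triples:
    print(triple)
```

Every field and every term becomes its own row, which is how a few thousand nodes can grow into hundreds of thousands of triples.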

* I'll update this article with references when time allows.