Archive for the ‘Computers’ Category

Dashes, underscores, and CamelCase in your URLs

I learned this the hard way at work about 8 months back. Take a guess which of these Google doesn’t like and can’t understand? That’s right, don’t use underscores ‘_’ or camel case ‘camelCase’ in your URLs! URLs aren’t the most important part of your site, but they definitely do have an impact. To Google, a dash represents a space, and that’s what you should live by.

Need proof? Notice anything different about the two result sets?

http://www.google.com/search?q=main+page
http://www.google.com/search?q=main_page

That’s right. They are two different entities. The underscore doesn’t create a space, so “main_page” is treated as one term instead of being split into the words “main” and “page”. This is a problem inherent to MediaWiki (my favorite wiki), which puts underscores in its page URLs. Although wikis weren’t made with search rankings in mind, they have surely evolved into something different. Actually, as I am writing this I just got the thought to make an Apache rewrite proxy to change the underscores to dashes. If I get it to work I’ll post my findings.

Learn from my mistakes. If you have the chance to do a rewrite or, better yet, start from scratch, use dashes.
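If you are generating URLs from page titles yourself, the rule above can be sketched in a few lines of Java. This is only a rough illustration, not something I run in production: the class name UrlSlugs and the method toDashed are made up for this example, and it only handles underscores, whitespace, and simple camelCase boundaries.

```java
import java.util.Locale;

final class UrlSlugs {
    /** Turns underscores, whitespace, and camelCase boundaries into single dashes. */
    static String toDashed(String segment) {
        String dashed = segment
            .replaceAll("([a-z0-9])([A-Z])", "$1-$2") // camelCase -> camel-Case
            .replaceAll("[_\\s]+", "-");              // underscores and spaces -> dash
        return dashed
            .replaceAll("-{2,}", "-")                 // collapse repeated dashes
            .replaceAll("^-|-$", "")                  // trim a leading or trailing dash
            .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toDashed("Main_Page")); // main-page
        System.out.println(toDashed("camelCase")); // camel-case
    }
}
```

Lowercasing is my own choice here; the point is simply that every word boundary ends up as a dash.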

H1 tags are very important but use them wisely

When browsing the Gallery 2 installation on the A1 Imports Gallery I noticed that the default implementation was to use H2 for the headings! This is horrible, pointless, and not very well thought out. I realize that SEO is not the job of the Gallery 2 engineers, but they went far enough to make it an H2, so why not just make it an H1? Anyway, the problem has been solved and we’ll see what type of results we get. Be sure to use H1 tags on your site, but use them wisely. A good rule of thumb is to have only one per page, and to make it very specific to the content. Over-bloating your page with these tags will definitely send the wrong message to Googlebot.

This site is complete! Plus 1 for the portfolio.

I have had so many sites over the years and I’m really sick of it! I finally have one site that I am happy with, that serves its purpose, and that is easily updatable. Although I am mostly a Java engineer, PHP is where I came from. There’s a right tool for every job and this site is no exception. WordPress is a great little app and suits its job well. It even loosely follows a pseudo type of tiles MVC system.


Just in case I get bored of the look I can always modify the view via CSS. Although this site doesn’t validate due to WordPress, it’s strict XHTML. Tables are great for tabular data and I still believe in them, but this site is a table-less design.

Redirecting all subdomains to www

In order to get the cleanest and best ranking possible, always redirect your subdomains (the ones not in use) to www! I can’t stress this enough. When Google comes by and sees that http://site.com and http://www.site.com are the same, it thinks that this is duplicate content, which it is! I just implemented this today for my friend’s site, A1 Imports Autoworks. Go ahead, try it. Here is the rewrite for Apache:

RewriteCond %{HTTP_HOST} ^a1importsautoworks\.com(.*) [NC]
RewriteRule ^(.*) http://www.a1importsautoworks.com/$1 [L,R=301]
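If you are not behind Apache, the same decision can be made in application code. Here is a minimal Java sketch of it, with a few assumptions called out: the class CanonicalHost and the method redirectTarget are invented for this example, and unlike the rule above (which only catches the bare domain) it sends every host other than www to www.

```java
import java.util.Locale;

final class CanonicalHost {
    static final String CANONICAL = "www.a1importsautoworks.com";

    /** Returns the URL to 301 to, or null when the request is already on the canonical host. */
    static String redirectTarget(String hostHeader, String pathAndQuery) {
        String host = hostHeader.toLowerCase(Locale.ROOT);
        int colon = host.indexOf(':');
        if (colon >= 0) {
            host = host.substring(0, colon); // ignore an explicit port
        }
        if (host.equals(CANONICAL)) {
            return null; // nothing to do
        }
        return "http://" + CANONICAL + pathAndQuery;
    }

    public static void main(String[] args) {
        System.out.println(redirectTarget("a1importsautoworks.com", "/gallery/"));
        System.out.println(redirectTarget("www.a1importsautoworks.com", "/gallery/"));
    }
}
```

Whatever calls this would still have to send the actual 301 response; the sketch only decides where to go.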

JavaScript fading effects with MooTools

As well as using MooTools for validation, I am also using it for effects. It’s nice to see the error boxes fade in and out. I think it’s just one of those touches that people appreciate (whether they really call it out or not). Anyway, MooTools makes it really easy, as you can tell by this code.

function showErrorMessage() {
    // Fx.Style animates a single CSS property on the element with this id
    var exampleFx = new Fx.Style('error-message', 'opacity', {
        duration: 500,
        transition: Fx.Transitions.quartInOut
    });

    exampleFx.start(0, 1); /* fade it in */
}

MooTools JavaScript framework

Let me start by saying, I don’t like JavaScript but I love MooTools. It’s not that I don’t think JavaScript is great and highly powerful, I just hate dealing with it, especially when my job is the backend. I would do anything for a stacktrace like there is in Java!

Anywho, MooTools is a JavaScript framework developed by a coworker at CNET. Much like Prototype, Dojo, Scriptaculous and others, it offers an extensible framework to build upon. I will cover more of my toilings when I have the time. I would highly recommend looking into it even if you have already used other frameworks. Here is a link to the MooTools site.

Solr configuration

Solr is quite easy to set up once you understand it. It is much like any other database setup. So given the following table, we have to mimic something similar in Solr. It needs to know what is “stored” versus what is “indexed”, as well as facets and many other options. I will explain more later. Here is my MySQL table representing ‘things’.

things-table.gif

Okay. Simple enough, right? What we want to do here is store most of this data; however, we don’t really need to store the facet fields. What does that mean? Well, we want to be able to index them and search through them, but when we ask for all the fields of a given document, Solr won’t return this field. This will come into play later when I discuss faceting. Here is the schema.xml file for the above MySQL table:

<field name="id" type="string" indexed="true" stored="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="fileName" type="text" indexed="false" stored="true"/>
<field name="tags" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="isBackground" type="text" indexed="false" stored="true"/>
<field name="dateCreated" type="text" indexed="false" stored="true"/>
<field name="dateModified" type="date" indexed="false" stored="true"/>
<!-- for faceting -->
<field name="tagsFaceted" type="string" indexed="true" stored="false" multiValued="true"/>


Now that we are done with our basic table structure, we have to tell Solr a few things about our index. It wants to know the primary key, or in Solr terms, the unique key. We also have to tell it our default search field.


<uniqueKey>id</uniqueKey>
<defaultSearchField>tags</defaultSearchField>


Now we have one more very important setting to tackle if we want proper faceting on tags. We have to make sure that any time we write to the ‘tags’ text field we also write to the ‘tagsFaceted’ string field. The difference is that the ‘tags’ field is stemmed, i.e. searching for ‘kids’ also matches ‘kid’ and so forth. The ‘tagsFaceted’ field will return the whole words. The faceted one is human readable and the stemmed one is for the machines.


<copyField source="tags" dest="tagsFaceted"/>

Starting to work with SOLR

SOLR, a wrapper for Lucene, was developed by a coworker at CNET. It has recently graduated from the Apache incubation cycle and is now a full-fledged project. Not only does it wrap Lucene with a simpler interface, it more importantly creates a RESTful type of API. I’ve only been using it for a few months now but I love it. I will be launching it into production at work and for my own personal project.

So what is SOLR/Lucene

Lucene is an indexing system, much like a database, except faster and narrower. Narrower? Well, essentially an index is a single table with a primary key per entry. It allows extremely fast full-text searching with stemming that other databases cannot handle, or even support. MySQL, for example (especially MySQL 5), will fall over under heavy load, no ifs, ands, or buts. I’ve seen it happen at work in the lab as well as in production.

Lucene and Solr are both built in Java, but Solr needs a servlet container such as Tomcat, Resin, or WebLogic to run. It runs “next to” your database. All the data in the Lucene index is the same as one table in your database. When you write to your DB you will also write to Lucene (via SOLR). When you delete, same thing. Updating, yes. You get the point.

Faceting

Solr (and Lucene, of course) also supports faceting, which is very powerful. You have seen this on many sites, especially in the world of online shopping comparison. It allows you to see how many other entries share the same attributes. In the example of shopping comparison, you can see how many other products are also under 20 dollars, made of cotton, and from Amazon Marketplace. This is a very powerful feature and it is perfect for allowing users to drill down through the data.
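To make the idea of a facet count concrete, here is a tiny in-memory Java sketch. It is not how Solr or Lucene compute facets internally; it only shows what the numbers mean. The class FacetCounts, the method countByTag, and the sample tags are all made up for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FacetCounts {
    /** Counts how many documents carry each tag; a document counts once per tag. */
    static Map<String, Integer> countByTag(List<List<String>> tagsPerDocument) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> tags : tagsPerDocument) {
            for (String tag : new HashSet<>(tags)) { // de-duplicate within one document
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> things = List.of(
            List.of("kids", "beach"),
            List.of("kids", "snow"),
            List.of("beach"));
        System.out.println(countByTag(things)); // {beach=2, kids=2, snow=1}
    }
}
```

Each count is the number you would show next to a drill-down link, e.g. “kids (2)”.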


I will post more findings when I have the time, including configurations.

Spring binding many-to-many

Wow. This was a lot more complicated than I thought. After several days of research and questions I finally discovered how to make this work. It’s all about custom property editors. Please note that many-to-many relationships are not the best thing to use in a production environment. They don’t scale well and they are not easy to manage. I later replaced mine with Solr, my new favorite implementation.

In this example, many ‘things’ have many ‘tags’ and inversely many ‘tags’ have many ‘things’.

// Assumes Spring's CustomCollectionEditor, ServletRequestDataBinder, and NumberUtils are imported.
protected void initBinder(HttpServletRequest request, ServletRequestDataBinder binder) {
   // Convert each submitted "tags" value (a tag id) into a Tag before the Set is bound
   binder.registerCustomEditor(Set.class, "tags", new CustomCollectionEditor(Set.class) {
     protected Object convertElement(Object element) {
       long id = NumberUtils.parseNumber((String) element, Long.class).longValue();
       Tag tag = tagMgr.getTag(element.toString());

       return new Tag(id, tag.getName());
     }
   });
}

Use Apache Rewrites much like an ‘if’ block

Recently I had to fabricate some nasty rewrite rules involving some unruly query parameters: if one was present do this, if two were present do that, and if none were present redirect somewhere else. I had to step back after a few hours of work to try and look at this problem differently. All of a sudden it became clear. Some code is omitted, but here are two examples that I am using in production:


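# Two pid parameters: redirect and keep the query string
RewriteCond %{QUERY_STRING} ^pid=([0-9]+)&pid=([0-9]+)
RewriteRule ^/oldurl\.html http://%{HTTP_HOST}$1/newurl.html?%{QUERY_STRING} [R=301,L]

# One pid parameter: redirect and drop the query string (trailing '?')
RewriteCond %{QUERY_STRING} ^pid=([0-9]+)
RewriteRule ^/oldurl2\.html http://%{HTTP_HOST}$1/newurl2.html? [R=301,L]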


As you can see, I am setting up gates or switches. The list went on and on, and the order is highly important, as you may notice, just as it is in some ‘if’ blocks (where you would consider refactoring). In the case of a top-down script like mod_rewrite, I feel this is not only acceptable but necessary to accomplish the task.
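For comparison, here is roughly how the same gates read as an actual ‘if’ chain in Java. This is a sketch of the idea, not a translation of my production rules: the class RewriteGates, the method redirectFor, and the three target paths are hypothetical.

```java
import java.util.regex.Pattern;

final class RewriteGates {
    private static final Pattern TWO_PIDS = Pattern.compile("^pid=[0-9]+&pid=[0-9]+");
    private static final Pattern ONE_PID = Pattern.compile("^pid=[0-9]+");

    /** Picks a redirect target from the query string; the first gate that matches wins. */
    static String redirectFor(String query) {
        // The most specific gate must come first: a two-pid query also matches ONE_PID.
        if (TWO_PIDS.matcher(query).find()) {
            return "/two-pids.html?" + query; // keep the query string
        }
        if (ONE_PID.matcher(query).find()) {
            return "/one-pid.html";           // drop the query string
        }
        return "/no-pid.html";                // fallback when no pid is present
    }

    public static void main(String[] args) {
        System.out.println(redirectFor("pid=1&pid=2")); // /two-pids.html?pid=1&pid=2
        System.out.println(redirectFor("pid=7"));       // /one-pid.html
        System.out.println(redirectFor(""));            // /no-pid.html
    }
}
```

Swap the first two checks and the two-pid case never fires, which is exactly the ordering trap with the RewriteCond/RewriteRule pairs.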

Copyright © 2005-2011 John Clarke Mills
