Archive for the ‘Software’ Category

Ajax and IE 7: cache not invalidating

For the past day or two I had been struggling at work to figure out why Internet Explore 7 would not pay attention to the response headers stating not to cache the response. First, I tried setting the date header to expire instantly, with a value of -1. QA confirmed that this solved the problem in some revisions of IE 7 but not all. After digging around the web it turns out that you have to set a few more headers, one of which I had never even heard of. Here’s what solved the problem for me. This snippet is in Java but could apply to any language:

response.setDateHeader( "Expires", -1 );
response.setHeader( "Cache-Control", "private" );
response.setHeader( "Last-Modified", new Date().toString() );
response.setHeader( "Pragma", "no-cache" );
response.setHeader( "Cache-Control", "no-store,
                                 no-cache, must-revalidate" );


IE, ImageIO, and Microsoft’s ‘image/pjpeg’ prank

Today I came across an issue at work regarding the made up MIME type of ‘image/pjpeg’ while uploading images from IE 7. Java’s ImageIO was choking while trying to upload this type of file because it is not in the list of supported MIME types. The supported MIME type list can be viewed by invoking ImageIO.getReaderMIMETypes(). After reading many blogs of others pounding their head, making fun of Microsoft, and using other solutions like ImageMagick, I found a solution. Add this to your list of supported types and move on. This format is made up and will not cause any problems. Pass the input stream onto your File IO and be done with it. Microsoft owes me half an hour of my life back.

Dynamic Proxies in Java

Up until recently I never had to write any code that dealt with reflection. Before now I worked for CNET which is basically a series of differnent CMS’ with one request, one response. My first week at the new job required me to build a JMS system that would accept method invocations across the wire seamlessly for the producer. Java.lang.reflect was the answer.


Once you can wrap your head around the idea its pretty simple to write. Essentially what we are attempting to do is shield the caller from having to know anything about the other method that is going to invoked behind the scenes.


public Class MyConcreteProxyClass implements InvocationHandler {

   Object obj;
   public MyDynamicProxyClass(Object obj)
     { this.obj = obj; }

  public Object invoke(Object proxy, Method m, Object[] args)
         throws Throwable {
     try {
        // do stuff
     } catch (Exception e) {
        throw e;
     }
        // return something
     }
}

Now we have to create the proxy interface which the caller will implement. Any of the methods of this class that he implements and fire will invoke method on the MyConcreteProxyClass object.


public interface ProxyInterface{
    public Object myMethod();
}


Now its time to create wire everything up. There are several ways to do this which help hide this proxying like a static factory method but this simply demonstrates how this works.


ProxyInterface proxy = (ProxyInterface)
Proxy.newProxyInstance(obj.getClass().getClassLoader(),
         Class[] { ProxyInterface.class },
          new MyConcreteProxyClass(obj));

Java Concurrency in Practice

java-concurrency.gifThis is now one of my favorite books on Java which I am probably going to read again just to be sure I have soaked up as much information as I can. This is a practical book written by a practical person who understands his audience, engineers. He is straight to the point with his code examples and doesnt bore you to death explaining every little detail or referencing lines of code and functions 8 pages back. I wish more technical writers would be this concise and straight to the point. There is something to be said for any technical writer who can get their message across with fewer words. No where else can I think of a better example of the old adage “less is more”. I would highly recommend this book to any engineer, especially associates.


DoS attack at the office

Unfortunately for us we dont have a full time operations team at the moment. Its me really. The engineer, the frontend web developer, and admin. Im lucky that I saw it, put two-and-two together, and stopped it.


Anywho, this script kiddie managed to actually take down our appservers because our servlet container was only set to use 512 MB of RAM! This was not my doing although it has been operating fine for almost 2 years. Anyway, when we saw the increased spike in traffic we said great, maybe Googebot has come back around to save us from our perpetual Google dance. I noticed the increase in bot activity but thought nothing of it and went home. Little did I know it wasnt Googlebot.


The next day, i.e. today, I got into work to find our leads were still down but our page views were up. Thats odd? How and why could this be happening? I knew this wasnt people traffic because Google’s Urchin Tracker wasnt registering the requests because its Javascript (bots done fire it). It had to be a malicious user. I dug through yesterday’s logs and it turned out to be some script kiddie in Australia making about 25 requests per second from one IP address. What? No DDoS? Anyway, our machines performed well after more memory was allocated but they were thrashing a little.


Page load time
count.gif


CPU load
cpu1.gif

Faceting with Solr

After my discussion with Yonik Seeley, one of the Solr developers, I have to come to realize the way I was faceting was not correct. Although it worked, it did not work for speed and scalability. The proper way to query by facets is to use the ‘fq’ field as many times as needed. Each of these result sets is then cached, speeding up the next query as you add more facets. In line with the example I have been posting about, things have many tags, here is an example. When you run this query you will get anything with the phrase ‘stuff’ that is also tagged with ‘things’ and ‘junk’.


q=stuff&rows=10&facet=true&facet.field=tagsFaceted&fq=things&fq=junk

Javascript fading effects with MooTools

As well as using MooTools for validation, I am also using it for effects. Its nice to the the error boxes fade in and out. I think its just one of those touches that people appreciate (whether they really call it out or not). Anyway, MooTools makes it really easy as you can tell by this code.

function showErrorMessage() {
    exampleFx = new Fx.Style(‘error-message’, ‘opacity’, {
    duration: 500,
    transition: Fx.Transitions.quartInOut
    });

    exampleFx.start(0,1); /*fade it in*/
}

MooTools Javascript framework

Let me start by saying, I dont like Javascript but I love MooTools. Its not that I dont think Javascript is great and highly powerful, I just hate dealing with it especially when my job is the backend. I would do anything for a stacktrace like there is in Java!

Anywho, MooTools is a Javascript framework developed by a fellow coworker at CNET. Much like Prototype, Dojo, Scriptaculous and others it offers and extensible and extendable framework on which to build upon. I will cover more of my toilings when I have the time. I would highly recommend looking into it even if you have already used other frameworks. Here is a link to the MooTools site.

Solr configuration

Solr is quite easy to setup once you understand it. It is much like any other database setup. So given the following the table, we have to mimic something similar in Solr. It need to know what is “stored” versus what is “indexed” as well as facets and many other options. I will explain more later. Here is my MySQL table representing ‘things’.

things-table.gif

Okay. Simple enough right? So we want to do here is store most of this data; however, for facets we dont really need to store them. What does that mean? Well we want to be able to search through them and index them but when we ask for all columns of a given field it wont return this field. This will come into play later when I discuss faceting. Here is the schema.xml file for the above MySQL table:

<field name=”id” type=”string” indexed=”true” stored=”true”/>
<field name=”name” type=”text” indexed=”true” stored=”true”/>
<field name=”fileName” type=”text” indexed=”false” stored=”true”/>
<field name=”tags” type=”text” indexed=”true” stored=”true” multiValued=”true”/>
<field name=”isBackground” type=”text” indexed=”false” stored=”true”/>
<field name=”dateCreated” type=”text” indexed=”false” stored=”true”/>
<field name=”dateModified” type=”date” indexed=”false” stored=”true”/>
<!– for faceting –>
<field name=”tagsFaceted” type=”string” indexed=”true” stored=”false” multiValued=”true”/>


Now that we are complete with our basic table structure we have to tell Solr a few things about our index. It wants to know the primary key, or in Solr terms, the unique key. Also we have to tell it our default search field.


<uniqueKey>id</uniqueKey>
<defaultSearchField>tags</defaultSearchField>


Now we have one more very important variable that we have to tackle if want proper faceting on tags. We have to make sure that any time we write to the tags text field we also write to the tags string field. The difference is that the ‘tags’ field is stemmed, i.e. searching for ‘kids’ returns ‘kid’ and so forth. The ‘tagsFaceted’ field will return the whole words. One is human readable and the other is for the machines.


<copyField source=”tags” dest=”tagsFaceted”/>

Starting to work with SOLR

solr-head.gifSOLR, a wrapper for Lucene, was developed by a fellow coworker at CNET. It has recently graduated from the Apache incubation cycle and is now a full fledge project. It not only does it wrap Lucene with a simpler interface it more importantly creates a Restful type of API. I only been using this for a few months now but I love it. It will be launching it into production at work and for my own personal project.

So what is SOLR/Lucene

Lucene is an indexing system, much like a database, except faster and narrower. Narrower? Well essentially an index is a single table with a primary key per entry. It allows extremely fast full text searching with stemming that other databases cannot handle, or even support. MySQL for example, especially MySQL 5 will fall over under heavy load no ifs-ands-or-buts. I’ve seen it happen at work in the lab as well as in production.

Lucene, as well as Solr, are built in Java but Solr needs a servlet container to run like Tomcat, Resin, WebLogic, etc. It runs “next to” your database. All the data in the Lucene index is the same as 1 table in your database. When you write to your DB you will write to Lucene (via SOLR). When you delete, same thing. Updating, yes. You get the point.

Faceting

Solr, and Lucene of course, also support faceting which is very powerful. You have seen this on many sites especially in the world of online shopping comparison. It allows you to see how many other entries also share the same common attributes. In the example of shopping comparison you can see how many other products are also, under 20 dollars, made of cotton, and from Amazon Marketplace. This is very powerful feature and it is perfect for allowing users to drill down through the data.


I will post more findings when I have the time, including configurations.

Copyright © 2005-2011 John Clarke Mills

Wordpress theme is open source and available on github.