Friday, April 2, 2010

Java annotations and Lucene

I’m about to start a project where I hope to write a loosely coupled, extensible framework for full-text indexing, specifically for searching content. In doing so, I don’t want the work to have to be repeated for every application. Ideally, the engine responsible for building the index should sit behind a factory-driven system that wraps the implementation.

TextWriter writer = IndexFactory.getTextWriter();

writer.write(obj);


The getTextWriter() method would return a factoried, default instance of a Lucene TextWriter. The reason to do this is that it completely decouples the concern: our API is now based upon the API of the concern, not the implementation of the concern.
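As a rough sketch of what such a factory might look like: the `setProvider` hook and the `RecordingTextWriter` stand-in below are assumptions made up for illustration (a real default would wrap Lucene), and `TextWriter` is reduced to the single `write(obj)` call used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Engine-neutral contract that callers program against.
abstract class TextWriter {
    public abstract void write(Object obj);
}

// Hypothetical stand-in for a Lucene-backed writer; it only records what it is given.
class RecordingTextWriter extends TextWriter {
    final List<Object> written = new ArrayList<>();

    @Override
    public void write(Object obj) {
        written.add(obj);
    }
}

// Factory that hides which implementation is in use.
final class IndexFactory {
    // Default provider; swapping it changes the engine without touching callers.
    private static Supplier<TextWriter> provider = RecordingTextWriter::new;

    private IndexFactory() {
    }

    // Hypothetical hook for plugging in a different engine.
    static void setProvider(Supplier<TextWriter> newProvider) {
        provider = newProvider;
    }

    static TextWriter getTextWriter() {
        return provider.get();
    }
}
```

Callers only ever see `TextWriter`, so replacing the engine comes down to a single `IndexFactory.setProvider(...)` call at startup.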

This can be decoupled even further by adding a varargs parameter at the end of the API methods.


package com.me.indexing;

public abstract class TextWriter
{
        // Template method: validate first, then hand off to the engine-specific write.
        public void write(IIndexable _indexable, Object... _data)
        {
                if ( Validate(_indexable, _data) )
                        DoWrite(_indexable, _data);
        }

        public abstract boolean Validate(final IIndexable _indexable, Object... _data);

        // protected rather than private: an abstract method must be visible to subclasses
        protected abstract void DoWrite(final IIndexable _indexable, Object... _data);
}



which is subclassed as


package com.me.indexing.lucene;

import com.me.indexing.IIndexable;

public class LuceneTextWriter extends com.me.indexing.TextWriter
{
        @Override
        public boolean Validate(final IIndexable _indexable, Object... _data)
        {
                // validate that the _data block contains all data needed for the particular engine
                // (placeholder check only; the engine-specific rules would go here)
                return _indexable != null && _data != null;
        }

        @Override
        public void DoWrite(final IIndexable _indexable, Object... _data)
        {
                // write to the particular implementation
        }
}


This provides a loosely coupled interface that all applications can code against. It removes the worry that the interface could change over time, while keeping the flexibility to change the engine on the backend.
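To make the validate-then-write flow concrete, here is a minimal, self-contained sketch. The `InMemoryTextWriter` engine and its rule that the varargs block must hold exactly one `String` (an index name) are assumptions for illustration, not anything Lucene requires, and the method names use Java's lower camel case rather than `Validate`/`DoWrite`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Marker interface for anything that can be indexed.
interface IIndexable {
}

abstract class TextWriter {
    // Template method: only writes when the engine accepts the extra data.
    public final void write(IIndexable indexable, Object... data) {
        if (validate(indexable, data)) {
            doWrite(indexable, data);
        }
    }

    protected abstract boolean validate(IIndexable indexable, Object... data);

    protected abstract void doWrite(IIndexable indexable, Object... data);
}

// Hypothetical engine: expects exactly one String argument naming the target index.
class InMemoryTextWriter extends TextWriter {
    final Map<String, List<IIndexable>> indexes = new HashMap<>();

    @Override
    protected boolean validate(IIndexable indexable, Object... data) {
        return indexable != null
                && data != null
                && data.length == 1
                && data[0] instanceof String;
    }

    @Override
    protected void doWrite(IIndexable indexable, Object... data) {
        // Safe cast: validate() has already checked the type.
        String indexName = (String) data[0];
        indexes.computeIfAbsent(indexName, name -> new ArrayList<>()).add(indexable);
    }
}
```

A call such as `writer.write(doc, "articles")` is stored, while `writer.write(doc)` or `writer.write(doc, 42)` is silently skipped because validation fails. Whether a failed validation should be skipped or should throw is a design choice the post leaves open.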

We can take this one step further and introduce Java annotations into the objects that are being indexed to allow for metadata to be built into the object and consequently pushed into the index. This allows the contract to be solidified in the design phase, the metadata to be plugged into the object and to have an abstract implementation that is data driven and not product focused. Sound good? I think so.


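package com.me.indexing;

import java.lang.annotation.*;

// RUNTIME retention is needed so the writer can read the metadata via reflection
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface IndexData
{
        public String field();
        public boolean tokenized();
        public boolean additive();
}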


This allows such information to be declared in the class that is passed to the writer:


package com.me.someproductneedingindexing;

import com.me.indexing.*;

public class SomeClass implements IIndexable // this could additionally be a class level annotation
{
        private final String m_description;

        public SomeClass(String _someDescription)
        {
                m_description = _someDescription;
        }


        /**
         * Typical header
         */
        @IndexData(
                field = "DESC",
                tokenized = false,
                additive = true
        )
        public String getDescription()
        {
                return m_description;
        }
}
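
On the writer side, one way this metadata might be consumed is through reflection. The sketch below is only an illustration under a few assumptions: the annotation has runtime retention, the `Article` class and the `IndexDataReader` helper are hypothetical names, and the result is a plain map rather than a real Lucene document.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Same shape as the IndexData annotation above, kept visible at runtime.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface IndexData {
    String field();
    boolean tokenized();
    boolean additive();
}

// Hypothetical indexable class with one annotated getter.
class Article {
    private final String description;

    Article(String description) {
        this.description = description;
    }

    @IndexData(field = "DESC", tokenized = false, additive = true)
    public String getDescription() {
        return description;
    }
}

// Hypothetical helper: collects field name -> value for every annotated getter.
final class IndexDataReader {
    private IndexDataReader() {
    }

    static Map<String, String> extractFields(Object target) {
        Map<String, String> fields = new TreeMap<>();
        for (Method method : target.getClass().getMethods()) {
            IndexData meta = method.getAnnotation(IndexData.class);
            // Skip methods without the annotation or with parameters.
            if (meta == null || method.getParameterCount() != 0) {
                continue;
            }
            try {
                fields.put(meta.field(), String.valueOf(method.invoke(target)));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not read " + method.getName(), e);
            }
        }
        return fields;
    }
}
```

An engine-specific writer could then map each entry onto its own field type, consulting `tokenized()` and `additive()` to decide how the value is stored.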



Granted this entire post is barely a skeleton but it’s also 9.30 in the morning.
