I have been documenting my steps towards this machine-learning lead-generation tool on a private extranet from the start, and thought I would share my journey here.
In March 2009, I began to realise that because Google had to be all things to all men, it could no longer deliver high-quality search results for the niches my clients were in. A search for pumps returned listings for a type of shoe, rather than the fluid mover I wanted.
So I decided to build my own vertical search engine, specific to the needs of my clients. I believe that no one has an idea in isolation, so someone must already have had the same one. So I searched the web. The interesting thing was that because search engines were built on keywords (at least in 2009), they did not understand context and meaning. So I did not get meaningful results until I stumbled onto the jargon of the field. What I was looking for was a ‘focused web crawler’, and they had been built before, usually as an academic programming exercise.
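For readers who have not met the term, here is a minimal sketch of what a focused web crawler does, under loud assumptions: the "web" is a made-up in-memory dictionary, the topic vocabulary is hypothetical, and relevance is scored by a crude keyword ratio. This is not the code from the student project. Real focused crawlers usually score pages with a trained classifier; the idea that matters here is the priority queue, which follows links from on-topic pages first.

```python
import heapq

# Hypothetical topic vocabulary for the "fluid mover" sense of "pump".
TOPIC_TERMS = {"pump", "pumps", "centrifugal", "fluid", "flow", "impeller", "valve"}

# Toy in-memory "web": URL -> (page text, outgoing links). All invented for illustration.
TOY_WEB = {
    "seed": ("pumps directory", ["shoes", "industrial"]),
    "shoes": ("leather pumps and heels for evening wear", ["fashion"]),
    "industrial": ("centrifugal pumps move fluid using an impeller", ["valves"]),
    "fashion": ("autumn fashion trends", []),
    "valves": ("valve and pump flow control", []),
}


def relevance(text):
    """Fraction of words on the page that belong to the topic vocabulary."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in TOPIC_TERMS)
    return hits / len(words)


def focused_crawl(seed, max_pages=10, threshold=0.2):
    """Best-first crawl: links found on relevant pages are visited first."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, seed)]
    seen = {seed}
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = TOY_WEB[url]  # stands in for an HTTP fetch
        fetched += 1
        score = relevance(text)
        if score >= threshold:
            relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return relevant


if __name__ == "__main__":
    print(focused_crawl("seed"))
```

With a budget of only three pages (`focused_crawl("seed", max_pages=3)`), this sketch never fetches the shoe page at all, because the link found on the engineering page outranks it in the queue. That is the whole appeal over a general-purpose crawler.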
And here was the first lesson: sometimes all you have to do is ask.
I emailed the professor behind one of these projects through his web page. He forwarded my message to the student, who sent me his project report. How cool is that! Source code and everything.
Second lesson: as we used to find when I worked in the pharmaceutical industry, a lot of academic papers cannot be reproduced. So I had the source code, but it did not work in practice.