If you are a Google+ user, then you now have a new search tool (the encrypted site is https://www.google.com/insidesearch/plus.html). When you are signed in to your Google+ account, your search results will be sorted for relevance in a different fashion: by what your Google+ friends say about the search term. This process assumes that what your friends say is more important than other content.
This personalised search relevance is a boon for advertisers that want your attention. Google isn’t the first to do this. In 2010 Bing began ranking sites in search results based upon how many of your Facebook friends “like” the site.
The search engines and advertisers have decided that people want to search for other people and their opinions over other content. How convenient for the search engines and advertisers!
If you want a full explanation of the impact this will have for the Investigator, then read Phil Bradley’s article titled “Why Google Search Plus is a disaster for search”. Google is no longer my first choice; I start with Bing, then DuckDuckGo, and last but not least, Blekko.
RTBot (Real Time Bot) is a real-time information service where you can enter a topic title and get results from multiple sources (e.g. Wikipedia, YouTube, Twitter, Facebook, Flickr, books, newspapers, magazines) all at once. This may sound like a normal search engine, but it isn’t.
RTBot provides content only for specific topics such as concepts, subjects, personalities, events, places, companies, products, etc., but not for broader, unspecific searches.
If you use this properly, you often get a lot of video in the results that would require separate searches to find. This can be quite useful when searching by a company or person name.
I have used Copernic for years, and just accepted its lack of a Google search. I just got used to it, and never sought a way to add Google.
At a recent conference, I mentioned to Kevin Ripa that Copernic didn’t search Google, and he told me that a registry entry would solve the problem. If you’re going to feel like an idiot, it’s good to be shown up by a really smart guy like Kevin.
Go to the registry key:
and insert the following string:
with value, http://updates.copernic.com/k2upd/agentex
iSeek is a good search engine to use when you are searching by a person’s name. It clusters search results by topic, people, places, and organisations.
An excellent article about a beta search engine with promise.
“… a new data search engine called Zanran – that focuses on finding numerical and graphical data.
“Zanran focuses on finding what it calls ‘semi-structured’ data on the web. This is defined as numerical data presented as graphs, tables and charts – and these could be held in a graph image or table in an HTML file, as part of a PDF report, or in an Excel spreadsheet. This is the key differentiator – essentially, Zanran is not looking for text but for formatted numerical data.”
The AROUND(x) Operator
A common complaint about Google was that there was no proximity search. Most people think that you cannot find thisword within x words of thatword. Wrong!
Google supports an undocumented search operator called AROUND(x) that works as a proximity search, where x is the maximum number of words allowed between the two terms. To make the operator work properly, you must write it in all capitals and place it between the words. It will return results with variants of the words, such as plurals, as is normal for Google. It may be used with other operators within normal Google search syntax; for example, you might add the site: operator.
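As a rough illustration of the syntax described above, the following Python sketch assembles a proximity query and the matching search URL. The function names, the sample search terms, and the example.com domain are hypothetical, and the URL form (https://www.google.com/search?q=...) is assumed to be the ordinary Google search address. Because the operator is undocumented, treat this as a convenience for building the query, not a guarantee of how Google will interpret it.

```python
from urllib.parse import urlencode


def around_query(first, second, distance, site=None):
    """Build a query string using the AROUND(x) proximity operator."""
    if distance < 1:
        raise ValueError("distance must be a positive integer")
    # The operator is written in capitals and placed between the two terms.
    query = f"{first} AROUND({distance}) {second}"
    if site:
        # Optionally combine with the site: operator.
        query += f" site:{site}"
    return query


def search_url(query):
    """Return a search URL for the query (assumed standard URL form)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    q = around_query("fraud", "Smith", 5, site="example.com")
    print(q)
    print(search_url(q))
```

Keeping the query builder separate from the URL builder means the same query string can be pasted straight into the search box if you prefer not to use the URL.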
Manipulated Search Terms
Huge amounts of money are being spent to manipulate highly competitive search terms in Google. I’m not talking about the usual link-building, link-buying, and similar efforts. The trend is related to criminal organizations trying to sell counterfeit goods through the US search results and, to a lesser degree, the results for the UK, France and Germany.
The spammers do this through keyword-rich, anchor-text-heavy links supplied by automated forum and blog spam, along with hacked websites. These gangs create such large numbers of these sites and links that Google is having quite a hard time catching up with the spam. The Caffeine update, which ranks sites faster, may be degrading overall search quality, as this trend seems to go back only 7 months or so.
- Don’t trust what you read on the Internet; it may be planted data
- If you don’t find anything interesting on the first few pages of Google, then you’re doing it wrong! Set up the search preferences properly and make them persistent.
- Press releases and promotional websites are not a source of reliable data
- The Internet is not a “neutral” source. Fact-check and evaluate everything you find.
- The Internet is only one research venue
DIY Research is Not Practical
Over the last two years we have seen a DIY trend really take hold due to shrinking budgets, particularly in the areas of Due Diligence and Background Investigations. This is a false economy because the DIY Researcher doesn’t recognise changes like those described above, let alone know what to do about such a distortion of the results.
The solution may be as simple as using OptimizeGoogle directed at a version of the search engine that does not implement Google Instant, or it may mean conducting the search through a proxy in another country. If you don’t understand how to do this and why you should do it, then don’t give money to somebody based upon your research.
This application is particularly useful for searching for a person’s name in Google as it returns results in the same case as given by the user. The Query Box supports phrase search (quotes) but no other advanced search options. If you want to use advanced search options, then type your advanced Google query in the Query Box and use the second input box to provide case sensitive filter terms as in this example.
The user can set the maximum number of Google results that will be scanned in the “Limit” drop-down box. This is an upper limit for the depth of a search, and its maximum value is 1000 because Google does not serve more than 1000 results for any query. In practice, the search stops as soon as 10 case-matching results have been found. The user can click on the “Next” button to get the next page (that is, to continue scanning through the Google results).
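The scanning behaviour described above can be sketched in a few lines of Python. This is not the application’s actual code: the function name, its parameters, and the idea of passing in an already-fetched list of result snippets are assumptions made purely for illustration. It shows the general technique of scanning up to a limit, keeping only exact-case matches, and stopping once a page of 10 has been collected.

```python
def case_matching_page(results, term, limit=1000, page_size=10, start=0):
    """Scan results from `start` and return (matches, next_start).

    results   -- list of result snippets (assumed already fetched)
    term      -- case-sensitive filter term
    limit     -- maximum number of results to scan (capped at 1000)
    page_size -- stop once this many case-matching results are found
    """
    limit = min(limit, 1000, len(results))
    matches = []
    i = start
    while i < limit and len(matches) < page_size:
        # The `in` test on Python strings is case sensitive.
        if term in results[i]:
            matches.append(results[i])
        i += 1
    # next_start is where a "Next" request would resume scanning.
    return matches, i


if __name__ == "__main__":
    snippets = ["John SMITH", "john smith", "John Smith here", "Mr John Smith"]
    page, resume_at = case_matching_page(snippets, "John Smith", page_size=1)
    print(page, resume_at)
```

Returning the resume position alongside the matches mirrors the “Next” button: the following call passes that value as `start` and carries on from where the previous page stopped.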
In conducting Internet research we encounter the problem of persona isolation. In national security circles this is called the “27 Mohammeds problem”. Essentially, how do we know that the John Smith mentioned in a blog is the specific John Smith we are researching?
This leads to another difficulty. An Internet reputation may not reflect reality; it may be fabricated out of malice. We must evaluate a conviction in the august Internet Court and determine whether we believe it enough to decline the risk of dealing with the subject firm or person.
The following related articles may help you deal with this problem:
Open the Search settings at the top right of the Google Search page. This brings you to the Preferences Page. In the Number of Results section, select 100. Next, go to the last section, for Google Instant, and select the second option, “Do not use Google Instant”.
By disabling Instant, the full 100 search results should appear.
Facial recognition software
Enter a photo at http://developers.face.com/tools/#faces/detect and locate all photos of the same individual on Facebook. This is limited to your friends at this point, but some developers are putting this into iPhone apps. You can snap a photo of someone on the street and get all their info through Facebook and other services this way. As of May 2010, they state that their Facebook apps have scanned over 7 billion photos in total and identified no fewer than 52 million faces.
This is something to watch as it has some interesting applications for the Investigator. Of course some people will think the sky is falling due to the mere existence of this app, but the technological genie was let out of the bottle a long time ago.
The following two articles are required reading for anyone who must search by company or product name.
Furthermore, the Official Google Blog post titled “Showing More Results from a Domain” indicates that their algorithm is intended to show searchers more results from a single domain where evidence exists that there is a “strong user interest in a particular domain.” They also note that the last few results (on a search results page set to show 10 results) are from other domains to preserve diversity in the results.
This has serious implications for anybody doing due diligence research as many derogatory entries in the search engine database will not appear without additional search terms. It also means that search results set to 10, 20, 30, 50, and 100 per page may give radically different proportions of search results when sorted by domain.
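One way to see this effect for yourself is to tally the share of results that each domain takes at different page sizes. The Python sketch below assumes you have already collected the result URLs into a list by whatever means you prefer; the function name and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_share(urls):
    """Return {host: fraction of results}, largest share first."""
    # Lower-case the host so Example.com and example.com count together.
    hosts = [urlparse(u).netloc.lower() for u in urls]
    total = len(hosts)
    counts = Counter(hosts)
    return {host: n / total for host, n in counts.most_common()}


if __name__ == "__main__":
    sample = [
        "https://example.com/a",
        "https://example.com/b",
        "https://other.org/x",
        "https://Example.com/c",
    ]
    for host, share in domain_share(sample).items():
        print(f"{host}: {share:.0%}")
```

Running it once on the first 10 results and again on the first 100 for the same query gives a quick check on whether a single domain is crowding out other sources at one setting but not the other.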