Tag Archive for 'google'

Explicit Words

It’s apparent that Google believes that its search algorithms are capable of determining the searcher’s intent. It is also obvious that Google filters out explicit image content, regardless of user settings. If you don’t believe me, just search for a few sex acts in the image search without any filtering and witness the effectiveness of the overriding search restrictions.

This leaves the researcher wondering what words are on the “restricted” list. With all the euphemisms for sex acts it is easy to see that searches not related to sex acts might be restricted by Google’s all-knowing, all-seeing, algorithm.

Firefox Addon — Google site: Tool

I have written about the site: command in Google before.

The site: command in Google is an invaluable tool for doing Investigative Internet Research (IIR), especially in combination with other advanced operators.

Google site: Tool

Google site: Tool only works with Firefox 14 or later on Windows 7.

It allows you to add site: or -site: to modify your Google search results. To limit your query to a particular site in the results, or to re-run the query excluding that site from the results, click the green URL below the result header. This works best on Google.com rather than the country-specific versions of Google. It also works on the encrypted version of Google.com.

This addon requires Greasemonkey.
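The query rewrite that the tool performs can be sketched in a few lines of Python. This is only an illustration of adding site: or -site: to a query, not the add-on's own code; the function name `site_query` and the example.com address are made up for the example.

```python
from urllib.parse import urlparse


def site_query(query: str, result_url: str, exclude: bool = False) -> str:
    """Append site:host (or -site:host) for the host found in result_url."""
    host = urlparse(result_url).hostname
    if host is None:
        raise ValueError(f"no host found in {result_url!r}")
    operator = "-site:" if exclude else "site:"
    return f"{query} {operator}{host}"


# Limit the query to one site, then re-run it excluding that site.
url = "https://www.example.com/news/story.html"
print(site_query("fraud investigation", url))
print(site_query("fraud investigation", url, exclude=True))
```

The first call restricts the results to the host of the clicked result; the second removes that host from the results.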

Greasemonkey

A Firefox add-on called Greasemonkey allows you to customize the way a web page displays using small bits of JavaScript.

Google’s Secret Proximity Operator

Serious searchers need a proximity search operator. In Google, it’s an undocumented feature.

The Google proximity operator is AROUND(x), where x sets the maximum distance, in words, between the two terms. To make the operator work properly, you must write it in all capitals and place it between the words. It will return results with variants of the words, such as plurals, as is normal for Google.

This operator is handy when the combination of search terms is dominated by one term, but you’re interested in the relationship between two query terms. This is particularly important when searching names. A person’s name may appear with a middle initial in some instances and without it in other instances. This operator will find both instances. It will also be very helpful if the person’s last name is common or also used by another prominent person.

Where there is no RSS

Keeping track of sites that don’t offer RSS feeds or email updates can be a problem for Researchers and Investigators.

As of September 30th, Google Reader will be turning off track changes. Track Changes allowed you to create a custom feed to track changes on pages that don’t have their own feed. Page2RSS seems to be one of the few alternatives available to replace this.

Page2RSS will convert any web page to an RSS feed. You can even add a button to your browser’s bookmarks toolbar that will create a Page2RSS feed for the page you are currently viewing.

Another alternative to Google Reader’s Track Changes is available in the bottom left corner of the FeedBlitz home page. Insert a URL and get email updates from a website or blog that doesn’t offer email subscriptions.

Copernic Tracker automatically looks for new content on Web pages, forums, and social sites. When a change is detected, it can notify you by sending an email, including a copy of the Web page with the changes highlighted, or by displaying a desktop alert.

WatchThatPage is a service that enables you to automatically collect new information from your favorite pages on the Internet. You select which pages to monitor, and WatchThatPage will find which pages have changed, and collect all the new content for you. The new information is presented to you in an email and/or a personal web page. You can specify when the changes will be collected, so they are fresh when you want to read them. The service is free!

 

Custom Search Engines

Google Custom Search Engine is a powerful tool that lets you set a list of specific web sites that Google will check when you search. Google Custom Search Engines can be made to search specific sites for government documents, recipes, or how to survive the zombie apocalypse. A search engine may be set up to search one website or multiple websites. Of course, you need a Google Account to create the custom search. Go to the above link and create one for yourself if you wish.

However, quite a few are already available because somebody else has done the work for you. Each custom search engine has an ID that points Google to the correct engine. For example, the Canadian Government Documents search engine that I use has ID: 007843865286850066037:3ajwn2jlweq. To get to it, put https://www.google.com/cse/home?cx= before the ID as follows:

https://www.google.com/cse/home?cx=007843865286850066037:3ajwn2jlweq

The U.S.A. Government information search engine that I often use is at

https://www.google.com/cse/home?cx=007843865286850066037:4-bnftxu7fu

The Intergovernmental Organizations (UN & the like) site is at

https://www.google.com/cse/home?cx=007843865286850066037:b0heuatvay8
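If you keep a list of engine IDs, building the addresses can be automated. The sketch below simply prepends the prefix shown above to each ID listed in this post; the names `CSE_HOME`, `cse_url` and `ENGINES` are made up for the example.

```python
# Prefix taken from the addresses listed above.
CSE_HOME = "https://www.google.com/cse/home?cx="


def cse_url(engine_id: str) -> str:
    """Return the home page address for a custom search engine ID."""
    return CSE_HOME + engine_id


# Engine IDs copied from this post; the labels are only for display.
ENGINES = {
    "Canadian Government Documents": "007843865286850066037:3ajwn2jlweq",
    "U.S.A. Government information": "007843865286850066037:4-bnftxu7fu",
    "Intergovernmental Organizations": "007843865286850066037:b0heuatvay8",
}

for name, engine_id in ENGINES.items():
    print(f"{name}: {cse_url(engine_id)}")
```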

You might want to use SaskSearch – the Saskatchewan, Canada Search Engine which is a regional search engine for the province of Saskatchewan, Canada, or go to the Caribbean Newspaper Search.

These custom search engines can save the researcher or investigator a lot of work if they are employed properly.

 

Google — Search, Plus Your World

If you are a Google+ user, then you now have a new search tool (the encrypted site is https://www.google.com/insidesearch/plus.html). When you are signed into your Google+ account, your search engine results will be sorted for relevance in a different fashion. Your search results will be sorted by what your Google+ friends say about the search term. This process assumes what your friends say is more important than other content.

This personalised search relevance is a boon for advertisers that want your attention. Google isn’t the first to do this. In 2010 Bing began ranking sites in search results based upon how many of your Facebook friends “like” the site.

The search engines and advertisers have decided that people want to search for other people and their opinions over other content. How convenient for the search engines and advertisers!

If you want a full explanation of the impact this will have for the Investigator, then read Phil Bradley’s article titled Why Google Search Plus is a disaster for search. Google is no longer my first choice. I start with Bing, then DuckDuckGo, and last but not least, I search Blekko.

Google Verbatim

Google announced the demise of the ‘+’ operator a few weeks ago.  The new Verbatim tool supposedly replaces the ‘+’ search operator to get exact terms users search for.

To switch on the verbatim search tool,  go to “2. More search tools” in the column on the left side of the screen.

Verbatim is not the same as the unary operator ‘+’. In a unary operation, in a mathematical system, one element is used to yield a single result. Verbatim forces all terms to be searched “verbatim”, not just one term. Verbatim searches also switch off some of the standard corrections. Sometimes this hinders your search. According to SearchEngineLand, Verbatim searches without the following:

  • making automatic spelling corrections
  • personalizing your search by using information such as sites you’ve visited before
  • including synonyms of your search terms (matching “car” when you search [automotive])
  • finding results that match similar terms to those in your query (finding results related to “floral delivery” when you search [flower shops])
  • searching for words with the same stem like “running” when you’ve typed [run]
  • making some of your terms optional, like “circa” in [the scarecrow circa 1963]

If you want to conduct a search where one word is misspelled, but the other is correct, and you also want synonyms, stemming, etc., then you can’t use verbatim unless you put the required word in double quotes.  This will make searching for misspelled names (the “27 Mohammeds problem”) along with other search terms more difficult.

Verbatim may help limit the impact of “personalisation” that makes some searches difficult in Google, but the loss of functionality isn’t worth the gain in my opinion.

If, as Google insists, it dropped the + operator because it wasn’t used, then I shall begin worrying about search operators such as intitle, allintitle, ~, *, - and other advanced search features that make Google my first choice.

 

Google Power User Tips

With the demise of the unary + operator in Google search, I went looking for a reliable list of query operators. (In a unary operation, in a mathematical system, one element is used to yield a single result.)

Query Operators List

Most Google query operators, such as site: and intitle:, must be entered in lower case; the exceptions are OR and AROUND(x), which must be in capitals. The best list I found is at Search Engine Land. The query operator list was compiled by Stephan Spencer. This article was written before the demise of the + operator.

Google SERP URL Parameters

Google SERP (search engine results page) URL Parameters are name/value pairs placed in the query string portion of the Google search URL. The URL parameter most used in our office is the strip parameter in a cache search to eliminate any trace of your pageview in the visited website’s analytics. The SERP URL Parameters article was written before the demise of the + operator.
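To see what “name/value pairs in the query string” means in practice, here is a small Python sketch that splits a search URL into its parameters. The sample URL, including its q and num parameters, is an illustrative assumption, and `serp_parameters` is a made-up helper name.

```python
from urllib.parse import urlsplit, parse_qs


def serp_parameters(url: str) -> dict:
    """Return the name/value pairs from the query string of a search URL."""
    query = urlsplit(url).query
    # parse_qs returns a list of values per name; keep the last one given.
    return {name: values[-1] for name, values in parse_qs(query).items()}


# Illustrative URL only; the parameters shown are assumptions for the example.
example = "https://www.google.com/search?q=site%3Aexample.com+fraud&num=100"
for name, value in serp_parameters(example).items():
    print(f"{name} = {value}")
```

Reading a results URL this way makes it easier to spot which parameter you need to change by hand.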

Google Search Syntax has Changed

Google has removed the “+”  search operator.  Now if you try adding a + sign in your query, Google will ignore it.  You must now use the quotation marks operator instead of the “+” operator.

Normally, using double quotes around a single word turns off stemming/synonym searching. I am not sure how this will replace the + operator that told Google that “this word MUST be on the page”.

Searching AROUND(x) Google

The AROUND(x) Operator

A common complaint about Google was that there was no proximity search. Most people think that you cannot find thisword within x words of thatword.  Wrong!

Google supports an undocumented search operator called AROUND(x) that works as a proximity search. To make the operator work properly, you must write it in all capitals and place it between the words. It will return results with variants of the words, such as plurals, as is normal for Google. It may be combined with other operators within normal Google search syntax; for example, you might add the site: operator.
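As a rough illustration, the query syntax described above can be assembled like this. The helper `around_query` and the sample terms are made up for the example; the operator is undocumented, so treat this as a sketch of the syntax rather than a guarantee of how Google will interpret it.

```python
def around_query(first: str, second: str, distance: int, site: str = "") -> str:
    """Build a proximity query: first AROUND(distance) second [site:...]."""
    if distance < 1:
        raise ValueError("distance must be a positive integer")
    # AROUND is written in capitals and placed between the two terms.
    query = f"{first} AROUND({distance}) {second}"
    if site:
        query += f" site:{site}"
    return query


print(around_query("John", "Smith", 2))
print(around_query("fraud", "audit", 5, site="example.com"))
```

The first query allows for a middle name or initial between the two names; the second combines the proximity search with the site: operator.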

Implications of Organised Spam Taking Over Google

Manipulated Search Terms

Huge amounts of money are being spent to manipulate highly competitive search terms in Google. I’m not talking about the normal link-building or link-buying and other normal efforts. The trend is related to criminal organizations trying to sell counterfeit goods through the US search results, and to a lesser degree, the results for UK, France and Germany.

The spammers do this through keyworded anchor-text heavy links provided by automated forum and blog spam along with hacked websites. These gangs create such large numbers of these sites and links that Google is having quite a hard time catching up with the spam. The Caffeine update that ranks sites faster may be degrading overall search quality as this trend seems to go back only 7 months or so.

Lessons

  • Don’t trust what you read on the Internet; it may be planted data
  • If you don’t find anything interesting on the first few pages of Google, then you’re doing it wrong! Set up the search preferences properly and make them persistent.
  • Press releases and promotional websites are not a source of reliable data
  • The Internet is not a “neutral” source. Fact-check and evaluate everything you find.
  • The Internet is only one research venue

DIY Research is Not Practical

Over the last two years we have seen a DIY trend really take hold due to shrinking budgets. This has appeared in the areas of Due Diligence and Background Investigations particularly. This is false economy because the DIY Researcher doesn’t recognise changes like those described above, let alone what to do about such a distortion of the results.

The solution may be as simple as using OptimizeGoogle directed at a version of the search engine that does not implement Google Instant or it may mean conducting the search using a proxy in another country.  If you don’t understand how to do this and why you should do this, then don’t give money to somebody based upon your research.

Synonym Searches in Google

The tilde (~) helps you find synonyms of words in a Google search. This is usually done by preceding the term with a ~.  For example, searching using the term ~investigator will yield results with synonyms for investigator. It is also an excellent search to do in Google RealTime when searching social media to ensure you are using the right search terms.

The tilde search is excellent for search term discovery and variance testing.

Scroogle

Anonymous Searching

In the past I have written about hiding your tracks as you search the Internet and about the Google SSL search interface.

Scroogle via SSL

Now let me introduce you to the SSL version of Scroogle.  Like the SSL Google, it hides your search terms from IP logging.  No one snooping between your browser and Scroogle can figure out what you were looking for, because the information is encrypted.  Unlike the SSL version of Google, your IP address is dropped before your search terms are sent to Google. Therefore, Google has no idea who is conducting the search.

When you click on any of the links on the secure Scroogle results page, the browser does not record the address of that secure page and attach it to outgoing non-SSL links. Using SSL blanks out this referrer, so that any non-SSL site you click on from a Scroogle SSL page won’t even know that you arrived at their site from Scroogle or anywhere else.

Using Scroogle

In practice, Scroogle isn’t the greatest for finding video, and clicking on a link does not open a new window in Firefox. This makes it somewhat awkward when doing high-volume searching, but it offers excellent security.

Google – Getting more than 10 results

Open the Search settings at the top right of the Google Search page. This brings you to the Preferences Page. In the Number of Results section select 100. Next go to the last section for Google Instant and select the second option, “Do not use Google Instant”.

By disabling Instant, the full 100 search results should appear.

Managing reputation through search results

Karen Blakeman’s Blog has an interesting article on removing unwanted references in Google and social media.

Removing information about you from Google

…you cannot make Google remove information you do not like except in very specific circumstances, for example copyrighted material on YouTube, images of you or your house on Street View.

…oft cited example of  how not to tackle bad publicity is that of Nestle. (Just Google Nestle social media fail or Nestle social media disaster.) “Nestle fails at social media