Tag Archive for 'Search Strategies'

Professional Licencing Verification

The Council on Licensure, Enforcement and Regulation (CLEAR) lists online license verification databases maintained by state agencies and provincial regulatory bodies, with links to sites where you can verify a professional license.

Google Search Operators

GoogleGuide is one of those things you find and say, “why didn’t I think of that?” If you need a guide to Google’s advanced search operators, bookmark its table listing the operators that work with each Google search service.
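As an illustration of how operators combine into one search statement, here is a minimal Python sketch that assembles a query string and a search URL. The function names `build_query` and `google_url` are my own inventions for this example. `site:` and `filetype:` are standard Google operators, but check the GoogleGuide table for which operators each Google service accepts.

```python
from urllib.parse import urlencode


def build_query(terms, exact=None, site=None, filetype=None):
    """Join plain terms with optional operators into one search statement."""
    parts = []
    if exact:
        # Double quotes ask for the exact phrase.
        parts.append(f'"{exact}"')
    parts.extend(terms)
    if site:
        # Restrict results to one domain.
        parts.append(f"site:{site}")
    if filetype:
        # Restrict results to one file type, e.g. pdf or ppt.
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def google_url(query):
    """Turn a search statement into a URL you can paste into a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_query(["licence"], exact="John Smith", filetype="pdf")
    print(query)
    print(google_url(query))
```

Keeping the search statement separate from the URL makes it easy to paste the same statement into a second search engine, although other engines may not honour every operator.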

Finding Slides

SlideFinder.net offers a search engine powered by Slide Executive, a PowerPoint software and tools company.

Searching “McEachin” in Google I get 37 hits. Doing the same search in SlideFinder, I get one hit. In the Google results, the SlideFinder result appears third from the bottom with a different file name than found by SlideFinder.

According to the SlideFinder blog, they concentrate on indexing presentations from university websites as these “will often contain high quality content.” The blog is worth following if you regularly search for PowerPoint presentations.

Searching slides works very well for finding references to company names and Web sites. The person who prepared the presentation usually knows things that interest me, and it’s usually easy to find the person who made the PowerPoint file. I write out my questions, make a telephone call, get answers, write the report, and move on to the next job.

Google-Free Wednesday

FindThatFile

Previously, I wrote about file searches using OSUN.ORG.

findthatfile.com provides a file search encompassing Web, FTP, Usenet, Metalink and P2P resources (ed2k/emule). It covers 47 file types, 554+ file extensions, and over 167 file upload services. It also offers an alert service that sends results to your email.

However, not every record in the search database has every property you might be searching for, so you have to explore the different ways to search for the file in the advanced search screen.

In my experience, this is not a good search engine to use to search by a person’s name or a company name. The files are not well indexed in this fashion. One must also be careful to select the “All Files” button in the “Adult Filter” to be sure all the files found appear in the search results.

I usually search by file name for other versions of a file that I already know about. In some cases, findthatfile.com will give me an understanding of how widely circulated a file may be, or turn up different versions of the same file.

Avoiding Google’s Own Censors

Better off with Bing

This excellent article by Lawrence Solomon illustrates why a researcher or investigator must use more than one search engine.

Googlegate: The search engine may be standing up to Chinese censors. What about Google’s own censors? 

Search for “Googlegate” on Google and you’ll get a paltry result (my result yesterday was 29,300). Search for “Googlegate” on Bing, Microsoft’s search engine competitor, and the result numbers an eye-popping 72.4 million. If you’re a regular Google user, as opposed to a Bing user, you might not even know that “Googlegate” has been a hot topic for years in the blogosphere — that’s the power that comes of being able to control information.

… Google began to minimize the Climategate scandal by hiding Climategate pages from its users.

Bing, in contrast, didn’t make climategate pages disappear. As you’d expect from a search engine that wasn’t manipulating data, search results on Bing climbed steadily until they peaked at around 51 million…

Document Hunting on Google-free Wednesday

Searching for specific terms in indexed documents on the Web is something many searchers fail to do. It is amazing what you can find when you go looking for it. I’ve written about searching by file type before. Now I have found a search engine for .pdf, .doc, and .ppt files.

OSUN.ORG

OSUN.ORG provides a simple interface for searching PDF documents, MS Word documents, and PowerPoint files. The large search engines allow one to search more file types, and with OSUN.ORG, as with Google, you must search one file type at a time. I don’t know what database this search engine uses, but it doesn’t compare very well with Google. A search for my name in PDF files gives 52 results in Google and only 9 in OSUN.ORG. This is not a good performance.

Sometimes it’s really hard to find an alternative to the big three search engines.

The First Google-Free Wednesday of 2010

DevilFinder

According to the site, DevilFinder began as a project to display results from search engines like Google and Yahoo without setting cookies while presenting fewer pages of results. It does not collect search data from users, and no invasive cookies or JavaScript are used.

DevilFinder seems to rank the search results on the search term alone, rather than a combination of relevance and the popularity of the site. This is why relevant results from less popular sites may appear at the top. It might also be the reason the result set is so small. DevilFinder shows the results arranged 100 per page and I rarely get more than 2 pages.

The Image search works quite well. The images are much larger than those in other search engines. The Video search only returned hits from YouTube for any search I have done, which is not exactly useful. To be fair, the Video search seems to be a new feature. The News tab is just a crude collection of feeds that aren’t searchable.

Search Strategy

This has become a favorite choice for searching the names of people and companies. The results often provide more useful sites in the first page than Google and I don’t have to go to the last page of results to find out what wasn’t searched, as I do with Google.

For long, complex search statements, I still rely on Google, Bing, and Yahoo!, but for searching names and some other common short search statements, DevilFinder does an excellent job and sometimes a better job than the big guys.

TinEye for CI

I have written about Tineye before.

For Competitive Intelligence research, I use TinEye to search for images used by the target company to find where they are buying advertising space and to find affiliated sites.

Internet Detective 105 - Paid Monitoring Services

Social Media Monitoring

As an Investigator, you must realise that even the Vatican uses social media. Some forms of social media are taking on some of the characteristics of email. This information rich environment is something that Investigators and Researchers must understand. To be effective, one must also understand the tools available to conduct thorough research of the social media content.

One must also be able to create accurate budgets for this type of research. Setting up, optimising, and monitoring research feeds that cover multiple social media and news sites can take many hours. These services allow one to monitor the social media space for new data or derogatory content. One particular strength of these services is that they search Blog comments, and can track the comments and posts of individual contributors. While these services are aimed at PR agencies, they also offer significant utility for the Investigator, but they can be very expensive tools to use.

Techrigy

Techrigy (pronounced tek-err-jee) offers a free account that gets you up to 5 Search Words/Phrases and stores up to 1,000 results. This is a great way to learn how to use the system.

Radian6

Unfortunately Radian6 is expensive — you pay just to have it in your toolbox, and then pay more for each social media research project you undertake. These costs must be understood at the outset and budgeted into the costs of the Investigation.

Filtrbox

Unfortunately, at Filtrbox the annual fee for individuals appears to be $1,000 USD.

Backtype

Backtype lets you search comments that mention a brand, company, or topic, but it also lets you search comments left by a particular person.

Attaain

AttaainCI costs $150 per month for unlimited searching and monitoring. It’s less sophisticated than Radian6 and Filtrbox, which rate Blog comments from positive to negative. AttaainCI is aimed at the Competitive Intelligence professional rather than the PR agency.

Internet Detective 104 — Forums, Boards, & Social Sites

Searching Boards, Forums, and Social Media sites can be a hit and miss affair using the large search engines. Google does an excellent job, but it is not the only game in town.

BoardTracker

BoardTracker – searches across 37,000 forums representing more than 63 million threads. Set up your own custom alerts using RSS or use the site’s search function.

SocialMention

SocialMention – this will find your search term in many different blogs and social outlets.  It will tell you how many times a keyword was used, the time frame, and let you subscribe to an RSS feed for that term or export the information as a CSV file.

Internet Detective 103 - Monitoring Changes

In Real-time Search Engine, I looked at a Meta search engine called Colecta that is useful for real-time monitoring of certain types of sites. Now I will look at monitoring changes in sites that interest you.

Copernic Tracker

Copernic Tracker – automatically looks for new content on Web pages, forums, and Social sites. When a change is detected, the software can notify you by sending an email, including a copy of the Web page with the changes highlighted, or by displaying a desktop alert.

WatchThatPage

WatchThatPage is a service that enables you to automatically collect new information from your favorite pages on the Internet. You select which pages to monitor, and WatchThatPage will find which pages have changed, and collect all the new content for you. The new information is presented to you in an email and/or a personal web page. You can specify when the changes will be collected, so they are fresh when you want to read them. The service is free!
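To show the basic idea behind these page-monitoring tools, here is a minimal Python sketch of change detection. It is not how Copernic Tracker or WatchThatPage actually work internally, which I don’t know. The function names `fingerprint` and `new_lines` are hypothetical, and the sketch assumes you have already saved the page text from two visits.

```python
import difflib
import hashlib


def fingerprint(text):
    """Return a short, comparable signature for a saved copy of a page."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def new_lines(old_text, new_text):
    """List the lines that appear in the new copy but not in the old one."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes added lines with "+ "; strip the two-character prefix.
    return [line[2:] for line in diff if line.startswith("+ ")]


if __name__ == "__main__":
    monday = "Acme Corp\nContact: 555-0100"
    friday = "Acme Corp\nContact: 555-0100\nNow hiring: Sales Manager"
    # Compare fingerprints first; only diff when something changed.
    if fingerprint(monday) != fingerprint(friday):
        for line in new_lines(monday, friday):
            print("NEW:", line)
```

Comparing fingerprints is cheap, so you can keep only the hash of the last visit and store the full text just for pages where you want to see what changed.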

Internet Detective 102 — Pipes

Yahoo Pipes  is an interactive feed aggregator and manipulator. Using Pipes, you can create feeds that are more powerful, useful and relevant.

Yahoo Pipes is a free online service that lets you remix popular feed types and create data mashups using a visual editor. A Web mashup is a Web application that combines data from more than one Web data source into a single integrated Web application. Yahoo Pipes combines several different data sources but is generally not sufficient to create a useful application; it is a data mashup tool rather than a complete mashup editor.
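For readers who want to see what a data mashup amounts to, here is a minimal Python sketch of one typical job: merging several RSS feeds and keeping only items that mention a keyword. This is not Yahoo Pipes code (Pipes uses a visual editor). The function names `items_from_rss` and `merge_and_filter` are hypothetical, and the sketch assumes simple RSS 2.0 feeds that you have already downloaded as text.

```python
import xml.etree.ElementTree as ET


def items_from_rss(xml_text):
    """Extract title and link from every <item> in an RSS document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


def merge_and_filter(feeds, keyword):
    """Combine several feeds, keep matching titles, drop duplicate links."""
    seen_links = set()
    merged = []
    for xml_text in feeds:
        for entry in items_from_rss(xml_text):
            if entry["link"] in seen_links:
                continue  # the same story appeared in an earlier feed
            if keyword.lower() in entry["title"].lower():
                seen_links.add(entry["link"])
                merged.append(entry)
    return merged
```

Calling `merge_and_filter([feed_a, feed_b], "acme corp")` on two saved feeds returns one combined list of matching items, in the order the feeds were supplied.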

How-to videos abound to act as tutorials on using Pipes. The best I found was here. You might also read Working with Yahoo! Pipes, No Programming Required.

Go Straight to the Last Page

Google doesn’t always search for ALL the words in your search statement. Sometimes, you see this at the bottom of the last page of results:

Tip: These results do not include the word “something”. Show results that include “something”.

“Something” could be any term in your search statement. This will appear at the bottom of the last page of results. How many people go there? I set my Google results to show 100 results and I may get 8 to 10 pages of results.

Now I go straight to the last page to see what wasn’t included in the search results.

Knowem

James Ruotolo at FraudPro found Knowem to be a good way to find what social sites have a particular user name. I’m going to add this to my list of ways for Finding Usernames.

Stealth Searching III

In a previous article on Stealth Searching I wrote:

You will not click on any links on the cached pages as these will go to live pages. You will not allow your browser to download any images on the cached pages, as they may be live images from the target domain. You will be STEALTHY. They won’t see you coming.

A reader suggested that this requires some further explanation.

Google Cache Risks

Google caches only the text of the Web page. When the Googlebot copies the first 101K of HTML to a Google server, external files such as JavaScript, Cascading Style Sheets, images, Flash, etc. are not saved. The images load from the live site, not the Google cache. Normally, when you view the cached copy, you are not connecting to the live site. However, following any link on the cached page will connect you to the live Web site, if it still exists. Some pages in Google’s cache load the entire page from the original server thanks to a simple redirection script. If a cached page has no external files, then you will not show up in the site’s log by viewing Google’s cache; but how likely is that?
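To make the risk concrete, here is a minimal Python sketch that lists the hosts a browser would contact to fetch external files referenced in a page’s HTML. The class name `ExternalResourceFinder` and the function `hosts_contacted` are hypothetical, and the tag list is my own assumption about which tags commonly pull in external files; it is not exhaustive.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalResourceFinder(HTMLParser):
    """Collect URLs of files a browser fetches automatically on page load."""

    # Tag -> attribute that names the external file (assumed, not complete).
    RESOURCE_ATTRS = {
        "img": "src",
        "script": "src",
        "iframe": "src",
        "embed": "src",
        "link": "href",  # stylesheets and icons
    }

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return  # ordinary links (<a>) are only fetched if you click them
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append(value)


def hosts_contacted(html_text):
    """Return the sorted set of hosts named by external resource URLs."""
    finder = ExternalResourceFinder()
    finder.feed(html_text)
    hosts = {urlparse(url).netloc for url in finder.resources}
    hosts.discard("")  # relative URLs carry no host of their own
    return sorted(hosts)
```

If the target’s own domain shows up in the list returned for a cached page, loading that page with images and scripts enabled would leave an entry in the target’s server log.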

The Wayback Machine

The Wayback Machine changes the links of cached pages to allow navigation within the cached pages. However, there is always the chance that you will navigate yourself out to the original site. Remember, nothing is perfect and this stuff wasn’t designed with anonymity as its objective.
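One way to guard against navigating out by accident is to check where a link points before clicking it. Here is a minimal Python sketch, assuming the archive serves its rewritten pages from the host `web.archive.org`. The function name `leaves_archive` is hypothetical.

```python
from urllib.parse import urljoin, urlparse

# Assumption: rewritten Wayback Machine links stay on this host.
ARCHIVE_HOST = "web.archive.org"


def leaves_archive(link, page_url):
    """Return True if following the link would contact a non-archive host."""
    # Resolve relative links against the archived page's own address.
    absolute = urljoin(page_url, link)
    return urlparse(absolute).netloc.lower() != ARCHIVE_HOST


if __name__ == "__main__":
    page = "https://web.archive.org/web/2009/http://example.com/"
    for link in ["/web/2009/http://example.com/about",
                 "http://example.com/contact"]:
        print(link, "->", "LEAVES" if leaves_archive(link, page) else "stays")
```

This only checks the host name of the link. It says nothing about images or scripts embedded in the archived page, which are a separate question.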

The Dangers of TOR

Using TOR to explore the Google cache and The Wayback Machine seems to be the only option. However, Web history and geographic origin affect search results when you use TOR or similar methods.

TOR does require a certain level of technical knowledge and sophistication or it can backfire on you. For example, consider the SSLstrip attack that is now in the wild:

The attack is more than theoretical. Marlinspike tested the software on a public server he hosted for users of the Tor anonymous browsing network; he was, by his own account, able to grab passwords to 117 e-mail accounts, 16 credit cards numbers, seven Paypal logins and about 300 other logins to supposedly secure sites ranging from Gmail to Ticketmaster to Facebook.

If a TOR server is set up for the purpose of running SSLstrip, then you’re in trouble. The very nature of TOR makes it quite possible for a corrupt TOR server to reroute your data to the attacker, which is an ideal situation for the crook. To use TOR effectively, the proxy must be configured properly and the user must be very observant to prevent an attack via SSLstrip and similar threats.