Tag Archive for 'Search Strategies'


Internet Detective 103 – Monitoring Changes

In “Real-time Search Engine,” I looked at a meta search engine called Collecta that is useful for monitoring certain types of sites in real time. Now I will look at monitoring changes in sites that interest you.

Copernic Tracker

Copernic Tracker automatically looks for new content on Web pages, forums, and social sites. When it detects a change, it can notify you by sending an email that includes a copy of the Web page with the changes highlighted, or by displaying a desktop alert.

WatchThatPage

WatchThatPage is a service that enables you to automatically collect new information from your favorite pages on the Internet. You select which pages to monitor, and WatchThatPage will find which pages have changed, and collect all the new content for you. The new information is presented to you in an email and/or a personal web page. You can specify when the changes will be collected, so they are fresh when you want to read them. The service is free!
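Neither service explains here how it decides that a page has changed, but the basic idea can be sketched in a few lines of Python: keep an earlier copy of a page, compare it with the current one, and report only the lines that are new. This is a minimal sketch under stated assumptions: both snapshots are already plain text (fetching the page and stripping its HTML are left out), and the function names are hypothetical, not part of either product.

```python
import difflib
import hashlib


def fingerprint(page_text: str) -> str:
    """SHA-256 digest of the page text; any edit produces a different digest."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def new_lines(old_text: str, new_text: str) -> list[str]:
    """Lines that appear in the new snapshot but not in the old one."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes added lines with "+ " (removed lines get "- ").
    return [line[2:] for line in diff if line.startswith("+ ")]


def check_for_changes(old_text: str, new_text: str) -> list[str]:
    """Cheap hash comparison first; only run the diff when something changed."""
    if fingerprint(old_text) == fingerprint(new_text):
        return []
    return new_lines(old_text, new_text)


if __name__ == "__main__":
    # Made-up snapshots of a hypothetical staff page.
    yesterday = "Staff\nJane Smith, Manager\n"
    today = "Staff\nJane Smith, Manager\nJohn Doe, Accountant\n"
    print(check_for_changes(yesterday, today))  # ['John Doe, Accountant']
```

Hashing first means an unchanged page costs almost nothing to check. A real monitor would also have to ignore noise such as rotating ads or timestamps, which would otherwise register as a change on every visit.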

Internet Detective 102 – Pipes

Yahoo Pipes is an interactive feed aggregator and manipulator. Using Pipes, you can create feeds that are more powerful, useful, and relevant.

Yahoo Pipes is a free online service that lets you remix popular feed types and create data mashups using a visual editor. A Web mashup is a Web application that combines data from more than one Web data source into a single integrated Web application. Yahoo Pipes combines several different data sources, but on its own it is generally not sufficient to create a useful application; it is a data mashup tool rather than a complete mashup editor.

How-to videos that act as tutorials on using Pipes abound. The best I found was here. You might also read Working with Yahoo! Pipes, No Programming Required.
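Pipes did this visually, but the same fetch, merge, and filter pipeline is easy to express in code. The Python sketch below is not Yahoo’s implementation: it assumes the feeds are plain RSS 2.0 documents that you have already downloaded as strings, merges their items, keeps only those whose titles mention a keyword, and drops duplicate links. The function names and sample feeds are made up for illustration.

```python
import xml.etree.ElementTree as ET


def rss_items(rss_xml: str) -> list[dict]:
    """Extract the title and link of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def merge_and_filter(feeds: list[str], keyword: str) -> list[dict]:
    """Merge several feeds, keep items whose title contains keyword
    (case-insensitive), and skip links that were already seen."""
    seen_links = set()
    merged = []
    for feed in feeds:
        for item in rss_items(feed):
            if keyword.lower() not in item["title"].lower():
                continue
            if item["link"] in seen_links:
                continue
            seen_links.add(item["link"])
            merged.append(item)
    return merged


if __name__ == "__main__":
    feed_a = ("<rss><channel><item><title>Acme Corp fraud charge</title>"
              "<link>http://a.example/1</link></item></channel></rss>")
    feed_b = ("<rss><channel><item><title>Weather</title>"
              "<link>http://b.example/2</link></item>"
              "<item><title>ACME CORP appeal</title>"
              "<link>http://b.example/3</link></item></channel></rss>")
    for item in merge_and_filter([feed_a, feed_b], "acme corp"):
        print(item["title"], item["link"])
```

The de-duplication step matters when the same story is syndicated into several of the feeds you are combining.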

Go Straight to the Last Page

Google doesn’t always search for ALL the words in your search statement. Sometimes, you see this at the bottom of the last page of results:

Tip: These results do not include the word “something”. Show results that include “something”.

“Something” could be any term in your search statement. This tip appears at the bottom of the last page of results. How many people go there? I set Google to show 100 results per page, and I may still get 8 to 10 pages of results.

Now I go straight to the last page to see what wasn’t included in the search results.

Knowem

James Ruotolo at FraudPro found Knowem to be a good way to find out which social sites have a particular username. I’m going to add this to my list of ways for Finding Usernames.
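Knowem does not publish its method, but the basic idea behind a username check can be sketched: substitute the name into each site’s profile-address pattern and see whether a page comes back. The URL patterns in this Python sketch are assumptions for illustration, not a verified list, and an HTTP status code is only a hint: some sites return a normal page for users who don’t exist, and some refuse automated requests altogether. Each probe is also a visit the site can log.

```python
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical profile-address patterns; confirm each site's real format
# before relying on it.
PROFILE_PATTERNS = {
    "Twitter": "https://twitter.com/{}",
    "Facebook": "https://www.facebook.com/{}",
    "MySpace": "https://myspace.com/{}",
    "YouTube": "https://www.youtube.com/user/{}",
}


def candidate_profiles(username: str) -> dict[str, str]:
    """Build the profile URL each site would use for this username."""
    safe_name = urllib.parse.quote(username, safe="")
    return {site: pattern.format(safe_name) for site, pattern in PROFILE_PATTERNS.items()}


def probe(url: str, timeout: float = 10.0):
    """Return True (page found), False (404) or None (unknown).

    A hint only: some sites answer 200 for missing users or block scripts,
    and every probe is a request that shows up in the site's logs.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        return False if err.code == 404 else None
    except (urllib.error.URLError, TimeoutError):
        return None
```

For example, candidate_profiles("jdoe1970")["Twitter"] gives https://twitter.com/jdoe1970; passing each URL to probe() is the only part that touches the network.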

Stealth Searching III

In a previous article on Stealth Searching I wrote:

You will not click on any links on the cached pages as these will go to live pages. You will not allow your browser to download any images on the cached pages, as they may be live images from the target domain. You will be STEALTHY. They won’t see you coming.

A reader suggested that this requires some further explanation.

Google Cache Risks

Google caches only the text of the Web page. When the Googlebot copies the first 101K of HTML to a Google server, external files such as JavaScript, Cascading Style Sheets, images, Flash, etc. are not saved, so the images load from the live site, not from the Google cache. Normally, when you view the cached copy of the text, you are not connecting to the live site. However, following any link on the cached page will connect you to the live Web site, if it still exists. Some pages in Google’s cache load the entire page from the original server thanks to a simple redirection script. If a cached page has no external files, then viewing Google’s cache will not put you in the site’s log; but how likely is that?
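One way to answer that “how likely” question before opening a cached copy is to save its HTML and list the resources a browser would request on its own. The Python sketch below is an illustration under assumptions, not a guarantee of stealth: it treats images, scripts, frames, embeds, and stylesheets as automatic requests, resolves relative addresses against the original page URL (assuming the cached copy points relative paths back at the live site), and reports only those hosted on the target’s own domain. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AutoFetchFinder(HTMLParser):
    """Collect URLs a browser would request automatically while rendering."""

    def __init__(self, base_url: str):
        super().__init__()
        # Assumption: relative paths resolve against the live page's URL.
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script", "iframe", "embed") and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif (tag == "link" and attrs.get("href")
              and "stylesheet" in (attrs.get("rel") or "").lower()):
            self.urls.append(urljoin(self.base_url, attrs["href"]))
        # Plain <a href> links are ignored: they are fetched only if you click.


def requests_to_target(cached_html: str, original_url: str) -> list[str]:
    """Automatic requests that would go to the target site's own server."""
    finder = AutoFetchFinder(original_url)
    finder.feed(cached_html)
    target_host = urlparse(original_url).hostname
    return [url for url in finder.urls if urlparse(url).hostname == target_host]
```

An empty list corresponds to the rare “no external files” case. Anything else is a request the target’s server would receive if you view the cache with images, scripts, and styles enabled.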

The Wayback Machine

The Wayback Machine rewrites the links on cached pages to allow navigation within the archived copies. However, there is always the chance that you will navigate yourself out to the original site. Remember, nothing is perfect, and this stuff wasn’t designed with anonymity as its objective.
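Before clicking around an archived copy, you can check which of its links would take you off the archive. This Python sketch assumes the usual Wayback address form https://web.archive.org/web/&lt;timestamp&gt;/&lt;original URL&gt; and simply flags any link that resolves to a host other than web.archive.org; the names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

ARCHIVE_HOST = "web.archive.org"


class LinkCollector(HTMLParser):
    """Gather the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def escaping_links(archived_html: str, archived_page_url: str) -> list[str]:
    """Absolute URLs of links that point somewhere other than the archive."""
    collector = LinkCollector()
    collector.feed(archived_html)
    escaping = []
    for href in collector.hrefs:
        absolute = urljoin(archived_page_url, href)
        if urlparse(absolute).hostname != ARCHIVE_HOST:
            escaping.append(absolute)
    return escaping
```

Links the archive has rewritten stay on web.archive.org and are not reported; any link it missed shows up with the live site’s address.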

The Dangers of TOR

Using TOR to explore the Google cache and the Wayback Machine seems to be the only option. However, Web history and geographic origin affect the search results you get when you use TOR or similar methods.

TOR does require a certain level of technical knowledge and sophistication, or it can backfire on you. One example is the SSLstrip attack, which is now in the wild:

The attack is more than theoretical. Marlinspike tested the software on a public server he hosted for users of the Tor anonymous browsing network; he was, by his own account, able to grab passwords to 117 e-mail accounts, 16 credit cards numbers, seven Paypal logins and about 300 other logins to supposedly secure sites ranging from Gmail to Ticketmaster to Facebook.

If a TOR server is set up for the purpose of running SSLstrip, then you’re in trouble. The very nature of TOR makes it quite possible for a corrupt TOR server to reroute your data to an attacker, an ideal situation for the crook. To use TOR effectively, the proxy must be configured properly, and the user must be very observant to prevent an attack via SSLstrip and similar threats.
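SSLstrip works by quietly rewriting https:// links and form targets into plain http:// ones, so the page looks normal while your password travels unencrypted. Part of being “very observant” is checking that any page asking for a password was delivered over HTTPS and will submit over HTTPS. The Python sketch below automates that one spot check on a saved page; it assumes you have the page’s final URL and its HTML, it is not a complete defence, and the names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PasswordFormAudit(HTMLParser):
    """Record form targets that would send a password over plain HTTP."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.current_action = None
        self.has_password_field = False
        self.insecure_actions = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # A form without an action submits back to the page's own URL.
            self.current_action = urljoin(self.page_url, attrs.get("action") or "")
        elif tag == "input" and (attrs.get("type") or "").lower() == "password":
            self.has_password_field = True
            action = self.current_action
            insecure = action is not None and urlparse(action).scheme != "https"
            if insecure and action not in self.insecure_actions:
                self.insecure_actions.append(action)

    def handle_endtag(self, tag):
        if tag == "form":
            self.current_action = None


def sslstrip_warnings(page_url: str, html: str) -> list[str]:
    """List signs that a login page has been downgraded to plain HTTP."""
    audit = PasswordFormAudit(page_url)
    audit.feed(html)
    warnings = []
    scheme = urlparse(page_url).scheme
    if audit.has_password_field and scheme != "https":
        warnings.append(f"page was delivered over {scheme}, not https")
    warnings.extend(f"password form submits to {a}" for a in audit.insecure_actions)
    return warnings
```

An empty list means only that this page passes this one check; a proxy in the middle can still tamper with other things.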

Bing searching Facebook and Twitter

Microsoft to Data-Mine Facebook & Twitter


Microsoft has cut non-exclusive deals with both Facebook and Twitter for Bing to search their real-time data feeds. Google has followed suit, at least with Twitter, but Facebook is the prize because it has something like 40 million updates a day from its 300 million users. Not all Facebook updates will be searched by Bing, however, only the ones made available to the wider public. Facebook, in which Microsoft has an equity stake, will apparently provide users with a number of new tools to make their updates public. It is unclear how much Microsoft is paying. The Twitter integration is already in beta. The deals suggest that Twitter, which has raised $155 million in venture capital, will see its first revenue, since ads will follow. Terms were not disclosed.

Microsoft’s stake in Facebook may give us some interesting tools for searching Facebook in the near future.

Internet Detective School 101

Google Alerts

We all know and love Google, but how many people use its best investigative features? Investigations aren’t done in one day, so why search Google on only one day?

The Google Alerts service is free, and it lets you turn Google search results into a custom RSS feed or receive them as alerts by email. Thus, if you create focused searches in Google using phrases, site qualifiers, etc., you can now have those results delivered as an RSS feed.

Log in to your Google account, then use the advanced query options to construct your search. Select Feed in the “Deliver to” column to activate your RSS feed. It’s that simple; there is no need to program against a Google API. Alternatively, select email to have the results sent to you by email.

If you choose email delivery, you can set the search to notify you as new data appears: select as-it-happens, daily, or weekly in the “How often” column. Of course, the RSS feed option doesn’t need to be told when to send you the results; it captures new data as it appears and publishes it in the feed.

To receive the feed, you will have to wait until it is populated with some results. Once there are results in the feed, click the feed link for the Alert and copy the URL into your newsreader. In my experience, this takes about one day.
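Any newsreader will do, but an alert feed is also easy to read from a script, which is handy when you want to log what you have already reviewed. This Python sketch assumes the feed is delivered as Atom (if yours is RSS 2.0, the element names differ) and that you keep a set of links you have already looked at; the feed URL is whatever you copied from the Alert, and the function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def fetch_feed(feed_url: str) -> str:
    """Download the feed text; feed_url is the address copied from the Alert."""
    with urllib.request.urlopen(feed_url, timeout=30) as response:
        return response.read().decode("utf-8")


def alert_entries(feed_xml: str) -> list[tuple[str, str]]:
    """(title, link) pairs for each <entry> in an Atom feed, in feed order."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href", "") if link is not None else ""
        entries.append((title, href))
    return entries


def unseen(entries: list[tuple[str, str]], seen_links: set[str]) -> list[tuple[str, str]]:
    """Keep only entries whose link has not been reviewed yet."""
    return [(title, link) for title, link in entries if link not in seen_links]
```

Running alert_entries(fetch_feed(url)) once a day and passing the result through unseen() gives the same “only what’s new” view a newsreader provides.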

Internet Detective School

Internet Tracking

Mantracker hunts people by following their spoor on a popular TV show of the same name.

On the Internet, investigators have to do the same thing. However, the digital spoor may be on a computer in Singapore while your prey is in Corner Brook, Newfoundland.

For this series of articles, the terms tracking, monitoring, and alerts all mean the same thing: methods of collecting new information as it appears, by running a variety of searches across many sources throughout the Internet. This is a systematic way of locating information about a subject as it becomes available. These sources and methods monitor news reports, social media, blogs, and other open sources of information relevant to your investigation. I will illustrate how to construct the search statement and get the results into your hands on an ongoing basis.

I will start with the large search engines and move on to the lesser-known sources and methods.

Google Search Options

Google’s “Search Options” was launched last May, and it provides several filters to narrow down your search results. On the results page, the “Show Options” link appears at the top of the search results. Click it and you get a sidebar that looks like the one on Google News search.

The best option in all this is that you can now SORT RESULTS BY DATE instead of by relevance. Other options that offer interesting results are the filters for formats such as video, forums, blogs, and reviews. The reviews filter is quite strange, but I have found the best way to make it useful: use it when you are searching a person’s name, and it will turn up results from a wide variety of publications and blogs. Searching for reviews of things seems quite useless, but using this filter when searching names, then sorting by date, makes it very useful.

Directory of Social Networks

I came across this interesting directory of social networks: http://www.social.com/Social-Networking/. This seems to have in excess of 500 listings for social network sites in something like 100 categories.

Many of the listed sites aren’t social sites like Facebook or MySpace. I wasn’t quite sure how I might use this, so I Googled it, and found an interesting use for it in ResearchBuzz.

According to ResearchBuzz, “Google Sets allows you to specify a couple of different things and get lists of additional similar things”, and I have been using it for a while to help me build searches and find stuff. Sometimes I wonder how some things get into the set list, but it is good to play around with the new toys from Google.

Tweeple at Work

These searches will help you to find people associated with a company or find a subject’s co-workers.

Start with Twitter’s Find People. Search for the company name. A long list of followers of the company’s tweets might be very enlightening.

Search Twitter profiles using Twellow by searching for the firm name, Web site URL, or other relevant search terms. Sometimes former employees appear in the results and may prove to be useful interview subjects.

LinkedIn is one of the most used social networking sites. Use Google to search LinkedIn for Twitter references with a search statement such as site:linkedin.com “company name” twitter; adding twitter to the search string finds profiles that mention Twitter feeds. Do the same search using Bing and Yahoo. Then redo all the searches for Facebook and MySpace and any other social network site that might be useful.
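Repeating one search statement across several sites and engines is tedious by hand, so it can help to generate the whole grid at once. This Python sketch builds the site: queries described above and the matching search-page URLs; the company name is a placeholder, and the engine addresses and query-parameter names (q for Google and Bing, p for Yahoo) are assumptions to confirm before use.

```python
from urllib.parse import urlencode

# Public search pages and their query parameter (assumed; check before use).
SEARCH_ENGINES = {
    "Google": ("https://www.google.com/search", "q"),
    "Bing": ("https://www.bing.com/search", "q"),
    "Yahoo": ("https://search.yahoo.com/search", "p"),
}

SOCIAL_SITES = ["linkedin.com", "facebook.com", "myspace.com"]


def build_queries(company: str, extra_terms: str = "twitter") -> list[str]:
    """One search statement per site, e.g. site:linkedin.com "Acme Corp" twitter."""
    return [f'site:{site} "{company}" {extra_terms}'.strip() for site in SOCIAL_SITES]


def search_urls(company: str, extra_terms: str = "twitter") -> list[str]:
    """Pair every search statement with every engine, ready to open in a browser."""
    urls = []
    for query in build_queries(company, extra_terms):
        for base, param in SEARCH_ENGINES.values():
            urls.append(f"{base}?{urlencode({param: query})}")
    return urls
```

Three sites times three engines gives nine URLs; adding a site to SOCIAL_SITES extends every engine’s search at once.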

Use TweepSearch to search for someone’s Twitter name and then index the bios of all the users they follow or who follow them. Once you have them indexed, you can do a keyword search using relevant search terms. The results may lead you to the bios of additional members of the firm for which the subject works.
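TweepSearch does the indexing for you, and its internals aren’t described here, but the idea is a plain keyword index over bios. This Python sketch assumes you have already copied the follower and following bios into a dictionary mapping Twitter name to bio text; it builds an inverted index and runs an AND search. The function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case words, keeping @handles and #hashtags intact."""
    return re.findall(r"[a-z0-9@#_']+", text.lower())


def build_index(bios: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of Twitter names whose bio contains it."""
    index = defaultdict(set)
    for name, bio in bios.items():
        for word in tokenize(bio):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Names whose bios contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches
```

With made-up bios such as {"jdoe": "Accountant at Acme Corp. Toronto.", "asmith": "Marketing @ Acme Corp"}, searching "acme corp" returns both names, while "acme toronto" returns only jdoe.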

Real-time Search Engine

Collecta

Collecta claims to provide results from the Web in real time. Your search results appear in a constantly reloading stream: everything from Twitter updates to news and blog articles, and even Flickr photos.

However, Twitter updates usually swamp the results. The “Search Options” to the left of the results let you select the type of updates you want to see. Leaving the Twitter updates unchecked makes it easier to see the other real-time search results.

Limitations

Like all meta search engines, it makes it hard to create a search statement, because you’re searching 140-character tweets, full-text news, and blog entries at the same time. I don’t use this as a starting point. However, it searches a wide variety of places, which makes it good for tracking breaking news.

Searching the Personal Ads

Craigslist Search Engine

AllofCraigs is another Craigslist search engine built on a Google Custom Search.

It also allows you to query all Craigslist sites and other ad sites at once, with results pulled from a Google Custom Search.

A Twitter stream tool lets you see tweets that contain the word Craigslist. However, you also get tweets that merely mention Craigslist rather than ones that link to ads.

A search for the words toronto incall returns many, many hits in this fast-changing type of ad, while Search All Craig’s returns none. This might be useful for searching Craigslist for telephone numbers. My early searches for telephone numbers seem more successful using this than using Craigslist itself.
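Ads write the same number in many ways (416-555-0199, (416) 555 0199, 416.555.0199), which is one reason a plain keyword search can miss it. If you copy ad text out of the search results, a small normaliser lets you compare numbers regardless of punctuation. This Python sketch is not how Craigslist or any of these search engines work; it assumes ten-digit North American numbers, and the names are hypothetical.

```python
import re

# Ten-digit North American numbers written as 416-555-0199, (416) 555 0199,
# 416.555.0199 or 4165550199. Other formats are not covered.
PHONE = re.compile(r"\(?\b(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})\b")


def phone_numbers(text: str) -> list[str]:
    """Every phone number in text, normalised to NNN-NNN-NNNN."""
    return ["-".join(match.groups()) for match in PHONE.finditer(text)]


def ads_with_number(ads: list[str], number: str) -> list[str]:
    """The ads that contain number, however either side formats it."""
    digits = re.sub(r"\D", "", number)[-10:]  # drop punctuation and a leading 1
    return [ad for ad in ads
            if any(found.replace("-", "") == digits for found in phone_numbers(ad))]


if __name__ == "__main__":
    print(phone_numbers("Call (416) 555-0199 or 416.555.0123 tonight"))
    # ['416-555-0199', '416-555-0123']
```

Normalising both the ad text and the number you are looking for means “1 (416) 555-0199” in your notes still matches “416.555.0199” in an ad.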

Yauba

Yauba states:

We do not keep any personally identifiable information.

Period.

Anonymity may be important for some people. However, for most, it’s the search results that count, and this review clearly shows that this is a search engine with as yet undeveloped potential.

Chickipedia

I recently read a news article that mentioned Chickipedia. I immediately began searching this site and found porn stars, actresses, athletes, and many more. If a local paper can find a drunk driver in this thing, maybe I could find the subject of an investigation. I searched using names, city names, and occupations, and every search returned valid results. Too bad there are only 9,177 ladies profiled on the site. Too bad I didn’t find the subject of an investigation.