Hunting YouTube Content

A successful hunt for data includes dragging your prey home and preparing it for consumption. If you have a hungry client to feed, then you will have to chop up your prey into digestible chunks, cook it properly, and then serve it up all pretty-like on a fancy platter, because clients are picky eaters.

Here is what you need to make a delightful repast of what you find on YouTube.

After the disappearance of Google Reader, Feedly became the new standard in RSS readers. However, Feedly is much more than an RSS reader. It allows you to collect and categorize YouTube accounts.

For example, you can monitor the YouTube accounts of politicians, activists, or anybody else who posts a lot of YouTube videos. You get the latest uploads to their YouTube accounts almost instantly. This continuous stream of updated content can be viewed and played in Feedly and does away with individual manual searches of known YouTube accounts.
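
A feed reader follows each account through a feed address. Here is a minimal sketch of that idea, assuming YouTube still serves a per-channel Atom feed at the `feeds/videos.xml?channel_id=` address (check this before relying on it). The channel ID, video IDs, and sample feed below are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content, standing in for what a reader would download.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Town hall speech</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=EXAMPLE1"/>
    <published>2015-05-01T12:00:00+00:00</published>
  </entry>
  <entry>
    <title>Press conference</title>
    <link rel="alternate" href="https://www.youtube.com/watch?v=EXAMPLE2"/>
    <published>2015-05-02T09:30:00+00:00</published>
  </entry>
</feed>"""


def channel_feed_url(channel_id):
    """Build the feed address for one channel (the URL pattern is an assumption)."""
    query = urlencode({"channel_id": channel_id})
    return "https://www.youtube.com/feeds/videos.xml?" + query


def list_uploads(feed_xml):
    """Return (published, title, link) tuples from an Atom feed, newest first."""
    root = ET.fromstring(feed_xml)
    uploads = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        published = entry.findtext("atom:published", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href", "") if link is not None else ""
        uploads.append((published, title, href))
    # Timestamps in the same ISO format and time zone sort correctly as text.
    return sorted(uploads, reverse=True)


if __name__ == "__main__":
    print(channel_feed_url("UC_hypothetical"))
    for published, title, href in list_uploads(SAMPLE_FEED):
        print(published, title, href)
```

In practice you would paste the channel address into Feedly and let it do this work. The sketch only shows what the reader is doing on your behalf.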

Of course, Feedly has other uses, but the YouTube use is the greatest time saver. The time saved can be applied to summarizing the video content and analyzing it in terms of how it relates to your client’s objectives.

Inoreader is another feed reader that can organise YouTube account feeds into folders along with a limited number of feeds from Twitter, Facebook, Google+ and VKontakte. It also allows the user to gather bundles of subscriptions into one RSS feed and export them to another platform to go along with the YouTube content.

Just paste the URL of a YouTube video into Amnesty International’s YouTube Dataviewer to extract metadata from the video. The tool reveals the exact upload time of a video and provides a thumbnail on which you can do a reverse image search. It also shows any other copies of the video on YouTube. Use this to track down the original video and the first instance of the video on YouTube.

A lot of fake videos appear on YouTube. Anything worth reporting needs to be examined to see if it is a possible fake. The Chrome browser extension Frame by Frame lets you change the playback speed or manually play through the frames. This is the first step in uncovering a fake. It is also an easy way to extract images from the video for inclusion in a report.

Of course, you will use the Download Helper browser extension, which is available for both Firefox and Chrome, to help download the videos. Just remember to set the maximum number of ‘concurrent downloads’ and ‘maximum variants’ to 20 and check ‘ignore protected variants’ to speed the process.

To make a long list of videos to download, you can use a browser extension, such as Copy All Links, or Link Klipper or Copy Links in Chrome, to make a list of the links to every video you find. In addition to using this list in your report, you can turn it into an HTML page and then let Download Helper work away on it for hours, downloading all the videos for you.
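
Turning the list of links into an HTML page can be scripted. This is a minimal sketch, assuming the links are already in a plain list, one URL per item. The function name and page layout are my own choices, not part of any of the extensions mentioned.

```python
from html import escape


def links_to_html(urls, title="Video links"):
    """Turn a list of URLs into a bare HTML page with one clickable link per line."""
    seen = set()
    items = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blank lines and duplicate links
        seen.add(url)
        safe = escape(url, quote=True)  # "&" in a URL must become "&amp;" in HTML
        items.append(f'<li><a href="{safe}">{safe}</a></li>')
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n<body><ol>\n"
        + "\n".join(items)
        + "\n</ol></body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical links, standing in for a list copied out of the browser.
    copied = [
        "https://www.youtube.com/watch?v=EXAMPLE1",
        "https://www.youtube.com/watch?v=EXAMPLE2&t=30",
    ]
    with open("videos.html", "w", encoding="utf-8") as handle:
        handle.write(links_to_html(copied))
```

Open the resulting videos.html file in the browser and the download extension has every link on one page.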

Collecting all this video is the easy part. Sitting through all of it to extract useful data and then analysing it to see how it helps or hinders your client’s interests is the painful and expensive part, but it is the only way to cook up what the client wants to eat.

Forcing Firefox to Open Links in a New Tab

During a training class I watched everybody trudge around looking for lost search results. They tried reloading results pages, only to get distorted results. They kept losing the search engine results page and were getting lost in a sea of tabs. They wanted to know how to get “google search results” to open in a new tab.

Here is my solution for getting tabs to open where I want them to. In Firefox, go to ‘about:config’ in the address bar. In the config window search for these settings and change them as follows:

  • browser.search.openintab – if true, a search from the search bar opens in a new tab when you use the return key to trigger the search
  • browser.tabs.loadBookmarksInBackground – if true, bookmarks that open in a new tab will not steal focus
  • browser.tabs.loadDivertedInBackground – if true, the new tab loads in the background, leaving focus on the current tab
  • browser.tabs.loadInBackground – if true, new tabs opened from links load in the background and do not take focus
  • browser.tabs.opentabfor.middleclick – if true, a middle-click on a link forces it to open in a new tab
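
If you set up more than one machine, the same settings can be written to a user.js file in the Firefox profile folder, which Firefox reads at startup. This is a minimal sketch that prints the lines for such a file. Setting all five values to true is my assumption for illustration, so pick the values that suit how you work and confirm that your Firefox version still uses these names.

```python
# Assumed values: all five preferences switched on. Adjust to taste.
PREFS = {
    "browser.search.openintab": True,
    "browser.tabs.loadBookmarksInBackground": True,
    "browser.tabs.loadDivertedInBackground": True,
    "browser.tabs.loadInBackground": True,
    "browser.tabs.opentabfor.middleclick": True,
}


def render_user_js(prefs):
    """Render boolean preferences as user_pref() lines for a user.js file."""
    lines = []
    for name, value in sorted(prefs.items()):
        literal = "true" if value else "false"  # JavaScript spelling of booleans
        lines.append(f'user_pref("{name}", {literal});')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_user_js(PREFS), end="")
```

Copy the printed lines into user.js in the profile folder and restart Firefox.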

This is the type of ‘boring stuff’ that you must master if you want to do Investigative Internet Research and make any money at it. Clients won’t pay for wasted time. You may know where to hunt for data, but you need to also know how to get it into the larder before it goes bad.

The PI & OSINT

Finding and verifying social media content is becoming a greater concern for private investigators (PIs) and their clients. Unfortunately, most PIs do not possess the skills and resources to do this beyond the most rudimentary level.

Some investigation companies will try to build an in-house operation. They will buy technology, or spend money on subscriptions to tools that claim to do the work with a click of a button. This usually proves to be a costly expedition into the unknown that ends in failure. Either the purchased tools do not live up to their claims, or clients want something the purchased tools and subscriptions don’t deliver.

Some investigation companies will send staff to courses to learn about sources. These are billed as Open Source Intelligence (OSINT) courses. Unfortunately, the OSINT concept usually misses the “intelligence” part, and it is more about gathering raw information than producing usable investigative reporting.

The ‘intelligence’ part is the expensive part. It involves time to conduct the analysis and many hours of learning to present the analysis along with the sources and methods reporting.

Producing a report that goes beyond the OSINT concept is not a secretarial task. Once you go beyond the popular OSINT concept, you start doing Investigative Internet Research (IIR).

Why You Can’t Dictate an IIR Report

Proper IIR reporting does not rely on haphazard Internet searches and does not dump a disorganised load of raw data from the Internet into a client’s inbox. Reports summarize and then analyse the collected data, and then explain the sources and methods used to collect it.

The researcher must understand how to use Word and other software because he cannot dictate IIR reports. A dicta-typist cannot produce an IIR report for the following four reasons:

  1. The person transcribing the dictation will not place images, graphs, and video clips properly, yet a picture, screenshot, or video is worth a thousand words.
  2. There is no efficiency at all in dictating a URL and plenty of mistakes would result.
  3. Some Web site names are hard to pronounce and would lead to misspelling (although you might spell them out, there is still a risk).
  4. Whoever writes the report must have all the collected material at hand in order to create footnotes and appendices.

Now you know why the person doing the IIR must also prepare the report.

In the next few articles I will describe the tools and techniques that actually work, but there is no magic button that does the analysis for you.

Verbatim

In Google, Verbatim is not a command. If Google misbehaves by including strange terms that have nothing to do with your search statement, or if the search results entirely ignore some of your search terms, then apply Verbatim to the search results by selecting ‘Search tools’, then ‘All results’, and finally ‘Verbatim’. Doing this will force Google to search on all of your terms without dropping any or looking for variations and synonyms.
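
If you repeat the same searches often, you can build the results address yourself instead of clicking through the menus. This is a minimal sketch, assuming Google still accepts the undocumented `tbs=li:1` URL parameter as the equivalent of choosing Verbatim. That parameter is an assumption on my part and may change without notice, so test it before you depend on it.

```python
from urllib.parse import urlencode


def google_search_url(terms, verbatim=True):
    """Build a Google results URL, optionally with the assumed Verbatim parameter."""
    params = {"q": " ".join(terms)}
    if verbatim:
        params["tbs"] = "li:1"  # assumption: undocumented switch for Verbatim
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical search terms for illustration.
    print(google_search_url(["rare", "surname", "Moose Jaw"]))
```

Bookmark the printed address, or paste it into the address bar, and compare the results with a normal search to confirm that Verbatim is being applied.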

Phone Numbers on the Web

The Phone Archive says it searches for uses of USA-based phone numbers, with context snippets, on webpages and documents found on the Web. It is operated by the same folks that run The Email Archive, which I found less than useful earlier this week. This site is much more useful.

While they advertise this as searching US-based phone numbers, I found it useful for finding references to any phone number in the North American Numbering Plan. I found numbers in Canada, Panama, and the Caribbean islands.

I haven’t compared results to the large search engines, but this is a useful resource.

When things get complex

Advangle helps you build complex web-search queries in Google and Bing.

You can quickly build a query with multiple parameters (such as the ‘domain’, ‘language’ or ‘date published’) and immediately see the result of this query in Google or Bing search engines. Any condition in a query can be temporarily disabled without removing it to allow you to try several combinations of different conditions and choose the one that works best.
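
The idea of switching conditions on and off without deleting them is easy to reproduce for your own saved searches. This is a minimal sketch of that idea, not Advangle’s actual method. The class and function names are my own, and it assumes the search engine accepts the common `operator:value` form such as `site:` and `filetype:`.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    operator: str          # e.g. "site" or "filetype"; "" for a plain keyword
    value: str
    enabled: bool = True   # switch a condition off without deleting it

    def render(self):
        # Quote values that contain spaces so they stay together as a phrase.
        value = f'"{self.value}"' if " " in self.value else self.value
        return f"{self.operator}:{value}" if self.operator else value


def build_query(conditions):
    """Join the enabled conditions into one search statement."""
    return " ".join(c.render() for c in conditions if c.enabled)


if __name__ == "__main__":
    # Hypothetical conditions for illustration.
    conditions = [
        Condition("", "annual report"),
        Condition("site", "example.com"),
        Condition("filetype", "pdf", enabled=False),  # temporarily disabled
    ]
    print(build_query(conditions))
```

Flip the `enabled` flag on any condition and rebuild the query to try a different combination.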

A reddit Barometer

Reddit is an entertainment, social networking, and news website where registered users submit content, such as text posts or direct links. This makes it a large online bulletin board.

Users vote submissions up or down to organize the posts and determine their position on the site’s pages. Content is organized by areas of interest called “subreddits”. The subreddit topics include news, gaming, movies, music, books, fitness, food, and photosharing, among many others.

For the investigator, reddit is a good barometer of the user’s interests, attitudes, and popularity. If you want to see the user’s barometer, SnoopSnoo provides reddit user and subreddits analytics.

On SnoopSnoo, the user analytics are computed by analyzing submissions and comments activity. Analysis is limited to the 1,000 most recent comments and submissions due to reddit’s API restrictions. The subreddits are automatically assigned topics by an algorithm. Subreddits with fewer than 1,000 subscribers or created within the last 30 days may not have been processed.

Searching Periscope & Meerkat

Periscope, the free iPhone app from Twitter, is the clear winner against first-comer Meerkat. Periscope is mobile live streaming that lets the user share what is happening right now and relive it later thanks to the service’s saved streams feature.

At the moment, from the investigator’s perspective, Periscope and Meerkat offer an opportunity to see a lot of useless streaming video if you don’t know how to search effectively. Both are hard to search by keyword or topic–you usually have to search via people.

You can use Getxplore and link your Twitter account to it. This will then allow you to see current Periscope and Meerkat streams and then enter search queries to find the types of streams that you are looking for.

Another option is the Twitter search and programs such as TweetDeck or Hootsuite, which you can set up to constantly pull Periscope and Meerkat streams directly to your dashboard. Simply add #Periscope OR #Meerkat as a search term and you will have access to every live-streaming video that is shared to Twitter.

You can refine the search by geography, as in #Periscope OR #Meerkat near:"Toronto, Ontario" within:50mi. To filter the results further, add keywords to make the search more specific, as in (#Periscope OR #Meerkat) AND (Jays OR Skydome).
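
If you monitor several cities or topics, the search statements above can be assembled by a short script and pasted into the dashboard. This is a minimal sketch with a function name of my own. It joins the groups with a space, on the assumption that Twitter search treats a space as an implicit AND.

```python
def live_stream_query(place=None, radius_miles=None, keywords=()):
    """Build a Twitter search statement for Periscope and Meerkat streams."""
    query = "(#Periscope OR #Meerkat)"
    if keywords:
        # Any one of the keywords is enough to match.
        query += " (" + " OR ".join(keywords) + ")"
    if place:
        query += f' near:"{place}"'
        if radius_miles:
            query += f" within:{radius_miles}mi"
    return query


if __name__ == "__main__":
    print(live_stream_query("Toronto, Ontario", 50, ["Jays", "Skydome"]))
```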

Search Engines are NOT Neutral

If you believe that the search results from any search engine, let alone Google, are neutral and do not reflect the search engine owners’ interests and biases, then you are very naive or entirely delusional. To prosper in the ‘information age’ one must be skeptical, open minded, and use many search engines.

For example, Google monitors what we’re searching on and decides what search results are best for its own interests. In the USA, Google was the second-largest contributor to Obama, but Google protests that it doesn’t manipulate search results in his and the Democrats’ favour.

Some very enlightening information is now coming to light about how a small change to the search algorithm may dramatically change the outcome of an election. I strongly suggest that you read Big Data Meets Popular Vote in today’s National Post.

Google-Free Wednesday–Disconnect Search

Disconnect Search is a specialized VPN that lets you search privately using Google, Bing, and Yahoo search engines. They say they don’t log searches, IP addresses, or any other personal info.

Using Disconnect Search, your ISP shouldn’t see your search terms, as it doesn’t have access to your searches. Normally, when you click a result link, the site you go to may see your search terms, but Disconnect should prevent this. Search engines save your searches, which can be connected to your real name or IP address. Disconnect should anonymize your searches.