Hunting YouTube Content

A successful hunt for data includes dragging your prey home and preparing it for consumption. If you have a hungry client to feed, then you will have to chop up your prey into digestible chunks, cook it properly, and then serve it up all pretty-like on a fancy platter, because clients are picky eaters.

Here is what you need to make a delightful repast of what you find on YouTube.

After the disappearance of Google Reader, Feedly became the new standard in RSS readers. However, Feedly is much more than an RSS reader. It allows you to collect and categorize YouTube accounts.

For example, you can monitor the YouTube accounts of politicians, activists, or anybody else who posts a lot of YouTube videos. You get the latest uploads to their YouTube accounts almost instantly. This continuous stream of updated content can be viewed and played in Feedly and does away with individual manual searches of known YouTube accounts.

Of course, Feedly has other uses, but the YouTube use is the greatest time saver. The time saved can be applied to summarizing the video content and analyzing it in terms of how it relates to your client’s objectives.

Inoreader is another feed reader that can organise YouTube account feeds into folders along with a limited number of feeds from Twitter, Facebook, Google+ and VKontakte. It also allows the user to gather bundles of subscriptions into one RSS feed and export them to another platform to go along with the YouTube content.
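Both readers work from ordinary RSS feeds, so a watch list can be prepared outside the reader. The rough sketch below makes two assumptions: that each YouTube channel exposes a feed at an address of the form youtube.com/feeds/videos.xml?channel_id=..., and that your reader accepts an OPML import file. The folder names and channel IDs are placeholders, and build_opml is a hypothetical helper, not part of Feedly or Inoreader.

```python
import xml.etree.ElementTree as ET

# Assumed feed address pattern for a YouTube channel.
FEED_BASE = "https://www.youtube.com/feeds/videos.xml?channel_id="


def channel_feed_url(channel_id: str) -> str:
    """Return the assumed RSS feed address for one channel ID."""
    return FEED_BASE + channel_id


def build_opml(folders: dict) -> str:
    """Build an OPML document from {folder name: {channel name: channel ID}}."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = "YouTube watch list"
    body = ET.SubElement(opml, "body")
    for folder, channels in folders.items():
        # One outer outline per folder, one inner outline per feed.
        group = ET.SubElement(body, "outline", text=folder, title=folder)
        for name, channel_id in channels.items():
            ET.SubElement(
                group,
                "outline",
                type="rss",
                text=name,
                title=name,
                xmlUrl=channel_feed_url(channel_id),
            )
    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    # Placeholder channel IDs; replace them with real ones before importing.
    watch_list = {"Politicians": {"Example channel": "UC_PLACEHOLDER_ID"}}
    with open("youtube_watch_list.opml", "w", encoding="utf-8") as handle:
        handle.write(build_opml(watch_list))
```

Keeping the watch list as a plain dictionary means the same list can be pasted into the report as an appendix of monitored accounts.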

Just paste the URL of a YouTube video into Amnesty International’s YouTube Dataviewer to extract metadata from the videos. The tool reveals the exact upload time of a video and provides a thumbnail on which you can do a reverse image search. It also shows any other copies of the video on YouTube. Use this to track down the original video and the first instance of the video on YouTube.
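If you want the thumbnails for a reverse image search without opening the tool, the video ID in the link is enough to guess at their addresses. This is only a sketch: the img.youtube.com address pattern is a commonly seen convention that I am assuming here, not a documented interface, and video_id and thumbnail_urls are hypothetical helper names. It does not recover the upload time.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


def video_id(url: str) -> Optional[str]:
    """Pull the video ID out of a few common YouTube link shapes."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        return parts.path.lstrip("/").split("/")[0] or None
    if host in ("youtube.com", "m.youtube.com"):
        if parts.path == "/watch":
            return parse_qs(parts.query).get("v", [None])[0]
        for prefix in ("/embed/", "/shorts/"):
            if parts.path.startswith(prefix):
                return parts.path[len(prefix):].split("/")[0] or None
    return None


def thumbnail_urls(url: str) -> List[str]:
    """Return assumed thumbnail addresses for the video behind a link."""
    vid = video_id(url)
    if vid is None:
        return []
    # Assumption: stills are served under /vi/<id>/0.jpg through 3.jpg.
    return [f"https://img.youtube.com/vi/{vid}/{n}.jpg" for n in range(4)]
```

Each address returned can be fed to a reverse image search in the same way as the thumbnails the Dataviewer shows.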

A lot of fake videos appear on YouTube. Anything worth reporting needs to be examined to see if it is a possible fake. The Chrome browser extension Frame by Frame lets you change the playback speed or manually play through the frames. This is the first step in uncovering a fake. It is also an easy way to extract images from the video for inclusion in a report.

Of course, you will use the Download Helper browser extension, which is available for both Firefox and Chrome, to help download the videos. Just remember to set the maximum number of ‘concurrent downloads’ and ‘maximum variants’ to 20 and check ‘ignore protected variants’ to speed up the process.

To make a long list of videos to download, you can use a browser extension such as Copy All Links, or Link Klipper or Copy Links in Chrome, to capture the links to every video you find. In addition to using this list in your report, you can turn it into an HTML page and then let Download Helper work away on it for hours, downloading all the videos for you.
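Turning the copied list into an HTML page takes only a few lines. The sketch below assumes the links were saved one per line in a text file; the file names and the links_to_html helper are made up for illustration.

```python
from html import escape


def links_to_html(links, title="Video links"):
    """Build a simple HTML page with one clickable entry per unique link."""
    items = []
    seen = set()
    for raw in links:
        url = raw.strip()
        if not url or url in seen:
            continue  # skip blank lines and duplicates
        seen.add(url)
        safe = escape(url, quote=True)  # keep & and quotes valid in HTML
        items.append(f'<li><a href="{safe}">{safe}</a></li>')
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n<body><ol>\n"
        + "\n".join(items)
        + "\n</ol></body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical file names: links.txt in, links.html out.
    with open("links.txt", encoding="utf-8") as source:
        page = links_to_html(source)
    with open("links.html", "w", encoding="utf-8") as target:
        target.write(page)
```

Open links.html in the browser and the download extension can see every link on one page. The numbered list also doubles as an appendix for the report.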

Collecting all this video is the easy part. Sitting through all of it to extract useful data and then analysing it to see how it helps or hinders your client’s interests is the painful and expensive part, but it is the only way to cook up what the client wants to eat.

Hunchly & Casefile

As I move away from Windows due to privacy and security issues, I have been looking for new software for Investigative Internet Research (IIR). Taking Casefile from one OS to another has not created any problems.

I have been watching the development of Hunchly and have tried it with success on Windows, Mac, and the recent Linux release. It works well with Casefile. Hunchly is a browser-based tool that creates local copies of every page visited during a session and organises them into a searchable database for future reference. It is a Google Chrome extension. I have some privacy and security concerns about using Chrome, but the IIR world isn’t a perfect place.

Hunchly permits the use of “selectors,” such as a name or phone number, which save you from manually searching each page for those terms. In my opinion, this feature alone is worth the purchase price. The other useful features include:

  • you can add notes to what you find
  • you can download notes as a Word document
  • all collected data is stored, tracked and accessed on your local machine, so there are no security or privacy concerns about cloud use
  • you can export Hunchly data to a Casefile or Maltego graph.

Hunchly isn’t a replacement for Maltego, but it is a good tool for smaller IIR tasks that might later require the use of Maltego. The ability to export to Casefile or Maltego can help with further research and reporting the linkages within the collected data.

Indexing PDFs

ORPALIS PDF OCR Free is a Windows tool which converts PDF files into fully searchable documents. It scans a PDF file, recognises all its text, even within images, and then exports a new PDF file in which all the text is searchable. This is useful with scanned documents, as it allows you to use the regular Search tool rather than reading every page of the document.

ORPALIS offers a lot of useful tools for managing your documents. For example, the professional version converts over 90 document formats whereas the free edition supports only PDF as input. It also recognizes over 60 languages and uses multithreading to process multiple documents at the same time.

Document Scanning with Smart Phones

It is now common practice to take pictures of computer screens, record books, and documents during our research expeditions. I am certain that you want to do the same. Here is a list of scanning applications that may help with your quest for the ideal scanning app:

  • Genius Scan for iOS. This app turns a phone or tablet into a PDF scanner with Dropbox and Google Drive integration.
  • CamScanner for Android, iPhone, iPad, Windows Phone 8
  • Tiny Scanner allows you to create PDF documents with multiple scans. Scans are saved to your phone as images or PDFs. It is available for Android and iPhone in both free and pro versions.
  • Scannable from Evernote. Requires iOS 8.0 or later and is compatible with iPhone, iPad, and iPod touch. Beware, scans are only saved to your device for 30 days unless you disable this in the “Advanced” settings.

All of the above will create a PDF of the scanned content. The next post will offer a solution to indexing the PDF files to make them searchable.

Why I am Never Wrong

You might think the headline was written tongue-in-cheek. You might be right, but you lack relevant data upon which to draw that conclusion.

Nobody pays an investigator to collect data. You earn the big paycheck for interpreting and analysing data.

You must quickly collect data from a variety of sources knowing their content, date-range, and how this data relates to the matter at hand. Next, you must summarise what you find. Then, you must interpret how this data might add to the progress of your investigation. Finally, you must analyse the new data in view of how it either supports or refutes your mandate, objectives, or hypothesis.

If you start with a logical mandate, objective, or hypothesis, and collect relevant data to which you apply a reasoned analytical process, then, based upon available data, you will never be wrong either.

Quotes, Citations, & Markup

When collecting data for a report, I come across data in a multitude of markup formats. A markup language is a format for annotating a document in a way that is distinguishable from the text. Each markup language has its own syntax. The differing syntax between languages creates a problem when I need to extract quotations, create citations, and create appendices. What I need is a program that can understand and convert document text annotated with different markup languages. It must handle footnotes, tables, definition lists, superscript and subscript, strikeout, and enhanced ordered lists, and then render the text into a form usable by MS Word. It must also translate math equations into something useful.

If you have been struggling with this too, try a programme called pandoc. This programme will take a while to learn, but once you have experimented a little, you will learn how to solve most of your markup-to-report conversion problems.
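As a starting point, the most basic pandoc call names an input file and an output file, and pandoc works out the target format from the output file's extension. The sketch below wraps that call so a folder of collected files can be converted in one go. It assumes pandoc is installed and on your PATH; the file names and the pandoc_command and convert helpers are hypothetical.

```python
import shutil
import subprocess


def pandoc_command(source, target, source_format=None):
    """Build the argument list for a basic pandoc conversion."""
    cmd = ["pandoc", source, "-o", target]
    if source_format:
        # -f names the input format when the extension is not enough.
        cmd += ["-f", source_format]
    return cmd


def convert(source, target, source_format=None):
    """Run pandoc and raise an error if it is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(pandoc_command(source, target, source_format), check=True)


if __name__ == "__main__":
    # Hypothetical file names: a Markdown note converted for MS Word.
    convert("notes.md", "notes.docx")
```

Building the command as a list rather than one long string means file names with spaces need no special quoting.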

Getting a Date

Date formats are easily misinterpreted. For example, if you write 06-07-07, an American might assume that it represents June 7, 2007 or 1907, and a European might assume that it is 6 July 1907 or 2007. Some might recommend using an unambiguous date system, such as the ISO 8601 format (YYYY-MM-DD), but unless the reader is a government worker they might get the month and day mixed up.

The best method is to use a 3-letter abbreviation for the month, preceded by the day and followed by the full year, to avoid any confusion: 6 Jul 2007.
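If your collected dates sit in a spreadsheet or log in ISO form, the conversion to this style is easy to automate. The sketch below uses a fixed list of English month abbreviations so that the result does not depend on the computer's language settings; report_date is a hypothetical helper name.

```python
from datetime import date

# Fixed English abbreviations, independent of the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def report_date(d: date) -> str:
    """Format a date as day, 3-letter month, and full year."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


if __name__ == "__main__":
    print(report_date(date.fromisoformat("2007-07-06")))  # prints: 6 Jul 2007
```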

Search Link and Results Copying

The Google/Yandex Search Link Fix Firefox extension prevents Google Search and Yandex from modifying result links when they are clicked. Without it, if you try to copy the link you may get gibberish instead of the actual link. If you try to copy the text description in the results, it won’t work unless you go to the Edit menu and select Copy; Ctrl+C won’t work. This extension disables these behaviors on any Google domain without having to configure anything.
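If a modified link has already found its way into your notes, the real address can often be recovered from it. The sketch below assumes the common Google redirect shape, a /url path with the target held in a q or url parameter. That shape is an assumption drawn from links seen in the wild, Yandex links are not handled, and unwrap_redirect is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def unwrap_redirect(link: str) -> str:
    """Return the target of a Google-style redirect link, else the link itself."""
    parts = urlparse(link)
    host = "." + (parts.hostname or "").lower()
    if ".google." not in host or parts.path != "/url":
        return link  # not a redirect shape we recognise
    params = parse_qs(parts.query)  # also decodes %3A, %2F and so on
    for key in ("q", "url"):
        target = params.get(key, [""])[0]
        if target.startswith(("http://", "https://")):
            return target
    return link
```

Links that do not match the assumed shape are returned unchanged, so the function is safe to run over a whole list of copied links.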

Image Conversion Tool is a FREE online utility to convert popular picture file formats (JPEG, TIFF, PNG, GIF, BMP, TGA, ICO, ICNS, PDF …) and camera RAW formats (CRW, NEF, RAF, CR2, DNG, …) into JPEG, PNG, TIFF, PDF, BMP, and GIF.

It supports more than 400 input formats. It also allows you to resize images, rotate them, and add effects. It has a file size limit of 3MB. This has proven handy when I need to convert an image that I will then insert into a Word document.

What’s in an Employee Number

I was reviewing a stalled investigation into an apparent corporate fraud when I noticed something interesting. A surveillance photograph was in the paper file. You don’t see many real surveillance photographs any more, just muddy images taken from video.

This particular photo was so clear and detailed that I had to talk to the investigator who took it. It was taken with a long lens mounted on a camera with a 22 MP full-frame CMOS sensor. The investigator directed me to the server and directory that contained over one hundred images along with video taken using the same camera. All of this data was summarised in two paragraphs in the investigation report. This proved unfortunate, as this fine work happened early in the investigation. The investigator wrote a detailed report that someone summarised without including a proper citation. The person who did this failed to recognise that the problem had been solved. Over one year later I was hired to solve this difficult and persistent problem.

The surveillance picture clearly showed an employee pass card. The pass card clearly showed the name of the security system vendor, employee name, employee picture, and worst of all, the employee number. The employee number was the de facto authentication required for gaining information the crooks needed. During social engineering the crooks were challenged and asked for their employee number. When they provided the number the information floodgates opened.

Further investigation revealed that a fake employee pass card was made and used to gain access to the facility. The card didn’t have any electronic component, but the crook was wearing an authentic-looking employee card just like everybody else, and that was enough for him to repeatedly gain the access he needed. He just walked through the front door at the right time of day and followed the real employees to the department where he committed his crime, over and over again.

Once captured, this crook freely admitted that he got everything he needed from the pass cards that employees wore prominently around their necks. He copied it from pictures he took, just like the first investigator did.

File Erasure

File erasure is something every Investigator needs to consider. Investigators collect a lot of data that never makes it into a report. Sometimes that data is irrelevant or something that cannot be reported. That stuff should not be left hanging around to be recovered later and then misused. Some form of file erasure software should be used to make it unrecoverable.
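To show the basic idea behind such software, the sketch below overwrites a file with random bytes before deleting it. Treat it as an illustration only. On solid-state drives, journaling or copy-on-write file systems, cloud-synced folders, and anything covered by backups, overwriting in place does not guarantee that the old data is gone. The overwrite_and_delete helper is hypothetical.

```python
import os


def overwrite_and_delete(path, passes=1, chunk=1024 * 1024):
    """Overwrite a file in place with random bytes, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                count = min(chunk, remaining)
                handle.write(os.urandom(count))
                remaining -= count
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to disk
    os.remove(path)
```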

Some examples of file erasure software:

Web Citations

While doing Investigative Internet Research (IIR), you find a document from an organisation that changes its name before you finish your report. The document was retrieved before the name change. How do you cite the reference? Do you cite it with the old organisation name or the new name?

Normal practice is to use the name as it was when you found the document. However, this can cause problems when someone does fact-checking to independently verify the citation. Someone must then find and document the history of the organisation name.

The solution is to cite the date the document was retrieved and in square brackets include the new name. For example, [currently, XTS Organisation] or better still [as of 11 Jan 13 the name changed to, XTS Organisation]. The latter addition to the citation creates a dated history of the organisation’s name.
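If you keep your sources in a list or database, the bracketed note can be generated rather than typed. The sketch below is one possible layout, not a citation standard: the order of the fields, the sample names, and the cite helper are all assumptions, and it writes the year in full.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def short_date(d: date) -> str:
    """Format a date as day, 3-letter month, and full year."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


def cite(old_name, title, url, retrieved, new_name=None, renamed_on=None):
    """Cite under the name in use at retrieval, noting any later name."""
    if new_name and renamed_on:
        note = f" [as of {short_date(renamed_on)} the name changed to, {new_name}]"
    elif new_name:
        note = f" [currently, {new_name}]"
    else:
        note = ""
    return f"{old_name}{note}. {title}. {url}. Retrieved {short_date(retrieved)}."


if __name__ == "__main__":
    # Hypothetical source details, for illustration only.
    print(cite("ABC Organisation", "Annual Report", "https://example.org/report.pdf",
               date(2013, 1, 5), "XTS Organisation", date(2013, 1, 11)))
```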

Note Taking – Yesterday & Today

Skilled note taking is critical for the Investigator. A client reminded me of this when he described a meeting with a Crown Prosecutor. The case in question resulted from an investigation that was conducted two years ago. The Crown went over his report and notes with a fine-tooth comb in preparation for the trial.

Note taking has a long history. I see it in the margins of books, in notebooks, and this blog is a form of note taking for me. I’m in the process of writing a book and that entails a different form of note taking.

I found a New York Times article about 250 academics and civilians who gathered at Harvard for a more self-conscious exercise: a chance to take notes on note-taking.

The article mentions the “Anxiety over the potential mindlessness of note-taking took on particular urgency during the digital annotation session, at which panelists debated whether the Internet and social media had ushered in a golden age of notes or doomed us to watch all our fleeting thoughts — if not our brains themselves — sucked down a giant digital drain, beyond the reach of future historians.” This is of particular interest to the Investigator.

The Investigator still needs to create clear paper-based notes to avoid having his work “sucked down a giant digital drain,” beyond the reach of clients, prosecutors, and defense counsel.

Searchable Clipboard Extender

Ethervane Echo from Tranglos software is a clipboard extender that will hold all your data from the clipboard until you delete it, and it has excellent search capabilities. It works with Microsoft Windows XP or later. This is the kind of utility that nobody thinks about, but everyone uses once they have it.

If you are an Investigator, Journalist, Writer, or Translator, then this will be very useful. The search feature allows you to easily find words, phrases, etc., that you have previously copied. To use the search feature just type a few characters, and the list of clips will be automatically filtered to include only those that match the characters you have typed. It also has more advanced search features. Of course, you can delete any item or the entire content of the clipboard extender.



Why You Can’t Dictate an Investigative Internet Research Report

  1. A picture, screenshot or video is worth a thousand words. The person transcribing the dictation won’t place pictures and video clips properly.
  2. There would be no efficiency at all in dictating a URL and there would be plenty of mistakes.
  3. Some website names are hard to pronounce and would lead to misspelling. You might spell them out, but there is still a risk.
  4. One must have all the collected material at hand to create footnotes and appendices.