Information Vs. Actionable Intelligence & the PI

I see many courses for Private Investigators (PIs) about using the Internet for Open Source Intelligence (OSINT). These courses are predominantly about Internet sites that might yield useful information; they do not teach how to process and analyse the captured data, or how to properly report what was found. The OSINT concept, as taught, usually misses the "intelligence" part: it is about gathering raw information, not the production of intelligence.

As an example, I just captured a Facebook (FB) account with about 1,000 posts, thousands of friends and pictures, and about 20 videos. How would anyone search through all of this and link it to relevant people, places, things, or companies? Even if the PI can identify some useful linkages and other data, how does he report it in a timely and cost-effective manner? All these courses conveniently omit the fact that a senior decision-maker needs an accurate and concise report that illustrates the linkages between relevant data.

Unfortunately, many of the course providers don’t create investigative or intelligence products; they teach courses about Internet sources.

According to Justin Seitz, the creator of Hunchly, a Chrome browser extension for collecting OSINT material from the Internet, “the greatest limiting factor of the OSINT concept is budgets that don’t recognise the time, resources, and training needed to complete the research, or the complexity of creating a true intelligence product. The budget provided to the PI leaves no choice but to simply provide screenshots and captured raw data to clients who don’t want to pay the premium required to deconstruct a network, or to chase down the best breadcrumbs.” In the information industry, we call this ‘rip & ship’. Nobody expects other professionals to work like this.

In a recent discussion with Mark Northwood of Northwood & Associates, a large Canadian private investigation company, he summed up the problem: “If a client retains a lawyer, and the lawyer researches case law in order to determine the best methods to advance or defend a claim, does the client simply say, ‘Give me the case law, I will interpret it’? No–the lawyer gives the client his opinion and supports it with the case law. Clients pay lawyers substantial fees for their analysis of case law, not for the collected case law itself.” Clients need the PI doing OSINT to work in the same manner.

Northwood believes that PIs need to educate their clients to understand that someone must analyse the raw OSINT data, and that the only person who can do that is the PI, because he collected the raw data and has it immediately at hand. The PI is in the best position to collate, analyse, and report on the data he has collected.

Chicken or Egg

As I see it, this is a chicken or egg problem.

The Egg

Without reasonable budgets on offer, clients won’t find PIs with the programming experience necessary to mine the collected data. Nor will clients find PIs experienced with the complex and expensive software needed to collect and report on the data in the first place.

Clients cannot find PIs to conduct OSINT and create actionable reports because there is no profit in it for the PI. Without the prospect of reasonable wages, people with the above skills won’t become PIs, nor will people trained in the logic, rhetoric, and argumentation needed to produce actionable reports. Existing PIs won’t be motivated to learn these skills without the prospect of financial benefit.

The Chicken

If the PI consistently has appropriate budgets to work within, then he will have, or will acquire, the proper tools and skills needed to collect, analyse, and then report on the significance of the collected data. Proper budgets also permit the PI to develop a viable reporting protocol for the type of data he collects. Proper budgets preserve the integrity of the collected data and allow for the creation of intelligence reports that include proper citations.

This chicken definitely grows from the budget egg. A large Canadian PI firm is currently advertising for someone to conduct ‘social media investigations’ at a pay rate of $15 per hour. One can only imagine the nature of the client’s expectations and the type of work produced for so little pay.

Fusion

Today, any intelligence or investigative product requires a fusion of many types and sources of data. A complete report usually needs surveillance observations, content from interviews, public records, and government documents.

Again, it is the budget to collect and analyse public records and government documents that builds the skills and knowledge needed to perform this task. This fusion of data sources allows the PI to establish relevant links between the people, places, things, and companies of interest to the client.

OSINT Tools & Skills

If budgets come to truly reflect a desire for a better product, then the following are the tools and skills your PI should possess in the realm of OSINT. This is the rocket science behind real OSINT.

Hunchly

Hunchly is a Google Chrome extension that tracks and captures every page you view during an investigation. This saves you from having to stop and take screenshots, or from keeping handwritten logs of every URL you have visited. It can also track names, phone numbers, and other pieces of information. Hunchly builds a data-rich case file from all of your investigative steps that helps you preserve evidence.

Hunchly permits the use of “selectors,” such as a name, address, or phone number, that save you from manually searching each page of the collected data for those terms. In my opinion, this feature alone is worth the purchase price. The other useful features include:

  • the ability to add notes to what you find
  • notes that can be downloaded as a Word document
  • all collected data stored, tracked, and accessed on your local machine–no security or privacy concerns about cloud use

Casefile

If your research requires graphing the relationships between people, places, things, and companies, then CaseFile provides that at a much lower cost than other solutions, provided the dataset is small enough to be managed manually–which is presently the case for most of the PI’s work.

Maltego

Maltego is the favoured software of many intelligence analysts, researchers, and investigators for searching and linking OSINT data. While it helps search through mountains of publicly available data currently sitting on the Internet and sort it in useful ways, it has many limitations.

If you need to search FB by email address, Instagram by photo GPS, search for people across social media sites, or search LinkedIn by company or college, then this is the tool to use. However, some of these capabilities can cost $1,000 per year on top of the Maltego yearly fees. Less costly alternatives exist.

Given its current state of development, I am not certain that Maltego warrants its cost for the PI. Most of the search capabilities of Maltego are in ‘transforms’, which are Python scripts that access a search site’s API[1].

The search functions of the most used ‘transforms’ can be recreated in Python at a lower cost. The graphing component of Maltego is available in CaseFile. Using Hunchly, CaseFile, Python scripts, Word, and PowerPoint together should produce an acceptable product if the collected data is properly summarised and then analysed.

Python

Python is a general-purpose programming language, best described here as a way to create scripts that execute specific tasks, such as searching for a specific word in a sea of text.
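To make that concrete, here is a minimal sketch of such a script; the file name and search term are hypothetical:

```python
import re
from pathlib import Path

def find_term(path: str, term: str):
    """Yield (line_number, line) for every line that contains the term."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    text = Path(path).read_text(errors="ignore")
    for number, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            yield number, line.strip()

# Hypothetical usage: scan a text capture for a company of interest.
for number, line in find_term("capture.txt", "Acme Holdings"):
    print(f"{number}: {line}")
```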

Python automates time-consuming tasks. It allows you to parse raw data untouched by other tools and to read information from databases. It aids in generating reports and in moving files into folder structures based on their content type. From the PI’s perspective, Hunchly can handle these tasks.

Python scripts may also provide access to a search site’s API. A single page of scripts can search a site for terms in a variety of ways. In practice, this is the PI’s favoured use of Python.
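The pattern is simple enough to sketch. Everything below is illustrative: the endpoint, parameters, and key are placeholders for whatever search site’s API you subscribe to, not a real service.

```python
import requests  # third-party; install with: pip install requests

API_KEY = "YOUR_API_KEY"                               # placeholder
ENDPOINT = "https://api.example-search.com/v1/search"  # hypothetical endpoint

def search(term: str, limit: int = 20) -> list:
    """Query a hypothetical search API and return its raw result records."""
    response = requests.get(
        ENDPOINT,
        params={"q": term, "limit": limit},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("results", [])

for record in search("Acme Holdings"):
    print(record.get("url"), "-", record.get("title"))
```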

The High-end Tools

As the volume of collected data increases, so does its disorganisation for investigative purposes. This fact has spawned many products designed to search for and retrieve text strings in masses of data, usually called “free text retrieval” (FTR) software. The following are the current leaders in utility for investigative purposes.

dtSearch

The dtSearch[2] product line enables searches of terabytes of text across a desktop, network, Internet or Intranet site.

Nuix

In the near future, PIs may resort to high-end tools like the Nuix suite to find connections in vast seas of data like the Panama Papers dataset. Nuix is FTR software that enables searching through huge volumes of unsorted data for people, places, things, and companies. It also allows users to display connections between all these entities, along with timelines, in a manner similar to Maltego and CaseFile.

For more than a decade, FTR software has been the province of well-funded intelligence agencies, law firms, and businesses. Journalism discovered it through the donation of Nuix licences to the Panama Papers project[3].

Social Media Monitoring

Products like X1 Social Discovery[4], Geofeedia[5], Dataminr[6], Dunami[7], and SocioSpyder[8], to name a few, are being purchased by Fortune 500 companies and governments to manage social media research. Such products are now becoming necessary for the successful private investigator.

The Report

In broad strokes, the PI’s report creation process should look like the following:

  • The PI assembles or collates all of the collected information from all the tools used, examining links or shared information such as URLs, email addresses, etc. From this collated material, a summary begins to take shape.
  • The investigator ensures that each piece of crucial information is put into its own section within the logical order of the summary; visuals (screenshots, text captures, tagged photos) are included as much as possible.
  • Relationship graphs exported from CaseFile or Maltego should be included in the report if they fit the page; if not, screen clips may be used or PowerPoint slides can be imported.
  • From the summary arises the true analysis of how the data relates to or affects the client’s objectives.
  • The report must describe the sources and methods used and describe all investigative activities. This is crucial when little information is uncovered about a subject. This level of detail is not included in the summary.
  • Evidence (captured images, videos, etc.) remains in a separate file from the report.

Conclusion

As with all new products, the price will drop and the quality will improve as PIs adopt the necessary programming skills and software in an increasingly competitive market. Of course, this will not happen if clients are not willing to provide reasonable OSINT budgets today.

[1] An application program interface (API) specifies how software components should interact, i.e., a search interface.

[2] http://www.dtsearch.com/

[3] https://www.icij.org/offshore/how-icijs-project-team-analyzed-offshore-files

[4] http://www.x1.com/products/x1_social_discovery/

[5] https://geofeedia.com/

[6] https://www.dataminr.com/

[7] http://www.pathar.net/

[8] https://www.sociospyder.com/

OSINT & Zombie Journals–Part 5

To improve your evaluation skills, develop three abilities:

  1. Maintain a skeptical mind-set.
  2. Learn which sources are most trustworthy.
  3. Learn to identify reporting errors and inconsistencies. If something does not look right, investigate its veracity.

In addition, the skeptical investigator must develop a structured means of evaluating the relevance and reliability of the collected data along with the ability to communicate this. In my opinion, this is the most demanding part of any investigation.

The following list of 13 evaluation criteria was developed over decades of practice by researchers and investigators worldwide. It appears in one form or another throughout the literature on research and intelligence analysis. I use a Microsoft Excel spreadsheet based upon these criteria to record the evaluation of sources for most investigations. Doing this is time-consuming, but it is often necessary to maintain the integrity of the reported data and the conclusions that are drawn from it.
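If you prefer to build such a worksheet programmatically, here is a minimal sketch that writes the column headings to a CSV file; the criteria names come from the list below, while the sample row is invented:

```python
import csv

# The 13 evaluation criteria listed below, used as column headings.
CRITERIA = [
    "Recency", "Relevancy", "Authority", "Completeness", "Accuracy",
    "Clarity", "Verifiability", "Statistical validity",
    "Internal consistency", "External consistency", "Context",
    "Comparative quality", "Problems",
]

with open("evaluation_matrix.csv", "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["Source", "URL", "Date evaluated"] + CRITERIA)
    # One row per source; scores or notes get filled in as the evaluation proceeds.
    writer.writerow(["Example Journal", "https://example.com", "2016-10-03"]
                    + [""] * len(CRITERIA))
```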

The Internet is renowned for harbouring unreliable information. The following evaluation matrix will work for you if you rigorously apply it.

Evaluation Matrix

  1. Recency. Do the data appear to be current on the subject, or the most appropriate for the historical time period? Are sources dated and maintained?
  2. Relevancy. Is there a direct correlation to the subject? What is the tone of the information display? Is it popular, scholarly, or technical?
  3. Authority. What is the reputation of the data, and the data-provider? Has this source of data been cited elsewhere? What is the reliability of the source? How can you document or qualify the source of the information?
  4. Completeness. Are alternative data or views cited? Are references provided? Given what you know about the source, is there any evidence that the data is NOT ‘slanted’?
  5. Accuracy. Does the source furnish background information about data sources and/or in-depth data? Are the complex issues of data integrity and validity oversimplified? Are the terms adequately defined? Are there references or sources of quotes?
  6. Clarity. Is the presentation of information really credible? Can bias of the information providers really be ruled out? Are there any logical fallacies in presentation of the data or assertions, or in other statements or publications on the page or by the source? Are key assumptions described, or are they hidden in the hope that the reader will be gullible?
  7. Verifiability. Can the information be verified? Can you find corroboration? If not, why not?
  8. Statistical validity. Can the key points or critical data be supported by standard statistical testing? With subjective information, one verifies by corroboration where it can be found. With numerical information, many questions arise. What statistical inference is needed in order to accept any implied inferences of the data displayed? Are there clear explanations to help readers or viewers qualify the implications of numerical “averages” or “percentages”?
  9. Internal consistency. Do the data or commentary contain internal contradictions? Know what you can about the source, and scan for logical fallacies throughout the presentation.
  10. External consistency. Do the data reflect any contradictions among the source documents? In the assertion of information or views, is there an acknowledgment or recognition of alternative views or sources? If not, and source documents are involved, one might suspect that the author had an “agenda”.
  11. Context. Can fact be distinguished from opinion? Are sources taken, or used out of context?
  12. Comparative quality. Are some data clearly inferior to other data? Which are the “best” data when you consider the above eleven tests (i.e., most recent, most relevant, the most authoritative, the most complete, the most accurate, and so forth)? You should always be evaluating the information as you browse the Net. Check the little things that journalists watch for. A misspelled name, for example, could be a warning sign, even in an academic paper, that the author was careless in other areas as well. Do any statements seem exaggerated? If so, why has the author exaggerated? Is it a spur-of-the-moment statement by email, or is that exaggeration more deliberate? Are you reading instant analysis, quick off the mark, or the results of a carefully crafted study, or a meta-analysis, which is a study of all previous studies on a subject? What do you think has been left out of the report? What an author omits may be just as important to a researcher as what an author includes. What’s left out could reveal much about the bias of the information you are reading. Take notes on such matters as you find sources, and possibly include footnotes in your paper, if that’s what you’re doing; it’ll provide evidence that you have critical thinking abilities! Don’t ignore bias as a valuable source of information, as well: even an untrustworthy source is valuable for what it reveals about the personality of an author, especially if he or she is an actor in the events.
  13. Problems. There’s another problem, growing in the 1990s, called by some the Crossfire Syndrome, after Crossfire, the CNN public affairs show, and its tabloid television imitators. The Crossfire Syndrome drowns out the moderate voices in favor of polarization and polemics. Confrontation between polar opposites may make for good television, but it often paints a distorted view of reality. On the Net, and throughout North American media culture, the Crossfire Syndrome is acute. In flame wars, the shouters on both sides are left to use up bandwidth while the moderate voices back off. Political correctness of any sort, right or left, religious or secular, tends to distort material while creating interest, at the cost of ‘truth’. All of us are left with the task, with nearly any information we’re exposed to, of seeking out the facts behind the bias. It’s been said many times that there is no such thing as objectivity. The honest researcher tries to be fair. The best researcher is both prosecutor and defense attorney, searching out the facts on all sides (there are often more than two).

The advent of the 24-hour all-news network, along with a procession of imitators and the Web, produced a related distortion. It lives off one continually asked question–‘how bad can it get?’–as a way to maintain viewer interest. This severely distorts the information presented.

A Test

If you want to test what you have learned, here is an example.

You are researching how IT operations impact carbon dioxide emissions. In the data you uncover, you find the following journal article: Towards Green Communıcatıons. What would your evaluation of this article be?

Economic & Political Wisdom

Give a man a fish, and you feed him for a day.

Teach a man to fish, and you feed him for a lifetime.

Steal a fish from one guy and give it to another–and keep doing that on a daily basis–and you’ll make the first guy pissed off, but you’ll make the second guy lazy and dependent on you. Then you can tell the second guy that the first guy is greedy for wanting to keep the fish he caught. Then the second guy will cheer for you to steal more fish. Then you can prohibit anyone from fishing without getting permission from you. Then you can expand the racket, stealing fish from more people and buying the loyalty of others. Then you can get the recipients of the stolen fish to act as your hired thugs. Then you can … well, you know the rest.

Larken Rose

OSINT & Zombie Journals–Part 4

Always keep in mind that, as the old pessimist philosopher Arnold Schopenhauer stated, “The truth will set you free–but first it will make you miserable.”

The RICE Method of Evaluation

Using the RICE method can help you decide how to respond to information or intelligence:

  • R for reliability: the basic truthfulness or accuracy of the information you are evaluating.
  • I for importance: the relevance of the data to the problem at hand.
  • C for cost: the cost of the data (and of your possible reactions or actions relating to the information).
  • E for effectiveness: the effectiveness of your actions if based upon this information. Would actions based upon this information solve the problems you face?

This evaluation format is useful for summarizing collected data and for analyzing how you might apply the data in a broad range of situations. However, it does not delve into specific metrics used to evaluate a particular piece of information. The next article will provide 13 evaluation metrics to help you do that.

In this context, the term metrics means a system of related measures that facilitates the quantification of some particular characteristic.

Reliability

Reliability is the basic truthfulness or accuracy of the information. When evaluating data from journals you may not have the technical knowledge to evaluate the content itself. However, you do have the ability to compare the date of publication to the ownership of the journal. You also have the ability to compile a list of the author’s previous journal articles, their date of publication, the names of the journals and the publisher or owner of the journal. You can also identify conferences that the author attended or at which he was a speaker. You can then determine who ran or supported these conferences.

Predatory publishers often create or sponsor conferences to showcase their authors, who may pay to be published. The conference may be part of a paid-for publication package. Sometimes, they advertise a conference, take in attendance fees, and then the conference never occurs.

It is sometimes difficult to link a conference to a predatory publisher. You have to look at the page source code, the conference domain registrations, and the list of speakers to tie the conference to a predatory publisher.

Cost

Cost is an important factor to consider from two perspectives. First, the obvious cost/benefit relationship or more precisely, the question, is this worth the cost? Second, if the data seem too good to be offered free, then you have to ask, why is this free?

As you now know, open access journals don’t make their money from subscriptions. You must uncover and then evaluate the relationship between how the journal makes its money and the authors who produce the content. Of course, this evaluation metric is not unique to academic and open access journals.

Part 5 will discuss a more detailed evaluation matrix.

OSINT & Zombie Journals—Part 3

Understanding the Need for Evaluation Methods & Citations

The basis of your evaluation method is understanding that you cannot know what you don’t know. However, as you come to terms with this, you can guard against the perils of the Dunning-Kruger Effect.

Dunning-Kruger Effect

There is a saying that “you cannot know what you do not know”. This might seem redundant, but it is true: it can be impossible to identify gaps in your own knowledge. In other words, you cannot teach yourself what you do not know. Without instruction and training, you are very likely to think that you do, in fact, know “everything” you need to know, when you actually lack the ability to recognize your mistakes–you are unconsciously incompetent. David Dunning and Justin Kruger first tested this phenomenon in a series of experiments in 1999[1].

Typically, the unskilled rate their ability as above average, much higher than it actually is, while the highly skilled underrate their abilities. Confidence is no substitute for skill and knowledge, though skill and knowledge must be applied with confidence to ensure a positive outcome.

The Dunning-Kruger Effect may inhibit proper evaluation of the collected data without an established evaluation method. This article, and those that follow, should help you adopt an effective evaluation process.

Developing the necessary skills and knowledge is not ‘rocket science’; it is ‘time in grade’. You must simply do it, study how to do it better, and network with people who do it. This process takes years of effort, but do not give up. I have been doing this type of research for 40 years and I am still learning. Now let’s set about reducing the Dunning-Kruger Effect.

Before beginning the evaluation of the collected data itself, the investigator must prepare accurate citations as the starting point for evaluation. The citation quantifies significant attributes of the data and its source.

Citations

A citation is a reference to a published or unpublished source, though not always the original source.

Citations uphold intellectual honesty and avoid plagiarism. They provide attribution to the work and ideas of other people while allowing the reader to weigh the relevance and validity of the source material that the investigator employed.

Regardless of the citation style used, it must include the author(s), date of publication, title, and page numbers. Citations should also include any unique identifiers relevant to the type of material referenced.

The citation style you adopt will depend upon your clientele and the material being reported. If the report will include many citations, discuss the citation style with your client before producing the report and, if at all possible, use a style your client is familiar with.

Never use footnotes or endnotes for anecdotal information. This prevents material that merely provides supplementary information from masquerading as a source citation. Supplementary information belongs in the body of the report, where it is identified as such.

While doing OSINT, you might find a document from an organization that changes its name before you finish your report. In that case, the document was retrieved before the name change. How do you cite the reference? Do you cite it with the old organization name or the new name?

Normal practice is to use the name as it was when you found the document; however, this can cause problems when someone does fact-checking to independently verify the citation. Someone must then find and document the history of the organization name.

The solution is to cite the date the document was retrieved and, in square brackets, include the new name, for example, [currently, XTS Organization] or, better still, [as of 11 Jan 13 the name changed to XTS Organization]. The latter addition to the citation creates a dated history of the organization’s name. The dated history of a journal and its publisher is of critical importance when dealing with journals that die and come back as zombies. It is wise to check Jeffrey Beall’s list of predatory publishers while preparing citations. It is also wise to state when this list was checked in a footnote or in the actual citation, as I now do.

Of course, all Web citations must include the date on which the URL was visited for the purpose it is being cited.

Bibliographic Databases

The large bibliographic abstract and citation databases are secondary sources that merely collect journal article abstracts and journal titles without much, or any, vetting of the article or journal.

Elsevier’s Scopus is one such service; another is the Thomson Reuters Master Journal List. Do not consider either an authoritative source of quality journals or abstracts. Both contain numerous low-quality journals produced by predatory publishers.

Lars Bjørnshauge founded an online index of open-access journals in 2003 with 300 titles. Over the next decade, the open-access publishing market exploded. By 2014, the Directory of Open Access Journals (DOAJ), now operated by the non-profit company IS4OA, had almost 10,000 journals. Today its main problem is not finding new publications to include, but keeping the predatory publishers out.

In 2014, following criticism of its quality-control process, DOAJ began asking all of its journals to reapply under stricter inclusion criteria in hopes of weeding out predatory publishers. However, the question remains: how does DOAJ determine whether a publisher is lying?

Attempts to create a ‘whitelist’ of journals seem doomed to failure, especially when attempted by a non-profit using volunteers. Most researchers will judge a journal’s quality by its inclusion in major citation databases, such as Elsevier’s Scopus index, rather than by the DOAJ’s list. As you can see, Scopus and the Thomson Reuters Master Journal List are also vulnerable to manipulation by unscrupulous publishers.

Predatory publishers have realised that these lists offer a very low barrier to entry, especially in certain categories. In addition, as such databases are usually subscription services, some publishers advertise certain authors using fake citations supposedly from bibliographic databases knowing that certain commercially valuable demographics never verify these citations.

[1] Kruger, Justin; Dunning, David (1999). “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments”. Journal of Personality and Social Psychology, Vol 77(6), Dec 1999, 1121-1134.  http://psycnet.apa.org/?&fa=main.doiLanding&doi=10.1037/0022-3514.77.6.1121 (3 Oct 2016).

Jamming Remote Controls

The range of issues I get involved in always amazes me. I recently had a client ask me for solutions to garage door and gate remote jammers. Jammers are simple devices that transmit enough radio frequency noise to prevent a legitimate signal from activating the garage door or gate. These remote systems operate in the 300 MHz to 400 MHz range.

It seems an executive was carjacked when his gate remote didn’t work. He got out to unlock the gate manually, and his car was stolen. The police speculated that the thief used a radio jammer to prevent the gate from opening.

Research into solutions for this led me to jammers for car remotes. Most of these operate in the ranges of 433 MHz, 315 MHz, and 868 MHz.

No practical technical solution exists for this type of attack. My solution was to eliminate the gate and garage door remotes in favour of more advanced access control systems. The car remote is something else. Training people to stop using them is going to be a difficult task, but it may be necessary in some threat environments.

A jammed car remote may expose the user to robbery, assassination, or abduction during the delay in opening the car door. Worse, if the user doesn’t check whether the door actually locked after using the remote, an explosive device could be planted in the vehicle and commanded to explode near a high-value target.

The best solution for this problem is a thorough understanding of the user’s threat environment.

A Brief History of Open Source Intelligence

An article with the above title appeared on the bellingcat site.

It is an excellent article, even if I don’t agree that OSINT went into hibernation after WW2. For example, from the end of WW2 through the Cold War, the Foreign Broadcast Information Service in the US (now the Open Source Enterprise) and the BBC Monitoring Service in the UK trawled the airwaves and other open sources, regularly publishing transcripts and analysis of what they heard–work that began after the war and continues today. There are many other examples.

On the other hand, today’s OSINT is highly influenced by a convergence of technologies. The market penetration of smartphones with 3G connections and the popularity of social media sites is one such convergence of technologies that produces raw data. The other convergence of technology is the availability of inexpensive software and computer hardware to process the raw data for analysis.

OSINT & Zombie Journals—Part 2

The Nature of Sources

Primary & Secondary Sources

An archive is a primary source because the contents are documents usually authored by a person with direct knowledge of the topic; this includes public records completed by the subject.

A library is a secondary source because its documents are created from the primary sources, as are citations, abstracts, bibliographic databases, etc.

Authoritative Sources

Evaluating the quality of a source means asking questions like:

  • What is the reputation of the data, and the data-provider (including the publisher)?
  • Has this source of data been cited elsewhere?
  • What is the reliability of the source?
  • How can the source of the information be documented or qualified?
  • Is this a primary source or secondary source?
  • Is this a legally required or legally binding source?

Answers to the above questions should help you find the authoritative source. Zombies are never authoritative sources.

In the next article I will discuss evaluation methods, citations, and bibliographic databases.

Proximity Search on Google-Free Wednesday

The international version of Yandex, the Russian search engine, has a collection of advanced commands that include a proximity operator that is extremely useful for drilling down to what you really want. For example, a search statement might be “opec & saudi” (in same sentence) or “opec && saudi” (in same page).

There is also an /n operator that enables you to specify that words or phrases must appear within a certain distance of each other. For example, a search statement might be “opec saudi /3”, which requires the two words to appear within three words of each other.

An interesting operator is the non-ranking “and”, which is entered as “<<“: the words after the operator do not affect the ranking of the page in the results.

The search operators are listed at https://yandex.com/support/search/how-to-search/search-operators.xml.
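If you script your searching in Python, these operators drop straight into a query URL; a trivial sketch, assuming Yandex’s standard /search/?text= form (the example statements are the ones quoted above):

```python
from urllib.parse import urlencode

# Operator examples from above; urlencode handles the quoting.
queries = [
    "opec & saudi",   # words in the same sentence
    "opec && saudi",  # words in the same page
    "opec saudi /3",  # words within three words of each other
]

for query in queries:
    print("https://yandex.com/search/?" + urlencode({"text": query}))
```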

OSINT & Zombie Journals—Part 1

Many scholarly journals are being bought up by predatory publishers that turn once-prestigious journals into publications full of junk science. Usually these publishers turn their acquisitions into free ‘open access’ publications on the Internet that are full of typos, inaccuracies, and even outright fabrications.

One such online publisher, the OMICS Group, is being sued by the U.S. Federal Trade Commission for deceptive practices that include spam emails to solicit articles that are not peer reviewed. This same outfit recently acquired two Canadian medical journal publishers.

From the researcher’s perspective, the most deceptive practice of these free open access journals is the fact that authors pay to have their articles published. The second deceptive practice, according to the FTC, is that such publishers falsely state that their journals are widely cited and included in academic databases. To the contrary, the FTC states that PubMed does not include any of the OMICS titles. The FTC also alleges that the work of authors is sometimes held hostage for payment of undisclosed fees.

When Jeffrey Beall, an academic librarian at the University of Colorado, started compiling his list of predatory publishers in 2010, he found only 18. Today, his list has over 1,000 publishers.

When a predatory publisher acquires a journal, it ceases to be a scholarly journal and only lives on as something exploited for profit. Such an acquisition ends proper peer review. The journal becomes a zombie.

For the researcher conducting a literature review, the additional time and effort required to vet every article and citation to eliminate zombie journals has increased to nearly unbearable levels. Of course, this is part of the zombie strategy to flood the scholarly journal space with purulent, infectious zombies to kill-off real journals.

Zombie publications are a rising issue for serious researchers. The quality of a literature review affects the quality of the decisions based upon this collected data.

This series of articles is about recognising and avoiding open-source junk. These five articles should help you develop the evaluation skills and processes necessary to avoid falling victim to zombie journals and the other forms of diseased data that infect the open-source domain.

Security & Shortened URLs

As we all know, clicking on a link can send us to digital purgatory. While I don’t worry about this when I am working in a VM, I do in a normal browsing session. This hunter doesn’t want to become the hunted.

The best advice, for general browsing, is to use the WOT browser plugin available for Firefox and Chrome. This will deal with most problem links. While in a VM, I sometimes do a manual scan of shortened URLs using VirusTotal.
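That lookup can also be scripted. Below is a sketch against my reading of the VirusTotal v3 REST API; it assumes you have an API key, and the shortened URL shown is made up:

```python
import base64
import requests  # third-party; install with: pip install requests

API_KEY = "YOUR_VT_API_KEY"  # placeholder; free keys are rate-limited

def check_url(url: str) -> dict:
    """Fetch an existing VirusTotal report for a URL (v3 API)."""
    # v3 identifies a URL by its unpadded URL-safe base64 encoding.
    url_id = base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")
    response = requests.get(
        f"https://www.virustotal.com/api/v3/urls/{url_id}",
        headers={"x-apikey": API_KEY},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["data"]["attributes"]["last_analysis_stats"]

# Hypothetical shortened URL; prints counts such as {'malicious': 0, ...}.
print(check_url("http://bit.ly/example"))
```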

A trusted colleague tells me, “the bad actors are beginning to step up their game now; some actually check the user-agent string from the browser and will redirect you to malware and fool the link scanners.”

Hunting Elusive Prey on Instagram

Instagram is a photo-sharing site, now owned by Facebook, with about 400 million users. It often works in concert with Twitter to distribute photos.

I know of no way to search the posts of a user who has made his profile private. For these elusive private profiles, I reverse-search the profile picture to find other accounts that use it, and go from there.

If the user has updated his profile picture since 2015, view the image, remove the ‘s150x150’ from the thumbnail image URL, and you may end up at a full-resolution version of the image–reverse-search this image to find other social media accounts. The profile page may be private, but any posts that appear in Twitter, Tinder, or elsewhere are not.
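The URL edit itself is trivial to script; a minimal sketch (the thumbnail URL shown is invented, so inspect the real URL in your own capture first):

```python
# Strip the thumbnail-size path component to reach the full-resolution image.
thumb = "https://scontent.cdninstagram.com/t51/s150x150/profile_12345.jpg"  # made-up example
full_res = thumb.replace("s150x150/", "")
print(full_res)
```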

Unfortunately, Instagram does not offer a true search facility. To search it, you must rely on traditional search engines and third-party apps. For example, in Google and Bing, use the site: command (e.g., site:instagram.com “john doe”).

The Instagram API was shut down this summer, but fortunately for investigators, this has not affected the third-party apps mentioned below.

The Apple-only Photodesk app is the powerhouse for searching and managing Instagram. It allows you to perform the standard Instagram functions of sharing, liking and commenting, but the real value to investigators is the ability to search for content by keyword, tag and username.

When monitoring a current event, Photodesk offers the ability to search by location and create a geofence around that position. This filters content to only that posted within a certain radius, showing the results on a map.

If you are not an Apple user, or don’t need the publishing features of Photodesk, then Picodash, which was formerly known as Gramfeed, is an alternative. Picodash offers the same advanced searches for Instagram content as Photodesk, enabling the user to search by hashtag, date/time, keyword, user, and location. I like this because it is easy to get stuff I find from it into a report. Of course, it isn’t free at $8 per month, but I think it’s worth it.

VPN Security & Firefox

When you’re hunting in the digital landscape, you don’t want to stand out like a white lion on the Serengeti.

PeerConnections are enabled by default in Firefox. This is bad juju for me, as it can leak my real IP address when I am using a VPN connection.

In Firefox, go to ‘about:config’ in the address bar. In the config window search for this setting and change it as follows:

  • media.peerconnection.enabled – double-click it to change the value to false.

As this is such bad juju, I check that it is still set to false before I start any research project. Of course, I do this because I always use a VPN.

Hunting YouTube Content

A successful hunt for data includes dragging your prey home and preparing it for consumption. If you have a hungry client to feed, then you will have to chop-up your prey into digestible chunks, cook it properly, and then serve it up all pretty-like on a fancy platter, because clients are picky eaters.

Here is what you need to make a delightful repast of what you find on YouTube.

After the disappearance of Google Reader, Feedly became the new standard in RSS readers. However, Feedly is much more than an RSS reader. It allows you to collect and categorize YouTube accounts.

For example, you can monitor the YouTube accounts of politicians, activists, or anybody else who posts a lot of YouTube videos. You get the latest uploads to their YouTube accounts almost instantly. This continuous stream of updated content can be viewed and played in Feedly and does away with individual manual searches of known YouTube accounts.
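Under the hood, this works because YouTube publishes an Atom feed per channel, which is the sort of feed Feedly subscribes to. You can also pull one directly; a short sketch, assuming you have the channel ID (the one shown is a placeholder):

```python
import urllib.request
import xml.etree.ElementTree as ET

CHANNEL_ID = "UCxxxxxxxxxxxxxxxxxxxxxx"  # placeholder channel ID
FEED = f"https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL_ID}"
ATOM = "{http://www.w3.org/2005/Atom}"

with urllib.request.urlopen(FEED) as response:
    root = ET.fromstring(response.read())

# Print the most recent uploads: publication date, title, and link.
for entry in root.iter(ATOM + "entry"):
    title = entry.find(ATOM + "title").text
    link = entry.find(ATOM + "link").attrib["href"]
    published = entry.find(ATOM + "published").text
    print(published, title, link)
```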

Of course, Feedly has other uses, but the YouTube use is the greatest time saver. The time saved can be applied to summarizing the video content and analyzing it in terms of how it relates to your client’s objectives.

Inoreader is another feed reader that can organise YouTube account feeds into folders along with a limited number of feeds from Twitter, Facebook, Google+ and VKontakte. It also allows the user to gather bundles of subscriptions into one RSS feed and export them to another platform to go along with the YouTube content.

Just paste the URL of a YouTube video into Amnesty International’s YouTube Dataviewer to extract metadata from the videos. The tool reveals the exact upload time of a video and provides a thumbnail on which you can do a reverse image search. It also shows any other copies of the video on YouTube. Use this to track down the original video and the first instance of the video on YouTube.

A lot of fake videos appear on YouTube. Anything worth reporting needs to be examined to see if it is a possible fake. The Chrome browser extension Frame by Frame lets you change the playback speed or manually step through the frames. Besides being the first step in uncovering a fake, it is also an easy way to extract images from the video for inclusion in a report.

Of course, you will use the Download Helper browser extension, available for both Firefox and Chrome, to help download the videos. Just remember to set the maximum number of ‘concurrent downloads’ and ‘maximum variants’ to 20, and check ‘ignore protected variants’ to speed the process.

To make a long list of videos to download, you can use a browser extension–Copy All Links, or Link Klipper or Copy Links in Chrome–to collect the links to every video you find. In addition to using this list in your report, you can turn it into an HTML page (a sketch follows) and then let Download Helper work away on it for hours, downloading all the videos for you.
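Here is a minimal sketch of that list-to-HTML step, assuming the extension exported one URL per line to a file called links.txt:

```python
import html
from pathlib import Path

# One video URL per line, as exported by a link-copying extension.
links = [line.strip() for line in Path("links.txt").read_text().splitlines()
         if line.strip()]

items = "\n".join(
    f'<li><a href="{html.escape(url)}">{html.escape(url)}</a></li>' for url in links
)
Path("links.html").write_text(
    "<!DOCTYPE html>\n<html><body><ul>\n" + items + "\n</ul></body></html>"
)
print(f"Wrote {len(links)} links to links.html")
```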

Collecting all this video is the easy part. Sitting through all of it to extract useful data, and then analysing it to see how it helps or hinders your client’s interests, is the painful and expensive part–but it is the only way to cook up what the client wants to eat.

Forcing Firefox to Open Links in a New Tab

During a training class, I watched everybody trudge around looking for lost search results. They tried reloading results pages, only to get distorted results. They kept losing the search engine results page and getting lost in a sea of tabs. They wanted to know how to get Google search results to open in a new tab.

Here is my solution for getting tabs to open where I want them to. In Firefox, go to ‘about:config’ in the address bar. In the config window search for these settings and change them as follows:

  • browser.search.openintab – if true, will open a search from the searchbar in a new tab if you use the return key to trigger the search
  • browser.tabs.loadBookmarksInBackground – if true, bookmarks that open in a new tab will not steal focus
  • browser.tabs.loadDivertedInBackground – Load the new tab in the background, leaving focus on the current tab if true
  • browser.tabs.loadInBackground – Do not focus new tabs opened from links (load in background) if true
  • browser.tabs.opentabfor.middleclick – if true, links can be forced to open a new tab if middle-clicked.

This is the type of ‘boring stuff’ that you must master if you want to do Investigative Internet Research and make any money at it. Clients won’t pay for wasted time. You may know where to hunt for data, but you also need to know how to get it into the larder before it goes bad.