Friday, November 19, 2021

UFO data: The KISS principle - AI software versus smart searches

The application of Artificial Intelligence to large data sets offers the potential for considerable insight into data about UFOs - but merely understanding how to perform effective and efficient keyword searching of digitised UFO material (e.g. UFO books, articles, PhD dissertations, case files and a collection of UFO databases) is a very useful first step.

Since an item I posted on the ATS discussion forums in 2011, I've been posting online about the benefits of using free software (such as PDF-Xchange Editor, outlined below) or commercial packages (such as Adobe Acrobat) to conduct offline searches of collections of PDFs.

That recommendation of simple tools has caused less excitement than my occasional posts about some of the work I've been doing behind the scenes on using Artificial Intelligence to assist with UFO research. For example, a UFO chat-bot that I posted online to demonstrate the use of Artificial Intelligence to suggest explanations for some UFO reports was described by the veteran Spanish UFO researcher Vicente-Juan Ballester-Olmos on his blog as "one of the best, most proactive and original developments in UFO research in the last decades". But I think my encouragement of simple but effective search tools (and sharing tips on their use) has actually been a more meaningful contribution to ufology.

The simplicity of basic search tools should not cause them to be disregarded. 

Indeed, simplicity has considerable advantages in terms of ease of adoption.  

I'm a firm supporter of the KISS principle [i.e. Keep It Simple, Stupid].

In fact, although some other people have discussed the possibility of using AI software to search/analyse UFO data, I haven't seen much discussion of the numerous analytical traps involved in such searches/analysis, or any acknowledgement of the many practical difficulties involved in using such tools. Issues that I've struggled with for years are generally glossed over in most references to AI tools.

I generally prefer simple tools to protracted discussions of the potential application of AI, particularly when nobody has clearly demonstrated that they have managed to use such AI tools to make UFO research more efficient or effective.



In particular, I consider it important to have material on your own hard drive, and to be able to search it efficiently and effectively. The results of such searches can be reviewed much more rapidly than, say, the results of Google searches of online material. I'd estimate that reviewing search results is about 10 to 20 times faster with such offline search tools.

Given that most of us involved in UFO research have limited time, an ability to speed up the reviewing of material by a factor of 10 or 20 is pretty significant.

Back in 2012, I tested several pieces of search and indexing software to see which made it quickest and/or easiest to find and search through the UFO material on my hard-drive.

My collection of digitised material has been growing exponentially in the last few years (to include many books, journals, magazines, official documents, archives of email discussion lists, catalogues, indexes and other material), particularly since finding a few ways to search this material more efficiently gave me an incentive to expand the collection further.

Obviously, I'm not able to put all this material online, for numerous reasons (not the least of which are copyright issues). I have, however, been able to upload over 2 million pages of material after getting relevant permissions.  Moreover, I'm happy to share some tips and techniques which may help others to search their own collections more efficiently and effectively.

I had previously been interested in finding efficient ways of searching for UFO material online. In particular, I spent a considerable amount of time developing various customised search engines (mainly using Google's free Google Custom Search service) to search some of the better UFO websites in a single search. I was not happy with the results of those efforts, particularly because the index used by the Google Custom Search service is more limited than the index used by the main Google search service. For that reason, I will refrain from posting links to the various customised search engines I made.

Because of the limitations I found with the Google Custom Search and because quite a bit of UFO material is not available online, I turned my focus to searching UFO material on my hard-drive. 

I was very pleased to find software which allowed fast searches of multiple PDF files on my hard-drive (with an ability to specify which files or folders were to be searched). While various items of software can now perform this task, I have found the free version of the PDF-Xchange Editor to be the fastest and most useful option I have tried.

I subsequently compared that piece of software with the Copernic Desktop Search software (helpfully mentioned to me by Chris Aubeck on the EuroUFO List) and some other indexing software, including dtSearch, recommended to me by Maurizio Verga on the EuroUFO List.

I found both of these pieces of software useful, for different types of searches. However, since 2012 I have increasingly focused on the PDF-Xchange Editor for reasons I outline below.


(1) Cost:
The free version of PDF-Xchange Editor does everything I want to use it for (including searching large collections of files) while Copernic Desktop Search is not free to use in relation to collections larger than 2GB. 

The cost of Copernic is not extremely high, but I'm sure it may deter some people who are used to free searches online.


(2) Types of files searched:
Copernic Desktop Search is not limited to searching PDF files (on my hard-drive it also searched, for example, Microsoft Word/Excel files), while PDF-Xchange Editor (as its name may imply) is limited in this way.

Since (for reasons outlined below) I generally prefer using PDF-Xchange Editor, I developed a fairly strong incentive to convert as much digitised material as possible from Word documents etc. to PDF format. Some of you will have noticed, for example, that I've been seeking to convert the archives of various email discussion lists to PDF format to enable me to use PDF-Xchange Editor to search those archives (amongst other material).

Of course, it goes without saying (which will not stop me saying it...) that both pieces of software can only search digitised information. Neither is going to help with the piles of books and documents which I haven't scanned. Again, this has given me an incentive to work to increase the amount of UFO material which is digitised. Now, as at 2021, most UFO books, UFO newsletters and many case files have been digitised.


(3) Initial set-up time:
Copernic Desktop Search can take quite a while to produce an initial index. I had to leave one of my computers alone for about 4 days for an index of its 500GB hard-drive to be compiled.

PDF-Xchange Editor does not create any index - it needs to run through each specified file/folder each time a search is performed. This means it is quicker to set up.
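To make the "no index" approach concrete, here is a minimal sketch. It assumes the text of each PDF has already been extracted into a Python dictionary mapping filename to text (real tools read the PDF files themselves), and `scan_search` is a hypothetical name. It illustrates the general technique only; it is not how PDF-Xchange Editor is actually implemented. Every query re-reads every document and counts the matches in each.

```python
import re


def scan_search(documents, phrase):
    """Hypothetical no-index search: every document is re-read on every query.

    `documents` maps a filename to its already-extracted text (an assumption;
    real tools read the PDFs themselves). Returns {filename: match_count}.
    """
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    results = {}
    for name, text in documents.items():  # cost grows with total text size
        count = len(pattern.findall(text))
        if count:
            results[name] = count
    return results


# Tiny made-up collection, purely for illustration.
docs = {
    "case_file_1967.txt": "An astronomer saw a light. The astronomer later said it was Venus.",
    "newsletter_1973.txt": "A meteorologist suggested a weather balloon.",
}
print(scan_search(docs, "astronomer"))  # {'case_file_1967.txt': 2}
```

The trade-off is the one described above: there is nothing to build in advance, but the work done by each search grows with the total amount of text being searched.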


(4) Speed of obtaining search results:

Copernic is MUCH faster at producing a list of search results. Results are virtually instantaneous.
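The reason indexing software can answer almost instantly is that the slow reading happens once, up front. A minimal sketch of an inverted index, again assuming the text has already been extracted into a dictionary mapping filename to text, and using hypothetical functions `build_index` and `lookup`, shows the general idea. This is a generic illustration of the technique, not a description of how any particular desktop search product is built.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Hypothetical inverted index mapping each word to {filename: count}.

    Every document is read once, up front (the slow part); later
    single-word lookups never touch the documents again.
    """
    index = defaultdict(lambda: defaultdict(int))
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word][name] += 1
    return index


def lookup(index, word):
    """Return {filename: count} for a single word via a dictionary read."""
    return dict(index.get(word.lower(), {}))


# Tiny made-up collection, purely for illustration.
docs = {
    "case_file_1967.txt": "An astronomer saw a light. The astronomer later said it was Venus.",
    "newsletter_1973.txt": "A meteorologist suggested a weather balloon.",
}
index = build_index(docs)           # built once, like an initial index run
print(lookup(index, "astronomer"))  # {'case_file_1967.txt': 2}
```

This toy index only answers single-word queries; supporting phrase searches such as "weather balloon" would also require storing the position of each word.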

A search of a sizeable collection using PDF-Xchange Editor can take quite a few minutes (or even hours when I specify a search of my entire collection of UFO material). 

As at 2021, a full search of my higher-priority UFO items (about 1 TB) takes about 24 hours to run. (I have other hard drives holding over 40 TB of material, but 1 TB is sufficient for most of the key text material.)



(5) Speed of REVIEWING search results:

I have found it MUCH easier and quicker to go through the results of searches in PDF-Xchange Editor.

The search results in PDF-XChange Editor indicate how many times the relevant keyword or phrase appears in any particular document (with a helpful snippet of surrounding words, which often allows you to eliminate many of the results) and allow you to click on each one in turn very quickly, with the relevant page being displayed almost instantly.
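A snippet of surrounding words is a form of keyword-in-context (KWIC) listing. The minimal sketch below assumes plain extracted text and uses a hypothetical `keyword_in_context` function that keeps a few words either side of each match; it is not PDF-XChange Editor's own code, just an illustration of the kind of snippet that makes quick elimination possible.

```python
def keyword_in_context(text, phrase, width=4):
    """Return one snippet per match: the match plus up to `width` words each side.

    Hypothetical helper; matching ignores case and strips common
    punctuation from the ends of each word before comparing.
    """
    words = text.split()
    target = phrase.lower().split()
    n = len(target)
    snippets = []
    for i in range(len(words) - n + 1):
        window = [w.lower().strip(".,;:!?\"'()") for w in words[i:i + n]]
        if window == target:
            start = max(0, i - width)
            snippets.append(" ".join(words[start:i + n + width]))
    return snippets


# Made-up example text, purely for illustration.
text = ("The witness called a local astronomer who said the light was Venus. "
        "A second astronomer disagreed.")
for snippet in keyword_in_context(text, "astronomer", width=3):
    print(snippet)
# called a local astronomer who said the
# Venus. A second astronomer disagreed.
```

Scanning down a list like this is what lets you discard irrelevant hits without opening each document.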

Trying to review the results of a search on Copernic Desktop Search is, relatively speaking, a pain in the backside. There is a preview window which displays the first relevant occurrence of a keyword/phrase within a document when you highlight that document's filename, but I've found that preview window to be relatively slow and the formatting of text in that preview window is often almost unreadable.



CONCLUSION:
I think that some of the discussions of the potential application of sexy AI tools to UFO data risk causing people to overlook some very simple to use (but nonetheless effective and efficient) search tools, e.g. PDF-XChange Editor.

Heck, I much prefer using some basic search tools (particularly those built into PDF-XChange Editor) even over indexing software such as Copernic Desktop Search. Generally, I'd rather wait a few minutes (or even hours) for PDF-Xchange Editor to produce its search results and then zip through those results very quickly and easily. I've found it possible to use such tools to review relevant search results 10 to 20 times faster than Google searches of online material.

I can start a search on PDF-XChange Editor and carry on with other tasks on my computer (or simply start a search before going to bed or before going out for a meal) and review search results when they are ready. There isn't usually any massive urgency about getting results of a search regarding UFO material, so I tend to use PDF-XChange Editor because reviewing the results of a search takes up less of my (limited...) spare time. 

To some extent, the most appropriate piece of software depends on the type of search. If there are likely to be a lot of results (e.g. for "astronomer" or "meteorologist"), I'd focus on the ease/speed of reviewing results; if I wasn't sure there would be many (or any) results, I used to try a quick search using Copernic Desktop Search first. Since it is not always possible to know in advance how many results will be found, I have increasingly used only PDF-Xchange Editor.

I'll post more practical detail about using PDF-Xchange Editor (including screen shots) in a separate item shortly, before posting more about various UFO databases and AI tools. I'd prefer, at least for these initial posts, to "Keep It Simple, Stupid".



1 comment:

  1. I just read a similar post of yours on ATS yesterday, 11-18-21. I downloaded it, followed the directions you gave and was surprised at what I found in my own records. Thanks for the heads up.
