Welcome to

phraSEarch$

 

A Utility for Helping You Search and Retrieve Information

Not just Links-to-Information

 


·        Search engines return links, but now

·         phraSEarch returns a document containing the contexts that  surround every search word found, extracted from the source files.

 

The purpose of phraSEarch$ is:  

$   To Search through a directory structure (e.g. C:\My Documents, C:\My E-Books, C:\My Clients, C:\My Genealogy, C:\Recipes, C:\Program Files). It is Ideal for Off-line Browsing of stripped web sites, and for searching through e-books of classical literature, from sites such as Project Gutenberg’s 13,000 volumes.

$   To Scan common document types, such as .doc MS Word documents, .htm Web pages, .ppt PowerPoint presentations, .txt Text files, .rtf Rich Text Format files, .xml Instant Messaging file archives, … [Note: .htm implies .htm .shtml .html .asp and .xml ]

$   To Locate multiple phrases or keywords within any of the identified files;

$   To Select the context that precedes and follows those located phrases;

$   To Build a single browseable web page, made up of the contexts from all files that contain the phrases or keywords, noting the file names from which the contexts were selected;

$   To Highlight words in the resulting .htm document that the user chooses to highlight, whether search words or not; and

$   To Archive Search Results Automatically.

 

Compared to a traditional search engine or Windows Explorer Search

When searching with search engines or Windows Explorer, you get only a list of documents that contain the search object. You must then open the documents, one by one, and search again for your keyword phrases in each document.

Now, with phraSEarch$, you get back a single document (HTML web page) that contains:

$   The full path filenames for all files that contain your search phrases;

$   The immediate context, before and after each search phrase found; and

$   Highlighted multiple keyword phrases to help while browsing the results.

But, There’s More:

$   You can specify the directory structure to be searched;

$   You can specify the amount of context to report, before and after your search phrases;

$   You can specify the full path for the output results file;

$   You can specify whether to search sub-directories; and

$   You can specify the file types that are of interest (.txt  .doc  .ppt  .htm  .rtf  .xml  any)

Example situations suggesting a search—and the results from phraSEarch$:

$   From files in the C:\Program Files directory:

          Assume that you are interested in software in the C:\Program Files directory that interfaces with SQL databases. You want to scan all of the text documentation in the C:\Program Files directory, together with its sub-directories, for the acronym, “SQL,” and highlight in red, all occurrences of “SQL” and “Database.” You use phraSEarch$ and your results look like this. Note two entries. The first entry is the SEARCH.TXT file, showing you the search parameters. Also, in the third file context from the end, notice how the total size of the context expands, as additional search phrases are encountered within the context range of a previous hit.

$   From Off-line Browsing of a Website: 

          You are interested in the subject of “statistical validity,” and having searched the Internet for sites with related content, you are curious to know what a particular site has to offer on the subject. You search your downloaded website, and produce the following file.

$   From MSN Messenger conversations: You are on your way to Kiev, Ukraine, and want to refresh your memory of several MSN Messenger conversations, telling the comings and goings in Kiev. You search on “Kiev” in your MSN_Messenger directory, using only .xml files, and produce this file.

$   From Private e-Book Collections: You have a collection of Classics (Mark Twain, Shakespeare, various Poets, Jane Austen, Jack London, Cervantes, Lewis Carroll, etc.). This collection is on your computer’s disk drive. You want to find a quotation, and remember that the word “Vexed” is within the quotation. On using phraSEarch$ you find the following file.

$   From Files While Preparing for a Presentation:  You must discuss the “Law of Requisite Variety” in a presentation, tomorrow. You know that you have several web pages on your website that address the subject. To ensure that you do not overlook one of your usual examples, you use phraSEarch$ to scan your personal website files, on disk, picking out all the pages containing “Requisite Variety.”

$   From Recipe Files Collected Over the Years, Or Browsing Offline:  You are having a macadamia nut attack. You know that there is a recipe for a dark chocolate tart, with bourbon. You remember Martha Stewart’s drooling over them, so you decide to search your Martha directory. But, you find that the filenames contain general categories and numbers (like DESS1234). It is phraSEarch$ time. And, after using phraSEarch$ you find the recipe’s filename in the following file. However, on inspection, you notice that the HTML formatting has all been stripped from the context. So, you check the “Keep HTML Formatting” check box and re-run phraSEarch$ making the following formatted file. In this file the recipe is useable—as is. Yum!

$   For Those Who May Be Visually Impaired With Respect to Color:  Whether you are visually impaired, and have trouble with reds and greens, or whether you just want to add a different highlighting color to your output, you have nine highlighting colors from which to choose: Red  Purple  Orange  Rose  Lavender  Lime  Green  Aqua  and Blue.

$   Those users who find that reading Blue Highlighted Key Phrases is easier than reading Red Highlighted Key Phrases, etc., may switch colors. Using the macadamia attack example, you can use the phraSEarch$ >Options>Preferences>Color for Highlighting Output pull-down menu to select the color Blue, with the following results.  

 

Starting the phraSEarch$ program:

$   Start the phraSEarch$ program by either double-clicking the phraSEarch$ icon on your computer’s desktop (placed there at installation) or double-clicking the filename C:\Phrasearch\SEARCH.exe.

$   The phraSEarch$  form will appear, as follows (containing different arguments):

            The results from the Shakespeare search are here.

$   On startup, depending on your display settings, you may need to increase or decrease the screen font size. To do this, use the pull-down menu >View>Decrease Form Font Size or >View>Increase Form Font Size. Wait a second between each increase or decrease. Watch for the labels, down the left of the form, to assume the proper shape. You may move or resize the form, if you like.

 

Setting up the phraSEarch$ search:

$   If you have not yet used the Tutorial, we recommend that you use the pull-down menu >Help>Tutorial early in your experience.

$   Enter (or Browse for) the full path in the Search Off-Line Directory: field.

$   The phraSEarch$ program will search sub-directories, if you check the Sub-Directories? check box. Consequently, you should not have results files in the directory structure that you are searching. Otherwise, you may get additional results from the existing results files.

$   Check appropriate File Types for your search. This setting will be used to discriminate between which files in the directory structure will or will not be searched for search phrases. The program will NOT search binary files (e.g. .PDF or email inbox files). Please see Tips for more advanced use of the file types.

$   Enter descriptive character strings in the Search Phrases: field. There are two styles of delimiters. You may use the familiar “double quotes” for phrases, and separate single search words with spaces. Or, you may begin each search word or phrase with the Pound Symbol (#), which will be used as a Word/Phrase Separator. You may enter as many search words or phrases (with imbedded spaces) as required. The following search strings are equivalent:

·         Mercury Venus Mars Jupiter “By Jove” Saturn Neptune

·         #Mercury#Venus#Mars#Jupiter#By Jove#Saturn#Neptune 

$   Enter appropriate numbers of characters in the Characters Before: and Characters After: fields. The phraSEarch$ program will build a “context” around each search word or phrase that is found, using the specified number of characters before and after it. Then, each context is added to the resulting output file. In cases where the number of characters after a located search phrase includes another search phrase (same or different), the context is extended from that new occurrence of a search phrase by the Characters After value. This eliminates redundant results by consolidating all phrases found within the user specified values.

$   The phraSEarch$ program will rubricate (highlight with red characters) any words or phrases of your choice, that are within the contexts. Search phrases and words are automatically highlighted. Enter additional descriptive character strings in the More Phrases to Highlight: field. Delimit phrases and words as described above. You may enter as many rubrication words or phrases as desired. The words and phrases do not have to be related to the search phrases. Imbedded spaces are OK, e.g.:

·         “by Jupiter” “plated Mars” “Venus’ doves”

·         #by Jupiter#plated Mars#Venus’ doves
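The two delimiter styles described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the program's actual code: the function name parse_phrases is hypothetical, and accepting both straight and curly double quotes is an assumption.

```python
import re

def parse_phrases(field: str) -> list[str]:
    """Split a Search Phrases: style field into individual words and phrases.

    Hypothetical helper; it only illustrates the two delimiter styles.
    """
    text = field.strip()
    if text.startswith("#"):
        # Pound style: every '#' starts a new word or phrase; inner spaces are kept.
        return [part.strip() for part in text.split("#") if part.strip()]
    # Quote style: quoted runs are phrases, everything else splits on spaces.
    # Straight and curly double quotes are both accepted (an assumption).
    tokens = re.findall(r'["“]([^"”]*)["”]|([^\s"“”]+)', text)
    return [quoted.strip() or bare for quoted, bare in tokens if quoted.strip() or bare]

if __name__ == "__main__":
    print(parse_phrases("Mercury Venus Mars Jupiter “By Jove” Saturn Neptune"))
    print(parse_phrases("#Mercury#Venus#Mars#Jupiter#By Jove#Saturn#Neptune"))
```

Both calls produce the same list of seven entries, with “By Jove” kept together as one phrase.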

Note: All of the words and phrases from both the Search Phrases: and the More Phrases to Highlight: input fields are collected first. Then, three versions of each word or phrase are formed, e.g.:

·        all of the words and phrases in lowercase;

·        ALL OF THE WORDS AND PHRASES IN UPPERCASE; and

·        All Of The Words And Phrases Capitalized.

Results containing words and phrases in any of the three forms will be highlighted.
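As a rough illustration of the note above, the following Python sketch forms the three versions of each phrase and wraps any match in color markup. The function names and the `<font>` tag are assumptions made for the example; the markup that phraSEarch$ actually writes is not documented here.

```python
import re
import string

def case_variants(phrase: str) -> set[str]:
    """Return the lowercase, UPPERCASE and Capitalized forms of a phrase."""
    return {phrase.lower(), phrase.upper(), string.capwords(phrase)}

def highlight(context: str, phrases: list[str], color: str = "red") -> str:
    """Wrap every occurrence of any case variant in color markup.

    The <font> tag is an assumed output format, chosen only for illustration.
    """
    variants = sorted({v for p in phrases for v in case_variants(p)},
                      key=len, reverse=True)  # try longer phrases first
    if not variants:
        return context
    pattern = re.compile("|".join(re.escape(v) for v in variants))
    return pattern.sub(
        lambda m: f'<font color="{color}">{m.group(0)}</font>', context)

if __name__ == "__main__":
    print(highlight("By Jove, by jove, BY JOVE, bY jOVE", ["By Jove"]))
```

Under this reading of the note, a mixed-case form such as “bY jOVE” matches none of the three versions and is left without highlighting.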

$   Enter (or Browse for) the full path in the Output Directory: field.

$   Enter an output filename in the Output Filename: field.

$   Check the Overwrite OK?: check box, if it is OK to replace any existing file with the specified filename with newer search results. If you do not check this box, then a dialog box will appear to guide you through the event of saving the results.
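The context rule described under Characters Before: and Characters After: can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the program's own logic: the function name build_contexts is hypothetical, matching is done on the exact text of each phrase, and a new hit is treated as "inside" the previous context when it begins before that context ends.

```python
import re

def build_contexts(text: str, phrases: list[str], before: int, after: int) -> list[str]:
    """Cut one context per hit, merging hits that fall inside an earlier context."""
    if not phrases:
        return []
    pattern = re.compile("|".join(
        re.escape(p) for p in sorted(phrases, key=len, reverse=True)))
    ranges: list[list[int]] = []
    for hit in pattern.finditer(text):
        start = max(0, hit.start() - before)
        end = min(len(text), hit.end() + after)
        if ranges and hit.start() < ranges[-1][1]:
            # The hit lies within the previous context: extend that context
            # by the Characters After value instead of starting a new one.
            ranges[-1][1] = max(ranges[-1][1], end)
        else:
            ranges.append([start, end])
    return [text[s:e] for s, e in ranges]

if __name__ == "__main__":
    sample = "x" * 10 + "SQL" + "y" * 5 + "Database" + "z" * 20 + "SQL" + "w" * 10
    for context in build_contexts(sample, ["SQL", "Database"], before=3, after=8):
        print(context)
```

In the sample, the first “SQL” and “Database” are close enough to share one extended context, while the second “SQL” is far enough away to get a context of its own.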

 

Executing the phraSEarch$ search:

$   Start the search by clicking the Search button or by pressing the Enter key.

$   Watch the Messages: field while the phraSEarch$ program is running. While the software is building very long directory structures, there will be a moving arrow display (-----<<<---) in the “Messages” area.

$   In large directory structure searches, there may be a brief delay between the building of the directory structure and the file scans, while the software eliminates files that do not match the requested file types. 

$   While phraSEarch$ is searching, the file names that are being searched display (very rapidly) in the Messages: field. If you want to know which files were searched, use the pull-down menu >Options>Preferences to toggle Track Filenames=Off to Track Filenames=On. When it is On, the file C:\Phrasearch\FileNameLog.txt will contain the full-path names of all files searched.

$   You will know that the search is complete when there is a message similar to the following:

            Wrote ( 366kb):  C:\Phrasearch\Topical\Shakespeare_Planets.htm
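The two phases mentioned above (building the directory structure, then eliminating files that do not match the requested file types) can be pictured with a short Python sketch. It is only an approximation of the idea; the function name list_candidate_files and its parameters are hypothetical.

```python
import os

def list_candidate_files(root: str, extensions: list[str],
                         include_subdirs: bool = True) -> list[str]:
    """Walk a directory structure and keep only files of the requested types."""
    wanted = {ext.lower() for ext in extensions}
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_subdirs:
            dirnames.clear()  # tells os.walk not to descend any further
        for name in sorted(filenames):
            # Compare extensions case-insensitively, so NOTES.TXT matches .txt.
            if os.path.splitext(name)[1].lower() in wanted:
                matches.append(os.path.join(dirpath, name))
    return matches

if __name__ == "__main__":
    for path in list_candidate_files(".", [".txt", ".htm"], include_subdirs=True):
        print(path)
```

Passing include_subdirs=False mirrors leaving the Sub-Directories? check box unchecked: only files directly in the starting directory are considered.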

 

Viewing the phraSEarch$ Results:

$   Press the View Results button. You may also use the Pull-Down Menu option:  >View>View Results With Browser.

$   On viewing the results, you will sooner or later have the experience of recognizing words that you wish you had highlighted. The Re-Highlight button is enabled after each output file is written. If you add those additional words or phrases to be highlighted, and click the Re-Highlight button, the highlighting will be accomplished almost instantaneously, using those new words and phrases, without having to re-search the web site or directory structure. This makes the process of cleaning up the output more efficient and more rewarding.

$   Of course, if you decide that you should have searched on more words or phrases, you will need to add those words and phrases to the Search Phrases: field, and click the “Search” button, rather than the “Re-Highlight” button. The program is efficient, in that it will not have to generate the directory structure on subsequent searches, having saved that structure for subsequent use. 

$   If you have found nothing with your search (i.e. the resulting file contains only the title), and few (or no) files appeared to have been searched, as you watched the message area, then, you may have forgotten to check the Sub-Directories? Check Box.

$   Since you are in control of the search parameters which define the number of characters to select for the output document, before and after, there is a small chance that you will select partial formatting control characters from searched .htm, .doc, and .rtf documents. These selected strings, including their (partial) formatting control characters, will then be placed into the resulting HTM file, and may inadvertently become part of the formatting that you view when the results are displayed by your browser. When this happens, you may get strange results. If you want to remove all HTML formatting strings from the documents that are searched, then un-check the Keep HTML Formatting Check Box. This will not preempt the standard formatting (red highlighting, file names, etc…) in the normal output file. You may also change the Characters Before: and Characters After: values in order to avoid or work around control characters from the original documents.
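To see why a character window can pick up partial formatting, consider this small Python sketch. It is an illustration only: the helper names are hypothetical, and the simple pattern used to remove tags is an assumption that does not cope with comments or scripts.

```python
import re

TAG = re.compile(r"<[^>]*>")  # simplistic: anything between '<' and '>'

def strip_tags(html_text: str) -> str:
    """Remove HTML tags before any context is cut from the text."""
    return TAG.sub("", html_text)

def window(text: str, phrase: str, before: int, after: int) -> str:
    """Cut a fixed number of characters around the first hit of a phrase."""
    index = text.find(phrase)
    if index < 0:
        return ""
    return text[max(0, index - before): index + len(phrase) + after]

if __name__ == "__main__":
    page = "<p>A <b>dark chocolate</b> tart</p>"
    print(repr(window(page, "chocolate", 6, 0)))              # cut lands inside <b>
    print(repr(window(strip_tags(page), "chocolate", 6, 0)))  # tags removed first
```

In the first call the window begins in the middle of the `<b>` tag, so a stray “>” ends up in the context. Removing the tags first, or choosing a different Characters Before: value, avoids the fragment.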

Technical Support for phraSEarch$

$   If you wish to recommend changes, or report errors, we welcome your emailing them to us. We apologize that you must type the email address from the following image; this is necessary for us to avoid spam email produced by robots:

 

                                              

$   If the program’s form simply vanishes from your display monitor, then there is likely a file that contains program control information being searched, or a file that is too large for the program to handle. There are a couple of things that you can do to work around this problem. First, use the Pull-Down Menu item >Tools>Preferences>Track Filenames, a switch that can be toggled on and off. When it displays as Track Filenames =ON, each full-path filename searched is added to [C:]\Phrasearch\FileNameLog.txt.

a. Start the program,

b. turn on the logging switch, and

c. re-run the same search.

         Use the pull-down menu, >File >Reload Search, to retrieve the exact same parameters.

         After the program’s form vanishes, open the file: [C:]\Phrasearch\FileNameLog.txt and look at the last filename. If there are a few files in the list, but it is apparent that the program died when only partway through the directory, then it is likely that the last file in the list contains control information that is causing the problem, or that it is simply too large. Move the file to a different directory (outside of the directory search path), and try again. This has only been reported once, in which case it was a 200+MB PowerPoint Slide Show of images from a summer vacation.

If that does not fix the problem, then try changing the Characters Before: and Characters After: values. This action may prevent partial HTML formatting from being used as control information. 
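If the log is long, a few lines of Python can pick out the last filename for you. This is a sketch under one assumption: that FileNameLog.txt holds one full-path filename per line. The function name last_logged_file is hypothetical.

```python
from pathlib import Path
from typing import Optional

def last_logged_file(log_path: str) -> Optional[str]:
    """Return the last non-empty line of the filename log, or None if it is empty."""
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return names[-1] if names else None

if __name__ == "__main__":
    log = r"C:\Phrasearch\FileNameLog.txt"
    if Path(log).exists():
        print("Last file searched:", last_logged_file(log))
    else:
        print("No log found; turn on Track Filenames and re-run the search.")
```

The file it reports is the one to move outside the directory search path before trying again.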

If you wish to help us improve our product, please consider emailing to us the C:\phraSEarch\SearchParms\SEARCH_xxxx.TXT file, as well as the offending file that is indicated at the end of the C:\Phrasearch\FileNameLog.txt list. We cannot guarantee to fix your problem, but we will try.

Thank you for using phraSEarch$

Copyright © 2004, 2005 APL Consultants of Houston. All Rights Reserved.