Welcome to phraSEarch™: A Utility for Helping You Search and Retrieve Information, Not Just Links-to-Information

User Guide
Search engines return links, but phraSEarch™ returns a document containing the context that surrounds every search word found, extracted from the source files.
The purpose of phraSEarch™ is:
· To Search through a directory structure (e.g. C:\My Documents, C:\My E-Books, C:\My Clients, C:\My Genealogy, C:\Recipes, C:\Program Files). It is ideal for off-line browsing of stripped web sites, and for searching through e-books of classical literature from sites such as Project Gutenberg’s 13,000 volumes;
· To Scan common document types, such as .doc MS Word documents, .htm Web pages, .ppt PowerPoint presentations, .txt Text files, .rtf Rich Text Format files, .xml Instant Messaging file archives, … [Note: .htm implies .htm, .shtml, .html, .asp and .xml];
· To Locate multiple phrases or keywords within any of the identified files;
· To Select the context that precedes and follows those located phrases;
· To Build a single browseable web page, made up of the contexts from all files that contain the phrases or keywords, noting the file names from which the contexts were selected;
· To Highlight words in the resulting .htm document that the user chooses to highlight, whether they are search words or not; and
· To Archive Search Results automatically.
Compared to a Traditional Search Engine or Windows Explorer Search
When you search with search engines or Windows Explorer, you only get a list of documents that contain the search object. You must then open the documents one by one and search (again) for your keywords or phrases in each document.
Now, with phraSEarch™, you get back a single document (an HTML web page) that contains:
· The full-path filenames of all files that contain your search phrases;
· The immediate context before and after each found search phrase; and
· Highlighted keywords and phrases, to help while browsing the results.
But There’s More:
· You can specify the directory structure to be searched;
· You can specify the amount of context to report, before and after your search phrases;
· You can specify the full path for the output results file;
· You can specify whether to search sub-directories; and
· You can specify the file types that are of interest (.txt .doc .ppt .htm .rtf .xml any).
Example situations suggesting a search, and the results from phraSEarch™:
· From files in the C:\Program Files directory: Assume that you are interested in software in the C:\Program Files directory that interfaces with SQL databases. You want to scan all of the text documentation in the C:\Program Files directory, together with its sub-directories, for the acronym “SQL,” and highlight all occurrences of “SQL” and “Database” in red. You use phraSEarch™ and your results look like this. Note two things. First, the first entry is the SEARCH.TXT file, showing you the search parameters. Second, in the third file context from the end, notice how the total size of the context expands as additional search phrases are encountered within the context range of a previous hit.
· From Off-line Browsing of a Website: You are interested in the subject of “statistical validity.” Having searched the Internet for sites with related content, you are curious to know what a particular site has to offer on the subject. You search your downloaded website and produce the following file.
· From MSN Messenger conversations: You are on your way to Kiev, Ukraine, and want to refresh your memory of several MSN Messenger conversations about the comings and goings in Kiev. You search for “Kiev” in your MSN_Messenger directory, using only .xml files, and phraSEarch™ produces this file.
· From Private e-Book Collections: You have a collection of Classics (Mark Twain, Shakespeare, various Poets, Jane Austen, Jack London, Cervantes, Lewis Carroll, etc.) on your computer’s disk drive. You want to find a quotation, and you remember that the word “Vexed” is within the quotation. Using phraSEarch™, you find the following file.
· From Files While Preparing for a Presentation: You must discuss the “Law of Requisite Variety” in a presentation tomorrow. You know that you have several web pages on your website that address the subject. To ensure that you do not overlook one of your usual examples, you use phraSEarch™ to scan your personal website files on disk, picking out all the pages containing “Requisite Variety.”
· From Recipe Files Collected Over the Years, or Browsing Offline: You are having a macadamia nut attack. You know that there is a recipe for a dark chocolate tart with bourbon. You remember Martha Stewart drooling over these tarts, so you decide to search your Martha directory. But you find that the filenames contain only general categories and numbers (like DESS1234). It is phraSEarch™ time. After using phraSEarch™, you find the recipe’s filename in the following file. However, on inspection, you notice that all the HTML formatting has been stripped from the context. So you check the “Keep HTML Formatting” check box and re-run phraSEarch™, producing the following formatted file. In this file the recipe is usable as is. Yum!
· For Those Who May Be Visually Impaired With Respect to Color: You may be visually impaired and have trouble with reds and greens, or you may just want a different highlighting color in your output. Either way, you have nine highlighting colors from which to choose: Red, Purple, Orange, Rose, Lavender, Lime, Green, Aqua and Blue.
· Users who find Blue highlighted key phrases easier to read than Red highlighted key phrases (or any other pair) may switch colors. Using the macadamia attack example, you can use the phraSEarch™ >Options>Preferences>Color for Highlighting Output pull-down menu to select the color Blue, with the following results.
Starting the phraSEarch™ program:
· Start the phraSEarch™ program either by double-clicking the icon on your computer’s desktop (placed there at installation) or by double-clicking the filename C:\Phrasearch\SEARCH.exe.
· The phraSEarch™ form will appear, as follows (containing different arguments):

The results from the
Shakespeare search are here.
· On startup, depending on your display settings, you may need to increase or decrease the screen font size. To do this, use the pull-down menu >View>Decrease Form Font Size or >View>Increase Form Font Size. Wait a second between each increase or decrease, and watch for the labels down the left of the form to assume the proper shape. You may move or resize the form if you like.
Setting up the phraSEarch™ search:
· If you have not yet used the Tutorial, we recommend that you use the pull-down menu >Help>Tutorial early in your experience.
· Enter (or Browse for) the full path in the Search Off-Line Directory: field.
· The phraSEarch™ program will search sub-directories if you check the Sub-Directories? check box. Consequently, you should not keep results files in the directory structure that you are searching. Otherwise, you may get additional results from the existing results files.
· Check the appropriate File Types for your search. This setting determines which files in the directory structure will be searched for search phrases. The program will NOT search binary files (e.g. .PDF or email inbox files). Please see Tips for more advanced use of the file types.
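The File Types and Sub-Directories? settings together decide which files get scanned. The Python sketch below shows one way such a selection could be made. It is an illustration, not phraSEarch™’s own code. The names `select_files` and `TYPE_EXTENSIONS` are hypothetical. The only detail taken from this guide is that .htm implies .htm, .shtml, .html, .asp and .xml.

```python
from pathlib import Path

# Hypothetical mapping from File Types check boxes to file extensions.
# Per this guide, .htm implies .htm, .shtml, .html, .asp and .xml.
TYPE_EXTENSIONS = {
    ".txt": {".txt"},
    ".doc": {".doc"},
    ".ppt": {".ppt"},
    ".htm": {".htm", ".shtml", ".html", ".asp", ".xml"},
    ".rtf": {".rtf"},
    ".xml": {".xml"},
}


def select_files(root, checked_types, include_subdirs):
    """Return the files under root whose extensions match the checked types.

    Hypothetical helper: types not in TYPE_EXTENSIONS are ignored, so
    binary files such as .pdf are never selected.
    """
    wanted = set()
    for file_type in checked_types:
        wanted |= TYPE_EXTENSIONS.get(file_type, set())
    root = Path(root)
    # rglob walks every sub-directory; glob looks at the top level only.
    candidates = root.rglob("*") if include_subdirs else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in wanted)
```

For example, `select_files(r"C:\My Documents", [".txt", ".htm"], include_subdirs=True)` would list the text and web-page files in that whole tree.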
· Enter descriptive character strings in the Search Phrases: field. There are two styles of delimiters:
  · You may use the familiar “double quotes” for phrases and separate single search words with spaces.
  · Or, you may begin each search word or phrase with the Pound Symbol (#), which will be used as a Word/Phrase Separator.
You may enter as many search words or phrases (with embedded spaces) as required. The following search strings are equivalent:
  · Mercury Venus Mars Jupiter “By Jove” Saturn Neptune
  · #Mercury#Venus#Mars#Jupiter#By Jove#Saturn#Neptune
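The two delimiter styles can be illustrated with a small parser. This is a hypothetical Python sketch; `parse_phrases` is not part of phraSEarch™. It makes two assumptions. First, typographic “curly” quotes are treated like straight double quotes. Second, an entry that starts with # uses the Pound style throughout.

```python
import re


def parse_phrases(entry):
    """Split a Search Phrases entry into a list of words and phrases.

    Hypothetical helper supporting the two delimiter styles described in
    this guide: "double quotes" plus spaces, or a leading # before each item.
    """
    entry = entry.strip()
    # Treat typographic quotes like straight double quotes (an assumption).
    entry = entry.replace("\u201c", '"').replace("\u201d", '"')
    if entry.startswith("#"):
        # Pound style: '#' separates items; embedded spaces are kept.
        return [item.strip() for item in entry.split("#") if item.strip()]
    # Quote style: a quoted run is one phrase; everything else splits on spaces.
    phrases = []
    for quoted, word in re.findall(r'"([^"]*)"|(\S+)', entry):
        item = (quoted or word).strip()
        if item:
            phrases.append(item)
    return phrases
```

Both sample strings from this guide produce the same seven-item list, with By Jove kept as a single phrase.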
· Enter appropriate numbers of characters in the Characters Before: and Characters After: fields. The phraSEarch™ program will build a “context” around each search word or phrase that is found, using the specified number of characters before and after it. Each context is then added to the resulting output file. The Characters After range of a located search phrase may include another search phrase (the same one or a different one). In that case, the context is extended from the new occurrence by the Characters After value. This eliminates redundant results by consolidating all phrases found within the user-specified range.
· The phraSEarch™ program will rubricate (highlight with red characters) any words or phrases of your choice that are within the contexts. Search phrases and words are highlighted automatically. Enter additional descriptive character strings in the More Phrases to Highlight: field, delimiting phrases and words as described above. You may enter as many rubrication words or phrases as desired, and they do not have to be related to the search phrases. Embedded spaces are OK, e.g.:
  · “by Jupiter” “plated Mars” “Venus’ doves”
  · #by Jupiter#plated Mars#Venus’ doves
Note: All of the words and phrases from both the Search Phrases: and the More Phrases to Highlight: input fields are collected first. Then, three versions of each word or phrase are formed, e.g.:
  · all of the words and phrases in lowercase;
  · ALL OF THE WORDS AND PHRASES IN UPPERCASE; and
  · All Of The Words And Phrases Capitalized.
Results containing words and phrases in any of the three forms will be highlighted.
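A minimal Python sketch of the three-version rule and a highlighting pass follows. `case_variants` and `highlight` are hypothetical names. The `<span style="color:...">` markup is an assumption, not the markup phraSEarch™ actually writes. Under this rule, a match such as "sQL" is not in any of the three forms, so it stays unhighlighted.

```python
import html
import re


def case_variants(phrase):
    """The three forms this guide describes: lowercase, UPPERCASE, Capitalized."""
    lower = phrase.lower()
    capitalized = " ".join(w[:1].upper() + w[1:] for w in lower.split(" "))
    return {lower, phrase.upper(), capitalized}


def highlight(context, phrases, color="red"):
    """Return context as HTML, wrapping every case variant in a colored span.

    Hypothetical sketch: plain text between matches is HTML-escaped,
    so characters such as '<' in the source cannot break the page.
    """
    variants = set()
    for phrase in phrases:
        variants |= case_variants(phrase)
    variants.discard("")
    if not variants:
        return html.escape(context)
    # Longest variants first, so a long phrase wins over a shorter one inside it.
    pattern = re.compile("|".join(re.escape(v) for v in sorted(variants, key=len, reverse=True)))
    pieces, last = [], 0
    for match in pattern.finditer(context):
        pieces.append(html.escape(context[last:match.start()]))
        pieces.append(f'<span style="color:{color}">{html.escape(match.group(0))}</span>')
        last = match.end()
    pieces.append(html.escape(context[last:]))
    return "".join(pieces)
```

Passing `color="blue"` corresponds to the Blue choice from the Color for Highlighting Output preference.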
· Enter (or Browse for) the full path in the Output Directory: field.
· Enter an output filename in the Output Filename: field.
· Check the Overwrite OK?: check box if it is OK to replace an existing file of the same name with the newer search results. If you do not check this box, a dialog box will appear to guide you through saving the results.
Executing the phraSEarch™ search:
· Start the search by clicking the Search button or by pressing the Enter key.
· Watch the Messages: field while the phraSEarch™ program is running. While the software is building very long directory structures, a moving arrow display (-----<<<---) appears in the Messages: area.
· In large directory structure searches, there may be a brief delay between building the directory structure and scanning the files, while the software eliminates files that do not match the requested file types.
· While phraSEarch™ is searching, the names of the files being searched display (very rapidly) in the Messages: field. If you want to know which files were searched, use the pull-down menu >Options>Preferences to toggle from Track Filenames=Off to Track Filenames=On. When it is On, the file C:\Phrasearch\FileNameLog.txt will contain the full-path names of all files searched.
· You will know that the search is complete when a message similar to the following appears:
Wrote ( 366kb): C:\Phrasearch\Topical\Shakespeare_Planets.htm
Viewing the phraSEarch™ Results:
· Press the View Results button. You may also use the pull-down menu option >View>View Results With Browser.
· On viewing the results, you will sooner or later recognize words that you wish you had highlighted. The Re-Highlight button is enabled after each output file is written. Add those words or phrases to the ones to be highlighted and click the Re-Highlight button. The new highlighting is applied almost instantaneously, without re-searching the web site or directory structure. This makes cleaning up the output more efficient and more rewarding.
· Of course, if you decide that you should have searched on more words or phrases, you will need to add them to the Search Phrases: field and click the Search button rather than the Re-Highlight button. The new search is still efficient: the program saves the directory structure, so it does not have to generate it again.
· Your search may have found nothing (i.e. the resulting file contains only the title), with few (or no) files appearing to be searched as you watched the message area. If so, you may have forgotten to check the Sub-Directories? check box.
· You control the search parameters that define how many characters are selected before and after each hit. Because of this, there is a small chance that you will select partial formatting control characters from searched .htm, .doc, and .rtf documents. These selected strings, including their partial formatting control characters, will then be placed into the resulting .htm file. They may inadvertently become part of the formatting that you see when your browser displays the results, and you may get strange results. If you want to remove all HTML formatting strings from the searched documents, un-check the Keep HTML Formatting check box. This will not affect the standard formatting (red highlighting, file names, etc.) of the normal output file. You may also change the Characters Before: and Characters After: values to avoid or work around control characters from the original documents.
Technical Support for phraSEarch™
· If you wish to recommend changes or report errors, we welcome your emailing them to us. With our apologies, you must type the email address from the following image. This is necessary for us to avoid spam email produced by robots:
· If the program’s form simply vanishes from your display monitor, the program is likely searching a file that contains program control information, or a file that is too large for the program to handle. There are a couple of things that you can do to work around this problem. First, use the pull-down menu item >Tools>Preferences>Track Filenames, a switch that can be toggled on and off. When it displays as Track Filenames =ON, each full-path filename searched is added to [C:]\Phrasearch\FileNameLog.txt. Then:
a. Start the program;
b. Turn on the logging switch; and
c. Re-run the same search. Use the pull-down menu >File>Reload Search to retrieve the exact same parameters.
After the program’s form vanishes, open the file [C:]\Phrasearch\FileNameLog.txt and look at the last filename. There may be a few files in the list, with the program apparently having died partway through the directory. If so, the last file in the list likely contains control information that is causing the problem, or is simply too large. Move that file to a different directory (outside of the directory search path) and try again. This has been reported only once, and in that case the file was a 200+MB PowerPoint Slide Show of images from a summer vacation.
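Finding the last filename in the log can also be scripted. The Python sketch below is a hypothetical helper, not part of phraSEarch™. It assumes the log holds one full-path filename per line, which this guide does not state explicitly.

```python
from pathlib import Path


def last_logged_file(log_path=r"C:\Phrasearch\FileNameLog.txt"):
    """Return the last non-blank line of the filename log, or None if it is empty.

    Assumes one full-path filename per line (not confirmed by the guide).
    """
    lines = Path(log_path).read_text(errors="replace").splitlines()
    for line in reversed(lines):
        if line.strip():
            return line.strip()
    return None
```

The returned path is the candidate file to move outside the directory search path before re-running the search.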
If that does not fix the problem, try changing the Characters Before: and Characters After: values. This may prevent partial HTML formatting from being used as control information.
If you wish to help us improve our product, please consider emailing us the C:\phraSEarch\SearchParms\SEARCH_xxxx.TXT file, as well as the offending file indicated at the end of the C:\Phrasearch\FileNameLog.txt list. We cannot guarantee to fix your problem, but we will try.
Thank you for using phraSEarch™
Copyright © 2004, 2005 APL Consultants of Houston. All Rights Reserved.