Overview of the Documents and the Site
- Where did these documents come from?
The vast majority of the documents came from the tobacco industry, and are the same documents available on the industry websites (see www.tobaccoarchives.com).
The Bliley collections came from the commerce committee.
- What is indexing, and who does it?
Indexing is the process of associating additional information with a document, such as title, author, subject, and abstract. Some indexing can be very complete, listing every person and organization named in a document. Other indexing is very fast, associating just one or two fields with a document.
Many of the documents have had basic indexing done by the tobacco industry, as part of the trial. However, this indexing lacked abstracts and subject categories. More complete indexing is being done now by research teams funded by the National Cancer Institute and other institutions.
- Do you have a Daily Document Newsletter?
Searching the Collections
- How do I search for more than one term at a time?
Like many other websites, TDO lets you combine search terms with "and" and "or". For example, to search for information about airline smoking bans, you can search for
airline and (restriction or ban)
(Note: turning stemming on will return results for airline and airlines, restriction and restrictions, ban and bans, etc.)
By default, "and" is inserted between consecutive words. That is, if you search for
airline smoking ban
you will get the same results as if you searched for
airline and smoking and ban
To search for a phrase, use quotation marks ("). That is, a search for
"airline smoking ban"
will find only documents in which those words appear next to each other and in order.
To find words near each other, use the w/ command. For example,
airline w/10 ban
finds all documents where the word airline is within 10 words of ban.
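The query forms described above can also be assembled programmatically. The following is a minimal Python sketch that only builds query strings to paste into the search form; the helper names (`all_of`, `any_of`, `phrase`, `near`) are hypothetical and are not part of TDO.

```python
def all_of(*terms: str) -> str:
    """Join terms with `and` (the same thing the search does by default)."""
    return " and ".join(terms)


def any_of(*terms: str) -> str:
    """Join terms with `or`, in parentheses so the group can be nested."""
    return "(" + " or ".join(terms) + ")"


def phrase(words: str) -> str:
    """Quote the words so they must appear next to each other and in order."""
    return f'"{words}"'


def near(left: str, right: str, distance: int) -> str:
    """Use the w/ command: `left` within `distance` words of `right`."""
    return f"{left} w/{distance} {right}"


if __name__ == "__main__":
    print(all_of("airline", any_of("restriction", "ban")))
    print(phrase("airline smoking ban"))
    print(near("airline", "ban", 10))
```

Running the script prints the three example queries from this answer, one per line.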
- What are stemming, fuzziness, and synonyms?
Each of these broadens your search by taking your search terms and changing them so that you get more results.
Stemming: Stemming tries variations of your search term, adding "s", "es", "ing", etc. Although it's not perfect, it generally finds most variations of English word forms.
Fuzzy Search: The OCR text of a document image is often slightly incorrect, but close. By turning fuzziness on, you can find words that are close to, but not exactly the same as, your search term.
Synonyms: Synonym expansion automatically broadens your search to include similar terms. For example, if you search for "youth" with synonym expansion on, it will find documents with "youth", "teen", "teenager", and "adolescent". Or if you search for "latino", it will also find "hispanic".
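To make the three options more concrete, here is a toy Python sketch of what each kind of broadening does to a search term. It is an illustration only, not how TDO implements these features: the suffix list and synonym table are taken from the examples in this answer, the function names are hypothetical, and the fuzzy matching uses Python's standard `difflib` module as a stand-in.

```python
import difflib

# Tiny synonym table built only from the examples given in this FAQ.
SYNONYMS = {
    "youth": ["teen", "teenager", "adolescent"],
    "latino": ["hispanic"],
}


def stem_variants(term):
    """Naive stemming: append common English suffixes to the term."""
    return [term] + [term + suffix for suffix in ("s", "es", "ing")]


def synonym_variants(term):
    """Return the term plus any synonyms listed in the toy table."""
    return [term] + SYNONYMS.get(term, [])


def fuzzy_matches(term, vocabulary, cutoff=0.8):
    """Find words in `vocabulary` that are close to `term`,
    e.g. OCR misreadings such as a digit 1 in place of the letter i."""
    return difflib.get_close_matches(term, vocabulary, n=5, cutoff=cutoff)


if __name__ == "__main__":
    print(stem_variants("restriction"))
    print(synonym_variants("youth"))
    print(fuzzy_matches("airline", ["airline", "airl1ne", "ban", "tobacco"]))
```

The naive suffix rule also produces some non-words, which is one reason stemming "is not perfect".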
- What does checking "All Details" in the search form do?
It returns much more detail about the documents you've searched for. If you don't check it, it just returns basic information -- the title, number of pages, author, recipient, type and characteristics, plus any field that matches your search term (e.g. the Named Organizations field, if any of your search terms show up there).
By default this is off, since these records can be quite long.
- What does checking "Show First Page" in the search form do?
It shows the first page of the document integrated with your search results. This can save lots of time, because you don't need to click on the document to see if it's relevant (if you can find that out from the first page).
- How can I restrict my search to words that are part of an author's name or a title? How can I search for a document from a particular date?
To search for words within a particular field, precede the word with the field name and a colon, like this:
author:smith title:analysis date:19811001
Dates are represented as 8-digit numbers in the format YYYYMMDD. You can use wildcards if you don't have an exact date. For example, to search for documents from October 1981 or from all of 1981, you can use these criteria:
date:198110*
date:1981*
These field names are available, though some are rare or not very useful:
abstract          fileset_code             privilege
additive          file_code_begin          product_type
affiliation       file_code_end            project
alias             file_number              prototype
area              full_text                publication
attachment        grant_number             quotes
attendee          hypothesis               rank
author            import_collection_code   recipient
bates_begin       import_document_code     referenced_document
bates_end         indexed_date             region
box_number        indexer_email            relevant_pages
brand             index_status             request
case_code         intended                 resource_code
changes           issue                    restricted
characteristic    job_title                results
client            keyword                  role
collection_code   language                 side
comment           lawyers_present          site
company           litigation               smoke_constituent
component         location                 source
components        location                 strategy
copied            major_subject            subject
court_reporter    marketing_type           synonyms
date              master_begin             target_market
date_loaded       master_end               team
date_produced     master_id                technology
depository_date   message                  testimony_date
description       minor_subject            thesaurus_term
document_code     named                    title
document_file     notes                    tobacco_type
ending_date       original_file            type
exhibits          page_count               url
expertise         page_range               witness
fact_type         payment                  witness_type
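The field and date syntax described in this answer is easy to get wrong by hand, especially the zero-padding in YYYYMMDD. A minimal Python sketch for building such criteria follows; the helper names `field_term` and `date_term` are hypothetical and not part of TDO, and the output is just a string to type into the search form.

```python
from datetime import date
from typing import Optional


def field_term(field: str, value: str) -> str:
    """Build a fielded criterion such as author:smith."""
    return f"{field}:{value}"


def date_term(year: int, month: Optional[int] = None,
              day: Optional[int] = None) -> str:
    """Build a date criterion in YYYYMMDD form, using a trailing *
    wildcard when the day (or the month and day) is unknown."""
    if month is None:
        if day is not None:
            raise ValueError("a day cannot be given without a month")
        return f"date:{year:04d}*"
    if day is None:
        date(year, month, 1)  # raises ValueError for an invalid month
        return f"date:{year:04d}{month:02d}*"
    date(year, month, day)  # raises ValueError for an invalid date
    return f"date:{year:04d}{month:02d}{day:02d}"


if __name__ == "__main__":
    print(field_term("author", "smith"), field_term("title", "analysis"),
          date_term(1981, 10, 1))
    print(date_term(1981, 10))
    print(date_term(1981))
```

The three printed lines reproduce the example criteria above: an exact date, all of October 1981, and all of 1981.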
Abstracting and Indexing
- How do I sign up an indexer?
Administration
- How can I change the order of my fields for editing and display?
Go to the "Configure Fields" page under "Administration". You will see a list of the fields that are currently active. The number to the left is the order number (default: 50). Simply edit the fields, one at a time, and pick numbers for the "Edit Order" field that indicate the order.
Note: the order numbers shouldn't be sequential (1,2,3,4), because then it's too hard to insert one field between two others. Instead, pick numbers like 10,20,30, so that if you ever want to move a field between two others you have some room to do so.
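The reason for leaving gaps can be shown with a short Python sketch. It is only an illustration of the numbering scheme, not code from the site: the functions `renumber` and `insert_between` are hypothetical, and the step of 10 simply follows the suggestion in the note above.

```python
def renumber(fields, step=10):
    """Assign order numbers step, 2*step, ... to fields in their current order."""
    return {name: (i + 1) * step for i, name in enumerate(fields)}


def insert_between(order, new_field, before, after):
    """Give new_field an order number midway between two existing fields."""
    low, high = order[before], order[after]
    if high - low < 2:
        # With sequential numbers (1,2,3,4) there is never a free slot.
        raise ValueError("no free order number; renumber the fields first")
    order[new_field] = (low + high) // 2
    return order


if __name__ == "__main__":
    order = renumber(["title", "author", "date"])   # 10, 20, 30
    insert_between(order, "recipient", "author", "date")
    print(sorted(order, key=order.get))
```

With gaps of 10, the new field gets order number 25 and no other field has to be edited. With sequential numbers, the same insertion would fail until every later field was renumbered.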
- If I have deposition transcripts that seem to be missing on your site, can I e-mail them to you and have you add them to the DATTA collection?
We'll try. Send them to keith@tobacco.org, and please include where you got each transcript, the case name, etc.
- What is the Daily Document Newsletter?