diff options
author | Raphael Moll <ralf@android.com> | 2010-11-29 15:30:07 -0800 |
---|---|---|
committer | Raphael Moll <ralf@android.com> | 2010-11-30 12:14:24 -0800 |
commit | 28e1cc36ca032615d2a4c48738c1042a4992e321 (patch) | |
tree | be66c37a658d02f5be14253fc70a4a35aa983b07 /docs | |
parent | b281c42014e66e6b9ecef5b1911e51ded8d172cb (diff) | |
download | sdk-28e1cc36ca032615d2a4c48738c1042a4992e321.zip sdk-28e1cc36ca032615d2a4c48738c1042a4992e321.tar.gz sdk-28e1cc36ca032615d2a4c48738c1042a4992e321.tar.bz2 |
Move WST doc to sdk/docs.
Change-Id: I53d7838b557da43a9194ddb13ec1bcd0f1119a36
Diffstat (limited to 'docs')
-rwxr-xr-x | docs/Notes_on_WST_StructuredDocument.txt | 183 |
1 files changed, 183 insertions, 0 deletions
diff --git a/docs/Notes_on_WST_StructuredDocument.txt b/docs/Notes_on_WST_StructuredDocument.txt new file mode 100755 index 0000000..dcf124d --- /dev/null +++ b/docs/Notes_on_WST_StructuredDocument.txt @@ -0,0 +1,183 @@ +Notes on WST StructuredDocument +------------------------------- + +Created: 2010/11/26 +References: WST 3.1.x, Eclipse 3.5 Galileo + +To manipulate XML documents in refactorings, we sometimes use the WST/SEE +"StructuredDocument" API. There isn't exactly a lot of documentation on +this out there, so this is a short explanation of how it works, totally +based on _empirical_ evidence. As such, it must be taken with a grain of salt. + +Examples of usage can be found in + sdk/eclipse/plugins/com.android.ide.eclipse.adt/src/com/android/ide/eclipse/adt/internal/refactorings/ + + +1- Get a document instance +-------------------------- + +To get a document from an existing IFile resource: + + IModelManager modelMan = StructuredModelManager.getModelManager(); + IStructuredDocument sdoc = modelMan.createStructuredDocumentFor(file); + +Note that the IStructuredDocument and all the associated interfaces we'll use +below are all located in org.eclipse.wst.sse.core.internal.provisional, +meaning they _might_ change later. + +Also note that this parses the content of the file on disk, not of a buffer +with pending unsaved modifications opened in an editor. + +There is a counterpart for non-existent resources: + + IModelManager.createNewStructuredDocumentFor(IFile) + +However our goal so far has been to _parse_ existing documents, find +the place that we wanted to modify and then generate a TextFileChange +for a refactoring operation. Consequently this document doesn't say +anything about using this model to modify content directly. + + +2- Structured Document overview +------------------------------- + +The IStructuredDocument is organized in "regions", which are little pieces +of text. + +The document contains a list of region collections, each one being +a list of regions. Each region has a type, as well as text. + +Since we use this to parse XML, let's look at this XML example: + +<?xml version="1.0" encoding="utf-8"?> \n +<resource> \n + <color/> + <string name="my_string">Some Value</string> <!-- comment -->\n +</resource> + + +This will result in the following regions and sub-regions: +(all the constants below are located in DOMRegionContext) + +XML_PI_OPEN + XML_PI_OPEN:<? + XML_TAG_NAME:xml + XML_TAG_ATTRIBUTE_NAME:version + XML_TAG_ATTRIBUTE_EQUALS:= + XML_TAG_ATTRIBUTE_VALUE:"1.0" + XML_TAG_ATTRIBUTE_NAME:encoding + XML_TAG_ATTRIBUTE_EQUALS:= + XML_TAG_ATTRIBUTE_VALUE:"utf-8" + XML_PI_CLOSE:?> + +XML_CONTENT + XML_CONTENT:\n + +XML_TAG_NAME + XML_TAG_OPEN:< + XML_TAG_NAME:resources + XML_TAG_CLOSE:> + +XML_CONTENT + XML_CONTENT:\n + whitespace before color + +XML_TAG_NAME + XML_TAG_OPEN:< + XML_TAG_NAME:color + XML_EMPTY_TAG_CLOSE:/> + +XML_CONTENT + XML_CONTENT:\n + whitespace before string + +XML_TAG_NAME + XML_TAG_OPEN:< + XML_TAG_NAME:string + XML_TAG_ATTRIBUTE_NAME:name + XML_TAG_ATTRIBUTE_EQUALS:= + XML_TAG_ATTRIBUTE_VALUE:"my_string" + XML_TAG_CLOSE:> + +XML_CONTENT + XML_CONTENT:Some Value + +XML_TAG_NAME + XML_END_TAG_OPEN:</ + XML_TAG_NAME:string + XML_TAG_CLOSE:> + +XML_CONTENT + XML_CONTENT: (2 spaces before the comment) + +XML_COMMENT_TEXT + XML_COMMENT_OPEN:<!-- + XML_COMMENT_TEXT: comment + XML_COMMENT_CLOSE:-- + +XML_CONTENT + XML_CONTENT: \n after comment + +XML_TAG_NAME + XML_END_TAG_OPEN:</ + XML_TAG_NAME:resources + XML_TAG_CLOSE:> + +XML_CONTENT + XML_CONTENT: + + +3- Iterating through regions +---------------------------- + +To iterate through all regions, we need to process the list of top-level regions and then +iterate over inner regions: + + for (IStructuredDocumentRegion regions : sdoc.getStructuredDocumentRegions()) { + // process inner regions + for (int i = 0; i < regions.getNumberOfRegions(); i++) { + ITextRegion region = regions.getRegions().get(i); + String type = region.getType(); + String text = regions.getText(region); + } + } + +Each "region collection" basically matches one XML tag, with sub-regions for all the tokens +inside a tag. + +Note that an XML_CONTENT region is actually the whitespace, was is known as a TEXT in the w3c DOM. + +Also note that each outer region has a type, but the inner regions also reuse a similar type. +So for example an outer XML_TAG_NAME region collection is a proper XML tag, and it will contain +an opening tag, a closing tag but also an XML_TAG_NAME that is the tag name itself. + +Surprisingly, the inner regions do not have many access methods we can use on them, except their +type and start/length/end. There are two length and end methods: +- getLength() and getEnd() take any whitespace into account. +- getTextLength() and getTextEnd() exclude some typical trailing whitespace. + +Note that regarding the trailing whitespace, empirical evidence shows that in the XML case +here, the only case where it matters is in a tag such as <string name="my_string">: for the +XML_TAG_NAME region, getLength is 7 (string + space) and getTextLength is 6 (string, no space). +Spacing between XML element is its own collapsed region. + +If you want the text of the inner region, you actually need to query it from the outer region. +The outer IStructuredDocumentRegion (the region collection) contains lots more useful access +methods, some of which return details on the inner regions: +- getText : without the whitespace. +- getFullText : with the whitespace. +- getStart / getLength / getEnd : type-dependent offset, including whitespace. +- getStart / getTextLength / getTextEnd : type-dependent offset, excluding "irrelevant" whitespace. +- getStartOffset / getEndOffset / getTextEndOffset : relative to document. + +Empirical evidence shows that there is no discernible difference between the getStart/getEnd +values and those returned by getStartOffset/getEndOffset. Please abide by the javadoc. + +All offsets start at zero. + +Given a region collection, you can also browse regions either using a getRegions() list, or +using getFirst/getLastRegion, or using getRegionAtCharacterOffset(). Iterating the region +list seems the most useful scenario. There's no actual iterator provided for inner regions. + +There are a few other methods available in the regions classes. This was not an exhaustive list. + + +---- |