Package org.apache.any23.extractor.html
Class EncodingTest
- java.lang.Object
- org.apache.any23.AbstractAny23TestBase
- org.apache.any23.extractor.html.EncodingTest
public class EncodingTest extends AbstractAny23TestBase
Test class that verifies the behavior of the HTMLDocument parser in encoding corner cases.
-
Field Summary
-
Fields inherited from class org.apache.any23.AbstractAny23TestBase
tempDirectory, testFolder
-
-
Constructor Summary
Constructor, Description
EncodingTest()
-
Method Summary
Modifier and Type, Method, Description
void testEncodingHTML_ISO_8859_1()
void testEncodingHTML_UTF_8()
void testEncodingHTML_UTF_8_DeclarationAfterTitle()
Known issue: NekoHTML does not auto-detect the encoding, but relies on the explicitly specified encoding (via XML declaration or HTTP-Equiv meta header).
void testEncodingXHTML_ISO_8859_1()
void testEncodingXHTML_UTF_8()
-
Methods inherited from class org.apache.any23.AbstractAny23TestBase
copyResourceToTempFile, getDocumentSourceFromResource, getDocumentSourceFromResource, setUp
-
Method Detail
-
testEncodingHTML_ISO_8859_1
public void testEncodingHTML_ISO_8859_1() throws Exception
- Throws:
Exception
-
testEncodingHTML_UTF_8_DeclarationAfterTitle
public void testEncodingHTML_UTF_8_DeclarationAfterTitle() throws Exception
Known issue: NekoHTML does not auto-detect the encoding; it relies on an explicitly specified encoding (via XML declaration or HTTP-Equiv meta header). If the meta header comes *after* the title element, NekoHTML does not use the declared encoding for the title. This test therefore expects the title not to be recognized.
- Throws:
Exception - if there is an error asserting the test data.
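The effect described in this known issue can be illustrated without NekoHTML. The following is a minimal sketch, not the test's actual code: it assumes the title bytes are UTF-8 and that the parser falls back to a single-byte encoding (ISO-8859-1 is used here as a stand-in) until it sees the declaration. The class name, method name, and sample title are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; does not use NekoHTML or Any23 classes.
class TitleEncodingSketch {

    // Encodes the text as UTF-8 (as it would be stored in the HTML file),
    // then decodes those bytes with the charset the parser is assumed to be using.
    static String decodeUtf8BytesAs(String text, Charset assumedByParser) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        return new String(raw, assumedByParser);
    }

    public static void main(String[] args) {
        // Sample title with a non-ASCII character (o with diaeresis), chosen for illustration.
        String title = "Hell\u00F6 W\u00F6rld!";

        // Declaration seen before the title: bytes are decoded as UTF-8, title matches.
        String declaredFirst = decodeUtf8BytesAs(title, StandardCharsets.UTF_8);

        // Declaration seen after the title: bytes are decoded with the assumed fallback,
        // so each two-byte UTF-8 sequence becomes two unrelated characters.
        String declaredLate = decodeUtf8BytesAs(title, StandardCharsets.ISO_8859_1);

        System.out.println("declared before title matches: " + title.equals(declaredFirst));
        System.out.println("declared after title matches:  " + title.equals(declaredLate));
    }
}
```

Under these assumptions the second comparison fails, which mirrors why the test expects the title not to be recognized when the meta header follows the title element.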
-
testEncodingXHTML_ISO_8859_1
public void testEncodingXHTML_ISO_8859_1() throws Exception
- Throws:
Exception
-