WebArchiveX Appendices
Appendix A - History
Have a question? Please don't hesitate to
contact our Tech Support.
Please email your comments and suggestions to webmaster at csystems.co.il
Version | Date | Details |
6.0 | September 2006 |
- Rewrite to use the faster and more stable WinHttp instead of WinInet
- Optimize for performance in high stress environment
- Support user authentication for web sites and proxies
- Add ZipFile method
- Fix retrieval of custom 404 error pages
- Find and fix all memory and handle leaks
|
5.5 | March 2005 |
- Support Windows XP SP2 and the upcoming Windows 2003 SP1
- Thread pool also fully supports Windows 95/98/Me
- Support Unicode
- Use precompiled regular expressions for better performance
- Automatically detect encoding by parsing META tags and HTTP headers
- Detect MIME type of redirected URLs (may differ from the original)
- Fix various small bugs
|
5.0 | September 2004 |
- Major speedup - archiving is 2 times faster in v5.0
- Use configurable thread pool
- Handle content types specified by web server
- Build archives in one pass (instead of 2 in previous versions)
- Use memory instead of temporary files
- Compress archives in ZIP or GZIP
- Email compressed archives with ArchiveZipAndSend
- MakeArchiveFromDoc returns path or empty string if prompt canceled
|
4.6 | June 2004 |
- Support custom user agent (some web servers return different
content according to the user agent string)
- Can retrieve files from Internet via proxy server
- Add RemoveMimeType and RemoveResourceTag methods
- Improve spidering mechanism
- Improve parsing mechanism
|
4.2 | December 2003 |
- Add WebArchiveLib C++ static library
- MakeArchiveFromDoc supports self-updateable web archives
- Support archiving of more formats (e.g. Sound files, PDF, Word etc.)
- Many useful samples
- MakeArchiveFromDoc can display "Save As" dialog with a custom title
- New DelTempFiles property
- Improve spidering mechanism
- Improve parsing mechanism
|
4.1 | November 2003 |
|
4.0 | October 2003 |
|
3.7 | September 2003 |
- New AddFile method lets you add any file manually
- New Initialize method cleans up resources and restores the initial state
|
3.6 | July 2003 |
- When reading input files from URLs: (1) handle HTTP redirection and (2) get configuration parameters (e.g. proxy) from the Registry
- Optional custom temporary directory
- Optional alternative HTML base
- Progress callbacks (custom windows/messages)
- Major performance improvements
|
3.5 | June 2003 |
- Make MHT from HTML document object using MakeArchiveFromDoc
- Richer API (control spidering level, server side scripts, link mending etc.)
- Fix for case sensitive URLs
|
3.2 | March 2003 |
- Write into ASP Response or any other Stream with MakeArchiveStream
- Fix for complicated paths
|
3.1 | January 2003 |
- Fix processing of very large files
- Fix table processing
|
3.0 | November 2002 |
- Find and process dynamically loaded images
- Download a web site into a single web archive file (so-called spidering)
|
2.3 | July 2002 |
- Upgrade for the second release
- Support logging and extended error reporting
- Small bug fixes
|
2.0 | January 2002 |
- Second release of WebArchiveX
- Performance improvements
- Various small bug fixes
|
1.4 | November 2001 |
- Support custom resource tags
- Support charsets
- Support MIME types
|
1.3 | October 2001 |
- Migrate to MSHTML parser
- Support frames and iframes
- Bug fixes
|
1.1 | March 2001 |
- Upgrade for the first release
- Performance improvements
- Support custom resource tags, charsets and MIME types
|
1.0 | January 2001 |
First release of WebArchiveX
|
Appendix B - Encodings
The most widely used encodings are:
Name | Charset |
Default | iso-8859-1 |
Central European (Windows) | windows-1250 |
Cyrillic (Windows) | windows-1251 |
Greek (Windows) | windows-1253 |
Hebrew (Windows) | windows-1255 |
Western European (Windows) | windows-1252 |
Arabic (Windows) | windows-1256 |
The default encoding of WebArchiveX is iso-8859-1. For the full list of encodings, refer to MSDN.
Appendix C - Default MIME Types
WebArchiveX uses the following default File extension/MIME type associations:
Name | File extension | MIME Type |
HTML text | .htm, .html, .shtml | text/html |
JPEG image | .jpg, .jpeg | image/jpeg |
GIF image | .gif | image/gif |
BMP image | .bmp | image/bmp |
CSS style sheet | .css | text/css |
JavaScript | .js | application/x-javascript |
Icon image | .ico | image/ico |
VB script | .vbs | application/x-vbscript |
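As an illustration only, the associations above can be modeled as a lookup table keyed by file extension. This is a minimal Python sketch, not WebArchiveX code: the name `guess_mime_type` and the `application/octet-stream` fallback are assumptions made for the example, and the table does not say what WebArchiveX does with unlisted extensions.

```python
import os

# Default extension -> MIME type associations, copied from the table above.
DEFAULT_MIME_TYPES = {
    ".htm": "text/html",
    ".html": "text/html",
    ".shtml": "text/html",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".bmp": "image/bmp",
    ".css": "text/css",
    ".js": "application/x-javascript",
    ".ico": "image/ico",
    ".vbs": "application/x-vbscript",
}


def guess_mime_type(path, fallback="application/octet-stream"):
    """Hypothetical helper: look up a MIME type by file extension.

    The fallback value is an assumption for this sketch, not documented
    WebArchiveX behavior.
    """
    ext = os.path.splitext(path)[1].lower()  # compare extensions case-insensitively
    return DEFAULT_MIME_TYPES.get(ext, fallback)
```

For example, `guess_mime_type("index.HTML")` yields `text/html`, while an unlisted extension such as `.pdf` falls through to the assumed fallback.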
The following explanation of format strings is copied from Boost.org:
In format strings, all characters are treated as literals except: ()$\?:
To use any of these as literals you must prefix them with the escape character \
The following special sequences are recognized:
Grouping:
Use the parenthesis characters ( and ) to group sub-expressions within the
format string; use \( and \) to represent literal '(' and ')'.
Sub-expression expansions:
The following Perl-like expressions expand to a particular matched
sub-expression:
Exp | Description |
$` | Expands to all the text from the end of the previous match to the start of the current match; if there was no previous match in the current operation, then everything from the start of the input string to the start of the match. |
$' | Expands to all the text from the end of the match to the end of the input string. |
$& | Expands to all of the current match. |
$0 | Expands to all of the current match. |
$N | Expands to the text that matched sub-expression N. |
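To make the expansions concrete, here is a rough Python sketch that computes what each placeholder stands for on every match. It uses Python's `re` module rather than the Boost library, so it only illustrates the definitions in the table. The function name `expansions` and the sample pattern are assumptions made for this example.

```python
import re


def expansions(pattern, text):
    """Yield, for each match, the text each Perl-like placeholder stands for.

    Illustrative only: the pattern is assumed to contain at least one
    sub-expression, so that $1 is defined.
    """
    prev_end = 0  # end of the previous match; start of input if there is none
    for m in re.finditer(pattern, text):
        yield {
            "$`": text[prev_end:m.start()],  # from previous match end to this match start
            "$'": text[m.end():],            # from this match end to the end of input
            "$&": m.group(0),                # the whole current match (same as $0)
            "$1": m.group(1),                # text matched by sub-expression 1
        }
        prev_end = m.end()


rows = list(expansions(r"(\d+)-", "a 10-b 20-c"))
```

For the first match (`10-`), `` $` `` is `"a "` and `$'` is `"b 20-c"`; for the second match (`20-`), `` $` `` is `"b "`, the text between the two matches.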
Conditional expressions:
Conditional expressions allow two different format strings to be selected
dependent upon whether a sub-expression participated in the match or not:
?Ntrue_expression:false_expression
Executes true_expression if sub-expression N participated in the
match, otherwise executes false_expression.
Example: suppose we search for "(while)|(for)" then the format
string "?1WHILE:FOR" would output what matched, but in upper case.
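The while/for example can be emulated in Python as a rough sketch. Python's `re` module has no conditional format syntax, so the hypothetical helper `conditional_format` below reproduces only the selection logic of `?Ntrue_expression:false_expression`, for plain-text branches. It is not the Boost implementation.

```python
import re


def conditional_format(match, n, true_text, false_text):
    """Emulate ?Ntrue_expression:false_expression for plain-text branches."""
    # Python reports a sub-expression that did not participate in the match as None.
    return true_text if match.group(n) is not None else false_text


pattern = re.compile(r"(while)|(for)")

# Equivalent in spirit to applying the format string "?1WHILE:FOR" to each match.
result = pattern.sub(
    lambda m: conditional_format(m, 1, "WHILE", "FOR"),
    "for x: while y",
)
```

Here `result` is `"FOR x: WHILE y"`: sub-expression 1 participates only when `while` matched, so every other match takes the false branch.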
Escape sequences:
The following escape sequences are also allowed:
Char | Description |
\a | The bell character. |
\f | The form feed character. |
\n | The newline character. |
\r | The carriage return character. |
\t | The tab character. |
\v | A vertical tab character. |
\x | A hexadecimal character - for example \x0D. |
\x{} | A possible unicode hexadecimal character - for example \x{1A0} |
\cx | The ASCII escape character x, for example \c@ is equivalent to escape-@. |
\e | The ASCII escape character. |
\dd | An octal character constant, for example \10. |
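As a quick cross-check of the table, the sketch below maps most of the escape sequences to the character codes they denote, using Python's own string escapes as a stand-in. The dictionary name `EQUIVALENTS` is made up for this example, `\cx` is not covered, and reading `\10` as octal 10 (decimal 8) follows the table's description rather than any tested Boost behavior.

```python
# Format-string escape (as written in the table) -> the character it denotes,
# expressed with Python string escapes. Keys are raw strings, so the
# backslashes in them are literal.
EQUIVALENTS = {
    r"\a": "\a",           # bell
    r"\f": "\f",           # form feed
    r"\n": "\n",           # newline
    r"\r": "\r",           # carriage return
    r"\t": "\t",           # tab
    r"\v": "\v",           # vertical tab
    r"\x0D": "\x0d",       # hexadecimal 0D, i.e. carriage return
    r"\x{1A0}": "\u01a0",  # Unicode code point U+01A0
    r"\e": "\x1b",         # ASCII escape character
    r"\10": "\10",         # octal 10, assumed to mean decimal 8
}

# Numeric code point for each escape, handy for comparing against an ASCII chart.
code_points = {name: ord(ch) for name, ch in EQUIVALENTS.items()}
```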