WebArchiveX Appendices


Appendix A - History

Have any questions? Please don't hesitate to contact our Tech Support.
Kindly email your comments and suggestions to webmaster at csystems.co.il.

Version   Date             Details
6.0       September 2006
  • Rewrite to use the faster and more stable WinHttp instead of WinInet
  • Optimize for performance in high-stress environments
  • Support user authentication for web sites and proxies
  • Add ZipFile method
  • Fix retrieval of custom 404 error pages
  • Find and fix all memory and handle leaks
5.5       March 2005
  • Support Windows XP SP2 and the upcoming Windows Server 2003 SP1
  • Thread pool also fully supports Windows 95/98/Me
  • Support Unicode
  • Use precompiled regular expressions for better performance
  • Automatically detect encoding by parsing META tags and HTTP headers
  • Detect the MIME type of redirected URLs (it may differ from that of the original URL)
  • Fix various small bugs
5.0       September 2004
  • Major speedup: archiving is twice as fast in v5.0
  • Use a configurable thread pool
  • Handle content types specified by the web server
  • Build archives in one pass (instead of two in previous versions)
  • Use memory instead of temporary files
  • Compress archives in ZIP or GZIP format
  • Email compressed archives with ArchiveZipAndSend
  • MakeArchiveFromDoc returns the file path, or an empty string if the prompt is canceled
4.6       June 2004
  • Support a custom user agent (some web servers return different content depending on the user agent string)
  • Retrieve files from the Internet via a proxy server
  • Add RemoveMimeType and RemoveResourceTag methods
  • Improve spidering mechanism
  • Improve parsing mechanism
4.2       December 2003
  • Add WebArchiveLib C++ static library
  • MakeArchiveFromDoc supports self-updatable web archives
  • Support archiving of more formats (e.g. sound files, PDF, Word)
  • Many useful samples
  • MakeArchiveFromDoc can display a "Save As" dialog with a custom title
  • New DelTempFiles property
  • Improve spidering mechanism
  • Improve parsing mechanism
4.1       November 2003
4.0       October 2003
3.7       September 2003
  • New AddFile method lets you add any file manually
  • New Initialize method cleans up resources and restores the initial state
3.6       July 2003
3.5       June 2003
  • Create MHT from an HTML document object using MakeArchiveFromDoc
  • Richer API (control of spidering level, server-side scripts, link mending etc.)
  • Fix for case-sensitive URLs
3.2       March 2003
  • Write into an ASP Response or any other Stream with MakeArchiveStream
  • Fix for complicated paths
3.1       January 2003
  • Fix processing of very large files
  • Fix table processing
3.0       November 2002
  • Find and process dynamically loaded images
  • Download a web site into a single web archive file (so-called spidering)
2.3       July 2002
  • Upgrade for the second release
  • Support logging and extended error reporting
  • Small bug fixes
2.0       January 2002
  • Second release of WebArchiveX
  • Performance improvements
  • Various small bug fixes
1.4       November 2001
  • Support custom resource tags
  • Support charsets
  • Support MIME types
1.3       October 2001
  • Migrate to MSHTML parser
  • Support frames and iframes
  • Bug fixes
1.1       March 2001
  • Upgrade for the first release
  • Performance improvements
  • Support custom resource tags, charsets and MIME types
1.0       January 2001
  • First release of WebArchiveX

Appendix B - Encodings

The most widely used encodings are:

Name                          Charset
Default                       iso-8859-1
Central European (Windows)    windows-1250
Cyrillic (Windows)            windows-1251
Greek (Windows)               windows-1253
Hebrew (Windows)              windows-1255
Western European (Windows)    windows-1252
Arabic (Windows)              windows-1256

The default encoding of WebArchiveX is iso-8859-1. For the full list of encodings, refer to MSDN.
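
To illustrate how a default encoding like iso-8859-1 typically comes into play, the sketch below extracts the charset parameter from an HTTP Content-Type header value and falls back to iso-8859-1 when none is given. The function name `charset_from_content_type` is hypothetical and is not part of the WebArchiveX API; WebArchiveX also inspects META tags, which this sketch does not attempt.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not part of the WebArchiveX API.
// Returns the charset parameter of a Content-Type value such as
// "text/html; charset=windows-1255" (lower-cased), or "iso-8859-1",
// the WebArchiveX default, when no charset parameter is present.
std::string charset_from_content_type(const std::string& content_type) {
    const std::string kDefault = "iso-8859-1";

    // Lower-case copy so that "Charset=" and "charset=" both match.
    std::string lower = content_type;
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos) return kDefault;

    pos += key.size();
    // The value ends at the next ';' or whitespace, or at the end of the string.
    std::string::size_type end = lower.find_first_of("; \t", pos);
    std::string value = (end == std::string::npos) ? lower.substr(pos)
                                                   : lower.substr(pos, end - pos);

    // Strip optional surrounding quotes, e.g. charset="utf-8".
    if (value.size() >= 2 && value.front() == '"' && value.back() == '"')
        value = value.substr(1, value.size() - 2);

    return value.empty() ? kDefault : value;
}
```

For example, a header of `text/html; charset=windows-1255` yields `windows-1255`, while a bare `text/html` yields the default `iso-8859-1`.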

Appendix C - Default MIME Types

WebArchiveX uses the following default file extension/MIME type associations:

Name              File extension        MIME type
HTML text         .htm, .html, .shtml   text/html
JPEG image        .jpg, .jpeg           image/jpeg
GIF image         .gif                  image/gif
BMP image         .bmp                  image/bmp
CSS style sheet   .css                  text/css
JavaScript        .js                   application/x-javascript
Icon image        .ico                  image/ico
VBScript          .vbs                  application/x-vbscript
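
The associations above amount to a case-insensitive lookup from file extension to MIME type. The sketch below reproduces the table as a `std::map`; the function name `default_mime_type` and the `application/octet-stream` fallback for unlisted extensions are assumptions for illustration, not documented WebArchiveX behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper, not part of the WebArchiveX API.
// Maps a file extension (with or without the leading dot, any case)
// to the default MIME type from the table above.
std::string default_mime_type(std::string ext) {
    static const std::map<std::string, std::string> kTable = {
        {"htm", "text/html"},  {"html", "text/html"}, {"shtml", "text/html"},
        {"jpg", "image/jpeg"}, {"jpeg", "image/jpeg"},
        {"gif", "image/gif"},  {"bmp", "image/bmp"},
        {"css", "text/css"},   {"js", "application/x-javascript"},
        {"ico", "image/ico"},  {"vbs", "application/x-vbscript"},
    };

    // Normalize: drop a leading dot and lower-case the extension.
    if (!ext.empty() && ext.front() == '.') ext.erase(0, 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    auto it = kTable.find(ext);
    // Assumed fallback for extensions that are not in the table.
    return it != kTable.end() ? it->second : "application/octet-stream";
}
```

Normalizing the key before the lookup keeps the table itself to one entry per extension, so `.HTML`, `.html` and `html` all resolve to `text/html`.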

Appendix D - Extended Format Syntax

The following explanation is copied from the Boost.Regex documentation at Boost.org.

In format strings, all characters are treated as literals except ( ) $ \ ? and :.
To use any of these as a literal, prefix it with the escape character \.
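
When arbitrary literal text (a file name or a price, say) has to be spliced into a format string, every special character in it needs the \ prefix. A minimal sketch of that rule follows; the helper name `escape_format_literal` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: prefixes every character that is special in an
// extended format string, ( ) $ \ ? :, with the escape character '\'
// so that the result is treated as literal text.
std::string escape_format_literal(const std::string& text) {
    static const std::string kSpecial = "()$\\?:";
    std::string out;
    out.reserve(text.size() * 2);  // worst case: every character escaped
    for (char c : text) {
        if (kSpecial.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}
```

For example, the text `Price: $5 (approx)?` becomes `Price\: \$5 \(approx\)\?`.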

The following special sequences are recognized:

Grouping:

Use the parentheses ( and ) to group sub-expressions within the format string; use \( and \) to represent a literal '(' and ')'.

Sub-expression expansions:

The following Perl-like expressions expand to a particular matched sub-expression:

Expression  Description
$`          Expands to all the text from the end of the previous match to the start of the current match; if there was no previous match in the current operation, it expands to everything from the start of the input string to the start of the match.
$'          Expands to all the text from the end of the match to the end of the input string.
$&          Expands to all of the current match.
$0          Expands to all of the current match.
$N          Expands to the text that matched sub-expression N.
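
The standard C++ `<regex>` library (not Boost) accepts $`, $', $& and $N in its default ECMAScript format syntax, so these expansions can be tried without Boost. The sketch below uses `std::regex_replace` and sticks to those four forms; the function names are hypothetical, and the conditional expressions and escape sequences described next are Boost extensions that `std::regex` does not support.

```cpp
#include <regex>
#include <string>

// $N: swap "key=value" pairs into "value:key".
std::string swap_pairs(const std::string& input) {
    static const std::regex re("(\\w+)=(\\w+)");
    return std::regex_replace(input, re, "$2:$1");
}

// $&: wrap every match in brackets.
std::string bracket_matches(const std::string& input, const std::string& pattern) {
    return std::regex_replace(input, std::regex(pattern), "[$&]");
}

// $` and $': replace each match with <text before it|text after it>.
std::string mark_context(const std::string& input, const std::string& pattern) {
    return std::regex_replace(input, std::regex(pattern), "<$`|$'>");
}
```

For instance, `swap_pairs("a=1, b=2")` gives `1:a, 2:b`, and `mark_context("abc", "b")` gives `a<a|c>c`.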

Conditional expressions:

Conditional expressions select one of two format strings depending on whether a sub-expression participated in the match:

?Ntrue_expression:false_expression

Executes true_expression if sub-expression N participated in the match; otherwise, executes false_expression.

Example: suppose we search for "(while)|(for)"; the format string "?1WHILE:FOR" then outputs whatever matched, but in upper case.
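
`std::regex` has no equivalent of the ?N conditional, but the same decision can be written by hand: walk the matches and test whether sub-expression 1 participated. The sketch below reproduces the while/for example that way. It illustrates the semantics only and is not how Boost implements its formatter; the function name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hand-written equivalent of applying the format string "?1WHILE:FOR" to
// every match of "(while)|(for)": sub-expression 1 decides which branch runs.
std::string upper_case_keywords(const std::string& input) {
    static const std::regex re("(while)|(for)");
    std::string out;
    std::string::const_iterator last = input.begin();
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);           // copy text before this match
        out += m[1].matched ? "WHILE" : "FOR";  // true_expression : false_expression
        last = m[0].second;
    }
    out.append(last, input.end());              // copy text after the last match
    return out;
}
```

With this function, `for x; while y` becomes `FOR x; WHILE y`.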

Escape sequences:

The following escape sequences are also allowed:

Sequence  Description
\a        The bell character.
\f        The form feed character.
\n        The newline character.
\r        The carriage return character.
\t        The tab character.
\v        The vertical tab character.
\x        A hexadecimal character code, for example \x0D.
\x{}      A hexadecimal character code that may denote a Unicode character, for example \x{1A0}.
\cx       The ASCII escape character x; for example, \c@ is equivalent to escape-@.
\e        The ASCII escape character.
\dd       An octal character constant, for example \10.
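
To show how the simpler entries of this table translate into characters, here is a minimal sketch that expands a subset of them. The function name `expand_escapes` is hypothetical; it handles \a, \f, \n, \r, \t, \v, \e and \x with one or two hex digits, emits any other escaped character literally (as with \( and \)), and deliberately leaves out \x{}, \cx and octal constants.

```cpp
#include <cctype>
#include <string>

// Hypothetical sketch, covering only part of the escape table above.
// Any other escaped character, e.g. \( or \$, is emitted literally.
std::string expand_escapes(const std::string& fmt) {
    std::string out;
    for (std::string::size_type i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '\\' || i + 1 == fmt.size()) {
            out += fmt[i];  // ordinary character, or a trailing backslash
            continue;
        }
        char c = fmt[++i];  // character after the backslash
        switch (c) {
            case 'a': out += '\a'; break;
            case 'f': out += '\f'; break;
            case 'n': out += '\n'; break;
            case 'r': out += '\r'; break;
            case 't': out += '\t'; break;
            case 'v': out += '\v'; break;
            case 'e': out += '\x1B'; break;  // ASCII escape
            case 'x': {
                // Read up to two hex digits; with none, emit 'x' (an assumption).
                int value = 0, digits = 0;
                while (digits < 2 && i + 1 < fmt.size() &&
                       std::isxdigit(static_cast<unsigned char>(fmt[i + 1]))) {
                    char h = fmt[++i];
                    value = value * 16 +
                            (std::isdigit(static_cast<unsigned char>(h))
                                 ? h - '0'
                                 : std::tolower(static_cast<unsigned char>(h)) - 'a' + 10);
                    ++digits;
                }
                out += digits ? static_cast<char>(value) : 'x';
                break;
            }
            default: out += c; break;  // e.g. \( becomes (
        }
    }
    return out;
}
```

For example, the format text `line\x0D\n` expands to the characters "line" followed by a carriage return and a newline.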



C Systems - Creative software solutions since 1996. All rights reserved. Terms of use.