Updated 2010-11-01 19:03:20 by tomk

tomk

The following procedure removes C-style comments (i.e. /* ... */) from text. It does not handle nested comments.
 proc removeComments { text {replacement ""} } {
    regsub -all {[/][*].*?[*][/]} $text ${replacement} text
    return $text
 }

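If you need to remove C-style comments that are nested, or imbedded (i.e. /* ... /* ... */ ... */), use the following procedure. It temporarily maps the comment delimiters to the characters \x80 and \x81, so it assumes the text does not already contain those two characters.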
 proc removeImbeddedComments { text {replacement ""} } {
     set text [string map  {"/*" \x80 "*/" \x81} $text]
     while {[regsub -all {\x80[^\x80\x81]*?\x81} $text ${replacement} text]} {continue}
     set text [string map  {\x80 "/*" \x81 "*/"} $text]
     return $text
 }
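
To make the loop easier to follow, here is a minimal sketch that records the text after each pass. The helper name traceImbeddedPasses and the sample input are made up for this illustration; the algorithm itself is the one from removeImbeddedComments. Each pass removes only the innermost comments, which is why the loop runs until regsub reports no more substitutions.

```tcl
# Hypothetical helper (not part of the original page): same algorithm as
# removeImbeddedComments, but it records the text after every pass.
proc traceImbeddedPasses { text {replacement ""} } {
    # Map the two-character delimiters to single placeholder characters.
    set text [string map {"/*" \x80 "*/" \x81} $text]
    set passes {}
    # Each pass removes the comments that contain no other comment.
    while {[regsub -all {\x80[^\x80\x81]*?\x81} $text $replacement text]} {
        # Map the placeholders back so the intermediate text is readable.
        lappend passes [string map {\x80 "/*" \x81 "*/"} $text]
    }
    return $passes
}

# Made-up sample with three nesting levels.
set passes [traceImbeddedPasses "a /* b /* c /* d */ c */ b */ e" "#"]
foreach p $passes { puts $p }
```

With three nesting levels the loop performs three substituting passes, and the last recorded text no longer contains any comment.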

Usage examples:
 removeComments ${data} "#comment-removed#"

 removeImbeddedComments ${data} "#comment-removed#"
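
As a concrete illustration, the sketch below applies both procedures to a single nested comment. The sample string in data is made up for this example, and both procedures are repeated from above so that the snippet runs on its own.

```tcl
# Both procedures are copied from this page so the snippet is self-contained.
proc removeComments { text {replacement ""} } {
    regsub -all {[/][*].*?[*][/]} $text ${replacement} text
    return $text
}
proc removeImbeddedComments { text {replacement ""} } {
    set text [string map {"/*" \x80 "*/" \x81} $text]
    while {[regsub -all {\x80[^\x80\x81]*?\x81} $text ${replacement} text]} {continue}
    set text [string map {\x80 "/*" \x81 "*/"} $text]
    return $text
}

# Made-up sample input containing one nested comment.
set data "text1 /* outer /* inner */ outer */ text2"

# removeComments stops at the first */ and leaves the outer tail behind.
set flat [removeComments ${data} "#comment-removed#"]
# removeImbeddedComments removes the whole nested comment.
set nested [removeImbeddedComments ${data} "#comment-removed#"]

puts $flat
puts $nested
```

On nested input the simple procedure leaves the text after the first closing delimiter in place, which is the case the second procedure is meant to cover.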

Test Cases:
 ##### Simple Comments #####
 # test-1
 /**/
 /* */
 /* text1 */
 # test-2
 text1 /**/ text2 /* */ text3 /* comment */
 # test-3
 /*
 */
 text1
 /*
     */
 text2
     /*
      */
 # test-4
 text1 /*
 */ text2

 text1 /*
     */ text2
 # test-5
 /* comment
 */
 /*
 comment
 */
 /*
 comment */
 ##### Imbedded Comments #####
 # test-1
 text1 /*/*/**/*/*/ text2
 # test-2
 text1 /*/**//**//*/**//**//**/*/*/ text2
 # test-3
 text1 /* comment /* comment /* comment */ comment */ comment */ text2
 # test-4
 text1
 /*
 text2
 text3 /* comment */
 text4 /*
         comment
         comment /* comment */
         comment
       */
 text5
 */
 text5
 # test-5
 text1
 /*
  comment ///
     /*
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      */
  comment ***
  */
 text2
 # test-6
 text1 * / / *
 /*
  comment ///
     /*
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      */
  comment ***
  comment ///
     /*
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      comment ///
         /*
          comment ///
          comment ***
          */
      comment ***
      */
  comment ***
  */
 text2
 # test-7 (dangling comments)
 */ /*

Test results from the removeImbeddedComments procedure were as follows.
 ##### Simple Comments #####
 # test-1
 #comment-removed#
 #comment-removed#
 #comment-removed#
 # test-2
 text1 #comment-removed# text2 #comment-removed# text3 #comment-removed#
 # test-3
 #comment-removed#
 text1
 #comment-removed#
 text2
     #comment-removed#
 # test-4
 text1 #comment-removed# text2

 text1 #comment-removed# text2
 # test-5
 #comment-removed#
 #comment-removed#
 #comment-removed#
 ##### Imbedded Comments #####
 # test-1
 text1 #comment-removed# text2
 # test-2
 text1 #comment-removed# text2
 # test-3
 text1 #comment-removed# text2
 # test-4
 text1
 #comment-removed#
 text5
 # test-5
 text1
 #comment-removed#
 text2
 # test-6
 text1 * / / *
 #comment-removed#
 text2
 # test-7 (dangling comments)
 */ /*

Pierre Coueffin (03 Sept. 2005): You do have to be careful if you try to use this on actual C code, because comment delimiters that appear inside string literals are treated as comments too.

if 0 {
 removeComments {printf ("/* %s */\n", "Comment to print"); /* Prints a comment to stdout */}

returns:
 printf ("\n", "Comment to print");

where you might expect to see:
 printf ("/* %s */\n", "Comment to print");

}

[tbtietc] - 2009-06-25 11:19:05
 regsub -all {('([^'\\]|\\.)+')|("([^"\\]|\\.)*")|(//[^\n]*)|(/\*[^*]*\*+([^/*][^*]*\*+)*/)} $text "\\1\\3" text

This detects:
 A. Character literals in single quotes
 B. String literals in double quotes
 C. C style comments (/* ... */) and C++ style line comments (// ...).

And replaces:
 A, B with themselves (quotes intact).
 C with the empty string (comments deleted).
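
The approach above can be tried on Pierre Coueffin's printf example with a small wrapper. This is only a sketch: the procedure name removeCommentsKeepLiterals is made up here, and the pattern is a variant that excludes the backslash from the literal character classes and accepts runs of asterisks before the closing slash. It does not try to cover every corner of C lexing, such as trigraphs or line continuations.

```tcl
# Hypothetical wrapper (the name is not from the original page).
proc removeCommentsKeepLiterals { text } {
    # Group 1: character literal, group 3: string literal.
    # The two comment alternatives are matched but never substituted back.
    set re {('([^'\\]|\\.)+')|("([^"\\]|\\.)*")|(//[^\n]*)|(/\*[^*]*\*+([^/*][^*]*\*+)*/)}
    regsub -all $re $text {\1\3} text
    return $text
}

# The example from the note above: the string literal contains /* and */.
set code {printf ("/* %s */\n", "Comment to print"); /* Prints a comment to stdout */}
set result [removeCommentsKeepLiterals $code]
puts $result
```

Because the regular expression engine takes the leftmost match first, a quote that opens a literal is consumed together with the rest of the literal before any comment delimiter inside it can be seen, so only the trailing comment is removed.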