Stripping Non-ASCII Characters within Macro
Hit once more with a pesky en-dash issue (likely related to the transcoding between SAS & SQL Server), I discovered today that there is no built-in way to remove non-ASCII (or extended-ASCII) characters within SAS.
There is a great SUGI paper about this topic (here), but its approach requires the use of a data step. If you need this as a macro capability, the extract below should save you some fiddling around.
%macro ascii();
  %local i asciichars;
  /* adjust here to include any additional chars */
  %do i=32 %to 126;
    %let asciichars=&asciichars%qsysfunc(byte(&i));
  %end;
  %str(&asciichars)
%mend;
/* store in macvar for efficiency */
%let ascii_chars=%ascii();
%put &=ascii_chars;
/**
* Example usage within macro language
*/
%put %sysfunc(compress(my – endash,&ascii_chars,k));
/**
* Example usage within data step
*/
data _null_;
  str="goodbye •–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶· nasties";
  asciichars=symget('ascii_chars');
  out=compress(str,asciichars,'k');
  put out=;
run;
The main gotchas were as follows:
- The characters returned by byte(3), byte(4), byte(5), byte(12) and byte(13) do funny things in macro code (open code recursion etc.)
- It is not advisable to rely on characters with a rank() above 127, as this extended set can vary from country to country (the byte # may not be the same as the rank #)
- The 32-126 range includes both the double quote (34) and the apostrophe / single quote (39), so these need to be handled appropriately!
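
To see exactly where the quote characters sit in the 32-126 range, a quick data step check can help. This is a minimal sketch rather than part of the macro above; it assumes an ASCII-based session encoding, and the variable names i and c are purely illustrative.

```sas
/* Sketch: report which code points in 32-126 are quote characters. */
/* Assumes an ASCII-based session; i and c are illustrative names.  */
data _null_;
  do i=32 to 126;
    c=byte(i);                            /* character at position i */
    if c in ('"', "'") then put i= c=;    /* log the quotes only     */
  end;
run;
```

Reading the character list with symget(), as the data step example does, avoids having to wrap the macro variable in quotes at all, which is one way to sidestep the problem.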