    I need some regex help for my jQuery Library for SharePoint Web Services, which is certainly for SharePoint!

    I need to be able to parse out the attributes in SCRIPT tags reliably for my SPScriptAudit function. jQuery does an excellent job on SCRIPT in the BODY but won't touch anything in the HEAD, so I need to do more heavy lifting to parse SCRIPTs there.

    So for SCRIPT tags like these:

    <script type="text/javascript" language="javascript" src="/_layouts/1033/init.js?rev=qX%2BG3yl4pldKy9KbPLXf9w%3D%3D"></script>
    <script type="text/javascript" language="javascript" src="/_layouts/1033/core.js?rev=CNBZRdV1h3pKuA7LsMXf3w%3D%3D" defer></script>
    <script type="text/javascript" language="javascript" src="/_layouts/1033/ie55up.js?rev=Ni7%2Fj2ZV%2FzCvd09XYSSWvA%3D%3D"></script>

    I need to be able to get the values of language, type, src, etc. reliably. I'd appreciate help from any regex gurus out there. I think I've reached the point of diminishing returns trying to solve this.

  • First off, I can recommend Expresso for building and testing regular expressions.

    Try this:


    Some things to notice:

    • Run it with the Ignore Case option, so that uppercase SCRIPT tags are captured too.
    • The named capture group "value" contains the value of the attribute.
    • The expression handles attribute values surrounded by double quotes, single quotes, or nothing at all (e.g. key="value", key='value', key=value).
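To illustrate the three quoting cases listed above, here is a minimal sketch in plain JavaScript. It is not the pattern from this answer: the function name parseScriptAttributes and both patterns are assumptions made for the example, and numbered groups stand in for the named "value" group. It also assumes that no attribute value contains a ">" character.

```javascript
// Hypothetical helper (not from the original answer): returns the attributes
// of every opening <script> tag in `source` as an array of plain objects.
function parseScriptAttributes(source) {
    // Group 1 is everything between "<script" and the closing ">".
    var tagPattern = /<script\b([^>]*)>/gi;
    // Group 1: attribute name. Groups 2, 3, 4: the value when it is
    // double-quoted, single-quoted, or unquoted, respectively.
    var attrPattern = /([^\s"'=<>\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>]+)))?/g;
    var results = [];
    var tag;
    while ((tag = tagPattern.exec(source)) !== null) {
        var attrs = {};
        var attr;
        attrPattern.lastIndex = 0; // restart the attribute scan for each tag
        while ((attr = attrPattern.exec(tag[1])) !== null) {
            var value = attr[2] !== undefined ? attr[2]
                      : attr[3] !== undefined ? attr[3]
                      : attr[4] !== undefined ? attr[4]
                      : ""; // attribute without a value, such as defer
            attrs[attr[1].toLowerCase()] = value;
        }
        results.push(attrs);
    }
    return results;
}
```

Each returned object maps lowercased attribute names to their values, so the src of the first SCRIPT tag would be read as parseScriptAttributes(html)[0].src. Attributes without a value (such as defer) are stored as empty strings.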

    [edit: added rough javascript example]

    Hi Marc, I added an example of how to use the pattern in JavaScript. I simplified it a bit, so if you need to handle double quotes, single quotes, and unquoted values, you will have to add that back in. Also removed the check for last

    Be aware that this is boilerplate code that just shows how to get all hits for the attribute in question, with the source to parse passed as a parameter. I did not add any logic for returning anything (if, for example, you want all (here 3) src attribute values, you could add them to an array and return them).

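    function getAttributeValue(attribute, source) {
        // Matches each opening <script ...> tag; group 1 captures the double-quoted value of the requested attribute.
        var regex = new RegExp("<script\\b[^>]*?\\s" + attribute + "=\"([^\"]*)\"[^>]*>", "gi");
        var matches;
        while ((matches = regex.exec(source)) !== null) {
            alert(matches[1]); // placeholder: handle each hit here
        }
    }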
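
The idea mentioned above of adding the hits to an array and returning them could be sketched as follows. The function name getAttributeValues is hypothetical, the pattern only handles double-quoted values, and the attribute argument is assumed to be a plain name without regex metacharacters.

```javascript
// Hypothetical variant that collects every hit and returns them all.
function getAttributeValues(attribute, source) {
    // Matches an opening <script ...> tag that carries attribute="...";
    // group 1 captures the double-quoted value.
    var regex = new RegExp(
        "<script\\b[^>]*?\\s" + attribute + "\\s*=\\s*\"([^\"]*)\"[^>]*>", "gi");
    var values = [];
    var matches;
    // With the "g" flag, exec() resumes after the previous match on each call.
    while ((matches = regex.exec(source)) !== null) {
        values.push(matches[1]);
    }
    return values;
}
```

Called as getAttributeValues("src", source) on the three SCRIPT tags from the question, this returns an array with the three src values, in document order. SCRIPT tags that lack the attribute are skipped.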

    hth Anders Rask

    [Edit: NB] OBS: Just for the record, I see several other ways to do this that might be better for your purpose (like DHTML/DOM). Regular expressions are quite often seen as the silver bullet of string handling. I don't always agree that they are. Parsing HTML can be very tricky, especially because the rules (standards) are often bent (browsers are forgiving): tags can be nested; attribute values may be in double quotes, single quotes, or no quotes at all (even within the same element); attribute order is arbitrary; and HTML comments can contain HTML elements. For those reasons, parsing HTML with regular expressions is often a poor choice (especially in JavaScript, which lacks some of the more advanced regex functionality, such as recursion, that languages like Perl and .NET have).
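
For comparison, the DOM route mentioned above might look roughly like this. It is only a sketch under stated assumptions: listScriptAttributes is a hypothetical name, doc is assumed to be a Document object such as window.document, and only the standard getElementsByTagName and getAttribute calls are used.

```javascript
// Sketch of the DOM-based alternative: let the browser parse the HTML and
// read the attributes from the resulting elements.
// `doc` is assumed to be a Document; `names` is an array of attribute names.
function listScriptAttributes(doc, names) {
    // Returns all SCRIPT elements in the document, HEAD and BODY alike.
    var scripts = doc.getElementsByTagName("script");
    var results = [];
    for (var i = 0; i < scripts.length; i++) {
        var entry = {};
        for (var j = 0; j < names.length; j++) {
            // getAttribute returns null when the attribute is absent.
            entry[names[j]] = scripts[i].getAttribute(names[j]);
        }
        results.push(entry);
    }
    return results;
}
```

In a page this would be called as listScriptAttributes(document, ["type", "language", "src"]). Because the browser has already done the parsing, quoting style and attribute order stop being a concern.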

