Posts tagged: xml

Dammit

Ok, so chalk this one up as slightly more complicated.

I’ve got XML files describing most of what I need. The problem is that it’s only “most.” Example:

  <class>
    <name>Rogue</name>
    <role>Striker. You dart in to attack, do massive damage, and then retreat to safety. You do best when teamed with a defender to flank enemies.</role>
    <powerSource>Martial. Your talents depend on extensive training and constant practice, innate skill, and natural coordination.</powerSource>
    <keyAbilities>Dexterity, Strength, Charisma</keyAbilities>
    <armorProficiencies>Cloth, Leather</armorProficiencies>
    <weaponProficiencies>Dagger, Hand Crossbow, Shuriken, Sling, Short Sword</weaponProficiencies>
    <bonusToDefense>+2 Reflex</bonusToDefense>
    <firstLevelHitPoints>12 + Constitution score</firstLevelHitPoints>
    <firstLevelHitPointScript>12 + root.con.current</firstLevelHitPointScript>
    <hitPointsPerLevel>5</hitPointsPerLevel>
    <healingSurges>6 + root.con.currentMod</healingSurges>
    <trainedSkills><![CDATA[<html>
      Stealth and Thievery. From the class skills list below, choose four more trained skills at 1st level.<br/><i>Class Skills: </i> Acrobatics (Dex), Athletics (Str), Bluff (Cha), Dungeoneering (Wis), Insight (Wis), Intimidate (Cha), Perception (Wis), Stealth (Dex), Streetwise (Cha), Thievery (Dex)</html> 
    ]]></trainedSkills>
    <buildOptions>Brawny rogue, trickster rogue</buildOptions>
    <classFeatures>First Strike, Rogue Tactics, Rogue Weapon Talent, Sneak Attack</classFeatures>
    <modifierSetName>class.rogue</modifierSetName>
  </class>

Or:

<power>
  <name>Positioning Strike</name>
  <macroName>P_E_Positioning_Strike</macroName>
  <level>1</level>
  <source>Rogue</source>
  <flavor>A false stumble and a shove place the enemy exactly where you want him.</flavor>
  <type>Encounter</type>
  <keywords>Martial, Weapon</keywords>
  <action>Standard</action>
  <attackTypeAndRange>'Melee weapon'</attackTypeAndRange>
  <requirementsMet>root.currentWeapon.melee</requirementsMet>
  <target>One creature</target>
  <attack>'Dexterity'</attack>
  <attackModifier>root.dex.currentMod</attackModifier>
  <defense>Will</defense>
  <hit>1[W] + Dexterity damage, and you slide the </hit>
  <hitWeaponDamageMultiplier>1</hitWeaponDamageMultiplier>
  <hitDamageModifier>root.dex.currentMod</hitDamageModifier>
</power>

See anything mentioning paragon paths? Or which archetypes change the effect (I think that one is Artful Dodger)? Neither do I. I’m going to end up having to convert the PDFs to HTML and parse those anyway (though not for much, I don’t think). Yay.

categories General

RosterParserFixed.XmlParser parser = new RosterParserFixed.XmlParser()

Fixed the nesting problem. Fixed item parsing. Item stats for nested units show up now. As with the Ruby parser, throw different combinations at it and see what happens.

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Serialization;
using System.Xml.Schema;
using System.Xml.XPath;
 
namespace abparser
{
    class Program
    {
        static void Main(string[] args)
        {
            RosterParserTest.XmlParser parser = new RosterParserTest.XmlParser();
            parser.ParseRoster(@"C:\Temp\de7th.rst", @"C:\Temp\output.xml");
            Console.ReadLine();
        }
    }
}
 
 
 
namespace RosterParserTest
{
    class XmlParser
    {
        static XmlDocument Roster = new XmlDocument();
 
        static XmlElement rootElement = Roster.CreateElement("", "Army", "");
 
        public static string RemoveWhitespace(string str)
        {
            try
            {
                //Ryan's Regex
                return new Regex(@"(\s+|\{.*?\}|\(.*?\)|\/+|\.+)").Replace(str, String.Empty);
            }
            catch (Exception)
            {
                return str;
            }
        }
 
        public void ParseNestedXML(XmlElement thisElement, XmlElement rosterElement)
        {
            bool linkUnitStatsDone = false; //This is a dirty hack.
 
            string replaceMe = thisElement.GetAttribute("name").ToString();
 
            replaceMe = RemoveWhitespace(replaceMe);
 
            XmlElement baseElement;
 
            XmlNodeList linkUnitStatNodeList = thisElement.SelectNodes("./link | ./unitstat");
 
            /*Grab the last of the PascalCase names.  HarGanethExecutioners becomes Executioners
             * SupremeSorceress becomes Sorceress, etc.  Replace the rest of the name with a backreference */
            string regexMatcher = Regex.Replace(replaceMe, @".*?([A-Z][a-z]+)$", "${1}"); 
 
            //This way, it'll actually parse the NodeList for stats in nested things.
            if (Regex.IsMatch(rosterElement.Name.ToString(), regexMatcher)) //So that I don't get duplicate empty nodes.
            {
                baseElement = rosterElement; //Adding to the previous node in the tree.
 
                foreach (XmlElement parseElement in thisElement)
                {
                    if (parseElement.HasChildNodes && parseElement.InnerXml.Contains("entity"))
                    {
                        ParseNestedXML(parseElement, baseElement); //Parsing out nested.
                    } //if (parseElement.HasChildNodes && parseElement.InnerXml.Contains("entity"))
                    else if (!linkUnitStatsDone)
                    {
                        ParseLinkUnitStats(linkUnitStatNodeList, baseElement);
                        linkUnitStatsDone = true; //Hack implemented.
                    } //else if (!linkUnitStatsDone)
                } //foreach (XmlElement parseElement in thisElement)
            }
            else
            {
                baseElement = Roster.CreateElement(replaceMe);
 
                foreach (XmlElement parseElement in thisElement)
                {
                    if (parseElement.HasChildNodes && parseElement.InnerXml.Contains("entity"))
                    {
                        ParseNestedXML(parseElement, baseElement); //Whee recursion.
                    } //if (parseElement.HasChildNodes && parseElement.InnerXml.Contains("entity"))
                    else
                    {
                        ParseLinkUnitStats(parseElement, baseElement); //This has always worked.
                    } //else
 
                    rosterElement.AppendChild(baseElement); //Add to the local node.
 
                    rootElement.AppendChild(rosterElement); //Add to the Army node.
                } //foreach (XmlElement parseElement in thisElement)
 
            } //else
 
        } //public void ParseNestedXML(XmlElement thisElement, XmlElement rosterElement)
 
        public void ParseRoster(string path, string output)
        {
            XmlDocument parsingRoster = new XmlDocument();
 
            parsingRoster.Load(path);
 
            XmlNodeList parsingElements = parsingRoster.SelectNodes("/document/squad");
 
            foreach (XmlElement thisElement in parsingElements)
            {
                XmlElement rosterElement = Roster.CreateElement("Unit");
 
                ParseNestedXML(thisElement, rosterElement);
            } //foreach (XmlElement thisElement in parsingElements)
 
            Roster.AppendChild(rootElement);
 
            Roster.Save(output);
        } //public void ParseRoster(string path, string output)
 
        public void ParseLinkUnitStats(XmlElement parseElement, XmlElement baseElement)
        {
 
            foreach (XmlElement correctElement in parseElement)
            {
                if (correctElement.HasAttribute("name"))
                {
                    string subReplaceMe = correctElement.GetAttribute("name").ToString();
 
                    subReplaceMe = RemoveWhitespace(subReplaceMe);
 
                    XmlElement addElement = Roster.CreateElement(subReplaceMe);
                    if (!Regex.Match(subReplaceMe, @"(Left|Worker|Helper|Pts|Coun|Group)").Success)
                    {
                        if (parseElement.HasChildNodes && parseElement.InnerXml.Contains("entity"))
                        {
                            //Console.WriteLine("Found an item (XmlElement)");
                            ParseNestedXML(addElement, correctElement);
                        }
                        else if (correctElement.HasAttribute("description"))
                        {
                            addElement.InnerText = correctElement.GetAttribute("description").ToString();
                            baseElement.AppendChild(addElement); //Attach it, as the XmlNodeList overload does.
                        } //else if (correctElement.HasAttribute("description"))
                        else if (correctElement.HasAttribute("value") && (Regex.IsMatch(correctElement.GetAttribute("value"), @"[^0|-]")))
                        {
                            addElement.InnerText = RemoveWhitespace(correctElement.GetAttribute("value").ToString());
                            baseElement.AppendChild(addElement);
                        } //else if (correctElement.HasAttribute("value"))
                    } //if (!Regex.Match(subReplaceMe, ...).Success)
                    if (parseElement.HasAttribute("basename"))
                    {
                        /*It's a non-dwarf item.  Whee!  They don't show up in the XmlNodeList one.
                        Get rid of newlines and periods at the end, then set it as the InnerText
                        This doesn't catch cases where the item has other properties inside it, but
                        I haven't seen those */
                        baseElement.InnerText = Regex.Replace(parseElement.GetAttribute("itemsummary"), @"(\\n|\.)", String.Empty);
                    }
                } //if (correctElement.HasAttribute("name")
 
 
            } //foreach (XmlElement correctElement in parseElement)
 
        } //public void ParseLinkUnitStats(XmlElement parseElement, XmlElement baseElement)
 
        public void ParseLinkUnitStats(XmlNodeList parseNodeList, XmlElement baseElement)
        {
            foreach (XmlElement correctElement in parseNodeList)
            {
                if (correctElement.HasAttribute("name"))
                {
                    string subReplaceMe = correctElement.GetAttribute("name").ToString();
                    subReplaceMe = RemoveWhitespace(subReplaceMe);
 
                    if (!Regex.Match(subReplaceMe, @"(Left|Worker|Helper|Pts|Coun|Group)").Success)
                    {
 
                        XmlElement addElement = Roster.CreateElement(subReplaceMe);
                        if (correctElement.HasChildNodes && correctElement.InnerXml.Contains("entity"))
                        {
                            //Console.WriteLine("Found an item (XmlNodeList)");
                            ParseNestedXML(addElement, correctElement);
                        }
 
                        if (correctElement.HasAttribute("description"))
                        {
                            addElement.InnerText = correctElement.GetAttribute("description").ToString();
                            baseElement.AppendChild(addElement);
                        } //if (correctElement.HasAttribute("description"))
                        else if (correctElement.HasAttribute("value") && (Regex.IsMatch(correctElement.GetAttribute("value"), @"[^0|-]")))
                        {
                            addElement.InnerText = RemoveWhitespace(correctElement.GetAttribute("value").ToString());
                            baseElement.AppendChild(addElement);
                        } //else if (correctElement.HasAttribute("value"))
                    } //if (!Regex.Match(subReplaceMe, ...).Success)
 
                } //if (correctElement.HasAttribute("name"))
 
            } //foreach (XmlElement correctElement in parseNodeList)
 
        } //public void ParseLinkUnitStats(XmlNodeList parseNodeList, XmlElement baseElement)
 
    } //class XmlParser
 
} //namespace RosterParserTest
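
For what it's worth, the name-matching trick in ParseNestedXML is easier to poke at outside of .NET. Here's a rough Ruby sketch of the same regex. `last_pascal_word` is a name I made up for illustration, and the sample names are the ones from the comment plus the "Unit" element name, so treat it as a toy rather than part of the parser.

```ruby
# Rough Ruby equivalent of the Regex.Replace call in ParseNestedXML.
# last_pascal_word is a hypothetical helper name, not part of the parser.
def last_pascal_word(name)
  # Lazily eat everything up to the final capitalised word, keep only that word.
  name.sub(/.*?([A-Z][a-z]+)$/, '\1')
end

%w[HarGanethExecutioners SupremeSorceress Unit].each do |name|
  puts "#{name} -> #{last_pascal_word(name)}"
end
```

A name that doesn't end in a capitalised word (say, one ending in a digit) doesn't match at all and comes back unchanged, which is worth keeping in mind when the result gets fed into Regex.IsMatch as a pattern.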

categories General

Ugh.

I’m already not that fond of working with XML in .NET. Here are a couple of fixes:

public static string RemoveWhitespace(string str)
{
    try
    {
        return new Regex(@"(\s+|\{.*?\}|\(.*?\)|\/+|\.+)").Replace(str, String.Empty);
    }
    catch (Exception)
    {
        return str;
    }
}

This one actually gets rid of the crap in braces and parentheses, plus slashes and periods, along with the whitespace.
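
If you'd rather see what that pattern eats without firing up Visual Studio, here's a quick Ruby sketch of the same regex. `clean_name` is a hypothetical helper, and the `{dwRune}` tag is a made-up stand-in for the stuff ArmyBuilder puts in braces.

```ruby
# Same pattern as RemoveWhitespace, as a Ruby one-liner.
# clean_name is a hypothetical helper; the {dwRune} tag below is made up.
STRIP = /(\s+|\{.*?\}|\(.*?\)|\/+|\.+)/

def clean_name(str)
  # Drop whitespace, {...} groups, (...) groups, slashes and periods.
  str.gsub(STRIP, '')
end

puts clean_name("Unit St.")
puts clean_name("Master Rune of Kragg the Grim {dwRune}")
puts clean_name("Save (4+/3+)")
```

The brace and parenthesis groups are lazy (`.*?`), so each one stops at its first closing character instead of swallowing everything up to the last one on the line.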

Secondly, I loathe empty nodes (stats, etc), so stats with a value of zero or a hyphen get dropped:

replaceMe = RemoveWhitespace(replaceMe);
Console.WriteLine(replaceMe);
if (replaceMe != String.Empty) 
{
    XmlElement baseElement = Roster.CreateElement(replaceMe);
 
    foreach (XmlElement parseElement in thisElement)
    {
        if (parseElement.HasChildNodes && parseElement.InnerXml.Contains("entity"))
        {
 
            ParseNestedXML(parseElement, baseElement);
        }
        else
        {
            foreach (XmlElement correctElement in parseElement)
            {
                if (correctElement.HasAttribute("name"))
                {
                    string subReplaceMe = correctElement.GetAttribute("name").ToString();
 
                    subReplaceMe = RemoveWhitespace(subReplaceMe);
 
                    XmlElement addElement = Roster.CreateElement(subReplaceMe);
 
                    if (correctElement.HasAttribute("description"))
                    {
                        addElement.InnerText = correctElement.GetAttribute("description").ToString();
                        baseElement.AppendChild(addElement);
                    }
                    //Bye, stats with a value of zero or a hyphen!
                    else if (correctElement.HasAttribute("value") && (Regex.Match(correctElement.GetAttribute("value").ToString(), @"[^0|-]").Success))
                    {
                        addElement.InnerText = correctElement.GetAttribute("value").ToString();
                        baseElement.AppendChild(addElement);
                    }
 
                }
            }
        }
 
        rosterElement.AppendChild(baseElement);
 
        rootElement.AppendChild(rosterElement);
    }
}
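
The value check in the snippet above is a little subtle, so here's a small Ruby sketch of what that pattern does and doesn't let through. `keep_stat?` is a name invented for the example, and the sample values are just ones that look like roster stats.

```ruby
# keep_stat? is a hypothetical name; the regex is the one from the snippet above.
def keep_stat?(value)
  # [^0|-] is a character class: any single character other than 0, | or -.
  # Inside the brackets the | is a literal pipe, not alternation.
  !!(value =~ /[^0|-]/)
end

["0", "-", "", "4", "10", "3+/2+", "00"].each do |v|
  puts "#{v.inspect}: #{keep_stat?(v) ? 'keep' : 'drop'}"
end
```

So a value survives as long as it contains at least one character that isn't a zero, a pipe or a hyphen. The pipe in there is doing nothing useful; `[^0-]` would behave the same for real stat values.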

I find it kind of ironic that I’m using recursion after bitching about recursion. I’ll probably take a look at the nesting problems and whatnot this weekend, assuming I have any time.

I wonder if it’s possible to get a job doing nothing but writing regular expressions…

categories General

Parser

Ok, boring. I didn’t spend as much time working on it tonight as I intended to, but it at least parses the Dwarf roster fine. It does not parse the Dark Elf roster properly (namely, it doesn’t pull the description out of items or Gifts of Khaine, and it doubles up the <item> tag for reasons I’m not sure of), but that’ll get fixed when I’m at work tomorrow.

Ruby code:

#!/usr/bin/ruby
require "rexml/document"
require "pp"
require "rexml/formatters/pretty"
include REXML
 
inputxml = File.read('dwarfroster.rst')
@roster = Document.new inputxml
 
@army = Document.new.add_element("army")
 
def parsenested(process, addto)
  #Try to guess if it's a champion, character in the unit, or item
  process.elements.each('entity') do |p|
    #puts p
    if p.elements["link"].has_elements?
      #Recursively run through these to figure out what the hell it is
      if !p.elements["link/entity"].attributes["itemsummary"].to_s.empty?
        adder = addto.add_element("item")
        puts "Found nested\n"
        parsenested(p.elements["link"], adder)
      else
        #This is really just stubbed out, since I haven't seen it
      end
    elsif p.attributes["statset"] =~ /Normal/
      #It's a character, crew, or mount.  Figure out which
      if p.attributes["totalcost"] !~ /^0/
        #It's a champion or character
        adder = addto.add_element("champion")
        puts "Found champ\n"
        parse(p, adder)
      else
        #It's crew or the like
        adder = addto.add_element("crew")
        puts "Found crew\n"
        parse(p, adder)
      end
    else
      #It's an item
      puts "Found item\n"
      if addto.elements["item"].nil?
        @adder = addto.add_element("item")
      end
      added = @adder.add_element(p.attributes["name"].gsub(/\s+/, ''))
      p.elements.each('link') do |ele|
        unless ele.attributes["name"] =~ /(Worker|Helper|Cost|Left)/
          added.add_element(ele.attributes["name"].gsub(/\s+/, '')).add_text(ele.attributes["description"])
        end
      end
    end
  end
end
 
def parse(s, addto)
    #In some cases, the basename differs (i.e. Supreme Sorc vs. High Sorc)
    #Also, it'll pick up whether there's a champion in the unit by the diff
    #of base and count
    %w[basename count base].each do |b|
      unless s.attributes[b].to_s.empty?
        addto.add_element(b).add_text(s.attributes[b])
      end
    end
    stats = addto.add_element("stats")
    s.elements.each('unitstat') do |a|
      #unit.fetch(:stats) { |el| unit[el] = {}}
      #I don't want blank stats: skip values that are empty, or exactly 0 or -
      #(anchored, so a value like 10 is kept)
      if !a.attributes["value"].to_s.empty? && (a.attributes["value"] !~ /\A(0|-)\z/)
        stats.add_element(a.attributes["name"].gsub(/\s+/, '')).add_text(a.attributes["value"])
      end
    end
    s.elements.each('link') do |link|
      if link.has_elements?
        #Figure out what the hell it is
        parsenested(link, addto)
      else
        #unitatt = unit.add_element("attributes")
        #Rip out the name if it doesn't have "Helper, Worker, Points Left, or Cost"
        unless link.attributes["name"] =~ /(Worker|Helper|Cost|Left)/
          #Get rid of the stuff in braces AB puts in
          if addto.elements["attributes"].nil?
            @unitatt = addto.add_element("attributes")
          end
          @unitatt.add_element(link.attributes["name"].gsub(/\{.*?\}/, '').gsub(/\s+/, '')).add_text('true')
        end
 
      end
    end
end
 
@roster.elements.each('document/roster') do |ele|
  info = @army.add_element("info")
  #Pick out the race, army name, total points, used points, canonical race name
  %w[race size activesize racename].each do |attr|
    info.add_element(attr).add_text(ele.attributes[attr])
  end
  #@army.push(info)
end
 
@roster.elements.each('document/squad') do |ele|
  @unit = @army.add_element("unit")
 
  #Pick out the name of the model and its cost, plus how many models
  %w[name modelcount totalcost].each do |attr|
    @unit.add_element(attr).add_text(ele.attributes[attr])
  end
  ele.elements.each('entity') do |s|
    #Parse it out
    parse(s, @unit)
  end
end
#pp @army
 
prettyprint = REXML::Formatters::Pretty.new
output = String.new
puts prettyprint.write(@army, output)

And the XML output:

<?xml version="1.0" encoding="ISO-8859-1"?>
<army>
	<info>
		<race>Dwarf</race>
		<size>1500</size>
		<activesize>1499.</activesize>
		<racename>Dwarfs</racename>
	</info>
	<unit>
		<name>Thane</name>
		<modelcount>1</modelcount>
		<totalcost>134</totalcost>
		<basename>Thane</basename>
		<count>1</count>
		<base>1</base>
		<stats>
			<Ld>9</Ld>
			<Mv>3</Mv>
			<Save>3+</Save>
			<St>4/8</St>
			<To>5</To>
			<UnitSt.>1</UnitSt.>
			<WS>6</WS>
			<Wo>2</Wo>
			<At>3</At>
			<BS>4</BS>
			<In>3</In>
			<ItemPts>75</ItemPts>
		</stats>
		<attributes>
			<General>true</General>
			<HandWeapon>true</HandWeapon>
			<GreatWeapon>true</GreatWeapon>
			<GromrilArmor>true</GromrilArmor>
		</attributes>
		<item>
			<RunicWeapon>
				<MasterRuneofKraggtheGrim>Allows other runes to be placed on a Great Weapon.</MasterRuneofKraggtheGrim>
				<RuneofCleaving>+1 Strength</RuneofCleaving>
			</RunicWeapon>
			<RunicArmor>
				<RuneofStone>+1 Armor Save</RuneofStone>
			</RunicArmor>
		</item>
	</unit>
	<unit>
		<name>Thane</name>
		<modelcount>1</modelcount>
		<totalcost>132</totalcost>
		<basename>Thane</basename>
		<count>1</count>
		<base>1</base>
		<stats>
			<In>3</In>
			<ItemPts>75</ItemPts>
			<Ld>9</Ld>
			<Mv>3</Mv>
			<Save>2+/1+</Save>
			<St>4/7</St>
			<To>5</To>
			<UnitSt.>1</UnitSt.>
			<WS>6</WS>
			<Wo>2</Wo>
			<At>3</At>
			<BS>4</BS>
		</stats>
		<attributes>
			<HandWeapon>true</HandWeapon>
			<GromrilArmor>true</GromrilArmor>
			<Shield>true</Shield>
		</attributes>
		<item>
			<RunicWeapon>
				<RuneofCleaving>+1 Strength</RuneofCleaving>
			</RunicWeapon>
			<RunicArmor>
				<RuneofStone>+1 Armor Save</RuneofStone>
			</RunicArmor>
		</item>
	</unit>
	<unit>
		<name>Thane</name>
		<modelcount>1</modelcount>
		<totalcost>95</totalcost>
		<basename>Thane</basename>
		<count>1</count>
		<base>1</base>
		<stats>
			<In>3</In>
			<ItemPts>75</ItemPts>
			<Ld>9</Ld>
			<Mv>3</Mv>
			<Save>3+</Save>
			<St>4</St>
			<To>5</To>
			<UnitSt.>1</UnitSt.>
			<WS>6</WS>
			<Wo>2</Wo>
			<At>3</At>
			<BS>4</BS>
		</stats>
		<attributes>
			<HandWeapon>true</HandWeapon>
			<GromrilArmor>true</GromrilArmor>
			<BattleStandardBearer>true</BattleStandardBearer>
		</attributes>
		<item>
			<RunicArmor>
				<RuneofStone>+1 Armor Save</RuneofStone>
			</RunicArmor>
		</item>
	</unit>
	<unit>
		<name>Dwarf Warriors</name>
		<modelcount>20</modelcount>
		<totalcost>205</totalcost>
		<basename>Dwarf Warriors</basename>
		<count>19</count>
		<base>20</base>
		<stats>
			<In>2</In>
			<Ld>9</Ld>
			<Mv>3</Mv>
			<Save>4+/3+</Save>
			<St>3</St>
			<To>4</To>
			<UnitSt.>1</UnitSt.>
			<WS>4</WS>
			<Wo>1</Wo>
			<At>1</At>
			<BS>3</BS>
		</stats>
		<champion>
			<basename>Veteran</basename>
			<count>1</count>
			<base>1</base>
			<stats>
				<In>2</In>
				<Ld>9</Ld>
				<Mv>3</Mv>
				<Save>4+/3+</Save>
				<St>3</St>
				<To>4</To>
				<UnitSt.>1</UnitSt.>
				<WS>4</WS>
				<Wo>1</Wo>
				<At>2</At>
				<BS>3</BS>
			</stats>
			<attributes>
				<HandWeapon>true</HandWeapon>
				<HeavyArmor>true</HeavyArmor>
				<Shield>true</Shield>
			</attributes>
		</champion>
		<attributes>
			<Musician>true</Musician>
			<StandardBearer>true</StandardBearer>
			<HandWeapon>true</HandWeapon>
			<HeavyArmor>true</HeavyArmor>
			<Shield>true</Shield>
		</attributes>
	</unit>
	<unit>
		<name>Quarellers</name>
		<modelcount>10</modelcount>
		<totalcost>110</totalcost>
		<basename>Quarrellers</basename>
		<count>10</count>
		<base>10</base>
		<stats>
			<In>2</In>
			<Ld>9</Ld>
			<Mv>3</Mv>
			<Save>6+</Save>
			<St>3</St>
			<To>4</To>
			<UnitSt.>1</UnitSt.>
			<WS>4</WS>
			<Wo>1</Wo>
			<At>1</At>
			<BS>3</BS>
		</stats>
		<attributes>
			<HandWeapon>true</HandWeapon>
			<Crossbow>true</Crossbow>
			<LightArmor>true</LightArmor>
		</attributes>
	</unit>
	<unit>
		<name>Quarellers</name>
		<modelcount>10</modelcount>
		<totalcost>110</totalcost>
		<basename>Quarrellers</basename>
		<count>10</count>
		<base>10</base>
		<stats>
			<In>2</In>
			<Ld>9</Ld>
			<Mv>3</Mv>
			<Save>6+</Save>
			<St>3</St>
			<To>4</To>
			<UnitSt.>1</UnitSt.>
			<WS>4</WS>
			<Wo>1</Wo>
			<At>1</At>
			<BS>3</BS>
		</stats>
		<attributes>
			<HandWeapon>true</HandWeapon>
			<Crossbow>true</Crossbow>
			<LightArmor>true</LightArmor>
		</attributes>
	</unit>
	<unit>
		<name>Ironbreakers</name>
		<modelcount>14</modelcount>
		<totalcost>237</totalcost>
		<basename>Ironbreakers</basename>
		<count>13</count>
		<base>14</base>
		<stats>
			<Ld>9</Ld>
			<Mv>3</Mv>
			<Save>3+/2+</Save>
			<St>4</St>
			<To>4</To>
			<UnitSt.>1</UnitSt.>
			<WS>5</WS>
			<Wo>1</Wo>
			<At>1</At>
			<BS>3</BS>
			<In>2</In>
		</stats>
		<champion>
			<basename>Ironbeard</basename>
			<count>1</count>
			<base>1</base>
			<stats>
				<In>2</In>
				<Ld>9</Ld>
				<Mv>3</Mv>
				<Save>3+/2+</Save>
				<St>4</St>
				<To>4</To>
				<UnitSt.>1</UnitSt.>
				<WS>5</WS>
				<Wo>1</Wo>
				<At>2</At>
				<BS>3</BS>
			</stats>
			<attributes>
				<HandWeapon>true</HandWeapon>
				<GromrilArmor>true</GromrilArmor>
				<Shield>true</Shield>
			</attributes>
		</champion>
		<attributes>
			<Musician>true</Musician>
			<StandardBearer>true</StandardBearer>
			<HandWeapon>true</HandWeapon>
			<GromrilArmor>true</GromrilArmor>
			<Shield>true</Shield>
		</attributes>
		<item>
			<RunicStandard>
				<RuneofStoicism>The unit counts as double its actual Unit Strength.</RuneofStoicism>
			</RunicStandard>
		</item>
	</unit>
	<unit>
		<name>Hammerers</name>
		<modelcount>18</modelcount>
		<totalcost>246</totalcost>
		<basename>Hammerers</basename>
		<count>17</count>
		<base>18</base>
		<stats>
			<Ld>9</Ld>
			<Mv>3</Mv>
			<Save>5+</Save>
			<St>4/6</St>
			<To>4</To>
			<UnitSt.>1</UnitSt.>
			<WS>5</WS>
			<Wo>1</Wo>
			<At>1</At>
			<BS>3</BS>
			<In>2</In>
		</stats>
		<champion>
			<basename>Gate Keeper</basename>
			<count>1</count>
			<base>1</base>
			<stats>
				<In>2</In>
				<Ld>9</Ld>
				<Mv>3</Mv>
				<Save>5+</Save>
				<St>4/6</St>
				<To>4</To>
				<UnitSt.>1</UnitSt.>
				<WS>5</WS>
				<Wo>1</Wo>
				<At>2</At>
				<BS>3</BS>
			</stats>
			<attributes>
				<HandWeapon>true</HandWeapon>
				<GreatWeapon>true</GreatWeapon>
				<HeavyArmor>true</HeavyArmor>
			</attributes>
		</champion>
		<attributes>
			<Musician>true</Musician>
			<StandardBearer>true</StandardBearer>
			<HandWeapon>true</HandWeapon>
			<GreatWeapon>true</GreatWeapon>
			<HeavyArmor>true</HeavyArmor>
			<Stubborn>true</Stubborn>
		</attributes>
	</unit>
	<unit>
		<name>Artillery Battery</name>
		<modelcount>4</modelcount>
		<totalcost>45</totalcost>
		<basename>Bolt Thrower</basename>
		<count>1</count>
		<base>1</base>
		<stats>
			<To>7</To>
			<UnitSt.>3</UnitSt.>
			<Wo>3</Wo>
		</stats>
		<crew>
			<basename>Crew</basename>
			<count>3</count>
			<base>3</base>
			<stats>
				<In>2</In>
				<Ld>9</Ld>
				<Mv>3</Mv>
				<Save>6+</Save>
				<St>3</St>
				<To>4</To>
				<WS>4</WS>
				<Wo>1</Wo>
				<At>1</At>
				<BS>3</BS>
			</stats>
			<attributes>
				<HandWeapon>true</HandWeapon>
				<LightArmor>true</LightArmor>
			</attributes>
		</crew>
		<attributes>
			<BoltThrower>true</BoltThrower>
		</attributes>
	</unit>
	<unit>
		<name>Artillery Battery</name>
		<modelcount>4</modelcount>
		<totalcost>45</totalcost>
		<basename>Bolt Thrower</basename>
		<count>1</count>
		<base>1</base>
		<stats>
			<To>7</To>
			<UnitSt.>3</UnitSt.>
			<Wo>3</Wo>
		</stats>
		<crew>
			<basename>Crew</basename>
			<count>3</count>
			<base>3</base>
			<stats>
				<In>2</In>
				<Ld>9</Ld>
				<Mv>3</Mv>
				<Save>6+</Save>
				<St>3</St>
				<To>4</To>
				<WS>4</WS>
				<Wo>1</Wo>
				<At>1</At>
				<BS>3</BS>
			</stats>
			<attributes>
				<HandWeapon>true</HandWeapon>
				<LightArmor>true</LightArmor>
			</attributes>
		</crew>
		<attributes>
			<BoltThrower>true</BoltThrower>
		</attributes>
	</unit>
	<unit>
		<name>Airborne Assault</name>
		<modelcount>1</modelcount>
		<totalcost>140</totalcost>
		<basename>Gyrocopter</basename>
		<count>1</count>
		<base>1</base>
		<stats>
			<In>2</In>
			<Ld>9</Ld>
			<Save>4+</Save>
			<St>4</St>
			<To>5</To>
			<UnitSt.>3</UnitSt.>
			<WS>4</WS>
			<Wo>3</Wo>
			<At>2</At>
		</stats>
		<attributes>
			<Flyer>true</Flyer>
		</attributes>
	</unit>
</army>

I’ll probably screw with the code so it outputs something more easily parsed by Dan (for the items and attributes, mainly) at the same time as I fix the Dark Elf parsing (which should also hit the Anvil of Doom problem). Right now it comes out like this:

<item>
      <SacrificialDagger/>
      <PearlofinfiniteBleakeness/>
      <BlackDragonEgg/>
</item>
<item>
        <RuneofKhaine/>
        <TouchofDeath>
          <KillingBlow/>
        </TouchofDeath>
</item>

As you can see, Touch of Death somehow added the name as a subelement, yet I don’t see any substantial differences between the DE roster and the dwarf roster. Still, it’ll get fixed tomorrow (and $deity willing, converted to .NET).

categories General

How not to write an XML file

I’ll be honest. I don’t like XML. I don’t like SOAP (REST is far nicer in my opinion), since it manipulates the HTTP spec to do things it was never meant to do. Raw sockets and bit twiddling seem like a more logical extension; it’s just that port 80 happens to be open on most corporate firewalls, so SOAP and CORBA have taken off. As much as I may dislike XML, though, it has its uses: representing data sets where CSV doesn’t really make sense and YAML isn’t available, and it’s not that hard to deal with.

The ArmyBuilder developers seem to have squeezed SGML into an XML doctype somehow, and the roster files are littered with references I can’t quite make out. Yes, ArmyBuilder can export to XML, I guess, but it leads to XSL from hell. In some ways, I would have preferred to rip apart a binary format with a hex editor, as long as the data was formatted logically.

This, for instance:

<link id="dwWarCrew" count="1" actual="1" script="0" sequence="106" pseudo="no" totalcost="0" name="Crew" category="Equip" visible="no" sourceid="dwBoltThrw" sourceindex="1"></link>

Or this:

<ruleset context="dwSubtype" ruleset="dwDwarves" contextname="Army Subtype" rulesetname="Dwarf Army"/>

Neither is formatted logically. The second record, as you can see, uses XML attributes rather than child nodes for everything, which kinda defeats the point. XSL to parse ArmyBuilder’s XML output? Ahh…
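
To make the attributes-versus-nodes complaint a bit more concrete, here's a rough REXML sketch that takes the <link> record from above and turns every attribute into a child element. `attributes_to_elements` is a hypothetical helper I'm using for illustration; it isn't anything ArmyBuilder ships, and it's not how the real parser is structured.

```ruby
require "rexml/document"

# The <link> record from above, joined onto one line.
source = '<link id="dwWarCrew" count="1" actual="1" script="0" sequence="106" ' \
         'pseudo="no" totalcost="0" name="Crew" category="Equip" visible="no" ' \
         'sourceid="dwBoltThrw" sourceindex="1"></link>'

# attributes_to_elements is a hypothetical helper, not anything ArmyBuilder ships.
def attributes_to_elements(element)
  out = REXML::Element.new(element.name)
  # Turn every attribute into a child element holding the attribute's value.
  element.attributes.each do |name, value|
    out.add_element(name).add_text(value)
  end
  out
end

link = REXML::Document.new(source).root
converted = attributes_to_elements(link)

# Print each new child element as "name: text".
converted.elements.each do |child|
  puts "#{child.name}: #{child.text}"
end
```

With the data in nodes, at least a stylesheet or an XPath query can walk the record instead of picking a dozen attributes off a single tag.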

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:variable name="newlinefeed"><![CDATA[
]]></xsl:variable>
	<xsl:variable name="statCountGlobal" select="count(document/definition/stat_def)"/>
 
	<xsl:template match="/">
	<xsl:variable name="namedModelCount" select="document/composition/@model_count"/>
	<xsl:variable name="actualModelCount" select="sum(//regiment/@model_count)"/>
	<xsl:variable name="actualPoints" select="sum(//regiment/@cost[not(contains(.,'['))])"/>
	<xsl:value-of select="concat(/document/summary/@race_name,': ',$actualPoints, /document/definition/@points_abbrev,' - ',$actualModelCount,' ')"/> Models<xsl:value-of select="$newlinefeed"/>
		<xsl:for-each select="/document/composition/comp_entry">
		<xsl:variable name="groupName" select="@group_name"/>
		<xsl:if test="/document/roster/top_level/regiment[@composition = $groupName]">
			<xsl:variable name="unit" select="@group_name"/>
			<xsl:for-each select="/document/roster/top_level/regiment[@composition = $unit]">
				<xsl:apply-templates select="." mode="top_level">
					<xsl:with-param name="regDepth">
						<xsl:choose>
							<xsl:when test="position()=1"><xsl:value-of select="count(/document/roster/top_level[regiment/@composition = $unit]//regiment)"/></xsl:when>
							<xsl:otherwise>0</xsl:otherwise>
						</xsl:choose>
					</xsl:with-param>
				</xsl:apply-templates>
			</xsl:for-each>	
		</xsl:if>
		</xsl:for-each>
	</xsl:template>
	<xsl:template match="regiment" mode="top_level">
		<xsl:param name="regDepth">
			<xsl:value-of select="count(..//regiment)"/>
		</xsl:param>
		<xsl:variable name="statCountLocal" select="count(stat)"/>
		<xsl:variable name="fsib" select="preceding-sibling::node()"/>
		<xsl:variable name="composition" select="@composition"/>
		<xsl:if test="$regDepth > 0">
		<xsl:choose>
			<xsl:when test="not($composition = $fsib/@composition)">
				<xsl:value-of select="$composition"/>(<xsl:value-of select="/document/composition/comp_entry[@group_name = $composition]/@percentage"/>)<xsl:value-of select="$newlinefeed"/>
			</xsl:when>
		</xsl:choose>
		</xsl:if>
		<xsl:value-of select="concat('[',@model_count,'] ')"/>
		<xsl:variable name="itemcost">
			<xsl:if test="@cost"><xsl:value-of select="@cost"/></xsl:if>
		</xsl:variable>
		<xsl:variable name="retinuecost">
			<xsl:call-template name="getRetinueCost">
				<xsl:with-param name="retinuecostSum" select="0"/>
				<xsl:with-param name="current" select="regiment[position()=1]"/>
				<xsl:with-param name="rest" select="regiment[position()!=1]"/>
			</xsl:call-template>
		</xsl:variable>
		<xsl:variable name="transportcost">
			<xsl:if test="regiment[@stat_set = 0]/@cost"><xsl:value-of select="substring-after(substring-before(regiment[@stat_set = 0]/@cost,']'),'[')"/></xsl:if>
		</xsl:variable>
		<xsl:choose>
			<xsl:when test="$retinuecost and @model_count=1 and @composition='HQ'">
					<xsl:value-of select="concat('[',$itemcost - $retinuecost,'] ')"/>
			</xsl:when>
			<xsl:when test="$transportcost != ''">
					<xsl:value-of select="concat('[',$itemcost - $transportcost,'] ')"/>
			</xsl:when>
			<xsl:otherwise>
					<xsl:value-of select="concat('[',$itemcost,'] ')"/>
			</xsl:otherwise>
		</xsl:choose>
 
		<xsl:value-of select="substring-before(name,' (')"/>
		<xsl:if test="@model_count=1 and @composition='HQ'">
			<xsl:text disable-output-escaping="yes">(IC)</xsl:text>
		</xsl:if>
		<xsl:text disable-output-escaping="yes">: </xsl:text>
 
		<xsl:for-each select="item[not(name = preceding-sibling::item/name)]">
			<xsl:variable name="namedItem" select="name"/>
			<xsl:choose>
				<xsl:when test="count(../item[name=$namedItem]) > 1"><xsl:value-of select="concat($namedItem,'(x',count(../item/name[.=$namedItem]),');')"/></xsl:when>
				<xsl:otherwise><xsl:value-of select="concat(name,';')"/></xsl:otherwise>
			</xsl:choose>
		</xsl:for-each>
		<xsl:for-each select="choice"><xsl:value-of select="concat(name,';')"/></xsl:for-each>
		<xsl:value-of select="$newlinefeed"/>
 
		<xsl:for-each select=".//regiment[not(../@category = 'Wargear Item')] ">
			<xsl:apply-templates select="." mode="regiment" />
		</xsl:for-each>
		<xsl:value-of select="$newlinefeed"/>
	</xsl:template>
	<xsl:template match="regiment" mode="regiment">
		<xsl:variable name="statCountLocal" select="count(stat)"/>
		<xsl:variable name="depth">
			<xsl:choose>
				<xsl:when test="../../@stat_count=1"><xsl:value-of select="number(@depth)-1" /></xsl:when>
				<xsl:otherwise><xsl:value-of select="@depth" /></xsl:otherwise>
			</xsl:choose>
		</xsl:variable>
		<xsl:value-of select="concat('[',@model_count,'] ')"/>
 
		<xsl:variable name="itemcost">
			<xsl:if test="@cost"><xsl:value-of select="substring-after(substring-before(@cost,']'),'[')"/></xsl:if>
		</xsl:variable>
		<xsl:variable name="transportcost">
			<xsl:if test="regiment[@stat_set = 0]/@cost"><xsl:value-of select="substring-after(substring-before(regiment[@stat_set = 0]/@cost,']'),'[')"/></xsl:if>
		</xsl:variable>
		<xsl:choose>
			<xsl:when test="((../@composition='HQ' and @depth = 1) or (@depth = 0)) and (regiment[@stat_set = 0])">
					<xsl:value-of select="concat('[',$itemcost - $transportcost,'] ')"/>
			</xsl:when>
			<xsl:when test="@stat_set = 0">
					<xsl:value-of select="concat('[',$itemcost,'] ')"/>
			</xsl:when>
			<xsl:when test="../@composition = 'HQ' and ../@stat_count > 1">
					<xsl:value-of select="concat('[',$itemcost,'] ')"/>
			</xsl:when>
		</xsl:choose>
 
		<xsl:call-template name="formatName"><xsl:with-param name="strName" select="concat(name,': ')"/></xsl:call-template>
 
		<xsl:for-each select="item[not(name = preceding-sibling::item/name)]">
			<xsl:variable name="namedItem" select="name"/>
			<xsl:choose>
				<xsl:when test="count(../item[name=$namedItem]) > 1"><xsl:value-of select="concat($namedItem,'(x',count(../item[name=$namedItem]),');')"/></xsl:when>
				<xsl:otherwise><xsl:value-of select="concat(name,';')"/></xsl:otherwise>
			</xsl:choose>
		</xsl:for-each>
		<xsl:for-each select="choice"><xsl:value-of select="concat(name,';')"/></xsl:for-each>
		<xsl:value-of select="$newlinefeed"/>
 
	</xsl:template>
	<xsl:template name="getRetinueCost">
		<xsl:param name="retinuecostSum" />
		<xsl:param name="current" />
		<xsl:param name="rest" />
		<xsl:variable name="curCost">
			<xsl:choose>
				<xsl:when test="contains($current/@cost,'[')"><xsl:value-of select="substring-after(substring-before($current/@cost,']'),'[')" /></xsl:when>
				<xsl:otherwise><xsl:value-of select="$current/@cost" /></xsl:otherwise>
			</xsl:choose>
		</xsl:variable>
		<xsl:choose>
			<xsl:when test="$current">
				<xsl:call-template name="getRetinueCost">
					<xsl:with-param name="retinuecostSum" select="$retinuecostSum + $curCost"/>
					<xsl:with-param name="current" select="$rest[position()=1]"/>
					<xsl:with-param name="rest" select="$rest[position()!=1]"/>
				</xsl:call-template>
			</xsl:when>
			<xsl:otherwise>
				<xsl:value-of select="$retinuecostSum" />
			</xsl:otherwise>
		</xsl:choose>
	</xsl:template>
	<xsl:template name="halfCost">
		<xsl:param name="itemCost"/>
		<xsl:value-of select="round($itemCost div 2)" />
	</xsl:template>
	<xsl:template name="formatName">
		<xsl:param name="strName"/>
		<xsl:choose>
			<xsl:when test="contains($strName,' (')"><xsl:value-of select="substring-before($strName,' (')"/></xsl:when>
			<xsl:otherwise><xsl:value-of select="$strName"/></xsl:otherwise>
		</xsl:choose>
	</xsl:template>
	<xsl:template name="doReplaceCar">
		<xsl:param name="text"/>
		<xsl:param name="replace"/>
		<xsl:param name="by"/>
		<xsl:choose>
			<xsl:when test="contains($text, $replace)">
				<xsl:value-of select="substring-before($text, $replace)" disable-output-escaping="yes"/>
				<xsl:value-of select="$by" disable-output-escaping="yes"/>
				<xsl:call-template name="doReplaceCar">
					<xsl:with-param name="text" select="substring-after($text, $replace)"/>
					<xsl:with-param name="replace" select="$replace"/>
					<xsl:with-param name="by" select="$by"/>
				</xsl:call-template>
			</xsl:when>
			<xsl:otherwise>
				<xsl:value-of select="$text" disable-output-escaping="yes"/>
			</xsl:otherwise>
		</xsl:choose>
	</xsl:template>
</xsl:stylesheet>

No, I am not writing a stylesheet like that again, and the one to parse the roster files would be far more complicated.

The problem with the roster file, fundamentally, is that it’s too tightly coupled to ArmyBuilder. That makes sense, in a way, but is still irksome. The <link> elements don’t have any nodes under them, just assloads of attributes, and it’s not easy to figure out which links I am interested in:

<link id="HeavyArmor" count="1" actual="1" script="0" sequence="26" pseudo="no" totalcost="0" name="Heavy Armor" category="Equip" \
 abbrev="Hv" description="5+ Armor Save" equipment="yes" footnote="yes" sourceid="dwWarrVet" sourceindex="5"></link>

Versus ones I’m not interested in:

<link id="ItemCost" count="1" actual="1" script="0" sequence="28" pseudo="no" totalcost="0" name="Item Cost Worker" category="Equip"\
 visible="no" sourcetype="3" sourceid="Globals" sourceindex="1"></link>

I can’t see a way to tell the two apart without passing in a long hash of element.attribute[$thing] values, or specifically excluding anything with “Helper” or “Worker” or whatever in the name. Not to mention that the file is structured as:

<document>
  <squad>
  <!-- unit name and cost is here -->
    <entity>
    <!-- unit stats are here, along with composition and whatnot -->
      <link>
      <!-- sometimes there's nothing of note in the link tags -->
         <entity>
         <!-- this might be a magic item, warmachine crew, magic banner, champion, and probably other stuff, but is not easily  \
         identified, and there may be more than one -->
            <link>
            <!-- might be info for whatever is in entity, might be a helper which I don't want -->
            </link>
            <unitstat>
            <!-- if it's crew, champion, whatever, stats would be here, but this node may not exist -->
            </unitstat>
        </entity>
      </link>
   </entity>
 </squad>
</document>
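
For what it's worth, here is a rough Ruby sketch of walking that nesting and dropping the bookkeeping links. It is not the actual parser. The sample roster, the `name` attributes on `<squad>` and `<entity>`, and the helper names (`helper_link?`, `collect_links`, `interesting_links`) are all made up for illustration. Treating `visible="no"` as a "skip this" marker is a guess based on the single ItemCost example above, and the name check is exactly the kind of exclusion list complained about above.

```ruby
require 'rexml/document'

# Hypothetical sample shaped like the outline above. Attribute values are
# borrowed from the two <link> examples; everything else is invented.
ROSTER_SAMPLE = <<~XML
  <document>
    <squad name="Warriors">
      <entity name="Warrior">
        <link id="HeavyArmor" name="Heavy Armor" category="Equip" description="5+ Armor Save"/>
        <link id="ItemCost" name="Item Cost Worker" category="Equip" visible="no"/>
        <link id="Champ" name="Veteran" category="Equip">
          <entity name="Veteran">
            <link id="GreatWeapon" name="Great Weapon" category="Equip"/>
            <unitstat name="WS" value="5"/>
          </entity>
        </link>
      </entity>
    </squad>
  </document>
XML

HELPER_NAME = /Helper|Worker/

# Assumption: a link is bookkeeping if it is hidden or named like a helper.
def helper_link?(link)
  link.attributes['visible'] == 'no' ||
    link.attributes['name'].to_s.match?(HELPER_NAME)
end

# Walk entity -> link -> entity -> ... and collect [depth, name] pairs for
# every link that survives the filter. Anything nested under a rejected
# link is skipped along with it.
def collect_links(entity, depth = 0, found = [])
  entity.get_elements('link').each do |link|
    next if helper_link?(link)
    found << [depth, link.attributes['name']]
    link.get_elements('entity').each do |child|
      collect_links(child, depth + 1, found)
    end
  end
  found
end

def interesting_links(xml)
  doc = REXML::Document.new(xml)
  doc.root.get_elements('squad/entity').flat_map { |e| collect_links(e) }
end

# Print the surviving links, indented by nesting depth.
interesting_links(ROSTER_SAMPLE).each do |depth, name|
  puts "#{'  ' * depth}#{name}"
end
```

The recursion mirrors the entity/link/entity nesting directly, so deeper rosters should not need any extra code, only a better `helper_link?`.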

The problem with some of these is that, by the logic of whoever wrote ArmyBuilder, champions fall into the “Equip” category. There is, in fact, an “isunit” attribute, but it isn’t set to “yes” anywhere. It is only ever set to “no”, and only on items, which I can’t figure out (unless there’s some kind of magic item which qualifies as a unit you can add? I don’t know).
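
Since “isunit” is no help, one possible workaround is to look at whether a nested entity has a `<unitstat>` child, because the outline above says that is where crew and champion stats live when they exist. A minimal Ruby sketch of that heuristic follows. The sample XML, the `name` attributes, and the `classify` and `nested_kinds` helpers are hypothetical, and the heuristic will misfile any champion whose `<unitstat>` node is missing.

```ruby
require 'rexml/document'

# Hypothetical sample: one nested entity with stats, one without.
NESTED_SAMPLE = <<~XML
  <document><squad><entity name="Warriors">
    <link name="Veteran" category="Equip">
      <entity name="Veteran">
        <unitstat name="WS" value="5"/>
      </entity>
    </link>
    <link name="Magic Banner" category="Equip">
      <entity name="Magic Banner" isunit="no"/>
    </link>
  </entity></squad></document>
XML

# Assumption: a nested <entity> carrying a <unitstat> child is a model
# (champion, crew); one without is plain gear. The isunit attribute is
# deliberately ignored because it is never "yes".
def classify(entity)
  entity.get_elements('unitstat').empty? ? :item : :model
end

# Return [name, kind] for every entity nested directly under a link,
# in document order.
def nested_kinds(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//link/entity').map do |e|
    [e.attributes['name'], classify(e)]
  end
end

nested_kinds(NESTED_SAMPLE).each { |name, kind| puts "#{name}: #{kind}" }
```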

I’ve got a working parser written in Ruby, but I haven’t converted it to C# yet. Also, I haven’t tested it against anything that might have a more complicated schema than dwarves: mounted units, chariots, checking whether a unit is undead/daemon/greenskin to see if special rules apply (since not everything in the army is guaranteed to be), embedded assassins, magic, et al. Sadly, the only roster I’ve got at work is for dwarves, so I’ll have to dump some more output from ArmyBuilder and run the parser against it to see how it copes.

Any other niche cases either of you can think of that may have specific rules? I’m going to try to stabilize the parser and get it to properly validate every army type, then move it to .NET.

Also, thinking about it, I’m utterly convinced that snapping things to some kind of grid is the only really feasible solution. Querying the object via System.Drawing or GDI might work, but I’m not sure how accurate the pixel mapping is. At any rate, for things like the Lance Formation, line of sight on skirmishers, determining base contact for embedded champions/characters, reforming the unit, and templates, a grid seems like the only way to go without doing occlusion detection (for the templates). Convert inches to millimeters, and make it 1mm x 1mm squares or something.
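
To make the grid idea concrete, here is a small Ruby sketch under some loud assumptions: bases are axis-aligned rectangles measured in whole 1mm cells, "base contact" means two bases share a length of edge without overlapping (corner-to-corner does not count here, which may or may not match the rules), and the `Base` struct and helper names are invented for illustration. A wheeled or rotated unit would not fit this model as written.

```ruby
MM_PER_INCH = 25.4

# Convert a measurement in inches to whole 1 mm grid cells.
def inches_to_cells(inches)
  (inches * MM_PER_INCH).round
end

# An axis-aligned base on the grid: x, y is the top-left cell,
# w and h are the width and height in cells.
Base = Struct.new(:x, :y, :w, :h) do
  def right
    x + w
  end

  def bottom
    y + h
  end
end

# True when the two bases occupy at least one common cell.
def overlap?(a, b)
  a.x < b.right && b.x < a.right && a.y < b.bottom && b.y < a.bottom
end

# True when the bases touch along a vertical or horizontal edge,
# sharing some length of that edge, and do not overlap.
def base_contact?(a, b)
  return false if overlap?(a, b)
  side = (a.right == b.x || b.right == a.x) &&
         a.y < b.bottom && b.y < a.bottom
  top  = (a.bottom == b.y || b.bottom == a.y) &&
         a.x < b.right && b.x < a.right
  side || top
end

# Two 20 mm bases side by side, then the same pair with a 1 mm gap.
left  = Base.new(0, 0, 20, 20)
right = Base.new(20, 0, 20, 20)
apart = Base.new(21, 0, 20, 20)
puts base_contact?(left, right)
puts base_contact?(left, apart)
```

With everything in integer cells, checks like this come down to integer comparisons, which sidesteps the worry about how accurate pixel mapping would be.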

categories General