Commit d511b3e

better parsing for Lords Amendments
Rather than parsing everything into a single line of text, parse all the paragraphs and indents so that we retain a bit more of the structure.
1 parent 9c63f44 commit d511b3e

1 file changed, 11 additions and 6 deletions

pyscraper/new_hansard.py

Lines changed: 11 additions & 6 deletions
@@ -1663,12 +1663,17 @@ def parse_tabledby(self, tabledby):
         )
 
     def parse_amendment(self, amendment):
-        self.parse_para_with_member(
-            amendment,
-            None,
-            css_class='italic',
-            pwmotiontext='unrecognized'
-        )
+        # Amendments are often things like:
+        #
+        # <Amendment><hs_quote><B>54:</B>
+        # Clause 67, page 30, line 9, leave out “high” and insert
+        # “higher”</hs_quote></Amendment>
+        #
+        # so we need to parse the tags to make sure we get the
+        # indenting etc
+        for tag in amendment.getchildren():
+            tag_name = self.get_tag_name_no_ns(tag)
+            self.handle_tag(tag_name, tag)
 
     def parse_clause_heading(self, heading):
         tag = etree.Element('p')
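
The change above can be illustrated with a small standalone sketch. It is not the project's code: it uses the standard library's `xml.etree.ElementTree` rather than whatever `etree` module `new_hansard.py` imports. The sample XML, its namespace URI, the `hs_para` element and the helper names `tag_name_no_ns`, `flatten` and `parse_children` are all assumptions made for illustration. It only shows, roughly, why walking the child tags keeps structure that a single flattened string loses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the comment in the diff; the real feed,
# its namespace and the second element (hs_para) are assumptions.
SAMPLE = (
    '<Amendment xmlns="http://example.org/hansard">'
    '<hs_quote><B>54:</B> Clause 67, page 30, line 9, '
    'leave out "high" and insert "higher"</hs_quote>'
    '<hs_para>A second paragraph</hs_para>'
    '</Amendment>'
)


def tag_name_no_ns(tag):
    """Strip a '{namespace}' prefix from an element's tag name."""
    return tag.tag.rsplit('}', 1)[-1]


def flatten(amendment):
    """Roughly the old approach: collapse everything into one string."""
    return ' '.join(''.join(amendment.itertext()).split())


def parse_children(amendment):
    """Roughly the new approach: one (tag name, text) entry per child."""
    parsed = []
    for child in amendment:  # iterating an element yields its direct children
        text = ' '.join(''.join(child.itertext()).split())
        parsed.append((tag_name_no_ns(child), text))
    return parsed


amendment = ET.fromstring(SAMPLE)
print(flatten(amendment))
for name, text in parse_children(amendment):
    print(name, '|', text)
```

The sketch iterates over the element directly instead of calling `getchildren()`, because the standard library's `ElementTree` removed that method in Python 3.9. In the flattened output the two child elements run together with no boundary between them, whereas the per-child version keeps each one separate along with its tag name, which is the information a dispatcher such as `handle_tag` would need.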
