839: Add support for reference geometry for geo questions by lindsay-stevens · Pull Request #847 · XLSForm/pyxform

lindsay-stevens · 2026-06-17T14:35:04Z

Closes #839

Why is this the best possible solution? Were any other approaches considered?

Initial pass for feedback
Includes refactoring to clarify existing nodeset generation

What are the regression risks?

The refactoring may have missed some usage pattern that is not currently in the test suite.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Spec updates TBC getodk/xforms-spec#343 but outline in linked forum post.

Before submitting this PR, please make sure you have:

included test cases for core behavior and edge cases in tests
run python -m unittest and verified all tests pass
run ruff format pyxform tests and ruff check pyxform tests to lint code
verified that any code or assets from external sources are properly credited in comments

- range.py: - pv.validate doesn't change anything so remove assignment. - question.py already processes all attributes in control dict so put the parameters for that location in there. Still useful to have the parsed parameters available separately though. - parameters.py - change signature since validator doesn't change input. - question.py - refactor MultipleChoiceQuestion.build_xml into steps that correspond to the supported options, in a way that can be selectively reused depending on what the itemset use case supports. - simplify RangeQuestion to: - not re-apply parameter attributes already set via control dict. - use the itemset helper to try only internal choices since that seemed to be the intent of the previous code. - remove bind type check since this is set internally, or alternatively if it is user-modifiable then it should be checked earlier and/or have tests added.

- builder.py: - use the presence of just "itemset" as a way to trigger pulling in the choices info, rather than needing to attach choices as well which are ignored in that case anyway. - range.py: - use the above change to simplify the feature signal for the itemset. - question.py - use the pattern in _build_itemset_instance a fall-through default rather than trying to proactively select whether it's needed. - fold the itemset node functionality into a question subclass since that seems a bit neater than using a helper class, and also allows sharing the common __init__ code.

- for geopoint, geoshape, and geotrace, add a parameter 'reference-geo' which accepts a choices list_name, an entities list_name, a reference variable containing the name of a repeat, or the name of an attached CSV file. The parameter value is used to build an itemset node that references source data instance, optionally with a choice_filter.

lognaturel

Thanks so much! Folks have been trying out the beta implementation in Collect and are loving this functionality.

I know this isn't your favorite spec but I like how you've introduced a layer in the question hierarchy for itemset-capable question types, it makes a lot of sense and led to some useful cleanup.

Details below, the main things are:

parameter name should be reference-geometry
any external secondary instance should be supported, regardless of the filename extension
let's drop support for the special itemsets.csv if it's practical

lognaturel · 2026-06-17T19:20:28Z

    return choices
+
+
+def add_choices_info_to_question(


So far everything in validators feels much more directly validation-related. Not a big deal but wanted to mention that this file is starting to feel like it could be a choices feature slice rather than just a validator.

Moved back to xls2json in 3735635 for reasons explained there.

Separately to that yes the sub-package name "validators" is maybe not the most accurate anymore since some of the modules in there deal with processing as well as validation. But at least it breaks down the xls2json logic into digestible pieces, and in the longer term hopefully aligns to doing something like how all the entities stuff is in one package - maybe each question type gets a module (and finally spell the end of question_type_dict 😱), etc.

lognaturel · 2026-06-17T19:55:53Z

+    def __bool__(self):
+        return True
+
+    def __init__(


Consider using @dataclass?

The ItemsetNode is looking a lot like a SurveyElement but I was leaving that til the spec for this feature settles.

lognaturel · 2026-06-17T20:14:29Z

 class ParametersGeo(StrEnum):
    ALLOW_MOCK_ACCURACY = "allow-mock-accuracy"
    INCREMENTAL = "incremental"
+    REFERENCE_GEO = "reference-geo"


I forgot we had all these existing parameters that use dashes because they match the XForm convention. Typically we use underscores for new parameter names but I agree with your instinct to match the others in this context.

I forgot to update the issue -- Yaw has requested we use the full reference-geometry which does fit better with the other parameters as well.

Updated the parameter name in b31e271, and noted the preference for underscores in #592

lognaturel · 2026-06-17T20:16:43Z

    trigger_references: list[tuple[str, int]] = []
-    repeat_names = set()
+    geo_references: list[tuple[str, int]] = []
+    csv_sources: set[str] = set()


It could also be XML or geojson sources -- any named secondary instance -- whether it's internal or external or what the file format is shouldn't matter.

I thought pyxform would have a way to list these but now that I think about it we don't validate the parameter to instance calls or anything like that, do we? Can we rely on the code that outputs all the instance declarations by any chance?

Updated validation and test cases to allow all external instance file types in 7b1799b. Since it became a bit lengthy I put some follow-up questions on scope in this comment.

lognaturel · 2026-06-17T20:17:04Z

+            "[row : {row}] On the 'survey' sheet, the 'parameters' value is invalid. "
+            "For geo questions, the 'reference-geo' parameter must be either: "
+            "a choices list_name, an entities list_name, a reference variable containing "
+            "the name of a repeat, or the name of an attached CSV file."


"an attached CSV file" -> "an attached file" (could be XML or geojson)

Updated the error message along these lines in 7b1799b

lognaturel · 2026-06-17T20:35:08Z

+        | survey |
+        | | type                         | name | label | parameters           |
+        | | select_one_from_file s1.csv  | q1   | Q1    |                      |
+        | | {type}                       | q2   | Q2    | reference-geo=s1.csv |


I think it would be more consistent with other usage to use the instance name without the extension. It feels kind of similar to writing an instance('s1') expression where we don't use the extension.

It's true that there's an extension for select_one_from_file s1.csv but that feels to me like it's because we conflated attaching and querying the file. For reference-geometry the file must already be attached.

Updated the usage pattern and tests accordingly in 7b1799b

lognaturel · 2026-06-17T20:37:23Z

+                    ],
+                )
+
+    def test_with_reference_geo__select_one_external__ok(self):


I would prefer not to support this case. I think it's very unlikely someone would want it and in general we can treat the DB-backed CSV querying features as legacy.

In 7b1799b I removed the special case for this from validate_parameter_reference_geometry, although the tests show that the select_one_external / external_choices approach doesn't work on it's own anyway - like with entities it needs a csv-external to trigger the auto-generated secondary instance, which allows the nodeset to reference a file named itemsets.csv.

Unless a regression has snuck in somewhere, the effect of select_one_external / external_choices seems to be just to emit an input control with an instance() expression in a query attribute - and then presumably Collect does some magic to auto-wire that to itemsets.csv without a secondary instance.

I'm not sure it's OK to block the select_one_external / external_choices usage pattern entirely because a user might have a file called itemsets.csv that they created independently. In other words we could say in an error "you cannot use the secondary instance name itemsets with reference-geometry" or "you cannot (do that) if there is an external_choices sheet", which seems potentially noisy?

- per updated requirements

- previously just allowed csv but the updated requirements are to support any named secondary instance, and to do so using the instance name rather than the file name. - there are also the possibilities of instances generated for pulldata calls and ${last-saved#q} references, but that parsing code and logic is in survey.py which would require a larger refactor so it is excluded from this commit. - test_geo.py: - tightened "extra_q_assertions" to check for the parameter name as an attribute, since that is really the intent of that assertion, and having a count there might conflict with future xform changes.

- moved out mainly to support importing from other modules that xls2json depends on, but it turned out that it was not needed. In particular, as noted in the function body, it supports various legacy scenarios which are not relevant to newer features such as itemset referencing in range or geo parameters.

lindsay-stevens added 6 commits June 16, 2026 16:39

chg: organise choices and range code into validators module

c9fc5c8

chg: organise range and geo code into validators modules

f80d7c4

chg: combine range xpath assertions, add translation test case

aa7ab86

lindsay-stevens force-pushed the pyxform-839 branch from a913442 to 6ed0215 Compare June 17, 2026 14:55

lognaturel reviewed Jun 17, 2026

View reviewed changes

lindsay-stevens added 3 commits June 18, 2026 14:01

chg: rename reference-geo parameter to reference-geometry

b31e271

- per updated requirements

This was referenced Jun 18, 2026

Create a framework for adding parameters #592

Open

Add support for reference geometry for geo questions #839

Open

Conversation

lindsay-stevens commented Jun 17, 2026

Why is this the best possible solution? Were any other approaches considered?

What are the regression risks?

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Before submitting this PR, please make sure you have:

Uh oh!

lognaturel left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lognaturel left a comment •

edited

Loading