839: Add support for reference geometry for geo questions#847
839: Add support for reference geometry for geo questions#847lindsay-stevens wants to merge 9 commits into
Conversation
- range.py:
- pv.validate doesn't change anything so remove assignment.
- question.py already processes all attributes in control dict so
put the parameters for that location in there. Still useful to have
the parsed parameters available separately though.
- parameters.py
- change signature since validator doesn't change input.
- question.py
- refactor MultipleChoiceQuestion.build_xml into steps that correspond
to the supported options, in a way that can be selectively reused
depending on what the itemset use case supports.
- simplify RangeQuestion to:
- not re-apply parameter attributes already set via control dict.
- use the itemset helper to try only internal choices since that
seemed to be the intent of the previous code.
- remove bind type check since this is set internally, or
alternatively if it is user-modifiable then it should be checked
earlier and/or have tests added.
- builder.py:
- use the presence of just "itemset" as a way to trigger pulling in
the choices info, rather than needing to attach choices as well
which are ignored in that case anyway.
- range.py:
- use the above change to simplify the feature signal for the itemset.
- question.py
- use the pattern in _build_itemset_instance a fall-through default
rather than trying to proactively select whether it's needed.
- fold the itemset node functionality into a question subclass since
that seems a bit neater than using a helper class, and also allows
sharing the common __init__ code.
- for geopoint, geoshape, and geotrace, add a parameter 'reference-geo' which accepts a choices list_name, an entities list_name, a reference variable containing the name of a repeat, or the name of an attached CSV file. The parameter value is used to build an itemset node that references source data instance, optionally with a choice_filter.
a913442 to
6ed0215
Compare
There was a problem hiding this comment.
Thanks so much! Folks have been trying out the beta implementation in Collect and are loving this functionality.
I know this isn't your favorite spec but I like how you've introduced a layer in the question hierarchy for itemset-capable question types, it makes a lot of sense and led to some useful cleanup.
Details below, the main things are:
- parameter name should be
reference-geometry - any external secondary instance should be supported, regardless of the filename extension
- let's drop support for the special itemsets.csv if it's practical
| return choices | ||
|
|
||
|
|
||
| def add_choices_info_to_question( |
There was a problem hiding this comment.
So far everything in validators feels much more directly validation-related. Not a big deal but wanted to mention that this file is starting to feel like it could be a choices feature slice rather than just a validator.
There was a problem hiding this comment.
Moved back to xls2json in 3735635 for reasons explained there.
Separately to that yes the sub-package name "validators" is maybe not the most accurate anymore since some of the modules in there deal with processing as well as validation. But at least it breaks down the xls2json logic into digestible pieces, and in the longer term hopefully aligns to doing something like how all the entities stuff is in one package - maybe each question type gets a module (and finally spell the end of question_type_dict 😱), etc.
| def __bool__(self): | ||
| return True | ||
|
|
||
| def __init__( |
There was a problem hiding this comment.
Consider using @dataclass?
There was a problem hiding this comment.
The ItemsetNode is looking a lot like a SurveyElement but I was leaving that til the spec for this feature settles.
| class ParametersGeo(StrEnum): | ||
| ALLOW_MOCK_ACCURACY = "allow-mock-accuracy" | ||
| INCREMENTAL = "incremental" | ||
| REFERENCE_GEO = "reference-geo" |
There was a problem hiding this comment.
I forgot we had all these existing parameters that use dashes because they match the XForm convention. Typically we use underscores for new parameter names but I agree with your instinct to match the others in this context.
I forgot to update the issue -- Yaw has requested we use the full reference-geometry which does fit better with the other parameters as well.
| trigger_references: list[tuple[str, int]] = [] | ||
| repeat_names = set() | ||
| geo_references: list[tuple[str, int]] = [] | ||
| csv_sources: set[str] = set() |
There was a problem hiding this comment.
It could also be XML or geojson sources -- any named secondary instance -- whether it's internal or external or what the file format is shouldn't matter.
I thought pyxform would have a way to list these but now that I think about it we don't validate the parameter to instance calls or anything like that, do we? Can we rely on the code that outputs all the instance declarations by any chance?
There was a problem hiding this comment.
Updated validation and test cases to allow all external instance file types in 7b1799b. Since it became a bit lengthy I put some follow-up questions on scope in this comment.
| "[row : {row}] On the 'survey' sheet, the 'parameters' value is invalid. " | ||
| "For geo questions, the 'reference-geo' parameter must be either: " | ||
| "a choices list_name, an entities list_name, a reference variable containing " | ||
| "the name of a repeat, or the name of an attached CSV file." |
There was a problem hiding this comment.
"an attached CSV file" -> "an attached file" (could be XML or geojson)
There was a problem hiding this comment.
Updated the error message along these lines in 7b1799b
| | survey | | ||
| | | type | name | label | parameters | | ||
| | | select_one_from_file s1.csv | q1 | Q1 | | | ||
| | | {type} | q2 | Q2 | reference-geo=s1.csv | |
There was a problem hiding this comment.
I think it would be more consistent with other usage to use the instance name without the extension. It feels kind of similar to writing an instance('s1') expression where we don't use the extension.
It's true that there's an extension for select_one_from_file s1.csv but that feels to me like it's because we conflated attaching and querying the file. For reference-geometry the file must already be attached.
There was a problem hiding this comment.
Updated the usage pattern and tests accordingly in 7b1799b
| ], | ||
| ) | ||
|
|
||
| def test_with_reference_geo__select_one_external__ok(self): |
There was a problem hiding this comment.
I would prefer not to support this case. I think it's very unlikely someone would want it and in general we can treat the DB-backed CSV querying features as legacy.
There was a problem hiding this comment.
In 7b1799b I removed the special case for this from validate_parameter_reference_geometry, although the tests show that the select_one_external / external_choices approach doesn't work on it's own anyway - like with entities it needs a csv-external to trigger the auto-generated secondary instance, which allows the nodeset to reference a file named itemsets.csv.
Unless a regression has snuck in somewhere, the effect of select_one_external / external_choices seems to be just to emit an input control with an instance() expression in a query attribute - and then presumably Collect does some magic to auto-wire that to itemsets.csv without a secondary instance.
I'm not sure it's OK to block the select_one_external / external_choices usage pattern entirely because a user might have a file called itemsets.csv that they created independently. In other words we could say in an error "you cannot use the secondary instance name itemsets with reference-geometry" or "you cannot (do that) if there is an external_choices sheet", which seems potentially noisy?
- per updated requirements
- previously just allowed csv but the updated requirements are to
support any named secondary instance, and to do so using the instance
name rather than the file name.
- there are also the possibilities of instances generated for
pulldata calls and ${last-saved#q} references, but that parsing code
and logic is in survey.py which would require a larger refactor so
it is excluded from this commit.
- test_geo.py:
- tightened "extra_q_assertions" to check for the parameter name as an
attribute, since that is really the intent of that assertion, and
having a count there might conflict with future xform changes.
- moved out mainly to support importing from other modules that xls2json depends on, but it turned out that it was not needed. In particular, as noted in the function body, it supports various legacy scenarios which are not relevant to newer features such as itemset referencing in range or geo parameters.
Closes #839
Why is this the best possible solution? Were any other approaches considered?
What are the regression risks?
The refactoring may have missed some usage pattern that is not currently in the test suite.
Does this change require updates to documentation? If so, please file an issue here and include the link below.
Spec updates TBC getodk/xforms-spec#343 but outline in linked forum post.
Before submitting this PR, please make sure you have:
testspython -m unittestand verified all tests passruff format pyxform testsandruff check pyxform teststo lint code