Skip to content

839: Add support for reference geometry for geo questions#847

Open
lindsay-stevens wants to merge 9 commits into
XLSForm:masterfrom
lindsay-stevens:pyxform-839
Open

839: Add support for reference geometry for geo questions#847
lindsay-stevens wants to merge 9 commits into
XLSForm:masterfrom
lindsay-stevens:pyxform-839

Conversation

@lindsay-stevens

Copy link
Copy Markdown
Contributor

Closes #839

Why is this the best possible solution? Were any other approaches considered?

  • Initial pass for feedback
  • Includes refactoring to clarify existing nodeset generation

What are the regression risks?

The refactoring may have missed some usage pattern that is not currently in the test suite.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Spec updates TBC getodk/xforms-spec#343 but outline in linked forum post.

Before submitting this PR, please make sure you have:

  • included test cases for core behavior and edge cases in tests
  • run python -m unittest and verified all tests pass
  • run ruff format pyxform tests and ruff check pyxform tests to lint code
  • verified that any code or assets from external sources are properly credited in comments

- range.py:
  - pv.validate doesn't change anything so remove assignment.
  - question.py already processes all attributes in control dict so
    put the parameters for that location in there. Still useful to have
    the parsed parameters available separately though.
- parameters.py
  - change signature since validator doesn't change input.
- question.py
  - refactor MultipleChoiceQuestion.build_xml into steps that correspond
    to the supported options, in a way that can be selectively reused
    depending on what the itemset use case supports.
  - simplify RangeQuestion to:
    - not re-apply parameter attributes already set via control dict.
    - use the itemset helper to try only internal choices since that
      seemed to be the intent of the previous code.
    - remove bind type check since this is set internally, or
      alternatively if it is user-modifiable then it should be checked
      earlier and/or have tests added.
- builder.py:
  - use the presence of just "itemset" as a way to trigger pulling in
    the choices info, rather than needing to attach choices as well
    which are ignored in that case anyway.
- range.py:
  - use the above change to simplify the feature signal for the itemset.
- question.py
  - use the pattern in _build_itemset_instance a fall-through default
    rather than trying to proactively select whether it's needed.
  - fold the itemset node functionality into a question subclass since
    that seems a bit neater than using a helper class, and also allows
    sharing the common __init__ code.
- for geopoint, geoshape, and geotrace, add a parameter 'reference-geo'
  which accepts a choices list_name, an entities list_name, a reference
  variable containing the name of a repeat, or the name of an attached
  CSV file. The parameter value is used to build an itemset node that
  references source data instance, optionally with a choice_filter.

@lognaturel lognaturel left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much! Folks have been trying out the beta implementation in Collect and are loving this functionality.

I know this isn't your favorite spec but I like how you've introduced a layer in the question hierarchy for itemset-capable question types, it makes a lot of sense and led to some useful cleanup.

Details below, the main things are:

  • parameter name should be reference-geometry
  • any external secondary instance should be supported, regardless of the filename extension
  • let's drop support for the special itemsets.csv if it's practical

Comment thread pyxform/validators/pyxform/choices.py Outdated
return choices


def add_choices_info_to_question(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far everything in validators feels much more directly validation-related. Not a big deal but wanted to mention that this file is starting to feel like it could be a choices feature slice rather than just a validator.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved back to xls2json in 3735635 for reasons explained there.

Separately to that yes the sub-package name "validators" is maybe not the most accurate anymore since some of the modules in there deal with processing as well as validation. But at least it breaks down the xls2json logic into digestible pieces, and in the longer term hopefully aligns to doing something like how all the entities stuff is in one package - maybe each question type gets a module (and finally spell the end of question_type_dict 😱), etc.

Comment thread pyxform/question.py
def __bool__(self):
return True

def __init__(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using @dataclass?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ItemsetNode is looking a lot like a SurveyElement but I was leaving that til the spec for this feature settles.

Comment thread pyxform/constants.py Outdated
class ParametersGeo(StrEnum):
ALLOW_MOCK_ACCURACY = "allow-mock-accuracy"
INCREMENTAL = "incremental"
REFERENCE_GEO = "reference-geo"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot we had all these existing parameters that use dashes because they match the XForm convention. Typically we use underscores for new parameter names but I agree with your instinct to match the others in this context.

I forgot to update the issue -- Yaw has requested we use the full reference-geometry which does fit better with the other parameters as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the parameter name in b31e271, and noted the preference for underscores in #592

Comment thread pyxform/xls2json.py Outdated
trigger_references: list[tuple[str, int]] = []
repeat_names = set()
geo_references: list[tuple[str, int]] = []
csv_sources: set[str] = set()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could also be XML or geojson sources -- any named secondary instance -- whether it's internal or external or what the file format is shouldn't matter.

I thought pyxform would have a way to list these but now that I think about it we don't validate the parameter to instance calls or anything like that, do we? Can we rely on the code that outputs all the instance declarations by any chance?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated validation and test cases to allow all external instance file types in 7b1799b. Since it became a bit lengthy I put some follow-up questions on scope in this comment.

Comment thread pyxform/errors.py Outdated
"[row : {row}] On the 'survey' sheet, the 'parameters' value is invalid. "
"For geo questions, the 'reference-geo' parameter must be either: "
"a choices list_name, an entities list_name, a reference variable containing "
"the name of a repeat, or the name of an attached CSV file."

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"an attached CSV file" -> "an attached file" (could be XML or geojson)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the error message along these lines in 7b1799b

Comment thread tests/test_geo.py Outdated
| survey |
| | type | name | label | parameters |
| | select_one_from_file s1.csv | q1 | Q1 | |
| | {type} | q2 | Q2 | reference-geo=s1.csv |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more consistent with other usage to use the instance name without the extension. It feels kind of similar to writing an instance('s1') expression where we don't use the extension.

It's true that there's an extension for select_one_from_file s1.csv but that feels to me like it's because we conflated attaching and querying the file. For reference-geometry the file must already be attached.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the usage pattern and tests accordingly in 7b1799b

Comment thread tests/test_geo.py Outdated
],
)

def test_with_reference_geo__select_one_external__ok(self):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not to support this case. I think it's very unlikely someone would want it and in general we can treat the DB-backed CSV querying features as legacy.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 7b1799b I removed the special case for this from validate_parameter_reference_geometry, although the tests show that the select_one_external / external_choices approach doesn't work on it's own anyway - like with entities it needs a csv-external to trigger the auto-generated secondary instance, which allows the nodeset to reference a file named itemsets.csv.

Unless a regression has snuck in somewhere, the effect of select_one_external / external_choices seems to be just to emit an input control with an instance() expression in a query attribute - and then presumably Collect does some magic to auto-wire that to itemsets.csv without a secondary instance.

I'm not sure it's OK to block the select_one_external / external_choices usage pattern entirely because a user might have a file called itemsets.csv that they created independently. In other words we could say in an error "you cannot use the secondary instance name itemsets with reference-geometry" or "you cannot (do that) if there is an external_choices sheet", which seems potentially noisy?

- previously just allowed csv but the updated requirements are to
  support any named secondary instance, and to do so using the instance
  name rather than the file name.
  - there are also the possibilities of instances generated for
    pulldata calls and ${last-saved#q} references, but that parsing code
    and logic is in survey.py which would require a larger refactor so
    it is excluded from this commit.
- test_geo.py:
  - tightened "extra_q_assertions" to check for the parameter name as an
    attribute, since that is really the intent of that assertion, and
    having a count there might conflict with future xform changes.
- moved out mainly to support importing from other modules that xls2json
  depends on, but it turned out that it was not needed. In particular,
  as noted in the function body, it supports various legacy scenarios
  which are not relevant to newer features such as itemset referencing
  in range or geo parameters.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for reference geometry for geo questions

2 participants