Flow interoperability, Frictionless Data open source projects

@marc shared this Flow Interop RFA from Digital Square, and it was interesting to read about a couple of the open source projects referenced at the related flowinterop.org site. According to that site:

The Flow Content standard supports the sharing of communication “Flows”, including content and processing logic. The standard supports common functionality that can be implemented by all Flow-based tools for authoring and running flows across diverse channels: voice calls, text channels, social networks with rich multimedia, and offline-first mobile apps. The standard also allows systems to add their own custom functionality on top of general, interoperable building blocks… also flexibly handle established legacy data collection applications, such as XForms / Open Data Kit questionnaires. They are sector-agnostic and work in conjunction with, rather than as a replacement for, sector-specific data standards such as FHIR in digital health, SCORM in remote education, etc.

Part of this project is about standards that make it easier to build and share flow definitions across software applications (e.g., from Viamo to RapidPro). Another part, which I find even more interesting, is the idea that these standards would make it easier to share the data generated through specific flow interactions. They're working with the Frictionless Data spec and related tooling to make data generated from flows easier to analyze.
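To make the Frictionless Data idea concrete, flow interaction data could be described with a Data Package descriptor: a JSON document listing each data resource and its table schema. Here is a minimal sketch; the resource path and field names are invented for illustration, not taken from the flow interop spec:

```python
import json

# A minimal Frictionless-style Data Package descriptor for flow results.
# The resource path and field names below are hypothetical examples.
flow_results_package = {
    "name": "example-flow-results",
    "resources": [
        {
            "name": "responses",
            "path": "responses.csv",
            "schema": {
                "fields": [
                    {"name": "contact_id", "type": "string"},
                    {"name": "flow_id", "type": "string"},
                    {"name": "question", "type": "string"},
                    {"name": "response", "type": "string"},
                    {"name": "timestamp", "type": "datetime"},
                ]
            },
        }
    ],
}

print(json.dumps(flow_results_package, indent=2))
```

The appeal of this approach is that any tool that understands the descriptor can load and validate the data without knowing which flow engine produced it.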

Broadly, this seems relevant to systems like the 2 Way Texting and COVID education projects, which use CHT-Core for the apps health workers interact with in a web browser or smartphone, and RapidPro to support messaging with individual patients or health workers. The latest RapidPro-CHT integration docs can be found here. Accessing and analyzing data from integrated RapidPro + CHT systems has been a challenge in this work. The CHT community seems to be looking for better technical approaches on this front, and the flow interop project might help.

I’ll be following the flow interop project to see what we can learn from it. I’d be curious to hear if others in the CHT community are also following this work. If so, what do you think about its relevance to the CHT community? How does the priority of this kind of standards/interoperability effort compare to the other product priorities that live CHT deployments are facing?


@lucas @erika as I read more about the Frictionless Data project, I also came across goodtables.io, an open source "data validation as a service" project. It might be worth looking into with collaborators from DataKind to see whether we can learn from it or use it in the context of the Data Confidence initiative.


Thanks for flagging this! goodtables.io seems really useful for validating data files, and the data quality checks it supports could be a source of ideas for our data confidence initiative. The Python API could also be useful for validating data as we query it for miscellaneous analyses.


We used goodtables when building the matching tool at DSaPP. It’s designed primarily for testing data at ingestion: the data have to be loaded into Python and available in RAM. It would be a great tool for the purpose Lucas suggests (testing data being ingested into modeling pipelines, etc.) and also for testing data being uploaded into a repository before attempting to write it to a database table. For testing data resting in or moving through the database, it’s probably quite a bit slower than tools like dbt that can run checks inside the database without extracting the data.
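To illustrate the ingestion-time idea, here is a small pure-Python sketch (using only the standard library, not the goodtables API itself) of the kind of in-memory checks such a tool runs over rows before they are written to a database table. The schema, column names, and sample rows are all invented for the example:

```python
import csv
import io

# Hypothetical schema: column name -> validator function.
SCHEMA = {
    "patient_id": lambda v: v.strip() != "",
    "age": lambda v: v.isdigit() and 0 <= int(v) <= 120,
    "sms_status": lambda v: v in {"sent", "delivered", "failed"},
}

def validate_rows(csv_text):
    """Return a list of (row_number, column, bad_value) errors."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader, start=2):  # row 1 is the header
        for column, is_valid in SCHEMA.items():
            value = row.get(column, "")
            if not is_valid(value):
                errors.append((i, column, value))
    return errors

sample = (
    "patient_id,age,sms_status\n"
    "p-001,34,delivered\n"
    "p-002,abc,sent\n"
    ",29,unknown\n"
)
for err in validate_rows(sample):
    print(err)  # e.g. (3, 'age', 'abc')
```

Because everything happens in RAM before any database write, this style of check fits ingestion pipelines well; for data already in the database, pushing the checks into SQL (as dbt does) avoids the extraction cost.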

1 Like

That totally makes sense, it’s good to learn something new every day! :smile: