At my last few companies, we've run into problems with engineers changing something on the site and causing inaccurate data in our downstream analytics tools. This wasn't an obvious problem. We have a user id and an account id internally, and the account id is the one we track on Segment (and connect to SFDC for attribution). I can't really blame the engineer for mixing them up, since the naming wasn't clear at all and this seems hard to verify automatically. The mistake cost a lot of time and confusion (and lost $$$, since our attribution models depend on this data).
What are the best ways to make sure this data is sent properly and matches across all the places our data is stored?
For just this example, the user id and account id look similar (both are numbers), and we use Segment Protocols to require that the id is set. However, this isn't enough for me to really feel safe, since we occasionally need to run tools outside of Segment when its API doesn't support them.
What I really wish is that I could easily know whether ids are in the correct format (and match SFDC/hull.io/etc.), know that we haven't stopped sending data to certain places, and get alerted if things break. I can recover from a day of missing data, but usually these problems take weeks or even months to notice and then resolve.
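To make that wish concrete, here is a rough Python sketch of the two checks I have in mind. Everything in it is hypothetical: the function names, the `account_id` event field, and the idea that I can export the known account ids from SFDC and a "last event received" timestamp per destination are all assumptions, not anything Segment or SFDC provides out of the box.

```python
from datetime import datetime, timedelta, timezone


def find_id_mismatches(events, known_account_ids):
    """Return events whose account_id is missing, non-numeric, or not a known account.

    `known_account_ids` is assumed to be a set of id strings exported from the CRM.
    A format check alone cannot tell a user id from an account id (both are numbers),
    so the membership check is what catches a swapped id.
    """
    bad = []
    for event in events:
        account_id = str(event.get("account_id", ""))
        if not account_id.isdigit() or account_id not in known_account_ids:
            bad.append(event)
    return bad


def find_stale_destinations(last_seen, now, max_silence=timedelta(days=1)):
    """Return names of destinations that have received nothing within `max_silence`.

    `last_seen` is assumed to map a destination name to the timestamp of the
    most recent event it received.
    """
    return sorted(name for name, ts in last_seen.items() if now - ts > max_silence)


if __name__ == "__main__":
    known = {"1001", "1002"}
    events = [
        {"event": "Signed Up", "account_id": 1001},
        {"event": "Signed Up", "account_id": 77},  # e.g. a user id sent by mistake
    ]
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    last_seen = {
        "sfdc": now - timedelta(hours=2),
        "hull": now - timedelta(days=3),
    }
    print(find_id_mismatches(events, known))
    print(find_stale_destinations(last_seen, now))
```

Something like this could run on a daily schedule and alert whenever either list is non-empty, which would get detection down to about a day instead of weeks. It only works if an unknown id really does mean a mistake, though, so brand-new accounts that haven't synced to the CRM yet would need some grace period.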
Am I crazy? Do other people run into these problems? How do you solve this? What's the best way to make sure that my data is consistent across platforms and being sent as expected?