On the ethics of using analytics

on captainepoch's log

DISCLAIMER: How analytics can be used for good is based on my personal observations in the real world and in the projects I worked on where I had to add them to the software.

If you think this can be improved or want to start a conversation about it, send an email to my public inbox. You have the link down below!

The first thing I did when I started working as a developer was to fix some analytics that had been badly implemented in the project (meaning the tracking was not accurate). I fixed them, and then I started doing more interesting stuff.

Later on I had to go over analytics in another project, and then in another, and then help a couple of workmates add them to a project I built from scratch. You can guess what we were using… Of course, Google’s mighty Firebase.

Then I looked into Free Software/Open Source projects to see if they were using analytics, and no, they were not. But a few of them use ACRA to have a live error reporting system inside the application (e.g. F-Droid, by email). Most of those projects rely on users reporting issues, but normally the descriptions of those issues are so vague that developers do not know what to do.

Having analytics in a project

I do not think that having analytics in a project is a bad thing. You cannot always rely on users to report bugs or unexpected crashes with full steps on how to reproduce them. Of course, there are a lot of users willing to do so, but they are not the majority.

The problem is how the data is collected and used, and that it is not deleted once it has fulfilled its purpose. Most companies do not care about this: the data is there, and it is not normally deleted after being used.

For example, Firebase is an excellent tool. It provides real insight when you need to know how a user uses your application. But it is made by Google, with all the implications that carries.

Knowing what platforms really do

We do not really know what all the companies that provide analytics do with our data. You can read a ton of Privacy Policies and Terms of Use, but in the end, we do not really know.

The only way to know how the data is stored, used and deleted is if you self-host your own instance of this kind of software.

Privacy first

You should keep in mind that privacy comes first in anything related to this topic. I cover a couple of privacy-related points for users in the next section.

Analytics used for good

For development purposes, analytics can be really insightful. You can then release the production version of your software without them, or with them disabled by default (if you can pay for a server and scale the backend to suit your needs to allow opting in).

There are a few points that I think could make analytics more ethical than they are perceived to be right now:

  • Opt-in, with analytics disabled by default.

    Let the user choose what to do in that matter. Or remove analytics from the project.

    Or, if you can, make two versions: one with analytics (opted-out by default) and another without anything related to analytics at all.

  • Randomize UUIDs

    When you are implementing analytics, one of the things you have to do is assign a UUID to a user. It can be random, or you can build it manually from your own parameters. You should randomize it in each version of your software, so the same user does not keep the same UUID across versions.

  • Use a backend that is Free Software/Open Source software.

    There is great software that can be used for this purpose. You have to self-host it, or you might want to look at its pricing plans instead:

    • Matomo: a web application that is mostly used for websites, but it can be used from a lot of programming languages. Its main use is as a substitute for Firebase Analytics, but it can also catch exceptions in Java software (for example, it can be used on Android like Firebase Analytics and Firebase Crashlytics).

    • ACRA: an error reporting tool for Android. It catches exceptions that are not handled by a try-catch block and allows you to report them by email, by sending a POST request to a website, and through a couple more options.

    • Sentry: real-time monitoring software for your applications. You can see errors and where they are produced, so you can fix them easily. It is important to point out that Sentry is not Open Source software, but its code is publicly available in its GitHub repository.

  • Specify how you use the collected data.

    Write a document where you state your intentions with the data, how you collect it, what you collect, and when you delete the old data you do not need anymore.

  • Always delete old data

    Anything that can identify a user must be removed when it is no longer useful or after a fixed period, e.g. 90 days.

  • Put the source code of the project in a public repository

    Whether you prefer to use SourceHut or GitHub, put the code in the public eye. Be clear about which classes/methods/functions/whatever-they-are-called-in-your-preferred-language are used for analytics purposes.

    In a way, having your source code in the open also allows users to see any changes you make in that part of the code.
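
To make three of the points above more concrete (opt-in, a random UUID per version, and deleting old data), here is a minimal sketch in plain Java. Everything in it is an assumption made for illustration: the class AnalyticsSketch, its methods and the in-memory list do not come from Firebase, Matomo, ACRA or Sentry, and a real project would persist the opt-in choice and send events to its own backend.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: these names are illustrative and do not belong to
// Firebase, Matomo, ACRA, Sentry or any other real library.
final class AnalyticsSketch {

    // Assumed retention window, taken from the "90 days" example in the text.
    static final Duration RETENTION = Duration.ofDays(90);

    // One stored event: who (random ID), what, and when.
    static final class Event {
        final String userId;
        final String name;
        final Instant recordedAt;

        Event(String userId, String name, Instant recordedAt) {
            this.userId = userId;
            this.name = name;
            this.recordedAt = recordedAt;
        }
    }

    // Analytics stay off until the user explicitly opts in.
    private boolean optedIn = false;
    // One random UUID per app version, so a user gets a new ID in each version.
    private final Map<String, String> idByVersion = new HashMap<>();
    // In-memory stand-in for whatever storage the backend would use.
    private final List<Event> events = new ArrayList<>();

    void setOptedIn(boolean optedIn) {
        this.optedIn = optedIn;
    }

    // Returns the random ID for this version, creating it on first use.
    String idForVersion(String appVersion) {
        return idByVersion.computeIfAbsent(appVersion, v -> UUID.randomUUID().toString());
    }

    // Records an event only when the user has opted in; returns whether it was stored.
    boolean track(String appVersion, String name, Instant now) {
        if (!optedIn) {
            return false;
        }
        events.add(new Event(idForVersion(appVersion), name, now));
        return true;
    }

    // Deletes every event older than the retention window; returns how many were removed.
    int purgeExpired(Instant now) {
        Instant cutoff = now.minus(RETENTION);
        int before = events.size();
        events.removeIf(e -> e.recordedAt.isBefore(cutoff));
        return before - events.size();
    }

    int storedEvents() {
        return events.size();
    }

    public static void main(String[] args) {
        AnalyticsSketch analytics = new AnalyticsSketch();
        Instant start = Instant.parse("2024-01-01T00:00:00Z");

        // Before opting in, nothing is stored.
        System.out.println(analytics.track("1.0", "app_open", start));
        analytics.setOptedIn(true);
        System.out.println(analytics.track("1.0", "app_open", start));
        // 91 days later the event is past the retention window and gets purged.
        System.out.println(analytics.purgeExpired(start.plus(Duration.ofDays(91))));
    }
}
```

The design choice here is that the safe behaviour is the default one: if nobody calls setOptedIn(true), track does nothing, and purgeExpired can be run on a schedule without knowing anything about who the users are.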

Conclusion

This is what I learned after working as an Android developer for almost two years and seeing how analytics were implemented in the projects I worked on.

I really think, if correctly used, analytics can be a good thing, but we have to be very careful about how we use them and not get addicted to them. Software also needs careful implementations and testing.


UPDATE: as pointed out by the user icy in this comment, Sentry was not really Open Source software at the time this post was written. I updated the reference accordingly. Thank you very much!