Sometimes you have to write ad-hoc code to make programs run in a variety of environments, and over time this can lead to a tragicomic situation in which ad-hoc features are built on top of other ad-hoc features. This kind of problem is easy to identify, but in many cases nobody can fix it. I want to share my own story here because I have had exactly that experience.
I'm writing a linker called lld as part of my work. A linker is a program that concatenates compiler-generated binary files to create a final executable or DLL. I suspect many people don't even know that linkers exist, but a linker always runs at the end of every build to generate the final output. lld is becoming popular mainly because it is a few times faster than other linkers, which shortens overall build times. Some operating systems, including FreeBSD, are trying to switch to lld. Some large-scale programs, such as Chromium and Firefox, are trying to switch to lld on their own, too.
For individual programs, compatibility issues are not much of a problem because we can fix either the linker or the target program. Harder compatibility issues tend to arise when lld is adopted as part of the standard build system of an operating system, which includes numerous, wide-ranging programs. The issue described here occurred in FreeBSD.
If you've ever built a program on Unix, you have probably run a "./configure" script. Since Unix comes in various flavors such as Linux, FreeBSD, and macOS, many programs ship with a script that gathers information about the system environment and creates a build file, so that the subsequent "make" command can build the program accordingly. The script checks, for example, whether the "strnlen" function is available in the build environment by creating a source file that calls that function and trying to compile it.
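To make that kind of check concrete, here is a minimal sketch of the idea in Python. It is not what a real configure script does line by line; the helper name can_compile_and_link, the probe source, and the choice of "cc" as the compiler command are all assumptions made for illustration.

```python
import os
import subprocess
import tempfile

# Hypothetical probe program: it only needs to call the function under test.
# argv[0] is used so the compiler cannot fold the call into a constant.
STRNLEN_PROBE = """
#include <string.h>
int main(int argc, char **argv) {
    (void)argc;
    return (int)strnlen(argv[0], 8);
}
"""


def can_compile_and_link(source: str, compiler: str = "cc") -> bool:
    """Return True if `source` compiles and links with the given compiler.

    The compiler name "cc" is an assumption; a missing or hanging
    compiler is treated the same as a failed check.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        exe = os.path.join(tmp, "probe")
        with open(src, "w") as f:
            f.write(source)
        try:
            result = subprocess.run(
                [compiler, src, "-o", exe],
                capture_output=True,
                timeout=60,
            )
        except (OSError, subprocess.TimeoutExpired):
            return False
        # A zero exit status means both compilation and linking succeeded.
        return result.returncode == 0


# The answer depends on the machine this runs on, which is the whole point.
print("strnlen available:", can_compile_and_link(STRNLEN_PROBE))
```

The important property is that the check asks the environment directly whether something works, instead of guessing from the name of the operating system or the toolchain.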
The problem that the people working on FreeBSD found was that when they ran configure in an environment in which lld is installed as the standard linker, configure identified lld as an ancient Unix linker from 30 years ago. Further investigation revealed that the configure script runs the linker in the background with the --help option, and treats it as a modern linker only if the displayed help message contains "GNU" or "with BFD". What this means is that only GNU linkers are considered modern in that environment, and all other linkers are considered terribly outdated.
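The check described above boils down to a substring search. The following is a rough sketch of that logic in Python, not the actual shell code generated by autotools; the function names and the second sample help text are made up for illustration, and only the two marker strings and the --help invocation come from what we observed.

```python
import subprocess

# Substrings that, as described above, make configure treat a linker as modern.
MODERN_MARKERS = ("GNU", "with BFD")


def looks_modern(help_text: str) -> bool:
    """Return True if the help text contains any of the marker substrings."""
    return any(marker in help_text for marker in MODERN_MARKERS)


def probe_linker(linker: str = "ld") -> bool:
    """Run `<linker> --help` and apply the substring check to its output.

    The default command name "ld" is an assumption; a missing or
    unresponsive linker is simply reported as not modern.
    """
    try:
        result = subprocess.run(
            [linker, "--help"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return looks_modern(result.stdout + result.stderr)


# Made-up help texts, for illustration only.
gnu_style_help = "GNU ld 2.23\nUsage: ld [options] file..."
other_help = "USAGE: some-linker [options] file..."

print(looks_modern(gnu_style_help))  # True
print(looks_modern(other_help))      # False
```

Note that nothing in this check measures what the linker can actually do. It only tests what the linker says about itself.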
This problem is a bit troubling. GNU linkers are fine because they contain something like "GNU ld 2.23" in their help message, but since we have nothing to do with the GNU project, our help message naturally does not contain "GNU".
There were two possible solutions. One was to fix the configure script. However, configure scripts are generated by a set of tools named autotools, and the last release of autotools was a few years back, so even if we fixed autotools, we could hardly expect an improved version to be released soon and become widely used in the near future. Also, since we cannot update the configure scripts that have already been generated and distributed as part of other programs, it would take many years for the problem to be resolved even if we improved autoconf. Therefore, even though this may have been the "right" solution, it was not realistic.
The solution we ended up choosing was to add the string "compatible with GNU linkers" to our linker's help message. This string is not too odd for humans to understand, and since it contains the string "GNU", it is also friendly to configure. It is not a beautiful solution. It supports the erroneous assumption rather than correcting it. But it was practical.
When I was fixing the problem, I was thinking about the User-Agent string of web browsers. HTTP requests sent by browsers contain a browser identification string in the User-Agent field. History has repeated itself here: every time some browser made improvements and started using a new name in User-Agent, other browsers would catch up by adding the same string to their own User-Agent strings. As a result, all browsers now identify themselves as "Mozilla/5.0". Because a myriad of websites assumed that non-Mozilla/5.0 requests were coming from ancient browsers from the '90s and sent back very shabby pages, browsers had no choice but to pretend to be Mozilla/5.0.
Both the problem we faced and the solution we adopted were the same as in the User-Agent problem of web browsers. If you write a program that has to deal with other programs that have already spread around the world, these types of compatibility issues tend to arise, and perhaps nobody is able to solve them cleanly. As a result, browsers still include "Mozilla/5.0" in every request, which is almost pointless now, and our linker prints out a slightly strange string in its help message. I find this situation sad and, at the same time, a bit funny. I think this kind of workaround is an inevitable part of real-world software engineering.
Rui Ueyama — December 2017