Tutorial :Why do languages like Java use hierarchical package names, while Python does not?



Question:

I haven't done enterprise work in Java, but I often see the reverse-domain-name package naming convention. For example, for a Stack Overflow Java package you'd put your code underneath package com.stackoverflow.

I just ran across a Python package that uses the Java-like convention, and I wasn't sure what the arguments for and against it are, or whether they apply to Python in the same way as Java. What are the reasons you'd prefer one over the other? Do those reasons apply across the languages?


Solution:1

If Guido himself announced that the reverse domain convention ought to be followed, it wouldn't be adopted, unless there were significant changes to the implementation of import in python.

Consider: python searches an import path at run-time with a fail-fast algorithm; java searches a path with an exhaustive algorithm both at compile-time and run-time. Go ahead, try arranging your directories like this:

folder_on_path/      com/          __init__.py          domain1/              module.py              __init__.py      other_folder_on_path/      com/          __init__.py          domain2/              module.py              __init__.py  

Then try:

from com.domain1 import module  from com.domain2 import module  

Exactly one of those statements will succeed. Why? Because either folder_on_path or other_folder_on_path comes higher on the search path. When python sees from com. it grabs the first com package it can. If that happens to contain domain1, then the first import will succeed; if not, it throws an ImportError and gives up. Why? Because import must occur at runtime, potentially at any point in the flow of the code (although most often at the beginning). Nobody wants an exhaustive tree-walk at that point to verify that there's no possible match. It assumes that if it finds a package named com, it is the com package.

Moreover, python doesn't distinguish between the following statements:

from com import domain1  from com.domain1 import module  from com.domain1.module import variable  

The concept of verifying that com is the com is going to be different in each case. In java, you really only have to deal with the second case, and that can be accomplished by walking through the file system (I guess an advantage of naming classes and files the same). In python, if you tried to accomplish import with nothing but file system assistance, the first case could (almost) be transparently the same (init.py wouldn't run), the second case could be accomplished, but you would lose the initial running of module.py, but the third case is entirely unattainable. The code has to execute for variable to be available. And this is another main point: import does more than resolve namespaces, it executes code.

Now, you could get away with this if every python package ever distributed required an installation process that searched for the com folder, and then the domain, and so on and so on, but this makes packaging considerably harder, destroys drag-and-drop capability, and makes packaging and all-out nuisance.


Solution:2

Python doesn't do this because you end up with a problem -- who owns the "com" package that almost everything else is a subpackage of? Python's method of establishing package heirarchy (through the filesystem heirarchy) does not play well with this convention at all. Java can get away with it because package heirarchy is defined by the structure of the string literals fed to the 'package' statement, so there doesn't need to be an explicit "com" package anywhere.

There's also the question of what to do if you want to publicly release a package but don't own a domain name that's suitable for bodging into the package name, or if you end up changing (or losing) your domain name for some reason. (Do later updates need a different package name? How do you know that com.nifty_consultants.nifty_utility is a newer version of com.joe_blow_software.nifty_utility? Or, conversely, how do you know that it's not a newer version? If you miss your domain renewal and the name gets snatched by a domain camper, and someone else buys the name from them, and they want to publicly release software packages, should they then use the same name that you had already used?)

Domain names and software package names, it seems to me, address two entirely different problems, and have entirely different complicating factors. I personally dislike Java's convention because (IMHO) it violates separation of concerns. Avoiding namespace collisions is nice and all, but I hate the thought of my software's namespace being defined by (and dependent on) the marketing department's interaction with some third-party bureaucracy.

To clarify my point further, in response to JeeBee's comment: In Python, a package is a directory containing an __init__.py file (and presumably one or more module files). A package hierarchy requires that each higher-level package be a full, legitimate package. If two packages (especially from different vendors, but even not-directly-related packages from the same vendor) share a top-level package name, whether that name is 'com' or 'web' or 'utils' or whatever, each one must provide an __init__.py for that top-level package. We must also assume that these packages are likely to be installed in the same place in the directory tree, i.e. site-packages/[pkg]/[subpkg]. The filesystem thus enforces that there is only one [pkg]/__init__.py -- so which one wins? There is not (and cannot be) a general-case correct answer to that question. Nor can we reasonably merge the two files together. Since we can't know what another package might need to do in that __init__.py, subpackages sharing a top-level package cannot be assumed to work when both are installed unless they are specifically written to be compatible with each other (at least in this one file). This would be a distribution nightmare and would pretty much invalidate the entire point of nesting packages. This is not specific to reverse-domain-name package hierarchies, though they provide the most obvious bad example and (IMO) are philosophically questionable -- it's really the practical issue of shared top-level packages, rather than the philosophical questions, that are my main concern here.

(On the other hand, a single large package using subpackages to better organize itself is a great idea, since those subpackages are specifically designed to work and live together. This is not so common in Python, though, because a single conceptual package doesn't tend to require a large enough number of files to need the extra layer of organization.)


Solution:3

"What are the reasons you'd prefer one over the other?"

Python's style is simpler. Java's style allows same-name products from different organizations.

"Do those reasons apply across the languages?"

Yes. You can easily have top level Python packages named "com", "org", "mil", "net", "edu" and "gov" and put your packages as subpackages in these.

Edit. You have some complexity when you do this, because everyone has to cooperate and not pollute these top-level packages with their own cruft.

Python didn't start doing that because the namespace collision -- as a practical matter -- turn out to be rather rare.

Java started out doing that because the folks who developed Java foresaw lots of people cluelessly choosing the same name for their packages and needing to sort out the collisions and ownership issues.

Java folks didn't foresee the Open Source community picking weird off-the-wall unique names to avoid name collisions. Everyone who writes an xml parser, interestingly, doesn't call it "parser". They seem to call it "Saxon" or "Xalan" or something completely strange.


Solution:4

It's a great way of preventing name collisions, and takes full advantage of the existing domain name system, so it requires no additional bureaucracy or registration. It is simple and brilliant.

By reversing the domain name it also gives it a hierarchical structure, which is handy. So you can have sub-packages on the end.

The only downside is the length of the name, but to me that is not a downside at all. I think it is a pretty good idea for any language that would support it.

Why don't JavaScript libraries do it, for example? Their global namespace is a big problem, yet Javascript libraries use simple global identifiers like '$' which clash with other Javascript libraries.


Solution:5

Somewhere on Joel on Software, Joel has a comparison between two methods of growing a company: the Ben & Jerry's method, which starts small and grows organically, and the Amazon method of raising a whole lot of money and staking very wide claims from the start.

When Sun introduced Java, it was with fanfare and hype. Java was supposed to take over. Most future relevant software development would be on web-delivered Java applets. There would be brass bands and even ponies. In this context, it was sensible to establish, up front, a naming convention that was internet-based, corporation-friendly, and on a planetary scale.

OK, it didn't turn out quite as Sun hoped, but they planned as if they would succeed. Personally, I despise projects that can be undermined by success.

Python was a project by Guido van Rossum initially, and it was quite some time before the community was confident it would survive if van Rossum was hit by a bus. There were, as far as I know, no initial plans to take over the world, and it was not intended as a web applet language.

Therefore, during the formative stages of the language, there was no reason to want a vast hierarchy for a naming scheme. In that more informal community, one selected a more or less whimsical project name and checked to see if somebody else was already using it. (Naming a computer language after a British comedy show might be considered whimsical just to start.) There was no perceived need to cater to a big but unimaginative and clumsy naming scheme.


Solution:6

The idea is to keep name spaces conflict free. Instead of unreadable UUIDs or the like, the reverse domain name is unlikely to get in someone else's way. Very simple, but pragmatic. Moreover when using 3rd party libs it might give you a clue as to where they came from (for updates, support etc.)


Solution:7

Python does have it, it's just a much flatter hierarchy. Look at os.path, for example. And there's nothing stopping library designers making much deeper ones, e.g. Django.

Fundamentally, I think Python is designed on the idea that you want to get stuff done without having to specify or type too much in advance. This greatly helps with scripting and command-line use. There are several parts of 'The Zen of Python' that address the rationale for this:

  • Simple is better than complex.
  • Flat is better than nested.
  • Beautiful is better than ugly. (The Java system looks ugly to me.)

On the other hand, there's:

  • Namespaces are one honking great idea -- let's do more of those!


Solution:8

Java is able to do it like this, since it is a recommended Java standard practice, and pretty much universally accepted by the Java community. Python dos not have this convention.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »