Clojure: Scale those scalars

Now for some more details on fundamental data types. The fourth chapter of "The Joy of Clojure" 2nd ed. deals with some scalar data type topics. I'll cover integer overflow, keywords, symbols, metadata, regular expressions, and others.

Overflow and Promotion

By default, integer literals are Longs:

user> (def regular 23)
#'user/regular
user> (class regular)
java.lang.Long

For literals too large for a Long, the reader chooses a datatype with more room:

user> (def supersize 90000000000000000000)
#'user/supersize
user> (class supersize)
clojure.lang.BigInt

Also, the result of an operation gets a datatype with more room when one of its operands needs it:

user> (class (+ regular 90000000000000000000))
clojure.lang.BigInt

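Note that plain + on two Longs does not promote on overflow; it throws an ArithmeticException. If, however, one wants a value to wrap around, one can use the set of "unchecked" operations:

user> (unchecked-add 1 2)
3
user> (unchecked-add Long/MAX_VALUE 1)
-9223372036854775808
user> (unchecked-add Long/MAX_VALUE Long/MAX_VALUE)
-2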
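
To see the different behaviours side by side, here is a small sketch. It assumes Clojure 1.3 or later, where plain + checks for overflow and +' is the auto-promoting variant; the helper name overflow-behaviour is made up for this post.

```clojure
;; Sketch: three ways of adding past Long/MAX_VALUE.
;; `overflow-behaviour` is a hypothetical helper, not part of Clojure.
(defn overflow-behaviour []
  {:checked   (try (+ Long/MAX_VALUE 1)            ; throws on overflow
                   (catch ArithmeticException _ :overflow))
   :promoting (+' Long/MAX_VALUE 1)                ; promotes to BigInt
   :unchecked (unchecked-add Long/MAX_VALUE 1)})   ; silently wraps around

(overflow-behaviour)
;=> {:checked :overflow, :promoting 9223372036854775808N, :unchecked -9223372036854775808}
```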


Keywords

As Functions

Keywords can be called as functions. They look themselves up in a map. Here's a romantic mapping:

user> (def lovers {:montague "romeo" :capulet "juliet"})
#'user/lovers

Check which Capulet is a lover:

user> (:capulet lovers)
"juliet"
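
Two more things follow from keywords being functions, sketched below: a lookup can take a default for missing keys, and a keyword can be handed to higher-order functions such as map. The people vector is invented for the illustration.

```clojure
(def lovers {:montague "romeo" :capulet "juliet"})

;; A missing key yields nil, or the optional default argument.
(:tybalt lovers)            ;=> nil
(:tybalt lovers "nobody")   ;=> "nobody"

;; `people` is hypothetical sample data for this sketch.
(def people [{:name "romeo"  :house :montague}
             {:name "juliet" :house :capulet}])

;; The keyword itself is the function passed to map.
(map :name people)          ;=> ("romeo" "juliet")
```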

As Directives

Keywords can also be useful as directives to functions. For example, a function can accept a normal range of values plus a keyword that marks a special value or edge case. Going on with romance:

user> (defn heart [howmuch]
        (cond
          (= howmuch :forever)
          (while true (pr "<3 "))
          :else (pr (apply str (repeat howmuch "<3 ")))))
; regular
user> (heart 3)
"<3 <3 <3 "
; endless love
user> (heart :forever)
"<3 ""<3 ""<3 ""<3 ""<3
; Ctrl-C
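
If you'd rather not reach for Ctrl-C, the same directive idea works with a lazy sequence instead of a print loop. This is only a sketch of an alternative, not the book's version; the name hearts is my own.

```clojure
;; `hearts` is a hypothetical variant that returns a sequence instead of printing.
(defn hearts [howmuch]
  (if (= howmuch :forever)
    (repeat "<3 ")            ; infinite lazy sequence
    (repeat howmuch "<3 ")))  ; finite sequence of the requested length

(apply str (hearts 3))                   ;=> "<3 <3 <3 "
;; The caller decides how much of "forever" to consume.
(apply str (take 5 (hearts :forever)))   ;=> "<3 <3 <3 <3 <3 "
```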

Keyword Namespaces

You can prefix keywords with a namespace-style qualifier, but the prefix is only a label; the namespace doesn't even have to exist:

user> (ns foo)
nil
foo> :bar/quux
:bar/quux

The prefix can still be useful for documentation:

foo> :bar/keeper
:bar/keeper
foo> :password/keeper
:password/keeper
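
The qualifier and the name of a keyword can be inspected with the core functions namespace and name. A short sketch, which also shows the double-colon form; I leave that one without an expected result because it depends on the namespace it is read in.

```clojure
;; A keyword's qualifier and name can be pulled apart as strings.
(namespace :bar/quux)   ;=> "bar"
(name :bar/quux)        ;=> "quux"
(namespace :quux)       ;=> nil

;; Same name, different qualifier: two different keywords.
(= :bar/keeper :password/keeper)   ;=> false

;; A double colon makes the reader fill in the current namespace,
;; so the result depends on where this form is read.
(namespace ::quux)
```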

Symbol resolution

Symbols are distinct objects, even if they have the same name:

user> (identical? 'cat 'cat)
false

They do have a name which can be retrieved as a string:

user> (name 'cat)
"cat"

Symbols of the same name are equal:

user> (= 'cat 'cat)
true

However, they're only identical if they are in fact the same object:

user> (let [pussy 'cat, felix pussy]
        (identical? pussy felix))
true

The reason two symbols with the same name are not considered identical is metadata.

Metadata and Symbols

Several kinds of objects in Clojure can be tagged with metadata. Metadata is basically a map attached to an object.

The func with-meta returns a symbol with attached metadata:

user> (with-meta 'felix {:title "god emperor"})
felix

And meta extracts metadata:

user> (meta (with-meta 'felix {:title "god emperor"}))
{:title "god emperor"}
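
This is also where the equal-but-not-identical business pays off. In the sketch below, two symbols with the same name carry different metadata: they still compare equal, but they have to be separate objects to keep their maps apart. The binding names and titles are invented.

```clojure
;; Two symbols named felix, each with its own metadata map.
(let [emperor (with-meta 'felix {:title "god emperor"})
      stray   (with-meta 'felix {:title "alley cat"})]
  [(= emperor stray)            ; same name, so equal
   (identical? emperor stray)   ; but two distinct objects
   (meta emperor)
   (meta stray)])
;=> [true false {:title "god emperor"} {:title "alley cat"}]
```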

While the idea of being able to associate metadata with symbols and other values sounds very cool indeed, I kinda wondered at first if it's really worth reserving language constructs for something which can easily be done by setting up the appropriate data structure in the first place. That is, until I learned that the metadata machinery is also used for conveying type hints to the compiler, e.g. (example from the docs):

(defn len [x]
  (.length x))

(defn len2 [^String x]
  (.length x))

user=> (time (reduce + (map len (repeat 1000000 "asdf"))))
"Elapsed time: 3007.198 msecs"
user=> (time (reduce + (map len2 (repeat 1000000 "asdf"))))
"Elapsed time: 308.045 msecs"

That certainly seems worthwhile: without the hint, the call to .length has to be resolved by reflection at runtime.
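
That a type hint really is just metadata can be checked at the REPL. A minimal sketch: the ^String prefix is reader shorthand for a :tag entry in the metadata map of the symbol that follows it.

```clojure
;; ^String x attaches {:tag String} to the symbol x at read time.
(meta '^String x)                        ;=> {:tag String}

;; The same metadata, attached by hand with with-meta.
(meta (with-meta 'x {:tag 'String}))     ;=> {:tag String}
```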

Symbols, namespaces, Lisp-1

Clojure is a Lisp-1, which means functions and other values share a single namespace and there are no special resolution rules for functions. One consequence is that one should beware of unintentionally shadowing functions (just like in Python).

An innocent func f:

user> (defn f [] 23)
#'user/f
user> (f)
23

But inside the scope of let we rebind f and get a totally different result upon calling it:

user> (let [f #()] (f))
()

Again, this is not too surprising for the Python programmer, since Python treats functions and other values the same way as well.
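
The more painful variant is shadowing a core function with something that isn't a function at all. A sketch of how that goes wrong, using count as the victim; shadowed-count is a made-up name.

```clojure
;; `shadowed-count` is a hypothetical example function.
(defn shadowed-count []
  (let [count 3]              ; count now names the number 3 ...
    (try
      (count [1 2 3])         ; ... so this tries to call 3 as a function
      (catch ClassCastException _ :not-a-function))))

(shadowed-count)   ;=> :not-a-function
```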


Regular Expressions

I have a weak spot for regular expressions. That's maybe just a left-over from my Perl days. Sysadmins need to munge text all the time, and Perl is (or was) the sysadmin's tool of choice, so this made perfect sense. However, I wonder if built-in RE syntax makes sense for Clojure. For one, I wouldn't think of Clojure for sysadmin tasks. On the other hand, regexes are useful, and having specific syntax for them makes for very concise code:

user> (re-seq #"^(\w\w\w) (\d\d) (\d\d):(\d\d):(\d\d)" "Jan 10 20:11:16")
(["Jan 10 20:11:16" "Jan" "10" "20" "11" "16"])

A long time ago I used to write C as an intern for a megacorp, and my job was to dissect reams and reams of logfiles (without regexes or any of that fancy stuff). I think the two lines above would probably have saved me several hours of coding.
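
For actual logfile dissection, the match vector combines nicely with destructuring. A sketch under my own assumptions: parse-stamp and the shape of the returned map are invented here, while re-find, when-let and Long/parseLong are standard.

```clojure
;; `parse-stamp` is a hypothetical helper; the map layout is my own choice.
(defn parse-stamp [line]
  ;; re-find returns nil when nothing matches, so when-let yields nil too.
  (when-let [[_ month day hh mm ss]
             (re-find #"^(\w{3}) (\d{2}) (\d{2}):(\d{2}):(\d{2})" line)]
    {:month month
     :day   (Long/parseLong day)
     :time  (mapv #(Long/parseLong %) [hh mm ss])}))

(parse-stamp "Jan 10 20:11:16")
;=> {:month "Jan", :day 10, :time [20 11 16]}
(parse-stamp "not a timestamp")
;=> nil
```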

That's it for today. Next up will be collections -- lists, sets, maps, queues and persistence.