Just having some fun today. The code runs actually pretty fast: It reads over 1.5 million lines from 1.1GB of logs and sorts them by time-taken in 5 seconds. I bet there’s room for improvement but it was enough for my needs.
(ns iis_log
(:require clojure.contrib.duck-streams clojure.contrib.io clojure.string clojure.test))
(def #^{:doc "Where the log files reside."} logs-directory "D:/userdata/jnetto/Desktop/zina_inside/W3SVC93479707")
(def #^{:doc "How many lines to show from logs."} top-to-show 25)
;; IIS log lines starting with # are comments
(defn is-not-comment?
[^String s]
(not= (.substring s 0 1) "#"))
(defn remove-comments
[file-contents]
(filter is-not-comment? file-contents))
;; Values on logs are separated by a space. Therefore, "-" is shown as a value when there is nothing there, instead of a space,
;; which would mess up the separator.
(defn nil-if-hyphen
[s]
(if (not= s "-") s))
(defstruct token :uri :time-taken :username)
(defn relevant-line-info
"Returns a map with the path + querystring, time-taken and username from a log line."
[line]
(let [line-tokens (clojure.string/split line #" ")
cs-uri-stem (nth line-tokens 6)
cs-uri-query (nth line-tokens 7)
cs-username (nth line-tokens 9)]
(struct-map token :uri (str cs-uri-stem (nil-if-hyphen cs-uri-query))
:time-taken (read-string (nth line-tokens 20))
:username (nil-if-hyphen cs-username))))
(def accesses (ref []))
(doseq [child (.listFiles (clojure.contrib.io/file logs-directory))]
(println "Processing" child) ;; Just for debugging purposes, to show something is happening
(let [file-contents (clojure.contrib.duck-streams/read-lines child)
file-without-comments (remove-comments file-contents)
tokens (map relevant-line-info file-without-comments)]
(dosync (alter accesses concat (take top-to-show tokens))))) ;; We 'take 'top-to-show here as an optimization. If we will print
;; only top-to-show, why take more from each file?
;; Until here it's actually pretty fast. 'sort-by and 'reverse are actually the slowest parts of this script.
(doseq [line (take top-to-show (reverse (sort-by :time-taken @accesses)))]
(println line))
Comments
2 responses to “Clojure script to parse IIS logs from a directory and show slowest pages”
You should be able to write
tokens (map #(relevant-line-info %) file-without-comments)astokens (map relevant-line-info file-without-comments)since it’s a one argument function.
Well observed Matt. The code grew organically and I left the redundant #() there. Thanks for noticing, it is updated now.