For best viewing results, use font: Inconsolata Medium.
Zoom first slide to screen size

# Warning

[ASCII-art banner: rows of "WARNING" framing the text "Geeks ahead!"]

""" That was the geekiest presentation I have ever seen """
    - An unnamed post Doc. from Germany

# Introduction

> Ville Rantanen / Hautaniemi lab <

[ASCII-art logo]

* ncsv, or Nice CSV printer
* [Project page]( https://bitbucket.org/MoonQ/ncsv )
* Depends only on the Python standard library
* Installation

```
hg clone https://bitbucket.org/MoonQ/ncsv
ln -s [folder_of_installation]/ncsv.py ~/bin/ncsv
ln -s [folder_of_installation]/fast-ncsv ~/bin/fast-ncsv
```

* Debian/Ubuntu release exists: [repository]( http://anduril.org/linux/ )

# Motivation

Delimited data is often hard to read

* Columns do not match:

$! less data.csv $!
$! less data.tsv $!

* Header forgotten in big files

$! less -S GeneExpr_v64.csv $!

( Note to audience, "$" means it's a command you type in terminal )

# Motivation, why command line?

Could've opened with gnumeric/soffice!
* Sometimes X just doesn't work
* 'Big Data' doesn't happen on laptop/PC, must use servers
* SSH hops

    ┌──────┐               ┌────┐
    │client│     ┌────┐    │┌────┐
    ├┬┬┬┬┬┬┤ ══▶ │ S1 │ ══▶└│ VM │┐
    ├┼┼┼┼┼┼┤     └────┘     └────┘│
    └┴┴┴┴┴┴┘                 └────┘

* Shared home folders + .Xauthority
* SSH through Windows / Android / OSX, no X by default
* Graphical tools overkill for looking at tabular numerical data
* Interoperability with other command line tools
* Scripting

# NiceCSV

[ASCII-art logo]

Two operation modes:

* Console output, pretty printing
* Curses browser

Use case examples:

* console pretty printing
* combine with command line tools
    * Reuse output
    * Modify input
* browser mode

# NiceCSV::Console Printing

Common case: view a short table

* Also e-mails: No need to open attachments, just include in text

$> ncsv -c data.tsv $>
$> ncsv -c -i, data.csv $>

String formatting:

$> ncsv -c -s %0.3f -a auto data.tsv $>
$> ncsv -c -s %s,%s,%d,%d,%0.3f -a auto data.tsv $>

Several console formats:

$> ncsv --cf m -s %s,%s,%d,%d,%0.3f -a auto data.tsv $>
$> ncsv --cf M -s %s,%s,%d,%d,%0.3f -a auto data.tsv $>
$> ncsv --cf a -s %s,%s,%d,%d,%0.3f -a auto -W 8 data.tsv $>

Also compressed files:

$> ncsv -c -z data.tsv.gz $>

# NiceCSV::More use cases

View output of your process

$> cat -n calculate.sh $>
$> bash calculate.sh $>
$> bash calculate.sh | ncsv --cf m -s %.3f -a auto $>

Use case: Print a help page for a BASH program:

$> cat -n program.sh $>
$> bash program.sh $>

# NiceCSV::Use output in other tools

Since NiceCSV is a visualization tool, its output is mostly NOT
processed with other tools. However:

Pandoc, a multi-tool for structured text formats

$! ncsv --cf m -s %.3f data.tsv | pandoc -o data.pdf; xdg-open data.pdf $!
$! ncsv --cf m -s %.3f data.tsv | pandoc -s -o data.html; xdg-open data.html $!

PNG conversion:

$! ncsv --cf e -s %.3f -a auto data.tsv | convert -font "Liberation-Mono-Regular" -pointsize 8 -density 300 -trim TXT:- data.png ; xdg-open data.png $!

Statistics:

$> ncsv --stat data.tsv | ncsv --cf m -a auto $>

# NiceCSV::Browser mode

By far the most used mode of NiceCSV

See 'h' for help:

$! ncsv data.tsv $!

For those with color issues:

$! xterm -rv -e 'ncsv --dc data.tsv' $!
$! ncsv --dc data.tsv $!
$! ncsv --nc data.tsv $!

# NiceCSV::Feed input from other tools

Large file: 200 MB, Cols:593, Rows:21k, 30s to parse

$! head -n 10 GeneExpr_v64.csv | cut -f1,15,16,40-42 | ncsv $!
$! grep ENSG0000025.12 GeneExpr_v64.csv | ncsv --nh $!
$! ( head -n1 GeneExpr_v64.csv ; grep ENSG0000025.12 GeneExpr_v64.csv ) | ncsv $!

* Also, see [csvkit](https://csvkit.readthedocs.org)

A fork of NiceCSV, fast-ncsv

* Implements only the parser
* Reads input only a screenful at a time
* Uses `less -S` to handle movement

$! fast-ncsv GeneExpr_v64.csv $!

Other tools:

$! sed s/Rowa/Cntrl/ data.tsv | ncsv $!
$! wget -O - http://anduril.org/pub/home/vrantane/demos/double_positive/ch1Counts.txt | ncsv $!

# Final remarks

* Help:

$! ncsv -h | less $!

* Generic tool, can be used and misused for various purposes

# Thank you!

[ASCII-art image]

In case you are interested, these slides were displayed with markslider,
a markdown slide engine. Also one of my creations. Available on request.

Image source: textart4u.blogspot.com

-- ville.rantanen@helsinki.fi