Tuesday, October 26, 2010

Tour of a real toy Haskell program, part 1

Haskell suffers from a problem I'll call the Fibonacci Gap. Many beginners start out with a bunch of small mathematical exercises, develop that skillset, and then are at a loss for what to study next. Often they'll ask in #haskell for an example of a "real Haskell program" to study. Typical responses include the examples in Real World Haskell, or the Design and Implementation of XMonad talk.

This is my attempt to provide another data point: a commentary on detrospector, my text-generating toy. It's perhaps between the RWH examples and xmonad in terms of size and complexity. It's not a large program, and certainly not very useful, but it does involve a variety of real-world concerns such as Unicode processing, choice of data structures, strictness for performance, command-line arguments, serialization, etc. It also illustrates my particular coding style, which I don't claim is the One True Way, but which has served me well in a variety of Haskell projects over the past five years.

Of course, I imagine that experts may disagree with some of my decisions, and I welcome any and all feedback.

I haven't put hyperlinked source online yet, but you can grab the tarball and follow along.

This is part 1, covering style and high-level design. Part 2 addresses more details of algorithms, data structures, performance, etc.

The algorithm

detrospector generates random text conforming to the general style and diction of a given source document, using a quite common algorithm.

First, pick a "window size" k. We consider all k-character contiguous substrings of the source document. For each substring w, we want to know the probability distribution of the next character. In other words, we want to compute values for

P(next char is x | last k chars were w).

To compute these, we scan through the source document, remembering the last k characters as w. We build a table of next-character occurrence counts for each substring w.

These tables form a Markov chain, and we can generate random text by a random walk in this chain. If the last k characters we generated were w, we choose the next character randomly according to the observed distribution for w.

So we can train it on Accelerando and get output like:

addressed to tell using back holes everyone third of the glances waves and diverging him into the habitat. Far beyond Neptune, I be?" asks, grimy. Who's whether is headquarters I need a Frenchwoman's boyfriend go place surrected in the whole political-looking on the room, leaving it, beam, the wonderstood. The Chromosome of upload, does enough. If this one of the Vile catches agree."
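The table-building and selection steps described above can be sketched in a few lines. This is a minimal illustration, not detrospector's actual code: the names `Table`, `train`, and `next` are made up for this example, it uses plain `String` and `Data.Map` for brevity, and it takes the "random" number `r` as an argument so that the logic stays pure.

```haskell
import qualified Data.Map as M
import Data.List (tails)

-- For each k-character window, the counts of the characters that follow it.
type Table = M.Map String (M.Map Char Int)

-- Scan the source once, recording every (window, next character) pair.
-- (Illustrative only; assumes the whole source fits in memory as a String.)
train :: Int -> String -> Table
train k src = M.fromListWith (M.unionWith (+))
  [ (w, M.singleton x 1)
  | t <- tails src
  , let (w, rest) = splitAt k t
  , length w == k
  , x <- take 1 rest ]

-- Choose the next character for window w, given a number r in the range
-- [0, total count for w). A real generator would draw r at random.
next :: Table -> String -> Int -> Maybe Char
next tbl w r = do
  counts <- M.lookup w tbl
  pick r (M.toList counts)
  where
    pick _ [] = Nothing
    pick n ((x, c) : rest)
      | n < c     = Just x
      | otherwise = pick (n - c) rest
```

Walking the chain then amounts to calling `next` repeatedly, each time dropping the oldest character of the window and appending the one just chosen.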

Design requirements

We can only understand a program's design in light of the problem it was meant to solve. Here's my informal list of requirements:

  • detrospector should generate random text according to the above algorithm.

  • We should be able to invoke the text generator many times without re-analyzing the source text each time.

  • detrospector should handle all Unicode characters, using the character encoding specified by the system's locale.

  • detrospector should be fast without sacrificing clarity, whatever that means.

Wearing my customer hat, these are axioms without justification. Wearing my implementor hat, I will have to justify design decisions in these terms.

Style

In general I import modules qualified, using as to provide a short local name. I make an exception for other modules in the same project, and for "standard modules". I don't have a precise definition of "standard module" but it includes things like Data.Maybe, Control.Applicative, etc.
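As a small illustration of that convention (the helper `countOf` is hypothetical and exists only for this example):

```haskell
import qualified Data.Map as M   -- qualified, with a short local name
import Data.Maybe                -- a "standard module", imported unqualified

-- Look up a count, treating a missing key as zero.
countOf :: Char -> M.Map Char Int -> Int
countOf c = fromMaybe 0 . M.lookup c
```

The `M.` prefix makes it obvious at each use site where `lookup` comes from, and avoids the clash with the Prelude's `lookup`.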

The longest line in detrospector is 70 characters. There is no hard limit, but more than about 90 is suspect.

Indentation is two spaces. Tabs are absolutely forbidden. I don't let the indentation of a block construct depend on the length of a name, thus:

foo x = do
  y <- bar x
  baz y

and

bar x =
  let y = baz x
      z = quux x y
  in  y z

This avoids absurd left margins, looks more uniform, and is easier to edit.

I usually write delimited syntax with one item per line, with the delimiter prefixed:

{-# LANGUAGE
    ViewPatterns
  , PatternGuards #-}

and

data Mode
  = Train { num   :: Int
          , out   :: FilePath }
  | Run   { chain :: FilePath }

Overriding layout is sometimes useful, e.g.:

look = do x <- H.lookup t h; return (x, S.length t)

(With -XTupleSections we could write

look = (,S.length t) <$> H.lookup t h

but that's just gratuitous.)

I always write type signatures on top-level bindings, but rarely elsewhere.

Module structure

I started out with a single file, which quickly became unmanageable. The current module layout is:

Detrospector.
  Types      types and functions used throughout
  Modes      a type with one constructor per mode
  Modes.
    Train    train the Markov chain
    Run      generate random text
    Neolog   generate neologisms
  Main       command-line parsing

There is also a Main module in detrospector.hs, which simply invokes Detrospector.Main.main.

Modules I write tend to fall into two categories: those which export nearly everything, and those which export only one or two things. The former includes "utility" modules with lots of small types and function definitions. The latter includes modules providing a specific piece of functionality. A parsing module might define three dozen parsers internally, but will only export the root of the grammar.

A module defining an abstract data type might fall into a third category, since it can export a large API yet have lots of internal helpers. But I don't write such modules very often.

Detrospector.Types is in the first category. Most Haskell projects will have a Types module, although I'm somewhat disappointed that I let this one become a general grab-bag of types and utility functions.

The rest fall into the second category. Each module in Detrospector.Modes.* exports one function to handle that mode. Detrospector.Main exports only main.

Build setup

This was actually my first Cabal project, and the first thing I uploaded to Hackage. I think Cabal is great, and features like package-visibility management are useful even for small local projects.

In my cabal file I set ghc-options: -Wall, which enables many helpful warnings. The project should build with no warnings, but I use the OPTIONS_GHC pragma to disable specific warnings in specific files, where necessary.
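For instance, a file that deliberately keeps an unused helper around could silence just that one warning. This is a sketch: the flag shown is a real GHC option, but the choice of warning and the `debugShow` helper are assumptions made for illustration, not taken from detrospector.

```haskell
{-# OPTIONS_GHC -fno-warn-unused-binds #-}
-- The pragma above must appear before the module header (or, as here,
-- before the first declaration). It affects this file only, so -Wall
-- stays fully in force for the rest of the project.

-- Hypothetical helper kept around for debugging; nothing calls it yet,
-- and this file does not export it, so -Wall would normally complain.
debugShow :: Show a => a -> String
debugShow x = "<" ++ show x ++ ">"
```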

I also run HLint on my code periodically, but I don't have it integrated with Cabal.

I was originally passing -O2 to ghc. Cabal complained that it's probably not necessary, which was correct. The Cabal default of -O performs just as well.

I'm using Git for source control, which is neither here nor there.

Command-line parsing

detrospector currently has three modes, as listed above. I wanted to use the "subcommand" model of git, cabal, etc. So we have detrospector train, detrospector run, etc. The cmdargs package handles this argument style with a low level of boilerplate.

The "impure" interface to cmdargs uses some dark magic in the operator &= in order to attach annotations to arbitrary record fields. The various caveats made me uneasy, so I opted for the slightly more verbose "pure" interface, which looks like this:

-- module Detrospector.Main
import qualified System.Console.CmdArgs as Arg
import System.Console.CmdArgs((+=),Annotate((:=)))
...
modes  = Arg.modes_  [train,run,neolog]
      += Arg.program "detrospector"
      += Arg.summary "detrospector: Markov chain text generator"
      += Arg.help    "Build and run Markov chains for text generation"
  where

  train = Arg.record Train{}
    [ num := 4
          += Arg.help "Number of characters lookback"
    , out := error "Must specify output chain"
          += Arg.typFile
          += Arg.help "Write chain to this file" ]
    += Arg.help "Train a Markov chain from standard input"

  run = Arg.record Run{}
    [ chain := error "Must specify input chain"
            += Arg.argPos 0
            += Arg.typ "CHAIN_FILE" ]
    += Arg.help "Generate random text"
  ...

This tells cmdargs how to construct values of my record type Detrospector.Modes.Mode. We get help output for free:

$ ./dist/build/detrospector/detrospector -?
detrospector: Markov chain text generator

detrospector [COMMAND] ... [OPTIONS]
  Build and run Markov chains for text generation

Common flags:
  -? --help        Display help message
  -V --version     Print version information

detrospector train [OPTIONS]
  Train a Markov chain from standard input

  -n --num=INT     Number of characters lookback
  -o --out=FILE    Write chain to this file

detrospector run [OPTIONS] CHAIN_FILE
  Generate random text

...

My use of error here is hacky and leads to a bug that I recently discovered. When the -o argument to train is invalid or missing, the error is not printed until the (potentially time-consuming) analysis is completed. Only then is the record's field forced.
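One possible way to surface such an error early is to force the relevant fields right after argument parsing, before any expensive work starts. The following is only a sketch of that idea, not a fix taken from detrospector: `Mode` here is a stand-in copy of the record type, and `forceMode` and `checkMode` are hypothetical helpers.

```haskell
import Control.Exception (ErrorCall(..), evaluate, try)

-- Stand-in for Detrospector.Modes.Mode (assumed shape, for illustration).
data Mode
  = Train { num :: Int, out :: FilePath }
  | Run   { chain :: FilePath }

-- Force every field that might still be an unevaluated 'error' thunk.
forceMode :: Mode -> IO ()
forceMode (Train n o) = do
  _ <- evaluate n
  _ <- evaluate (length o)
  return ()
forceMode (Run c) = do
  _ <- evaluate (length c)
  return ()

-- Return the error message if a required field was never supplied,
-- or the untouched Mode if everything is present.
checkMode :: Mode -> IO (Either String Mode)
checkMode m = do
  r <- try (forceMode m)
  return $ case r of
    Left (ErrorCall msg) -> Left msg
    Right ()             -> Right m
```

Calling something like `checkMode` immediately after parsing would report "Must specify output chain" before the training pass begins. A cleaner design would avoid `error` as a default value altogether, for example by making the field a `Maybe FilePath`.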

To be continued...

...right this way.
