Updated after thinking twice about 2am attempts at creative writing…
Arteries have a nice structure. Each artery branches successively into smaller arteries. I’ve been thinking for a while that it would be cool to have that structure on a computer, in a format that makes it easy to analyze and visualize.
Wikipedia’s infoboxes are probably a good starting point. They provide the data I’m looking for in a pretty standard format that might make it easy to extract.
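To illustrate why the infobox format helps, here is a minimal regex sketch that pulls the first linked article out of an infobox’s Source field. It is not the script used for this project. It assumes one field per line and a source written as a wikilink such as [[External carotid artery]] or [[Aorta|aortic arch]]. The function name infobox_source is hypothetical, and real articles will have exceptions.

```python
import re

# Matches an infobox line such as "| Source = [[External carotid artery]]".
# Case-insensitive because templates have used both "Source" and "source"
# (an assumption); [ \t] instead of \s keeps the match on a single line.
SOURCE_FIELD = re.compile(
    r"^[ \t]*\|[ \t]*source[ \t]*=[ \t]*(.*?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)

# Matches [[Target]], [[Target|label]] or [[Target#Section|label]]
# and captures only Target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def infobox_source(wikitext):
    """Return the first article linked in the infobox Source field, or None."""
    field = SOURCE_FIELD.search(wikitext)
    if field is None:
        return None
    link = WIKILINK.search(field.group(1))
    return link.group(1).strip() if link else None
```

Fields whose value is plain text rather than a link return None. Those are the kinds of exceptional cases that still need a look by hand.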
Wikipedia also has some really nice APIs that let people run bots that read and edit content automatically. Something like this URL will spit out page content in a way that is easy to read into a programming language like Python. For really intense users, there’s a whole library out there that provides a nice interface for doing these things, but it looked tough enough to learn that I decided to stick with Python’s built-in urllib2. I hadn’t worked with it before, but like everything in Python, it’s beautifully simple.
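On Python 3, urllib2’s functionality lives in urllib.request. A minimal sketch of pulling one article’s wikitext through the MediaWiki query API might look like the following. It is not the script used here. The parameter choices (prop=revisions with rvslots=main and formatversion=2) and the placeholder User-Agent string are assumptions. The network call is kept separate from URL building and JSON parsing so those two parts can be checked offline.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def content_url(title):
    """Build a query URL asking for the current wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def wikitext_from_response(body):
    """Extract the wikitext from a formatversion=2 JSON response (None if missing)."""
    page = json.loads(body)["query"]["pages"][0]
    if page.get("missing"):
        return None
    return page["revisions"][0]["slots"]["main"]["content"]


def fetch_wikitext(title):
    """Download and return the wikitext for `title` (makes a network request)."""
    # Wikipedia asks API clients to send a descriptive User-Agent;
    # this value is only a placeholder.
    request = urllib.request.Request(
        content_url(title),
        headers={"User-Agent": "artery-tree-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return wikitext_from_response(response.read().decode("utf-8"))
```

For example, fetch_wikitext("Facial artery") would return that article’s raw wikitext, infobox included.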
The final process involved three steps: getting a list of article titles from the “arterial tree” article with one script; using more of Wikipedia’s APIs to clean and normalize those titles; and spending 5-6 hours writing a big script that uses regular expressions to scrape the “Source” field from each artery article and save the results as a file that Python can easily read back in. That big script didn’t work perfectly on some exceptional cases, so I had to clean up about 20 items by hand, but in the end I managed to use Google’s visualization playground to come up with a final visualization that I think looks pretty good.
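The title-cleaning and tree-building steps could be sketched roughly as follows. This is not the original script. The names linked_titles, title_mapping, canonical, tree_rows and save_rows are hypothetical. title_mapping assumes the JSON response of a query such as action=query&titles=aorta|Facial artery&redirects=1&format=json&formatversion=2, whose normalized and redirects lists map raw titles to canonical ones. The [artery, parent] row shape is also an assumption, chosen because tree charts such as Google Charts’ org chart take rows of a node and its parent.

```python
import json
import re

# Captures the target of [[Target]], [[Target|label]] or [[Target#Section|label]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def linked_titles(wikitext):
    """Return linked article titles in order of first appearance, without duplicates."""
    titles = []
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip()
        # Skip namespaced links such as File: or Category:.
        if ":" not in title and title not in titles:
            titles.append(title)
    return titles


def title_mapping(body):
    """Map raw titles to canonical ones from a query response's
    'normalized' and 'redirects' lists."""
    query = json.loads(body)["query"]
    mapping = {}
    for entry in query.get("normalized", []) + query.get("redirects", []):
        mapping[entry["from"]] = entry["to"]
    return mapping


def canonical(title, mapping):
    """Follow normalization and redirect steps until the title stops changing."""
    seen = set()
    while title in mapping and title not in seen:
        seen.add(title)
        title = mapping[title]
    return title


def tree_rows(sources, mapping):
    """Turn {artery: source} into [artery, parent] rows with canonical names.
    An artery with no source gets an empty parent (the root)."""
    return [
        [canonical(artery, mapping), canonical(source, mapping) if source else ""]
        for artery, source in sorted(sources.items())
    ]


def save_rows(rows, path):
    """Write the rows as JSON so Python (or a chart page) can load them later."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(rows, handle, indent=1)
```

Resolving names through the same mapping on both sides of each row means an artery scraped as “Brachiocephalic artery” and one whose Source links to a redirect end up pointing at the same node.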


2 Comments
If structured data about anatomy like this is of interest, you might want to have a look at the FMA ontology. http://sig.biostr.washington.edu/projects/fm/FAQs.html
It’s big enough to be quite unwieldy to process, but it is very rich in information. There is a ‘lite’ version that might work for you.
Thanks! That is an incredible data set. I’m surprised I hadn’t heard of it before. I’ll keep it in mind for future projects.