
python, week 5

i used code from here to put all the words from the first executive order into a set, which is a way to get all the unique words in the document.

    ## from allison parrish's http://www.decontextualize.com/teaching/rwet/simple-models-of-text/
    import sys

    words = set()

    for line in sys.stdin:
        line = line.strip()
        line_words = line.split()
        for word in line_words:
            words.add(word)

    for word in words:
        print(word)

which produced “words” separated by line breaks. here are some interesting sections:

all

United

burden

PENDING

out

purchasers

Patient

for

enforceable

availability

HOUSE,

health

imperative

Nothing

benefit

Human

repeal,

or

otherwise

individuals,

control

Constitution

unwarranted

fiscal

head

with

legislative

Procedure

CARE

me

commerce

agency

Act

authorities

such

WHITE

law

affect:

impair

does

In

the

insurers,

okay so obviously some of these would be neat poems, so i tried to join them:

    import sys

    for line in sys.stdin:
        line = line.strip()
        output = " ".join(line)
        print(output)

hmmmm noooo…
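
a likely reason for the "noooo": `" ".join(...)` expects a list of strings, and when it gets a single string it treats each character as an item, so it puts a space between every letter. the loop also prints once per line, so nothing actually ends up on the same line. a minimal sketch of one way around that, assuming the input is the one-word-per-line output from the set script (`join_words` is a made-up name):

```python
def join_words(lines):
    # collect every whitespace-separated word first, then join once at the end
    words = []
    for line in lines:
        words.extend(line.split())
    return " ".join(words)


# stand-in for sys.stdin: three lines of output from the set script
print(join_words(["all\n", "United\n", "burden\n"]))
```

to run it on real input, `sys.stdin` can be passed in place of the list.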


hmmmm nooooooo… okay, new activity: replacing the executive order with these poems.

i’m doing this manually for now since it would involve a bunch of regex, but i’ll record the steps here:

replace all instances of “Minimizing the Economic Burden of the Patient Protection and Affordable Care Act Pending Repeal” with the first poem above, “all United burden,” in the style in which the original text appears (so, with .title() or .upper()).

when sections begin, keep the text naming the section as such (“Section 1”, “Sec. 2”, etc.) but replace the body of the section with the next poem above.

remove newlines from the poems above so the words flow like sentences, but don’t change case, punctuation, etc.

fill sections for as many poems as were originally picked out from the set.

delete sections that don’t have an accompanying poem.
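
for later, a rough sketch of how the steps above might be automated with a little regex. this is not what i actually ran. it assumes section headings look like “Section 1.” or “Sec. 2.” and never show up in that form inside a section body, and the names (`SECTION_RE`, `flatten`, `replace_title`, `replace_sections`) are all made up for illustration:

```python
import re

# matches headings like "Section 1." or "Sec. 2." (assumed heading format)
SECTION_RE = re.compile(r"(Section\s+\d+\.?|Sec\.\s+\d+\.?)")


def flatten(poem):
    # remove newlines so the words flow like a sentence; case and punctuation stay
    return " ".join(poem.split())


def replace_title(text, title, poem):
    # swap the title for the poem in whichever style the title appears
    poem = flatten(poem)
    text = text.replace(title.upper(), poem.upper())
    text = text.replace(title.title(), poem.title())
    return text.replace(title, poem)


def replace_sections(text, poems):
    # the capturing group makes split keep the headings:
    # [preamble, heading, body, heading, body, ...]
    parts = SECTION_RE.split(text)
    out = [parts[0]]
    # zip stops at the shorter list, so sections without a poem are dropped
    for heading, poem in zip(parts[1::2], poems):
        out.append(heading + " " + flatten(poem) + "\n")
    return "".join(out)
```

the first poem would go to `replace_title` and the rest, in order, to `replace_sections`.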

this feels very related to a project i did in jer’s class last year where i replaced “mortgage” language with “data” language in hank paulson’s 2008 announcement about the economy. python woulda helped with that/made it better. anyway, executive order results here, original here.


another thing i was working on was figuring out how to clean up the file without going through it manually. these are things i did in the interpreter. i wonder if there’s a way to say: if the space character appears 2 or more times in a row, replace it with ' '? it’d also be cool to figure out how to split on html tags so i don’t have to manually delete those. maybe this will be useful later.

    for line in lines:
        line = line.strip().replace('', ' ')
        line = line.strip().replace('  ', ' ')
        line = line.strip().replace('', ' ')
        line = line.strip().replace('  ', ' ')
        line = line.strip().replace('', ' ')
        line = line.strip().replace('  ', ' ')
        line = line.strip().replace('', ' ')
        line = line.strip().replace('  ', ' ')
        print(line)
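
both of those wishes look doable with the `re` module. a minimal sketch, assuming the tags are simple ones like `<p>` or `<b>`: the `<[^>]+>` pattern is a rough shortcut, not a real html parser, and `clean_line` is a made-up name.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")   # anything that looks like <...>
SPACES_RE = re.compile(r" {2,}")  # the space character, 2 or more times in a row


def clean_line(line):
    # swap each tag for a space so neighboring words don't get glued together
    line = TAG_RE.sub(" ", line)
    # collapse every run of spaces into a single space, in one pass
    line = SPACES_RE.sub(" ", line)
    return line.strip()


print(clean_line("<p>By the   authority <b>vested</b> in me</p>"))
```

one `re.sub` call handles runs of any length, so there’s no need to repeat the same `.replace()` over and over.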

