Are there any plugins/packages/extensions that highlight all nouns, adjectives, adverbs, or other parts of speech?


#1

well, if there is one, it seems like one of these is true:

  1. nobody knows about it

  2. it’s very hard to find among the many plugins/packages/extensions


#4

I don’t think there is any such thing. Also, I can imagine that parsing a file against a vocabulary of thousands of keywords would be very slow.

An average 20-year-old knows 42,000 lemmas coming from 11,100 word families; an average 60-year-old knows 48,200 lemmas coming from 13,400 word families. (via)

The Second Edition of the 20-volume Oxford English Dictionary contains full entries for 171,476 words in current use, and 47,156 obsolete words. To this may be added around 9,500 derivative words included as subentries. (via)


#5

The vocabulary size isn’t the biggest problem that has to be solved. A sophisticated caching scheme can reduce lookup calls to a minimum, because a lot of language is redundant, particularly over article- and book-length discussions of a single topic (see the second snippet below).

The hardest problem is determining what each word is. You can look up the word “Wicked”. Is that the past tense of “to wick”, an adjective describing a mean person, a slang synonym for “cool”, or a musical? You can frequently tell, but you have to look at the words in front of and behind the word in question. Humans can understand the concepts indicated by words, but computers have to make a best guess based on tables of word-pairing frequency. These data exist in multiple forms, but building the required dataset takes a lot of storage space and bandwidth, as well as access to a lot of texts to generate it from, so it’s a job that has to be done by people with the expertise and computing power to do it.

The resulting data is not the easiest to work with, but fortunately that problem has already been solved. Enter projects like the Natural Language Toolkit, which have convenient APIs for tagging parts of speech (see the first snippet below). When I feed the entire text of Franz Kafka’s Metamorphosis to NLTK, it takes 5.8 seconds to return a parsed tree of the text. A package that cached its results and only re-processed the parts of the text that actually changed could run many searches on much smaller spans of text, then apply decorations in Atom to highlight specific words. There’s a JavaScript module for natural language processing, but its part-of-speech documentation makes it look like you have to write your own lexicon.
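For anyone curious, here’s roughly what that looks like in practice. A minimal sketch using NLTK’s standard tokenizer and tagger (the example sentence is mine; “punkt” and “averaged_perceptron_tagger” are the usual NLTK resource downloads):

```python
import nltk

# One-time downloads of NLTK's standard tokenizer and tagger models.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

text = "That show was wicked. The lamp wicked the oil upward."
tokens = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)  # list of (token, Penn Treebank tag) pairs
print(tagged)
# The tag for each "wicked" (JJ = adjective, VBD = past-tense verb) is
# chosen from the surrounding tokens, which is the context problem above.
```

A highlighter package would then only have to map tags like JJ/RB/NN onto editor decorations.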
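And a rough sketch of the caching idea, assuming per-sentence granularity is good enough; the names here are hypothetical, not from any existing package:

```python
import hashlib
import nltk

# Hypothetical cache: maps a hash of each sentence to its tagged tokens,
# so unchanged sentences never hit the tagger again.
_tag_cache = {}

def tag_incrementally(text):
    """Tag text, re-running NLTK only on sentences not seen before."""
    tagged = []
    for sentence in nltk.sent_tokenize(text):
        key = hashlib.sha1(sentence.encode("utf-8")).hexdigest()
        if key not in _tag_cache:
            _tag_cache[key] = nltk.pos_tag(nltk.word_tokenize(sentence))
        tagged.extend(_tag_cache[key])
    return tagged
```

On repeated passes over a mostly unchanged buffer, only the edited sentences get re-tagged, which is where nearly all of the processing time goes.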


#6

A walking, or at least sitting, encyclopedia… doesn’t even need a place to save/store info/notes… insane, just insane…

Anyhow, I asked because I saw that this software had the feature in their 2017 release. No idea how they do it, though; I haven’t used it either.