How to search for multiple words?

Hey,

For my project, I have a task to inspect a text which contains over 1 million characters. I need to search for a few hundred specific words.
How can I do that? With the regular Ctrl+F search, I can only enter one word at a time.

Thank you in advance!

One suggestion is to regard your text as a corpus of words and then apply concordance analysis.

This is just one concordance analysis tool.

http://neon.niederlandistik.fu-berlin.de/en/textstat/

Download TextSTAT beta 3 and, from its folder, run it with the command …

python3 ./TextSTAT.py

Create a new corpus, add your file(s) to it, and save the corpus.
Search with the search string * and you will see a list of all the words.
Then click on any word to see its concordances.

If you can create a list of words you can automate the process.
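
To give an idea of what automating it could look like, here is a minimal Python sketch. It is not how TextSTAT works internally, just an illustration of a concordance (each hit shown with some surrounding context) produced for every word in a list. The sample text, the word list, and the function name concordance are made up for the example.

```python
import re

def concordance(text, word, width=30):
    """Return each occurrence of `word` with up to `width` characters of context per side."""
    # \b keeps "actor" from matching inside "factory"; re.escape guards special characters.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        # Collapse line breaks and repeated spaces so each hit fits on one line.
        lines.append(" ".join(text[start:end].split()))
    return lines

# Hypothetical sample data, for illustration only.
sample_text = "The engineer met the actor. Later the actor thanked the engineer."
word_list = ["engineer", "actor", "professor"]

results = {word: concordance(sample_text, word) for word in word_list}
for word, hits in results.items():
    print(f"{word}: {len(hits)} hit(s)")
    for hit in hits:
        print("    " + hit)
```

For a real file you would read the text with open(...).read() instead of using sample_text, and load the word list from a plain text file with one word per line.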

Thank you so much for your reply.
I downloaded TextSTAT and it seems to work fine.
But as I know almost nothing about working with texts, could you please explain to me in detail how to create a list of words which I could search all at the same time?

When you write …

I need to search for a few hundred specific words.

you need to expand a bit further on your task.
What word types do you wish to use in your search?
And when you write

inspect a text which contains over 1 million characters

do you mean single characters … or do you mean words (bounded by space separators)?
Otherwise we can only guess what you are trying to do.

Be precise in asking questions and you will get precise answers.

So to be more precise, I have a book in Word format which contains almost 170k words (over 1 million characters including spaces).
And I have a list of words. These words are all nouns (names of professions like engineer, professor, actor etc). In that list, there are around 2k nouns. And I need to search for every noun in my book.
I am trying to find the easiest way to search for as many nouns as possible at the same time.
Not every noun can be found in that book, so for me it’s important to find which of these nouns are used in the text and how many times they appear (for that I can use the frequency section in TextSTAT).

I don’t know if I explained it clearly now.

Nonetheless, thank you for helping.

That is a much clearer explanation of your requirements. You are in the domain of semantic analysis. Since you started in the Atom discussion forum I assume that you would like to see this feature in an Atom package. There is a basic word count facility if you search in Packages …

https://atom.io/packages/search?q=wordcount

although I have not used these. For advanced text analysis I would look at R packages for text analysis. An R script might be written to run through your list of 2k nouns and run a parallel search. There is an Atom package R-exec which runs R scripts. And R offers a rich source of text analysis.

https://www.tidytextmining.com/tfidf.html
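
I have not written the R script, but the same idea can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the book has been saved as plain text first, only the exact word forms in the list are counted (so "engineers" does not count towards "engineer"), and the sample text, the noun list, and the function name count_listed_words are invented for the example.

```python
import re
from collections import Counter

def count_listed_words(text, words):
    """Count how often each word in `words` occurs in `text`, ignoring case."""
    # Tokens are runs of letters; an apostrophe or hyphen may join two runs ("vice-president").
    tokens = re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text.lower())
    frequencies = Counter(tokens)
    # A Counter returns 0 for words that never occur, so absent nouns are reported too.
    return {word: frequencies[word.lower()] for word in words}

# Hypothetical sample data, for illustration only.
book_text = "The engineer and the professor met an actor. The Engineer left."
nouns = ["engineer", "professor", "actor", "pilot"]

counts = count_listed_words(book_text, nouns)
found = {word: n for word, n in counts.items() if n > 0}
missing = [word for word, n in counts.items() if n == 0]

print("Found:", found)
print("Not in the text:", missing)
```

The text is tokenized once and then each noun is looked up in the frequency table, so a list of 2k nouns does not mean 2k separate passes over the book.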

Returning to TextSTAT, the tool I suggested earlier: it does not offer multiple-word searches, which is what you need.

I refer you now to another, more advanced tool, AntConc, where you can apply a long list of words. You might learn much from initial use of this tool, and then perhaps write an R script which you can run in Atom using the R-exec package …

http://www.laurenceanthony.net/software/antconc/

https://research.ncl.ac.uk/decte/toon/assets/docs/AntConc_Guide.pdf

I thought I might add this link to a simpler approach I found to try …

https://unix.stackexchange.com/questions/37313/how-do-i-grep-for-multiple-patterns-with-pattern-having-a-pipe-character

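In a grep search with extended regular expressions (grep -E), the words from a set (e.g. your nouns) are joined into one pattern, separated by the “|” pipe character, for example engineer|professor|actor.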
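
The same pipe trick also works outside grep. Here is a small Python sketch of it, assuming the nouns are already in a list; the noun list and the sample sentence are invented for the example.

```python
import re

# Hypothetical noun list and text, for illustration only.
nouns = ["engineer", "professor", "actor"]
text = "An actor and an engineer walked in. The ACTOR spoke first."

# Join the nouns with "|" into one alternation; re.escape protects any special
# characters, and \b on both sides restricts matches to whole words.
alternation = "|".join(re.escape(noun) for noun in nouns)
pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

# One pass over the text finds every noun from the list.
matches = [match.group(0).lower() for match in pattern.finditer(text)]
print(matches)
```

With around 2k nouns the pattern becomes long, but it is still a single search over the text rather than one search per word.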