Need help with regex find and replace



Can someone please help me write a find and replace regex for the following (please note that I have no background in coding or programming and so far I’ve been changing the names in my files manually):

I need a regular expression in order to automatically change the string:

gi|323508378|emb|CBQ68249.1| related to carbon source-regulated protein (putative arabinase) [Sporisorium reilianum SRZ2]

to the string:

CBQ68249.1_Sporisorium reilianum

I have about 4000 of these and I can’t do them all manually. Thanks in advance.


Search string …


Replace string …


Edit: If you ever decide to learn regexes I highly suggest this website. Of course that regex above looks ridiculous but when you break it down into parts it isn’t so bad.

Edit2: I simplified the regex a bit.

Edit3: I simplified the regex a bit more.


This one’s nice as well:


I just tried it and got the railroad diagram, but I couldn’t find anywhere to put the test string. I could only find the box for the regex and the results box.


Yeah, confusingly, the text goes into the box underneath “results”. It’s especially useful for viewing branches and complex grouping.


It worked! Thank you so much!! :smile:
One question: the name of the species is in the square brackets and I would like to keep only the first 2 words. Some of my files have a different ending and I don’t know how to fix the string so that it removes the end part. Ex.

gi|347009817|gb|AEO57303.1| glycoside hydrolase family 43 protein [Myceliophthora thermophila ATCC 42464]

I got

AEO57303.1_Myceliophthora thermophila ATCC

instead of

AEO57303.1_Myceliophthora thermophila


Try the edited version above.

Edit: So all species have two-word names?


Thanks for your reply, Mark. The edited version doesn’t work. Yes, all species have two-word names. Here are some more examples of what I have. The portions I would like to keep are the accession ID (the part just before the last pipe) and the first two words inside the square brackets, joined by an underscore:

gi|113649137|dbj|BAF29649.1| Os12g0406100 [Oryza sativa Japonica Group]
gi|28924091|gb|EAA33248.1| predicted protein [Neurospora crassa OR74A]
gi|62952904|gb|AAY23175.1| putative xylosidase/arabinosidase, partial [Penicillium chrysogenum]
gi|392315282|gb|AFM57364.1| beta-xylosidase, partial [Phaeosphaeria avenaria f. sp. tritici 4 MM-2012]
gi|211581717|emb|CAP79831.1| endoarabinanase abnc-Penicillium chrysogenum [Penicillium rubens Wisconsin 54-1255]


That one was untested. I fixed it.
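
For anyone who would rather script this than use an editor's find and replace, here is a minimal Python sketch of the same idea. The pattern and the names `HEADER` and `shorten` are my own assumptions for illustration, not the search and replace strings posted earlier in the thread. It assumes every line looks like the examples above: four pipe-separated fields starting with `gi`, then a description, then the organism in square brackets at the end of the line.

```python
import re

# Hypothetical pattern (an assumption, not the regex from the thread):
#   gi|<number>|<database>|<accession>| <description> [<Genus> <species> ...]
# Group 1 captures the accession, group 2 the first two words in the brackets.
HEADER = re.compile(
    r"^gi\|\d+\|[^|]+\|([^|]+)\|"      # gi|number|db|accession|
    r".*\["                             # description, up to the last "["
    r"([^\s\]]+ [^\s\]]+)"              # first two words of the organism name
    r"[^\]]*\]\s*$"                     # anything else inside the brackets, then "]"
)

def shorten(line: str) -> str:
    """Return 'accession_Genus species'; leave non-matching lines unchanged."""
    return HEADER.sub(r"\1_\2", line)

examples = [
    "gi|323508378|emb|CBQ68249.1| related to carbon source-regulated protein (putative arabinase) [Sporisorium reilianum SRZ2]",
    "gi|62952904|gb|AAY23175.1| putative xylosidase/arabinosidase, partial [Penicillium chrysogenum]",
    "gi|392315282|gb|AFM57364.1| beta-xylosidase, partial [Phaeosphaeria avenaria f. sp. tritici 4 MM-2012]",
]
for line in examples:
    print(shorten(line))
```

Lines that do not fit the assumed layout come back unchanged, so it is easy to spot anything the pattern missed before overwriting the original files.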


It worked perfectly! Thank you so much Mark, you are the best!!


OK, I must be too drunk or something. Can you tell me where in this I paste the text to be tested against?


After reload, there is a hint where to put the text: “My test data”. But the hint goes away quickly.


OK, so you put the test text in the Result box. Stupid me. I thought the Result box was for results. I’ll try again.