
Question:
I have many lines of the form
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5
ko04080 ko:GZMA
and would dearly like to get rid of the ko: bit of the right-hand column. I'm trying to use sed, as follows:
echo "ko05414 ko:ITGA4" | sed 's/\(^ko\d{5}\)\tko:\(.*$\)/\1\2/'
which simply outputs the original string I echo'd. I'm very new to command line scripting, sed, pipes etc, so please don't be too angry if/when I'm doing something extremely dumb.
The main thing that is confusing me is that the same thing happens if I reverse the \1\2 bit to read \2\1 or just use one group. This, I guess, implies that I'm missing something about the mechanics of piping the output of echo into sed, or that my regexp is wrong, or that I'm using sed wrong, or that sed isn't printing the results of the substitution.
Any help would be greatly appreciated!
Solution:1
sed is outputting its input because the substitution isn't matching. Since you're probably using GNU sed, try this:
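printf 'ko05414\tko:ITGA4\n' | sed 's/\(^ko[0-9]\{5\}\)\tko:\(.*$\)/\1\t\2/'
- \d -> [0-9] since GNU sed doesn't recognize \d as a digit class
- {} -> \{\} since GNU sed by default uses basic regular expressions, where unescaped braces are literal characters
- \1\2 -> \1\t\2 so that the tab between the two columns is kept in the output
- The pattern expects a tab (\t) between the columns, so the test input must contain one; printf makes that explicit.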
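The two points above can be checked side by side. This is a minimal sketch, assuming GNU sed and a tab between the two columns; the variable names are only illustrative, and the -E variant is an alternative spelling rather than something the answer itself proposed.

```shell
# Assumes GNU sed; printf makes the tab between the columns explicit.
line=$(printf 'ko05414\tko:ITGA4')

# \d is not a digit class in sed and {5} is literal in a basic regex,
# so nothing matches and the input is printed unchanged.
unchanged=$(printf '%s\n' "$line" | sed 's/\(^ko\d{5}\)\tko:\(.*$\)/\1\2/')

# With [0-9] and \{5\} the pattern matches; \t in the replacement keeps the tab.
fixed=$(printf '%s\n' "$line" | sed 's/\(^ko[0-9]\{5\}\)\tko:\(.*$\)/\1\t\2/')

# Same substitution with extended regular expressions (-E), where grouping
# parentheses and interval braces need no backslashes.
fixed_ere=$(printf '%s\n' "$line" | sed -E 's/^(ko[0-9]{5})\tko:(.*)$/\1\t\2/')

printf '%s\n' "$unchanged" "$fixed" "$fixed_ere"
```

The first line printed still contains ko:, which is exactly the "sed just echoes my input" symptom from the question.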
Solution:2
This should do it. You can also skip the last group and simply use \1 instead, but since you're learning sed and regex this is good stuff. I wanted to use a non-capturing group in the middle, (?: ), but I could not get that to work with sed. Non-capturing groups are a Perl-style extension and are not part of the POSIX regular expressions that sed uses.
sed --posix 's/\(^ko[0-9]\{5\}\)\( ko:\)\(.*$\)/\1 \3/g' file > result
And of course you can simply use
sed --posix 's/ko://'
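One thing worth knowing about the short form: without the g flag, sed replaces only the first match on each line. A small sketch with a made-up input line (the sample text is an assumption, not data from the question):

```shell
# Hypothetical line with a second "ko:" in free text after the two columns.
line='ko04062 ko:CCL5 some text with a legit ko: in it'

# Without g, only the first "ko:" on the line is removed.
first_only=$(printf '%s\n' "$line" | sed 's/ko://')

# With g, every "ko:" on the line is removed, including the one in the text.
all=$(printf '%s\n' "$line" | sed 's/ko://g')

printf '%s\n' "$first_only" "$all"
```

For the two-column data in the question either form gives the same result, since each line contains ko: only once.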
Solution:3
You don't need sed for this
Here is how you can do it with bash:
var="ko05414 ko:ITGA4"
echo ${var//"ko:"}
${var//"ko:"} replaces all "ko:" with ""
See Manipulating Strings for more info
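To make the difference between the single-slash and double-slash forms concrete, here is a short sketch. It assumes bash (these expansions are not part of plain POSIX sh), and the sample string with a second ko: is invented for illustration.

```shell
# Assumes bash; the second "ko:" is added only to show the difference.
var='ko05414 ko:ITGA4 ko:again'

# One slash: remove only the first "ko:".
first="${var/ko:}"

# Two slashes: remove every "ko:".
all="${var//ko:}"

printf '%s\n' "$first" "$all"
```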
Solution:4
@OP, if you just want to get rid of "ko:", then
$ cat file
ko04062 ko:CXCR3
ko04062 ko:CX3CR1
ko04062 ko:CCL3
ko04062 ko:CCL5 some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 ko:GZMA
$ awk '{sub("ko:","",$2)}1' file
ko04062 CXCR3
ko04062 CX3CR1
ko04062 CCL3
ko04062 CCL5 some text with a legit ko: this ko: will be deleted if you use gsub.
ko04080 GZMA
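The contrast between sub on the second field and gsub on the whole record can be sketched as follows. The sample lines are made up, and the tab-preserving variant at the end is an addition that assumes the real data is tab-separated, as the regex in the question suggests.

```shell
# Hypothetical line with an extra "ko:" after the second field.
line='ko04062 ko:CCL5 a legit ko: here'

# sub() on $2 touches only the second field; the trailing 1 prints the record.
field_only=$(printf '%s\n' "$line" | awk '{sub("ko:","",$2)}1')

# gsub() with no target works on the whole record and removes every "ko:".
everywhere=$(printf '%s\n' "$line" | awk '{gsub("ko:","")}1')

# Modifying a field makes awk rebuild the record with OFS (a space by default),
# so for tab-separated input set both FS and OFS to a tab to keep the tabs.
tab_line=$(printf 'ko04062\tko:CCL5')
tab_kept=$(printf '%s\n' "$tab_line" | awk 'BEGIN{FS=OFS="\t"} {sub("ko:","",$2)}1')

printf '%s\n' "$field_only" "$everywhere" "$tab_kept"
```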
Just a note: while you can use pure bash string substitution, it's only more efficient when you are changing a single string. If you have a file, especially a big file, a bash while read loop is still slower than sed or awk.
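For reference, the kind of loop meant here looks roughly like the sketch below. It assumes bash, and the temporary file and its two sample lines are hypothetical. Both approaches give the same output; the point of the note is that on a large file the loop takes longer than the single sed call.

```shell
# Hypothetical sample input, written to a temporary file for the demonstration.
tmp=$(mktemp)
printf 'ko04062 ko:CXCR3\nko04080 ko:GZMA\n' > "$tmp"

# Pure bash: read the file line by line and strip the first "ko:" from each line.
loop_out=$(while IFS= read -r line; do
  printf '%s\n' "${line/ko:}"
done < "$tmp")

# sed does the same work in one pass over the file.
sed_out=$(sed 's/ko://' "$tmp")

rm -f "$tmp"
printf '%s\n' "$loop_out"
```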