Tutorial: RegExp: How to extract usernames out of tweets (twitter.com)?



Question:

I have the following example tweet:

RT @user1: who are @thing and @user2?

I only want to have user1, thing and user2.

What regular expression can I use to extract those three names?

PS: A username must only contain letters, numbers and underscores.


Solution:1

Tested:

/@([a-z0-9_]+)/i  

In Ruby (irb):

>> "RT @user1: who are @thing and @user2?".scan(/@([a-z0-9_]+)/i)
=> [["user1"], ["thing"], ["user2"]]

In Python:

>>> import re
>>> re.findall("@([a-z0-9_]+)", "RT @user1: who are @thing and @user2?", re.I)
['user1', 'thing', 'user2']

In PHP:

<?PHP
$matches = array();
preg_match_all(
    "/@([a-z0-9_]+)/i",
    "RT @user1: who are @thing and @user2?",
    $matches);

print_r($matches[1]);
?>

Array
(
    [0] => user1
    [1] => thing
    [2] => user2
)


Solution:2

/(?<!\w)@(\w+)/  

The above handles the following scenarios, which other answers in this thread do not:

  • Skips an @ sign that does not introduce a username, e.g. "my email is test@example.com"
  • Still allows a username at the beginning of a string, e.g. "@username lorem ipsum..."
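
A minimal Python sketch of the lookbehind pattern, run against both cases from the list. The name MENTION and the re.ASCII flag are my own additions: in Python 3, \w also matches non-ASCII letters unless re.ASCII is set, and the question limits usernames to letters, numbers and underscores.

```python
import re

# (?<!\w) rejects an @ that directly follows a word character,
# which is what happens inside an email address.
# re.ASCII restricts \w to [a-zA-Z0-9_] (an assumption based on the question's PS).
MENTION = re.compile(r"(?<!\w)@(\w+)", re.ASCII)

print(MENTION.findall("RT @user1: who are @thing and @user2?"))  # ['user1', 'thing', 'user2']
print(MENTION.findall("my email is test@example.com"))           # []
print(MENTION.findall("@username lorem ipsum..."))               # ['username']
```

The lookbehind is zero-width, so it does not consume the character before the @ and adjacent mentions separated by a single space are still all found.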


Solution:3

Try an iterator (findall) with this regex:

(@[\w-]+)  

bye
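
One way to read "iterator" in Python is re.finditer, which yields match objects one at a time. The sketch below works under that assumption; the helper name iter_mentions is hypothetical. Note two things about this pattern: the capture group includes the leading @, and the hyphen in the character class accepts names such as foo-bar, which the question's username rule does not allow.

```python
import re

PATTERN = re.compile(r"(@[\w-]+)")

def iter_mentions(text):
    """Yield each mention lazily, dropping the leading @ that the group captures."""
    for match in PATTERN.finditer(text):
        yield match.group(1)[1:]

print(list(iter_mentions("RT @user1: who are @thing and @user2?")))  # ['user1', 'thing', 'user2']
print(list(iter_mentions("cc @foo-bar")))                            # ['foo-bar']
```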


Solution:4

This should do it (I used named captures for convenience):

.+?@(?<user1>[a-zA-Z0-9_]+):[^@]+?@(?<thing>[^\s]+)[^@]+?@(?<user2>[a-zA-Z0-9_]+)
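
For comparison, a hedged Python sketch of the same idea. Python writes named groups as (?P<name>...), and the group names user1, thing and user2 are assumptions chosen to mirror the example tweet. This pattern only matches tweets with exactly this shape (a first mention followed by a colon, then two more mentions), so it is far less general than the findall approaches.

```python
import re

# Group names are illustrative; Python spells named groups as (?P<name>...).
TWEET = re.compile(
    r".+?@(?P<user1>[a-zA-Z0-9_]+):"
    r"[^@]+?@(?P<thing>[^\s]+)"
    r"[^@]+?@(?P<user2>[a-zA-Z0-9_]+)"
)

match = TWEET.search("RT @user1: who are @thing and @user2?")
names = match.groupdict() if match else {}
print(names)  # {'user1': 'user1', 'thing': 'thing', 'user2': 'user2'}
```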


Solution:5

It is a good idea to include the twitter-text library [1] in your project to handle these text issues.

twttr.txt.extractMentions("a very generic twitt with some @mention");  

[1] https://github.com/twitter/twitter-text-js

