regex - HIVE regexp_extract URL strings


Hi, I'm trying to parse a large log of URLs using Hive.

There is a particular value I want to extract from each URL (strategy=??). The values can be hyphenated, but not always.

I built a sample query, but it returns nothing.

What am I doing wrong?

select regexp_extract('234=23234&werw=asdf&strategy=retargeting&asdf=fds23', '(strategy=)([-\w*]*)',2) vt; 

So the value I'm expecting is retargeting, from the partial URL string 234=23234&werw=asdf&strategy=retargeting&asdf=fds23

Any help is appreciated!

I believe this regex will work for you:

strategy=((\w-?)+)

Here's a regexr link: http://regexr.com?35sbl. After matching, group 1 contains the value of strategy. Note that this regex will match any number of hyphens in the value. It fails if a hyphen is the first character (though, in my opinion, a leading hyphen does not make a value 'hyphenated').
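As a quick way to check the behaviour described above outside Hive, here is a minimal Python sketch. It is an illustration, not the Hive call itself: the helper name extract_strategy and the two extra test strings are made up for this example, and Python's \w is Unicode-aware by default, which may differ slightly from the Java regex engine Hive uses.

```python
import re

# Pattern from the answer; group 1 holds the value of strategy.
STRATEGY_RE = re.compile(r"strategy=((\w-?)+)")

def extract_strategy(query_string):
    """Return the strategy value, or None if the pattern does not match."""
    match = STRATEGY_RE.search(query_string)
    return match.group(1) if match else None

# Sample string from the question.
print(extract_strategy("234=23234&werw=asdf&strategy=retargeting&asdf=fds23"))  # retargeting
# Hypothetical hyphenated value.
print(extract_strategy("a=1&strategy=re-targeting-v2&b=2"))  # re-targeting-v2
# Hypothetical value with a leading hyphen: the pattern does not match.
print(extract_strategy("a=1&strategy=-retargeting"))  # None
```

The match stops at the & because it is neither a word character nor a hyphen, so the rest of the query string is left out of group 1.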

From what I can tell, your method didn't return anything because of the way group 2 is set up: you have [-\w*], which says "match a hyphen and any number of alphanumeric characters (including 0)". You could rewrite it as [-?\w*]*, which says "match or don't match a hyphen, and any number of alphanumeric characters (including 0)". However, this will match just a hyphen, as in the case

strategy=-

which is not what you want, I think. A safer way might be setting group 2 to [-?\w+]+, which will require at least 1 character after the equals sign. Happy coding! :)
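The group 2 variants discussed above are easy to compare side by side. The sketch below does that in Python, under a few assumptions: the helper group2 and the HYPHEN_ONLY test string are made up for this example, and Python's regex engine stands in for the Java one Hive uses. One thing worth keeping in mind when reading the output is that inside square brackets the characters *, ? and + are literals rather than quantifiers, so the three classes differ only in which literal characters they admit. The last line rests on a further assumption, namely that Hive unescapes '\w' in a string literal to a plain 'w' before the regex engine sees it (the Hive documentation warns about this for '\s').

```python
import re

SAMPLE = "234=23234&werw=asdf&strategy=retargeting&asdf=fds23"
HYPHEN_ONLY = "a=1&strategy=-&b=2"  # hypothetical input with a bare hyphen

def group2(class_pattern, text):
    """Return group 2 of '(strategy=)(<class_pattern>)', or None on no match."""
    match = re.search("(strategy=)(" + class_pattern + ")", text)
    return match.group(2) if match else None

# The three group 2 variants from the question and the answer.
VARIANTS = [r"[-\w*]*", r"[-?\w*]*", r"[-?\w+]+"]
for variant in VARIANTS:
    print(variant, repr(group2(variant, SAMPLE)), repr(group2(variant, HYPHEN_ONLY)))

# Assumption: the backslash was consumed by the SQL string literal,
# so the engine receives [-w*]* instead of [-\w*]*.
print(repr(group2("[-w*]*", SAMPLE)))  # ''
```

In this sketch all three variants return 'retargeting' for the sample and '-' for the hyphen-only input, while the pattern with the backslash stripped matches with an empty group 2. If the same unescaping happens in your Hive version, that would explain a query that "returns nothing", and doubling the backslash in the query (for example '(strategy=)([-\\w]+)') would be worth trying.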

