python - UnicodeDecodeError: 'utf8' codec can't decode byte - Euro Symbol
I built a connection to the Google Finance API that gives me stock quotes. It was working fine until I switched to European quotes. These contain the € symbol, and I get the following error:
Traceback (most recent call last):
  File "c:\users\administrator\desktop\getquotes.py", line 32, in <module>
    quote = c.get("sap","fra")
  File "c:\users\administrator\desktop\getquotes.py", line 21, in get
    obj = json.loads(content[3:])
  File "c:\python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "c:\python27\lib\json\decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "c:\python27\lib\json\decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte
This is the code I am using. I guess the error appears while json is processing the string and cannot resolve the euro symbol:
import urllib2
import json
import time

class googlefinanceapi:
    def __init__(self):
        self.prefix = "http://finance.google.com/finance/info?client=ig&q="

    def get(self,symbol,exchange):
        url = self.prefix+"%s:%s"%(exchange,symbol)
        u = urllib2.urlopen(url)
        content = u.read()
        obj = json.loads(content[3:])
        return obj[0]

if __name__ == "__main__":
    c = googlefinanceapi()
    while 1:
        quote = c.get("msft","nasdaq")
        print quote
        time.sleep(30)
This is the output Google Finance gives me for the SAP stock, containing the euro symbol:
// [
{
"id": "8424920"
,"t" : "sap"
,"e" : "fra"
,"l" : "56.51"
,"l_cur" : "€56.51"
,"s": "0"
,"ltt":"8:00pm gmt+2"
,"lt" : "aug 7, 8:00pm gmt+2"
,"c" : "-0.47"
,"cp" : "-0.82"
,"ccol" : "chr"
}
]
I also tried the following call in place of the (content[3:]) part, but got the same error, except that it complained about the ascii codec instead of utf-8:
json.loads(unicode(opener.open(...), "iso-8859-15"))
If anyone has an idea, I would be happy to hear it.
The document you're fetching appears to be encoded in Windows codepage 1252, where the euro sign is encoded as the single byte \x80. That is an invalid start byte in UTF-8 and a non-printing control character in the ISO-8859 variants. Try:
obj = json.loads(content[3:], 'cp1252')
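To see why cp1252 is the likely culprit, here is a minimal sketch of the three decodings side by side. It assumes Python 3 syntax (the code in the question is Python 2.7), and the raw bytes are a hypothetical shortened sample modelled on the response shown in the question, not a real reply from the service. Instead of passing the encoding to json.loads, it decodes the bytes explicitly and then parses the resulting text.

```python
import json

# Hypothetical shortened sample, modelled on the response in the question and
# encoded the way the server appears to send it (Windows codepage 1252).
raw = '// [ { "t" : "sap" ,"l_cur" : "\u20ac56.51" } ]'.encode("cp1252")

# In cp1252 the euro sign is the single byte 0x80.
has_0x80 = b"\x80" in raw

# UTF-8 rejects 0x80 as a start byte, which matches the traceback.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ISO-8859-15 accepts the byte, but maps it to a C1 control character
# rather than to the euro sign.
latin9 = b"\x80".decode("iso-8859-15")

# Decode explicitly, then parse; [3:] skips the leading "// " as in the question.
text = raw.decode("cp1252")
quote = json.loads(text[3:])[0]

# ascii() keeps the output safe for consoles that cannot display the euro sign.
print(ascii(quote["l_cur"]))
```

Decoding first and parsing afterwards keeps the encoding decision in one visible place, which may be easier to adjust if the server's encoding turns out to differ from this guess.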