----BEGIN CLASS----
[13:38] #startclass
[13:38] Just realised my mic is damaged :(
[13:38] Roll Call
[13:38] i just see myself
[13:39] Robin Schubert
[13:39] Gaurav Sitlani
[13:39] Abhishek Singh
[13:39] Priyanka Sharma
[13:39] kumar vipin yadav
[13:39] Kshitij
[13:39] yurii, network issue
[13:39] Nikita Kotak
[13:39] Anu Kumari Gupta
[13:39] and do not hear anything
[13:39] Samridhi Agarwal
[13:39] Ashwani Kumar Gupta
[13:39] Devesh Verma
[13:39] sayan ok
[13:39] Sandesh Patel
[13:39] Krishnanand Rai
[13:39] Avik Mukherjee
[13:39] Looks like many of us were having connectivity issues on call
[13:40] Yurii Pylypchuk
[13:40] sitlanigaurav[m], yes :(
[13:40] Girish joshi
[13:40] Now you all know why we do this training on IRC but not on video :)
[13:40] Bhavin Gandhi
[13:40] Anyway.
[13:41] pr97, How many sessions have you attended before?
[13:41] pr97: she did a bunch of them iirc
[13:42] I have attended almost all, except from 2 weeks
[13:42] pr97, You are not supposed to call us Sir.
[13:42] pr97, We have names, just use that.
[13:42] sorry kushal
[13:42] pr97, No problem.
[13:42] pr97, Just saying.
[13:42] okay kushal
[13:43] Harsh Vardhan
[13:43] Who all had any problem with the last day
[13:44] Who all had any problem with the last day's problem?
[13:44] eeeks
[13:44] program
[13:44] anyway
[13:44] Any one? Any question?
[13:45] !
[13:45] next
[13:45] Is it good idea to use more number of global variables inside functions?
[13:45] bhavin192, Nope, if possible do not use any.
[13:46] I'm trying to have one dict as global on which functions will perform operations
[13:46] !
[13:47] next
[13:47] How to differentiate between actual links and sectional links (with #) with BeautifulSoup?
[13:48] Sorry but can I know what is going on? From the last two weeks since I am not able to attend those sessions
[13:48] oops sorry
[13:48] pr97, The logs are online :)
[13:48] pr97, right now I am just taking questions.
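Editor's note: the question about telling section links (those with `#`) apart from ordinary links can be illustrated with a small sketch. It assumes the `href` values have already been collected, for example with BeautifulSoup's `find_all`; the function name `split_links` is made up for this note and is not from the session.

```python
def split_links(hrefs):
    """Split href strings into plain links and section links containing '#'."""
    plain, sectional = [], []
    for href in hrefs:
        # A '#' anywhere in the href marks a fragment (section) link.
        if "#" in href:
            sectional.append(href)
        else:
            plain.append(href)
    return plain, sectional


# With BeautifulSoup, the hrefs could be gathered like this (assumed usage):
#   hrefs = [a["href"] for a in soup.find_all("a", href=True)]
```

This is only the simplest check; a link such as `https://dgplug.org/about#team` points at a real page and a section at once, so it lands in the sectional list here.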
[13:48] okay
[13:48] ashwanig, find every link and then see if # is in them or not.
[13:48] ashwanig, I don't know any other way.
[13:49] Anyone else, any question with the problem given?
[13:51] What is the function in requests module we used to download files?
[13:51] requests.get
[13:51] requests.get()
[13:52] requests.get()
[13:52] requests.get()
[13:52] Now a quick function
[13:55] brb
[13:59] back
[14:00] Write a function which will take any URL as an argument, and then check if there is any image in that URL, and if yes then save it in the current directory.
[14:00] Use this as an argument https://dgplug.org/assets/img/header.png
[14:03] kushal, An image can be in bmp, svg, png, jpeg only?
[14:03] Any other formats possible/missed?
[14:04] vharsh, To start with, just take png and jpg and svg files
[14:04] Do the images given always end with the right extension?
[14:04] yes
[14:05] so we have to check if the url is of an image and if yes then download it, right?
[14:05] bhavin192, That you decide :)
[14:10] chandankumar, I didn't find link for pycon 2017 security speech video will you please send me the link.
[14:15] kushal, done http://dpaste.com/17YGDSC.txt
[14:15] vharsh, I was hoping for a github link :)
[14:15] checking
[14:15] kushal, I simply love the dpaste's API :)
[14:16] I added `alias dpaste='curl -s -F "content=<-" http://dpaste.com/api/v2/'` in my .bashrc :) Simple life :)
[14:16] vharsh, so what will happen if I pass https://kushaldas.in/asdf.png/rofl.txt ?
[14:17] kushal, I'll probably add checks for header to see if it is actually an image
[14:17] People can add sub-links to parts of an SVG, in that case it will get ignored.
[14:17] vharsh, There are many ways to do it.
[14:17] vharsh, Learn to keep problems simple.
[14:18] vharsh, This is why git, you can see how one can improve the code.
[14:18] kushal, because SVGs are HTML-like markup to draw pictures.
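Editor's note: a minimal sketch of the "keep it simple" extension check for the exercise, assuming the three extensions named in the session (png, jpg, svg). The name `looks_like_image` is hypothetical. It inspects only the path part of the URL, so a URL like `https://kushaldas.in/asdf.png/rofl.txt` is rejected.

```python
from urllib.parse import urlparse

# Extensions taken from the session; str.endswith accepts a tuple of strings.
IMAGE_EXTENSIONS = (".png", ".jpg", ".svg")


def looks_like_image(url):
    """Return True if the URL path ends with one of the known image extensions."""
    path = urlparse(url).path  # drops the query string and any fragment
    return path.lower().endswith(IMAGE_EXTENSIONS)
```

An extension says nothing about the actual content of the file, so this is only a first filter.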
[14:18] vharsh, It is XML
[14:18] https://github.com/Schubisu/dgplug/blob/master/problems/20170911_crawl_images.py
[14:19] vharsh, as I said, why don't you try to keep the problems simple?
[14:19] schubisu, checking
[14:20] schubisu, nope, I just want to pass https://dgplug.org/assets/img/header.png to the function, and it will download the header.png in the current directory.
[14:20] schubisu, What you did is right, but a different problem :)
[14:21] kushal: I must admit I tested with https://dgplug.org ;) will check and update again
[14:21] schubisu, I want to pass one single image URL
[14:21] and want to download that :)
[14:22] kushal: ah, okay.
[14:24] kushal, How about checking the response header for binary content, assuming the server supplies a MIME type.
[14:24] vharsh, You can implement many things.
[14:24] One by one.
[14:24] However the dgplug's server says it is a document.
[14:25] https://github.com/nikita1211/Python_Practise/blob/master/check_img.py
[14:25] NikitaK3, checking
[14:26] yes kushal
[14:26] NikitaK3, is this line correct if extension == ".png" or ".jpg" or ".svg": ?
[14:27] The code is working
[14:28] kushal, The response header doesn't seem to have a Content-Type header. I got https://dpaste.de/oBj7
[14:28] vharsh, The whole idea of the session is to make people use github more.
[14:29] python3 check_img.py
[14:29] File "check_img.py", line 10
[14:29] print "an image"
[14:29] ^
[14:29] SyntaxError: Missing parentheses in call to 'print'
[14:30] NikitaK3, ^^^^
[14:30] kushal should it be like generalised for all images and I have taken only a few extensions?
[14:30] NikitaK3, For at least 3 types
[14:31] Kushal I was using python 2. I will change that and use parenthesis.
[14:31] NikitaK3, We never used Python2 in these sessions.
[14:32] Yes, I will update it immediately.
[14:33] What about others?
[14:33] Do you need help?
[14:33] If yes, then please let us know.
[14:34] kushal, how can I download the image? stuck there
[14:34] sayan ping
[14:35] kushal: https://github.com/sandeshpatel/summer-training/blob/master/grab_iff_photo.py
[14:35] kushal done. What else can be improved in the code?
[14:35] avik, You can use requests.get function and read the URL content, and then write it to the disk.
[14:36] NikitaK3, san-D There can be better ways to detect if a file is an image or not.
[14:36] Search about it.
[14:36] kushal: it should do both now https://github.com/Schubisu/dgplug/blob/master/problems/20170911_crawl_images.py
[14:37] kushal, sayan https://gist.github.com/userimack/9b9ba60d5067213b1f4fc92088cd77f1
[14:38] schubisu, now, before saving the file content, can you also check again if it is a real image file or not?
[14:39] kushal: can I check that without being specific for any image file format?
[14:39] schubisu, I guess no.
[14:40] kushal, write() argument must be str, not Response
[14:40] getting this error
[14:40] avik, yup
[14:40] avik: because requests.get returns a response
[14:40] avik, few things.
[14:40] looked up
[14:40] avik, Images are binary data
[14:40] avik, Not text.
[14:41] response is the return type of get()
[14:41] avik, Also, you have to open the file in binary mode, to write binary content in it.
[14:41] https://en.wikipedia.org/wiki/List_of_file_signatures this can help everyone.
[14:42] kushal, sayan, please have a look https://gist.github.com/userimack/9b9ba60d5067213b1f4fc92088cd77f1
[14:42] imack, checking
[14:43] :)
[14:43] imack, https://gist.github.com/userimack/9b9ba60d5067213b1f4fc92088cd77f1#file-get_image-py-L15
[14:43] imack, you can just write:: if response:
[14:45] kushal, ok I will modify it, anything else i need to change
[14:45] imack, Check the file signature too
[14:45] kushal, can we use imghdr module ? https://docs.python.org/3/library/imghdr.html
[14:45] bhavin192, Maybe, I don't know about it.
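Editor's note: the `write() argument must be str, not Response` error comes from passing the whole response object to a file opened in text mode. A hedged sketch of the fix follows: write the raw bytes (`response.content` in requests) to a file opened with `"wb"`. The helper name `save_response` and its `directory` parameter are made up for illustration.

```python
import os
from urllib.parse import urlparse


def save_response(response, url, directory="."):
    """Write the binary body of a response to disk, named after the URL path."""
    filename = os.path.basename(urlparse(url).path)
    target = os.path.join(directory, filename)
    # Images are binary data, so the file is opened in binary mode ("wb")
    # and the bytes in response.content are written, not the Response itself.
    with open(target, "wb") as handle:
        handle.write(response.content)
    return target


# Assumed usage with the requests library:
#   import requests
#   url = "https://dgplug.org/assets/img/header.png"
#   save_response(requests.get(url), url)
```

The function accepts any object with a bytes `content` attribute, which keeps the sketch independent of the network.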
[14:46] kushal, Okay trying to check file signature first
[14:47] kushal: https://github.com/KrishnanandRai/dgplug_problems/blob/master/downloading_image/pr.py
[14:47] kushal, can you please help on file signature
[14:49] knrai, https://github.com/KrishnanandRai/dgplug_problems/blob/master/downloading_image/pr.py#L5 do you need to convert it to tuple?
[14:50] imack, read the first few bytes, and match
[14:51] kushal: endswith was not accepting list
[14:51] Correct :)
[14:51] knrai, tuple of strings
[14:51] Just asked why :)
[14:53] I was trying endswith using list and it worked!
[14:53] kushal: sorry not getting your question
[14:53] knrai, no question.
[15:01] https://github.com/IamTechnotron/dgplugproblems/blob/master/save_if_image
[15:01] kushal, finally did it! please have a look when have time!
[15:04] kushal, ?
[15:05] kushal, https://github.com/samridhiagarwal/dgplug-check-url-if-img
[15:06] samikshan, avik checking
[15:06] avik, https://github.com/IamTechnotron/dgplugproblems/blob/master/save_if_image this file will not work.
[15:06] No sha-bang line.
[15:07] avik, samridhia Both of you now try to add file signature check in the code.
[15:08] kushal, ah yes, I forgot to remove them
[15:10] kushal, can you please explain file signature check
[15:12] avik, start with https://en.wikipedia.org/wiki/Magic_number_%28programming%29
[15:14] kushal, ok!
[15:17] kushal, okay so is it same as the extension? I mean how else can I check?
[15:17] avik: I think you have to scan the binary raw image for the magic number
[15:18] schubisu, oo! but how?
[15:19] avik, first think how to get the file in bytes
[15:19] and learn how to compare bytes
[15:19] like using ==
[15:21] kushal, the file I saved is in bytes already, isn't it?
[15:22] avik, Images are binary files.
[15:22] okay! yes.
[15:23] Who ever is done with magic number checking, please start a thread in the mailing list with the link to the code.
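Editor's note: the "read the first few bytes, and match" advice can be sketched as below. The two signatures are the well-known PNG and JPEG magic numbers from the file-signature list linked in the session; the function name `image_type` is hypothetical. SVG is XML text with no fixed signature, so this sketch deliberately leaves it out.

```python
# Magic numbers: the fixed bytes every PNG / JPEG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def image_type(data):
    """Return 'png' or 'jpg' if the bytes start with a known signature, else None."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpg"
    return None


# Assumed usage: check the downloaded bytes before saving them.
#   if image_type(response.content) is not None: ...
```

Comparing a slice with `==`, for example `data[:8] == PNG_SIGNATURE`, is equivalent to the `startswith` call used here.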
[15:24] Roll Call
[15:24] Devesh Verma
[15:25] Bhavin Gandhi
[15:25] Krishnanand Rai
[15:25] Samridhi Agarwal
[15:25] Sandesh Patel
[15:25] Robin Schubert
[15:26] Ending the session, the people who did not say anything, please ask others about the code. It is okay to not being able to write code at the first chance.
[15:26] Avik Mukherjee
----END CLASS----