----BEGIN CLASS----
[13:27] #startclass
[13:27] Roll Call
[13:27] Sandesh Patel
[13:27] Krishnanand Rai
[13:27] Ashwani Kumar Gupta
[13:28] Bhavin Gandhi
[13:28] Mahendra Yadav
[13:28] Anyone else?
[13:29] Nikita Kotak
[13:29] Gaurav Sitlani
[13:29] Deepika Upadhyay
[13:32] wait for 5 minutes.
[13:36] Let us discuss about writing code, instead of writing real code :)
[13:36] roll call: Robin Schubert
[13:37] Say, I tell you find the file type of a given file in your computer.
[13:37] say, /tmp/asdasdfsa is the file path.
[13:37] How do you plan to solve the problem?
[13:37] talk freely.
[13:37] solve with python or bash?
[13:37] first I will check if the file exist
[13:37] check extension?
[13:38] using file command in bash?
[13:38] check magic number
[13:38] then I will check the extension if present and then the magic numbers
[13:39] check if file exist and then magic number
[13:39] schubisu, Python
[13:39] So the first thing we want to make sure that the file exits, correct?
[13:39] yes
[13:39] yep
[13:39] yes
[13:39] yes
[13:40] What are magic numbers ?
[13:40] Yes
[13:40] If we compare the file extension/magic number with one of every other type of file, wont the program be too slow?
[13:41] sitlanigaurav[m], https://github.com/bhavin192/lpy-dgplug/blob/master/image_downloader/README.md
[13:41] s/wont/won't
[13:41] Learned about magic number ,yes 😃
[13:42] ashwanig: we can use dictionary for faster searching
[13:42] ashwanig, how fast do you think your computer can run a for loop and an comparison inside of it?
[13:43] ashwanig, don't know about speed, but it would be having list of all magic numbers, what about readability
[13:43] kushal, almost instantly
[13:44] bhavin192: you can save away lists like that in external files to keep your code clean. json for example
[13:44] Now, after we checked that the file exists, what should we do next?
[13:44] schubisu, ah! right that would be good idea.
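Editor's note: the steps discussed so far (check that the file exists, read its first bytes, compare them against a table of magic numbers) can be sketched roughly as below. This is only an illustration, not code from the class. The `MAGIC_NUMBERS` table and the `guess_file_type` name are hypothetical, and the table holds just a handful of well-known signatures; a real one would be far longer and could live in an external file.

```python
import os

# Hypothetical, deliberately tiny table of well-known file signatures.
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}


def guess_file_type(path):
    """Return a type name from MAGIC_NUMBERS, or None if nothing matches."""
    # Step 1: make sure the file exists before doing anything else.
    if not os.path.isfile(path):
        raise FileNotFoundError(path)

    # Step 2: signatures vary in length, so read as many bytes as the
    # longest one needs and compare prefixes afterwards.
    longest = max(len(signature) for signature in MAGIC_NUMBERS)
    with open(path, "rb") as handle:
        header = handle.read(longest)

    # Step 3: a plain loop over the table; the file extension is ignored.
    for signature, type_name in MAGIC_NUMBERS.items():
        if header.startswith(signature):
            return type_name
    return None
```

Because only the leading bytes are inspected, the file's name and extension play no part in the result.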
[13:46] We will check for the extension
[13:46] and accordingly verify the magic number
[13:48] kushal, if the file does not have an extension then we will have to check it with every other magic number?
[13:48] extension does not matter.
[13:48] mv song.mp3 file.jpg
[13:48] ashwanig, ^^^ is that a jpg now?
[13:48] no
[13:49] So :)
[13:49] So, the next step is to read the file. Means first we have to open the file.
[13:50] we should check magic number rather than relying on extension right ?
[13:53] knrai, yes
[13:53] kushal, now we have to see if we have access to the file.
[13:54] bhavin192, Yup, that step is before opening the file, or we can try to open and catch any error.
[13:55] I was thinking about catching error
[13:56] my problem would be to figure the length of the magic number, since they very
[13:56] s/very/vary/
[13:57] schubisu, so read the biggest and try to match parts of it maybe
[13:57] so maybe read a rather larger number find search for a closest match, than do a fine check?
[13:57] Or
[13:57] Can we create a hash function for faster search ?
[13:58] deepika, it will be still slower than just access a dictionary most probably.
[13:58] Who all searched and found https://filemagic.readthedocs.io/en/latest/ ?
[13:58] Anyone?
[13:59] no, I didn't search
[13:59] same here
[13:59] kushal, I found `file` command
[14:00] same here
[14:00] no
[14:00] but that is shell command
[14:00] I did't found
[14:01] This is a python module.
[14:01] Not a command.
[14:01] Please read the documentation first, and tell me what did you understand?
[14:07] kushal should we brainstorm a bit or gather some resources from g-search and then think of ways to use them ?
[14:08] deepika, I asked to read a module documentation :)
[14:09] It uses ctypes to make use of library libmagic to identify file types
[14:09] interesting :)
[14:09] kushal: okay :)
[14:11] kushal, seems like I got answer to "How with statement works?" need to search about context managers
[14:17] bhavin192: even I got that answer
[14:17] So it uses the libmagic library to identify file types
[14:17] means it will not work on Windows?
[14:20] If the file has been read already it provides a buffer to identify it from a string
[14:29] ashwanig, We are talking about Linux here.
[14:29] Anyway, let us get to our problem.
[14:29] Let us go back to an old problem.
[14:30] Remember the problem where we asked to count who said how many lines from a irc log file?
[14:30] Who all are still here?
[14:30] me
[14:30] me
[14:30] me
[14:30] me too
[14:31] Now tell me what should be the first step for that log reading problem?
[14:32] me
[14:33] First we check if the log file exist or not
[14:33] Decide how we are going to save the count for all nick (character, lines, word count)
[14:33] first check if log file exist
[14:33] Me
[14:33] And if the log if provided through file then check if file exist
[14:35] first check if log file exists then open fiile in try-except block
[14:36] Then read it and use the split() function
[14:37] if it exists then open the file
[14:40] Do you want to read the whole file together or line by lines?
[14:40] * line by line
[14:40] line by line
[14:40] line by line
[14:40] kushal: line by line
[14:41] Why?
[14:41] same here
[14:41] because the file maybe very large
[14:42] reading line by line and processing is more lighter on memory
[14:42] Processing the file can be easier
[14:43] for easy processing
[14:43] because ultimately we have to break it to lines, find nick and store result
[14:44] Then file becomes iteratable with better memory management
[14:46] Okay
[14:46] Now, if we can find out the nick name from one line, we can find that out from every line, correct?
[14:48] kushal, means?
[14:48] [13:45] - git checkout -b sepia # Create a new branch -b=tells to create new branch
[14:49] Yes, if that is a new nick name we can check from the existing ones or add that to the list containing all nicknames
[14:49] ^^^ this is an example line from the log, if we can find out who said this line, we can repeat that for everyline.
[14:49] ashwanig, ^^^
[14:49] kushal, yes
[14:49] first we need to check or remove the line with no nick i.e. ----BEGIN CLASS ----- and -----END CLASS
[14:51] knrai, those are exceptions, and there can be more
[14:51] then we can find nics from all the lines right ?
[14:51] To store the nicks I think we can use dictionary
[14:53] sitlanigaurav[m], Yes.
[14:53] because it will be fast to find any nickname there.
[14:53] but we need to check every time right
[14:54] sometimes lines might be big ,they might be so from every line maybe we won't find a nick ,
[14:54] by line do we mean ending with or a fullstop or before another nick spoke ?
[14:54] Sorry stupid question
[14:55] deepika: in logs each line ends with \n , no matter how long it is
[14:56] it may be displayed in multiple lines according to display
[14:56] so each line will contain nick
[14:57] san-D: oh nice then , that would do :)👍
[14:57] sitlanigaurav[m], but anyway we have to process whole log file
[14:57] deepika, why emojis ? :(
[14:58] sorry bhavin192 ,won't repeat
[14:59] :)
[15:01] So, remember to break any problem into smaller problems.
[15:01] Then things will become easier.
[15:04] kushal: is it okay to copy paste code that we are not able to implement by ourself?
[15:04] san-D, try to use modules/libraries written by others if required.
[15:06] kushal: if it is a just code snippet , can it raise an copyright issue?
[15:06] if we have to copy what is good practice?
[15:06] san-D, Yes, it will be a copyright issue.
[15:07] san-D, So, do not copy paste code from Internet.
[15:30] oops
[15:30] Ending the session and then going to talk about a few things.
----END CLASS----
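
Editor's note: the log-counting problem from the session breaks down into the smaller steps the class listed: read the file line by line, pull the nick out of one line, skip lines that carry no nick (such as the BEGIN/END CLASS markers), and keep the counts in a dictionary. The sketch below is one possible way to put those pieces together, not the solution presented in class. It assumes each message line looks like `[13:45] <nick> message text`; the nicknames are not visible in this copy of the log, so that shape is a guess, and `extract_nick` and `count_lines_per_nick` are hypothetical names.

```python
import re
from collections import Counter

# Assumed line shape: "[13:45] <nick> message text".
LINE_PATTERN = re.compile(r"^\[\d{2}:\d{2}\]\s+<([^>]+)>")


def extract_nick(line):
    """Return the nick that said this line, or None if the line has none."""
    match = LINE_PATTERN.match(line)
    return match.group(1) if match else None


def count_lines_per_nick(path):
    """Count how many lines each nick said in the log file at `path`."""
    counts = Counter()  # dict subclass: nick -> number of lines
    # Iterating over the file object reads one line at a time, so even a
    # very large log never has to fit in memory at once.
    with open(path, encoding="utf-8") as log:
        for line in log:
            nick = extract_nick(line)
            if nick is not None:  # skips markers like ----BEGIN CLASS----
                counts[nick] += 1
    return counts
```

Solving the one-line case first (`extract_nick`) and then repeating it for every line mirrors the advice to break a problem into smaller problems.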