Will coding endlessly actually make you better and better at Python?

Question

By now I know pretty much all the basics and things like generators, list comps, object-oriented programming, magic methods, etc. But I see people on GitHub writing extremely complicated code that just goes right over my head, and I wonder how they got so good. Most people just say code, code, code. I completely agree that helps in the beginning stages when you’re trying to grasp the basics of Python; it helped me a lot too. But I don’t see how you can continue to improve by only coding, because coding only reinforces and implements what you already know. Is just coding the projects you want to do going to get you up to the level the professionals are at? How did they get so good? I kinda feel like I’ve hit a dead end and don’t even know what to do anymore. I’d like to know people’s opinions on this, and what it really takes to become a professional Python developer, or even a good programmer as a whole, whether it be Python or not.

Response

This is a classic problem for people who self-learn coding.

I’m a software engineer and Python is one of the languages I use. I’m not self-taught, but to get beyond where you are, you need to start looking at computer science as a whole. You need to start looking into algorithms and data structures, and also take a look at computational complexity (why your algorithm isn’t as fast as the other guy’s).

But I cannot stress enough how important algorithms and data structures are to breaking down that wall you’ve hit. Say, for example, you have a sorted list of 1 million integers and you want to check whether a number, let’s say 1203, is in that list. You could start at the beginning and work your way through the list. This is probably how you’d go about it now, but it’s really slow. What you should do instead is use binary search. In computational complexity terms, the linear scan runs in O(n) time while binary search runs in O(log n) time, and since log n grows far more slowly than n, binary search is dramatically faster on large inputs. Knowing things like this is where you’ll get the edge over others.
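As an illustration, the example above takes only a few lines using Python’s standard-library `bisect` module (the list of even numbers here is just made-up sample data):

```python
from bisect import bisect_left

def contains(sorted_list, target):
    """Binary search: O(log n) membership test on a sorted list."""
    i = bisect_left(sorted_list, target)  # first index where target could sit
    return i < len(sorted_list) and sorted_list[i] == target

numbers = list(range(0, 2_000_000, 2))  # one million sorted even integers
contains(numbers, 1203)  # False, decided in ~20 comparisons rather than a scan of up to a million
```

Note that `1203 in numbers` on a plain list does the O(n) linear scan; `bisect` exploits the ordering the same way a hand-rolled binary search would.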

I’ve seen questions like this asked before, and I’ve come up with a road-map to follow to get you to a professional level, so I’ll leave it below again!

Road-map

Here’s a Python road-map to take you from complete beginner to advanced with machine learning. I don’t know what area of computer science you’re interested in (AI, web dev, etc.) but I’d say do everything up to intermediate and then branch off. You’ll need everything up to AND INCLUDING intermediate to have any chance of passing a tech interview if you want to do this as a career. Hopefully, this provides some framework for you to get started on:

Beginner

  • Data Types – Lists, Strings, Tuples, Sets, Floats, Ints, Booleans, Dictionaries
  • Control Flow/Looping – for loops, while loops, if/elif/else
  • Arithmetic and expressions
  • I/O (Input/Output) – Sys module, Standard input/output, reading/writing files  
  • Functions
  • Exceptions and Error Handling
  • Basics of object oriented programming (OOP) (Simple classes).

Intermediate

  • Recursion
  • More advanced OOP – Inheritance, Polymorphism, Encapsulation, Method overloading.
  • Data Structures – Linked lists, Stacks, Queues, Binary Search Trees, AVL Trees, Graphs, Minimum Spanning Trees, Hash Maps
  • Algorithms – Linear Search, Binary Search, Hashing, Quicksort, Insertion/Selection Sort, Merge Sort, Radix Sort, Depth First Search, Breadth First Search, Prim’s Algorithm, Dijkstra’s Algorithm.
  • Algorithmic Complexity

Advanced – A.I. / Machine Learning / Data Science

  • Statistics
  • Probability
  • Brute Force search
  • Heuristic search (Manhattan Distance, Admissible and Informed Heuristics)
  • Hill Climbing
  • Simulated Annealing
  • A* search
  • Adversarial Search (Minimax & Alpha-Beta pruning)
  • Greedy Algorithms
  • Dynamic Programming
  • Genetic Algorithms
  • Artificial Neural Networks
  • Backpropagation
  • Natural Language Processing
  • Convolutional Neural Networks
  • Recurrent Neural Networks
  • Generative Adversarial Networks

Advanced – Full stack web development

  • Computer networks (Don’t need to go into heavy detail but an understanding is necessary)
  • Backend web dev tools (flask, django) (This is for app logic, interfacing with databases etc).
  • Front end framework (This is for communicating with the backend) (Angular 6+, React/Redux)
  • With frontend you’ll also need – HTML, CSS, JavaScript (it’s also good to learn TypeScript, which is used in Angular; it makes writing JavaScript nicer).
  • Relational database (MySQL, PostgreSQL)
  • Non-relational (MongoDB)
  • Cloud computing knowledge is good, (AWS, Google Cloud, Azure)

Resources

Books

  • Automate the boring stuff
  • Data Structures and Algorithms in Python by Goodrich, Tamassia, and Goldwasser (This should be the next thing you look at)
  • Python Programming: An Introduction to Computer Science
  • Slither into Python: An Introduction to the Python programming language
  • Fluent Python – Clear, Concise, and Effective Programming

Here are some for other related and important topics:

  • Clean Code by Robert Martin (How to write good code)
  • The Pragmatic Programmer by Andrew Hunt (General software engineering / best practices)
  • Computer Networking: A Top-Down Approach (Networks, useful depending on the field you’re entering, anything internet based this stuff will be important)
  • The Linux Command Line, 2nd Edition (Install the Linux operating system and get used to using the command line, it’ll be your best friend).
  • Artificial Intelligence: A Modern Approach

Online courses:

I am not a fan of YouTube for learning, as you’re just being hand-fed code and not given any exercises to practice with, so I won’t be linking YouTube video series here. In fact I’m not a fan of video courses in general, but these two are good.

  • Udemy – Complete Python Masterclass (This is for beginners stage).
  • Coursera – Deep Learning Specialization by Andrew Ng (Advanced – A.I.)

Most importantly, practice, practice, practice. You won’t get anywhere just watching videos of others programming. Try to dedicate an hour a day, or 2 hours a day on the weekend, if you can.

Source:  https://www.reddit.com/r/learnpython/comments/eim1x4/will_coding_endlessly_actually_make_you_better/ 

“The Rules of PERF” at Dropbox

Source: Dropbox

A more detailed explanation:

  1. Avoid needless processing. This breaks down in two ways:
    1. Feature design: Think hard before adding features that come with significant performance impacts — do you really need this feature? Is there a simpler way to do it which achieves most of your goals? Can you do it a simple way 90% of the time and only fall back to something more complex if needed? Can you skip several intermediate steps to get to the end result faster? (e.g. avoiding sorting a list)
    2. Optimize execution by taking advantage of short-circuit evaluation and doing lazy fetching/evaluation. For conditionals, if you sometimes need to do an expensive check, but usually don’t, then see if there’s a way you can skip that check. Laziness: don’t fetch extra things from the filesystem until requested, if you often don’t need it.
      • Practical example: I optimized a routine (in Python) at work last month. We were processing text files a line at a time and removing control characters. To remove control characters we used a regex on each line (not the most efficient approach, fairly expensive). I added a quick check that iterated through the line of text and checked if any of the characters were within the control character range, and just returned the original string if not. Not as efficient as rolling a non-regex implementation, but since control characters are rare it avoids 90% of the performance cost and was much simpler and safer to implement.
  2. Cache results of expensive operations to avoid repeating them unnecessarily. If you’re fetching info from the filesystem, cache it in memory if you are likely to reuse it (works well with lazy evaluation).
  3. Batch it: if you’re doing a single operation often to many items, try gathering up the items to process and processing them in large groups. Often this is more efficient because it makes better use of caches (CPU/disk) and it permits you to write much tighter loops for processing. It permits reusing buffers, connections, SQL prepared statements, etc. It can improve branch prediction, permit use of SIMD instructions, etc where they would not work otherwise.
    • Batching also makes it easier to fall back to something like the multiprocessing library to parallelize work.
  4. Use software pipelining. This is kind of like batching: rewrite loops that run items through a series of steps/processes so you first do the same step to each item, then the next step. Sometimes can be evaluated much more efficiently by compilers/interpreters because it allows using SIMD instructions, avoids branch prediction misses, etc.
    • May also mean using Unix/Linux pipelining as well: use a bunch of smaller utilities that pipe input from one to another. This is another application of the same principle, but has the extra advantage of being generally very efficient, and spreading work across multiple processors.
  5. Use a lower-level language than Python to optimize the most performance-sensitive parts of the code, i.e. fall back to C bindings for intensive number crunching, crypto, etc. Optimized C can be several times faster than Python (or sometimes much more!). In general Pareto’s principle applies: 80% of your execution time comes from 20% of the code (and vice versa), so if you double the performance of the slowest 20% you can almost double your overall performance.
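The control-character optimization described under rule 1 can be sketched like this (the exact regex and character range are assumptions, since the original code isn’t shown):

```python
import re

# Control characters: the C0 range (0x00-0x1F) plus DEL (0x7F).
_CONTROL_RE = re.compile(r"[\x00-\x1f\x7f]")

def strip_control_chars(line: str) -> str:
    """Remove control characters, skipping the regex when none are present."""
    # Cheap linear scan first: most lines contain no control characters,
    # so we usually return early and never pay for the regex substitution.
    if not any(ch < " " or ch == "\x7f" for ch in line):
        return line
    return _CONTROL_RE.sub("", line)
```

The pre-check is itself O(n) over the line, but a plain character comparison is far cheaper per character than the regex machinery, which is why it pays off when control characters are rare.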

To Summarize

If you have a performance issue, you should try the following fixes in order (ie. try one and if that doesn’t solve it, go to the next possible fix):

  1. Just don’t do whatever you’re trying to do. In other words, ask yourself if it’s really necessary/useful/something you might lose a client over.
  2. Cache the results of previous calls. Maybe you can reuse them as-is, or partially.
  3. Do a large number of calls in batch, maybe in advance or later, outside of peak operating hours. Or perhaps you need to set up a network connection to do what you’re doing, if that takes a while then don’t make a new connection for each request, bundle up a dozen and setup once.
  4. Don’t add your stuff to an existing program, create a new and separate one that will take the output of a previous process. Decoupling, in a way.
  5. If none of that works, only then should you look for a totally different way of doing it.

Further Explanation

There are multiple explanations for these, which makes them deeper than they seem, but there are a couple more parts:

  1. Not doing work may include several ways of avoiding extra computation — lazily running expensive operations if not always needed, adding conditional checks before complex work rather than throwing exceptions, using short-circuit evaluation, or using more efficient algorithms / cutting out intermediate steps if you can get a result without them.
  2. Yep.
  3. Yep, but it doesn’t necessarily have to wait hours — batches of work can be handed off to other processes or utilities to process (makes better use of cores), and often you can write tighter loops that make better use of caches and reuse resources (connections, buffers, etc)
  4. That’s Unix pipelining, and it’s good shit, but software pipelining is a more general version of the technique. Depending on your architecture one or the other may be more efficient — goes well with batching above though.
  5. No, this is a reference to falling back on C bindings invoked from Python, and writing the really tricky bits highly optimized in a lower-level language. C can be several times as fast as Python (or more, with good use of SIMD instructions) if written efficiently.
    • They didn’t do this often at Dropbox, because Python is faster to write and easier to maintain, but when they did this they got huge speedups.
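Batching shows up in everyday code too. For example, the standard-library `sqlite3` module lets one prepared statement absorb a whole batch of inserts instead of issuing them one call at a time (the table and rows here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")

rows = [(i, f"event-{i}") for i in range(1000)]

# One prepared statement reused for every row, instead of a
# thousand separate execute() calls each paying parse/dispatch overhead.
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
conn.commit()
```

The same shape applies to network requests, filesystem operations, and anything else with per-call setup cost: gather the items, then hand them over in one go.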

Software pipelining

In computer science, software pipelining is a technique used to optimize loops, in a manner that parallels hardware pipelining. Software pipelining is a type of out-of-order execution, except that the reordering is done by a compiler (or in the case of hand written assembly code, by the programmer) instead of the processor. Some computer architectures have explicit support for software pipelining, notably Intel’s IA-64 architecture.

It is important to distinguish software pipelining, which is a target code technique for overlapping loop iterations, from modulo scheduling, the currently most effective known compiler technique for generating software pipelined loops.
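In pure Python the loop is run by the interpreter rather than a compiler, so the hardware-level wins are smaller, but the restructuring itself is easy to illustrate: run each step over every item before moving on to the next step. This is a loop-fission sketch in the spirit of the technique, with made-up steps:

```python
# Item-at-a-time: each iteration mixes three different kinds of work.
def process_naive(lines):
    out = []
    for line in lines:
        fields = line.split(",")       # step 1: parse
        total = sum(map(int, fields))  # step 2: compute
        out.append(f"sum={total}")     # step 3: format
    return out

# Step-at-a-time: each step runs as its own tight, homogeneous loop,
# the form that compilers/vectorized libraries can exploit.
def process_pipelined(lines):
    parsed = [line.split(",") for line in lines]
    totals = [sum(map(int, f)) for f in parsed]
    return [f"sum={t}" for t in totals]
```

Both produce identical output; the payoff in Python usually comes when a step can then be handed to a vectorized library or a worker pool, rather than from the reordering alone.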

Source:  https://www.reddit.com/r/Python/comments/eip48b/the_rules_of_perf_at_dropbox/