If you’re not using automation you’re wasting your time and money

Most people don’t understand online automation. They don’t realize that almost all of their cyber-chores can be automated inexpensively; they assume automation is too expensive and/or meant for larger enterprises. But the fact is, if any part of your business depends on the internet (and let’s face it, this is 2018: everything does), chances are automation can save you a lot of time, whether that’s automatically processing orders, keeping an eye on your competitors or just clearing out some cyber-chores.

[Image caption: the chainsaw approach most people take to everything, rather than automation]

Disclosure: I own two small businesses and also work as a freelance automation developer. Both of my businesses are highly automated, and I’ve helped over 30 clients save a combined 100+ hours every day.

It’s hard to explain exactly what can be automated, so instead I’ll try to build some intuition with a few examples:

Online car rental – One of my clients rented out cars via several online car rental websites. Each day he’d log into each website, browse various pages (some websites with multiple accounts) and build an Excel spreadsheet of all the cars that had been booked, the updated location of each car, etc. This took him about 1-2 hours per day. For $300, he now gets an updated spreadsheet in his Google Drive every 30 minutes with no action required on his part.

Form generation – Another SMB client provided legal services. They would take data from an Excel sheet, fill it out on a PDF form, then print it and mail it to a government office. They would then check the government website every day to track the status of each application. Now a script automatically reads the Excel sheet, fills and prints the form, tracks the status of every application and updates it in another Google Drive sheet.

Competitor watch – Another client had to check their competitors’ e-commerce websites regularly to keep an eye on their prices, which took them about 3-6 hours of work every week. Instead, they now have a script that E-Mails them within 5 minutes of any price change being detected on a competitor’s website.
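
Just to make the competitor-watch idea concrete, here’s a minimal sketch of what such a watcher could look like in Python (using requests and BeautifulSoup). The URL and CSS selector are made-up placeholders, and a real script would also need proper e-mail sending, error handling and a scheduler:

```python
import time
import requests
from bs4 import BeautifulSoup

# Hypothetical product pages to watch; selectors depend entirely on the target site.
WATCHLIST = {
    "https://example.com/competitor/product-123": ".price",
}

last_seen = {}  # url -> last known price string

def fetch_price(url, selector):
    """Download the page and pull the price text out with a CSS selector."""
    html = requests.get(url, timeout=10).text
    tag = BeautifulSoup(html, "html.parser").select_one(selector)
    return tag.get_text(strip=True) if tag else None

def check_once():
    for url, selector in WATCHLIST.items():
        price = fetch_price(url, selector)
        if price and price != last_seen.get(url):
            # A real deployment would send an e-mail here (e.g. via smtplib
            # or a mail API) instead of printing.
            print(f"Price change on {url}: {last_seen.get(url)} -> {price}")
            last_seen[url] = price

if __name__ == "__main__":
    while True:
        check_once()
        time.sleep(300)  # poll every 5 minutes
```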

This should give you an intuition for the kind of things that online automation can do for you. If you have any questions feel free to comment and I’ll try to give you as thorough an answer as possible!

Source: https://www.reddit.com/r/Entrepreneur/comments/8sqb7q/if_youre_not_using_automation_youre_wasting_your/

Most projects I work on have little to no maintenance cost. It’s usually a script that you run on your computer: you just click on it and watch it do its magic.

Maintenance costs come in one of two forms:

  1. When the script breaks – This happens sometimes, mostly with web scrapers. A website may get a new design, and some piece of info the script reads may no longer be where it used to be, which confuses the script. In cases like these it’s usually a quick fix. I usually fix very minor issues for free; otherwise I charge a very small fee (usually $10-30).
  2. Servers – Sometimes we have to rent servers in the cloud (so that the script can run on a server instead of your computer). Depending on how much firepower your script needs, the charges can be anywhere from $2 a month for a weak server to hundreds of dollars a month (for when you need a really powerful cluster of servers, which is extremely rare). Most projects don’t need servers, though, and when they do there are also some free options like the AWS Lambda free tier.

Kofax, Connotate and Mozenda are three SaaS automation products that anyone, including non-technical people, can use to build and run automation scripts on a schedule, on demand or as part of an “if-this-then-that” workflow. As someone who has built automation software as part of my business for the last 8 years, I find these services impressive, cost-effective and reliable for the average use case. Plus, they are massively parallel and include built-in IP masking as well as a simple user interface for designing and maintaining scrape templates.

I have no affiliation with any of these services; I’m just passing them along as something I’ve considered at one time or another.

An addendum as well: in probably 80% of the cases I’ve seen, Excel is not the right tool for the job. Spending a little on a database architect to come up with a proper plan for storing your data can save you tons of time down the road, and it makes automating tasks that involve your data even easier to implement. Source: I’m a database architect.

I’m by no means a database architect, but I’m a competent user of Access. I watch some of our data analysts build huge Excel spreadsheets with all sorts of complexity, essentially trying to recreate database functionality in a spreadsheet. It takes forever, is prone to errors, and is incredibly difficult to audit. The same task in a database takes seconds.

MS Access is my dirty little secret. I throw data into a database, analyse it, and spit out the results in minutes rather than days. No one else around me is familiar with Access, and they’re blown away by how quickly I can do the number crunching and come up with a compelling story about what the data means.

So people usually use Excel for everything from storing data to munging, pivots, joins, and data visualization. Oftentimes I see a single spreadsheet containing multi-dimensional data (e.g. cells A1:E40 as one ‘table’, G2:G30 as another ‘table’, etc.), and then similar data from the pull from 6 months ago will be in a separate Excel file with a similar, but different, storage convention.

A much better approach would be to store everything in a normalized format so that there is no redundancy. For example, if you are looking at survey data, you would ideally put all of your questions in one table, people in another table, and then a third table that joins the questions, people, and answers together. By doing this, you can easily compare, say, the same person’s answers with those from 6 months ago, instead of needing to go into two spreadsheets, figure out how both of them store the data, and then manually determine which questions they answered, etc.

You don’t even need anything complicated to do this, either. While you can set up a database server like PostgreSQL (free), you could also use a file-based database like SQLite, or a hybrid like Access (if the data are <2GB) or LibreOffice Base.
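
To make the survey example concrete, here’s a minimal sketch of that normalized layout using Python’s built-in sqlite3 module. The table and column names are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect("survey.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS people (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS questions (
    question_id INTEGER PRIMARY KEY,
    text        TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS answers (
    person_id   INTEGER REFERENCES people(person_id),
    question_id INTEGER REFERENCES questions(question_id),
    answered_on DATE,
    answer      TEXT
);
""")

# Comparing one person's answers across two survey rounds is now a single query,
# instead of reconciling two differently-shaped spreadsheets by hand.
rows = conn.execute("""
    SELECT q.text, a.answered_on, a.answer
    FROM answers a
    JOIN questions q USING (question_id)
    WHERE a.person_id = ?
    ORDER BY q.question_id, a.answered_on
""", (1,)).fetchall()
```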

That said, there are situations where Excel is great. Quick and dirty munging in particular is where it excels (pun intended). The import process into, say, SQLite takes 30 seconds or so of clicking through all the steps; in Excel it takes just seconds. So if you’re doing something simple, like adding columns 1 and 2 together and nothing more, Excel definitely wins. But the cool part is that even if you aren’t using Excel for data storage, you can still use it to access your data, which gives you the best of both worlds.
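
One common shape of that “best of both worlds” workflow, sketched under the assumption that pandas and openpyxl are installed and reusing the hypothetical survey.db from the sketch above: keep the data in SQLite, then pull query results into an Excel file for the quick and dirty work.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("survey.db")

# Pull a query result out of the database...
df = pd.read_sql_query(
    "SELECT * FROM answers WHERE answered_on >= '2018-01-01'", conn
)

# ...and hand it to Excel for ad-hoc munging, pivots and charts.
df.to_excel("recent_answers.xlsx", index=False)
```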

As far as visualization tools are concerned, Excel is fine. Again, quick and dirty. PowerBI (also an MS product) makes better visualizations, especially if the data are more complicated. Tableau makes better live visualizations (embedded, interactive visualizations are possible very easily). And something like Matplotlib in Python gives you much more fine-grained control over the appearance of your visualizations. But the latter three have more of a learning curve too.

It’s all about the right tool for the job, which sometimes is Excel, but there are so many great technologies out there that do things better than Excel that it’s definitely worth branching out!

Python can do everything

Yes. Python can do pretty much everything. Reddit is written in Python, for example. Python is also the most popular language for machine learning and data science. You can even program some of the more powerful microcontrollers with it. For automation in particular, the book Automate the Boring Stuff is highly recommended, though I haven’t read it myself. There’s also r/learnpython.

Be careful of the split between Python 2 and Python 3; not everyone was eager to update when 3 came out, so some libraries only support 2. On the other hand, some only support 3. Everyone’s gradually moving toward 3, so that’s what you should start with as a newcomer, so that you don’t have to switch to 3 after learning 2 (not that the differences are huge).
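
For a feel of the (mostly small) differences, here are two of the classic ones:

```python
# Python 2: print is a statement, and dividing two ints truncates.
#   print "hello"
#   3 / 2  ->  1

# Python 3: print is a function, and / always does true division.
print("hello")
print(3 / 2)   # 1.5
print(3 // 2)  # 1, if you actually want floor division
```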

Interpreted vs. compiled and scripting vs. ‘real programming’ aren’t really meaningful distinctions anymore, if they ever were.

Screen scrapers to monitor competitors’ pricing, applications for automating tasks that were repetitive and monotonous (and therefore highly prone to user error), etc. It cannot be overstated just how much time can be saved through automating the simple little things.

For simple tasks I used to just write the scrapers in C#. For more complicated actions I used the iMacros API (I wouldn’t recommend it). These days I just use CefSharp for the complicated stuff.

I’d like to talk about a client who worked hard for their business. 10 hours a day! Their job? They would scour 14 different mediums for trade alerts (alerts for potentially good trades). These 14 came from different services: some arrived over E-Mail, some over SMS, some in Slack chatrooms and others were posted on websites.

Their day job was to filter through these alerts, find the most worthwhile ones and forward them on to their own users. On top of this they’d spend 1-2 hours a day just managing overhead: subscriptions, adding people to their own E-Mail and SMS lists, verifying that the people already on the lists had paid for their subscription, etc. Quite a tiresome process!

The automation

One problem with the automation here was that the client needed to hand-pick the alerts that finally went out. They couldn’t provide a simple algorithm to do it; their customers were paying for 20 years of experience!

The job was simple: build a bot that automatically reads E-Mails, SMS, Slack channels and constantly updating websites. Okay, maybe not so simple. An algorithm would then filter out the worst of these leads based on a few objective criteria, which eliminated about 70-80% of the alerts.

For the rest? They would get a notification on their mobile phone, with an ‘accept’ and a ‘reject’ button right there in the notification bar. Accept it, and the alert is forwarded to all their subscribers.
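
The real bot’s internals aren’t public, but the filtering step probably has roughly this shape. The Alert fields, the criteria and the send_for_review() helper are all hypothetical stand-ins for illustration:

```python
from dataclasses import dataclass

@dataclass
class Alert:
    source: str        # "email", "sms", "slack", "web", ...
    symbol: str
    confidence: float  # hypothetical score attached by the upstream parser
    text: str

def passes_objective_filters(alert: Alert) -> bool:
    """Cheap, objective checks that weed out the obvious junk (70-80% of alerts)."""
    if alert.confidence < 0.5:
        return False
    if not alert.symbol:
        return False
    return True

def send_for_review(alert: Alert) -> None:
    # Placeholder: a real version would push a mobile notification with
    # accept/reject actions via a push-notification service.
    print(f"REVIEW: {alert.symbol} from {alert.source}: {alert.text}")

def handle_incoming(alerts: list[Alert]) -> None:
    for alert in alerts:
        if passes_objective_filters(alert):
            send_for_review(alert)  # the human still makes the final call
```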

Their entire subscription management was also automated. A script would automatically add & remove subscribers and verify payments received.

Lessons learned

Most of the time when I work with clients, I end up automating 10-30% of their workload, or building some tool that allows them to pursue a new line of work that was previously too time consuming. Rarely do I get the opportunity to automate 80-90% of the work!

With one tool, this man got his life back. An entire workday spent monitoring different websites, E-Mail, Slack and SMS turned into just going about your day and responding to mobile notifications every 5-10 minutes (remember, most of the alerts are automatically filtered out; only a few actually go to him for review).

That’s the real power of automation. It gives you your life back. One day you’re working hard on your business. The next day you’re thinking hard about your next side hustle. Automation doesn’t just give you wings, it gives you an entire jet engine. Think hard about how much work you do for your business. Unless it requires that ‘insight’ garnered over years of experience, I could probably automate it. And even if it does, automation can take away most of the work, as in this example.

Will coding endlessly actually make you better and better at Python?

Question

By now I know pretty much all the basics and things like generators, list comps, object-oriented programming, magic methods, etc. But I see people on GitHub writing extremely complicated code and stuff that just goes right over my head, and I wonder how they got so good. Most people just say code, code, code. I completely agree that helps in the beginning stages when you’re trying to grasp the basics of Python; it helped me a lot too. But I don’t see how you can continue to improve by only coding, because coding only reinforces and implements what you already know. Is just coding the projects you want to do going to get you up to the level that the professionals are at? How did they get so good? I kinda feel like I’ve hit a dead end and don’t even know what to do anymore. I’d like to know people’s opinions on this, and what it really takes to become a professional Python developer, or even a good programmer as a whole, whether it be Python or not.

Response

This is a classic problem with people who self learn coding.

I’m a software engineer and Python is one of the languages I use. I’m not self-taught, but to get beyond where you are you need to start looking at computer science as a whole. You need to look into algorithms and data structures, and also take a look at computational complexity (why your algorithm isn’t as fast as the other guy’s).

But I cannot stress enough how important algorithms and data structures are to breaking down that wall you’ve hit. Let’s say, for example, you have a sorted list of 1 million integers and you want to check whether a number, say 1203, is in that list. You could start at the beginning of the list and work your way through it. This is probably how you’d go about it now, but it’s really slow. What you should do instead is use binary search. In computational complexity terms, the slow way runs in O(n) time while binary search runs in O(log n) time. Since log n is far smaller than n, it runs far faster. Knowing things like this is where you’ll get the edge over others.
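
To illustrate, here’s that comparison in Python. The standard library’s bisect module does the binary search for you, so you rarely need to hand-roll it:

```python
import bisect

numbers = list(range(0, 2_000_000, 2))  # a sorted list of 1 million integers

def contains_linear(sorted_list, target):
    """O(n): walk the list from the start until we find it (or pass it)."""
    for value in sorted_list:
        if value == target:
            return True
        if value > target:  # list is sorted, so we can stop early
            return False
    return False

def contains_binary(sorted_list, target):
    """O(log n): repeatedly halve the search range via bisect."""
    index = bisect.bisect_left(sorted_list, target)
    return index < len(sorted_list) and sorted_list[index] == target

print(contains_linear(numbers, 1203))  # False (1203 is odd), after ~600 comparisons
print(contains_binary(numbers, 1203))  # False, after ~20 comparisons
```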

I’ve seen questions like this being asked before and I’ve come up with a road-map to follow to get you to a professional level, so I’ll leave it below again!

Road-map

Here’s a Python road-map to take you from complete beginner to advanced with machine learning. I don’t know what area of computer science you’re interested in (AI, web dev, etc.) but I’d say do everything up to intermediate and then branch off. You’ll need everything up to AND INCLUDING intermediate to have any chance of passing a tech interview if you want to do this as a career. Hopefully this provides some framework for you to get started on:

Beginner

  • Data Types – Lists, Strings, Tuples, Sets, Floats, Ints, Booleans, Dictionaries
  • Control Flow/Looping – for loops, while loops, if/elif/else
  • Arithmetic and expressions
  • I/O (Input/Output) – Sys module, Standard input/output, reading/writing files  
  • Functions
  • Exceptions and Error Handling
  • Basics of object oriented programming (OOP) (Simple classes).

Intermediate

  • Recursion
  • More advanced OOP – Inheritance, Polymorphism, Encapsulation, Method overloading.
  • Data Structures – Linked lists, Stacks, Queues, Binary Search Trees, AVL Trees, Graphs, Minimum Spanning Trees, Hash Maps
  • Algorithms – Linear Search, Binary Search, Hashing, Quicksort, Insertion/Selection Sort, Merge Sort, Radix Sort, Depth First Search, Breadth First Search, Prim’s Algorithm, Dijkstra’s Algorithm.
  • Algorithmic Complexity

Advanced – A.I. / Machine Learning / Data Science

  • Statistics
  • Probability
  • Brute Force search
  • Heuristic search (Manhattan Distance, Admissible and Informed Heuristics)
  • Hill Climbing
  • Simulated Annealing
  • A* search
  • Adversarial Search (Minimax & Alpha-Beta pruning)
  • Greedy Algorithms
  • Dynamic Programming
  • Genetic Algorithms
  • Artificial Neural Networks
  • Backpropagation
  • Natural Language Processing
  • Convolutional Neural Networks
  • Recurrent Neural Networks
  • Generative Adversarial Networks

Advanced – Full stack web development

  • Computer networks (Don’t need to go into heavy detail but an understanding is necessary)
  • Backend web dev tools (Flask, Django) – for app logic, interfacing with databases, etc.
  • Frontend framework (Angular 6+, React/Redux) – for communicating with the backend.
  • With frontend you’ll also need HTML, CSS and JavaScript (it’s also good to learn TypeScript, which is used in Angular; it makes writing JavaScript nicer).
  • Relational database (MySQL, PostgreSQL)
  • Non-relational (MongoDB)
  • Cloud computing knowledge is good (AWS, Google Cloud, Azure)

Resources

Books

  • Automate the boring stuff
  • Data Structures and Algorithms in Python by Goodrich, Tamassia and Goldwasser (This should be the next thing you look at)
  • Python Programming: An Introduction to Computer Science
  • Slither into Python: An Introduction to the Python programming language
  • Fluent Python – Clear, Concise, and Effective Programming

Here are some for other related and important topics:

  • Clean Code by Robert Martin (How to write good code)
  • The Pragmatic Programmer by Andrew Hunt (General software engineering / best practices)
  • Computer Networking: A Top-Down Approach (Networks, useful depending on the field you’re entering, anything internet based this stuff will be important)
  • The Linux Command Line, 2nd Edition (Install Linux and get used to using the command line; it’ll be your best friend).
  • Artificial Intelligence: A Modern Approach

Online courses:

I am not a fan of YouTube for learning, as you’re just being hand-fed code and not given any exercises to practice with, so I won’t be linking YouTube video series here. In fact I’m not a fan of video courses in general, but these two are good:

  • Udemy – Complete Python Masterclass (This is for the beginner stage).
  • Coursera – Deep Learning Specialization by Andrew Ng (Advanced – A.I.)

Most importantly: practice, practice, practice. You won’t get anywhere just watching videos of other people programming. Try to dedicate an hour a day, or 2 hours a day on the weekend, if you can.

Source:  https://www.reddit.com/r/learnpython/comments/eim1x4/will_coding_endlessly_actually_make_you_better/ 

“The Rules of PERF” at Dropbox

Source: Dropbox

A more detailed explanation:

  1. Avoid needless processing. This breaks down in two ways:
    1. Feature design: Think hard before adding features that come with significant performance impacts. Do you really need this feature? Is there a simpler way to do it which achieves most of your goals? Can you do it a simple way 90% of the time and only fall back to something more complex if needed? Can you skip several intermediate steps to get to the end result faster? (e.g. avoiding sorting a list)
    2. Optimize execution by taking advantage of short-circuit evaluation and doing lazy fetching/evaluation. For conditionals, if you sometimes need to do an expensive check, but usually don’t, then see if there’s a way you can skip that check. Laziness: don’t fetch extra things from the filesystem until they’re requested, if you often don’t need them.
      • Practical example: I optimized a routine (in Python) at work last month. We were processing text files a line at a time and removing control characters. To remove the control characters we used a regex on each line (not the most efficient approach, and fairly expensive). I added a quick check that iterated through the line of text and checked whether any of the characters were within the control-character range, and just returned the original string if not. Not as efficient as rolling a non-regex implementation, but since control characters are rare it avoids 90% of the performance cost and was much simpler and safer to implement. (A sketch of this pattern follows this list.)
  2. Cache results of expensive operations to avoid repeating them unnecessarily. If you’re fetching info from the filesystem, cache it in memory if you are likely to reuse it (works well with lazy evaluation).
  3. Batch it: if you’re doing a single operation often to many items, try gathering up the items to process and processing them in large groups. Often this is more efficient because it makes better use of caches (CPU/disk) and it permits you to write much tighter loops for processing. It permits reusing buffers, connections, SQL prepared statements, etc. It can improve branch prediction, permit use of SIMD instructions, etc where they would not work otherwise.
    • Batching also makes it easier to fall back to something like the multiprocessing library to parallelize work.
  4. Use software pipelining. This is kind of like batching: rewrite loops that run items through a series of steps/processes so that you first do the same step to each item, then the next step. This can sometimes be evaluated much more efficiently by compilers/interpreters because it allows using SIMD instructions, avoids branch prediction misses, etc.
    • May also mean using Unix/Linux pipelining as well: use a bunch of smaller utilities that pipe input from one to another. This is another application of the same principle, but has the extra advantage of being generally very efficient, and spreading work across multiple processors.
  5. Use a lower-level language than Python to optimize the most performance-sensitive parts of the code, i.e. fall back to C bindings for intensive number crunching, crypto, etc. Optimized C can be several times faster than Python (or sometimes much more!). In general Pareto’s principle applies: 80% of your execution time comes from 20% of the code (and vice versa), so if you double the speed of that hottest 20% of the code, the whole program gets roughly 1.7x faster.
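
The control-character example from point 1 might look roughly like this in Python (a sketch; the exact regex and check in the real code aren’t shown in the post):

```python
import re

# Strips ASCII control characters (keeping tab, newline and carriage return).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def strip_control_chars(line: str) -> str:
    # Cheap pre-check: most lines contain no control characters at all,
    # so skip the (comparatively expensive) regex substitution for them.
    if not any(ord(ch) < 0x20 or ord(ch) == 0x7f
               for ch in line if ch not in "\t\n\r"):
        return line
    return CONTROL_CHARS.sub("", line)
```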

To Summarize

If you have a performance issue, you should try the following fixes in order (ie. try one and if that doesn’t solve it, go to the next possible fix):

  1. Just don’t do whatever you’re trying to do. In other words, ask yourself if it’s really necessary/useful/something you might lose a client over.
  2. Cache the results of previous calls. Maybe you can reuse them as-is, or partially. (A minimal caching sketch follows this list.)
  3. Do a large number of calls in batch, maybe in advance or later, outside of peak operating hours. Or perhaps you need to set up a network connection to do what you’re doing; if that takes a while, don’t make a new connection for each request: bundle up a dozen and set up the connection once.
  4. Don’t add your stuff to an existing program, create a new and separate one that will take the output of a previous process. Decoupling, in a way.
  5. If none of that works, only then should you look for a totally different way of doing it.
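
For point 2, Python’s standard library already covers the simplest version of caching. A minimal sketch, assuming the expensive call is pure (the same input always gives the same output):

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def fetch_exchange_rate(currency: str) -> float:
    # Stand-in for an expensive call (network request, disk read, big query...).
    # With lru_cache, repeat calls with the same argument return the memoized
    # result instead of paying the cost again.
    print(f"expensive lookup for {currency}")
    return {"EUR": 1.08, "GBP": 1.27}.get(currency, 1.0)

fetch_exchange_rate("EUR")  # does the expensive work
fetch_exchange_rate("EUR")  # served from the cache, no print
```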

Further Explanation

There are multiple explanations for each of these, which makes them deeper than they seem, but here are a couple more parts:

  1. Not doing work may include several ways of avoiding extra computation — lazily running expensive operations if not always needed, adding conditional checks before complex work rather than throwing exceptions, using short-circuit evaluation, or using more efficient algorithms / cutting out intermediate steps if you can get a result without them.
  2. Yep.
  3. Yep, but it doesn’t necessarily have to wait hours. Batches of work can be handed off to other processes or utilities to process (which makes better use of cores), and often you can write tighter loops that make better use of caches and reuse resources (connections, buffers, etc.). (See the batching sketch after this list.)
  4. That’s Unix pipelining, and it’s good shit, but software pipelining is a more general version of the technique. Depending on your architecture one or the other may be more efficient — goes well with batching above though.
  5. No, this is a reference to falling back on C bindings invoked from Python, and writing the really tricky bits highly optimized in a lower-level language. C can be several times as fast as Python (or more, with good use of SIMD instructions) if written efficiently.
    • They didn’t do this often at Dropbox, because Python is faster to write and easier to maintain, but when they did this they got huge speedups.
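
To illustrate point 3, one common shape of batching in Python is to collect items first and hand them to a worker pool in chunks, rather than paying per-item overhead. The process_one() function is a made-up stand-in for the real work:

```python
from multiprocessing import Pool

def process_one(path: str) -> int:
    # Hypothetical per-item work: parse a file, hit an API, resize an image...
    return len(path)

def process_in_batches(paths: list[str], workers: int = 4) -> list[int]:
    # chunksize > 1 means each worker receives a batch of items at once,
    # amortizing the inter-process communication cost over the whole chunk.
    with Pool(workers) as pool:
        return pool.map(process_one, paths, chunksize=64)

if __name__ == "__main__":
    results = process_in_batches([f"file_{i}.txt" for i in range(1000)])
    print(sum(results))
```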

Software pipelining

In computer science, software pipelining is a technique used to optimize loops, in a manner that parallels hardware pipelining. Software pipelining is a type of out-of-order execution, except that the reordering is done by a compiler (or in the case of hand written assembly code, by the programmer) instead of the processor. Some computer architectures have explicit support for software pipelining, notably Intel’s IA-64 architecture.

It is important to distinguish software pipelining, which is a target code technique for overlapping loop iterations, from modulo scheduling, the currently most effective known compiler technique for generating software pipelined loops.
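
You won’t get compiler-level software pipelining out of plain Python, but the loop restructuring described in point 4 above (do one stage for every item, then the next stage) is easy to picture. A toy sketch:

```python
lines = ["  42 ", " 7", "13  "]

# Item-at-a-time: each iteration hops between unrelated kinds of work.
totals = []
for line in lines:
    cleaned = line.strip()
    number = int(cleaned)
    totals.append(number * 2)

# Stage-at-a-time: run each step over the whole batch before the next one.
# Each tight loop does one uniform operation, which interpreters, NumPy-style
# libraries and (in lower-level languages) SIMD can handle more efficiently.
cleaned = [line.strip() for line in lines]
numbers = [int(c) for c in cleaned]
totals = [n * 2 for n in numbers]
```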

Source:  https://www.reddit.com/r/Python/comments/eip48b/the_rules_of_perf_at_dropbox/