Tutorial: Dynamically change range in Python?



Question:

So say I'm using BeautifulSoup to parse pages and my code figures out that there are at least 7 pages to a query.

The pagination looks like

 1 2 3 4 5 6 7 Next  

If I paginate all the way to 7, sometimes there are more than 7 pages, so that if I am on page 7, the pagination looks like

 1 2 3 ... 7 8 9 10 Next  

So now I know there are at least 3 more pages. I use an initial pass to figure out how many pages there are, i.e. get_num_pages returns 7.

What I am doing is iterating over items on each page so I have something like

for page in range(1, num_pages + 1):
    # do some stuff here

Is there a way to dynamically update the range if the script figures out there are more than 7 pages? I guess another approach is to keep a count and as I get to page 7, handle that separately. I'm looking for suggestions and solutions for the best way to approach this.


Solution:1

You could probably create a generator that has mutable state that determines when it terminates... but what about something simple like this?

page = 1
while page < num_pages + 1:
    # do stuff that possibly updates num_pages here
    page += 1
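To make the while loop concrete, here is a minimal runnable sketch. The helper last_visible_page is hypothetical: it stands in for whatever parsing reads the highest page number out of the pagination bar, and it simulates the behaviour described in the question (7 pages visible at first, 3 more revealed as you go). The name crawl and the true_total parameter are likewise assumptions made for illustration.

```python
def last_visible_page(current_page, true_total):
    """Hypothetical stand-in for parsing the pagination bar.

    Simulates a site that shows pages up to 7 at first and then
    up to 3 pages beyond the current one, capped at the real total.
    """
    return min(max(7, current_page + 3), true_total)


def crawl(true_total):
    """Visit every page, raising num_pages whenever more pages appear."""
    visited = []
    page = 1
    num_pages = last_visible_page(page, true_total)  # the initial pass
    while page <= num_pages:
        visited.append(page)  # do the per-page work here
        # The loop condition is re-checked on every iteration,
        # so a larger num_pages takes effect immediately.
        num_pages = max(num_pages, last_visible_page(page, true_total))
        page += 1
    return visited


if __name__ == "__main__":
    print(crawl(23))
```

Unlike a for loop over range(1, num_pages + 1), whose bounds are fixed when the range is created, the while condition is evaluated again on each pass, which is why updating num_pages inside the loop works.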


Solution:2

Here's a code-free answer, but I think it's simple if you take advantage of what BeautifulSoup lets you do:

To start with, on the first page you have somewhere the page numbers & links; from your question they look like this:

1 2 3 4 5 6 7 [next]  

Different sites handle paging differently, some give a link to jump to beginning/end, but on yours you say it looks like this after the first 7 pages:

1 2 3 ... 7 8 9 10 [next]  

Now, at some point, you will get to the end, it's going to look like this:

1 2 3 ... 20 21 22 23  

Notice there's no [next] link.

So forget about generators and ranges and keeping track of intermediate ranges, etc. Just do this:

  1. Use BeautifulSoup to identify the page # links on a given page, along with the [next] link.
  2. Every time you see a [next] link, follow it and reparse with BeautifulSoup.
  3. When you hit a page where there is no [next] link, the last page # link is the total number of pages.
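The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names find_next_href and crawl_all_pages are made up, the link text is assumed to be "Next" or "[next]", and fetch is a function you supply that returns the HTML for a URL (for example a thin wrapper around your HTTP library of choice). Here the page count comes from counting the pages visited rather than from reading the last page # link.

```python
from urllib.parse import urljoin


def find_next_href(html):
    """Return the href of the [next] link, or None on the last page."""
    # Imported here so the crawl loop below can be tried without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        # Assumption: the link text is "Next" or "[next]"; adjust for the real site.
        if a.get_text(strip=True).strip("[]").lower() == "next":
            return a["href"]
    return None


def crawl_all_pages(start_url, fetch, find_next=find_next_href):
    """Follow [next] links from start_url; return the URLs visited, in order."""
    visited = []
    url = start_url
    while url is not None and url not in visited:  # guard against link cycles
        html = fetch(url)
        visited.append(url)
        # ... extract the items on this page here ...
        next_href = find_next(html)
        # Resolve relative links against the page they were found on.
        url = urljoin(url, next_href) if next_href else None
    return visited
```

With this shape, len(crawl_all_pages(start_url, fetch)) gives the total number of pages once the crawl finishes, so no separate counting pass is needed.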


Solution:3

I like John's while-based solution, but to use a for you could do something like:

pages = list(range(1, num_pages + 1))  # on Python 3, range has no extend(), so build a list
for p in pages:
    ...  # possibly pages.extend(range(something, something)) here

that is, you have to give a name to the range you're looping on, so you can extend it when needed. Changing the container you're iterating on is normally frowned upon, but in this specific and highly-constrained case it can actually be a useful idiom.
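A small runnable sketch of this idiom, for illustration only. On Python 3, range() returns an immutable range object, so the pages are first copied into a list. The helper last_visible_page is hypothetical and simulates the pagination bar from the question; visit_pages and true_total are also names invented for this example.

```python
def last_visible_page(current_page, true_total):
    """Hypothetical stand-in for parsing the pagination bar.

    Simulates a site that shows pages up to 7 at first and then
    up to 3 pages beyond the current one, capped at the real total.
    """
    return min(max(7, current_page + 3), true_total)


def visit_pages(initial_num_pages, true_total):
    """Loop over a named list of pages, extending it as new pages appear."""
    pages = list(range(1, initial_num_pages + 1))  # a list, so it can grow
    visited = []
    for p in pages:
        visited.append(p)  # do the per-page work here
        last_seen = last_visible_page(p, true_total)
        if last_seen > pages[-1]:
            # Only append at the end; the for loop picks up the new items.
            pages.extend(range(pages[-1] + 1, last_seen + 1))
    return visited


if __name__ == "__main__":
    print(visit_pages(7, 23))
```

Appending to the end of the list is the safe case here; inserting or removing items before the current position during the loop would cause pages to be skipped or repeated.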

