One Path to Bankruptcy for Replit

A user tries to import a module that's not installed. Instead of bumping them and telling them to use a different workspace on which the module *is* installed, Replit uses compute resources to try to install it.. nuts.. Starting with :
from transformers import TFAutoModelForSeq2SeqLM, AutoTokenizer
import gradio as gr
model = TFAutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
def gen_text(input_string, max_length):
    inputs = tokenizer(input_string, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=max_length)
    final_text = tokenizer.batch_decode(outputs[0], skip_special_tokens=True)
    return (final_text)
demo = gr.Interface(
    fn=gen_text,
    ...

Frank Andrade and Python Automation : All That Can Go Wrong

Video : https://www.youtube.com/embed/PXMJ6FS7llk. Why watch it? So I know what's doable and can then go to freelancers on Upwork to build me stuff..

  1. If you haven't already, install lxml before you get thrashed by pd.read_html(). If plain pip on your system points at Python 2, use pip3 so the package lands in Python 3.
>>> pd.read_html( "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population")

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\pandas\io\html.py:872, in _parser_dispatch(flavor)
    870 else:
    871     if not _HAS_LXML:
--> 872         raise ImportError("lxml not found, please install it")
    873 return _valid_parsers[flavor]

ImportError: lxml not found, please install it

Didn't work! Restarted the notebook. Didn't help!

Tried 

!pip3 install lxml

in the notebook and it shows a fresh install. And? Still doesn't work. 

Note that read_html *does* work in a console instance of python (IDLE). So why does it fail in the jupyter notebook?

Time for Upwork 😊 And?

By the power of Shehroz. Don't just go by what WSL's python shows you. Your Jupyter notebook might be running a completely different Python version (mine was 3.9, something to do with the install that Visual Studio Code had done, while WSL had 3.8). What should you do? In Jupyter, open a terminal (unless you like typing commands with a ! prefix in the notebook) and run :
pip3 uninstall pandas
pip3 uninstall lxml
pip3 install pandas
pip3 install --upgrade lxml
And now you should be good to go. Bottom line - forget what you see in the terminal from which you launch Jupyter. Use the terminal within Jupyter.
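
A quick way to see which Python a notebook is really running is to ask the kernel itself. This is a minimal sketch, not part of the original fix: `pip_install_command` is a hypothetical helper name, and it only builds the command so that pip is tied to the same interpreter that runs the cell.

```python
import sys


def pip_install_command(package, upgrade=False):
    """Build a pip command bound to the interpreter running this code."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.append(package)
    return cmd


# Which Python is this kernel (or console) actually running?
print(sys.executable)
print(sys.version)

# Printed rather than executed here; pass the list to subprocess.run(...)
# from a notebook cell to install into the kernel's own environment.
print(" ".join(pip_install_command("lxml", upgrade=True)))
```

Run the same cell in the notebook and in the console where read_html works. If the two paths differ, that mismatch is the likely culprit.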

Read all tables from a web-page into a list : 

tennis = pd.read_html("https://en.wikipedia.org/wiki/List_of_Grand_Slam%E2%80%93related_tennis_records")

  2. Ensure you're not shooting yourself in the foot by (unintentionally) running Jupyter on a different host than you think you are, because a second instance is using the same port as an already-running one. This can happen if you start the first instance from a Git bash terminal (mingw64) and the second one from WSL.
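
Before launching a second instance, it can help to check whether something is already listening on the port. A rough sketch using only the standard library: `port_in_use` is a hypothetical helper, 8888 is Jupyter's default port, and whether a check from WSL sees a server started from Git bash depends on how localhost is forwarded on your machine.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0


# 8888 is Jupyter's default port; adjust if you launch with --port.
print(port_in_use(8888))
```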
IMO Frank (using PyCharm BTW) could work on his quality. Literally every step can take hours based on what can go wrong..

Within 15 minutes, he is into reading tables from PDFs. This guy is obviously good, but with no thought given to what can go wrong, what value is he really contributing other than a glorified table of contents?

The thing is, this video is not that old - barely over a month. He gets this to work :
tables[0].export('/tmp/poker.csv')
And I get :
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [22], in ()
----> 1 tables[0].export('/tmp/poker.csv')

AttributeError: 'Table' object has no attribute 'export'
Could it be a Mac vs PC thing? If I do :
tables.export('/tmp/poker.csv', f='csv', compress=True)  # creates poker.zip
I get a zip file that has a CSV when unzipped. 
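
The AttributeError is consistent with how Camelot splits its API, as far as I can tell: export() belongs to the list that read_pdf returns, while a single table is written out with to_csv(). A minimal sketch, with `save_table` as a hypothetical helper name and the paths as placeholders:

```python
def save_table(tables, index, csv_path):
    """Write one table from a Camelot result list to a CSV file."""
    table = tables[index]
    # A single Table has to_csv(); export() lives on the whole list.
    table.to_csv(csv_path)
    return csv_path


# Usage, assuming camelot is installed and foo.pdf is a placeholder file:
#   import camelot
#   tables = camelot.read_pdf("foo.pdf")
#   save_table(tables, 0, "/tmp/poker.csv")
```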

What I take issue with is using only the pre-canned example from the Camelot documentation page. That one works fine.

Why doesn't this one work? Make a table in Excel. Paste that into PowerPoint and Save As PDF. Open up that PDF and you can select text. That, per the Camelot documentation, is good enough. Try that with Camelot and you get zilch. Why? Go figure..

I can select text but Camelot fails to extract table from PDF

For a change, Google actually proves useful and I find that you have to specify the "flavor". By default, read_pdf uses flavor='lattice', which assumes the table cells are demarcated with lines. Not true? Then specify flavor='stream'

In this particular case, if you do this, you win :
tbl2 = camelot.read_pdf('/home/ananth/win/junk/camelot_test.pdf', pages='1', flavor='stream')
tbl2.export('/home/ananth/win/Downloads/camelot_test.csv', f='csv')

Which makes you wonder about the guys who built this package. What are users supposed to know about the PDF? What if you wanted to scour all tables and process them and they're not all a uniform flavor? What then?
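
One way to cope with PDFs of mixed flavor is to try each flavor in turn and keep the first one that finds anything. This is a rough sketch, not something taken from the Camelot docs: `read_tables_any_flavor` and its `reader` parameter are hypothetical names (the `reader` hook is only there so the logic can be exercised without a PDF), and "found at least one table" is a crude test of success.

```python
def read_tables_any_flavor(path, pages="1", reader=None,
                           flavors=("lattice", "stream")):
    """Try each flavor in order; return (flavor, tables) for the first hit."""
    if reader is None:
        # Imported lazily so the function can be tested without camelot.
        import camelot
        reader = camelot.read_pdf
    for flavor in flavors:
        tables = reader(path, pages=pages, flavor=flavor)
        if len(tables) > 0:
            return flavor, tables
    # Nothing found with any flavor.
    return None, []


# Usage, assuming camelot is installed:
#   flavor, tables = read_tables_any_flavor("camelot_test.pdf")
#   if flavor is not None:
#       tables.export("camelot_test.csv", f="csv")
```

Doing this page by page would handle a document whose tables are not all the same flavor, at the cost of parsing some pages twice.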

Moving on. Next, is a quick intro to XPath to extract tags. That playground is a fun thing - whoever set that up - thanks 😊

I didn't know you needed an independent install of Chrome in WSL2 to be able to use Selenium there. Probably enough reason (conserving disk space) for MS to come up with more integration. Anyhow, here's all I had to do. BLUF - no audio, and fixing *that* needs a true geek 😊
sudo apt-get update
sudo apt-get install -y curl unzip xvfb libxi6 libgconf-2-4
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
google-chrome --version
wget https://chromedriver.storage.googleapis.com/103.0.5060.134/chromedriver_linux64.zip
rm google-chrome-stable_current_amd64.deb
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/bin/
sudo chown root:root /usr/bin/chromedriver
sudo chmod +x /usr/bin/chromedriver
chromedriver --version
which chromedriver
This bit is cool - if you get to seeing "Chrome is being controlled by automated test software", Congratulations 😊
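
The chromedriver download above is pinned to the Chrome version that was installed, and the two need to agree at least on the major version. A small sketch for comparing the output of the two --version commands: the helper names are hypothetical, and the sample strings are assumed formats rather than captured output.

```python
import re


def major_version(version_output):
    """Pull the major version number out of a --version string."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """True when Chrome and chromedriver share the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


# Sample strings only; paste in what the two --version commands print.
print(versions_match("Google Chrome 103.0.5060.134",
                     "ChromeDriver 103.0.5060.134"))
```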

All it takes is (this is NOT headless mode, so it will throw up a Chrome window) :
from selenium import webdriver
from selenium.webdriver.chrome.service import Service # wasn't needed in SE3, but needed in SE4

website = "https://finance.yahoo.com/"
cdrv = "/usr/bin/chromedriver"

service = Service(executable_path=cdrv)
driver = webdriver.Chrome(service=service)
driver.get(website)
It would be nice to know how to gracefully shut down Chrome and the driver, wouldn't it? 😊
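
For the shutdown, driver.quit() closes the browser windows and stops the chromedriver process. One way to make sure it always runs is a small context manager. A sketch under assumptions: `managed_driver` and `make_driver` are hypothetical names, and the usage lines reuse the paths from above.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Yield a driver from make_driver() and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # Runs even if the body raises, so no stray Chrome is left behind.
        driver.quit()


# Usage, assuming selenium is installed and chromedriver is in /usr/bin:
#   from selenium import webdriver
#   from selenium.webdriver.chrome.service import Service
#   make = lambda: webdriver.Chrome(
#       service=Service(executable_path="/usr/bin/chromedriver"))
#   with managed_driver(make) as driver:
#       driver.get("https://finance.yahoo.com/")
```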
OK, more surprises..
Selenium doesn't like an XPath that ends in text() :
elements = driver.find_elements(by="xpath", value='//h2[contains(@class,"Fz")]/text()')
------------------------------------------
InvalidSelectorException: Message: invalid selector: The result of the xpath expression "//h2[contains(@class,"Fz")]/text()" is: [object Text]. It should be an element.
  (Session info: chrome=103.0.5060.134)

Nice
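
The message itself points at the fix: find_elements wants the XPath to resolve to elements, not text nodes. A sketch of one workaround is to drop the trailing /text() and read each element's .text instead. `element_texts` is a hypothetical helper name, not a Selenium API.

```python
def element_texts(driver, xpath):
    """Return the visible text of every element matched by xpath."""
    suffix = "/text()"
    # Selenium can only return elements, so strip a trailing text() step.
    if xpath.endswith(suffix):
        xpath = xpath[:-len(suffix)]
    elements = driver.find_elements(by="xpath", value=xpath)
    return [element.text for element in elements]


# Usage with the driver created earlier:
#   headlines = element_texts(driver, '//h2[contains(@class,"Fz")]/text()')
```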
