

Ten Minutes with a World Class Data Scientist


I recently got access to a thought leader who is in such demand that he is in a different country every week. Here's how it went :

1. You've done a lot of online courses. If you had to pick just five to recommend to someone, the ones that in your opinion give the best return on investment, which would they be?

 i) Question Formulation Technique (QFT) from the Right Question Institute in Boston - how to ask the right question

 ii) Cambridge Advanced Leadership Programme

 iii) MIT : Inquisitive Data Science (I couldn't find this one. Do I have the name right?)

 iv) Coursera : Philosophy of Science

 v) Google's Digital Marketing Certificate (how to use social media, etc)


2. You mentioned in the .. interview that there are 500k datasets available on Americans. Could you name a few that are worth getting comfortable analyzing in Python?

 i) The US Population Census is a gold mine of data

 ii) Bureau of Labor Statistics datasets


3. Do you have any side projects that you are not able to devote enough time to that I could maybe help out with so I gain experience?

A : I am interested in the future of work: which kinds of jobs will grow in the future, how trends are shifting between the pre- and post-COVID landscape, and how the job market has changed.


4. Do you know anyone else who is trying to ramp up to get to your level that you would recommend I connect with?

A : Abhishek Thakur is a 4x Kaggle GM - big in ML. (I obviously wasn't clear enough :)


5. What are some mistakes you made or you see others making on their learning journey that cost time and effort?

A : My biggest regret was not letting go of the stick-it-out, do-not-give-up mentality, which kept me in toxic work environments much longer than I otherwise would have stayed.


6. You mentioned you do still spend time doing actual data analysis. How much time do you spend on data cleaning?

A : 80% (comment : it's interesting that your clients don't take the time to clean the data ahead of time, which would get them more value from your time)


7. What are some utilities you wish you had?

A : It would be good to have utilities that :

  i) check whether the dataset has been fabricated: either the data follows a mathematical pattern and is not truly random, or it is random but its distribution does not match what the underlying phenomenon typically produces.

  ii) perform fact checking, i.e. cross-validate key metrics derived from the data against other available sources. For example, if the population data yields an average age of 80, flag that value after checking it against established data sources. Basically, establish the trustworthiness of the data.
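
To make the two wishes a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not something the interviewee described: the function names are hypothetical, the Benford's-law leading-digit test is one technique I swapped in for "the distribution does not match the underlying phenomenon" (it only makes sense for data spanning several orders of magnitude), and the reference value in the usage note is a placeholder, not a real statistic.

```python
import math
from collections import Counter


def follows_arithmetic_pattern(values, tol=1e-9):
    """True if consecutive differences are all (nearly) equal, e.g. 10, 20, 30."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return len(diffs) >= 2 and max(diffs) - min(diffs) <= tol


def benford_chi_square(values):
    """Chi-square statistic of leading digits against Benford's law.

    With 9 digit categories there are 8 degrees of freedom, so a statistic
    above roughly 15.51 is suspicious at the 5% level.
    """
    # Scientific notation puts the leading non-zero digit first: 123 -> '1.23e+02'
    digits = [int(f"{abs(v):e}"[0]) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


def flag_against_reference(metric, reference, rel_tol=0.25):
    """True if metric differs from a trusted reference by more than rel_tol."""
    return abs(metric - reference) > rel_tol * abs(reference)
```

For example, `flag_against_reference(80, 38.0)` returns `True`, where 38.0 stands in for whatever figure an established source actually reports. The tolerance of 25% is arbitrary and would need tuning per metric.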

Follow up :

Regarding the fabrication flagging and cross-checking, you might be interested in Kristin Sainani's (Stanford) webinar "How to Be a Statistical Data Detective" : https://www.youtube.com/watch?v=JG_gCIGFaQI. She mentions statcheck.io and GRIM as two tools for analyzing published research. Both operate on the publication, not the dataset.
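
For a sense of what GRIM checks, here is a rough Python sketch of the idea as I understand it: when responses are whole numbers, a mean reported for a sample of size n must equal some integer total divided by n, so certain reported means are impossible. The function name and the simple rounding rule are my own assumptions. The real test is more careful about rounding conventions, so treat this as an illustration rather than the reference implementation.

```python
def grim_consistent(reported_mean, n, decimals=2):
    """True if some integer total over n whole-number responses
    rounds to the reported mean at the given number of decimals."""
    target = round(reported_mean, decimals)
    approx_total = int(reported_mean * n)
    # Try the integer totals nearest to mean * n
    for total in range(approx_total - 1, approx_total + 3):
        if round(total / n, decimals) == target:
            return True
    return False
```

With n = 28, the achievable means near 5.19 are 145/28 (about 5.18) and 146/28 (about 5.21), so `grim_consistent(5.19, 28)` returns `False` while `grim_consistent(5.18, 28)` returns `True`.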
