Learning action point:<\/strong> Importing modules and files in Python has always been a bit of black magic to me. I need to spend some time to really understand how the import process works.<\/p>\nInitially I was just going to store the image data as a blob in the database, but that is rarely a good idea, and even for this little learning\/toy project it just wouldn’t feel right. So after a bit of googling it seemed that one option would be to use SQLAlchemy-ImageAttach. But this turned out to be more of a time sink than a way of actually learning the bits I wanted to learn about.<\/p>\n
So instead I will just have a column in the table to keep a reference to where the image lives on disk after it has been downloaded. Simples.<\/p>\n
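As a sketch of what that on-disk reference might look like, here is a small helper that derives a path from the Lego ID and the image URL's extension. Note that `IMAGE_DIR` and the naming scheme are my own assumptions for illustration, not something the project has defined yet:

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical base directory for downloaded images (an assumption,
# not part of the project yet).
IMAGE_DIR = Path("/var/lib/legosets/images")

def image_path_for(lego_id, image_url):
    """Build the on-disk path that would be stored in the ImagePath column.

    The file is named after the unique Lego ID, keeping the extension
    from the original image URL (defaulting to .jpg if there is none).
    """
    suffix = Path(urlparse(image_url).path).suffix or ".jpg"
    return IMAGE_DIR / f"{lego_id}{suffix}"

# image_path_for(21322, "https://live.staticflickr.com/65535/49698377257_be3f773feb_b.jpg")
# -> /var/lib/legosets/images/21322.jpg
```

Keying the filename on the Lego ID (which is unique in the table) means the path can always be rebuilt from the row, but storing it in the column keeps queries like "which sets still need a download" simple.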
Altering the Database<\/h2>\n I will be altering the table directly in MySQL Workbench and then updating the models.py file.<\/p>\n
Add a new column called ImagePath as VARCHAR(4096), which is the maximum path length on Linux.<\/p>\n
ALTER TABLE `openfaasdb`.`legosets` \nADD COLUMN `ImagePath` VARCHAR(4096) NULL AFTER `ImageURL`;\n<\/code><\/pre>\nUpdated the models.py file and ran the local test (python indev.py)<\/p>\n
class LegoSet(Base):\n __tablename__ = 'legosets'\n id = Column('pkID', Integer, primary_key=True, nullable=False)\n legoID = Column('LegoID', Integer, unique=True, nullable=False)\n description = Column('Description', String(200))\n productURL = Column('ProductURL', String(4096))\n imageURL = Column('ImageURL', String(4096))\n imagePath = Column('ImagePath', String(4096))\n<\/code><\/pre>\nGet the list of Lego sets that require an image download<\/h2>\n The scheduled function will need a list of the Lego sets that still require the image to be downloaded.<\/p>\n
The query will simply return Lego sets that have an ImageURL set but no ImagePath set (returning at most the number of rows specified by the limit parameter). For now we don’t care whether an image is already in the process of being downloaded, or whether the image doesn’t exist and we should give up after X retries.<\/p>\n
# models.py\nfrom sqlalchemy import and_\n...\ndef get_legosets_that_need_an_image_download(session, limit=10):\n # Sets that have an ImageURL set but no ImagePath yet\n legosets = session.query(LegoSet).filter(\n and_(\n LegoSet.imageURL.is_not(None),\n LegoSet.imagePath.is_(None)\n )\n ).limit(limit)\n return legosets\n<\/code><\/pre>\nTo test that this query returns the expected sets, I added some dummy data.<\/p>\n
INSERT INTO `openfaasdb`.`legosets` (`LegoID`, `Description`, `ProductURL`) VALUES ('6000', 'Does not have an ImageURL', 'Blah');\nINSERT INTO `openfaasdb`.`legosets` (`LegoID`, `Description`, `ProductURL`, `ImageURL`, `ImagePath`) VALUES ('6001', 'Does have both', 'Blah', 'blah', 'blah');\nINSERT INTO `openfaasdb`.`legosets` (`LegoID`, `Description`, `ProductURL`, `ImagePath`) VALUES ('6002', 'Have path but not image url', 'Blah', 'blah');\n<\/code><\/pre>\nAdding a new API endpoint that the scheduler will use to download images<\/h3>\n The cron scheduler will be invoking a PUT to \/legosets-download-images. This function will get the list of Lego sets that require images and will then use a task queue to do the actual work. The response will simply be the Lego IDs and image URLs that will be processed next.<\/p>\n
… and yes I know this is not really RESTful.<\/p>\n
# handler.py\n\n# Added to the hacked router\nelif event.method == 'PUT':\n if event.path == '\/legosets-download-images':\n response = download_legoset_images()\n...\ndef download_legoset_images():\n session = create_database_session()\n legosets = get_legosets_that_need_an_image_download(session, limit=2)\n result = []\n\n for legoset in legosets:\n result.append({\n 'legoID': legoset.legoID,\n 'imageURL': legoset.imageURL\n })\n\n session.close()\n return {"sets": result}\n<\/code><\/pre>\nTested locally<\/p>\n
$ python indev.py\n\/legosets-download-images\n{'sets': [{'imageURL': 'https:\/\/live.staticflickr.com\/65535\/49698377257_be3f773feb_b.jpg',\n 'legoID': 21322},\n {'imageURL': 'https:\/\/images.brickset.com\/sets\/AdditionalImages\/21325-1\/21325-1_95850.jpg',\n 'legoID': 21325}]}\n<\/code><\/pre>\nNext actions<\/h2>\n Tomorrow I will start exploring a task queue. I have used Celery in the past, but recently I learned about Redis Queue, so I am keen to explore that.<\/p>\n<\/div>\n
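To get a feel for the producer\/worker shape before picking a library, here is a miniature stand-in for a task queue using Python's stdlib queue and a worker thread. This is just a sketch of the pattern, not the Redis Queue API, and the "download" step is stubbed out:

```python
import queue
import threading

# Jobs are (legoID, imageURL) tuples, mirroring the endpoint's response.
jobs = queue.Queue()
downloaded = []

def worker():
    # Pull jobs until the producer signals shutdown with None.
    while True:
        job = jobs.get()
        if job is None:
            break
        lego_id, image_url = job
        # Real code would fetch image_url to disk here and update the
        # ImagePath column; this sketch just records which set was handled.
        downloaded.append(lego_id)
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# The PUT handler would enqueue the sets needing images.
for lego_id, image_url in [(21322, "https://..."), (21325, "https://...")]:
    jobs.put((lego_id, image_url))

jobs.join()     # wait until every enqueued job has been processed
jobs.put(None)  # tell the worker to exit
t.join()
print(downloaded)  # [21322, 21325]
```

With a real broker like Redis Queue, the enqueue call and the worker would live in separate processes, which is the part this in-process sketch cannot show.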
100 Days of Learning: Day 15, 16 & 17 - André Jacobs<\/title>\n