CHEMICAL BLOGSPACE HEADLINES

Monday, November 26, 2007

60 day old blog posts still important?

Last week some of you noted that Chemical blogspace was no longer nicely updated every X hours. The code is not optimized for performance, and the number of old posts (about 10k in 6 months!) made update_posts.pl break down. Or really, the DreamHost service killed the process because it was taking too long. Fair; it's their way to keep the whole system from breaking down over one website. I had already split the processing into two separate jobs: the first 75 blogs, and the rest.

So, now I have made a more radical change, and deleted all posts older than 60 days. That's ancient history, isn't it?
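A minimal sketch of such a pruning step, not the actual Cb code. It assumes a `posts` table with a `pubdate` column stored as ISO-formatted text (both names are hypothetical) and uses an in-memory SQLite database instead of the MySQL server, so it can be tried without touching real data:

```python
import sqlite3
from datetime import datetime, timedelta


def prune_old_posts(conn, now, max_age_days=60):
    """Delete posts older than max_age_days; return the number of rows removed."""
    # ISO-formatted timestamps sort correctly as plain strings.
    cutoff = (now - timedelta(days=max_age_days)).strftime("%Y-%m-%d %H:%M:%S")
    cur = conn.execute("DELETE FROM posts WHERE pubdate < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


def demo():
    """Build a toy table (hypothetical schema) and prune it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (post_id INTEGER PRIMARY KEY, title TEXT, pubdate TEXT)"
    )
    conn.executemany(
        "INSERT INTO posts (title, pubdate) VALUES (?, ?)",
        [
            ("recent post", "2007-11-20 10:00:00"),
            ("old post", "2007-08-01 10:00:00"),
        ],
    )
    removed = prune_old_posts(conn, now=datetime(2007, 11, 26))
    remaining = [row[0] for row in conn.execute("SELECT title FROM posts")]
    return removed, remaining
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the cutoff reproducible when testing.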

Wednesday, November 7, 2007

Blog Awards with two Cb candidates

There is a weblog 2007 award voting thingy, in which two blogs that are aggregated on Chemical blogspace participate in the Best Science Blog section:

They can use some help:


You have to hurry, the deadline is the 8th.

Sunday, October 21, 2007

New Blogs #8

It has been a bit quiet with new blog items during the summer, but 15 new blogs have been added since New Blogs #7. These are the new blogs that entered Chemical blogspace in the last month:


Quite a few temporary, thematic blogs this time; that's a new evolution of blogspace.

Monday, September 24, 2007

InChIKey now added to Chemical blogspace

Using the InChI webservices as introduced by Anthony earlier, I added InChIKeys to the Chemical blogspace Molecules section:

BTW, the molecules were picked up because the (1-3)-beta-D-glucan - How moulds can make you wheeze and sneeze and What I'm up against... items linked to the glucose entry in Wikipedia.

Thursday, September 20, 2007

IUPAC/InChI joins the Microsoft BioIT Alliance

On the CHMINF-L mailing list it was reported that the IUPAC InChI/InChIKey project joins the Microsoft BioIT Alliance. Quoting:

    The establishment of the BioIT Alliance in April 2006 by Microsoft and leading organizations in the life science industries was very much a reflection of this scenario, and the Alliance has now been extended to include a major Scientific Union, the International Union of Pure and Applied Chemistry (IUPAC). The importance of IUPACs contribution to the enterprise lies primarily in the responsibility of this organization for establishing standards for transmitting chemical information.

I am politically quite illiterate, and have no idea why the BioIT Alliance could not use InChI or InChIKey without this mashup. However, I am comforted in the knowledge that the IUPAC InChI/InChIKey project will make sure that Microsoft will not use the InChIKey as a drug identifier, which would give very nice (but lethal) Millennium-bug-like situations when that unlikely key clash was found ;)

Wednesday, September 19, 2007

10.000 chemistry blog posts!

Chemical blogspace is about to hit its 10.000th blog post since the start less than a year ago! Cheers to all bloggers!

Saturday, August 25, 2007

Clean up of RSS feeds

The number of registered RSS feeds is slightly below 150 chemistry oriented blogs; a number of them have ceased to exist, or moved to a different host, or simply changed URL. I just spent an hour or so debugging these changes, as the Cb software does not really report them. But things are now updated again.

Friday, August 17, 2007

cb.openmolecules.net DNS problem?

In an attempt to get rdf.openmolecules.net online, it seems that the DNS system got confused; at least, the cb.openmolecules.net server is not reachable... more on this soon.

Update: OK, the DNS seems to be updated, maybe fixed itself, or some propagation problem. Anyway, now the MySQL server has gone down. So, still no running Cb :(

Thursday, August 9, 2007

RDF in Chemical blogspace

I have been playing with RDF-ing molecular space a bit, and the results are slowly taking shape:



This screenshot shows the HTML created for this RDF file on the fly by the web browser. The current version extracts information from Chemical blogspace and ChEBI (not much yet, though :).

The molecule pages in Chemical blogspace link to this RDF, and that's really what I wanted to show right now:



Why do I do this? Among other things, to get all boiling points for some compound with one (SPARQL) query, instead of having to browse several web resources manually. Oh, BTW, the HTML view of the RDF document uses chemical RDFa.

Tuesday, July 17, 2007

Userscripts you forget about...

Peter writes about a Greasemonkey userscript pop-up:

    What are the Pg and Cb all over the TOC. When you bring up the page they aren’t there! What’s happened? Well the chemical blogosphere has posted about several articles here and mentioned their DOIs. The Blue Obelisk has developed a Gresemonkey script (which is a Firefox plugin) which reads the TOC and sees if any DOIs have been mentioned in the Chemical Blogosphere. And, in this case, three articles have been.

This is the screenshot he made:

For the obligatory statistics: Cb now discusses 1184 articles. Because I have trouble accessing the Postgenomic website, I cannot give that number :(

Wednesday, July 4, 2007

The rise of Chemical blogspace

The Chem Blog wrote about WTF is up with the Science blogosphere?, discussing a podcast on the chemical blogosphere, and wondering why it is larger than that of other natural science fields and mathematics. It is surprising indeed, because chemists generally are very conservative when it comes to anything to do with a monitor, mouse and keyboard. The argument put forward in the podcast is that there are a few strong voices amongst our blogs.

I am hoping that Chemical blogspace helps a bit here too. With about 40 unique users a day, traffic is somewhat lower than it was before the move to the new server, when it was around 60 visitors each day, but I am sure this will recover.

The number of chemistry related blogs is rather large indeed, and the Cb counter is at over 136 now. Not every blog is equally active, but both the absolute number of entries and the number of active blogs per week are continuously increasing:


Interestingly, the blog had a very nice plot of the blogosphere interconnectivity. It is good practice to link to many other blogs and resources in one's entries, to keep discussions going, provide further information etc. Like Peter I was hoping that the plot would refer to Chemical blogspace, but it does not. This interconnectivity information is available from the Cb database, and I will try to create such a plot.

Wednesday, June 27, 2007

RDFa Operator in action on Cb

I reported yesterday on my efforts (and Mike's help) to get RDFa for chemistry going. I did not have time to add the new HTML code to Cb, but have done so now:

Read my Chem-bla-ics blog for the details.

More molecules in Cb

Because the use of chemical microformats and RDFa is not yet picking up, I extended Cb to detect molecules via Wikipedia. This is paying off, even though a lot of Wikipedia entries do not list InChIs: the list is much longer now, and covers a much larger set of blogs. Thanx to all who link to Wikipedia when naming a chemical compound!

New Blogs #7

These are the new blogs that entered Chemical blogspace in the last month:

Suggestions are most welcome, and thanx to those who did in the last month.

Thursday, June 14, 2007

Cb moved to new location...

Because the CUBIC has shut down, and my desktop machine there will be removed shortly, I moved the Chemical blogspace homepage to:


http://cb.openmolecules.net/

This replaces the temporary hosting at SourceForge. The webpage is now hosted by Geoff of OpenBabel.

Userscripts
If you run one of the scripts which add comments from Cb to either DOIs or InChIs, you need to adapt the script you have installed: in all URLs, replace wiki.cubic.uni-koeln.de/pg/ or wiki.cubic.uni-koeln.de/cb/ with cb.openmolecules.net/.
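For those who prefer not to edit by hand, the replacement can be sketched as a small helper. This is only an illustration: the two old prefixes and the new one are taken from the text above, while the function name and the example URL path are made up.

```python
OLD_PREFIXES = (
    "wiki.cubic.uni-koeln.de/pg/",
    "wiki.cubic.uni-koeln.de/cb/",
)
NEW_PREFIX = "cb.openmolecules.net/"


def update_userscript(source):
    """Return the userscript source with the old Cb locations replaced."""
    for old in OLD_PREFIXES:
        source = source.replace(old, NEW_PREFIX)
    return source


# Hypothetical line from a userscript; the path after the host is invented.
example = 'var url = "http://wiki.cubic.uni-koeln.de/cb/api.php?doi=";'
updated = update_userscript(example)
```

Read the installed script into a string, pass it through `update_userscript`, and save the result back.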

Tuesday, May 29, 2007

Uploaded the source code to SF SVN

You go on holiday (Sweden), just for two weeks (and get to be only 2m away from the Japanese emperor, no glass or anything in between!), and, not even back in your home country but at a workshop (Bioclipse), you discover that the machine that runs the Cb MySQL database has been abducted!

This got resolved by agreeing to hand in the machine in some three weeks, and the machine is back at its old place. Making backups now. I will move the website to the SF project for the Blue Obelisk, and just moved my copy of the postgenomic.com software to the Subversion repository.

Thursday, May 10, 2007

New Blogs #6

The previous New Blogs was not the regular month ago, but I will be on holiday for the next two weeks, and there have been nine new blogs anyway:

Suggestions (like these from Derek) are most welcome, and thanx to those who did in the last month.

Tuesday, May 8, 2007

Special Markup Howto #1

Because the Postgenomic.com website has been rather slow for me (Euan, what is causing that? Too many users?), and upon user request, I start a series of Special Markup articles here, which will give the most up to date description of the supported markup.

Marking up Conferences
Conferences can be marked up by using the @rel="conference" attribute for the <a> element. See an earlier blog item.

Marking up Articles
While the software will pick up most literature automatically, you can mark up a paper as being reviewed. Just add the @rel="review" attribute to the <a> element linking to the journal article webpage.
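To illustrate how an aggregator could pick up the two rel values described above, here is a rough sketch using Python's standard HTML parser. It is not the postgenomic.com implementation (which is written in Perl); the class and function names are hypothetical, and treating rel as a space-separated token list is an assumption about how the markup is matched.

```python
from html.parser import HTMLParser


class MarkupLinkCollector(HTMLParser):
    """Collect href values of <a> elements marked rel="review" or rel="conference"."""

    def __init__(self):
        super().__init__()
        self.links = {"review": [], "conference": []}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # Assumption: rel may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        for kind in self.links:
            if kind in rel_tokens and href:
                self.links[kind].append(href)


def collect_marked_links(html):
    """Return a dict with the review and conference links found in html."""
    parser = MarkupLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Example blog fragment; the example.org addresses are placeholders.
fragment = (
    '<p>I read <a rel="review" href="http://dx.doi.org/10.1021/ci034244p">this paper</a> '
    'before going to <a rel="conference" href="http://example.org/meeting">the meeting</a>, '
    'see also <a href="http://example.org/">this page</a>.</p>'
)
found = collect_marked_links(fragment)
```

Links without one of the two rel values are simply ignored, so ordinary links in a post are unaffected.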

Marking up Molecules
This has been described in this earlier blog item. The software will pick up the markup for SMILES and InChIs.

Microformats
There are currently no microformats supported, but this is anticipated. For example, hCalendar and hReview are likely candidates.

Thursday, April 26, 2007

Why Cb is slow: SQL query trouble

Yesterday and today I am back at the CUBIC to meet up with my former colleagues, which gives me the opportunity to properly debug the Cb server instabilities. Earlier I turned on this useful MySQL option:

log_slow_queries = /var/log/mysql/mysql-slow.log
which pointed out a serious problem:
# Time: 070427  8:30:04
# User@Host: pg[pg] @ wiki.cubic.uni-koeln.de [134.95.151.115]
# Query_time: 426 Lock_time: 0 Rows_sent: 50 Rows_examined: 4096407
SELECT t1.tag AS tag, COUNT(DISTINCT posts.blog_id) AS count FROM tags AS
t1, tags AS t2, posts WHERE t1.post_id = t2.post_id AND t1.tag != t2.tag
AND t2.tag='Visualization' AND posts.post_id = t1.post_id GROUP BY t1.tag
HAVING count >= 1 ORDER BY count DESC LIMIT 50;

The time consumed in this example, 426 seconds, is already stupidly long, but it can be even worse. The problem really seems to be the number of rows examined, which is slightly over 4 million, while the tags table only has about 35 thousand entries. The reason it is actually slow is that during this query it massively reads from and writes to the hard disk: 20-30 MB a second for about as long as the query takes. It is obvious that this leads to server instabilities.

Next step is to understand what this query is supposed to do, and why the hell it is actually examining so many rows. Euan, if you are tuned in, the blog_id and post_id columns only contain NULL, which might cause the row explosion?

Update: I tracked this query down to the functionality get_similar_tags and disabled that for now, until I get it fixed.
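To see what the query computes, the sketch below runs an equivalent statement on a toy database. Several things here are assumptions rather than facts about Cb: the schema contains only the columns named in the logged query, SQLite stands in for MySQL, the joins are written explicitly, and the two indexes are a common remedy for slow joins that has not been verified as the fix for this case.

```python
import sqlite3

# Same logic as the logged query: tags that co-occur with a given tag,
# ranked by the number of distinct blogs using them together.
SIMILAR_TAGS_SQL = """
SELECT t1.tag AS tag, COUNT(DISTINCT posts.blog_id) AS blog_count
FROM tags AS t2
JOIN tags AS t1 ON t1.post_id = t2.post_id AND t1.tag != t2.tag
JOIN posts ON posts.post_id = t1.post_id
WHERE t2.tag = ?
GROUP BY t1.tag
HAVING blog_count >= 1
ORDER BY blog_count DESC, t1.tag ASC
LIMIT 50
"""


def build_toy_database():
    """Create a small in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (post_id INTEGER PRIMARY KEY, blog_id INTEGER);
        CREATE TABLE tags (post_id INTEGER, tag TEXT);
        -- Assumed remedy: index the columns used for filtering and joining.
        CREATE INDEX idx_tags_tag ON tags (tag);
        CREATE INDEX idx_tags_post_id ON tags (post_id);
        """
    )
    conn.executemany("INSERT INTO posts VALUES (?, ?)", [(1, 10), (2, 20), (3, 20)])
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [
            (1, "Visualization"), (1, "Jmol"),
            (2, "Visualization"), (2, "Jmol"), (2, "RDF"),
            (3, "RDF"),
        ],
    )
    return conn


def similar_tags(conn, tag):
    """Return (tag, blog_count) pairs for tags used on the same posts as tag."""
    return conn.execute(SIMILAR_TAGS_SQL, (tag,)).fetchall()
```

On the real server, running the original statement through MySQL's EXPLAIN would show whether the joins use an index or scan the tables.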

Monday, April 9, 2007

Broken Cb PipeLine...

Last week, someone reported to me that his blog items were not showing up. He was right. An invalid XML file was created at some point, which broke the processing of blog items using a Perl XML module. This is fixed now, and resulted in a big blob of new items today.

Tuesday, April 3, 2007

Database server instabilities...

The machine that is running the database behind Chemical blogspace is having trouble staying alive. This has been happening since the (scheduled) power outage of this weekend. Now, since the CUBIC has shut down as an organization, my postdoc contract has ended too. As a consequence, I will not have frequent access to the machine, and remote SSH access is having hiccups too. So far, a (former) colleague (Miguel) is helping me out by rebooting every now and then, but I will work on a more permanent solution.

The best solution would be to get some permanent hosting somewhere, but without a university position where the machine could run, that is not cheap. Not too expensive either, but for a hobby... In the short term, I am considering moving the database to the frontend machine, which has a somewhat higher load, but is more stable too.

To be continued...

Wednesday, March 28, 2007

The JACS TOC featuring your review?

Yes, that is now possible: your JACS paper review from your blog in the TOC (and most other journals):


Noel adapted a Greasemonkey script by Pedro that retrieves blog items that discuss the item from Chemical blogspace and inserts them in the TOC. If you hover over the Cb icon, the comments will pop up (Noel, the bubble is a bit too comic for my taste):


The details on this and other chemical user scripts can be found here. Oh, just to make sure for those who do not yet know how this works: the script modifies the TOC only locally. That is, only browsers that have the userscript installed will see the pimped TOC.

Sunday, March 11, 2007

New Blogs #4

Already the fourth in the series (after #1, #2 and #3), and there are ten new blogs, summing up to 96 blogs now:


Suggestions are most welcome, and thanx to those who did in the last month.

Thursday, March 8, 2007

Chemical blogspace getting physical at the ACS?

Lamentations on Chemistry is proposing to meet at the next ACS meeting in Chicago. I think this is an excellent idea, and hope to meet the other bloggers in chemical blogspace in person. Mitch replied that a similar thing was done at the previous ACS meeting.

Interestingly, the program of the meeting contains talks on blogging too:

CHED 25: "Teaching organic chemistry with blogs and wikis" Jean-Claude Bradley
... The evolution in the use of blogs and wikis in the teaching of undergraduate organic chemistry classes will be described, both as convenient tools to deliver course material and as platforms for student assignments. ...

CHED 21: "Blogging the culture of chemistry" Michelle M. Francl
... The Culture of Chemistry blog explores the relationship between chemistry, chemists and everything else. The connections between science content blogs and science culture blogs will be explored using the Culture of Chemistry as a jumping off point. ...

What about Monday or Wednesday afternoon?

Tuesday, February 20, 2007

Updated CMLRSS feed

After some questions from Jean-Claude, I worked a bit on the CMLRSS feed. It now links to a page with blog items for that specific molecule. When doing that, I also noted that it was not listing the right molecules, so fixed that too; it now again shows the molecules most recently blogged about.

Sunday, February 11, 2007

Latest blogged molecules on front page

I finally found a bit of time to further work on the molecules part of the postgenomic.com software, and replaced the 'hot tags' tab on the front page, by the eight latest blogged about molecules. Additionally, the system now allows deep linking to molecules in the database via the InChI, for example for N,N,N',N'-tetramethylnaphthalene-1,8-diamine [1]. This is what the front page now looks like:



[1] InChI=1/C14H18N2/c1-15(2)12-9-5-7-11-8-6-10-13(14(11)12)16(3)4/h5-10H,1-4H3

Fewer 'Unknown titles'

Journal titles are extracted by the postgenomic.com software from CrossRef by default. However, ACS journals are not listed there, resulting in a situation that article titles are not given. Now, based on an example script from Euan, I set up a handle_acs.pl script to extract article titles differently. Over the weeks, however, more and more 'Unknown Title's showed up. Today, I added a few more ACS journal titles, and used the script to write equivalents for PubMed (which are not always picked up correctly) and BioMed Central. Most of the unknown titles are now correctly given again.

Saturday, February 10, 2007

Trouble in blogspace...

I noticed last Wednesday that not all blog items are reaching the webpage... they are found and stored in the database, but then nothing happens... I have not had time to further explore the cause, but will look at this again tomorrow. Sorry for the inconvenience.

Update: I tracked down the problem to an RDF feed URL which no longer served an RDF feed, but instead returned an HTML page saying that the page was no longer found.
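A check along the following lines could catch this kind of failure before a document reaches the rest of the pipeline. It is only a sketch with hypothetical names, not part of the Cb software: it accepts a document when it parses as XML and its root element is one of the usual feed roots (RSS 2.0, Atom, or the rdf:RDF root of RSS 1.0).

```python
import xml.etree.ElementTree as ET

# Root element names (without namespace) of RSS 2.0, Atom and RSS 1.0 feeds.
FEED_ROOTS = {"rss", "feed", "RDF"}


def local_name(tag):
    """Strip an XML namespace: '{uri}feed' becomes 'feed'."""
    return tag.rsplit("}", 1)[-1]


def looks_like_feed(document):
    """Return True if document parses as XML with a known feed root element."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        # Typical for HTML error pages, which are rarely well-formed XML.
        return False
    return local_name(root.tag) in FEED_ROOTS


atom = '<feed xmlns="http://www.w3.org/2005/Atom"><title>x</title></feed>'
rss = '<rss version="2.0"><channel><title>x</title></channel></rss>'
not_found = "<html><head><title>404 Not Found</title></head><body><p>Gone<br></body></html>"
```

A feed that fails this check can be skipped and reported, instead of silently stalling the items of all other blogs.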

Saturday, February 3, 2007

New Blogs #3

Chemical blogspace is getting more media attention, not the least from C&EN's article Bloggers Anonymous. Here's the list of recently added blogs:


I would like to ask everyone aggregated in Cb to verify that I have the classification correct. Just leave corrections here as a comment.

Friday, January 26, 2007

Cb gets a CMLRSS feed

My Chem-bla-ics blog contains a couple of blog items about CMLRSS, that is RSS feeds enriched with Chemical Markup Language. Some time ago, I wrote a plugin for CMLRSS for Bioclipse, as replacement for the Jmol and JChemPaint plugins shown in the CMLRSS article (DOI:10.1021/ci034244p).

Now, since I was looking at some bug reports to fix for the upcoming Bioclipse 1.0.1 release, I discovered that the plugin did not work well with Atom 1.0 feeds from blogger.com. Because Chemical blogspace uses Atom 1.0 too, I wanted to extend the feed with latest molecules with CML content; that is, to make it a nice CMLRSS feed.

Molecules in Chemical blogspace

I mentioned the InChI extension of Chemical blogspace before, and worked a bit more on it today. For example, I only today discovered that the 'pagination' was not working. And, for some days, I wanted the Molecules page to show all molecules being discussed, not only those that are known in PubChem.
Finally, I fixed the sorting, so that the molecules are now sorted based on the post date of the blog item that cites the molecule. It now looks like this (note the molecule count, and that the second InChI does not have an image):



The CMLRSS feed of Chemical blogspace

The next step was to actually include CML in this feed. The CML is created with OpenBabel from an MDL molfile downloaded from PubChem. BTW, this CML can be accessed from the Molecules page too, as can be seen in the above screenshot.

After some tweaking of the Atom feed of Chemical blogspace, and having Bioclipse work with URLConnections for web servers that do not return a Content-Length: field, this was the result:

Tuesday, January 16, 2007

Red/Yellow, FDA alerts and other properties

Yvonne Martin recently wrote about, in her experience, what works and what does not work in chemoinformatics/computational chemistry (DOI:10.1002/qsar.200610102). She based her conclusions on the 17 month usage of services provided via webpages at Abbott. In that period, almost 3 million (unique?) molecules were processed, for which the following properties were requested most, among a few others:

  • red/yellow alerts
  • FDA mutagenicity alerts
  • logP
  • Rule-of-Five
  • pKa
  • total polar surface area
  • solubility
Some of these can be calculated easily, and I tend towards adding those to the Molecules section of Chemical blogspace. The red/yellow and FDA mutagenicity alerts require me to define a list of substructures, and suggestions are most welcome.
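As an example of one of the easier properties, the Rule-of-Five check can be sketched as below. The thresholds are the commonly cited ones (molecular weight at most 500, logP at most 5, at most 5 hydrogen bond donors and at most 10 acceptors), and allowing a single violation follows the usual reading of the rule. The function names are hypothetical, the four descriptors are assumed to be computed elsewhere (for instance by a cheminformatics toolkit), and the numbers in the usage lines are illustrative inputs, not measured data.

```python
def rule_of_five_violations(mol_weight, logp, h_bond_donors, h_bond_acceptors):
    """Return a list describing which Rule-of-Five criteria are violated."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500")
    if logp > 5:
        violations.append("logP > 5")
    if h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    return violations


def passes_rule_of_five(mol_weight, logp, h_bond_donors, h_bond_acceptors):
    """Assumption: a compound passes when it has at most one violation."""
    violations = rule_of_five_violations(
        mol_weight, logp, h_bond_donors, h_bond_acceptors
    )
    return len(violations) <= 1


# Illustrative descriptor values only.
small_polar = rule_of_five_violations(180.2, -3.0, 5, 6)
large_greasy = rule_of_five_violations(650.0, 6.1, 2, 4)
```

Returning the list of violated criteria, rather than just a yes or no, would let the Molecules page show why a compound is flagged.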

Martin discusses the limited accuracy of the computational calculation of LogP, pKa and other properties, but concludes that they are nevertheless used in deciding which compounds to synthesize.

Organic Chemists?

Now, these properties are mostly pharma oriented, and, therefore, I would like to ask the organic chemists on our blogspace what properties they would like to see added.

Thursday, January 11, 2007

New Blogs #2

Since New Blogs #1, I have added these new blogs: