ProsePoint Express: hosted newspaper website content management software

using node_import to import CSV archived data

18 replies [Last post]
Matthew
Matthew's picture
User offline. Last seen 28 weeks 3 days ago. Offline
Joined: 07/04/2009
Posts:

Beng, we've been talking about this some back in the day before this forum when it was just email back and forth.  But I'm still stymied.  I'm using node_import to import a CSV file full of about 1300 records of past stories from a previous web site.  After massaging the database of the original web site's data I've been able to make it cooperate with importing every field - the original date created becomes the "authored on" date for the story node in prosepoint, the title goes into the title area, the body text goes into the body text area, and even the channels are imported from old channels correctly. I've even added a user "archivist" in the old database that will import to be the source for the story in prosepoint.  Everything works just fine on test imports, except one thing.

Its the custom Date field, when the story is supposed to be published.  The original date field from the CSV file gets imported as the "Authored on:" date for the node, which I guess is one of the core node values.  The "Date" custom field in Prosepoint, the one that you can use to assign a date that the content is listed as published with a drop-down calendar, does not have accurate data.  For some reason all stories are reported as being published on <date note_import was run>, 0009.  I guess that's 6 years after the birth of Christ.  This is just not acceptable for a magazine site with 5 years worth of monthly-released stories.

The "Date" custom field in question is not even an option to be set when running node_import, so I guess its some sort of custom CCK field that is not supported yet.  However, I have waited 4 months for node_import to support my problem and I'm just out of time.  I think what I basically need is a way to take the date in the "Authored on:" field of a Story node and copy it over to the "Date" published date field CCK thingiemabob.  Right?  Any clue on how I can do that, and just do it on stories with the source named "archivist" ?

beng
User offline. Last seen 1 week 3 days ago. Offline
Joined: 27/02/2009
Posts:
It's just the Date CCK field which is causing trouble?

Hi Matthew,

The last time we discussed this, we agreed that Node Import didn't support CCK fields. Is it the case now that it does support CCK fields sufficiently, but it's just the Date CCK field which is causing trouble?

I think some custom code or module is probably the way to go. It wouldn't be more than a few paragraphs of code, I think.

I'm more worried about scalability. If you have thousands of nodes, processing all of them in one page request may hit the PHP execution time or memory limit.

And if you are going to do this in stages, you'd need some way to remember which nodes have been done.

I can write a module to do the date conversion. The question is whether I can justify spending the time on it.

Beng

Matthew
Matthew's picture
User offline. Last seen 28 weeks 3 days ago. Offline
Joined: 07/04/2009
Posts:
They have since added support

They have since added support for some CCK fields, but I don't see it extending to the Date CCK field.

Would it be possible to fix this all up at the database level rather than the pho level.  (haha, I'm going to leave that typo in because i just realized I need some pho.  Mmmm, pho.)  Anyways, would it be possible to just run an SQL script to update everything once?

beng
User offline. Last seen 1 week 3 days ago. Offline
Joined: 27/02/2009
Posts:
Ah, good.

> They have since added support for some CCK fields, but I don't see it extending to the Date CCK field.

Ah, good. That's most of the required functionality done then.

> Would it be possible to fix this all up at the database level

I'm not entirely certain, but maybe, although the query would be somewhat complex. I haven't done a complex mysql query for a long time, so I'll have to go brush up a bit.

Matthew
Matthew's picture
User offline. Last seen 28 weeks 3 days ago. Offline
Joined: 07/04/2009
Posts:
well, I'll also try to look
well, I'll also try to look into this, now that i have a few days respite from the magazine publishing grind again.
beng
User offline. Last seen 1 week 3 days ago. Offline
Joined: 27/02/2009
Posts:
Try this tentatively

Hi,

Try this query:

UPDATE `content_type_story` content_type_story SET `field_date_value` =
  (
    SELECT DATE_FORMAT(FROM_UNIXTIME(n2.created), '%Y-%m-%dT00:00:00') FROM `node` n2 where n2.vid = content_type_story.vid
  )
  WHERE content_type_story.vid in
  (
    SELECT node.vid FROM `node` node LEFT JOIN `content_field_source` content_field_source ON node.vid = content_field_source.vid WHERE node.type = 'story' AND content_field_source.field_source_nid = 9
  )

The '9' at the end should be the node id of your archivist source node.

Please, please, do this on a backup system. Preferably on a small test set first. It's only a first cut, so there might be errors in the query.

The syntax used is from phpMyAdmin. You may have to adjust the quoting for your situation (and table prefixes if you are using that).

Let me know how you go.

beng
User offline. Last seen 1 week 3 days ago. Offline
Joined: 27/02/2009
Posts:
And, this assumes the current

And, this assumes the current timezone of your server. And you might need a semicolon at the end.

beng
User offline. Last seen 1 week 3 days ago. Offline
Joined: 27/02/2009
Posts:
Any luck?

Any luck?

Matthew
Matthew's picture
User offline. Last seen 28 weeks 3 days ago. Offline
Joined: 07/04/2009
Posts:
i haven't had a chance to try
i haven't had a chance to try it out yet, but I'll let you know if it works when i do!
Matthew
Matthew's picture
User offline. Last seen 28 weeks 3 days ago. Offline
Joined: 07/04/2009
Posts:
well crap, since i upgraded

well crap, since i upgraded to Prosepoint 0.21 I can no longer set the status of the stories directly to archived during import.  It seems like I can't set any workflow-related things during import.

Also, the latest node_import or 0.21 prosepoint (not sure which) do not cooperate anymore to recognize the By.  So I'm getting stories without any author at all.  I'll have to look up the value for no author and use that in the SQL statement.  I'm guessing NULL.  or is it '' ?

Matthew
Matthew's picture
User offline. Last seen 28 weeks 3 days ago. Offline
Joined: 07/04/2009
Posts:
the query executed

the query executed successfully, but affected 0 rows with NULL.  also affected 0 rows with ''.  Affected some rows when using a real source ID.  What can i put in there to identify only stories with no source field?

Or, alternatively, the "authored by:" value is archivist.  how can I reference that?  Sorry about asking so many questions about SQL.  Normally I think I understand databases but drupal's underlying database is a labyrinthine warren of confusion and despair.