Convert a huge txt-file into a dataset
My friend has a huge text log of sea levels and wants to organize it into a dataset.

After importing the file, I used StringSplit to separate it into rows and then into individual elements:

    rawData = Import["rawData.txt"];
    splitRawData = StringSplit[rawData, "%%"];
    dataIwant = splitRawData[[19]];
    FullForm[dataIwant];
    splitDataIntoRows = StringSplit[dataIwant, "\n"];
    splitData1 = StringSplit[splitDataIntoRows, " "];

I want to use this function to split the data into 6 columns:

    convertListToAssociation =
      list \[Function]
        AssociationThread[{"Time (kyr BP)", "Sea level (m)", "T_NH(deg C)",
          "T_dw (deg C)", "delta_w", "delta_T"}, list]

What further steps should I take?

string-manipulation data dataset data-structures
asked 6 hours ago by Artem Anisimov
2 Answers
Since the dataset is quite large, you should actually work with arrays in this case. You can import the table in one go as follows:

    data = Import[
      "rawData.txt",
      "Table",
      "HeaderLines" -> 19
    ];
    columns = Transpose[Developer`ToPackedArray[N[data]]];

I extracted only the data columns, without the column titles, so that they can be stored in a packed array. This should considerably speed up further work with the data.
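If a Dataset with named columns is still wanted, as the question asks, the numeric rows can be threaded against the column titles. This is only a minimal sketch: `keys` reuses the titles from the question, `sample` is a small made-up stand-in for the imported `data`, and `ds` is a hypothetical name.

```mathematica
(* Column titles as given in the question *)
keys = {"Time (kyr BP)", "Sea level (m)", "T_NH(deg C)",
   "T_dw (deg C)", "delta_w", "delta_T"};

(* Made-up sample rows standing in for the imported numeric table *)
sample = {{-3.0, -1.2, 0.1, 0.2, 0.01, 0.3},
   {-2.9, -1.1, 0.1, 0.2, 0.01, 0.3}};

(* Build one association per row and wrap the list in a Dataset *)
ds = Dataset[AssociationThread[keys, #] & /@ sample];

(* Extract a column by name *)
ds[All, "Sea level (m)"]
```

With the real file, `sample` would be replaced by `data`. A Dataset of associations is likely to use more memory than the packed array, so for a very large log it may be worth keeping `columns` for the numerical work and using the Dataset only for inspection.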
answered 5 hours ago by Henrik Schumacher
A slightly different approach is to split the data into lines first, then split each line into fields. Since we know the data begins on line 20, we can do this:

    rawData = Import["rawData.txt", Path -> NotebookDirectory[]];
    textLines = StringSplit[rawData, "\n"];
    dataIwant = ToExpression[StringSplit /@ textLines[[20 ;;]]];

We used ToExpression to convert the text strings to numbers. Now we can put the numbers into an association. We probably want to use the first column, time, as our key, but floating-point numbers are not good keys, because a lookup can fail when the number differs from the stored key by a tiny rounding error. So don't do this:

    poor = Association @@ (First[#] -> Rest[#] & /@ dataIwant);
    poor[-39999.8]

If you get the right answer, it was just luck. A better way to treat this data is to convert the time from floating-point kiloyears to integer centuries. Then we can create a better association like this:

    better = Association @@ (Round[10 First[#]] -> Rest[#] & /@ dataIwant);

Now our keys are exact numbers, but we still want to use kiloyears, so we write a function that converts our time in kiloyears to centuries and rounds it for us:

    getData[kyr_] := better[Round[10 kyr]]
    getData[-3999.8123]
    (* {68.766, 27.806, 4.047, -1.184, 2.377} *)

Alternate versions of getData could interpolate the data or return only specific columns.
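An interpolating variant could be sketched roughly as follows. The names `seaLevel` and `getSeaLevel` are hypothetical, `sample` is made-up data standing in for the time and sea-level columns, and linear interpolation is an assumption rather than something the data requires.

```mathematica
(* Made-up sample: {time in kyr, value} pairs with distinct, sorted times *)
sample = {{-3., 1.}, {-2., 2.}, {-1., 4.}, {0., 8.}};

(* Piecewise-linear interpolation of the value against time *)
seaLevel = Interpolation[sample, InterpolationOrder -> 1];

(* Hypothetical variant of getData that interpolates between samples *)
getSeaLevel[kyr_?NumericQ] := seaLevel[kyr]

(* Query a time halfway between two samples *)
getSeaLevel[-1.5]
```

With the real data, `sample` would be something like `dataIwant[[All, {1, 2}]]`. Unlike the rounded-key lookup, this returns a value for any time inside the sampled range, at the cost of building the interpolating function once up front.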
answered 4 hours ago by LouisB